Commit 4cd67adc authored by Greg Kroah-Hartman's avatar Greg Kroah-Hartman

Merge tag 'misc-habanalabs-next-2021-09-01' of https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux into char-misc-next

Oded writes:

This tag contains habanalabs driver changes for v5.15:

- Add a new uAPI (under the cs ioctl) to enable the user to reserve
  signals and signal them from within its workloads, while the driver
  performs the waiting. This allows finer-grained pipelining between the
  different engines and better resource utilization.

- Add a new uAPI (under the wait_for_cs ioctl) to allow waiting
  on multiple command submissions (workloads) at the same time. This
  is an optimization for the user process so it won't need to call
  the wait_for_cs ioctl multiple times (see the sketch after this list).

- Add new feature of "state dump", which can be triggered through new
  debugfs node. This is a similar concept to the kernel panic dump.
  This new mechanism retrieves information from the device in case
  one of the workloads that was sent by the user got stuck. This is
  very helpful for debugging the hang.

- Add a new debugfs node to perform lookup of user pointers that are
  mapped to the Habana device's PMMU.

- Fix the tracking of the user process when running inside a container.

- Allow the user to map more than 4GB of memory to the device MMU in a
  single IOCTL call.

- Minimize the number of register reads done in GAUDI during user operation.

- Allow the user to retrieve the type of server that the device is
  connected to.

- Several fixes to the code that waits on interrupts on behalf of the
  user.

- Fixes and improvements to the hint mechanism in our VA allocation.

- Update the firmware header files to the latest version while
  maintaining backward compatibility with older firmware versions.

- Multiple fixes to various bugs.
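For the multi-CS wait uAPI mentioned in the list above, here is a minimal
user-space sketch. It assumes the wait_for_cs ioctl takes the array of CS
sequence numbers through the existing 'seq' field plus a multi-CS flag and an
array-length field; the names HL_WAIT_CS_FLAGS_MULTI_CS and seq_arr_len are
assumptions based on this description, so verify them against the final
uapi/misc/habanalabs.h. The rest (HL_IOCTL_WAIT_CS, union hl_wait_cs_args) is
the existing wait interface.

  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/ioctl.h>
  #include <misc/habanalabs.h>	/* uAPI header installed from the kernel tree */

  /* Wait on several previously submitted CS sequences with a single
   * wait_for_cs ioctl. 'fd' is an open /dev/hl<n> file descriptor.
   */
  static int wait_multi_cs(int fd, const uint64_t *seq_arr, uint8_t nr_seq,
			   uint64_t timeout_us)
  {
	union hl_wait_cs_args args;

	memset(&args, 0, sizeof(args));
	args.in.seq = (uint64_t)(uintptr_t)seq_arr;	/* array of CS sequences */
	args.in.seq_arr_len = nr_seq;			/* assumed field name */
	args.in.timeout_us = timeout_us;
	args.in.flags = HL_WAIT_CS_FLAGS_MULTI_CS;	/* assumed flag name */

	if (ioctl(fd, HL_IOCTL_WAIT_CS, &args))
		return -1;

	/* out.status reports completion/timeout for the request as a whole */
	printf("multi-CS wait status: %u\n", args.out.status);
	return 0;
  }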

* tag 'misc-habanalabs-next-2021-09-01' of https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux: (61 commits)
  habanalabs/gaudi: hwmon default card name
  habanalabs: add support for f/w reset
  habanalabs/gaudi: block ICACHE_BASE_ADDERESS_HIGH in TPC
  habanalabs: cannot sleep while holding spinlock
  habanalabs: never copy_from_user inside spinlock
  habanalabs: remove unnecessary device status check
  habanalabs: disable IRQ in user interrupts spinlock
  habanalabs: add "in device creation" status
  habanalabs/gaudi: invalidate PMMU mem cache on init
  habanalabs/gaudi: size should be printed in decimal
  habanalabs/gaudi: define DC POWER for secured PMC
  habanalabs/gaudi: unmask out of bounds SLM access interrupt
  habanalabs: add userptr_lookup node in debugfs
  habanalabs/gaudi: fetch TPC/MME ECC errors from F/W
  habanalabs: modify multi-CS to wait on stream masters
  habanalabs/gaudi: add monitored SOBs to state dump
  habanalabs/gaudi: restore user registers when context opens
  habanalabs/gaudi: increase boot fit timeout
  habanalabs: update to latest firmware headers
  habanalabs/gaudi: minimize number of register reads
  ...
parents ba1dc7f2 8ea32183
Showing 3474 additions and 783 deletions
...@@ -215,6 +215,17 @@ Description: Sets the skip reset on timeout option for the device. Value of
"0" means device will be reset in case some CS has timed out,
otherwise it will not be reset.
What: /sys/kernel/debug/habanalabs/hl<n>/state_dump
Date: Oct 2021
KernelVersion: 5.15
Contact: ynudelman@habana.ai
Description: Gets the state dump occurring on a CS timeout or failure.
State dump is used for debug and is created each time in case of
a problem in a CS execution, before reset.
Reading from the node returns the newest state dump available.
Writing an integer X discards X state dumps, so that the
next read would return X+1-st newest state dump.
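As a quick illustration of the read/discard semantics described above, a
minimal sketch of a debug tool that prints the newest state dump and then
discards it. The device index (hl0) and the buffer size are assumptions; the
rest is plain POSIX file I/O.

  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  int main(void)
  {
	const char *node = "/sys/kernel/debug/habanalabs/hl0/state_dump";
	static char buf[1 << 20];	/* dumps can be large; size is arbitrary */
	ssize_t n;
	int fd;

	/* Read the newest state dump and print it */
	fd = open(node, O_RDONLY);
	if (fd < 0)
		return 1;
	while ((n = read(fd, buf, sizeof(buf))) > 0)
		fwrite(buf, 1, n, stdout);
	close(fd);

	/* Discard 1 dump, so the next read returns the previous one */
	fd = open(node, O_WRONLY);
	if (fd < 0)
		return 1;
	write(fd, "1", 1);
	close(fd);
	return 0;
  }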
What: /sys/kernel/debug/habanalabs/hl<n>/stop_on_err
Date: Mar 2020
KernelVersion: 5.6
...@@ -230,6 +241,14 @@ Description: Displays a list with information about the currently user
pointers (user virtual addresses) that are pinned and mapped
to DMA addresses
What: /sys/kernel/debug/habanalabs/hl<n>/userptr_lookup
Date: Aug 2021
KernelVersion: 5.15
Contact: ogabbay@kernel.org
Description: Allows to search for specific user pointers (user virtual
addresses) that are pinned and mapped to DMA addresses, and see
their resolution to the specific dma address.
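A matching sketch for this node, again assuming device index 0 and an
illustrative address value: write the user virtual address in hex, then read
back the resolved mapping. Only plain POSIX calls are used.

  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  /* Look up which pinned user buffer (if any) covers 'va' and print the
   * resolution table reported by the driver.
   */
  static void lookup_userptr(unsigned long long va)
  {
	const char *node = "/sys/kernel/debug/habanalabs/hl0/userptr_lookup";
	char line[64], buf[4096];
	ssize_t n;
	int fd;

	fd = open(node, O_WRONLY);
	if (fd < 0)
		return;
	n = snprintf(line, sizeof(line), "0x%llx", va);
	write(fd, line, n);	/* node parses the value as hex */
	close(fd);

	fd = open(node, O_RDONLY);
	if (fd < 0)
		return;
	n = read(fd, buf, sizeof(buf) - 1);
	if (n > 0) {
		buf[n] = '\0';
		fputs(buf, stdout);	/* dma address, pid, region start/size */
	}
	close(fd);
  }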
What: /sys/kernel/debug/habanalabs/hl<n>/vm
Date: Jan 2019
KernelVersion: 5.1
......
...@@ -10,4 +10,5 @@ HL_COMMON_FILES := common/habanalabs_drv.o common/device.o common/context.o \
common/asid.o common/habanalabs_ioctl.o \
common/command_buffer.o common/hw_queue.o common/irq.o \
common/sysfs.o common/hwmon.o common/memory.o \
common/command_submission.o common/firmware_if.o
common/command_submission.o common/firmware_if.o \
common/state_dump.o
...@@ -314,8 +314,6 @@ int hl_cb_create(struct hl_device *hdev, struct hl_cb_mgr *mgr,
spin_lock(&mgr->cb_lock);
rc = idr_alloc(&mgr->cb_handles, cb, 1, 0, GFP_ATOMIC);
rc = idr_alloc(&mgr->cb_handles, cb, 1, 0, GFP_KERNEL);
if (rc < 0)
rc = idr_alloc(&mgr->cb_handles, cb, 1, 0, GFP_KERNEL);
spin_unlock(&mgr->cb_lock);
if (rc < 0) {
...@@ -552,7 +550,7 @@ int hl_cb_mmap(struct hl_fpriv *hpriv, struct vm_area_struct *vma)
vma->vm_private_data = cb;
rc = hdev->asic_funcs->cb_mmap(hdev, vma, cb->kernel_address,
rc = hdev->asic_funcs->mmap(hdev, vma, cb->kernel_address,
cb->bus_address, cb->size);
if (rc) {
spin_lock(&cb->lock);
......
...@@ -9,16 +9,70 @@
#include <linux/slab.h>
void hl_encaps_handle_do_release(struct kref *ref)
{
struct hl_cs_encaps_sig_handle *handle =
container_of(ref, struct hl_cs_encaps_sig_handle, refcount);
struct hl_ctx *ctx = handle->hdev->compute_ctx;
struct hl_encaps_signals_mgr *mgr = &ctx->sig_mgr;
spin_lock(&mgr->lock);
idr_remove(&mgr->handles, handle->id);
spin_unlock(&mgr->lock);
kfree(handle);
}
static void hl_encaps_handle_do_release_sob(struct kref *ref)
{
struct hl_cs_encaps_sig_handle *handle =
container_of(ref, struct hl_cs_encaps_sig_handle, refcount);
struct hl_ctx *ctx = handle->hdev->compute_ctx;
struct hl_encaps_signals_mgr *mgr = &ctx->sig_mgr;
/* if we're here, then there was a signals reservation but cs with
* encaps signals wasn't submitted, so need to put refcount
* to hw_sob taken at the reservation.
*/
hw_sob_put(handle->hw_sob);
spin_lock(&mgr->lock);
idr_remove(&mgr->handles, handle->id);
spin_unlock(&mgr->lock);
kfree(handle);
}
static void hl_encaps_sig_mgr_init(struct hl_encaps_signals_mgr *mgr)
{
spin_lock_init(&mgr->lock);
idr_init(&mgr->handles);
}
static void hl_encaps_sig_mgr_fini(struct hl_device *hdev,
struct hl_encaps_signals_mgr *mgr)
{
struct hl_cs_encaps_sig_handle *handle;
struct idr *idp;
u32 id;
idp = &mgr->handles;
if (!idr_is_empty(idp)) {
dev_warn(hdev->dev, "device released while some encaps signals handles are still allocated\n");
idr_for_each_entry(idp, handle, id)
kref_put(&handle->refcount,
hl_encaps_handle_do_release_sob);
}
idr_destroy(&mgr->handles);
}
static void hl_ctx_fini(struct hl_ctx *ctx)
{
struct hl_device *hdev = ctx->hdev;
int i;
/* Release all allocated pending cb's, those cb's were never
* scheduled so it is safe to release them here
*/
hl_pending_cb_list_flush(ctx);
/* Release all allocated HW block mapped list entries and destroy
* the mutex.
*/
...@@ -53,6 +107,7 @@ static void hl_ctx_fini(struct hl_ctx *ctx)
hl_cb_va_pool_fini(ctx);
hl_vm_ctx_fini(ctx);
hl_asid_free(hdev, ctx->asid);
hl_encaps_sig_mgr_fini(hdev, &ctx->sig_mgr);
/* Scrub both SRAM and DRAM */
hdev->asic_funcs->scrub_device_mem(hdev, 0, 0);
...@@ -130,9 +185,6 @@ void hl_ctx_free(struct hl_device *hdev, struct hl_ctx *ctx)
{
if (kref_put(&ctx->refcount, hl_ctx_do_release) == 1)
return;
dev_warn(hdev->dev,
"user process released device but its command submissions are still executing\n");
}
int hl_ctx_init(struct hl_device *hdev, struct hl_ctx *ctx, bool is_kernel_ctx)
...@@ -144,11 +196,8 @@ int hl_ctx_init(struct hl_device *hdev, struct hl_ctx *ctx, bool is_kernel_ctx)
kref_init(&ctx->refcount);
ctx->cs_sequence = 1;
INIT_LIST_HEAD(&ctx->pending_cb_list);
spin_lock_init(&ctx->pending_cb_lock);
spin_lock_init(&ctx->cs_lock);
atomic_set(&ctx->thread_ctx_switch_token, 1);
atomic_set(&ctx->thread_pending_cb_token, 1);
ctx->thread_ctx_switch_wait_token = 0;
ctx->cs_pending = kcalloc(hdev->asic_prop.max_pending_cs,
sizeof(struct hl_fence *),
...@@ -200,6 +249,8 @@ int hl_ctx_init(struct hl_device *hdev, struct hl_ctx *ctx, bool is_kernel_ctx)
goto err_cb_va_pool_fini;
}
hl_encaps_sig_mgr_init(&ctx->sig_mgr);
dev_dbg(hdev->dev, "create user context %d\n", ctx->asid);
}
...@@ -229,31 +280,86 @@ int hl_ctx_put(struct hl_ctx *ctx)
return kref_put(&ctx->refcount, hl_ctx_do_release);
}
struct hl_fence *hl_ctx_get_fence(struct hl_ctx *ctx, u64 seq)
/*
* hl_ctx_get_fence_locked - get CS fence under CS lock
*
* @ctx: pointer to the context structure.
* @seq: CS sequences number
*
* @return valid fence pointer on success, NULL if fence is gone, otherwise
* error pointer.
*
* NOTE: this function shall be called with cs_lock locked
*/
static struct hl_fence *hl_ctx_get_fence_locked(struct hl_ctx *ctx, u64 seq)
{
struct asic_fixed_properties *asic_prop = &ctx->hdev->asic_prop;
struct hl_fence *fence;
spin_lock(&ctx->cs_lock);
if (seq >= ctx->cs_sequence)
if (seq >= ctx->cs_sequence) {
spin_unlock(&ctx->cs_lock);
return ERR_PTR(-EINVAL);
}
if (seq + asic_prop->max_pending_cs < ctx->cs_sequence) {
if (seq + asic_prop->max_pending_cs < ctx->cs_sequence)
spin_unlock(&ctx->cs_lock);
return NULL;
}
fence = ctx->cs_pending[seq & (asic_prop->max_pending_cs - 1)];
hl_fence_get(fence);
return fence;
}
struct hl_fence *hl_ctx_get_fence(struct hl_ctx *ctx, u64 seq)
{
struct hl_fence *fence;
spin_lock(&ctx->cs_lock);
fence = hl_ctx_get_fence_locked(ctx, seq);
spin_unlock(&ctx->cs_lock);
return fence;
}
/*
* hl_ctx_get_fences - get multiple CS fences under the same CS lock
*
* @ctx: pointer to the context structure.
* @seq_arr: array of CS sequences to wait for
* @fence: fence array to store the CS fences
* @arr_len: length of seq_arr and fence_arr
*
* @return 0 on success, otherwise non 0 error code
*/
int hl_ctx_get_fences(struct hl_ctx *ctx, u64 *seq_arr,
struct hl_fence **fence, u32 arr_len)
{
struct hl_fence **fence_arr_base = fence;
int i, rc = 0;
spin_lock(&ctx->cs_lock);
for (i = 0; i < arr_len; i++, fence++) {
u64 seq = seq_arr[i];
*fence = hl_ctx_get_fence_locked(ctx, seq);
if (IS_ERR(*fence)) {
dev_err(ctx->hdev->dev,
"Failed to get fence for CS with seq 0x%llx\n",
seq);
rc = PTR_ERR(*fence);
break;
}
}
spin_unlock(&ctx->cs_lock);
if (rc)
hl_fences_put(fence_arr_base, i);
return rc;
}
/*
* hl_ctx_mgr_init - initialize the context manager
*
......
...@@ -209,12 +209,12 @@ static int userptr_show(struct seq_file *s, void *data)
if (first) {
first = false;
seq_puts(s, "\n");
seq_puts(s, " user virtual address size dma dir\n");
seq_puts(s, " pid user virtual address size dma dir\n");
seq_puts(s, "----------------------------------------------------------\n");
}
seq_printf(s,
seq_printf(s, " %-7d 0x%-14llx %-10llu %-30s\n",
" 0x%-14llx %-10u %-30s\n",
userptr->pid, userptr->addr, userptr->size,
userptr->addr, userptr->size, dma_dir[userptr->dir]);
dma_dir[userptr->dir]);
}
spin_unlock(&dev_entry->userptr_spinlock);
...@@ -235,7 +235,7 @@ static int vm_show(struct seq_file *s, void *data)
struct hl_vm_hash_node *hnode;
struct hl_userptr *userptr;
struct hl_vm_phys_pg_pack *phys_pg_pack = NULL;
enum vm_type_t *vm_type;
enum vm_type *vm_type;
bool once = true;
u64 j;
int i;
...@@ -261,7 +261,7 @@ static int vm_show(struct seq_file *s, void *data)
if (*vm_type == VM_TYPE_USERPTR) {
userptr = hnode->ptr;
seq_printf(s,
" 0x%-14llx %-10u\n",
" 0x%-14llx %-10llu\n",
hnode->vaddr, userptr->size);
} else {
phys_pg_pack = hnode->ptr;
...@@ -320,6 +320,77 @@ static int vm_show(struct seq_file *s, void *data)
return 0;
}
static int userptr_lookup_show(struct seq_file *s, void *data)
{
struct hl_debugfs_entry *entry = s->private;
struct hl_dbg_device_entry *dev_entry = entry->dev_entry;
struct scatterlist *sg;
struct hl_userptr *userptr;
bool first = true;
u64 total_npages, npages, sg_start, sg_end;
dma_addr_t dma_addr;
int i;
spin_lock(&dev_entry->userptr_spinlock);
list_for_each_entry(userptr, &dev_entry->userptr_list, debugfs_list) {
if (dev_entry->userptr_lookup >= userptr->addr &&
dev_entry->userptr_lookup < userptr->addr + userptr->size) {
total_npages = 0;
for_each_sg(userptr->sgt->sgl, sg, userptr->sgt->nents,
i) {
npages = hl_get_sg_info(sg, &dma_addr);
sg_start = userptr->addr +
total_npages * PAGE_SIZE;
sg_end = userptr->addr +
(total_npages + npages) * PAGE_SIZE;
if (dev_entry->userptr_lookup >= sg_start &&
dev_entry->userptr_lookup < sg_end) {
dma_addr += (dev_entry->userptr_lookup -
sg_start);
if (first) {
first = false;
seq_puts(s, "\n");
seq_puts(s, " user virtual address dma address pid region start region size\n");
seq_puts(s, "---------------------------------------------------------------------------------------\n");
}
seq_printf(s, " 0x%-18llx 0x%-16llx %-8u 0x%-16llx %-12llu\n",
dev_entry->userptr_lookup,
(u64)dma_addr, userptr->pid,
userptr->addr, userptr->size);
}
total_npages += npages;
}
}
}
spin_unlock(&dev_entry->userptr_spinlock);
if (!first)
seq_puts(s, "\n");
return 0;
}
static ssize_t userptr_lookup_write(struct file *file, const char __user *buf,
size_t count, loff_t *f_pos)
{
struct seq_file *s = file->private_data;
struct hl_debugfs_entry *entry = s->private;
struct hl_dbg_device_entry *dev_entry = entry->dev_entry;
ssize_t rc;
u64 value;
rc = kstrtoull_from_user(buf, count, 16, &value);
if (rc)
return rc;
dev_entry->userptr_lookup = value;
return count;
}
static int mmu_show(struct seq_file *s, void *data)
{
struct hl_debugfs_entry *entry = s->private;
...@@ -349,7 +420,7 @@ static int mmu_show(struct seq_file *s, void *data)
return 0;
}
phys_addr = hops_info.hop_info[hops_info.used_hops - 1].hop_pte_val;
hl_mmu_va_to_pa(ctx, virt_addr, &phys_addr);
if (hops_info.scrambled_vaddr &&
(dev_entry->mmu_addr != hops_info.scrambled_vaddr))
...@@ -491,11 +562,10 @@ static int device_va_to_pa(struct hl_device *hdev, u64 virt_addr, u32 size,
struct hl_vm_phys_pg_pack *phys_pg_pack;
struct hl_ctx *ctx = hdev->compute_ctx;
struct hl_vm_hash_node *hnode;
u64 end_address, range_size;
struct hl_userptr *userptr;
enum vm_type_t *vm_type;
enum vm_type *vm_type;
bool valid = false;
u64 end_address;
u32 range_size;
int i, rc = 0;
if (!ctx) {
...@@ -1043,6 +1113,60 @@ static ssize_t hl_security_violations_read(struct file *f, char __user *buf,
return 0;
}
static ssize_t hl_state_dump_read(struct file *f, char __user *buf,
size_t count, loff_t *ppos)
{
struct hl_dbg_device_entry *entry = file_inode(f)->i_private;
ssize_t rc;
down_read(&entry->state_dump_sem);
if (!entry->state_dump[entry->state_dump_head])
rc = 0;
else
rc = simple_read_from_buffer(
buf, count, ppos,
entry->state_dump[entry->state_dump_head],
strlen(entry->state_dump[entry->state_dump_head]));
up_read(&entry->state_dump_sem);
return rc;
}
static ssize_t hl_state_dump_write(struct file *f, const char __user *buf,
size_t count, loff_t *ppos)
{
struct hl_dbg_device_entry *entry = file_inode(f)->i_private;
struct hl_device *hdev = entry->hdev;
ssize_t rc;
u32 size;
int i;
rc = kstrtouint_from_user(buf, count, 10, &size);
if (rc)
return rc;
if (size <= 0 || size >= ARRAY_SIZE(entry->state_dump)) {
dev_err(hdev->dev, "Invalid number of dumps to skip\n");
return -EINVAL;
}
if (entry->state_dump[entry->state_dump_head]) {
down_write(&entry->state_dump_sem);
for (i = 0; i < size; ++i) {
vfree(entry->state_dump[entry->state_dump_head]);
entry->state_dump[entry->state_dump_head] = NULL;
if (entry->state_dump_head > 0)
entry->state_dump_head--;
else
entry->state_dump_head =
ARRAY_SIZE(entry->state_dump) - 1;
}
up_write(&entry->state_dump_sem);
}
return count;
}
static const struct file_operations hl_data32b_fops = {
.owner = THIS_MODULE,
.read = hl_data_read32,
...@@ -1110,12 +1234,19 @@ static const struct file_operations hl_security_violations_fops = {
.read = hl_security_violations_read
};
static const struct file_operations hl_state_dump_fops = {
.owner = THIS_MODULE,
.read = hl_state_dump_read,
.write = hl_state_dump_write
};
static const struct hl_info_list hl_debugfs_list[] = {
{"command_buffers", command_buffers_show, NULL},
{"command_submission", command_submission_show, NULL},
{"command_submission_jobs", command_submission_jobs_show, NULL},
{"userptr", userptr_show, NULL},
{"vm", vm_show, NULL},
{"userptr_lookup", userptr_lookup_show, userptr_lookup_write},
{"mmu", mmu_show, mmu_asid_va_write},
{"engines", engines_show, NULL}
};
...@@ -1172,6 +1303,7 @@ void hl_debugfs_add_device(struct hl_device *hdev)
INIT_LIST_HEAD(&dev_entry->userptr_list);
INIT_LIST_HEAD(&dev_entry->ctx_mem_hash_list);
mutex_init(&dev_entry->file_mutex);
init_rwsem(&dev_entry->state_dump_sem);
spin_lock_init(&dev_entry->cb_spinlock);
spin_lock_init(&dev_entry->cs_spinlock);
spin_lock_init(&dev_entry->cs_job_spinlock);
...@@ -1283,6 +1415,12 @@ void hl_debugfs_add_device(struct hl_device *hdev)
dev_entry->root,
&hdev->skip_reset_on_timeout);
debugfs_create_file("state_dump",
0600,
dev_entry->root,
dev_entry,
&hl_state_dump_fops);
for (i = 0, entry = dev_entry->entry_arr ; i < count ; i++, entry++) {
debugfs_create_file(hl_debugfs_list[i].name,
0444,
...@@ -1297,6 +1435,7 @@ void hl_debugfs_add_device(struct hl_device *hdev)
void hl_debugfs_remove_device(struct hl_device *hdev)
{
struct hl_dbg_device_entry *entry = &hdev->hl_debugfs;
int i;
debugfs_remove_recursive(entry->root);
...@@ -1304,6 +1443,9 @@ void hl_debugfs_remove_device(struct hl_device *hdev)
vfree(entry->blob_desc.data);
for (i = 0; i < ARRAY_SIZE(entry->state_dump); ++i)
vfree(entry->state_dump[i]);
kfree(entry->entry_arr);
}
...@@ -1416,6 +1558,28 @@ void hl_debugfs_remove_ctx_mem_hash(struct hl_device *hdev, struct hl_ctx *ctx)
spin_unlock(&dev_entry->ctx_mem_hash_spinlock);
}
/**
* hl_debugfs_set_state_dump - register state dump making it accessible via
* debugfs
* @hdev: pointer to the device structure
* @data: the actual dump data
* @length: the length of the data
*/
void hl_debugfs_set_state_dump(struct hl_device *hdev, char *data,
unsigned long length)
{
struct hl_dbg_device_entry *dev_entry = &hdev->hl_debugfs;
down_write(&dev_entry->state_dump_sem);
dev_entry->state_dump_head = (dev_entry->state_dump_head + 1) %
ARRAY_SIZE(dev_entry->state_dump);
vfree(dev_entry->state_dump[dev_entry->state_dump_head]);
dev_entry->state_dump[dev_entry->state_dump_head] = data;
up_write(&dev_entry->state_dump_sem);
}
void __init hl_debugfs_init(void)
{
hl_debug_root = debugfs_create_dir("habanalabs", NULL);
......
...@@ -7,11 +7,11 @@
#define pr_fmt(fmt) "habanalabs: " fmt
#include <uapi/misc/habanalabs.h>
#include "habanalabs.h" #include "habanalabs.h"
#include <linux/pci.h> #include <linux/pci.h>
#include <linux/hwmon.h> #include <linux/hwmon.h>
#include <uapi/misc/habanalabs.h>
enum hl_device_status hl_device_status(struct hl_device *hdev)
{
...@@ -23,6 +23,8 @@ enum hl_device_status hl_device_status(struct hl_device *hdev)
status = HL_DEVICE_STATUS_NEEDS_RESET;
else if (hdev->disabled)
status = HL_DEVICE_STATUS_MALFUNCTION;
else if (!hdev->init_done)
status = HL_DEVICE_STATUS_IN_DEVICE_CREATION;
else
status = HL_DEVICE_STATUS_OPERATIONAL;
...@@ -44,6 +46,7 @@ bool hl_device_operational(struct hl_device *hdev,
case HL_DEVICE_STATUS_NEEDS_RESET:
return false;
case HL_DEVICE_STATUS_OPERATIONAL:
case HL_DEVICE_STATUS_IN_DEVICE_CREATION:
default:
return true;
}
...@@ -129,8 +132,8 @@ static int hl_device_release(struct inode *inode, struct file *filp)
hl_ctx_mgr_fini(hdev, &hpriv->ctx_mgr);
if (!hl_hpriv_put(hpriv))
dev_warn(hdev->dev,
dev_notice(hdev->dev,
"Device is still in use because there are live CS and/or memory mappings\n");
"User process closed FD but device still in use\n");
hdev->last_open_session_duration_jif =
jiffies - hdev->last_successful_open_jif;
...@@ -308,9 +311,15 @@ static void device_hard_reset_pending(struct work_struct *work)
container_of(work, struct hl_device_reset_work,
reset_work.work);
struct hl_device *hdev = device_reset_work->hdev;
u32 flags;
int rc;
rc = hl_device_reset(hdev, HL_RESET_HARD | HL_RESET_FROM_RESET_THREAD);
flags = HL_RESET_HARD | HL_RESET_FROM_RESET_THREAD;
if (device_reset_work->fw_reset)
flags |= HL_RESET_FW;
rc = hl_device_reset(hdev, flags);
if ((rc == -EBUSY) && !hdev->device_fini_pending) {
dev_info(hdev->dev,
"Could not reset device. will try again in %u seconds",
...@@ -682,6 +691,44 @@ int hl_device_set_debug_mode(struct hl_device *hdev, bool enable)
return rc;
}
static void take_release_locks(struct hl_device *hdev)
{
/* Flush anyone that is inside the critical section of enqueue
* jobs to the H/W
*/
hdev->asic_funcs->hw_queues_lock(hdev);
hdev->asic_funcs->hw_queues_unlock(hdev);
/* Flush processes that are sending message to CPU */
mutex_lock(&hdev->send_cpu_message_lock);
mutex_unlock(&hdev->send_cpu_message_lock);
/* Flush anyone that is inside device open */
mutex_lock(&hdev->fpriv_list_lock);
mutex_unlock(&hdev->fpriv_list_lock);
}
static void cleanup_resources(struct hl_device *hdev, bool hard_reset, bool fw_reset)
{
if (hard_reset)
device_late_fini(hdev);
/*
* Halt the engines and disable interrupts so we won't get any more
* completions from H/W and we won't have any accesses from the
* H/W to the host machine
*/
hdev->asic_funcs->halt_engines(hdev, hard_reset, fw_reset);
/* Go over all the queues, release all CS and their jobs */
hl_cs_rollback_all(hdev);
/* Release all pending user interrupts, each pending user interrupt
* holds a reference to user context
*/
hl_release_pending_user_interrupts(hdev);
}
/*
* hl_device_suspend - initiate device suspend
*
...@@ -707,16 +754,7 @@ int hl_device_suspend(struct hl_device *hdev)
/* This blocks all other stuff that is not blocked by in_reset */
hdev->disabled = true;
/*
take_release_locks(hdev);
* Flush anyone that is inside the critical section of enqueue
* jobs to the H/W
*/
hdev->asic_funcs->hw_queues_lock(hdev);
hdev->asic_funcs->hw_queues_unlock(hdev);
/* Flush processes that are sending message to CPU */
mutex_lock(&hdev->send_cpu_message_lock);
mutex_unlock(&hdev->send_cpu_message_lock);
rc = hdev->asic_funcs->suspend(hdev);
if (rc)
...@@ -819,6 +857,11 @@ static int device_kill_open_processes(struct hl_device *hdev, u32 timeout)
usleep_range(1000, 10000);
put_task_struct(task);
} else {
dev_warn(hdev->dev,
"Can't get task struct for PID so giving up on killing process\n");
mutex_unlock(&hdev->fpriv_list_lock);
return -ETIME;
} }
}
}
...@@ -885,7 +928,7 @@ static void device_disable_open_processes(struct hl_device *hdev)
int hl_device_reset(struct hl_device *hdev, u32 flags)
{
u64 idle_mask[HL_BUSY_ENGINES_MASK_EXT_SIZE] = {0};
bool hard_reset, from_hard_reset_thread, hard_instead_soft = false;
bool hard_reset, from_hard_reset_thread, fw_reset, hard_instead_soft = false;
int i, rc;
if (!hdev->init_done) {
...@@ -894,8 +937,9 @@ int hl_device_reset(struct hl_device *hdev, u32 flags)
return 0;
}
hard_reset = (flags & HL_RESET_HARD) != 0;
hard_reset = !!(flags & HL_RESET_HARD);
from_hard_reset_thread = (flags & HL_RESET_FROM_RESET_THREAD) != 0;
from_hard_reset_thread = !!(flags & HL_RESET_FROM_RESET_THREAD);
fw_reset = !!(flags & HL_RESET_FW);
if (!hard_reset && !hdev->supports_soft_reset) {
hard_instead_soft = true;
...@@ -947,11 +991,13 @@ int hl_device_reset(struct hl_device *hdev, u32 flags)
else
hdev->curr_reset_cause = HL_RESET_CAUSE_UNKNOWN;
/*
* if reset is due to heartbeat, device CPU is no responsive in
* which case no point sending PCI disable message to it
*/
/* If reset is due to heartbeat, device CPU is no responsive in
* which case no point sending PCI disable message to it.
*
* If F/W is performing the reset, no need to send it a message to disable
* PCI access
*/
if (hard_reset && !(flags & HL_RESET_HEARTBEAT)) {
if (hard_reset && !(flags & (HL_RESET_HEARTBEAT | HL_RESET_FW))) {
/* Disable PCI access from device F/W so he won't send
* us additional interrupts. We disable MSI/MSI-X at
* the halt_engines function and we can't have the F/W
...@@ -970,15 +1016,7 @@ int hl_device_reset(struct hl_device *hdev, u32 flags)
/* This also blocks future CS/VM/JOB completion operations */
hdev->disabled = true;
/* Flush anyone that is inside the critical section of enqueue
take_release_locks(hdev);
* jobs to the H/W
*/
hdev->asic_funcs->hw_queues_lock(hdev);
hdev->asic_funcs->hw_queues_unlock(hdev);
/* Flush anyone that is inside device open */
mutex_lock(&hdev->fpriv_list_lock);
mutex_unlock(&hdev->fpriv_list_lock);
dev_err(hdev->dev, "Going to RESET device!\n"); dev_err(hdev->dev, "Going to RESET device!\n");
} }
...@@ -989,6 +1027,8 @@ int hl_device_reset(struct hl_device *hdev, u32 flags) ...@@ -989,6 +1027,8 @@ int hl_device_reset(struct hl_device *hdev, u32 flags)
hdev->process_kill_trial_cnt = 0; hdev->process_kill_trial_cnt = 0;
hdev->device_reset_work.fw_reset = fw_reset;
/* /*
* Because the reset function can't run from heartbeat work, * Because the reset function can't run from heartbeat work,
* we need to call the reset function from a dedicated work. * we need to call the reset function from a dedicated work.
...@@ -999,31 +1039,7 @@ int hl_device_reset(struct hl_device *hdev, u32 flags) ...@@ -999,31 +1039,7 @@ int hl_device_reset(struct hl_device *hdev, u32 flags)
return 0; return 0;
} }
if (hard_reset) { cleanup_resources(hdev, hard_reset, fw_reset);
device_late_fini(hdev);
/*
* Now that the heartbeat thread is closed, flush processes
* which are sending messages to CPU
*/
mutex_lock(&hdev->send_cpu_message_lock);
mutex_unlock(&hdev->send_cpu_message_lock);
}
/*
* Halt the engines and disable interrupts so we won't get any more
* completions from H/W and we won't have any accesses from the
* H/W to the host machine
*/
hdev->asic_funcs->halt_engines(hdev, hard_reset);
/* Go over all the queues, release all CS and their jobs */
hl_cs_rollback_all(hdev);
/* Release all pending user interrupts, each pending user interrupt
* holds a reference to user context
*/
hl_release_pending_user_interrupts(hdev);
kill_processes:
if (hard_reset) {
...@@ -1057,12 +1073,15 @@ int hl_device_reset(struct hl_device *hdev, u32 flags)
}
/* Reset the H/W. It will be in idle state after this returns */
hdev->asic_funcs->hw_fini(hdev, hard_reset);
hdev->asic_funcs->hw_fini(hdev, hard_reset, fw_reset);
if (hard_reset) {
hdev->fw_loader.linux_loaded = false;
/* Release kernel context */
if (hdev->kernel_ctx && hl_ctx_put(hdev->kernel_ctx) == 1)
hdev->kernel_ctx = NULL;
hl_vm_fini(hdev);
hl_mmu_fini(hdev);
hl_eq_reset(hdev, &hdev->event_queue);
...@@ -1292,6 +1311,10 @@ int hl_device_init(struct hl_device *hdev, struct class *hclass)
if (rc)
goto user_interrupts_fini;
/* initialize completion structure for multi CS wait */
hl_multi_cs_completion_init(hdev);
/*
* Initialize the H/W queues. Must be done before hw_init, because
* there the addresses of the kernel queue are being written to the
...@@ -1361,6 +1384,8 @@ int hl_device_init(struct hl_device *hdev, struct class *hclass)
hdev->compute_ctx = NULL;
hdev->asic_funcs->state_dump_init(hdev);
hl_debugfs_add_device(hdev);
/* debugfs nodes are created in hl_ctx_init so it must be called after
...@@ -1567,31 +1592,13 @@ void hl_device_fini(struct hl_device *hdev)
/* Mark device as disabled */
hdev->disabled = true;
/* Flush anyone that is inside the critical section of enqueue
take_release_locks(hdev);
* jobs to the H/W
*/
hdev->asic_funcs->hw_queues_lock(hdev);
hdev->asic_funcs->hw_queues_unlock(hdev);
/* Flush anyone that is inside device open */
mutex_lock(&hdev->fpriv_list_lock);
mutex_unlock(&hdev->fpriv_list_lock);
hdev->hard_reset_pending = true;
hl_hwmon_fini(hdev);
device_late_fini(hdev);
cleanup_resources(hdev, true, false);
/*
* Halt the engines and disable interrupts so we won't get any more
* completions from H/W and we won't have any accesses from the
* H/W to the host machine
*/
hdev->asic_funcs->halt_engines(hdev, true);
/* Go over all the queues, release all CS and their jobs */
hl_cs_rollback_all(hdev);
/* Kill processes here after CS rollback. This is because the process
* can't really exit until all its CSs are done, which is what we
...@@ -1610,7 +1617,9 @@ void hl_device_fini(struct hl_device *hdev)
hl_cb_pool_fini(hdev);
/* Reset the H/W. It will be in idle state after this returns */
hdev->asic_funcs->hw_fini(hdev, true);
hdev->asic_funcs->hw_fini(hdev, true, false);
hdev->fw_loader.linux_loaded = false;
/* Release kernel context */
if ((hdev->kernel_ctx) && (hl_ctx_put(hdev->kernel_ctx) != 1))
......
// SPDX-License-Identifier: GPL-2.0
/*
* Copyright 2016-2019 HabanaLabs, Ltd.
* Copyright 2016-2021 HabanaLabs, Ltd.
* All Rights Reserved.
*/
...@@ -240,11 +240,15 @@ int hl_fw_send_cpu_message(struct hl_device *hdev, u32 hw_queue_id, u32 *msg, ...@@ -240,11 +240,15 @@ int hl_fw_send_cpu_message(struct hl_device *hdev, u32 hw_queue_id, u32 *msg,
/* set fence to a non valid value */ /* set fence to a non valid value */
pkt->fence = cpu_to_le32(UINT_MAX); pkt->fence = cpu_to_le32(UINT_MAX);
rc = hl_hw_queue_send_cb_no_cmpl(hdev, hw_queue_id, len, pkt_dma_addr); /*
if (rc) { * The CPU queue is a synchronous queue with an effective depth of
dev_err(hdev->dev, "Failed to send CB on CPU PQ (%d)\n", rc); * a single entry (although it is allocated with room for multiple
goto out; * entries). We lock on it using 'send_cpu_message_lock' which
} * serializes accesses to the CPU queue.
* Which means that we don't need to lock the access to the entire H/W
* queues module when submitting a JOB to the CPU queue.
*/
hl_hw_queue_submit_bd(hdev, queue, 0, len, pkt_dma_addr);
if (prop->fw_app_cpu_boot_dev_sts0 & CPU_BOOT_DEV_STS0_PKT_PI_ACK_EN) if (prop->fw_app_cpu_boot_dev_sts0 & CPU_BOOT_DEV_STS0_PKT_PI_ACK_EN)
expected_ack_val = queue->pi; expected_ack_val = queue->pi;
...@@ -663,17 +667,15 @@ int hl_fw_cpucp_info_get(struct hl_device *hdev, ...@@ -663,17 +667,15 @@ int hl_fw_cpucp_info_get(struct hl_device *hdev,
hdev->event_queue.check_eqe_index = false; hdev->event_queue.check_eqe_index = false;
/* Read FW application security bits again */ /* Read FW application security bits again */
if (hdev->asic_prop.fw_cpu_boot_dev_sts0_valid) { if (prop->fw_cpu_boot_dev_sts0_valid) {
hdev->asic_prop.fw_app_cpu_boot_dev_sts0 = prop->fw_app_cpu_boot_dev_sts0 = RREG32(sts_boot_dev_sts0_reg);
RREG32(sts_boot_dev_sts0_reg); if (prop->fw_app_cpu_boot_dev_sts0 &
if (hdev->asic_prop.fw_app_cpu_boot_dev_sts0 &
CPU_BOOT_DEV_STS0_EQ_INDEX_EN) CPU_BOOT_DEV_STS0_EQ_INDEX_EN)
hdev->event_queue.check_eqe_index = true; hdev->event_queue.check_eqe_index = true;
} }
if (hdev->asic_prop.fw_cpu_boot_dev_sts1_valid) if (prop->fw_cpu_boot_dev_sts1_valid)
hdev->asic_prop.fw_app_cpu_boot_dev_sts1 = prop->fw_app_cpu_boot_dev_sts1 = RREG32(sts_boot_dev_sts1_reg);
RREG32(sts_boot_dev_sts1_reg);
out: out:
hdev->asic_funcs->cpu_accessible_dma_pool_free(hdev, hdev->asic_funcs->cpu_accessible_dma_pool_free(hdev,
...@@ -1008,6 +1010,11 @@ void hl_fw_ask_halt_machine_without_linux(struct hl_device *hdev) ...@@ -1008,6 +1010,11 @@ void hl_fw_ask_halt_machine_without_linux(struct hl_device *hdev)
} else { } else {
WREG32(static_loader->kmd_msg_to_cpu_reg, KMD_MSG_GOTO_WFE); WREG32(static_loader->kmd_msg_to_cpu_reg, KMD_MSG_GOTO_WFE);
msleep(static_loader->cpu_reset_wait_msec); msleep(static_loader->cpu_reset_wait_msec);
/* Must clear this register in order to prevent preboot
* from reading WFE after reboot
*/
WREG32(static_loader->kmd_msg_to_cpu_reg, KMD_MSG_NA);
} }
hdev->device_cpu_is_halted = true; hdev->device_cpu_is_halted = true;
...@@ -1055,6 +1062,10 @@ static void detect_cpu_boot_status(struct hl_device *hdev, u32 status) ...@@ -1055,6 +1062,10 @@ static void detect_cpu_boot_status(struct hl_device *hdev, u32 status)
dev_err(hdev->dev, dev_err(hdev->dev,
"Device boot progress - Thermal Sensor initialization failed\n"); "Device boot progress - Thermal Sensor initialization failed\n");
break; break;
case CPU_BOOT_STATUS_SECURITY_READY:
dev_err(hdev->dev,
"Device boot progress - Stuck in preboot after security initialization\n");
break;
default: default:
dev_err(hdev->dev, dev_err(hdev->dev,
"Device boot progress - Invalid status code %d\n", "Device boot progress - Invalid status code %d\n",
...@@ -1238,11 +1249,6 @@ static void hl_fw_preboot_update_state(struct hl_device *hdev) ...@@ -1238,11 +1249,6 @@ static void hl_fw_preboot_update_state(struct hl_device *hdev)
* b. Check whether hard reset is done by boot cpu * b. Check whether hard reset is done by boot cpu
* 3. FW application - a. Fetch fw application security status * 3. FW application - a. Fetch fw application security status
* b. Check whether hard reset is done by fw app * b. Check whether hard reset is done by fw app
*
* Preboot:
* Check security status bit (CPU_BOOT_DEV_STS0_ENABLED). If set, then-
* check security enabled bit (CPU_BOOT_DEV_STS0_SECURITY_EN)
* If set, then mark GIC controller to be disabled.
*/ */
prop->hard_reset_done_by_fw = prop->hard_reset_done_by_fw =
!!(cpu_boot_dev_sts0 & CPU_BOOT_DEV_STS0_FW_HARD_RST_EN); !!(cpu_boot_dev_sts0 & CPU_BOOT_DEV_STS0_FW_HARD_RST_EN);
...@@ -1953,8 +1959,8 @@ static void hl_fw_dynamic_update_linux_interrupt_if(struct hl_device *hdev) ...@@ -1953,8 +1959,8 @@ static void hl_fw_dynamic_update_linux_interrupt_if(struct hl_device *hdev)
if (!hdev->asic_prop.gic_interrupts_enable && if (!hdev->asic_prop.gic_interrupts_enable &&
!(hdev->asic_prop.fw_app_cpu_boot_dev_sts0 & !(hdev->asic_prop.fw_app_cpu_boot_dev_sts0 &
CPU_BOOT_DEV_STS0_MULTI_IRQ_POLL_EN)) { CPU_BOOT_DEV_STS0_MULTI_IRQ_POLL_EN)) {
dyn_regs->gic_host_halt_irq = dyn_regs->gic_host_irq_ctrl; dyn_regs->gic_host_halt_irq = dyn_regs->gic_host_pi_upd_irq;
dyn_regs->gic_host_ints_irq = dyn_regs->gic_host_irq_ctrl; dyn_regs->gic_host_ints_irq = dyn_regs->gic_host_pi_upd_irq;
dev_warn(hdev->dev, dev_warn(hdev->dev,
"Using a single interrupt interface towards cpucp"); "Using a single interrupt interface towards cpucp");
...@@ -2122,8 +2128,7 @@ static void hl_fw_linux_update_state(struct hl_device *hdev, ...@@ -2122,8 +2128,7 @@ static void hl_fw_linux_update_state(struct hl_device *hdev,
/* Read FW application security bits */ /* Read FW application security bits */
if (prop->fw_cpu_boot_dev_sts0_valid) { if (prop->fw_cpu_boot_dev_sts0_valid) {
prop->fw_app_cpu_boot_dev_sts0 = prop->fw_app_cpu_boot_dev_sts0 = RREG32(cpu_boot_dev_sts0_reg);
RREG32(cpu_boot_dev_sts0_reg);
if (prop->fw_app_cpu_boot_dev_sts0 & if (prop->fw_app_cpu_boot_dev_sts0 &
CPU_BOOT_DEV_STS0_FW_HARD_RST_EN) CPU_BOOT_DEV_STS0_FW_HARD_RST_EN)
...@@ -2143,8 +2148,7 @@ static void hl_fw_linux_update_state(struct hl_device *hdev, ...@@ -2143,8 +2148,7 @@ static void hl_fw_linux_update_state(struct hl_device *hdev,
} }
if (prop->fw_cpu_boot_dev_sts1_valid) { if (prop->fw_cpu_boot_dev_sts1_valid) {
prop->fw_app_cpu_boot_dev_sts1 = prop->fw_app_cpu_boot_dev_sts1 = RREG32(cpu_boot_dev_sts1_reg);
RREG32(cpu_boot_dev_sts1_reg);
dev_dbg(hdev->dev, dev_dbg(hdev->dev,
"Firmware application CPU status1 %#x\n", "Firmware application CPU status1 %#x\n",
...@@ -2235,6 +2239,10 @@ static int hl_fw_dynamic_init_cpu(struct hl_device *hdev, ...@@ -2235,6 +2239,10 @@ static int hl_fw_dynamic_init_cpu(struct hl_device *hdev,
dev_info(hdev->dev, dev_info(hdev->dev,
"Loading firmware to device, may take some time...\n"); "Loading firmware to device, may take some time...\n");
/*
* In this stage, "cpu_dyn_regs" contains only LKD's hard coded values!
* It will be updated from FW after hl_fw_dynamic_request_descriptor().
*/
dyn_regs = &fw_loader->dynamic_loader.comm_desc.cpu_dyn_regs; dyn_regs = &fw_loader->dynamic_loader.comm_desc.cpu_dyn_regs;
rc = hl_fw_dynamic_send_protocol_cmd(hdev, fw_loader, COMMS_RST_STATE, rc = hl_fw_dynamic_send_protocol_cmd(hdev, fw_loader, COMMS_RST_STATE,
......
...@@ -141,7 +141,7 @@ int hl_device_open(struct inode *inode, struct file *filp) ...@@ -141,7 +141,7 @@ int hl_device_open(struct inode *inode, struct file *filp)
hl_cb_mgr_init(&hpriv->cb_mgr); hl_cb_mgr_init(&hpriv->cb_mgr);
hl_ctx_mgr_init(&hpriv->ctx_mgr); hl_ctx_mgr_init(&hpriv->ctx_mgr);
hpriv->taskpid = find_get_pid(current->pid); hpriv->taskpid = get_task_pid(current, PIDTYPE_PID);
mutex_lock(&hdev->fpriv_list_lock); mutex_lock(&hdev->fpriv_list_lock);
...@@ -194,7 +194,6 @@ int hl_device_open(struct inode *inode, struct file *filp) ...@@ -194,7 +194,6 @@ int hl_device_open(struct inode *inode, struct file *filp)
out_err: out_err:
mutex_unlock(&hdev->fpriv_list_lock); mutex_unlock(&hdev->fpriv_list_lock);
hl_cb_mgr_fini(hpriv->hdev, &hpriv->cb_mgr); hl_cb_mgr_fini(hpriv->hdev, &hpriv->cb_mgr);
hl_ctx_mgr_fini(hpriv->hdev, &hpriv->ctx_mgr); hl_ctx_mgr_fini(hpriv->hdev, &hpriv->ctx_mgr);
filp->private_data = NULL; filp->private_data = NULL;
...@@ -318,12 +317,16 @@ int create_hdev(struct hl_device **dev, struct pci_dev *pdev, ...@@ -318,12 +317,16 @@ int create_hdev(struct hl_device **dev, struct pci_dev *pdev,
hdev->asic_prop.fw_security_enabled = false; hdev->asic_prop.fw_security_enabled = false;
/* Assign status description string */ /* Assign status description string */
strncpy(hdev->status[HL_DEVICE_STATUS_MALFUNCTION], strncpy(hdev->status[HL_DEVICE_STATUS_OPERATIONAL],
"disabled", HL_STR_MAX); "operational", HL_STR_MAX);
strncpy(hdev->status[HL_DEVICE_STATUS_IN_RESET], strncpy(hdev->status[HL_DEVICE_STATUS_IN_RESET],
"in reset", HL_STR_MAX); "in reset", HL_STR_MAX);
strncpy(hdev->status[HL_DEVICE_STATUS_MALFUNCTION],
"disabled", HL_STR_MAX);
strncpy(hdev->status[HL_DEVICE_STATUS_NEEDS_RESET], strncpy(hdev->status[HL_DEVICE_STATUS_NEEDS_RESET],
"needs reset", HL_STR_MAX); "needs reset", HL_STR_MAX);
strncpy(hdev->status[HL_DEVICE_STATUS_IN_DEVICE_CREATION],
"in device creation", HL_STR_MAX);
hdev->major = hl_major; hdev->major = hl_major;
hdev->reset_on_lockup = reset_on_lockup; hdev->reset_on_lockup = reset_on_lockup;
...@@ -532,7 +535,7 @@ hl_pci_err_detected(struct pci_dev *pdev, pci_channel_state_t state) ...@@ -532,7 +535,7 @@ hl_pci_err_detected(struct pci_dev *pdev, pci_channel_state_t state)
result = PCI_ERS_RESULT_NONE; result = PCI_ERS_RESULT_NONE;
} }
hdev->asic_funcs->halt_engines(hdev, true); hdev->asic_funcs->halt_engines(hdev, true, false);
return result; return result;
} }
......
...@@ -94,6 +94,8 @@ static int hw_ip_info(struct hl_device *hdev, struct hl_info_args *args)
hw_ip.first_available_interrupt_id =
prop->first_available_user_msix_interrupt;
hw_ip.server_type = prop->server_type;
return copy_to_user(out, &hw_ip,
min((size_t) size, sizeof(hw_ip))) ? -EFAULT : 0;
}
......
...@@ -65,7 +65,7 @@ void hl_hw_queue_update_ci(struct hl_cs *cs) ...@@ -65,7 +65,7 @@ void hl_hw_queue_update_ci(struct hl_cs *cs)
} }
/* /*
* ext_and_hw_queue_submit_bd() - Submit a buffer descriptor to an external or a * hl_hw_queue_submit_bd() - Submit a buffer descriptor to an external or a
* H/W queue. * H/W queue.
* @hdev: pointer to habanalabs device structure * @hdev: pointer to habanalabs device structure
* @q: pointer to habanalabs queue structure * @q: pointer to habanalabs queue structure
...@@ -80,8 +80,8 @@ void hl_hw_queue_update_ci(struct hl_cs *cs) ...@@ -80,8 +80,8 @@ void hl_hw_queue_update_ci(struct hl_cs *cs)
* This function must be called when the scheduler mutex is taken * This function must be called when the scheduler mutex is taken
* *
*/ */
static void ext_and_hw_queue_submit_bd(struct hl_device *hdev, void hl_hw_queue_submit_bd(struct hl_device *hdev, struct hl_hw_queue *q,
struct hl_hw_queue *q, u32 ctl, u32 len, u64 ptr) u32 ctl, u32 len, u64 ptr)
{ {
struct hl_bd *bd; struct hl_bd *bd;
...@@ -222,8 +222,8 @@ static int hw_queue_sanity_checks(struct hl_device *hdev, struct hl_hw_queue *q, ...@@ -222,8 +222,8 @@ static int hw_queue_sanity_checks(struct hl_device *hdev, struct hl_hw_queue *q,
* @cb_size: size of CB * @cb_size: size of CB
* @cb_ptr: pointer to CB location * @cb_ptr: pointer to CB location
* *
* This function sends a single CB, that must NOT generate a completion entry * This function sends a single CB, that must NOT generate a completion entry.
* * Sending CPU messages can be done instead via 'hl_hw_queue_submit_bd()'
*/ */
int hl_hw_queue_send_cb_no_cmpl(struct hl_device *hdev, u32 hw_queue_id, int hl_hw_queue_send_cb_no_cmpl(struct hl_device *hdev, u32 hw_queue_id,
u32 cb_size, u64 cb_ptr) u32 cb_size, u64 cb_ptr)
...@@ -231,15 +231,6 @@ int hl_hw_queue_send_cb_no_cmpl(struct hl_device *hdev, u32 hw_queue_id, ...@@ -231,15 +231,6 @@ int hl_hw_queue_send_cb_no_cmpl(struct hl_device *hdev, u32 hw_queue_id,
struct hl_hw_queue *q = &hdev->kernel_queues[hw_queue_id]; struct hl_hw_queue *q = &hdev->kernel_queues[hw_queue_id];
int rc = 0; int rc = 0;
/*
* The CPU queue is a synchronous queue with an effective depth of
* a single entry (although it is allocated with room for multiple
* entries). Therefore, there is a different lock, called
* send_cpu_message_lock, that serializes accesses to the CPU queue.
* As a result, we don't need to lock the access to the entire H/W
* queues module when submitting a JOB to the CPU queue
*/
if (q->queue_type != QUEUE_TYPE_CPU)
hdev->asic_funcs->hw_queues_lock(hdev); hdev->asic_funcs->hw_queues_lock(hdev);
if (hdev->disabled) { if (hdev->disabled) {
...@@ -258,10 +249,9 @@ int hl_hw_queue_send_cb_no_cmpl(struct hl_device *hdev, u32 hw_queue_id, ...@@ -258,10 +249,9 @@ int hl_hw_queue_send_cb_no_cmpl(struct hl_device *hdev, u32 hw_queue_id,
goto out; goto out;
} }
ext_and_hw_queue_submit_bd(hdev, q, 0, cb_size, cb_ptr); hl_hw_queue_submit_bd(hdev, q, 0, cb_size, cb_ptr);
out: out:
if (q->queue_type != QUEUE_TYPE_CPU)
hdev->asic_funcs->hw_queues_unlock(hdev); hdev->asic_funcs->hw_queues_unlock(hdev);
return rc; return rc;
...@@ -328,7 +318,7 @@ static void ext_queue_schedule_job(struct hl_cs_job *job) ...@@ -328,7 +318,7 @@ static void ext_queue_schedule_job(struct hl_cs_job *job)
cq->pi = hl_cq_inc_ptr(cq->pi); cq->pi = hl_cq_inc_ptr(cq->pi);
submit_bd: submit_bd:
ext_and_hw_queue_submit_bd(hdev, q, ctl, len, ptr); hl_hw_queue_submit_bd(hdev, q, ctl, len, ptr);
} }
/* /*
...@@ -407,7 +397,7 @@ static void hw_queue_schedule_job(struct hl_cs_job *job) ...@@ -407,7 +397,7 @@ static void hw_queue_schedule_job(struct hl_cs_job *job)
else else
ptr = (u64) (uintptr_t) job->user_cb; ptr = (u64) (uintptr_t) job->user_cb;
ext_and_hw_queue_submit_bd(hdev, q, ctl, len, ptr); hl_hw_queue_submit_bd(hdev, q, ctl, len, ptr);
} }
static int init_signal_cs(struct hl_device *hdev, static int init_signal_cs(struct hl_device *hdev,
...@@ -426,8 +416,9 @@ static int init_signal_cs(struct hl_device *hdev, ...@@ -426,8 +416,9 @@ static int init_signal_cs(struct hl_device *hdev,
cs_cmpl->sob_val = prop->next_sob_val; cs_cmpl->sob_val = prop->next_sob_val;
dev_dbg(hdev->dev, dev_dbg(hdev->dev,
"generate signal CB, sob_id: %d, sob val: 0x%x, q_idx: %d\n", "generate signal CB, sob_id: %d, sob val: %u, q_idx: %d, seq: %llu\n",
cs_cmpl->hw_sob->sob_id, cs_cmpl->sob_val, q_idx); cs_cmpl->hw_sob->sob_id, cs_cmpl->sob_val, q_idx,
cs_cmpl->cs_seq);
/* we set an EB since we must make sure all oeprations are done /* we set an EB since we must make sure all oeprations are done
* when sending the signal * when sending the signal
...@@ -435,17 +426,37 @@ static int init_signal_cs(struct hl_device *hdev, ...@@ -435,17 +426,37 @@ static int init_signal_cs(struct hl_device *hdev,
hdev->asic_funcs->gen_signal_cb(hdev, job->patched_cb, hdev->asic_funcs->gen_signal_cb(hdev, job->patched_cb,
cs_cmpl->hw_sob->sob_id, 0, true); cs_cmpl->hw_sob->sob_id, 0, true);
rc = hl_cs_signal_sob_wraparound_handler(hdev, q_idx, &hw_sob, 1); rc = hl_cs_signal_sob_wraparound_handler(hdev, q_idx, &hw_sob, 1,
false);
return rc; return rc;
} }
static void init_wait_cs(struct hl_device *hdev, struct hl_cs *cs, void hl_hw_queue_encaps_sig_set_sob_info(struct hl_device *hdev,
struct hl_cs *cs, struct hl_cs_job *job,
struct hl_cs_compl *cs_cmpl)
{
struct hl_cs_encaps_sig_handle *handle = cs->encaps_sig_hdl;
cs_cmpl->hw_sob = handle->hw_sob;
/* Note that encaps_sig_wait_offset was validated earlier in the flow
* for offset value which exceeds the max reserved signal count.
* always decrement 1 of the offset since when the user
* set offset 1 for example he mean to wait only for the first
* signal only, which will be pre_sob_val, and if he set offset 2
* then the value required is (pre_sob_val + 1) and so on...
*/
cs_cmpl->sob_val = handle->pre_sob_val +
(job->encaps_sig_wait_offset - 1);
}
static int init_wait_cs(struct hl_device *hdev, struct hl_cs *cs,
struct hl_cs_job *job, struct hl_cs_compl *cs_cmpl) struct hl_cs_job *job, struct hl_cs_compl *cs_cmpl)
{ {
struct hl_cs_compl *signal_cs_cmpl;
struct hl_sync_stream_properties *prop;
struct hl_gen_wait_properties wait_prop; struct hl_gen_wait_properties wait_prop;
struct hl_sync_stream_properties *prop;
struct hl_cs_compl *signal_cs_cmpl;
u32 q_idx; u32 q_idx;
q_idx = job->hw_queue_id; q_idx = job->hw_queue_id;
...@@ -455,14 +466,51 @@ static void init_wait_cs(struct hl_device *hdev, struct hl_cs *cs, ...@@ -455,14 +466,51 @@ static void init_wait_cs(struct hl_device *hdev, struct hl_cs *cs,
struct hl_cs_compl, struct hl_cs_compl,
base_fence); base_fence);
/* copy the SOB id and value of the signal CS */ if (cs->encaps_signals) {
/* use the encaps signal handle stored earlier in the flow
* and set the SOB information from the encaps
* signals handle
*/
hl_hw_queue_encaps_sig_set_sob_info(hdev, cs, job, cs_cmpl);
dev_dbg(hdev->dev, "Wait for encaps signals handle, qidx(%u), CS sequence(%llu), sob val: 0x%x, offset: %u\n",
cs->encaps_sig_hdl->q_idx,
cs->encaps_sig_hdl->cs_seq,
cs_cmpl->sob_val,
job->encaps_sig_wait_offset);
} else {
/* Copy the SOB id and value of the signal CS */
cs_cmpl->hw_sob = signal_cs_cmpl->hw_sob; cs_cmpl->hw_sob = signal_cs_cmpl->hw_sob;
cs_cmpl->sob_val = signal_cs_cmpl->sob_val; cs_cmpl->sob_val = signal_cs_cmpl->sob_val;
}
/* Check again if the signal CS already completed. If yes, then don't
 * send any wait CS since the hw_sob could already be in reset. If the
 * signal is not completed, then take a refcount on the hw_sob to
 * prevent resetting the SOB while the wait CS is not yet submitted.
 * Note that this check is protected by two locks: the hw queue lock
 * and the completion object lock. The same completion object lock
 * also protects the hw_sob reset handler function, and the hw_queue
 * lock prevents the hw_sob refcount value, changed by the signal/wait
 * flows, from going out of sync.
 */
spin_lock(&signal_cs_cmpl->lock);
if (completion_done(&cs->signal_fence->completion)) {
spin_unlock(&signal_cs_cmpl->lock);
return -EINVAL;
}
kref_get(&cs_cmpl->hw_sob->kref);
spin_unlock(&signal_cs_cmpl->lock);
dev_dbg(hdev->dev,
"generate wait CB, sob_id: %d, sob_val: 0x%x, mon_id: %d, q_idx: %d, seq: %llu\n",
cs_cmpl->hw_sob->sob_id, cs_cmpl->sob_val,
prop->base_mon_id, q_idx, cs->sequence);
wait_prop.data = (void *) job->patched_cb;
wait_prop.sob_base = cs_cmpl->hw_sob->sob_id;
@@ -471,17 +519,14 @@ static void init_wait_cs(struct hl_device *hdev, struct hl_cs *cs,
wait_prop.mon_id = prop->base_mon_id;
wait_prop.q_idx = q_idx;
wait_prop.size = 0;
hdev->asic_funcs->gen_wait_cb(hdev, &wait_prop);
mb();
hl_fence_put(cs->signal_fence);
cs->signal_fence = NULL;
return 0;
}
/*
@@ -506,7 +551,60 @@ static int init_signal_wait_cs(struct hl_cs *cs)
if (cs->type & CS_TYPE_SIGNAL)
rc = init_signal_cs(hdev, job, cs_cmpl);
else if (cs->type & CS_TYPE_WAIT)
rc = init_wait_cs(hdev, cs, job, cs_cmpl);
return rc;
}
static int encaps_sig_first_staged_cs_handler
(struct hl_device *hdev, struct hl_cs *cs)
{
struct hl_cs_compl *cs_cmpl =
container_of(cs->fence,
struct hl_cs_compl, base_fence);
struct hl_cs_encaps_sig_handle *encaps_sig_hdl;
struct hl_encaps_signals_mgr *mgr;
int rc = 0;
mgr = &hdev->compute_ctx->sig_mgr;
spin_lock(&mgr->lock);
encaps_sig_hdl = idr_find(&mgr->handles, cs->encaps_sig_hdl_id);
if (encaps_sig_hdl) {
/*
* Set handler CS sequence,
* the CS which contains the encapsulated signals.
*/
encaps_sig_hdl->cs_seq = cs->sequence;
/* store the handle and set encaps signal indication,
* to be used later in cs_do_release to put the last
* reference to encaps signals handlers.
*/
cs_cmpl->encaps_signals = true;
cs_cmpl->encaps_sig_hdl = encaps_sig_hdl;
/* set hw_sob pointer in completion object
* since it's used in cs_do_release flow to put
* refcount to sob
*/
cs_cmpl->hw_sob = encaps_sig_hdl->hw_sob;
cs_cmpl->sob_val = encaps_sig_hdl->pre_sob_val +
encaps_sig_hdl->count;
dev_dbg(hdev->dev, "CS seq (%llu) added to encaps signal handler id (%u), count(%u), qidx(%u), sob(%u), val(%u)\n",
cs->sequence, encaps_sig_hdl->id,
encaps_sig_hdl->count,
encaps_sig_hdl->q_idx,
cs_cmpl->hw_sob->sob_id,
cs_cmpl->sob_val);
} else {
dev_err(hdev->dev, "encaps handle id(%u) wasn't found!\n",
cs->encaps_sig_hdl_id);
rc = -EINVAL;
}
spin_unlock(&mgr->lock);
return rc;
}
@@ -581,14 +679,21 @@ int hl_hw_queue_schedule_cs(struct hl_cs *cs)
if ((cs->type == CS_TYPE_SIGNAL) || (cs->type == CS_TYPE_WAIT)) {
rc = init_signal_wait_cs(cs);
if (rc)
goto unroll_cq_resv;
} else if (cs->type == CS_TYPE_COLLECTIVE_WAIT) {
rc = hdev->asic_funcs->collective_wait_init_cs(cs);
if (rc)
goto unroll_cq_resv;
}
if (cs->encaps_signals && cs->staged_first) {
rc = encaps_sig_first_staged_cs_handler(hdev, cs);
if (rc)
goto unroll_cq_resv;
}
spin_lock(&hdev->cs_mirror_lock);
/* Verify staged CS exists and add to the staged list */
@@ -613,6 +718,11 @@ int hl_hw_queue_schedule_cs(struct hl_cs *cs)
}
list_add_tail(&cs->staged_cs_node, &staged_cs->staged_cs_node);
/* update stream map of the first CS */
if (hdev->supports_wait_for_multi_cs)
staged_cs->fence->stream_master_qid_map |=
cs->fence->stream_master_qid_map;
}
list_add_tail(&cs->mirror_node, &hdev->cs_mirror_list);
@@ -834,6 +944,8 @@ static void sync_stream_queue_init(struct hl_device *hdev, u32 q_idx)
hw_sob = &sync_stream_prop->hw_sob[sob];
hw_sob->hdev = hdev;
hw_sob->sob_id = sync_stream_prop->base_sob_id + sob;
hw_sob->sob_addr =
hdev->asic_funcs->get_sob_addr(hdev, hw_sob->sob_id);
hw_sob->q_idx = q_idx;
kref_init(&hw_sob->kref);
}
......
@@ -124,7 +124,7 @@ static int alloc_device_memory(struct hl_ctx *ctx, struct hl_mem_in *args,
spin_lock(&vm->idr_lock);
handle = idr_alloc(&vm->phys_pg_pack_handles, phys_pg_pack, 1, 0,
GFP_ATOMIC);
spin_unlock(&vm->idr_lock);
if (handle < 0) {
@@ -528,6 +528,33 @@ static inline int add_va_block(struct hl_device *hdev,
return rc;
}
/**
* is_hint_crossing_range() - check if the hint address crosses the specified
* reserved range.
*/
static inline bool is_hint_crossing_range(enum hl_va_range_type range_type,
u64 start_addr, u32 size, struct asic_fixed_properties *prop) {
bool range_cross;
if (range_type == HL_VA_RANGE_TYPE_DRAM)
range_cross =
hl_mem_area_crosses_range(start_addr, size,
prop->hints_dram_reserved_va_range.start_addr,
prop->hints_dram_reserved_va_range.end_addr);
else if (range_type == HL_VA_RANGE_TYPE_HOST)
range_cross =
hl_mem_area_crosses_range(start_addr, size,
prop->hints_host_reserved_va_range.start_addr,
prop->hints_host_reserved_va_range.end_addr);
else
range_cross =
hl_mem_area_crosses_range(start_addr, size,
prop->hints_host_hpage_reserved_va_range.start_addr,
prop->hints_host_hpage_reserved_va_range.end_addr);
return range_cross;
}
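is_hint_crossing_range() delegates the actual interval test to hl_mem_area_crosses_range(), which is not part of this hunk. A minimal stand-alone sketch of the overlap check it is assumed to perform (inclusive intervals; the names below are illustrative, not the driver's) could look like this:
#include <stdbool.h>
#include <stdint.h>
/* Does [addr, addr + size - 1] intersect [range_start, range_end]?
 * Both intervals are treated as inclusive.
 */
static bool area_crosses_range(uint64_t addr, uint32_t size,
		uint64_t range_start, uint64_t range_end)
{
	uint64_t area_end = addr + size - 1;

	return (addr <= range_end) && (area_end >= range_start);
}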
/**
* get_va_block() - get a virtual block for the given size and alignment.
*
@@ -536,6 +563,8 @@ static inline int add_va_block(struct hl_device *hdev,
* @size: requested block size.
* @hint_addr: hint for requested address by the user.
* @va_block_align: required alignment of the virtual block start address.
* @range_type: va range type (host, dram)
* @flags: additional memory flags, currently only uses HL_MEM_FORCE_HINT
*
* This function does the following:
* - Iterate on the virtual block list to find a suitable virtual block for the
@@ -545,13 +574,19 @@ static inline int add_va_block(struct hl_device *hdev,
*/
static u64 get_va_block(struct hl_device *hdev,
struct hl_va_range *va_range,
u64 size, u64 hint_addr, u32 va_block_align,
enum hl_va_range_type range_type,
u32 flags)
{
struct hl_vm_va_block *va_block, *new_va_block = NULL;
struct asic_fixed_properties *prop = &hdev->asic_prop;
u64 tmp_hint_addr, valid_start, valid_size, prev_start, prev_end,
align_mask, reserved_valid_start = 0, reserved_valid_size = 0,
dram_hint_mask = prop->dram_hints_align_mask;
bool add_prev = false;
bool is_align_pow_2 = is_power_of_2(va_range->page_size);
bool is_hint_dram_addr = hl_is_dram_va(hdev, hint_addr);
bool force_hint = flags & HL_MEM_FORCE_HINT;
if (is_align_pow_2)
align_mask = ~((u64)va_block_align - 1);
@@ -564,13 +599,21 @@ static u64 get_va_block(struct hl_device *hdev,
size = DIV_ROUND_UP_ULL(size, va_range->page_size) *
va_range->page_size;
tmp_hint_addr = hint_addr & ~dram_hint_mask;
/* Check if we need to ignore hint address */
if ((is_align_pow_2 && (hint_addr & (va_block_align - 1))) ||
(!is_align_pow_2 && is_hint_dram_addr &&
do_div(tmp_hint_addr, va_range->page_size))) {
if (force_hint) {
/* Hint must be respected, so here we just fail */
dev_err(hdev->dev,
"Hint address 0x%llx is not page aligned - cannot be respected\n",
hint_addr);
return 0;
}
dev_dbg(hdev->dev,
"Hint address 0x%llx will be ignored because it is not aligned\n",
hint_addr);
@@ -596,6 +639,16 @@ static u64 get_va_block(struct hl_device *hdev,
if (valid_size < size)
continue;
/*
* In case hint address is 0, and arc_hints_range_reservation
* property enabled, then avoid allocating va blocks from the
* range reserved for hint addresses
*/
if (prop->hints_range_reservation && !hint_addr)
if (is_hint_crossing_range(range_type, valid_start,
size, prop))
continue;
/* Pick the minimal length block which has the required size */
if (!new_va_block || (valid_size < reserved_valid_size)) {
new_va_block = va_block;
@@ -618,6 +671,17 @@ static u64 get_va_block(struct hl_device *hdev,
goto out;
}
if (force_hint && reserved_valid_start != hint_addr) {
/* Hint address must be respected. If we are here - this means
* we could not respect it.
*/
dev_err(hdev->dev,
"Hint address 0x%llx could not be respected\n",
hint_addr);
reserved_valid_start = 0;
goto out;
}
/*
* Check if there is some leftover range due to reserving the new
* va block, then return it to the main virtual addresses list.
@@ -670,7 +734,8 @@ u64 hl_reserve_va_block(struct hl_device *hdev, struct hl_ctx *ctx,
enum hl_va_range_type type, u32 size, u32 alignment)
{
return get_va_block(hdev, ctx->va_range[type], size, 0,
max(alignment, ctx->va_range[type]->page_size),
type, 0);
}
/**
@@ -731,29 +796,16 @@ int hl_unreserve_va_block(struct hl_device *hdev, struct hl_ctx *ctx,
return rc;
}
/**
* get_sg_info() - get number of pages and the DMA address from SG list.
* @sg: the SG list.
* @dma_addr: pointer to DMA address to return.
*
* Calculate the number of consecutive pages described by the SG list. Take the
* offset of the address in the first page, add to it the length and round it up
* to the number of needed pages.
*/
static u32 get_sg_info(struct scatterlist *sg, dma_addr_t *dma_addr)
{
*dma_addr = sg_dma_address(sg);
return ((((*dma_addr) & (PAGE_SIZE - 1)) + sg_dma_len(sg)) +
(PAGE_SIZE - 1)) >> PAGE_SHIFT;
}
/**
* init_phys_pg_pack_from_userptr() - initialize physical page pack from host
* memory
* @ctx: pointer to the context structure.
* @userptr: userptr to initialize from.
* @pphys_pg_pack: result pointer.
* @force_regular_page: tell the function to ignore huge page optimization,
* even if possible. Needed for cases where the device VA
* is allocated before we know the composition of the
* physical pages
*
* This function does the following:
* - Pin the physical pages related to the given virtual block.
@@ -762,17 +814,18 @@ static u32 get_sg_info(struct scatterlist *sg, dma_addr_t *dma_addr)
*/
static int init_phys_pg_pack_from_userptr(struct hl_ctx *ctx,
struct hl_userptr *userptr,
struct hl_vm_phys_pg_pack **pphys_pg_pack,
bool force_regular_page)
{
u32 npages, page_size = PAGE_SIZE,
huge_page_size = ctx->hdev->asic_prop.pmmu_huge.page_size;
u32 pgs_in_huge_page = huge_page_size >> __ffs(page_size);
struct hl_vm_phys_pg_pack *phys_pg_pack;
bool first = true, is_huge_page_opt;
u64 page_mask, total_npages;
struct scatterlist *sg;
dma_addr_t dma_addr;
int rc, i, j;
phys_pg_pack = kzalloc(sizeof(*phys_pg_pack), GFP_KERNEL);
if (!phys_pg_pack)
@@ -783,6 +836,8 @@ static int init_phys_pg_pack_from_userptr(struct hl_ctx *ctx,
phys_pg_pack->asid = ctx->asid;
atomic_set(&phys_pg_pack->mapping_cnt, 1);
is_huge_page_opt = (force_regular_page ? false : true);
/* Only if all dma_addrs are aligned to 2MB and their
 * sizes are at least 2MB, we can use huge page mapping.
 * We limit the 2MB optimization to this condition,
@@ -791,7 +846,7 @@ static int init_phys_pg_pack_from_userptr(struct hl_ctx *ctx,
*/
total_npages = 0;
for_each_sg(userptr->sgt->sgl, sg, userptr->sgt->nents, i) {
npages = hl_get_sg_info(sg, &dma_addr);
total_npages += npages;
@@ -820,7 +875,7 @@ static int init_phys_pg_pack_from_userptr(struct hl_ctx *ctx,
j = 0;
for_each_sg(userptr->sgt->sgl, sg, userptr->sgt->nents, i) {
npages = hl_get_sg_info(sg, &dma_addr);
/* align down to physical page size and save the offset */
if (first) {
@@ -1001,11 +1056,12 @@ static int map_device_va(struct hl_ctx *ctx, struct hl_mem_in *args,
struct hl_userptr *userptr = NULL;
struct hl_vm_hash_node *hnode;
struct hl_va_range *va_range;
enum vm_type *vm_type;
u64 ret_vaddr, hint_addr;
u32 handle = 0, va_block_align;
int rc;
bool is_userptr = args->flags & HL_MEM_USERPTR;
enum hl_va_range_type va_range_type = 0;
/* Assume failure */
*device_addr = 0;
@@ -1023,7 +1079,7 @@ static int map_device_va(struct hl_ctx *ctx, struct hl_mem_in *args,
}
rc = init_phys_pg_pack_from_userptr(ctx, userptr,
&phys_pg_pack, false);
if (rc) {
dev_err(hdev->dev,
"unable to init page pack for vaddr 0x%llx\n",
@@ -1031,14 +1087,14 @@ static int map_device_va(struct hl_ctx *ctx, struct hl_mem_in *args,
goto init_page_pack_err;
}
vm_type = (enum vm_type *) userptr;
hint_addr = args->map_host.hint_addr;
handle = phys_pg_pack->handle;
/* get required alignment */
if (phys_pg_pack->page_size == page_size) {
va_range = ctx->va_range[HL_VA_RANGE_TYPE_HOST];
va_range_type = HL_VA_RANGE_TYPE_HOST;
/*
* huge page alignment may be needed in case of regular
* page mapping, depending on the host VA alignment
@@ -1053,6 +1109,7 @@ static int map_device_va(struct hl_ctx *ctx, struct hl_mem_in *args,
* mapping
*/
va_range = ctx->va_range[HL_VA_RANGE_TYPE_HOST_HUGE];
va_range_type = HL_VA_RANGE_TYPE_HOST_HUGE;
va_block_align = huge_page_size;
}
} else {
@@ -1072,12 +1129,13 @@ static int map_device_va(struct hl_ctx *ctx, struct hl_mem_in *args,
spin_unlock(&vm->idr_lock);
vm_type = (enum vm_type *) phys_pg_pack;
hint_addr = args->map_device.hint_addr;
/* DRAM VA alignment is the same as the MMU page size */
va_range = ctx->va_range[HL_VA_RANGE_TYPE_DRAM];
va_range_type = HL_VA_RANGE_TYPE_DRAM;
va_block_align = hdev->asic_prop.dmmu.page_size;
}
@@ -1100,8 +1158,23 @@ static int map_device_va(struct hl_ctx *ctx, struct hl_mem_in *args,
goto hnode_err;
}
if (hint_addr && phys_pg_pack->offset) {
if (args->flags & HL_MEM_FORCE_HINT) {
/* Fail if hint must be respected but it can't be */
dev_err(hdev->dev,
"Hint address 0x%llx cannot be respected because source memory is not aligned 0x%x\n",
hint_addr, phys_pg_pack->offset);
rc = -EINVAL;
goto va_block_err;
}
dev_dbg(hdev->dev,
"Hint address 0x%llx will be ignored because source memory is not aligned 0x%x\n",
hint_addr, phys_pg_pack->offset);
}
ret_vaddr = get_va_block(hdev, va_range, phys_pg_pack->total_size,
hint_addr, va_block_align,
va_range_type, args->flags);
if (!ret_vaddr) {
dev_err(hdev->dev, "no available va block for handle %u\n",
handle);
@@ -1181,17 +1254,19 @@ static int map_device_va(struct hl_ctx *ctx, struct hl_mem_in *args,
static int unmap_device_va(struct hl_ctx *ctx, struct hl_mem_in *args,
bool ctx_free)
{
struct hl_vm_phys_pg_pack *phys_pg_pack = NULL;
u64 vaddr = args->unmap.device_virt_addr;
struct hl_vm_hash_node *hnode = NULL;
struct asic_fixed_properties *prop;
struct hl_device *hdev = ctx->hdev;
struct hl_userptr *userptr = NULL;
struct hl_va_range *va_range;
enum vm_type *vm_type;
bool is_userptr;
int rc = 0;
prop = &hdev->asic_prop;
/* protect from double entrance */
mutex_lock(&ctx->mem_hash_lock);
hash_for_each_possible(ctx->mem_hash, hnode, node, (unsigned long)vaddr)
@@ -1214,8 +1289,9 @@ static int unmap_device_va(struct hl_ctx *ctx, struct hl_mem_in *args,
if (*vm_type == VM_TYPE_USERPTR) {
is_userptr = true;
userptr = hnode->ptr;
rc = init_phys_pg_pack_from_userptr(ctx, userptr, &phys_pg_pack,
false);
if (rc) {
dev_err(hdev->dev,
"unable to init page pack for vaddr 0x%llx\n",
@@ -1299,7 +1375,7 @@ static int unmap_device_va(struct hl_ctx *ctx, struct hl_mem_in *args,
kfree(hnode);
if (is_userptr) {
free_phys_pg_pack(hdev, phys_pg_pack);
dma_unmap_host_va(hdev, userptr);
}
@@ -1669,6 +1745,7 @@ int hl_pin_host_memory(struct hl_device *hdev, u64 addr, u64 size,
return -EINVAL;
}
userptr->pid = current->pid;
userptr->sgt = kzalloc(sizeof(*userptr->sgt), GFP_KERNEL);
if (!userptr->sgt)
return -ENOMEM;
@@ -2033,7 +2110,7 @@ void hl_vm_ctx_fini(struct hl_ctx *ctx)
* another side effect error
*/
if (!hdev->hard_reset_pending && !hash_empty(ctx->mem_hash))
dev_dbg(hdev->dev,
"user released device without removing its memory mappings\n");
hash_for_each_safe(ctx->mem_hash, i, tmp_node, hnode, node) {
......
@@ -470,14 +470,14 @@ static void hl_mmu_v1_fini(struct hl_device *hdev)
if (!ZERO_OR_NULL_PTR(hdev->mmu_priv.hr.mmu_shadow_hop0)) {
kvfree(hdev->mmu_priv.dr.mmu_shadow_hop0);
gen_pool_destroy(hdev->mmu_priv.dr.mmu_pgt_pool);
}
/* Make sure that if we arrive here again without init being
 * called, we won't cause a kernel panic. This can happen, for
 * example, if we fail during hard reset code at certain points
 */
hdev->mmu_priv.dr.mmu_shadow_hop0 = NULL;
}
/**
* hl_mmu_ctx_init() - initialize a context for using the MMU module.
......
@@ -436,6 +436,8 @@ int hl_pci_init(struct hl_device *hdev)
goto unmap_pci_bars;
}
dma_set_max_seg_size(&pdev->dev, U32_MAX);
return 0;
unmap_pci_bars:
......
// SPDX-License-Identifier: GPL-2.0
/*
* Copyright 2021 HabanaLabs, Ltd.
* All Rights Reserved.
*/
#include <linux/vmalloc.h>
#include <uapi/misc/habanalabs.h>
#include "habanalabs.h"
/**
* hl_format_as_binary - helper function, format an integer as binary
* using supplied scratch buffer
* @buf: the buffer to use
* @buf_len: buffer capacity
* @n: number to format
*
* Returns pointer to buffer
*/
char *hl_format_as_binary(char *buf, size_t buf_len, u32 n)
{
int i;
u32 bit;
bool leading0 = true;
char *wrptr = buf;
if (buf_len > 0 && buf_len < 3) {
*wrptr = '\0';
return buf;
}
wrptr[0] = '0';
wrptr[1] = 'b';
wrptr += 2;
/* Remove 3 characters from length for '0b' and '\0' termination */
buf_len -= 3;
for (i = 0; i < sizeof(n) * BITS_PER_BYTE && buf_len; ++i, n <<= 1) {
/* Writing bit calculation in one line would cause a false
* positive static code analysis error, so splitting.
*/
bit = n & (1 << (sizeof(n) * BITS_PER_BYTE - 1));
bit = !!bit;
leading0 &= !bit;
if (!leading0) {
*wrptr = '0' + bit;
++wrptr;
}
}
*wrptr = '\0';
return buf;
}
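A hypothetical caller (illustrative only, not taken from the driver) would hand the helper a scratch buffer and print the returned pointer; because leading zeros are suppressed, a value of 5 is rendered as "0b101":
/* Illustrative use of hl_format_as_binary(); "status_reg" is a placeholder */
char scratch[40];	/* room for "0b", up to 32 bits and the '\0' */
dev_dbg(hdev->dev, "status bits: %s\n",
	hl_format_as_binary(scratch, sizeof(scratch), status_reg));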
/**
* resize_to_fit - helper function, resize buffer to fit given amount of data
* @buf: destination buffer double pointer
* @size: pointer to the size container
* @desired_size: size the buffer must contain
*
* Returns 0 on success or error code on failure.
* On success, the size of buffer is at least desired_size. Buffer is allocated
* via vmalloc and must be freed with vfree.
*/
static int resize_to_fit(char **buf, size_t *size, size_t desired_size)
{
char *resized_buf;
size_t new_size;
if (*size >= desired_size)
return 0;
/* Not enough space to print all, have to resize */
new_size = max_t(size_t, PAGE_SIZE, round_up(desired_size, PAGE_SIZE));
resized_buf = vmalloc(new_size);
if (!resized_buf)
return -ENOMEM;
memcpy(resized_buf, *buf, *size);
vfree(*buf);
*buf = resized_buf;
*size = new_size;
return 1;
}
/**
* hl_snprintf_resize() - print formatted data to buffer, resize as needed
* @buf: buffer double pointer, to be written to and resized, must be either
* NULL or allocated with vmalloc.
* @size: current size of the buffer
* @offset: current offset to write to
* @format: format of the data
*
* This function will write formatted data into the buffer. If buffer is not
* large enough, it will be resized using vmalloc. Size may be modified if the
* buffer was resized; offset will be advanced by the number of bytes written,
* not including the terminating character.
*
* Returns 0 on success or error code on failure
*
* Note that the buffer has to be manually released using vfree.
*/
int hl_snprintf_resize(char **buf, size_t *size, size_t *offset,
const char *format, ...)
{
va_list args;
size_t length;
int rc;
if (*buf == NULL && (*size != 0 || *offset != 0))
return -EINVAL;
va_start(args, format);
length = vsnprintf(*buf + *offset, *size - *offset, format, args);
va_end(args);
rc = resize_to_fit(buf, size, *offset + length + 1);
if (rc < 0)
return rc;
else if (rc > 0) {
/* Resize was needed, write again */
va_start(args, format);
length = vsnprintf(*buf + *offset, *size - *offset, format,
args);
va_end(args);
}
*offset += length;
return 0;
}
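The intended calling pattern, per the comment above, is to start from a NULL buffer with zero size and offset and let the helper allocate and grow it as text accumulates. A hedged sketch (error handling trimmed; num_engines and num_fences are placeholders, not driver variables):
char *buf = NULL;
size_t size = 0, offset = 0;
int rc;
rc = hl_snprintf_resize(&buf, &size, &offset, "engines: %u\n", num_engines);
if (!rc)
	rc = hl_snprintf_resize(&buf, &size, &offset, "fences: %u\n", num_fences);
/* ... hand over "buf" (offset bytes of text) to the consumer ... */
vfree(buf);	/* buffer is vmalloc-backed, so it must be freed with vfree */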
/**
* hl_sync_engine_to_string - convert engine type enum to string literal
* @engine_type: engine type (TPC/MME/DMA)
*
* Return the resolved string literal
*/
const char *hl_sync_engine_to_string(enum hl_sync_engine_type engine_type)
{
switch (engine_type) {
case ENGINE_DMA:
return "DMA";
case ENGINE_MME:
return "MME";
case ENGINE_TPC:
return "TPC";
}
return "Invalid Engine Type";
}
/**
* hl_print_resize_sync_engine - helper function, format engine name and ID
* using hl_snprintf_resize
* @buf: destination buffer double pointer to be used with hl_snprintf_resize
* @size: pointer to the size container
* @offset: pointer to the offset container
* @engine_type: engine type (TPC/MME/DMA)
* @engine_id: engine numerical id
*
* Returns 0 on success or error code on failure
*/
static int hl_print_resize_sync_engine(char **buf, size_t *size, size_t *offset,
enum hl_sync_engine_type engine_type,
u32 engine_id)
{
return hl_snprintf_resize(buf, size, offset, "%s%u",
hl_sync_engine_to_string(engine_type), engine_id);
}
/**
* hl_state_dump_get_sync_name - transform sync object id to name if available
* @hdev: pointer to the device
* @sync_id: sync object id
*
* Returns a name literal or NULL if not resolved.
* Note: returning NULL shall not be considered as a failure, as not all
* sync objects are named.
*/
const char *hl_state_dump_get_sync_name(struct hl_device *hdev, u32 sync_id)
{
struct hl_state_dump_specs *sds = &hdev->state_dump_specs;
struct hl_hw_obj_name_entry *entry;
hash_for_each_possible(sds->so_id_to_str_tb, entry,
node, sync_id)
if (sync_id == entry->id)
return entry->name;
return NULL;
}
/**
* hl_state_dump_get_monitor_name - transform monitor object dump to monitor
* name if available
* @hdev: pointer to the device
* @mon: monitor state dump
*
* Returns a name literal or NULL if not resolved.
* Note: returning NULL shall not be considered as a failure, as not all
* monitors are named.
*/
const char *hl_state_dump_get_monitor_name(struct hl_device *hdev,
struct hl_mon_state_dump *mon)
{
struct hl_state_dump_specs *sds = &hdev->state_dump_specs;
struct hl_hw_obj_name_entry *entry;
hash_for_each_possible(sds->monitor_id_to_str_tb,
entry, node, mon->id)
if (mon->id == entry->id)
return entry->name;
return NULL;
}
/**
* hl_state_dump_free_sync_to_engine_map - free sync object to engine map
* @map: sync object to engine map
*
* Note: generic free implementation, the allocation is implemented per ASIC.
*/
void hl_state_dump_free_sync_to_engine_map(struct hl_sync_to_engine_map *map)
{
struct hl_sync_to_engine_map_entry *entry;
struct hlist_node *tmp_node;
int i;
hash_for_each_safe(map->tb, i, tmp_node, entry, node) {
hash_del(&entry->node);
kfree(entry);
}
}
/**
* hl_state_dump_get_sync_to_engine - transform sync_id to
* hl_sync_to_engine_map_entry if available for current id
* @map: sync object to engine map
* @sync_id: sync object id
*
* Returns the translation entry if found or NULL if not.
* Note: a returned NULL shall not be considered a failure, as the map does
* not cover all possible sync ids; it is a best-effort mapping.
*/
static struct hl_sync_to_engine_map_entry *
hl_state_dump_get_sync_to_engine(struct hl_sync_to_engine_map *map, u32 sync_id)
{
struct hl_sync_to_engine_map_entry *entry;
hash_for_each_possible(map->tb, entry, node, sync_id)
if (entry->sync_id == sync_id)
return entry;
return NULL;
}
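The lookup above assumes the table was populated with entries keyed by sync_id. A minimal sketch of what an ASIC-specific gen_sync_to_engine_map() callback might do per sync object is shown below; the helper name is made up, and the real per-ASIC code derives the IDs from its own register layout:
/* Illustrative helper; not part of the common code */
static int example_add_sync_to_engine(struct hl_sync_to_engine_map *map,
		u32 sync_id, enum hl_sync_engine_type engine_type,
		u32 engine_id)
{
	struct hl_sync_to_engine_map_entry *entry;

	entry = kzalloc(sizeof(*entry), GFP_KERNEL);
	if (!entry)
		return -ENOMEM;

	entry->sync_id = sync_id;
	entry->engine_type = engine_type;
	entry->engine_id = engine_id;
	/* keyed by sync_id, matching hl_state_dump_get_sync_to_engine() */
	hash_add(map->tb, &entry->node, sync_id);

	return 0;
}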
/**
* hl_state_dump_read_sync_objects - read sync objects array
* @hdev: pointer to the device
* @index: sync manager block index starting with E_N
*
* Returns array of size SP_SYNC_OBJ_AMOUNT on success or NULL on failure
*/
static u32 *hl_state_dump_read_sync_objects(struct hl_device *hdev, u32 index)
{
struct hl_state_dump_specs *sds = &hdev->state_dump_specs;
u32 *sync_objects;
s64 base_addr; /* Base addr can be negative */
int i;
base_addr = sds->props[SP_SYNC_OBJ_BASE_ADDR] +
sds->props[SP_NEXT_SYNC_OBJ_ADDR] * index;
sync_objects = vmalloc(sds->props[SP_SYNC_OBJ_AMOUNT] * sizeof(u32));
if (!sync_objects)
return NULL;
for (i = 0; i < sds->props[SP_SYNC_OBJ_AMOUNT]; ++i)
sync_objects[i] = RREG32(base_addr + i * sizeof(u32));
return sync_objects;
}
/**
* hl_state_dump_free_sync_objects - free sync objects array allocated by
* hl_state_dump_read_sync_objects
* @sync_objects: sync objects array
*/
static void hl_state_dump_free_sync_objects(u32 *sync_objects)
{
vfree(sync_objects);
}
/**
* hl_state_dump_print_syncs_single_block - print active sync objects on a
* single block
* @hdev: pointer to the device
* @index: sync manager block index starting with E_N
* @buf: destination buffer double pointer to be used with hl_snprintf_resize
* @size: pointer to the size container
* @offset: pointer to the offset container
* @map: sync engines names map
*
* Returns 0 on success or error code on failure
*/
static int
hl_state_dump_print_syncs_single_block(struct hl_device *hdev, u32 index,
char **buf, size_t *size, size_t *offset,
struct hl_sync_to_engine_map *map)
{
struct hl_state_dump_specs *sds = &hdev->state_dump_specs;
const char *sync_name;
u32 *sync_objects = NULL;
int rc = 0, i;
if (sds->sync_namager_names) {
rc = hl_snprintf_resize(
buf, size, offset, "%s\n",
sds->sync_namager_names[index]);
if (rc)
goto out;
}
sync_objects = hl_state_dump_read_sync_objects(hdev, index);
if (!sync_objects) {
rc = -ENOMEM;
goto out;
}
for (i = 0; i < sds->props[SP_SYNC_OBJ_AMOUNT]; ++i) {
struct hl_sync_to_engine_map_entry *entry;
u64 sync_object_addr;
if (!sync_objects[i])
continue;
sync_object_addr = sds->props[SP_SYNC_OBJ_BASE_ADDR] +
sds->props[SP_NEXT_SYNC_OBJ_ADDR] * index +
i * sizeof(u32);
rc = hl_snprintf_resize(buf, size, offset, "sync id: %u", i);
if (rc)
goto free_sync_objects;
sync_name = hl_state_dump_get_sync_name(hdev, i);
if (sync_name) {
rc = hl_snprintf_resize(buf, size, offset, " %s",
sync_name);
if (rc)
goto free_sync_objects;
}
rc = hl_snprintf_resize(buf, size, offset, ", value: %u",
sync_objects[i]);
if (rc)
goto free_sync_objects;
/* Append engine string */
entry = hl_state_dump_get_sync_to_engine(map,
(u32)sync_object_addr);
if (entry) {
rc = hl_snprintf_resize(buf, size, offset,
", Engine: ");
if (rc)
goto free_sync_objects;
rc = hl_print_resize_sync_engine(buf, size, offset,
entry->engine_type,
entry->engine_id);
if (rc)
goto free_sync_objects;
}
rc = hl_snprintf_resize(buf, size, offset, "\n");
if (rc)
goto free_sync_objects;
}
free_sync_objects:
hl_state_dump_free_sync_objects(sync_objects);
out:
return rc;
}
/**
* hl_state_dump_print_syncs - print active sync objects
* @hdev: pointer to the device
* @buf: destination buffer double pointer to be used with hl_snprintf_resize
* @size: pointer to the size container
* @offset: pointer to the offset container
*
* Returns 0 on success or error code on failure
*/
static int hl_state_dump_print_syncs(struct hl_device *hdev,
char **buf, size_t *size,
size_t *offset)
{
struct hl_state_dump_specs *sds = &hdev->state_dump_specs;
struct hl_sync_to_engine_map *map;
u32 index;
int rc = 0;
map = kzalloc(sizeof(*map), GFP_KERNEL);
if (!map)
return -ENOMEM;
rc = sds->funcs.gen_sync_to_engine_map(hdev, map);
if (rc)
goto free_map_mem;
rc = hl_snprintf_resize(buf, size, offset, "Non zero sync objects:\n");
if (rc)
goto out;
if (sds->sync_namager_names) {
for (index = 0; sds->sync_namager_names[index]; ++index) {
rc = hl_state_dump_print_syncs_single_block(
hdev, index, buf, size, offset, map);
if (rc)
goto out;
}
} else {
for (index = 0; index < sds->props[SP_NUM_CORES]; ++index) {
rc = hl_state_dump_print_syncs_single_block(
hdev, index, buf, size, offset, map);
if (rc)
goto out;
}
}
out:
hl_state_dump_free_sync_to_engine_map(map);
free_map_mem:
kfree(map);
return rc;
}
/**
* hl_state_dump_alloc_read_sm_block_monitors - read monitors for a specific
* block
* @hdev: pointer to the device
* @index: sync manager block index starting with E_N
*
* Returns an array of monitor data of size SP_MONITORS_AMOUNT or NULL
* on error
*/
static struct hl_mon_state_dump *
hl_state_dump_alloc_read_sm_block_monitors(struct hl_device *hdev, u32 index)
{
struct hl_state_dump_specs *sds = &hdev->state_dump_specs;
struct hl_mon_state_dump *monitors;
s64 base_addr; /* Base addr can be negative */
int i;
monitors = vmalloc(sds->props[SP_MONITORS_AMOUNT] *
sizeof(struct hl_mon_state_dump));
if (!monitors)
return NULL;
base_addr = sds->props[SP_NEXT_SYNC_OBJ_ADDR] * index;
for (i = 0; i < sds->props[SP_MONITORS_AMOUNT]; ++i) {
monitors[i].id = i;
monitors[i].wr_addr_low =
RREG32(base_addr + sds->props[SP_MON_OBJ_WR_ADDR_LOW] +
i * sizeof(u32));
monitors[i].wr_addr_high =
RREG32(base_addr + sds->props[SP_MON_OBJ_WR_ADDR_HIGH] +
i * sizeof(u32));
monitors[i].wr_data =
RREG32(base_addr + sds->props[SP_MON_OBJ_WR_DATA] +
i * sizeof(u32));
monitors[i].arm_data =
RREG32(base_addr + sds->props[SP_MON_OBJ_ARM_DATA] +
i * sizeof(u32));
monitors[i].status =
RREG32(base_addr + sds->props[SP_MON_OBJ_STATUS] +
i * sizeof(u32));
}
return monitors;
}
/**
* hl_state_dump_free_monitors - free the monitors structure
* @monitors: monitors array created with
* hl_state_dump_alloc_read_sm_block_monitors
*/
static void hl_state_dump_free_monitors(struct hl_mon_state_dump *monitors)
{
vfree(monitors);
}
/**
* hl_state_dump_print_monitors_single_block - print active monitors on a
* single block
* @hdev: pointer to the device
* @index: sync manager block index starting with E_N
* @buf: destination buffer double pointer to be used with hl_snprintf_resize
* @size: pointer to the size container
* @offset: pointer to the offset container
*
* Returns 0 on success or error code on failure
*/
static int hl_state_dump_print_monitors_single_block(struct hl_device *hdev,
u32 index,
char **buf, size_t *size,
size_t *offset)
{
struct hl_state_dump_specs *sds = &hdev->state_dump_specs;
struct hl_mon_state_dump *monitors = NULL;
int rc = 0, i;
if (sds->sync_namager_names) {
rc = hl_snprintf_resize(
buf, size, offset, "%s\n",
sds->sync_namager_names[index]);
if (rc)
goto out;
}
monitors = hl_state_dump_alloc_read_sm_block_monitors(hdev, index);
if (!monitors) {
rc = -ENOMEM;
goto out;
}
for (i = 0; i < sds->props[SP_MONITORS_AMOUNT]; ++i) {
if (!(sds->funcs.monitor_valid(&monitors[i])))
continue;
/* Monitor is valid, dump it */
rc = sds->funcs.print_single_monitor(buf, size, offset, hdev,
&monitors[i]);
if (rc)
goto free_monitors;
hl_snprintf_resize(buf, size, offset, "\n");
}
free_monitors:
hl_state_dump_free_monitors(monitors);
out:
return rc;
}
/**
* hl_state_dump_print_monitors - print active monitors
* @hdev: pointer to the device
* @buf: destination buffer double pointer to be used with hl_snprintf_resize
* @size: pointer to the size container
* @offset: pointer to the offset container
*
* Returns 0 on success or error code on failure
*/
static int hl_state_dump_print_monitors(struct hl_device *hdev,
char **buf, size_t *size,
size_t *offset)
{
struct hl_state_dump_specs *sds = &hdev->state_dump_specs;
u32 index;
int rc = 0;
rc = hl_snprintf_resize(buf, size, offset,
"Valid (armed) monitor objects:\n");
if (rc)
goto out;
if (sds->sync_namager_names) {
for (index = 0; sds->sync_namager_names[index]; ++index) {
rc = hl_state_dump_print_monitors_single_block(
hdev, index, buf, size, offset);
if (rc)
goto out;
}
} else {
for (index = 0; index < sds->props[SP_NUM_CORES]; ++index) {
rc = hl_state_dump_print_monitors_single_block(
hdev, index, buf, size, offset);
if (rc)
goto out;
}
}
out:
return rc;
}
/**
* hl_state_dump_print_engine_fences - print active fences for a specific
* engine
* @hdev: pointer to the device
* @engine_type: engine type to use
* @buf: destination buffer double pointer to be used with hl_snprintf_resize
* @size: pointer to the size container
* @offset: pointer to the offset container
*/
static int
hl_state_dump_print_engine_fences(struct hl_device *hdev,
enum hl_sync_engine_type engine_type,
char **buf, size_t *size, size_t *offset)
{
struct hl_state_dump_specs *sds = &hdev->state_dump_specs;
int rc = 0, i, n_fences;
u64 base_addr, next_fence;
switch (engine_type) {
case ENGINE_TPC:
n_fences = sds->props[SP_NUM_OF_TPC_ENGINES];
base_addr = sds->props[SP_TPC0_CMDQ];
next_fence = sds->props[SP_NEXT_TPC];
break;
case ENGINE_MME:
n_fences = sds->props[SP_NUM_OF_MME_ENGINES];
base_addr = sds->props[SP_MME_CMDQ];
next_fence = sds->props[SP_NEXT_MME];
break;
case ENGINE_DMA:
n_fences = sds->props[SP_NUM_OF_DMA_ENGINES];
base_addr = sds->props[SP_DMA_CMDQ];
next_fence = sds->props[SP_DMA_QUEUES_OFFSET];
break;
default:
return -EINVAL;
}
for (i = 0; i < n_fences; ++i) {
rc = sds->funcs.print_fences_single_engine(
hdev,
base_addr + next_fence * i +
sds->props[SP_FENCE0_CNT_OFFSET],
base_addr + next_fence * i +
sds->props[SP_CP_STS_OFFSET],
engine_type, i, buf, size, offset);
if (rc)
goto out;
}
out:
return rc;
}
/**
* hl_state_dump_print_fences - print active fences
* @hdev: pointer to the device
* @buf: destination buffer double pointer to be used with hl_snprintf_resize
* @size: pointer to the size container
* @offset: pointer to the offset container
*/
static int hl_state_dump_print_fences(struct hl_device *hdev, char **buf,
size_t *size, size_t *offset)
{
int rc = 0;
rc = hl_snprintf_resize(buf, size, offset, "Valid (armed) fences:\n");
if (rc)
goto out;
rc = hl_state_dump_print_engine_fences(hdev, ENGINE_TPC, buf, size, offset);
if (rc)
goto out;
rc = hl_state_dump_print_engine_fences(hdev, ENGINE_MME, buf, size, offset);
if (rc)
goto out;
rc = hl_state_dump_print_engine_fences(hdev, ENGINE_DMA, buf, size, offset);
if (rc)
goto out;
out:
return rc;
}
/**
* hl_state_dump() - dump system state
* @hdev: pointer to device structure
*/
int hl_state_dump(struct hl_device *hdev)
{
char *buf = NULL;
size_t offset = 0, size = 0;
int rc;
rc = hl_snprintf_resize(&buf, &size, &offset,
"Timestamp taken on: %llu\n\n",
ktime_to_ns(ktime_get()));
if (rc)
goto err;
rc = hl_state_dump_print_syncs(hdev, &buf, &size, &offset);
if (rc)
goto err;
hl_snprintf_resize(&buf, &size, &offset, "\n");
rc = hl_state_dump_print_monitors(hdev, &buf, &size, &offset);
if (rc)
goto err;
hl_snprintf_resize(&buf, &size, &offset, "\n");
rc = hl_state_dump_print_fences(hdev, &buf, &size, &offset);
if (rc)
goto err;
hl_snprintf_resize(&buf, &size, &offset, "\n");
hl_debugfs_set_state_dump(hdev, buf, size);
return 0;
err:
vfree(buf);
return rc;
}