    net/mlx5: Cancel health poll before sending panic teardown command · d2aa060d
    Huy Nguyen authored
    After the panic teardown firmware command is sent, health_care detects
    the error on the PCI bus and calls mlx5_pci_err_detected. This
    health_care flow is no longer needed, because the panic teardown
    firmware command brings down PCI bus communication with the HCA.
    
    The solution is to cancel the health care timer and its pending
    workqueue request before sending the panic teardown firmware command.
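
The ordering described above can be illustrated with a small userspace
model. This is only a sketch, not the driver code: the struct and every
function name below (health_model, timer_tick, run_pending_work,
cancel_health_poll, panic_teardown, shutdown_flow) are hypothetical and
stand in for the kernel timer, the workqueue, and the firmware command.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the health poll state; not the driver's structures. */
struct health_model {
    bool timer_armed;   /* periodic health poll timer is running */
    bool work_pending;  /* a health_care work item is queued */
    bool bus_up;        /* PCI communication with the HCA is alive */
    int  stale_runs;    /* times health_care ran after the bus went down */
};

/* Models the timer firing: it queues health_care work if still armed. */
void timer_tick(struct health_model *h)
{
    if (h->timer_armed)
        h->work_pending = true;
}

/* Models the workqueue executing a queued health_care item. */
void run_pending_work(struct health_model *h)
{
    if (!h->work_pending)
        return;
    h->work_pending = false;
    if (!h->bus_up)
        h->stale_runs++;  /* the unwanted flow seen in the kernel trace */
}

/* Models cancelling the timer together with its pending work request. */
void cancel_health_poll(struct health_model *h)
{
    h->timer_armed = false;
    h->work_pending = false;
}

/* Models the panic teardown command: bus communication goes down. */
void panic_teardown(struct health_model *h)
{
    h->bus_up = false;
}

/*
 * Models the shutdown flow. cancel_first selects the fixed ordering.
 * Returns how many times health_care ran against a dead bus.
 */
int shutdown_flow(bool cancel_first)
{
    struct health_model h = {
        .timer_armed = true, .work_pending = false,
        .bus_up = true, .stale_runs = 0,
    };

    if (cancel_first)
        cancel_health_poll(&h);
    panic_teardown(&h);
    assert(!h.bus_up);      /* sanity check on the model itself */
    timer_tick(&h);         /* timer fires after teardown, if still armed */
    run_pending_work(&h);   /* workqueue runs whatever was queued */
    return h.stale_runs;
}
```

In this model, shutdown_flow(false) lets health_care run once after the
bus is down, while shutdown_flow(true) does not, which is the effect the
fix aims for.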
    
    Kernel trace:
    mlx5_core 0033:01:00.0: Shutdown was called
    mlx5_core 0033:01:00.0: health_care:154:(pid 9304): handling bad device here
    mlx5_core 0033:01:00.0: mlx5_handle_bad_state:114:(pid 9304): NIC state 1
    mlx5_core 0033:01:00.0: mlx5_pci_err_detected was called
    mlx5_core 0033:01:00.0: mlx5_enter_error_state:96:(pid 9304): start
    mlx5_3:mlx5_ib_event:3061:(pid 9304): warning: event on port 0
    mlx5_core 0033:01:00.0: mlx5_enter_error_state:104:(pid 9304): end
    Unable to handle kernel paging request for data at address 0x0000003f
    Faulting instruction address: 0xc0080000434b8c80
    
    Fixes: 8812c24d ('net/mlx5: Add fast unload support in shutdown flow')
    Signed-off-by: Huy Nguyen <huyn@mellanox.com>
    Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
    Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>