    mm: make mm_percpu_wq non freezable · 80d136e1
    Michal Hocko authored
    Geert has reported a freeze during PM resume. Additional debugging has
    shown that the device_resume worker cannot make forward progress
    because it waits for another device's resume to complete, and that
    resume is itself stuck waiting in drain_all_pages:
    
      INFO: task kworker/u4:0:5 blocked for more than 120 seconds.
            Not tainted 4.11.0-rc7-koelsch-00029-g005882e5-dirty #3476
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      kworker/u4:0    D    0     5      2 0x00000000
      Workqueue: events_unbound async_run_entry_fn
        __schedule
        schedule
        schedule_timeout
        wait_for_common
        dpm_wait_for_superior
        device_resume
        async_resume
        async_run_entry_fn
        process_one_work
        worker_thread
        kthread
      [...]
      bash            D    0  1703   1694 0x00000000
        __schedule
        schedule
        schedule_timeout
        wait_for_common
        flush_work
        drain_all_pages
        start_isolate_page_range
        alloc_contig_range
        cma_alloc
        __alloc_from_contiguous
        cma_allocator_alloc
        __dma_alloc
        arm_dma_alloc
        sh_eth_ring_init
        sh_eth_open
        sh_eth_resume
        dpm_run_callback
        device_resume
        dpm_resume
        dpm_resume_end
        suspend_devices_and_enter
        pm_suspend
        state_store
        kernfs_fop_write
        __vfs_write
        vfs_write
        SyS_write
      [...]
      Showing busy workqueues and worker pools:
      [...]
      workqueue mm_percpu_wq: flags=0xc
        pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=0/0
          delayed: drain_local_pages_wq, vmstat_update
        pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=0/0
          delayed: drain_local_pages_wq BAR(1703), vmstat_update
    
    Tetsuo has correctly noted that mm_percpu_wq is created as WQ_FREEZABLE,
    so it is still frozen this early during resume. The work items queued
    by drain_all_pages therefore never run, flush_work never returns, and
    we are effectively deadlocked.  Fix this by dropping WQ_FREEZABLE when
    creating mm_percpu_wq.  We really want to have it operational all the
    time.
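
    The "flags=0xc" in the workqueue dump above can be decoded with a small
    user-space sketch. The bit values below are assumed to match
    include/linux/workqueue.h in v4.11 and are redefined locally; the helper
    names wq_is_freezable() and wq_flags_after_fix() are hypothetical and
    only illustrate the effect of dropping WQ_FREEZABLE, not the actual
    patch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Assumed to mirror the workqueue flag bits in include/linux/workqueue.h
 * (v4.11). Redefined here so the sketch builds outside the kernel.
 */
enum {
	WQ_FREEZABLE   = 1 << 2,	/* frozen during suspend */
	WQ_MEM_RECLAIM = 1 << 3,	/* may be used for memory reclaim */
};

/* The dump shows "workqueue mm_percpu_wq: flags=0xc". */
static_assert((WQ_FREEZABLE | WQ_MEM_RECLAIM) == 0xc,
	      "0xc should decode to WQ_FREEZABLE | WQ_MEM_RECLAIM");

/* Hypothetical helper: would this workqueue be frozen across suspend? */
static bool wq_is_freezable(unsigned int flags)
{
	return (flags & WQ_FREEZABLE) != 0;
}

/* Hypothetical helper: flags once WQ_FREEZABLE is dropped. */
static unsigned int wq_flags_after_fix(unsigned int flags)
{
	return flags & ~(unsigned int)WQ_FREEZABLE;
}
```

    Under these assumptions wq_is_freezable(0xc) is true, and
    wq_flags_after_fix(0xc) leaves 0x8, i.e. WQ_MEM_RECLAIM alone, so the
    workqueue would keep running through suspend and resume.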
    
    Fixes: ce612879 ("mm: move pcp and lru-pcp draining into single wq")
    Reported-and-tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
    Debugged-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
    Signed-off-by: Michal Hocko <mhocko@suse.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>