  1. Jan 31, 2020
    • memcg: fix a crash in wb_workfn when a device disappears · 68f23b89
      Theodore Ts'o authored
      Without memcg, there is a one-to-one mapping between the bdi and
      bdi_writeback structures.  In this world, things are fairly
      straightforward; the first thing bdi_unregister() does is to shut down
      the bdi_writeback structure (or wb), and part of that shutdown ensures
      that no other work is queued against the wb, and that the wb is fully
      drained.
      
      With memcg, however, there is a one-to-many relationship between the bdi
      and bdi_writeback structures; that is, there are multiple wb objects
      which can all point to a single bdi.  There is a refcount which prevents
      the bdi object from being released (and hence, unregistered).  So in
      theory, bdi_unregister() *should* only get called once its refcount
      goes to zero (bdi_put() drops the refcount, and when it reaches zero,
      release_bdi() gets called, which calls bdi_unregister()).
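The intended lifetime rule, where unregistration happens only when the last reference is dropped, can be sketched as a small userspace model. This is a hypothetical simplification: the kernel uses a kref and real release paths, while here the refcount is a plain int and bdi_unregister() merely sets a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; the kernel uses a struct kref, not a plain int. */
struct backing_dev_info {
	int refcnt;        /* one reference per user, e.g. per cgroup wb */
	bool unregistered; /* set once bdi_unregister() has run */
};

static void bdi_unregister(struct backing_dev_info *bdi)
{
	bdi->unregistered = true;
}

/* Invoked only when the last reference is dropped. */
static void release_bdi(struct backing_dev_info *bdi)
{
	bdi_unregister(bdi);
}

static void bdi_get(struct backing_dev_info *bdi)
{
	assert(bdi->refcnt > 0); /* a released bdi must not be resurrected */
	bdi->refcnt++;
}

static void bdi_put(struct backing_dev_info *bdi)
{
	assert(bdi->refcnt > 0);
	if (--bdi->refcnt == 0)
		release_bdi(bdi);
}
```

The bug described next is precisely a caller that skips this protocol and invokes bdi_unregister() directly while other references are still live.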
      
      Unfortunately, del_gendisk() in block/genhd.c never got the memo about
      the Brave New memcg World, and calls bdi_unregister() directly.  It does
      this without informing the file system, or the memcg code, or anything
      else.  This causes the root wb associated with the bdi to be
      unregistered, but none of the memcg-specific wbs are shut down.  So when
      one of these wbs is woken up to do delayed work, it tries to
      dereference its wb->bdi->dev to fetch the device name, but
      unfortunately bdi->dev is now NULL, thanks to the bdi_unregister()
      called by del_gendisk().  As a result, *boom*.
      
      Fortunately, it looks like the rest of the writeback path is perfectly
      happy with bdi->dev and bdi->owner being NULL, so the simplest fix is to
      create a bdi_dev_name() function which can handle bdi->dev being NULL.
      This also allows us to bulletproof the writeback tracepoints to prevent
      them from dereferencing a NULL pointer and crashing the kernel if one is
      tracing with memcg enabled and an iSCSI device dies or a USB storage
      stick is pulled.
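A NULL-tolerant bdi_dev_name() could look roughly like the following userspace sketch. The structures are minimal stand-ins, dev_name() is stubbed, and the "(unknown)" placeholder string is an assumption for illustration rather than a quote of the kernel source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Minimal stand-ins for the kernel structures (hypothetical, userspace only). */
struct device {
	const char *name;
};

struct backing_dev_info {
	struct device *dev; /* becomes NULL once bdi_unregister() has run */
};

/* Placeholder returned when the device is gone; the exact string is assumed. */
static const char bdi_unknown_name[] = "(unknown)";

static const char *dev_name(const struct device *dev)
{
	return dev->name;
}

/* Never dereferences a NULL bdi->dev, so tracepoints can call it safely. */
static const char *bdi_dev_name(const struct backing_dev_info *bdi)
{
	if (!bdi || !bdi->dev)
		return bdi_unknown_name;
	return dev_name(bdi->dev);
}
```

Tracepoints and wb_workfn() would then call bdi_dev_name(wb->bdi) instead of dereferencing wb->bdi->dev themselves.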
      
      The most common way of triggering this will be hot removal of a device
      while memcg-enabled writeback is in progress.  It was being triggered
      several times a day in a heavily loaded production environment.
      
      Google Bug Id: 145475544
      
      Link: https://lore.kernel.org/r/20191227194829.150110-1-tytso@mit.edu
      Link: http://lkml.kernel.org/r/20191228005211.163952-1-tytso@mit.edu
      
      
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: Chris Mason <clm@fb.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. Oct 06, 2019
    • bdi: Do not use freezable workqueue · a2b90f11
      Mika Westerberg authored
      
      A removable block device, such as an NVMe drive or SSD connected over
      Thunderbolt, can be hot-removed at any time, including when the system
      is suspended. When a device is hot-removed during suspend and the system
      is resumed, the kernel first resumes devices and then thaws userspace,
      including freezable workqueues. In that case, the NVMe driver notices
      that the device is unplugged and removes it from the system. This ends
      up calling bdi_unregister() for the gendisk, which then schedules
      wb_workfn() to be run one more time.
      
      However, since bdi_wq is still frozen, the flush_delayed_work() call in
      wb_shutdown() blocks forever, halting the system resume process. The
      user sees this as a hang, since nothing happens anymore.
      
      Triggering sysrq-w reveals this:
      
        Workqueue: nvme-wq nvme_remove_dead_ctrl_work [nvme]
        Call Trace:
         ? __schedule+0x2c5/0x630
         ? wait_for_completion+0xa4/0x120
         schedule+0x3e/0xc0
         schedule_timeout+0x1c9/0x320
         ? resched_curr+0x1f/0xd0
         ? wait_for_completion+0xa4/0x120
         wait_for_completion+0xc3/0x120
         ? wake_up_q+0x60/0x60
         __flush_work+0x131/0x1e0
         ? flush_workqueue_prep_pwqs+0x130/0x130
         bdi_unregister+0xb9/0x130
         del_gendisk+0x2d2/0x2e0
         nvme_ns_remove+0xed/0x110 [nvme_core]
         nvme_remove_namespaces+0x96/0xd0 [nvme_core]
         nvme_remove+0x5b/0x160 [nvme]
         pci_device_remove+0x36/0x90
         device_release_driver_internal+0xdf/0x1c0
         nvme_remove_dead_ctrl_work+0x14/0x30 [nvme]
         process_one_work+0x1c2/0x3f0
         worker_thread+0x48/0x3e0
         kthread+0x100/0x140
         ? current_work+0x30/0x30
         ? kthread_park+0x80/0x80
         ret_from_fork+0x35/0x40
      
      This is not limited to NVMe; exactly the same issue can be reproduced by
      hot-removing an SSD (over Thunderbolt) while the system is suspended.
      
      Prevent this from happening by removing WQ_FREEZABLE from bdi_wq.
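The resume ordering that causes the hang can be modeled in a few lines. This hypothetical userspace sketch treats a workqueue as a pair of flags (wq_model, set_system_frozen() and flush_can_complete() are illustrative names) and shows why a flush issued during device resume can only finish if bdi_wq is not freezable.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the resume ordering described above. */
struct wq_model {
	bool freezable; /* models WQ_FREEZABLE */
	bool frozen;    /* true between freeze and thaw */
};

/* Freezing and thawing only affect queues created as freezable. */
static void set_system_frozen(struct wq_model *wq, bool frozen)
{
	wq->frozen = wq->freezable && frozen;
}

/* A flush can finish only if the queue is allowed to run work right now. */
static bool flush_can_complete(const struct wq_model *wq)
{
	return !wq->frozen;
}

/*
 * Resume: devices are resumed first (the hot-removed disk is torn down and
 * bdi_unregister() flushes bdi_wq), and only afterwards are queues thawed.
 * Returns false where the real kernel would block forever.
 */
static bool resume_with_hot_removed_disk(bool bdi_wq_freezable)
{
	struct wq_model bdi_wq = { .freezable = bdi_wq_freezable, .frozen = false };

	set_system_frozen(&bdi_wq, true);       /* suspend */
	bool ok = flush_can_complete(&bdi_wq); /* device resume, del_gendisk() */
	set_system_frozen(&bdi_wq, false);      /* thaw userspace and queues */
	return ok;
}
```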
      
      Reported-by: AceLan Kao <acelan.kao@canonical.com>
      Link: https://marc.info/?l=linux-kernel&m=138695698516487
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=204385
      Link: https://lore.kernel.org/lkml/20191002122136.GD2819@lahna.fi.intel.com/#t
      
      
      Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  3. Aug 27, 2019
  4. Jun 03, 2019
  5. May 21, 2019
  6. Jan 22, 2019
  7. Aug 31, 2018
    • blkcg: delay blkg destruction until after writeback has finished · 59b57717
      Dennis Zhou (Facebook) authored
      
      Currently, blkcg destruction relies on a sequence of events:
        1. Destruction starts. blkcg_css_offline() is called and blkgs
           release their reference to the blkcg. This immediately destroys
           the cgwbs (writeback).
        2. With blkgs giving up their reference, the blkcg ref count should
           become zero and eventually call blkcg_css_free() which finally
           frees the blkcg.
      
      Jiufei Xue reported that there is a race between blkcg_bio_issue_check()
      and cgroup_rmdir(). To remedy this, blkg destruction becomes contingent
      on the completion of all writeback associated with the blkcg. A count of
      the number of cgwbs is maintained and once that goes to zero, blkg
      destruction can follow. This should prevent premature blkg destruction
      related to writeback.
      
      The new process for blkcg cleanup is as follows:
        1. Destruction starts. blkcg_css_offline() is called which offlines
           writeback. Blkg destruction is delayed on the cgwb_refcnt count to
           avoid punting potentially large amounts of outstanding writeback
           to root while maintaining any ongoing policies. Here, the base
           cgwb_refcnt is put back.
        2. When the cgwb_refcnt becomes zero, blkcg_destroy_blkgs() is called
           and handles destruction of blkgs. This is where the css reference
           held by each blkg is released.
        3. Once the blkcg ref count goes to zero, blkcg_css_free() is called.
           This finally frees the blkcg.
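The three-step ordering above can be sketched as a hypothetical userspace model. The helper names follow the text (blkcg_cgwb_get/put(), blkcg_destroy_blkgs(), blkcg_css_free()), but the bodies are simplified assumptions: plain int counters replace refcount_t, and only the blkgs hold css references here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the cleanup order; not the kernel code. */
struct blkcg {
	int cgwb_refcnt; /* base ref + one per cgroup writeback (cgwb) */
	int css_refcnt;  /* css references; in this model only blkgs hold them */
	int nr_blkgs;
	bool freed;
};

/* Step 3: the blkcg itself is freed once its css refcount hits zero. */
static void blkcg_css_free(struct blkcg *blkcg)
{
	blkcg->freed = true;
}

static void css_put(struct blkcg *blkcg)
{
	if (--blkcg->css_refcnt == 0)
		blkcg_css_free(blkcg);
}

/* Step 2: each blkg drops the css reference it held. */
static void blkcg_destroy_blkgs(struct blkcg *blkcg)
{
	while (blkcg->nr_blkgs > 0) {
		blkcg->nr_blkgs--;
		css_put(blkcg);
	}
}

static void blkcg_cgwb_get(struct blkcg *blkcg)
{
	blkcg->cgwb_refcnt++;
}

/* blkg destruction waits until the last cgwb reference is gone. */
static void blkcg_cgwb_put(struct blkcg *blkcg)
{
	assert(blkcg->cgwb_refcnt > 0);
	if (--blkcg->cgwb_refcnt == 0)
		blkcg_destroy_blkgs(blkcg);
}

/* Step 1: offline writeback and drop the base cgwb reference. */
static void blkcg_css_offline(struct blkcg *blkcg)
{
	blkcg_cgwb_put(blkcg);
}
```

The point of the model is that offlining alone no longer tears down the blkgs while a cgwb still has writeback in flight.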
      
      It seems that in the past blk-throttle didn't do the most understandable
      things when taking data from a blkg while associating with current. So
      the simplification and unification of what blk-throttle is doing caused
      this.
      
      Fixes: 08e18eab ("block: add bi_blkg to the bio for cgroups")
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
      Cc: Jiufei Xue <jiufei.xue@linux.alibaba.com>
      Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Josef Bacik <josef@toxicpanda.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  8. Aug 22, 2018
  9. Jun 22, 2018
    • bdi: Fix another oops in wb_workfn() · 3ee7e869
      Jan Kara authored
      
      syzbot is reporting a NULL pointer dereference in wb_workfn() [1] due to
      wb->bdi->dev being NULL. And Dmitry confirmed that wb->state was
      WB_shutting_down after wb->bdi->dev became NULL. This indicates that
      bdi_unregister() failed to call wb_shutdown() on one of the wb objects.
      
      The problem is in cgwb_bdi_unregister(), which calls cgwb_kill() and thus
      drops the bdi's reference to wb structures before going through the list
      of wbs again and calling wb_shutdown() on each of them. This way the loop
      iterating through all wbs can easily miss a wb if that wb has already
      passed through cgwb_remove_from_bdi_list() called from wb_shutdown()
      from cgwb_release_workfn(), and as a result fully shut down the bdi
      although wb_workfn() for this wb structure is still running. In fact
      there are also other ways cgwb_bdi_unregister() can race with
      cgwb_release_workfn(), leading e.g. to use-after-free issues:
      
      CPU1                            CPU2
                                      cgwb_bdi_unregister()
                                        cgwb_kill(*slot);
      
      cgwb_release()
        queue_work(cgwb_release_wq, &wb->release_work);
      cgwb_release_workfn()
                                        wb = list_first_entry(&bdi->wb_list, ...)
                                        spin_unlock_irq(&cgwb_lock);
        wb_shutdown(wb);
        ...
        kfree_rcu(wb, rcu);
                                        wb_shutdown(wb); -> oops use-after-free
      
      We solve these issues by synchronizing writeback structure shutdown from
      cgwb_bdi_unregister() with cgwb_release_workfn() using a new mutex. That
      way we also no longer need synchronization using WB_shutting_down, as the
      mutex provides it for the CONFIG_CGROUP_WRITEBACK case, and without
      CONFIG_CGROUP_WRITEBACK wb_shutdown() can be called only once from
      bdi_unregister().
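The mutex-based serialization can be illustrated with a hypothetical userspace model using POSIX threads. Here release_mutex stands in for the new mutex, list_lock for cgwb_lock, and a registered flag (an assumption of this sketch) makes a repeated wb_shutdown() a no-op. It models only the serialization of the two shutdown paths, not memory lifetime, and is not the kernel implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; names mirror the commit message, not kernel code. */
#define NR_WB 64

struct wb {
	bool registered;    /* assumed guard: cleared by the first shutdown */
	bool on_list;       /* models membership in bdi->wb_list */
	int shutdown_count; /* how many times shutdown work actually ran */
};

static struct wb wbs[NR_WB];
static pthread_mutex_t release_mutex = PTHREAD_MUTEX_INITIALIZER; /* the new mutex */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;     /* models cgwb_lock */

/* Caller must hold release_mutex. A second call on the same wb does nothing. */
static void wb_shutdown(struct wb *wb)
{
	if (!wb->registered)
		return;
	wb->registered = false;
	pthread_mutex_lock(&list_lock);
	wb->on_list = false; /* models cgwb_remove_from_bdi_list() */
	pthread_mutex_unlock(&list_lock);
	wb->shutdown_count++;
}

/* Caller holds list_lock. Returns the first wb still on the list, or NULL. */
static struct wb *first_on_list(void)
{
	for (size_t i = 0; i < NR_WB; i++)
		if (wbs[i].on_list)
			return &wbs[i];
	return NULL;
}

/* Models cgwb_release_workfn() running for each wb in turn. */
static void *release_worker(void *arg)
{
	(void)arg;
	for (size_t i = NR_WB; i-- > 0;) {
		pthread_mutex_lock(&release_mutex);
		wb_shutdown(&wbs[i]);
		pthread_mutex_unlock(&release_mutex);
	}
	return NULL;
}

/* Models cgwb_bdi_unregister(): drains the list while holding release_mutex. */
static void *unregister_worker(void *arg)
{
	struct wb *wb;

	(void)arg;
	pthread_mutex_lock(&release_mutex);
	pthread_mutex_lock(&list_lock);
	while ((wb = first_on_list()) != NULL) {
		pthread_mutex_unlock(&list_lock);
		wb_shutdown(wb);
		pthread_mutex_lock(&list_lock);
	}
	pthread_mutex_unlock(&list_lock);
	pthread_mutex_unlock(&release_mutex);
	return NULL;
}

static void run_race(void)
{
	pthread_t a, b;

	for (size_t i = 0; i < NR_WB; i++)
		wbs[i] = (struct wb){ .registered = true, .on_list = true, .shutdown_count = 0 };
	pthread_create(&a, NULL, release_worker, NULL);
	pthread_create(&b, NULL, unregister_worker, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
}

static bool all_shut_down_once(void)
{
	for (size_t i = 0; i < NR_WB; i++)
		if (wbs[i].shutdown_count != 1 || wbs[i].on_list)
			return false;
	return true;
}
```

Whichever thread wins the mutex, every wb ends up shut down exactly once. On older C libraries the model needs to be built with `-pthread`.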
      
      Reported-by: syzbot <syzbot+4a7438e774b21ddd8eca@syzkaller.appspotmail.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  10. Jun 08, 2018
  11. May 23, 2018
    • bdi: Move cgroup bdi_writeback to a dedicated low concurrency workqueue · f1834646
      Tejun Heo authored
      
      From 0aa2e9b921d6db71150633ff290199554f0842a8 Mon Sep 17 00:00:00 2001
      From: Tejun Heo <tj@kernel.org>
      Date: Wed, 23 May 2018 10:29:00 -0700
      
      cgwb_release() punts the actual release to cgwb_release_workfn() on
      system_wq.  Depending on the number of cgroups or block devices, there
      can be a lot of cgwb_release_workfn() in flight at the same time.
      
      We're periodically seeing close to 256 kworkers getting stuck with the
      following stack trace, and over time the entire system gets stuck.
      
        [<ffffffff810ee40c>] _synchronize_rcu_expedited.constprop.72+0x2fc/0x330
        [<ffffffff810ee634>] synchronize_rcu_expedited+0x24/0x30
        [<ffffffff811ccf23>] bdi_unregister+0x53/0x290
        [<ffffffff811cd1e9>] release_bdi+0x89/0xc0
        [<ffffffff811cd645>] wb_exit+0x85/0xa0
        [<ffffffff811cdc84>] cgwb_release_workfn+0x54/0xb0
        [<ffffffff810a68d0>] process_one_work+0x150/0x410
        [<ffffffff810a71fd>] worker_thread+0x6d/0x520
        [<ffffffff810ad3dc>] kthread+0x12c/0x160
        [<ffffffff81969019>] ret_from_fork+0x29/0x40
        [<ffffffffffffffff>] 0xffffffffffffffff
      
      The events leading to the lockup are...
      
      1. Many cgwb_release_workfn() work items are queued at the same time,
         and all system_wq kworkers are assigned to execute them.
      
      2. They all end up calling synchronize_rcu_expedited().  One of them
         wins and tries to perform the expedited synchronization.
      
      3. However, that involves queueing rcu_exp_work to system_wq and
         waiting for it.  Because #1 is holding all available kworkers on
         system_wq, rcu_exp_work can't be executed.  cgwb_release_workfn()
         is waiting for synchronize_rcu_expedited(), which in turn is waiting
         for cgwb_release_workfn() to free up some of the kworkers.
      
      We shouldn't be scheduling hundreds of cgwb_release_workfn() at the
      same time.  There's nothing to be gained from that.  This patch
      updates the cgwb release path to use a dedicated percpu workqueue with
      @max_active of 1.
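Why a dedicated queue breaks the cycle can be seen from per-workqueue max_active accounting. The following is a hypothetical model, not workqueue internals: each queue only lets max_active items execute at once, the system_wq limit of 256 is taken from the "close to 256 kworkers" observation above, and wq_model, activate() and rcu_exp_work_can_start() are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of per-workqueue max_active accounting. */
struct wq_model {
	int max_active; /* work items allowed to execute concurrently */
	int active;     /* work items currently executing */
};

/* Returns true if a newly queued item can start; false means it stays pending. */
static bool activate(struct wq_model *wq)
{
	if (wq->active >= wq->max_active)
		return false;
	wq->active++;
	return true;
}

/*
 * Queue n cgwb_release_workfn() items, each of which then blocks in
 * synchronize_rcu_expedited() waiting for rcu_exp_work on system_wq.
 * Returns true if rcu_exp_work can start, i.e. there is no deadlock.
 */
static bool rcu_exp_work_can_start(int n, bool dedicated_release_wq)
{
	struct wq_model system_wq = { .max_active = 256, .active = 0 };
	struct wq_model cgwb_release_wq = { .max_active = 1, .active = 0 };
	struct wq_model *release_wq =
		dedicated_release_wq ? &cgwb_release_wq : &system_wq;

	for (int i = 0; i < n; i++)
		(void)activate(release_wq); /* items beyond max_active wait */

	return activate(&system_wq);
}
```

With the release work on its own queue, the blocked items count against cgwb_release_wq's limit instead of system_wq's, so rcu_exp_work can always start.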
      
      While this resolves the problem at hand, it might be a good idea to
      isolate rcu_exp_work to its own workqueue too as it can be used from
      various paths and is prone to this sort of indirect A-A deadlocks.
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  12. May 03, 2018
  13. Apr 11, 2018
    • mm/vmscan: don't mess with pgdat->flags in memcg reclaim · e3c1ac58
      Andrey Ryabinin authored
      memcg reclaim may alter pgdat->flags based on the state of the LRU lists
      in a cgroup and its children.  PGDAT_WRITEBACK may force kswapd to sleep
      in congested_wait(), and PGDAT_DIRTY may force kswapd to write back
      filesystem pages.  But the worst here is PGDAT_CONGESTED, since it may
      force all direct reclaimers to stall in wait_iff_congested().  Note that
      only kswapd has the power to clear any of these bits.  This might never
      happen if the cgroup limits are configured that way.  So all direct
      reclaimers will stall as long as there is some congested bdi in the
      system.
      
      Leave all pgdat->flags manipulations to kswapd.  kswapd scans the whole
      pgdat, and only kswapd can clear pgdat->flags once the node is balanced,
      so it's reasonable to leave all decisions about node state to kswapd.
      
      Why only kswapd? Why not allow global direct reclaim to change these
      flags? Because currently only kswapd can clear these flags.  I'm less
      worried about the case where PGDAT_CONGESTED is falsely not set, and
      more worried about the case where it is falsely set.  If a direct
      reclaimer sets PGDAT_CONGESTED, is there a guarantee that after the
      congestion problem is sorted out, kswapd will be woken up and clear the
      flag? It seems there is no such guarantee.  E.g. direct reclaimers may
      eventually balance the pgdat and kswapd simply won't wake up (see
      wakeup_kswapd()).
      
      Moving pgdat->flags manipulation to kswapd means that cgroup2 reclaim
      now loses its congestion throttling mechanism.  Add per-cgroup
      congestion state and throttle cgroup2 reclaimers if the memcg is in a
      congested state.
      
      Currently there is no need for per-cgroup PGDAT_WRITEBACK and PGDAT_DIRTY
      bits since they alter only kswapd behavior.
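The per-cgroup congestion state, which the trailer notes was made per-cgroup-per-node, might be modeled as one flag per cgroup per NUMA node. In this hypothetical userspace sketch, treating a cgroup as congested when it or any ancestor is congested on that node is a design assumption, and MAX_NODES is an arbitrary constant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 4 /* arbitrary for the model */

/* Hypothetical model: one congestion flag per cgroup per NUMA node. */
struct memcg {
	struct memcg *parent;
	bool congested[MAX_NODES];
};

static void set_memcg_congestion(struct memcg *memcg, int nid, bool congested)
{
	memcg->congested[nid] = congested;
}

/* Assumed policy: throttle if this cgroup or any ancestor is congested on nid. */
static bool memcg_congested(const struct memcg *memcg, int nid)
{
	for (; memcg; memcg = memcg->parent)
		if (memcg->congested[nid])
			return true;
	return false;
}
```

Unlike the global PGDAT_CONGESTED bit, a flag set for the "congester" cgroup in the demonstration below would leave reclaimers in the sibling "victim" cgroup unthrottled.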
      
      The problem could be easily demonstrated by creating heavy congestion in
      one cgroup:
      
          echo "+memory" > /sys/fs/cgroup/cgroup.subtree_control
          mkdir -p /sys/fs/cgroup/congester
          echo 512M > /sys/fs/cgroup/congester/memory.max
          echo $$ > /sys/fs/cgroup/congester/cgroup.procs
          # generate a lot of dirty data on a slow HDD
          while true; do dd if=/dev/zero of=/mnt/sdb/zeroes bs=1M count=1024; done &
          ....
          while true; do dd if=/dev/zero of=/mnt/sdb/zeroes bs=1M count=1024; done &
      
      and some job in another cgroup:
      
          mkdir /sys/fs/cgroup/victim
          echo 128M > /sys/fs/cgroup/victim/memory.max
      
          # time cat /dev/sda > /dev/null
          real    10m15.054s
          user    0m0.487s
          sys     1m8.505s
      
      According to the tracepoint in wait_iff_congested(), the 'cat' spent 50%
      of the time sleeping there.
      
      With the patch, cat no longer wastes time:
      
          # time cat /dev/sda > /dev/null
          real    5m32.911s
          user    0m0.411s
          sys     0m56.664s
      
      [aryabinin@virtuozzo.com: congestion state should be per-node]
        Link: http://lkml.kernel.org/r/20180406135215.10057-1-aryabinin@virtuozzo.com
      [aryabinin@virtuozzo.com: make congestion state per-cgroup-per-node instead of just per-cgroup]
        Link: http://lkml.kernel.org/r/20180406180254.8970-2-aryabinin@virtuozzo.com
      Link: http://lkml.kernel.org/r/20180323152029.11084-5-aryabinin@virtuozzo.com
      
      
      Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  14. Apr 06, 2018
  15. Feb 28, 2018
  16. Dec 21, 2017
  17. Nov 19, 2017
  18. Oct 06, 2017
  19. Sep 11, 2017
  20. Apr 20, 2017
  21. Mar 23, 2017
    • bdi: Rename cgwb_bdi_destroy() to cgwb_bdi_unregister() · b1c51afc
      Jan Kara authored
      
      Rename cgwb_bdi_destroy() to cgwb_bdi_unregister() as it gets called
      from bdi_unregister() which is not necessarily called from bdi_destroy()
      and thus the name is somewhat misleading.
      
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • bdi: Do not wait for cgwbs release in bdi_unregister() · 4514451e
      Jan Kara authored
      
      Currently we wait for all cgwbs to get released in cgwb_bdi_destroy()
      (called from bdi_unregister()). That is, however, unnecessary now that
      cgwb->bdi is a proper refcounted reference (so the bdi cannot get
      released before all cgwbs are released) and now that cgwb_bdi_destroy()
      shuts down writeback directly.
      
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • bdi: Shutdown writeback on all cgwbs in cgwb_bdi_destroy() · 5318ce7d
      Jan Kara authored
      
      Currently we wait for all cgwbs to get freed in cgwb_bdi_destroy(),
      which also means that writeback has been shut down on them. Since this
      wait is going away, directly shut down writeback on cgwbs from
      cgwb_bdi_destroy() to avoid live writeback structures after
      bdi_unregister() has finished. To make that safe with concurrent
      shutdown from cgwb_release_workfn(), we also have to make sure
      wb_shutdown() returns only after the bdi_writeback structure is really
      shut down.
      
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • bdi: Unify bdi->wb_list handling for root wb_writeback · e8cb72b3
      Jan Kara authored
      
      Currently the root wb_writeback structure is added to bdi->wb_list in
      bdi_init() and never removed. That is different from all other
      wb_writeback structures, which get added to the list when created and
      removed from it before wb_shutdown().
      
      So move list addition of root bdi_writeback to bdi_register() and list
      removal of all wb_writeback structures to wb_shutdown(). That way a
      wb_writeback structure is on bdi->wb_list if and only if it can handle
      writeback and it will make it easier for us to handle shutdown of all
      wb_writeback structures in bdi_unregister().
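The resulting invariant ("on bdi->wb_list if and only if it can handle writeback") can be sketched in userspace with a minimal reimplementation of the kernel's intrusive list helpers. The structures carry only the fields the invariant needs, and cgwb_create() is a hypothetical stand-in for the point where a cgroup wb is created.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal userspace reimplementation of the kernel's intrusive list helpers. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	INIT_LIST_HEAD(n);
}

/* Hypothetical structures carrying only what the invariant needs. */
struct bdi_writeback {
	struct list_head bdi_node;
};

struct backing_dev_info {
	struct list_head wb_list;
	struct bdi_writeback wb; /* root wb, embedded in the bdi */
};

static void bdi_init(struct backing_dev_info *bdi)
{
	INIT_LIST_HEAD(&bdi->wb_list); /* the root wb is not added here */
	INIT_LIST_HEAD(&bdi->wb.bdi_node);
}

/* The root wb joins wb_list only once it can actually handle writeback. */
static void bdi_register(struct backing_dev_info *bdi)
{
	list_add_tail(&bdi->wb.bdi_node, &bdi->wb_list);
}

/* A cgroup wb joins wb_list when it is created. */
static void cgwb_create(struct backing_dev_info *bdi, struct bdi_writeback *wb)
{
	list_add_tail(&wb->bdi_node, &bdi->wb_list);
}

/* Every wb, root included, leaves wb_list in wb_shutdown(). */
static void wb_shutdown(struct bdi_writeback *wb)
{
	list_del_init(&wb->bdi_node);
}
```

With a single rule for every wb, unregistration can simply shut down whatever is still on wb_list until it is empty.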
      
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • bdi: Make wb->bdi a proper reference · 810df54a
      Jan Kara authored
      
      Make wb->bdi a proper refcounted reference to bdi for all bdi_writeback
      structures except for the one embedded inside struct backing_dev_info.
      That will allow us to simplify bdi unregistration.
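The "every wb except the embedded one" rule can be sketched as follows. This is a hypothetical userspace model with a plain int refcount; the rationale in the comment (avoiding a bdi that pins itself) is an inference, not a quote from the commit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; the kernel uses a kref, not a plain int. */
struct backing_dev_info;

struct bdi_writeback {
	struct backing_dev_info *bdi;
};

struct backing_dev_info {
	int refcnt;
	struct bdi_writeback wb; /* the embedded root wb */
};

static void bdi_get(struct backing_dev_info *bdi)
{
	bdi->refcnt++;
}

static void bdi_put(struct backing_dev_info *bdi)
{
	assert(bdi->refcnt > 0);
	bdi->refcnt--;
}

/*
 * Every wb except the one embedded in the bdi pins its bdi. If the embedded
 * wb did too, the bdi would hold a reference to itself and never be freed.
 */
static void wb_init(struct bdi_writeback *wb, struct backing_dev_info *bdi)
{
	wb->bdi = bdi;
	if (wb != &bdi->wb)
		bdi_get(bdi);
}

static void wb_exit(struct bdi_writeback *wb)
{
	if (wb != &wb->bdi->wb)
		bdi_put(wb->bdi);
}
```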
      
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • bdi: Mark congested->bdi as internal · b7d680d7
      Jan Kara authored
      
      The congested->bdi pointer is used only to be able to remove the
      congested structure from bdi->cgwb_congested_tree on structure release.
      Moreover, the pointer can become NULL when we unregister the bdi. Rename
      the field to __bdi and add a comment to make it more explicit that this
      is internal to the memcg writeback code and that people should not use
      the field, as such use will likely be race prone.
      
      We do not bother with converting congested->bdi to a proper refcounted
      reference. It would be slightly ugly to special-case bdi->wb.congested
      to avoid what is effectively a cyclic reference of the bdi to itself,
      and the reference gets cleared from bdi_unregister(), making it
      impossible to reference a freed bdi.
      
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  22. Mar 08, 2017
    • bdi: Fix use-after-free in wb_congested_put() · df23de55
      Jan Kara authored
      
      bdi_writeback_congested structures get created for each blkcg and bdi
      regardless of whether the bdi is registered or not. When they are
      created for an unregistered bdi and the request queue (and thus the bdi)
      is then destroyed while a blkg still holds a reference to the
      bdi_writeback_congested structure, this structure will be referencing a
      freed bdi, and the last wb_congested_put() will try to remove the
      structure from the already freed bdi.
      
      With commit 165a5e22 "block: Move bdi_unregister() to
      del_gendisk()", SCSI started to destroy bdis without calling
      bdi_unregister() first (previously it was calling bdi_unregister() even
      for unregistered bdis) and thus the code detaching
      bdi_writeback_congested in cgwb_bdi_destroy() was not triggered and we
      started hitting this use-after-free bug. It is enough to boot a KVM
      instance with virtio-scsi device to trigger this behavior.
      
      Fix the problem by detaching bdi_writeback_congested structures in
      bdi_exit() instead of bdi_unregister(). This is also more logical as
      they can get attached to the bdi regardless of whether it ever got
      registered or not.
      
      Fixes: 165a5e22
      Signed-off-by: Jan Kara <jack@suse.cz>
      Tested-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • block: Allow bdi re-registration · b6f8fec4
      Jan Kara authored
      
      SCSI can call device_add_disk() several times for one request queue when
      a device is unbound and bound, creating a new gendisk each time. This
      leads to the bdi being repeatedly registered and unregistered. This was
      not a big problem until commit 165a5e22 "block: Move bdi_unregister() to
      del_gendisk()", since the bdi was only registered repeatedly
      (bdi_register() handles repeated calls fine; we only ended up leaking a
      reference to the gendisk due to overwriting bdi->owner) but unregistered
      only in blk_cleanup_queue(), which didn't get called repeatedly. After
      165a5e22 we were doing correct bdi_register()/bdi_unregister() cycles;
      however, bdi_unregister() is not prepared for that. So make sure
      bdi_unregister() cleans up the bdi in such a way that it is prepared for
      a possible following bdi_register() call.
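A register/unregister cycle that neither leaks the owner reference nor leaves stale state can be sketched as a hypothetical userspace model. The gendisk refcount and the helper names are illustrative assumptions; the real bdi_register() and bdi_unregister() do considerably more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the owner reference; names are illustrative. */
struct gendisk {
	int refcnt;
};

struct backing_dev_info {
	struct gendisk *owner; /* pinned while the bdi is registered */
	bool registered;
};

static void bdi_register_owner(struct backing_dev_info *bdi, struct gendisk *disk)
{
	disk->refcnt++;    /* take a reference on the new owner */
	bdi->owner = disk; /* overwriting a still-set owner here would leak a ref */
	bdi->registered = true;
}

/* Leaves the bdi in a state where bdi_register_owner() can run again. */
static void bdi_unregister(struct backing_dev_info *bdi)
{
	if (bdi->owner) {
		bdi->owner->refcnt--;
		bdi->owner = NULL;
	}
	bdi->registered = false;
}
```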
      
      An easy way to provoke this behavior is to enable
      CONFIG_DEBUG_TEST_DRIVER_REMOVE and use scsi_debug driver to create a
      scsi disk which immediately hangs without this fix.
      
      Fixes: 165a5e22
      Signed-off-by: Jan Kara <jack@suse.cz>
      Tested-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  23. Feb 23, 2017
  24. Feb 08, 2017
    • block: fix double-free in the failure path of cgwb_bdi_init() · 5f478e4e
      Tejun Heo authored
      
      When !CONFIG_CGROUP_WRITEBACK, bdi has single bdi_writeback_congested
      at bdi->wb_congested.  cgwb_bdi_init() allocates it with kzalloc() and
      doesn't do further initialization.  This usually works fine as the
      reference count gets bumped to 1 by wb_init() and the put from
      wb_exit() releases it.
      
      However, when wb_init() fails, it puts the wb base ref, automatically
      freeing the wb, and the explicit kfree() in the cgwb_bdi_init() error
      path ends up trying to free the same pointer a second time, causing a
      double-free.
      
      Fix it by explicitly initializing the refcnt to 1 and putting the base
      ref from cgwb_bdi_destroy().
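The fixed ownership pattern can be sketched as a hypothetical userspace model that counts frees instead of crashing. The structure and helpers are simplified assumptions: wb_init() takes its own reference and drops it on failure, and the creator's error path only drops the base reference rather than freeing the object directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of the error path; counts frees instead of crashing. */
static int nr_frees;

struct congested {
	int refcnt;
};

static void congested_put(struct congested *c)
{
	assert(c->refcnt > 0);
	if (--c->refcnt == 0) {
		free(c);
		nr_frees++;
	}
}

/* Stand-in for wb_init(): on failure it drops the reference it took. */
static bool wb_init(struct congested *c, bool fail)
{
	c->refcnt++;
	if (fail) {
		congested_put(c);
		return false;
	}
	return true;
}

/* Fixed pattern: start at refcnt 1 (base ref) and only ever put, never free. */
static bool cgwb_bdi_init(bool fail_wb_init, struct congested **out)
{
	struct congested *c = calloc(1, sizeof(*c));

	if (!c)
		return false;
	c->refcnt = 1; /* the base reference owned by the bdi */
	if (!wb_init(c, fail_wb_init)) {
		congested_put(c); /* drop the base ref instead of free(c) */
		return false;
	}
	*out = c;
	return true;
}

/* Normal teardown: wb_exit() drops its ref, then the base ref is put. */
static void bdi_teardown(struct congested *c)
{
	congested_put(c);
	congested_put(c);
}
```

Had the object started at a refcount of 0, as with a bare kzalloc(), the failing wb_init() would free it and the explicit free in the error path would be the second free.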
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Fixes: a13f35e8 ("writeback: don't embed root bdi_writeback_congested in bdi_writeback")
      Cc: stable@vger.kernel.org # v4.2+
      Signed-off-by: Jens Axboe <axboe@fb.com>
  25. Feb 02, 2017
  26. Nov 08, 2016