    hrtimer: Don't reinitialize a cpu_base lock on CPU_UP · 84cc8fd2
    Michael Bohan authored
    The current code makes the assumption that a cpu_base lock won't be
    held if the CPU corresponding to that cpu_base is offline, which isn't
    always true.
    
    migrate_hrtimers() only moves the timers that are queued on a CPU when
    that CPU is offlined. A hrtimer that is not enqueued at that moment is
    left untouched, so its cpu_base may still point to a CPU that has
    since gone offline.
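
    Below is a minimal userspace sketch of the migration rule described
    above. It assumes a simplified model where each timer records only
    the CPU its cpu_base belongs to and whether it is queued. The names
    model_timer, model_migrate_timers() and model_count_stale() are
    hypothetical and are not kernel APIs. The real migrate_hrtimers()
    operates on the per-CPU clock bases, not on a flat array.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a timer: which CPU's cpu_base it points at,
 * and whether it is currently enqueued on that base. */
struct model_timer {
    int base_cpu;
    bool queued;
};

/* Models the rule above: when dead_cpu goes offline, only timers that
 * are queued on it are moved to new_cpu. Unqueued timers keep their
 * (now stale) base. Returns the number of timers moved. */
static size_t model_migrate_timers(struct model_timer *t, size_t n,
                                   int dead_cpu, int new_cpu)
{
    size_t moved = 0;

    for (size_t i = 0; i < n; i++) {
        if (t[i].base_cpu == dead_cpu && t[i].queued) {
            t[i].base_cpu = new_cpu;
            moved++;
        }
    }
    return moved;
}

/* Counts timers that still reference dead_cpu's base. */
static size_t model_count_stale(const struct model_timer *t, size_t n,
                                int dead_cpu)
{
    size_t stale = 0;

    for (size_t i = 0; i < n; i++)
        if (t[i].base_cpu == dead_cpu)
            stale++;
    return stale;
}
```

    In this model, CPU #1 might hold one queued and one unqueued timer.
    Taking CPU #1 down moves only the queued timer. The unqueued timer
    still references CPU #1's base, which is the stale pointer that the
    rest of this message is about.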
    
    Normally this wouldn't be a problem, but a cpu_base's lock is blindly
    reinitialized each time a CPU is brought up. If a CPU is brought
    online while another thread is performing a hrtimer operation on a
    stale hrtimer, the lock is reinitialized under that thread's feet,
    and a SPIN_BUG() like the following is observed:
    
    <0>[   28.082085] BUG: spinlock already unlocked on CPU#0, swapper/0/0
    <0>[   28.087078]  lock: 0xc4780b40, value 0x0 .magic: dead4ead, .owner: <none>/-1, .owner_cpu: -1
    <4>[   42.451150] [<c0014398>] (unwind_backtrace+0x0/0x120) from [<c0269220>] (do_raw_spin_unlock+0x44/0xdc)
    <4>[   42.460430] [<c0269220>] (do_raw_spin_unlock+0x44/0xdc) from [<c071b5bc>] (_raw_spin_unlock+0x8/0x30)
    <4>[   42.469632] [<c071b5bc>] (_raw_spin_unlock+0x8/0x30) from [<c00a9ce0>] (__hrtimer_start_range_ns+0x1e4/0x4f8)
    <4>[   42.479521] [<c00a9ce0>] (__hrtimer_start_range_ns+0x1e4/0x4f8) from [<c00aa014>] (hrtimer_start+0x20/0x28)
    <4>[   42.489247] [<c00aa014>] (hrtimer_start+0x20/0x28) from [<c00e6190>] (rcu_idle_enter_common+0x1ac/0x320)
    <4>[   42.498709] [<c00e6190>] (rcu_idle_enter_common+0x1ac/0x320) from [<c00e6440>] (rcu_idle_enter+0xa0/0xb8)
    <4>[   42.508259] [<c00e6440>] (rcu_idle_enter+0xa0/0xb8) from [<c000f268>] (cpu_idle+0x24/0xf0)
    <4>[   42.516503] [<c000f268>] (cpu_idle+0x24/0xf0) from [<c06ed3c0>] (rest_init+0x88/0xa0)
    <4>[   42.524319] [<c06ed3c0>] (rest_init+0x88/0xa0) from [<c0c00978>] (start_kernel+0x3d0/0x434)
    
    As an example, this particular crash occurred when hrtimer_start() was
    executed on CPU #0. The code locked the hrtimer's current cpu_base,
    which corresponded to CPU #1. CPU #0 then tried to switch the
    hrtimer's cpu_base to an optimal CPU that was online. In this case, it
    selected the cpu_base corresponding to CPU #3.
    
    Before it could proceed, CPU #1 came online and reinitialized the
    spinlock of its cpu_base, so CPU #0 now held a lock that had been
    reinitialized. When CPU #0 finally unlocked the old cpu_base of CPU
    #1 in order to switch to CPU #3, it hit the SPIN_BUG() shown above
    in switch_hrtimer_base().
    
    CPU #0                            CPU #1
    ----                              ----
    ...                               <offline>
    hrtimer_start()
    lock_hrtimer_base(base #1)
    ...                               init_hrtimers_cpu()
    switch_hrtimer_base()             ...
    ...                               raw_spin_lock_init(&cpu_base->lock)
    raw_spin_unlock(&cpu_base->lock)  ...
    <spin_bug>
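
    The interleaving in the diagram can be replayed deterministically in
    userspace. The sketch below is a hypothetical model, not kernel code.
    struct model_lock stands in for a debug raw spinlock and checks only
    a magic value, a locked flag and the owning CPU. replay_hrtimer_race()
    performs CPU #0's lock and unlock of CPU #1's base, with CPU #1's
    reinitialization optionally placed in between. Passing false models
    a lock that is initialized once, statically, and is not touched when
    CPU #1 comes up.

```c
#include <stdbool.h>
#include <stdio.h>

/* Magic value seen in the log above; the model reuses it. */
#define MODEL_SPINLOCK_MAGIC 0xdead4eadU

/* Hypothetical stand-in for a debug raw spinlock. */
struct model_lock {
    unsigned int magic;
    bool locked;
    int owner_cpu;                  /* -1 when unowned */
};

/* Static initializer: the lock is valid from definition time. */
#define MODEL_LOCK_UNLOCKED { MODEL_SPINLOCK_MAGIC, false, -1 }

/* Runtime (re)initialization, as done on every CPU up before the fix. */
static void model_spin_lock_init(struct model_lock *l)
{
    l->magic = MODEL_SPINLOCK_MAGIC;
    l->locked = false;
    l->owner_cpu = -1;
}

/* Single-threaded replay, so no contention handling is needed. */
static void model_spin_lock(struct model_lock *l, int cpu)
{
    l->locked = true;
    l->owner_cpu = cpu;
}

/* Mimics the debug checks. Returns false and reports a bug if the
 * lock is not validly held by this CPU. */
static bool model_spin_unlock(struct model_lock *l, int cpu)
{
    if (l->magic != MODEL_SPINLOCK_MAGIC) {
        fprintf(stderr, "BUG: spinlock bad magic on CPU#%d\n", cpu);
        return false;
    }
    if (!l->locked) {
        fprintf(stderr, "BUG: spinlock already unlocked on CPU#%d\n", cpu);
        return false;
    }
    if (l->owner_cpu != cpu) {
        fprintf(stderr, "BUG: spinlock wrong CPU on CPU#%d\n", cpu);
        return false;
    }
    l->locked = false;
    l->owner_cpu = -1;
    return true;
}

/* Replays the diagram. Returns true if CPU #0's unlock succeeds. */
static bool replay_hrtimer_race(bool cpu_up_reinits_lock)
{
    struct model_lock base1 = MODEL_LOCK_UNLOCKED;  /* CPU #1's base */

    model_spin_lock(&base1, 0);        /* CPU #0: lock_hrtimer_base()   */
    if (cpu_up_reinits_lock)
        model_spin_lock_init(&base1);  /* CPU #1: init_hrtimers_cpu()  */
    return model_spin_unlock(&base1, 0); /* CPU #0: switch_hrtimer_base() */
}
```

    With the reinitialization in place, the replay reports "already
    unlocked", matching the log above. Without it, CPU #0's unlock
    succeeds.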
    
    Solve this by statically initializing the lock, so that bringing a CPU
    online no longer reinitializes a lock that another CPU may be holding.

    Signed-off-by: Michael Bohan <mbohan@codeaurora.org>
    Link: http://lkml.kernel.org/r/1363745965-23475-1-git-send-email-mbohan@codeaurora.org
    Cc: stable@vger.kernel.org
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>