Skip to content
Snippets Groups Projects
  1. Jan 24, 2019
  2. Jan 22, 2019
  3. Jan 21, 2019
    • Thomas Gleixner's avatar
      ceph: quota: cleanup license mess · 74827ee2
      Thomas Gleixner authored
      
      Precise and non-ambiguous license information is important. The recently
      added quota.c file has a SPDX license identifier, which is nice, but
      at the same time it has a contradictionary license boiler plate text.
      
        SPDX-License-Identifier: GPL-2.0
      
      versus
      
        * This program is free software; you can redistribute it and/or
        * modify it under the terms of the GNU General Public License
        * as published by the Free Software Foundation; either version 2
        * of the License, or (at your option) any later version.
      
      Oh well.
      
      As the other ceph related files are licensed under the GPL v2 only, it's
      assumed that the SPDX id is correct and the boiler plate was randomly
      copied into that patch.
      
      Remove the boiler plate as it is wrong and even if correct it is redundant.
      
      Fixes: fb18a575 ("ceph: quota: add initial infrastructure to support cephfs quotas")
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Luis Henriques <lhenriques@suse.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: "Yan, Zheng" <zyan@redhat.com>
      Cc: Sage Weil <sage@redhat.com>
      Cc: Ilya Dryomov <idryomov@gmail.com>
      Cc: ceph-devel@vger.kernel.org
      Acked-by: default avatarLuis Henriques <lhenriques@suse.com>
      Signed-off-by: default avatarIlya Dryomov <idryomov@gmail.com>
      74827ee2
    • Yan, Zheng's avatar
      ceph: clear inode pointer when snap realm gets dropped by its inode · d95e674c
      Yan, Zheng authored
      
      snap realm and corresponding inode have pointers to each other.
      The two pointer should get clear at the same time. Otherwise,
      snap realm's pointer may reference freed inode.
      
      Cc: stable@vger.kernel.org # 4.17+
      Signed-off-by: default avatar"Yan, Zheng" <zyan@redhat.com>
      Reviewed-by: default avatarLuis Henriques <lhenriques@suse.com>
      Signed-off-by: default avatarIlya Dryomov <idryomov@gmail.com>
      d95e674c
  4. Jan 20, 2019
  5. Jan 18, 2019
    • Josef Bacik's avatar
      btrfs: wakeup cleaner thread when adding delayed iput · fd340d0f
      Josef Bacik authored
      
      The cleaner thread usually takes care of delayed iputs, with the
      exception of the btrfs_end_transaction_throttle path.  Delaying iputs
      means we are potentially delaying the eviction of an inode and it's
      respective space.  The cleaner thread only gets woken up every 30
      seconds, or when we require space.  If there are a lot of inodes that
      need to be deleted we could induce a serious amount of latency while we
      wait for these inodes to be evicted.  So instead wakeup the cleaner if
      it's not already awake to process any new delayed iputs we add to the
      list.  If we suddenly need space we will less likely be backed up
      behind a bunch of inodes that are waiting to be deleted, and we could
      possibly free space before we need to get into the flushing logic which
      will save us some latency.
      
      Reviewed-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      fd340d0f
    • Josef Bacik's avatar
      btrfs: run delayed iputs before committing · 3ec9a4c8
      Josef Bacik authored
      
      Delayed iputs means we can have final iputs of deleted inodes in the
      queue, which could potentially generate a lot of pinned space that could
      be free'd.  So before we decide to commit the transaction for ENOPSC
      reasons, run the delayed iputs so that any potential space is free'd up.
      If there is and we freed enough we can then commit the transaction and
      potentially be able to make our reservation.
      
      Reviewed-by: default avatarOmar Sandoval <osandov@fb.com>
      Signed-off-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      3ec9a4c8
    • Josef Bacik's avatar
      btrfs: wait on ordered extents on abort cleanup · 74d5d229
      Josef Bacik authored
      
      If we flip read-only before we initiate writeback on all dirty pages for
      ordered extents we've created then we'll have ordered extents left over
      on umount, which results in all sorts of bad things happening.  Fix this
      by making sure we wait on ordered extents if we have to do the aborted
      transaction cleanup stuff.
      
      generic/475 can produce this warning:
      
       [ 8531.177332] WARNING: CPU: 2 PID: 11997 at fs/btrfs/disk-io.c:3856 btrfs_free_fs_root+0x95/0xa0 [btrfs]
       [ 8531.183282] CPU: 2 PID: 11997 Comm: umount Tainted: G        W 5.0.0-rc1-default+ #394
       [ 8531.185164] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),BIOS rel-1.11.2-0-gf9626cc-prebuilt.qemu-project.org 04/01/2014
       [ 8531.187851] RIP: 0010:btrfs_free_fs_root+0x95/0xa0 [btrfs]
       [ 8531.193082] RSP: 0018:ffffb1ab86163d98 EFLAGS: 00010286
       [ 8531.194198] RAX: ffff9f3449494d18 RBX: ffff9f34a2695000 RCX:0000000000000000
       [ 8531.195629] RDX: 0000000000000002 RSI: 0000000000000001 RDI:0000000000000000
       [ 8531.197315] RBP: ffff9f344e930000 R08: 0000000000000001 R09:0000000000000000
       [ 8531.199095] R10: 0000000000000000 R11: ffff9f34494d4ff8 R12:ffffb1ab86163dc0
       [ 8531.200870] R13: ffff9f344e9300b0 R14: ffffb1ab86163db8 R15:0000000000000000
       [ 8531.202707] FS:  00007fc68e949fc0(0000) GS:ffff9f34bd800000(0000)knlGS:0000000000000000
       [ 8531.204851] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       [ 8531.205942] CR2: 00007ffde8114dd8 CR3: 000000002dfbd000 CR4:00000000000006e0
       [ 8531.207516] Call Trace:
       [ 8531.208175]  btrfs_free_fs_roots+0xdb/0x170 [btrfs]
       [ 8531.210209]  ? wait_for_completion+0x5b/0x190
       [ 8531.211303]  close_ctree+0x157/0x350 [btrfs]
       [ 8531.212412]  generic_shutdown_super+0x64/0x100
       [ 8531.213485]  kill_anon_super+0x14/0x30
       [ 8531.214430]  btrfs_kill_super+0x12/0xa0 [btrfs]
       [ 8531.215539]  deactivate_locked_super+0x29/0x60
       [ 8531.216633]  cleanup_mnt+0x3b/0x70
       [ 8531.217497]  task_work_run+0x98/0xc0
       [ 8531.218397]  exit_to_usermode_loop+0x83/0x90
       [ 8531.219324]  do_syscall_64+0x15b/0x180
       [ 8531.220192]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
       [ 8531.221286] RIP: 0033:0x7fc68e5e4d07
       [ 8531.225621] RSP: 002b:00007ffde8116608 EFLAGS: 00000246 ORIG_RAX:00000000000000a6
       [ 8531.227512] RAX: 0000000000000000 RBX: 00005580c2175970 RCX:00007fc68e5e4d07
       [ 8531.229098] RDX: 0000000000000001 RSI: 0000000000000000 RDI:00005580c2175b80
       [ 8531.230730] RBP: 0000000000000000 R08: 00005580c2175ba0 R09:00007ffde8114e80
       [ 8531.232269] R10: 0000000000000000 R11: 0000000000000246 R12:00005580c2175b80
       [ 8531.233839] R13: 00007fc68eac61c4 R14: 00005580c2175a68 R15:0000000000000000
      
      Leaving a tree in the rb-tree:
      
      3853 void btrfs_free_fs_root(struct btrfs_root *root)
      3854 {
      3855         iput(root->ino_cache_inode);
      3856         WARN_ON(!RB_EMPTY_ROOT(&root->inode_tree));
      
      CC: stable@vger.kernel.org
      Reviewed-by: default avatarNikolay Borisov <nborisov@suse.com>
      Signed-off-by: default avatarJosef Bacik <josef@toxicpanda.com>
      [ add stacktrace ]
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      74d5d229
    • Josef Bacik's avatar
      btrfs: handle delayed ref head accounting cleanup in abort · 31890da0
      Josef Bacik authored
      
      We weren't doing any of the accounting cleanup when we aborted
      transactions.  Fix this by making cleanup_ref_head_accounting global and
      calling it from the abort code, this fixes the issue where our
      accounting was all wrong after the fs aborts.
      
      The test generic/475 on a 2G VM can trigger the problems eg.:
      
        [ 8502.136957] WARNING: CPU: 0 PID: 11064 at fs/btrfs/extent-tree.c:5986 btrfs_free_block_grou +ps+0x3dc/0x410 [btrfs]
        [ 8502.148372] CPU: 0 PID: 11064 Comm: umount Not tainted 5.0.0-rc1-default+ #394
        [ 8502.150807] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626 +cc-prebuilt.qemu-project.org 04/01/2014
        [ 8502.154317] RIP: 0010:btrfs_free_block_groups+0x3dc/0x410 [btrfs]
        [ 8502.160623] RSP: 0018:ffffb1ab84b93de8 EFLAGS: 00010206
        [ 8502.161906] RAX: 0000000001000000 RBX: ffff9f34b1756400 RCX: 0000000000000000
        [ 8502.163448] RDX: 0000000000000002 RSI: 0000000000000001 RDI: ffff9f34b1755400
        [ 8502.164906] RBP: ffff9f34b7e8c000 R08: 0000000000000001 R09: 0000000000000000
        [ 8502.166716] R10: 0000000000000000 R11: 0000000000000001 R12: ffff9f34b7e8c108
        [ 8502.168498] R13: ffff9f34b7e8c158 R14: 0000000000000000 R15: dead000000000100
        [ 8502.170296] FS:  00007fb1cf15ffc0(0000) GS:ffff9f34bd400000(0000) knlGS:0000000000000000
        [ 8502.172439] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [ 8502.173669] CR2: 00007fb1ced507b0 CR3: 000000002f7a6000 CR4: 00000000000006f0
        [ 8502.175094] Call Trace:
        [ 8502.175759]  close_ctree+0x17f/0x350 [btrfs]
        [ 8502.176721]  generic_shutdown_super+0x64/0x100
        [ 8502.177702]  kill_anon_super+0x14/0x30
        [ 8502.178607]  btrfs_kill_super+0x12/0xa0 [btrfs]
        [ 8502.179602]  deactivate_locked_super+0x29/0x60
        [ 8502.180595]  cleanup_mnt+0x3b/0x70
        [ 8502.181406]  task_work_run+0x98/0xc0
        [ 8502.182255]  exit_to_usermode_loop+0x83/0x90
        [ 8502.183113]  do_syscall_64+0x15b/0x180
        [ 8502.183919]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Corresponding to
      
        release_global_block_rsv() {
        ...
        WARN_ON(fs_info->delayed_refs_rsv.reserved > 0);
      
      CC: stable@vger.kernel.org
      Signed-off-by: default avatarJosef Bacik <josef@toxicpanda.com>
      [ add log dump ]
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      31890da0
    • David Sterba's avatar
      Revert "btrfs: balance dirty metadata pages in btrfs_finish_ordered_io" · 77b7aad1
      David Sterba authored
      This reverts commit e73e81b6.
      
      This patch causes a few problems:
      
      - adds latency to btrfs_finish_ordered_io
      - as btrfs_finish_ordered_io is used for free space cache, generating
        more work from btrfs_btree_balance_dirty_nodelay could end up in the
        same workque, effectively deadlocking
      
      12260 kworker/u96:16+btrfs-freespace-write D
      [<0>] balance_dirty_pages+0x6e6/0x7ad
      [<0>] balance_dirty_pages_ratelimited+0x6bb/0xa90
      [<0>] btrfs_finish_ordered_io+0x3da/0x770
      [<0>] normal_work_helper+0x1c5/0x5a0
      [<0>] process_one_work+0x1ee/0x5a0
      [<0>] worker_thread+0x46/0x3d0
      [<0>] kthread+0xf5/0x130
      [<0>] ret_from_fork+0x24/0x30
      [<0>] 0xffffffffffffffff
      
      Transaction commit will wait on the freespace cache:
      
      838 btrfs-transacti D
      [<0>] btrfs_start_ordered_extent+0x154/0x1e0
      [<0>] btrfs_wait_ordered_range+0xbd/0x110
      [<0>] __btrfs_wait_cache_io+0x49/0x1a0
      [<0>] btrfs_write_dirty_block_groups+0x10b/0x3b0
      [<0>] commit_cowonly_roots+0x215/0x2b0
      [<0>] btrfs_commit_transaction+0x37e/0x910
      [<0>] transaction_kthread+0x14d/0x180
      [<0>] kthread+0xf5/0x130
      [<0>] ret_from_fork+0x24/0x30
      [<0>] 0xffffffffffffffff
      
      And then writepages ends up waiting on transaction commit:
      
      9520 kworker/u96:13+flush-btrfs-1 D
      [<0>] wait_current_trans+0xac/0xe0
      [<0>] start_transaction+0x21b/0x4b0
      [<0>] cow_file_range_inline+0x10b/0x6b0
      [<0>] cow_file_range.isra.69+0x329/0x4a0
      [<0>] run_delalloc_range+0x105/0x3c0
      [<0>] writepage_delalloc+0x119/0x180
      [<0>] __extent_writepage+0x10c/0x390
      [<0>] extent_write_cache_pages+0x26f/0x3d0
      [<0>] extent_writepages+0x4f/0x80
      [<0>] do_writepages+0x17/0x60
      [<0>] __writeback_single_inode+0x59/0x690
      [<0>] writeback_sb_inodes+0x291/0x4e0
      [<0>] __writeback_inodes_wb+0x87/0xb0
      [<0>] wb_writeback+0x3bb/0x500
      [<0>] wb_workfn+0x40d/0x610
      [<0>] process_one_work+0x1ee/0x5a0
      [<0>] worker_thread+0x1e0/0x3d0
      [<0>] kthread+0xf5/0x130
      [<0>] ret_from_fork+0x24/0x30
      [<0>] 0xffffffffffffffff
      
      Eventually, we have every process in the system waiting on
      balance_dirty_pages(), and nobody is able to make progress on page
      writeback.
      
      The original patch tried to fix an OOM condition, that happened on 4.4 but no
      success reproducing that on later kernels (4.19 and 4.20). This is more likely
      a problem in OOM itself.
      
      Link: https://lore.kernel.org/linux-btrfs/20180528054821.9092-1-ethanlien@synology.com/
      
      
      Reported-by: default avatarChris Mason <clm@fb.com>
      CC: stable@vger.kernel.org # 4.18+
      CC: ethanlien <ethanlien@synology.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      77b7aad1
  6. Jan 17, 2019
    • Sai Prakash Ranjan's avatar
      pstore/ram: Fix console ramoops to show the previous boot logs · 6a4c9ab1
      Sai Prakash Ranjan authored
      
      commit b05c9506 ("pstore/ram: Simplify ramoops_get_next_prz()
      arguments") changed update assignment in getting next persistent ram zone
      by adding a check for record type. But the check always returns true since
      the record type is assigned 0. And this breaks console ramoops by showing
      current console log instead of previous log on warm reset and hard reset
      (actually hard reset should not be showing any logs).
      
      Fix this by having persistent ram zone type check instead of record type
      check. Tested this on SDM845 MTP and dragonboard 410c.
      
      Reproducing this issue is simple as below:
      
      1. Trigger hard reset and mount pstore. Will see console-ramoops
         record in the mounted location which is the current log.
      
      2. Trigger warm reset and mount pstore. Will see the current
         console-ramoops record instead of previous record.
      
      Fixes: b05c9506 ("pstore/ram: Simplify ramoops_get_next_prz() arguments")
      Signed-off-by: default avatarSai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
      Acked-by: default avatarJoel Fernandes (Google) <joel@joelfernandes.org>
      [kees: dropped local variable usage]
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      6a4c9ab1
    • David Howells's avatar
      afs: Fix race in async call refcounting · 34fa4761
      David Howells authored
      
      There's a race between afs_make_call() and afs_wake_up_async_call() in the
      case that an error is returned from rxrpc_kernel_send_data() after it has
      queued the final packet.
      
      afs_make_call() will try and clean up the mess, but the call state may have
      been moved on thereby causing afs_process_async_call() to also try and to
      delete the call.
      
      Fix this by:
      
       (1) Getting an extra ref for an asynchronous call for the call itself to
           hold.  This makes sure the call doesn't evaporate on us accidentally
           and will allow the call to be retained by the caller in a future
           patch.  The ref is released on leaving afs_make_call() or
           afs_wait_for_call_to_complete().
      
       (2) In the event of an error from rxrpc_kernel_send_data():
      
           (a) Don't set the call state to AFS_CALL_COMPLETE until *after* the
           	 call has been aborted and ended.  This prevents
           	 afs_deliver_to_call() from doing anything with any notifications
           	 it gets.
      
           (b) Explicitly end the call immediately to prevent further callbacks.
      
           (c) Cancel any queued async_work and wait for the work if it's
           	 executing.  This allows us to be sure the race won't recur when we
           	 change the state.  We put the work queue's ref on the call if we
           	 managed to cancel it.
      
           (d) Put the call's ref that we got in (1).  This belongs to us as long
           	 as the call is in state AFS_CALL_CL_REQUESTING.
      
      Fixes: 341f741f ("afs: Refcount the afs_call struct")
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      34fa4761
    • David Howells's avatar
      afs: Provide a function to get a ref on a call · 7a75b007
      David Howells authored
      
      Provide a function to get a reference on an afs_call struct.
      
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      7a75b007
    • David Howells's avatar
      afs: Fix key refcounting in file locking code · 59d49076
      David Howells authored
      
      Fix the refcounting of the authentication keys in the file locking code.
      The vnode->lock_key member points to a key on which it expects to be
      holding a ref, but it isn't always given an extra ref, however.
      
      Fixes: 0fafdc9f ("afs: Fix file locking")
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      59d49076
    • Marc Dionne's avatar
      afs: Don't set vnode->cb_s_break in afs_validate() · 4882a27c
      Marc Dionne authored
      
      A cb_interest record is not necessarily attached to the vnode on entry to
      afs_validate(), which can cause an oops when we try to bring the vnode's
      cb_s_break up to date in the default case (ie. no current callback promise
      and the vnode has not been deleted).
      
      Fix this by simply removing the line, as vnode->cb_s_break will be set when
      needed by afs_register_server_cb_interest() when we next get a callback
      promise from RPC call.
      
      The oops looks something like:
      
          BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
          ...
          RIP: 0010:afs_validate+0x66/0x250 [kafs]
          ...
          Call Trace:
           afs_d_revalidate+0x8d/0x340 [kafs]
           ? __d_lookup+0x61/0x150
           lookup_dcache+0x44/0x70
           ? lookup_dcache+0x44/0x70
           __lookup_hash+0x24/0xa0
           do_unlinkat+0x11d/0x2c0
           __x64_sys_unlink+0x23/0x30
           do_syscall_64+0x4d/0xf0
           entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: ae3b7361 ("afs: Fix validation/callback interaction")
      Signed-off-by: default avatarMarc Dionne <marc.dionne@auristor.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      4882a27c
  7. Jan 15, 2019
    • Olga Kornievskaia's avatar
      NFSv4.2 fix unnecessary retry in nfs4_copy_file_range · 45ac486e
      Olga Kornievskaia authored
      
      Currently nfs42_proc_copy_file_range() can not return EAGAIN.
      
      Fixes: e4648aa4 ("NFS recover from destination server reboot for copies")
      Signed-off-by: default avatarOlga Kornievskaia <kolga@netapp.com>
      Signed-off-by: default avatarAnna Schumaker <Anna.Schumaker@Netapp.com>
      45ac486e
    • Jan Kara's avatar
      blockdev: Fix livelocks on loop device · 04906b2f
      Jan Kara authored
      
      bd_set_size() updates also block device's block size. This is somewhat
      unexpected from its name and at this point, only blkdev_open() uses this
      functionality. Furthermore, this can result in changing block size under
      a filesystem mounted on a loop device which leads to livelocks inside
      __getblk_gfp() like:
      
      Sending NMI from CPU 0 to CPUs 1:
      NMI backtrace for cpu 1
      CPU: 1 PID: 10863 Comm: syz-executor0 Not tainted 4.18.0-rc5+ #151
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google
      01/01/2011
      RIP: 0010:__sanitizer_cov_trace_pc+0x3f/0x50 kernel/kcov.c:106
      ...
      Call Trace:
       init_page_buffers+0x3e2/0x530 fs/buffer.c:904
       grow_dev_page fs/buffer.c:947 [inline]
       grow_buffers fs/buffer.c:1009 [inline]
       __getblk_slow fs/buffer.c:1036 [inline]
       __getblk_gfp+0x906/0xb10 fs/buffer.c:1313
       __bread_gfp+0x2d/0x310 fs/buffer.c:1347
       sb_bread include/linux/buffer_head.h:307 [inline]
       fat12_ent_bread+0x14e/0x3d0 fs/fat/fatent.c:75
       fat_ent_read_block fs/fat/fatent.c:441 [inline]
       fat_alloc_clusters+0x8ce/0x16e0 fs/fat/fatent.c:489
       fat_add_cluster+0x7a/0x150 fs/fat/inode.c:101
       __fat_get_block fs/fat/inode.c:148 [inline]
      ...
      
      Trivial reproducer for the problem looks like:
      
      truncate -s 1G /tmp/image
      losetup /dev/loop0 /tmp/image
      mkfs.ext4 -b 1024 /dev/loop0
      mount -t ext4 /dev/loop0 /mnt
      losetup -c /dev/loop0
      l /mnt
      
      Fix the problem by moving initialization of a block device block size
      into a separate function and call it when needed.
      
      Thanks to Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> for help with
      debugging the problem.
      
      Reported-by: default avatar <syzbot+9933e4476f365f5d5a1b@syzkaller.appspotmail.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      04906b2f
  8. Jan 11, 2019
  9. Jan 10, 2019
Loading