1. 10 Dec, 2017 1 commit
  2. 07 Dec, 2017 1 commit
  3. 06 Dec, 2017 5 commits
    • Nikolay Borisov's avatar
      btrfs: Fix possible off-by-one in btrfs_search_path_in_tree · c8bcbfbd
      Nikolay Borisov authored
      The name char array passed to btrfs_search_path_in_tree is of size
      BTRFS_INO_LOOKUP_PATH_MAX (4080). So the actual accessible char indexes
      are in the range of [0, 4079]. Currently the code uses the define but this
      represents an off-by-one.
      
      Implications:
      
      Size of btrfs_ioctl_ino_lookup_args is 4096, so the new byte will be
      written to extra space, not some padding that could be provided by the
      allocator.
      
      btrfs-progs store the arguments on stack, but kernel does own copy of
      the ioctl buffer and the off-by-one overwrite does not affect userspace,
      but the ending 0 might be lost.
      
      Kernel ioctl buffer is allocated dynamically so we're overwriting
      somebody else's memory, and the ioctl is privileged if args.objectid is
      not 256. Which is in most cases, but resolving a subvolume stored in
      another directory will trigger that path.
      
      Before this patch the buffer was one byte larger, but then the -1 was
      not added.
      
      Fixes: ac8e9819 ("Btrfs: add search and inode lookup ioctls")
      Signed-off-by: 's avatarNikolay Borisov <nborisov@suse.com>
      Reviewed-by: 's avatarDavid Sterba <dsterba@suse.com>
      [ added implications ]
      Signed-off-by: 's avatarDavid Sterba <dsterba@suse.com>
      c8bcbfbd
    • Omar Sandoval's avatar
      Btrfs: disable FUA if mounted with nobarrier · 1b9e619c
      Omar Sandoval authored
      I was seeing disk flushes still happening when I mounted a Btrfs
      filesystem with nobarrier for testing. This is because we use FUA to
      write out the first super block, and on devices without FUA support, the
      block layer translates FUA to a flush. Even on devices supporting true
      FUA, using FUA when we asked for no barriers is surprising.
      
      Fixes: 387125fc ("Btrfs: fix barrier flushes")
      Signed-off-by: 's avatarOmar Sandoval <osandov@fb.com>
      Reviewed-by: 's avatarQu Wenruo <wqu@suse.com>
      Reviewed-by: 's avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: 's avatarDavid Sterba <dsterba@suse.com>
      1b9e619c
    • Jeff Mahoney's avatar
      btrfs: fix missing error return in btrfs_drop_snapshot · e19182c0
      Jeff Mahoney authored
      If btrfs_del_root fails in btrfs_drop_snapshot, we'll pick up the
      error but then return 0 anyway due to mixing err and ret.
      
      Fixes: 79787eaa ("btrfs: replace many BUG_ONs with proper error handling")
      Cc: <stable@vger.kernel.org> # v3.4+
      Signed-off-by: 's avatarJeff Mahoney <jeffm@suse.com>
      Reviewed-by: 's avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: 's avatarDavid Sterba <dsterba@suse.com>
      e19182c0
    • Jeff Mahoney's avatar
      btrfs: handle errors while updating refcounts in update_ref_for_cow · 692826b2
      Jeff Mahoney authored
      Since commit fb235dc0 (btrfs: qgroup: Move half of the qgroup
      accounting time out of commit trans) the assumption that
      btrfs_add_delayed_{data,tree}_ref can only return 0 or -ENOMEM has
      been false.  The qgroup operations call into btrfs_search_slot
      and friends and can now return the full spectrum of error codes.
      
      Fortunately, the fix here is easy since update_ref_for_cow failing
      is already handled so we just need to bail early with the error
      code.
      
      Fixes: fb235dc0 (btrfs: qgroup: Move half of the qgroup accounting ...)
      Cc: <stable@vger.kernel.org> # v4.11+
      Signed-off-by: 's avatarJeff Mahoney <jeffm@suse.com>
      Reviewed-by: 's avatarEdmund Nadolski <enadolski@suse.com>
      Reviewed-by: 's avatarQu Wenruo <wqu@suse.com>
      Signed-off-by: 's avatarDavid Sterba <dsterba@suse.com>
      692826b2
    • Justin Maggard's avatar
      btrfs: Fix quota reservation leak on preallocated files · b430b775
      Justin Maggard authored
      Commit c6887cd1 ("Btrfs: don't do nocow check unless we have to")
      changed the behavior of __btrfs_buffered_write() so that it first tries
      to get a data space reservation, and then skips the relatively expensive
      nocow check if the reservation succeeded.
      
      If we have quotas enabled, the data space reservation also includes a
      quota reservation.  But in the rewrite case, the space has already been
      accounted for in qgroups.  So btrfs_check_data_free_space() increases
      the quota reservation, but it never gets decreased when the data
      actually gets written and overwrites the pre-existing data.  So we're
      left with both the qgroup and qgroup reservation accounting for the same
      space.
      
      This commit adds the missing btrfs_qgroup_free_data() call in the case
      of BTRFS_ORDERED_PREALLOC extents.
      
      Fixes: c6887cd1 ("Btrfs: don't do nocow check unless we have to")
      Signed-off-by: 's avatarJustin Maggard <jmaggard@netgear.com>
      Reviewed-by: 's avatarQu Wenruo <wqu@suse.com>
      Signed-off-by: 's avatarDavid Sterba <dsterba@suse.com>
      b430b775
  4. 01 Dec, 2017 2 commits
    • David Howells's avatar
      afs: Properly reset afs_vnode (inode) fields · f8de483e
      David Howells authored
      When an AFS inode is allocated by afs_alloc_inode(), the allocated
      afs_vnode struct isn't necessarily reset from the last time it was used as
      an inode because the slab constructor is only invoked once when the memory
      is obtained from the page allocator.
      
      This means that information can leak from one inode to the next because
      we're not calling kmem_cache_zalloc().  Some of the information isn't
      reset, in particular the permit cache pointer.
      
      Bring the clearances up to date.
      Signed-off-by: 's avatarDavid Howells <dhowells@redhat.com>
      Tested-by: 's avatarMarc Dionne <marc.dionne@auristor.com>
      f8de483e
    • David Howells's avatar
      afs: Fix permit refcounting · 1bcab125
      David Howells authored
      Fix four refcount bugs in afs_cache_permit():
      
       (1) When checking the result of the kzalloc(), we can't just return, but
           must put 'permits'.
      
       (2) We shouldn't put permits immediately after hashing a new permit as we
           need to keep the pointer stable so that we can check to see if
           vnode->permit_cache has changed before we decide whether to assign to
           it.
      
       (3) 'permits' is being put twice.
      
       (4) We need to put either the replacement or the thing replaced after the
           assignment to vnode->permit_cache.
      
      Without this, lots of the following are seen:
      
        Kernel BUG at ffffffffa039857b [verbose debug info unavailable]
        ------------[ cut here ]------------
        Kernel BUG at ffffffffa039858a [verbose debug info unavailable]
        ------------[ cut here ]------------
      
      The addresses are in the .text..refcount section of the kafs.ko module.
      Following the relocation records for the __ex_table section shows one to be
      due to the decrement in afs_put_permits() and the other to be key_get() in
      afs_cache_permit().
      
      Occasionally, the following is seen:
      
        refcount_t overflow at afs_cache_permit+0x57d/0x5c0 [kafs] in cc1[562], uid/euid: 0/0
        WARNING: CPU: 0 PID: 562 at kernel/panic.c:657 refcount_error_report+0x9c/0xac
        ...
      Reported-by: 's avatarMarc Dionne <marc.dionne@auristor.com>
      Signed-off-by: 's avatarDavid Howells <dhowells@redhat.com>
      Tested-by: 's avatarMarc Dionne <marc.dionne@auristor.com>
      1bcab125
  5. 30 Nov, 2017 11 commits
  6. 29 Nov, 2017 2 commits
  7. 28 Nov, 2017 8 commits
    • Eric Sandeen's avatar
      xfs: calculate correct offset in xfs_scrub_quota_item · 712d361d
      Eric Sandeen authored
      It's only used for tracepoints so it's relatively harmless,
      but the offset is calculated incorrectly in xfs_scrub_quota_item.
      
      qi_dqperchunk is the nr. of dquots per "chunk" which we have
      conveniently *cough* defined to always be 1 FSB.  Therefore
      block_offset * qi_dqperchunk == first id in that chunk,
      and so offset = id / qi_dqperchunk
      
      id * dqperchunk is ... meaningless.
      
      Fixes-coverity-id: 1423965
      Fixes: c2fc338c ("xfs: scrub quota information")
      Signed-off-by: 's avatarEric Sandeen <sandeen@redhat.com>
      Reviewed-by: 's avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: 's avatarDarrick J. Wong <darrick.wong@oracle.com>
      712d361d
    • Eric Sandeen's avatar
      xfs: fix uninitialized variable in xfs_scrub_quota · eda6bc27
      Eric Sandeen authored
      On the first pass through the while(1) loop, we get to
      xfs_scrub_should_terminate() which can test the uninitialized
      error variable.
      
      Fixes-coverity-id: 1423737
      Fixes: c2fc338c ("xfs: scrub quota information")
      Signed-off-by: 's avatarEric Sandeen <sandeen@redhat.com>
      Reviewed-by: 's avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: 's avatarDarrick J. Wong <darrick.wong@oracle.com>
      eda6bc27
    • Eric Sandeen's avatar
      xfs: fix leaks on corruption errors in xfs_bmap.c · d41c6172
      Eric Sandeen authored
      Use _GOTO instead of _RETURN so we can free the allocated
      cursor on error.
      
      Fixes: bf806280 ("xfs: remove xfs_bmse_shift_one")
      Fixes-coverity-id: 1423813, 1423676
      Signed-off-by: 's avatarEric Sandeen <sandeen@redhat.com>
      Reviewed-by: 's avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: 's avatarDarrick J. Wong <darrick.wong@oracle.com>
      d41c6172
    • Michal Hocko's avatar
      xfs: fortify xfs_alloc_buftarg error handling · d210a987
      Michal Hocko authored
      percpu_counter_init failure path doesn't clean up &btp->bt_lru list.
      Call list_lru_destroy in that error path. Similarly register_shrinker
      error path is not handled.
      
      While it is unlikely to trigger these error path, it is not impossible
      especially the later might fail with large NUMAs.  Let's handle the
      failure to make the code more robust.
      Noticed-by: 's avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: 's avatarMichal Hocko <mhocko@suse.com>
      Acked-by: 's avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: 's avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: 's avatarDarrick J. Wong <darrick.wong@oracle.com>
      d210a987
    • Filipe Manana's avatar
      Btrfs: incremental send, fix wrong unlink path after renaming file · ea37d599
      Filipe Manana authored
      Under some circumstances, an incremental send operation can issue wrong
      paths for unlink commands related to files that have multiple hard links
      and some (or all) of those links were renamed between the parent and send
      snapshots. Consider the following example:
      
      Parent snapshot
      
       .                                                      (ino 256)
       |---- a/                                               (ino 257)
       |     |---- b/                                         (ino 259)
       |     |     |---- c/                                   (ino 260)
       |     |     |---- f2                                   (ino 261)
       |     |
       |     |---- f2l1                                       (ino 261)
       |
       |---- d/                                               (ino 262)
             |---- f1l1_2                                     (ino 258)
             |---- f2l2                                       (ino 261)
             |---- f1_2                                       (ino 258)
      
      Send snapshot
      
       .                                                      (ino 256)
       |---- a/                                               (ino 257)
       |     |---- f2l1/                                      (ino 263)
       |             |---- b2/                                (ino 259)
       |                   |---- c/                           (ino 260)
       |                   |     |---- d3                     (ino 262)
       |                   |           |---- f1l1_2           (ino 258)
       |                   |           |---- f2l2_2           (ino 261)
       |                   |           |---- f1_2             (ino 258)
       |                   |
       |                   |---- f2                           (ino 261)
       |                   |---- f1l2                         (ino 258)
       |
       |---- d                                                (ino 261)
      
      When computing the incremental send stream the following steps happen:
      
      1) When processing inode 261, a rename operation is issued that renames
         inode 262, which currently as a path of "d", to an orphan name of
         "o262-7-0". This is done because in the send snapshot, inode 261 has
         of its hard links with a path of "d" as well.
      
      2) Two link operations are issued that create the new hard links for
         inode 261, whose names are "d" and "f2l2_2", at paths "/" and
         "o262-7-0/" respectively.
      
      3) Still while processing inode 261, unlink operations are issued to
         remove the old hard links of inode 261, with names "f2l1" and "f2l2",
         at paths "a/" and "d/". However path "d/" does not correspond anymore
         to the directory inode 262 but corresponds instead to a hard link of
         inode 261 (link command issued in the previous step). This makes the
         receiver fail with a ENOTDIR error when attempting the unlink
         operation.
      
      The problem happens because before sending the unlink operation, we failed
      to detect that inode 262 was one of ancestors for inode 261 in the parent
      snapshot, and therefore we didn't recompute the path for inode 262 before
      issuing the unlink operation for the link named "f2l2" of inode 262. The
      detection failed because the function "is_ancestor()" only follows the
      first hard link it finds for an inode instead of all of its hard links
      (as it was originally created for being used with directories only, for
      which only one hard link exists). So fix this by making "is_ancestor()"
      follow all hard links of the input inode.
      
      A test case for fstests follows soon.
      Signed-off-by: 's avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: 's avatarDavid Sterba <dsterba@suse.com>
      ea37d599
    • Chao Yu's avatar
      quota: propagate error from __dquot_initialize · 1a6152d3
      Chao Yu authored
      In commit 6184fc0b ("quota: Propagate error from ->acquire_dquot()"),
      we have propagated error from __dquot_initialize to caller, but we forgot
      to handle such error in add_dquot_ref(), so, currently, during quota
      accounting information initialization flow, if we failed for some of
      inodes, we just ignore such error, and do account for others, which is
      not a good implementation.
      
      In this patch, we choose to let user be aware of such error, so after
      turning on quota successfully, we can make sure all inodes disk usage
      can be accounted, which will be more reasonable.
      Suggested-by: 's avatarJan Kara <jack@suse.cz>
      Signed-off-by: 's avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: 's avatarJan Kara <jack@suse.cz>
      1a6152d3
    • Qu Wenruo's avatar
      btrfs: tree-checker: Fix false panic for sanity test · 69fc6cbb
      Qu Wenruo authored
      [BUG]
      If we run btrfs with CONFIG_BTRFS_FS_RUN_SANITY_TESTS=y, it will
      instantly cause kernel panic like:
      
      ------
      ...
      assertion failed: 0, file: fs/btrfs/disk-io.c, line: 3853
      ...
      Call Trace:
       btrfs_mark_buffer_dirty+0x187/0x1f0 [btrfs]
       setup_items_for_insert+0x385/0x650 [btrfs]
       __btrfs_drop_extents+0x129a/0x1870 [btrfs]
      ...
      -----
      
      [Cause]
      Btrfs will call btrfs_check_leaf() in btrfs_mark_buffer_dirty() to check
      if the leaf is valid with CONFIG_BTRFS_FS_RUN_SANITY_TESTS=y.
      
      However quite some btrfs_mark_buffer_dirty() callers(*) don't really
      initialize its item data but only initialize its item pointers, leaving
      item data uninitialized.
      
      This makes tree-checker catch uninitialized data as error, causing
      such panic.
      
      *: These callers include but not limited to
      setup_items_for_insert()
      btrfs_split_item()
      btrfs_expand_item()
      
      [Fix]
      Add a new parameter @check_item_data to btrfs_check_leaf().
      With @check_item_data set to false, item data check will be skipped and
      fallback to old btrfs_check_leaf() behavior.
      
      So we can still get early warning if we screw up item pointers, and
      avoid false panic.
      
      Cc: Filipe Manana <fdmanana@gmail.com>
      Reported-by: 's avatarLakshmipathi.G <lakshmipathi.g@gmail.com>
      Signed-off-by: 's avatarQu Wenruo <wqu@suse.com>
      Reviewed-by: 's avatarLiu Bo <bo.li.liu@oracle.com>
      Reviewed-by: 's avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: 's avatarDavid Sterba <dsterba@suse.com>
      69fc6cbb
    • Linus Torvalds's avatar
      proc: don't report kernel addresses in /proc/<pid>/stack · 8f5abe84
      Linus Torvalds authored
      This just changes the file to report them as zero, although maybe even
      that could be removed.  I checked, and at least procps doesn't actually
      seem to parse the 'stack' file at all.
      
      And since the file doesn't necessarily even exist (it requires
      CONFIG_STACKTRACE), possibly other tools don't really use it either.
      
      That said, in case somebody parses it with tools, just having that zero
      there should keep such tools happy.
      Signed-off-by: 's avatarLinus Torvalds <torvalds@linux-foundation.org>
      8f5abe84
  8. 27 Nov, 2017 10 commits