1. 27 Apr, 2017 1 commit
  2. 26 Apr, 2017 1 commit
    • statx: Kill fd-with-NULL-path support in favour of AT_EMPTY_PATH · 1e2f82d1
      David Howells authored
      With the new statx() syscall, the following both allow the attributes of
      the file attached to a file descriptor to be retrieved:
      
      	statx(dfd, NULL, 0, ...);
      
      and:
      
      	statx(dfd, "", AT_EMPTY_PATH, ...);
      
      Change the code to reject the first option, though this means copying
      the path and engaging pathwalk for the fstat() equivalent.  dfd can be a
      non-directory provided path is "".
      
      [ The timing of this isn't wonderful, but applying this now before we
        have statx() in any released kernel, before anybody starts using the
        NULL special case.    - Linus ]
      
      Fixes: a528d35e ("statx: Add a system call to make enhanced file info available")
      Reported-by: Michael Kerrisk <mtk.manpages@gmail.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Eric Sandeen <sandeen@sandeen.net>
      cc: fstests@vger.kernel.org
      cc: linux-api@vger.kernel.org
      cc: linux-man@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 19 Apr, 2017 1 commit
    • nsfs: mark dentry with DCACHE_RCUACCESS · 073c516f
      Cong Wang authored
      Andrey reported a use-after-free in __ns_get_path():
      
        spin_lock include/linux/spinlock.h:299 [inline]
        lockref_get_not_dead+0x19/0x80 lib/lockref.c:179
        __ns_get_path+0x197/0x860 fs/nsfs.c:66
        open_related_ns+0xda/0x200 fs/nsfs.c:143
        sock_ioctl+0x39d/0x440 net/socket.c:1001
        vfs_ioctl fs/ioctl.c:45 [inline]
        do_vfs_ioctl+0x1bf/0x1780 fs/ioctl.c:685
        SYSC_ioctl fs/ioctl.c:700 [inline]
        SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
      
      We are under rcu read lock protection at that point:
      
              rcu_read_lock();
              d = atomic_long_read(&ns->stashed);
              if (!d)
                      goto slow;
              dentry = (struct dentry *)d;
              if (!lockref_get_not_dead(&dentry->d_lockref))
                      goto slow;
              rcu_read_unlock();
      
      but don't use a proper RCU API on the free path, therefore a parallel
      __d_free() could free it at the same time.  We need to mark the stashed
      dentry with DCACHE_RCUACCESS so that __d_free() will be called after all
      readers leave RCU.
      
      Fixes: e149ed2b ("take the targets of /proc/*/ns/* symlinks to separate fs")
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Reported-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 18 Apr, 2017 1 commit
  5. 17 Apr, 2017 1 commit
    • cifs: Do not send echoes before Negotiate is complete · 62a6cfdd
      Sachin Prabhu authored
      commit 4fcd1813 ("Fix reconnect to not defer smb3 session reconnect
      long after socket reconnect") added support for Negotiate requests to
      be initiated by echo calls.
      
      To avoid delays in calling echo after a reconnect, I added the patch
      introduced by the commit b8c60012 ("Call echo service immediately
      after socket reconnect").
      
      This has however caused a regression with cifs shares which do not have
      support for echo calls to trigger Negotiate requests. On connections
      which need to call Negotiation, the echo calls trigger an error which
      triggers a reconnect which in turn triggers another echo call. This
      results in a loop which is only broken when an operation is performed on
      the cifs share. For an idle share, it can DOS a server.
      
      The patch uses the smb_operation can_echo() for cifs so that echo is
      called only if the connection has already been set up.
      
      kernel bz: 194531
      Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
      Tested-by: Jonathan Liu <net147@gmail.com>
      Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
      CC: Stable <stable@vger.kernel.org>
      Signed-off-by: Steve French <smfrench@gmail.com>
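
The gating idea in the cifs echo commit can be modelled in a few lines. This is only a sketch: the state names, struct server and echo_tick() are hypothetical stand-ins, and the real CIFS client tracks considerably more state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical connection states for the sketch. */
enum conn_state { CONN_NEW, CONN_NEED_NEGOTIATE, CONN_GOOD, CONN_NEED_RECONNECT };

struct server {
	enum conn_state state;
	unsigned int echoes_sent;
};

/* Echo is only allowed once the connection has been fully set up. */
static bool can_echo(const struct server *srv)
{
	return srv->state == CONN_GOOD;
}

/* Periodic timer callback: send an echo only when it is safe to do so. */
bool echo_tick(struct server *srv)
{
	assert(srv != NULL);
	if (!can_echo(srv))
		return false;	/* skip: an echo now would only trigger an error */
	srv->echoes_sent++;
	return true;
}
```

With the check in place, a connection that still needs Negotiate never produces the echo that would restart the error/reconnect cycle described above.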
  6. 15 Apr, 2017 2 commits
    • orangefs: free superblock when mount fails · 1ec1688c
      Martin Brandenburg authored
      
      
      Otherwise lockdep says:
      
      [ 1337.483798] ================================================
      [ 1337.483999] [ BUG: lock held when returning to user space! ]
      [ 1337.484252] 4.11.0-rc6 #19 Not tainted
      [ 1337.484423] ------------------------------------------------
      [ 1337.484626] mount/14766 is leaving the kernel with locks still held!
      [ 1337.484841] 1 lock held by mount/14766:
      [ 1337.485017]  #0:  (&type->s_umount_key#33/1){+.+.+.}, at: [<ffffffff8124171f>] sget_userns+0x2af/0x520
      
      Caught by xfstests generic/413 which tried to mount with the unsupported
      mount option dax.  Then xfstests generic/422 ran sync which deadlocks.
      Signed-off-by: Martin Brandenburg <martin@omnibond.com>
      Acked-by: Mike Marshall <hubcap@omnibond.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • vfs: don't do RCU lookup of empty pathnames · c0eb027e
      Linus Torvalds authored
      
      
      Normal pathname lookup doesn't allow empty pathnames, but using
      AT_EMPTY_PATH (with name_to_handle_at() or fstatat(), for example) you
      can trigger an empty pathname lookup.
      
      And not only is the RCU lookup in that case entirely unnecessary
      (because we'll obviously immediately finalize the end result), it is
      actively wrong.
      
      Why? An empty path is a special case that will return the original
      'dirfd' dentry - and that dentry may not actually be RCU-free'd,
      resulting in a potential use-after-free if we were to initialize the
      path lazily under the RCU read lock and depend on complete_walk()
      finalizing the dentry.
      
      Found by syzkaller and KASAN.
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Reported-by: Vegard Nossum <vegard.nossum@gmail.com>
      Acked-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 14 Apr, 2017 2 commits
  8. 13 Apr, 2017 2 commits
    • nfsd: fix oops on unsupported operation · 05b7278d
      Olga Kornievskaia authored
      I'm hitting the BUG in nfsd4_max_reply() at fs/nfsd/nfs4proc.c:2495 when
      client sends an operation the server doesn't support.
      
      nfsd4_max_reply() checks for a NULL rsize_bop, but an unsupported
      operation wouldn't have that set.
      
      Cc: Kinglong Mee <kinglongmee@gmail.com>
      Fixes: 2282cd2c "NFSD: Get response size before operation..."
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
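
The general shape of the nfsd problem, a per-operation callback that is absent for unsupported operations, can be sketched as below. The table, the sizes and the function names are invented for illustration and do not reproduce the nfsd code or the exact fix.

```c
#include <stddef.h>

#define NR_OPS 4
#define HDR_ONLY_REPLY 8	/* assumed size of a bare status reply, in bytes */

typedef int (*rsize_fn)(void);

static int rsize_getattr(void) { return 128; }

/* Hypothetical op table: entries without an estimator stay NULL. */
static const rsize_fn estimators[NR_OPS] = {
	[1] = rsize_getattr,
};

/* Estimate the reply size without assuming every op has an estimator. */
int max_reply(unsigned int opnum)
{
	if (opnum >= NR_OPS || estimators[opnum] == NULL)
		return HDR_ONLY_REPLY;	/* unsupported op: error reply only */
	return estimators[opnum]();
}
```

Treating a missing estimator as "only an error header will be sent" avoids crashing on input that a client fully controls.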
    • CIFS: Fix SMB3 mount without specifying a security mechanism · 67dbea2c
      Pavel Shilovsky authored
      Commit ef65aaed ("smb2: Enforce sec= mount option") changed the
      behavior of a mount command to enforce a specified security mechanism
      during mounting. On the other hand, according to the spec, if an SMB3 server
      doesn't respond with a security context it implies that it supports
      NTLMSSP. The current code doesn't keep it in mind and fails a mount
      for such servers if no security mechanism is specified. Fix this by
      indicating that a server supports NTLMSSP if a security context isn't
      returned during negotiate phase. This allows the code to use NTLMSSP
      by default for SMB3 mounts.
      Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: Steve French <smfrench@gmail.com>
  9. 11 Apr, 2017 9 commits
    • Btrfs: fix potential use-after-free for cloned bio · a967efb3
      Liu Bo authored
      
      
      KASAN reports that there is a use-after-free case of bio in btrfs_map_bio.
      
      If we need to submit IOs to several disks at a time, the original bio
      would get cloned and mapped to the destination disk, but we really should
      use the original bio instead of a cloned bio to do the sanity check
      because a cloned bio is likely to be freed by its endio.
      Reported-by: Diego <diegocg@gmail.com>
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • Btrfs: fix segmentation fault when doing dio read · 97bf5a55
      Liu Bo authored
      Commit 2dabb324 ("Btrfs: Direct I/O read: Work on sectorsized blocks")
      introduced this bug while iterating over bio pages in dio read's endio hook,
      and it could end up with a segmentation fault of the dio reading task.
      
      So the reason is 'if (nr_sectors--)', and it makes the code assume that
      there is one more block in the same page, so page offset is increased and
      the bio which is created to repair the bad block then has an incorrect
      bvec.bv_offset, and a later access of the page content would throw a
      segmentation fault.
      
      This also adds ASSERT to check page offset against page size.
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
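
The effect of 'if (nr_sectors--)' in the dio read commit can be shown in isolation. The sketch assumes that nr_sectors counts the blocks left in the current page including the one just processed; the two function names are hypothetical and only contrast the two forms of the test.

```c
#include <assert.h>

/* Buggy form: the condition sees the value before the decrement. */
int more_blocks_buggy(int nr_sectors)
{
	assert(nr_sectors > 0);
	if (nr_sectors--)	/* tests the old value, then decrements */
		return 1;	/* claims another block follows */
	return 0;
}

/* Corrected form: decrement first, then test what is actually left. */
int more_blocks_fixed(int nr_sectors)
{
	assert(nr_sectors > 0);
	nr_sectors--;		/* account for the block just handled */
	if (nr_sectors)
		return 1;
	return 0;
}
```

For the last block in a page (nr_sectors == 1) the buggy form still reports another block, which is the case where the page offset would be advanced wrongly.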
    • Btrfs: fix invalid dereference in btrfs_retry_endio · 2e949b0a
      Liu Bo authored
      
      
      When doing directIO repair, we have this oops:
      
      [ 1458.532816] general protection fault: 0000 [#1] SMP
      ...
      [ 1458.536291] Workqueue: btrfs-endio-repair btrfs_endio_repair_helper [btrfs]
      [ 1458.536893] task: ffff88082a42d100 task.stack: ffffc90002b3c000
      [ 1458.537499] RIP: 0010:btrfs_retry_endio+0x7e/0x1a0 [btrfs]
      ...
      [ 1458.543261] Call Trace:
      [ 1458.543958]  ? rcu_read_lock_sched_held+0xc4/0xd0
      [ 1458.544374]  bio_endio+0xed/0x100
      [ 1458.544750]  end_workqueue_fn+0x3c/0x40 [btrfs]
      [ 1458.545257]  normal_work_helper+0x9f/0x900 [btrfs]
      [ 1458.545762]  btrfs_endio_repair_helper+0x12/0x20 [btrfs]
      [ 1458.546224]  process_one_work+0x34d/0xb70
      [ 1458.546570]  ? process_one_work+0x29e/0xb70
      [ 1458.546938]  worker_thread+0x1cf/0x960
      [ 1458.547263]  ? process_one_work+0xb70/0xb70
      [ 1458.547624]  kthread+0x17d/0x180
      [ 1458.547909]  ? kthread_create_on_node+0x70/0x70
      [ 1458.548300]  ret_from_fork+0x31/0x40
      
      It turns out that btrfs_retry_endio is trying to get inode from a directIO
      page.
      
      This fixes the problem by using the saved inode pointer, done->inode.
      btrfs_retry_endio_nocsum has the same problem, and it's fixed as well.
      
      Also clean up unused @start (which is too trivial for a separate patch).
      
      Cc: David Sterba <dsterba@suse.cz>
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: drop the nossd flag when remounting with -o ssd · 951e7966
      Adam Borowski authored
      
      
      The opposite case was already handled right in the very next switch entry.
      And also when turning on nossd, drop ssd_spread.
      Reported-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
      Signed-off-by: Adam Borowski <kilobyte@angband.pl>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • CIFS: store results of cifs_reopen_file to avoid infinite wait · 1fa839b4
      Germano Percossi authored
      
      
      This fixes Continuous Availability when errors during
      file reopen are encountered.
      
      cifs_user_readv and cifs_user_writev would wait forever if the
      results of cifs_reopen_file are not stored for later inspection.
      
      In fact, results are checked and, in case of errors, a chain
      of function calls leading to reads and writes to be scheduled in
      a separate thread is skipped.
      These threads will wake up the corresponding waiters once reads
      and writes are done.
      
      However, given the return value is not stored, when rc is checked
      for errors a previous one (always zero) is inspected instead.
      This leads to pending reads/writes added to the list, making
      cifs_user_readv and cifs_user_writev wait forever.
      Signed-off-by: Germano Percossi <germano.percossi@citrix.com>
      Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
      CC: Stable <stable@vger.kernel.org>
      Signed-off-by: Steve French <smfrench@gmail.com>
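
The "result not stored" pattern from the cifs_reopen_file commit reduces to the following sketch. reopen_file() is a hypothetical stand-in for cifs_reopen_file(), and the return value 1 merely marks "request queued, caller will wait"; neither is taken from the CIFS source.

```c
#include <errno.h>

/* Hypothetical stand-in for cifs_reopen_file(): 0 or a negative errno. */
static int reopen_file(int should_fail)
{
	return should_fail ? -EIO : 0;
}

/* Buggy shape: the result is discarded, so rc keeps its earlier value. */
int submit_buggy(int should_fail)
{
	int rc = 0;

	reopen_file(should_fail);
	if (rc)
		return rc;	/* never taken */
	return 1;		/* request queued even though reopen failed */
}

/* Fixed shape: store the result so the error check sees it. */
int submit_fixed(int should_fail)
{
	int rc;

	rc = reopen_file(should_fail);
	if (rc)
		return rc;	/* error reaches the caller, nothing is queued */
	return 1;
}
```

In the buggy shape the caller ends up waiting on a request that no worker will ever complete, which corresponds to the infinite wait the commit describes.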
    • CIFS: remove bad_network_name flag · a0918f1c
      Germano Percossi authored
      
      
      STATUS_BAD_NETWORK_NAME can be received during node failover,
      causing the flag to be set and making the reconnect thread
      always unsuccessful, thereafter.
      
      Once the only place where it is set is removed, the remaining
      bits are rendered moot.
      
      Removing it does not prevent "mount" from failing when a
      nonexistent share is passed.
      
      What happens when the share really ceases to exist while the
      share is mounted is undefined now as much as it was before.
      Signed-off-by: Germano Percossi <germano.percossi@citrix.com>
      Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
      CC: Stable <stable@vger.kernel.org>
      Signed-off-by: Steve French <smfrench@gmail.com>
    • CIFS: reconnect thread reschedule itself · 18ea4311
      Germano Percossi authored
      
      
      In case of error, smb2_reconnect_server reschedules itself
      with a delay, to avoid being too aggressive.
      Signed-off-by: Germano Percossi <germano.percossi@citrix.com>
      Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
      CC: Stable <stable@vger.kernel.org>
      Signed-off-by: Steve French <smfrench@gmail.com>
    • CIFS: handle guest access errors to Windows shares · 40920c2b
      Mark Syms authored
      Commit 1a967d6c ("correctly to
      anonymous authentication for the NTLM(v2) authentication") introduces
      a regression in handling errors related to attempting a guest
      connection to a Windows share which requires authentication. This
      should result in a permission denied error but actually causes the
      kernel module to enter a never-ending loop trying to follow a DFS
      referral which doesn't exist.

      The base cause of this is that the failure now occurs later in the
      process, during tree connect and not at session setup, and all errors
      in tree connect are interpreted as needing to follow the DFS paths,
      which isn't correct in this case. So, check the returned error against
      EACCES and fail if that is the error returned.
      
      Feedback from Aurelien:
      
        PS> net user guest /activate:no
        PS> mkdir C:\guestshare
        PS> icacls C:\guestshare /grant 'Everyone:(OI)(CI)F'
        PS> new-smbshare -name guestshare -path C:\guestshare -fullaccess Everyone

        I've tested v3.10, v4.4, master, master+your patch using default options
        (empty or no user "NU") and user=abc (U).

        NT_LOGON_FAILURE in session setup: LF
        This is what you seem to have in 3.10.

        NT_ACCESS_DENIED in tree connect to the share: AD
        This is what you get before your infinite loop.

                     |   NU       U
        --------------------------------
        3.10         |   LF       LF
        4.4          |   LF       LF
        master       |   AD       LF
        master+patch |   AD       LF

        No infinite DFS loop :(
        All these issues result in mount failing very fast with permission denied.

        I guess it could be from either the Windows version or the share/folder
        ACL. A deeper analysis of the packets might reveal more.

        In any case I did not notice any issues on a basic DFS setup with
        the patch so I don't think it introduced any regressions, which is
        probably all that matters. It still bothers me a little I couldn't hit
        the bug.

        I've included kernel output w/ debugging output and network capture of
        my tests if anyone wants to have a look at it. (master+patch = ml-guestfix).
      Signed-off-by: Mark Syms <mark.syms@citrix.com>
      Reviewed-by: Aurelien Aptel <aaptel@suse.com>
      Tested-by: Aurelien Aptel <aaptel@suse.com>
      Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: Steve French <smfrench@gmail.com>
    • CIFS: Fix null pointer deref during read resp processing · 350be257
      Pavel Shilovsky authored
      
      
      Currently during receiving a read response mid->resp_buf can be
      NULL when it is being passed to cifs_discard_remaining_data() from
      cifs_readv_discard(). Fix it by always passing server->smallbuf
      instead and initializing mid->resp_buf at the end of read response
      processing.
      Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
      CC: Stable <stable@vger.kernel.org>
      Acked-by: Sachin Prabhu <sprabhu@redhat.com>
      Signed-off-by: Steve French <smfrench@gmail.com>
  10. 08 Apr, 2017 3 commits
    • sysfs: be careful of error returns from ops->show() · c8a139d0
      NeilBrown authored
      ops->show() can return a negative error code.
      Commit 65da3484 ("sysfs: correctly handle short reads on PREALLOC attrs.")
      (in v4.4) caused this to be stored in an unsigned 'size_t' variable, so errors
      would look like large numbers.
      As a result, if an error is returned, sysfs_kf_read() will return the
      value of 'count', typically 4096.
      
      Commit 17d0774f ("sysfs: correctly handle read offset on PREALLOC attrs")
      (in v4.8) extended this error to use the unsigned large 'len' as a size for
      memmove().
      Consequently, if ->show returns an error, then the first read() on the
      sysfs file will return 4096 and could return uninitialized memory to
      user-space.
      If the application performs a subsequent read, this will trigger a memmove()
      with extremely large count, and is likely to crash the machine in bizarre ways.
      
      This bug can currently only be triggered by reading from an md
      sysfs attribute declared with __ATTR_PREALLOC() during the
      brief period between when mddev_put() deletes an mddev from
      the ->all_mddevs list, and when mddev_delayed_delete() - which is
      scheduled on a workqueue - completes.
      Before this, an error won't be returned by ->show().
      After this, ->show() won't be called.

      I can reproduce it reliably only by putting a delay like
      	usleep_range(500000,700000);
      early in mddev_delayed_delete(). Then after creating an
      md device md0 run
        echo clear > /sys/block/md0/md/array_state; cat /sys/block/md0/md/array_state
      
      The bug can be triggered without the usleep.
      
      Fixes: 65da3484 ("sysfs: correctly handle short reads on PREALLOC attrs.")
      Fixes: 17d0774f ("sysfs: correctly handle read offset on PREALLOC attrs")
      Cc: stable@vger.kernel.org
      Signed-off-by: NeilBrown <neilb@suse.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Reported-and-tested-by: Miroslav Benes <mbenes@suse.cz>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
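
The signed/unsigned problem in the sysfs commit is easy to reproduce outside the kernel. The two helpers below are hypothetical and only model the length computation: show_ret plays the role of the value returned by ops->show(), and count is the size requested by the reader.

```c
#include <stddef.h>
#include <sys/types.h>	/* ssize_t */

static size_t min_size(size_t a, size_t b)
{
	return a < b ? a : b;
}

/* Buggy shape: a negative errno stored in size_t becomes a huge length. */
ssize_t read_len_buggy(ssize_t show_ret, size_t count)
{
	size_t len = show_ret;		/* e.g. -19 turns into a value near SIZE_MAX */

	return min_size(count, len);	/* so the caller's count wins */
}

/* Fixed shape: keep the result signed and return errors before using it. */
ssize_t read_len_fixed(ssize_t show_ret, size_t count)
{
	ssize_t len = show_ret;

	if (len < 0)
		return len;
	return min_size(count, (size_t)len);
}
```

With an error of -19 and a 4096-byte request, the buggy helper reports 4096 bytes read, which mirrors the behaviour described for sysfs_kf_read() above.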
    • dax: fix radix tree insertion race · e11f8b7b
      Ross Zwisler authored
      While running generic/340 in my test setup I hit the following race.  It
      can happen with kernels that support FS DAX PMDs, so v4.10 thru
      v4.11-rc5.
      
      Thread 1				Thread 2
      --------				--------
      dax_iomap_pmd_fault()
        grab_mapping_entry()
          spin_lock_irq()
          get_unlocked_mapping_entry()
          'entry' is NULL, can't call lock_slot()
          spin_unlock_irq()
          radix_tree_preload()
      					dax_iomap_pmd_fault()
      					  grab_mapping_entry()
      					    spin_lock_irq()
      					    get_unlocked_mapping_entry()
      					    ...
      					    lock_slot()
      					    spin_unlock_irq()
      					  dax_pmd_insert_mapping()
      					    <inserts a PMD mapping>
          spin_lock_irq()
          __radix_tree_insert() fails with -EEXIST
          <fall back to 4k fault, and die horribly
           when inserting a 4k entry where a PMD exists>
      
      The issue is that we have to drop mapping->tree_lock while calling
      radix_tree_preload(), but since we didn't have a radix tree entry to
      lock (unlike in the pmd_downgrade case) we have no protection against
      Thread 2 coming along and inserting a PMD at the same index.  For 4k
      entries we handled this with a special-case response to -EEXIST coming
      from the __radix_tree_insert(), but this doesn't save us for PMDs
      because the -EEXIST case can also mean that we collided with a 4k entry
      in the radix tree at a different index, but one that is covered by our
      PMD range.
      
      So, correctly handle both the 4k and 2M collision cases by explicitly
      re-checking the radix tree for an entry at our index once we reacquire
      mapping->tree_lock.
      
      This patch has made it through a clean xfstests run with the current
      v4.11-rc5 based linux/master, and it also ran generic/340 500 times in a
      loop.  It used to fail within the first 10 iterations.
      
      Link: http://lkml.kernel.org/r/20170406212944.2866-1-ross.zwisler@linux.intel.com
      
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: <stable@vger.kernel.org>    [4.10+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
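
The "re-check after reacquiring the lock" remedy from the dax commit can be modelled without threads. In this single-threaded sketch everything is hypothetical: struct tree stands in for the radix tree, the locked flag for mapping->tree_lock, and the window callback for whatever another thread does while the lock is dropped.

```c
#include <assert.h>
#include <stddef.h>

#define SLOTS 8

struct tree {
	int locked;		/* stands in for mapping->tree_lock */
	int slot[SLOTS];	/* 0 means empty */
};

typedef void (*window_fn)(struct tree *t);	/* runs while the lock is dropped */

/* Returns the value that ends up owning the slot at index. */
int grab_entry(struct tree *t, size_t index, int value, window_fn window)
{
	assert(index < SLOTS);
	t->locked = 1;
	if (t->slot[index]) {		/* entry already present */
		int found = t->slot[index];
		t->locked = 0;
		return found;
	}
	t->locked = 0;			/* lock must be dropped to preload */
	if (window)
		window(t);		/* another thread may insert here */
	t->locked = 1;
	if (t->slot[index]) {		/* re-check after reacquiring the lock */
		int found = t->slot[index];
		t->locked = 0;
		return found;		/* use the racing entry instead */
	}
	t->slot[index] = value;
	t->locked = 0;
	return value;
}

/* Example racing writer: fills slot 3 with 42 while the lock is dropped. */
void racing_insert(struct tree *t)
{
	if (!t->locked)
		t->slot[3] = 42;
}
```

Calling grab_entry(&t, 3, 7, racing_insert) on an empty tree returns the racing entry rather than inserting over it; the model does not cover the PMD versus 4k range overlap, only the re-check itself.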
    • userfaultfd: report actual registered features in fdinfo · 045098e9
      Mike Rapoport authored
      fdinfo for userfault file descriptor reports UFFD_API_FEATURES.  Up
      until recently, the UFFD_API_FEATURES was defined as 0, therefore
      corresponding field in fdinfo always contained zero.  Now, with
      introduction of several additional features, UFFD_API_FEATURES is no
      longer 0 and it seems better to report actual features requested for the
      userfaultfd object described by the fdinfo.
      
      First, the applications that were using userfault will still see zero at
      the features field in fdinfo.  Next, reporting actual features rather
      than available features, gives clear indication of what userfault
      features are used by an application.
      
      Link: http://lkml.kernel.org/r/1491140181-22121-1-git-send-email-rppt@linux.vnet.ibm.com
      
      Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Pavel Emelyanov <xemul@virtuozzo.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 07 Apr, 2017 7 commits
    • orangefs: move features validation to fix filesystem hang · cefdc26e
      Martin Brandenburg authored
      Without this fix (and another to the userspace component itself
      described later), the kernel will be unable to process any OrangeFS
      requests after the userspace component is restarted (due to a crash or
      at the administrator's behest).
      
      The bug here is that inside orangefs_remount, the orangefs_request_mutex
      is locked.  When the userspace component restarts while the filesystem
      is mounted, it sends an ORANGEFS_DEV_REMOUNT_ALL ioctl to the device,
      which causes the kernel to send it a few requests aimed at synchronizing
      the state between the two.  While this is happening the
      orangefs_request_mutex is locked to prevent any other requests going
      through.
      
      This is only half of the bugfix.  The other half is in the userspace
      component which outright ignores(!) requests made before it considers
      the filesystem remounted, which is after the ioctl returns.  Of course
      the ioctl doesn't return until after the userspace component responds to
      the request it ignores.  The userspace component has been changed to
      allow ORANGEFS_VFS_OP_FEATURES regardless of the mount status.
      
      Mike Marshall says:
       "I've tested this patch against the fixed userspace part. This patch is
        real important, I hope it can make it into 4.11...
      
        Here's what happens when the userspace daemon is restarted, without
        the patch:
      
          =============================================
          [ INFO: possible recursive locking detected ]
          [   4.10.0-00007-ge98bdb30 #1 Not tainted    ]
          ---------------------------------------------
          pvfs2-client-co/29032 is trying to acquire lock:
           (orangefs_request_mutex){+.+.+.}, at: service_operation+0x3c7/0x7b0 [orangefs]
                        but task is already holding lock:
           (orangefs_request_mutex){+.+.+.}, at: dispatch_ioctl_command+0x1bf/0x330 [orangefs]
      
          CPU: 0 PID: 29032 Comm: pvfs2-client-co Not tainted 4.10.0-00007-ge98bdb30 #1
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.fc25 04/01/2014
          Call Trace:
           __lock_acquire+0x7eb/0x1290
           lock_acquire+0xe8/0x1d0
           mutex_lock_killable_nested+0x6f/0x6e0
           service_operation+0x3c7/0x7b0 [orangefs]
           orangefs_remount+0xea/0x150 [orangefs]
           dispatch_ioctl_command+0x227/0x330 [orangefs]
           orangefs_devreq_ioctl+0x29/0x70 [orangefs]
           do_vfs_ioctl+0xa3/0x6e0
           SyS_ioctl+0x79/0x90"
      Signed-off-by: Martin Brandenburg <martin@omnibond.com>
      Acked-by: Mike Marshall <hubcap@omnibond.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • sysctl: add sanity check for proc_douintvec · 1680a386
      Liping Zhang authored
      Commit e7d316a0 ("sysctl: handle error writing UINT_MAX to u32
      fields") introduced the proc_douintvec helper function, but it forgot to
      add the related sanity check when doing register_sysctl_table.  So add
      it now.
      Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
      Cc: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Reset TreeId to zero on SMB2 TREE_CONNECT · 806a28ef
      Jan-Marek Glogowski authored
      Currently the cifs module breaks the CIFS specs on reconnect as
      described in http://msdn.microsoft.com/en-us/library/cc246529.aspx:
      
      "TreeId (4 bytes): Uniquely identifies the tree connect for the
      command. This MUST be 0 for the SMB2 TREE_CONNECT Request."
      Signed-off-by: Jan-Marek Glogowski <glogow@fbihome.de>
      Reviewed-by: Aurelien Aptel <aaptel@suse.com>
      Tested-by: Aurelien Aptel <aaptel@suse.com>
      Signed-off-by: Steve French <smfrench@gmail.com>
      CC: Stable <stable@vger.kernel.org>
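
The rule quoted in the TreeId commit can be expressed as a small header-filling sketch. struct hdr is a reduced, hypothetical header with only the two fields needed here, fill_hdr() is not a function from the cifs module, and byte order is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SMB2_TREE_CONNECT 0x0003	/* SMB2 command code for TREE_CONNECT */

/* Reduced, hypothetical header: only the fields this sketch needs. */
struct hdr {
	uint16_t command;
	uint32_t tree_id;
};

/* Fill a request header; cached_tid is the id left over from an earlier tree connect. */
void fill_hdr(struct hdr *h, uint16_t command, uint32_t cached_tid)
{
	assert(h != NULL);
	h->command = command;
	/* A TREE_CONNECT request establishes the id, so it must carry 0. */
	h->tree_id = (command == SMB2_TREE_CONNECT) ? 0 : cached_tid;
}
```

On reconnect the cached id from the previous session is still around, which is how a stale value could end up in a new TREE_CONNECT request unless it is reset explicitly.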
    • CIFS: Fix build failure with smb2 · 4fa8e504
      Tobias Regnery authored
      
      
      I saw the following build error during a randconfig build:
      
      fs/cifs/smb2ops.c: In function 'smb2_new_lease_key':
      fs/cifs/smb2ops.c:1104:2: error: implicit declaration of function 'generate_random_uuid' [-Werror=implicit-function-declaration]
      
      Explicitly include the right header to fix this issue.
      Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com>
      Reviewed-by: Aurelien Aptel <aaptel@suse.com>
      Signed-off-by: Steve French <smfrench@gmail.com>
    • Introduce cifs_copy_file_range() · 620d8745
      Sachin Prabhu authored
      
      
      The earlier changes to copy range for cifs unintentionally disabled the more
      common form of server side copy.
      
      The patch introduces the file_operations helper cifs_copy_file_range()
      which is used by the syscall copy_file_range. The new file operations
      helper allows us to perform server side copies for SMB2.0 and 2.1
      servers as well as SMB 3.0+ servers which do not support the ioctl
      FSCTL_DUPLICATE_EXTENTS_TO_FILE.
      
      The new helper uses the ioctl FSCTL_SRV_COPYCHUNK_WRITE to perform
      server side copies. The helper is called by vfs_copy_file_range() only
      once an attempt to clone the file using the ioctl
      FSCTL_DUPLICATE_EXTENTS_TO_FILE has failed.
      Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
      Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
      CC: Stable  <stable@vger.kernel.org>
      Signed-off-by: Steve French <smfrench@gmail.com>
    • SMB3: Rename clone_range to copychunk_range · 312bbc59
      Sachin Prabhu authored
      
      
      Server side copy is one of the most important mechanisms smb2/smb3
      supports and it was unintentionally disabled for most use cases.
      
      Renaming calls to reflect the underlying smb2 ioctl called. This is
      similar to the name duplicate_extents used for a similar ioctl which is
      also used to duplicate files by reusing fs blocks. The name change is to
      avoid confusion.
      Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
      CC: Stable <stable@vger.kernel.org>
      Signed-off-by: Steve French <smfrench@gmail.com>
      Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
      312bbc59
    • Sachin Prabhu's avatar
      Handle mismatched open calls · 38bd4906
      Sachin Prabhu authored
      
      
      A signal can interrupt a SendReceive call, which results in incoming
      responses to the call being ignored. This is a problem for calls such as
      open, where the successful response is ignored and an open file
      resource is left behind on the server.

      The patch looks into responses which were cancelled after being sent
      and, in the case of a successful open, closes the open fids.
      
      For this patch, the check is only done in SendReceive2()
      
      RH-bz: 1403319
      Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
      Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
      Cc: Stable <stable@vger.kernel.org>
      38bd4906
  12. 03 Apr, 2017 10 commits
    • Darrick J. Wong's avatar
      xfs: fix kernel memory exposure problems · bf9216f9
      Darrick J. Wong authored
      
      
      Fix a memory exposure problem in inumbers where we allocate an array of
      structures with holes, fail to zero the holes, then blindly copy the
      kernel memory contents (junk and all) into userspace.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      bf9216f9
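      The class of bug fixed in bf9216f9 can be illustrated in userspace
      with a structure that contains a padding hole. This is a minimal
      sketch, not the XFS code: struct demo_rec and both helpers are
      hypothetical, and memcpy() merely stands in for the copy to
      userspace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record with a hole: common ABIs insert padding bytes
 * between `tag` and `ino` to align the 64-bit member. */
struct demo_rec {
	uint8_t  tag;
	uint64_t ino;
};

/* Fill `n` records, zeroing the whole array first so that the padding
 * bytes cannot carry stale memory contents. Compilers in practice leave
 * padding untouched on member stores, which is what this relies on. */
void demo_fill_records(struct demo_rec *buf, size_t n)
{
	assert(buf != NULL);
	memset(buf, 0, n * sizeof(*buf)); /* the step a leaky path would skip */
	for (size_t i = 0; i < n; i++) {
		buf[i].tag = 1;
		buf[i].ino = 100 + i;
	}
}

/* Stand-in for the copy to userspace: raw bytes, padding included. */
void demo_copy_out(void *dst, const struct demo_rec *src, size_t n)
{
	memcpy(dst, src, n * sizeof(*src));
}
```

      Without the memset(), whatever bytes were previously in the buffer
      would travel through demo_copy_out() inside the hole.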
    • Calvin Owens's avatar
      xfs: Honor FALLOC_FL_KEEP_SIZE when punching ends of files · 3dd09d5a
      Calvin Owens authored
      When punching past EOF on XFS, fallocate(mode=PUNCH_HOLE|KEEP_SIZE) will
      round the file size up to the nearest multiple of PAGE_SIZE:
      
        calvinow@vm-disks/generic-xfs-1 ~$ dd if=/dev/urandom of=test bs=2048 count=1
        calvinow@vm-disks/generic-xfs-1 ~$ stat test
          Size: 2048            Blocks: 8          IO Block: 4096   regular file
        calvinow@vm-disks/generic-xfs-1 ~$ fallocate -n -l 2048 -o 2048 -p test
        calvinow@vm-disks/generic-xfs-1 ~$ stat test
          Size: 4096            Blocks: 8          IO Block: 4096   regular file
      
      Commit 3c2bdc91 ("xfs: kill xfs_zero_remaining_bytes") replaced
      xfs_zero_remaining_bytes() with calls to iomap helpers. The new helpers
      don't enforce that [pos,offset) lies strictly on [0,i_size) when being
      called from xfs_free_file_space(), so by "leaking" these ranges into
      xfs_zero_range() we get this buggy behavior.
      
      Fix this by reintroducing the checks xfs_zero_remaining_bytes() did
      against i_size at the bottom of xfs_free_file_space().
      Reported-by: Aaron Gao <gzh@fb.com>
      Fixes: 3c2bdc91 ("xfs: kill xfs_zero_remaining_bytes")
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Brian Foster <bfoster@redhat.com>
      Cc: <stable@vger.kernel.org> # 4.8+
      Signed-off-by: Calvin Owens <calvinowens@fb.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      3dd09d5a
    • Darrick J. Wong's avatar
      xfs: rework the inline directory verifiers · 78420281
      Darrick J. Wong authored
      
      
      The inline directory verifiers should be called on the inode fork data,
      which means after iformat_local on the read side, and prior to
      ifork_flush on the write side.  This makes the fork verifier more
      consistent with the way buffer verifiers work -- i.e. they will operate
      on the memory buffer that the code will be reading and writing directly.
      
      Furthermore, revise the verifier function to return -EFSCORRUPTED so
      that we don't flood the logs with corruption messages and assert
      notices.  This has been a particular problem with xfs/348, which
      triggers the XFS_WANT_CORRUPTED_RETURN assertions, which halts the
      kernel when CONFIG_XFS_DEBUG=y.  Disk corruption isn't supposed to do
      that, at least not in a verifier.
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      78420281
    • David Howells's avatar
      statx: Include a mask for stx_attributes in struct statx · 3209f68b
      David Howells authored
      
      
      Include a mask in struct statx to indicate which bits of stx_attributes
      the filesystem actually supports.
      
      This would also be useful if we add another system call that allows you to
      do a 'bulk attribute set' and pass in a statx struct with the masks
      appropriately set to say what you want to set.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      3209f68b
    • David Howells's avatar
      statx: Reserve the top bit of the mask for future struct expansion · 47071aee
      David Howells authored
      
      
      Reserve the top bit of the mask for future expansion of the statx struct
      and give an error if statx() sees it set.  All the other bits are ignored
      if we see them set but don't support the bit; we just clear the bit in the
      returned mask.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      47071aee
    • Darrick J. Wong's avatar
      xfs: report crtime and attribute flags to statx · 5f955f26
      Darrick J. Wong authored
      
      
      statx has the ability to report inode creation times and inode flags, so
      hook up di_crtime and di_flags to that functionality.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      5f955f26
    • David Howells's avatar
      ext4: Add statx support · 99652ea5
      David Howells authored
      
      
      Return enhanced file attributes from the Ext4 filesystem.  This includes
      the following:
      
       (1) The inode creation time (i_crtime) as stx_btime, setting STATX_BTIME.
      
       (2) Certain FS_xxx_FL flags are mapped to stx_attributes flags.
      
      This requires that all ext4 inodes have a getattr call, not just some of
      them, so to this end, split the ext4_getattr() function and only call part
      of it where appropriate.
      
      Example output:
      
      	[root@andromeda ~]# touch foo
      	[root@andromeda ~]# chattr +ai foo
      	[root@andromeda ~]# /tmp/test-statx foo
      	statx(foo) = 0
      	results=fff
      	  Size: 0               Blocks: 0          IO Block: 4096    regular file
      	Device: 08:12           Inode: 2101950     Links: 1
      	Access: (0644/-rw-r--r--)  Uid:     0   Gid:     0
      	Access: 2016-02-11 17:08:29.031795451+0000
      	Modify: 2016-02-11 17:08:29.031795451+0000
      	Change: 2016-02-11 17:11:11.987790114+0000
      	 Birth: 2016-02-11 17:08:29.031795451+0000
      	Attributes: 0000000000000030 (-------- -------- -------- -------- -------- -------- -------- --ai----)
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      99652ea5
    • Eric Biggers's avatar
      statx: optimize copy of struct statx to userspace · 64bd7204
      Eric Biggers authored
      
      
      I found that statx() was significantly slower than stat().  As a
      microbenchmark, I compared 10,000,000 invocations of fstat() on a tmpfs
      file to the same with statx() passed a NULL path:
      
      	$ time ./stat_benchmark
      
      	real	0m1.464s
      	user	0m0.275s
      	sys	0m1.187s
      
      	$ time ./statx_benchmark
      
      	real	0m5.530s
      	user	0m0.281s
      	sys	0m5.247s
      
      statx is expected to be a little slower than stat because struct statx
      is larger than struct stat, but not by *that* much.  It turns out that
      most of the overhead was in copying struct statx to userspace, mostly in
      all the stac/clac instructions that got generated for each __put_user()
      call.  (This was on x86_64, but some other architectures, e.g. arm64,
      have something similar now too.)
      
      stat() instead initializes its struct on the stack and copies it to
      userspace with a single call to copy_to_user().  This turns out to be
      much faster, and changing statx to do this makes it almost as fast as
      stat:
      
      	$ time ./statx_benchmark
      
      	real	0m1.624s
      	user	0m0.270s
      	sys	0m1.354s
      
      For zeroing the reserved fields, start by zeroing the full struct with
      memset.  This makes it clear that every byte copied to userspace is
      initialized, even implicit padding bytes (though there are none
      currently).  In the scenarios I tested, it also performed the same as a
      designated initializer.  Manually initializing each field was still
      slightly faster, but would have been more error-prone and less
      verifiable.
      
      Also rename statx_set_result() to cp_statx() for consistency with
      cp_old_stat() et al., and make it noinline so that struct statx doesn't
      add to the stack usage during the main portion of the syscall execution.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      64bd7204
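      The pattern adopted in 64bd7204 (build the result in a local struct
      that has been fully zeroed, then hand it over with one bulk copy) can
      be sketched in userspace as below. The struct layouts, the mask value
      and demo_cp_statx() are assumptions made for illustration, and
      memcpy() stands in for copy_to_user().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical internal attributes, loosely modelled on a kstat. */
struct demo_kstat {
	uint64_t size;
	uint32_t mode;
	uint32_t nlink;
};

/* Hypothetical user-visible layout with reserved fields. */
struct demo_statx {
	uint32_t mask;
	uint32_t mode;
	uint32_t nlink;
	uint32_t reserved0;
	uint64_t size;
	uint64_t reserved1[2];
};

#define DEMO_MASK_BASIC 0x7U /* made-up value for the example */

/* Fill a zeroed temporary, then copy it out in a single call instead of
 * storing each field into the destination separately. */
int demo_cp_statx(const struct demo_kstat *stat, void *ubuf)
{
	struct demo_statx tmp;

	assert(stat != NULL && ubuf != NULL);
	memset(&tmp, 0, sizeof(tmp)); /* reserved fields and padding are zero */
	tmp.mask  = DEMO_MASK_BASIC;
	tmp.mode  = stat->mode;
	tmp.nlink = stat->nlink;
	tmp.size  = stat->size;
	memcpy(ubuf, &tmp, sizeof(tmp)); /* one bulk copy */
	return 0;
}
```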
    • Eric Biggers's avatar
      statx: remove incorrect part of vfs_statx() comment · b15fb70b
      Eric Biggers authored
      
      
      request_mask and query_flags are function arguments, not passed in
      struct kstat.  So remove the part of the comment which claims otherwise.
      This was apparently left over from an earlier version of the statx
      patch.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      b15fb70b
    • Eric Biggers's avatar
      statx: reject unknown flags when using NULL path · 8c7493aa
      Eric Biggers authored
      
      
      The statx() system call currently accepts unknown flags when called with
      a NULL path to operate on a file descriptor.  Left unchanged, this could
      make it hard to introduce new query flags in the future, since
      applications may not be able to tell whether a given flag is supported.
      
      Fix this by failing the system call with EINVAL if any flags other than
      KSTAT_QUERY_FLAGS are specified in combination with a NULL path.
      
      Arguably, we could still permit known lookup-related flags such as
      AT_SYMLINK_NOFOLLOW.  However, that would be inconsistent with how
      sys_utimensat() behaves when passed a NULL path, which seems to be the
      closest precedent.  And given that the NULL path case is (I believe)
      mainly intended to be used to implement a wrapper function like fstatx()
      that doesn't have a path argument, I think rejecting lookup-related
      flags too is probably the best choice.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      8c7493aa
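      The flag check described in 8c7493aa can be sketched as a single
      mask test. This is an illustrative sketch rather than the kernel
      code: the DEMO_* constants are local definitions whose values are
      assumed to mirror the corresponding AT_* flags, and
      demo_check_flags() is a hypothetical helper.

```c
#include <errno.h>
#include <stddef.h>

/* Local stand-ins; values assumed to match the UAPI AT_* definitions. */
#define DEMO_AT_SYMLINK_NOFOLLOW 0x0100U
#define DEMO_AT_STATX_FORCE_SYNC 0x2000U
#define DEMO_AT_STATX_DONT_SYNC  0x4000U

/* Flags that only select the sync behaviour of the query. */
#define DEMO_KSTAT_QUERY_FLAGS \
	(DEMO_AT_STATX_FORCE_SYNC | DEMO_AT_STATX_DONT_SYNC)

/* With a NULL path, accept nothing but the query flags. */
int demo_check_flags(const char *path, unsigned int flags)
{
	if (path == NULL && (flags & ~DEMO_KSTAT_QUERY_FLAGS))
		return -EINVAL; /* unknown or lookup-related flag */
	return 0;
}
```

      Under this rule a NULL path combined with DEMO_AT_SYMLINK_NOFOLLOW
      is rejected, while the same flag with a non-NULL path passes this
      particular check.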