  1. Apr 27, 2021
    • kernel: Enable waitpid() for futex2 · 96db7320
      André Almeida authored
      
      To make pthreads work as expected when they use futex2, wake
      clear_child_tid with futex2 as well. This makes applications that use
      waitpid() (and clone(CLONE_CHILD_CLEARTID)) wake up while waiting for
      the child to terminate. Given that apps should not mix futex() and
      futex2(), any correct app will trigger a harmless no-op wakeup on the
      interface that it isn't using.
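      A minimal userspace model of the exit path described above, assuming
      each interface tracks its own sleepers on the clear_child_tid word (the
      `wait_table` type and helper names are hypothetical, not kernel code).
      It shows why waking on both interfaces is harmless: the one nobody
      sleeps on reports zero woken threads.

```c
/* Hypothetical model: sleepers of one futex interface on the tid word. */
struct wait_table {
    int sleepers;
};

/* Wake up to nr sleepers and return how many were woken (0 means a no-op). */
static int table_wake(struct wait_table *t, int nr)
{
    int woken = t->sleepers < nr ? t->sleepers : nr;
    t->sleepers -= woken;
    return woken;
}

/*
 * Sketch of the thread exit path: clear the tid word, then wake one waiter
 * on futex() and one on futex2(). A correct app sleeps on only one of them,
 * so the other wake finds nobody and does nothing.
 */
static int exit_path_wake(int *clear_child_tid,
                          struct wait_table *futex1_waiters,
                          struct wait_table *futex2_waiters)
{
    *clear_child_tid = 0;
    return table_wake(futex1_waiters, 1) + table_wake(futex2_waiters, 1);
}
```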
      
      Signed-off-by: André Almeida <andrealmeid@collabora.com>
      ---
      
      This commit is here to show what we need to do in order to get a full
      NPTL working on top of futex2. It should be merged only after we talk
      to the glibc folks about the details of the futex_wait() side. For
      instance, we could use this as an opportunity to use private futexes
      or 8-bit sized futexes, but both sides need to use exactly the same
      flags.
    • perf bench: Add futex2 benchmark tests · 2f05d29e
      André Almeida authored
      
      Add support for futex2 calls to the existing futex benchmarking code.
      `perf bench` tests can be used not only to measure the performance of
      the implementation, but also to stress test the kernel
      infrastructure.
      
      Signed-off-by: André Almeida <andrealmeid@collabora.com>
    • selftests: futex2: Add requeue test · 3c55be83
      André Almeida authored
      
      Add tests for futex_requeue(). The first test just requeues one waiter
      from one futex to another, and wakes it. The second performs both wake
      and requeue, and checks the return values to see if the operation
      woke/requeued the expected number of waiters.
      
      Signed-off-by: André Almeida <andrealmeid@collabora.com>
    • selftests: futex2: Add waitv test · 8021a26d
      André Almeida authored
      
      Create a new file to test the waitv mechanism. Test both private and
      shared futexes. Wake the last futex in the array, and check if the
      return value from futex_waitv() is the right index.
      
      Signed-off-by: André Almeida <andrealmeid@collabora.com>
    • selftests: futex2: Add wouldblock test · 848b3b58
      André Almeida authored
      
      Adapt existing futex wait wouldblock file to test the same mechanism for
      futex2.
      
      Signed-off-by: André Almeida <andrealmeid@collabora.com>
    • selftests: futex2: Add timeout test · d59f169a
      André Almeida authored
      
      Adapt existing futex wait timeout file to test the same mechanism for
      futex2. futex2 accepts only absolute 64bit timers, but supports both
      monotonic and realtime clocks.
      
      Signed-off-by: André Almeida <andrealmeid@collabora.com>
    • selftests: futex2: Add wake/wait test · 84bf1c38
      André Almeida authored
      
      Add a simple file to test the wake/wait mechanism using the futex2
      interface. Test three scenarios: a common local int variable as a
      private futex, a shm futex as a shared futex, and file-backed shared
      memory as a shared futex. This should test all branches of
      futex_get_key().

      Create helper files so more tests can evaluate futex2. Since glibc's
      32-bit ABIs aren't yet able to use 64-bit sized time variables, and
      futex2 doesn't support 32-bit sized time, add a temporary workaround
      that implements the required types and calls the appropriate syscalls.
      
      Signed-off-by: André Almeida <andrealmeid@collabora.com>
    • docs: locking: futex2: Add documentation · d174085b
      André Almeida authored
      
      Add a new documentation file specifying both userspace API and internal
      implementation details of futex2 syscalls.
      
      Signed-off-by: André Almeida <andrealmeid@collabora.com>
    • futex2: Add compatibility entry point for x86_x32 ABI · 45ae0b61
      André Almeida authored
      
      New syscalls should use the same entry point for the x86_64 and
      x86_x32 paths. Add a wrapper for x32 calls to use parse functions that
      assume 32-bit pointers.
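      To show why x32 needs its own parsing, here is a sketch of widening one
      `struct futex_waitv` entry (layout taken from the vectorized wait
      commit) from a 32-bit-pointer layout. `compat_futex_waitv` and
      `compat_to_native()` are hypothetical names for illustration only; the
      patch's actual wrapper may be structured differently.

```c
#include <stdint.h>

/* Native layout, as given in the futex_waitv commit. */
struct futex_waitv {
    uint64_t val;
    void *uaddr;
    unsigned int flags;
};

/* Hypothetical x32 layout: the pointer slot is only 32 bits wide. */
struct compat_futex_waitv {
    uint64_t val;
    uint32_t uaddr;
    unsigned int flags;
};

/* Widen one compat entry into the native layout. */
static void compat_to_native(const struct compat_futex_waitv *in,
                             struct futex_waitv *out)
{
    out->val = in->val;
    out->uaddr = (void *)(uintptr_t)in->uaddr; /* zero-extend the pointer */
    out->flags = in->flags;
}
```

      Reading an x32 array with the native struct would misplace every field
      after the first `uaddr`, which is why a separate parse path is needed.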
      
      Signed-off-by: André Almeida <andrealmeid@collabora.com>
    • futex2: Implement requeue operation · a942311f
      André Almeida authored
      
      Implement a requeue interface similar to the FUTEX_CMP_REQUEUE
      operation. This is the syscall implemented by this patch:
      
      futex_requeue(struct futex_requeue *uaddr1, struct futex_requeue *uaddr2,
      	      unsigned int nr_wake, unsigned int nr_requeue,
      	      u64 cmpval, unsigned int flags)
      
      struct futex_requeue {
      	void *uaddr;
      	unsigned int flags;
      };
      
      If the value at uaddr1->uaddr equals cmpval, wake up to nr_wake
      waiters at uaddr1->uaddr, then remove up to nr_requeue of the
      remaining waiters at uaddr1->uaddr and add them to the uaddr2->uaddr
      list. Each uaddr has its own set of flags, which must be defined in
      struct futex_requeue (such as size, shared, NUMA). The flags argument
      of the syscall exists just for the sake of extensibility, and right
      now it must be zero.
      
      Return the number of woken waiters plus the number of requeued ones on
      success, or an error code otherwise.
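      The requeue semantics above can be modeled in userspace by treating
      each wait queue as a counter of sleepers. This sketch assumes 32-bit
      futex words and assumes -EAGAIN on a value mismatch, mirroring
      FUTEX_CMP_REQUEUE; `struct queue` and `model_requeue()` are
      hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical model of a futex wait queue: just a count of sleepers. */
struct queue {
    unsigned int waiters;
};

static unsigned int min_u(unsigned int a, unsigned int b)
{
    return a < b ? a : b;
}

/*
 * Model of futex_requeue(): if *word1 == cmpval, wake up to nr_wake
 * waiters of q1, then move up to nr_requeue of the remaining waiters
 * from q1 to q2. Returns woken + requeued, or -EAGAIN on mismatch.
 */
static int model_requeue(const uint32_t *word1, uint64_t cmpval,
                         struct queue *q1, struct queue *q2,
                         unsigned int nr_wake, unsigned int nr_requeue)
{
    if (*word1 != cmpval)
        return -EAGAIN; /* assumed error code, as in FUTEX_CMP_REQUEUE */

    unsigned int woken = min_u(q1->waiters, nr_wake);
    q1->waiters -= woken;

    unsigned int moved = min_u(q1->waiters, nr_requeue);
    q1->waiters -= moved;
    q2->waiters += moved;

    return (int)(woken + moved);
}
```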
      
      Signed-off-by: André Almeida <andrealmeid@collabora.com>
      ---
      
      The original FUTEX_CMP_REQUEUE interface is as follows:

      futex(*uaddr1, FUTEX_CMP_REQUEUE, nr_wake, nr_requeue, *uaddr2, cmpval);

      When this interface was created there was only one type of futex (as
      opposed to futex2, where there are shared, sized, and NUMA futexes),
      so there was no way to specify individual flags for uaddr1 and uaddr2.
      When FUTEX_PRIVATE was implemented, a new opcode was created as well
      (FUTEX_CMP_REQUEUE_PRIVATE), but it applies to both futexes, so they
      must be of the same type regarding private/shared. This limits the use
      cases of the operation, and to overcome that in futex2, `struct
      futex_requeue` was created, so one can set individual flags for each
      futex. This flexibility is a trade-off with performance, given that
      we now need to perform two extra copy_from_user() calls. One
      alternative would be to use the upper half of the flags bits for the
      first futex and the bottom half for the second, but this would also
      impose limitations, since it would halve the available flag space. If
      futexes with equal flags are common enough, the following extension
      could be added to recover the lost performance:
      
      - A flag FUTEX_REQUEUE_EQUAL is added to futex2() flags;
      - If futex_requeue() sees this flag, both futexes use the same set of
        attributes.
      - Then, the function parses the flags as futex_wait/wake() do.
      - *uaddr1 and *uaddr2 are used as void* (instead of struct
        futex_requeue) just like wait/wake().

      In that way, we could avoid the copy_from_user() calls.
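      A sketch of the proposed FUTEX_REQUEUE_EQUAL path, with
      copy_from_user() replaced by a counting memcpy() so the saved copies
      are visible. The flag's value and the helper names are hypothetical,
      since the proposal above does not define them.

```c
#include <stddef.h>
#include <string.h>

/* struct futex_requeue as described in this commit. */
struct futex_requeue {
    void *uaddr;
    unsigned int flags;
};

/* Hypothetical value for the proposed flag. */
#define FUTEX_REQUEUE_EQUAL (1u << 31)

/* Stand-in for copy_from_user() that counts how many copies were made. */
static int copies;
static void fake_copy_from_user(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
    copies++;
}

/*
 * Resolve both futexes. With FUTEX_REQUEUE_EQUAL, uaddr1/uaddr2 are plain
 * futex addresses sharing the syscall's remaining flags, so no struct
 * copies are needed; otherwise each struct futex_requeue is copied in.
 */
static void resolve_requeue(const void *uaddr1, const void *uaddr2,
                            unsigned int flags,
                            struct futex_requeue *rq1,
                            struct futex_requeue *rq2)
{
    if (flags & FUTEX_REQUEUE_EQUAL) {
        unsigned int attrs = flags & ~FUTEX_REQUEUE_EQUAL;
        rq1->uaddr = (void *)uaddr1;
        rq1->flags = attrs;
        rq2->uaddr = (void *)uaddr2;
        rq2->flags = attrs;
    } else {
        fake_copy_from_user(rq1, uaddr1, sizeof(*rq1));
        fake_copy_from_user(rq2, uaddr2, sizeof(*rq2));
    }
}
```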
    • futex2: Implement vectorized wait · 12b55c1f
      André Almeida authored
      
      Add support for waiting on multiple futexes. This is the interface
      implemented by this syscall:
      
      futex_waitv(struct futex_waitv *waiters, unsigned int nr_futexes,
      	    unsigned int flags, struct timespec *timo)
      
      struct futex_waitv {
      	__u64 val;
      	void *uaddr;
      	unsigned int flags;
      };
      
      Given an array of struct futex_waitv, wait on each uaddr. The thread
      wakes if a futex_wake() is performed at any uaddr. The syscall returns
      immediately if any waiter has *uaddr != val. *timo is an optional
      timeout value for the operation. The flags argument of the syscall
      should be used solely for specifying the timeout as realtime, if
      needed. Flags for shared futexes, sizes, etc. should be set in the
      individual flags of each waiter.

      Returns the array index of one of the awakened futexes. No information
      is given about how many were awakened, or about any particular
      attribute of the returned one (whether it was the first awakened,
      whether it has the smallest index, ...).
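      A userspace model of the two checks described above: the pre-sleep
      value comparison and the reported index after a wake. It assumes every
      futex in the array is 32 bits wide, and returns -EAGAIN on a mismatch
      by analogy with futex_wait(); the helper names are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/* Userspace model of struct futex_waitv; flags are not interpreted here. */
struct futex_waitv {
    uint64_t val;
    void *uaddr;
    unsigned int flags;
};

/* Pre-sleep check: fail if any futex word no longer holds its expected value. */
static int waitv_check(const struct futex_waitv *w, unsigned int nr)
{
    for (unsigned int i = 0; i < nr; i++)
        if (*(const uint32_t *)w[i].uaddr != w[i].val)
            return -EAGAIN;
    return 0;
}

/* After a wake on woken_addr, report the index of a waiter on that address. */
static int waitv_woken_index(const struct futex_waitv *w, unsigned int nr,
                             const void *woken_addr)
{
    for (unsigned int i = 0; i < nr; i++)
        if (w[i].uaddr == woken_addr)
            return (int)i;
    return -1; /* not one of ours */
}
```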
      
      Signed-off-by: André Almeida <andrealmeid@collabora.com>
    • futex2: Add support for shared futexes · 1e7fd797
      André Almeida authored
      
      Add support for shared futexes for cross-process resources. This design
      relies on the same approach used by the old futex to create a unique ID
      for file-backed shared memory, by using a counter in struct inode.

      There are two types of futexes: private and shared ones. Private
      futexes are meant to be used by threads that share the same memory
      space; they are easier to identify uniquely and thus allow some
      performance optimizations. The elements for identifying one are: the
      start address of the page where the address is, the address offset
      within the page, and the current->mm pointer.
      
      Now, for uniquely identifying shared futexes:

      - If the page containing the user address is an anonymous page, we can
        just use the same data used for private futexes (the start address of
        the page, the address offset within the page and the current->mm
        pointer), which is enough for uniquely identifying such a futex. We
        also set one bit in the key to differentiate it from a private futex
        used on the same address (mixing shared and private calls is not
        allowed).

      - If the page is file-backed, current->mm may not be the same for
        every user of this futex, so we need to use other data: the
        page->index, a UUID for the struct inode, and the offset within the
        page.
      
      Note that the members of futex_key don't have any particular meaning
      once they are part of the struct; they are just bytes to identify a
      futex. Given that, we don't need to use a particular name or type that
      matches the original data; we only need to care about the bit size of
      each component and make both private and shared data fit in the same
      memory space.
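      A sketch of that idea, assuming 4 KiB pages and 64-bit key slots:
      three constructors fill the same union through differently named
      views, and comparison only looks at the raw slots. The field names,
      helper functions, and marker bits are hypothetical; in particular, the
      extra file-backed marker bit is an assumption added so the two shared
      kinds cannot collide in this model.

```c
#include <stdint.h>

#define PAGE_SIZE_ASSUMED 4096u    /* assumption: 4 KiB pages */
#define KEY_MM_SHARED (1u << 31)   /* hypothetical: anonymous shared key */
#define KEY_INODE     (1u << 30)   /* hypothetical: file-backed key */

/* Hypothetical layout: the same three slots serve every kind of futex. */
union futex_key {
    struct { uint64_t ptr;   uint64_t word;    uint32_t offset; } both;
    struct { uint64_t mm;    uint64_t address; uint32_t offset; } private;
    struct { uint64_t i_seq; uint64_t pgoff;   uint32_t offset; } shared;
};

static uint32_t page_offset(uint64_t addr)
{
    return (uint32_t)(addr & (PAGE_SIZE_ASSUMED - 1));
}

/* Private futex: mm pointer, page start address, offset within the page. */
static union futex_key key_private(uint64_t mm, uint64_t addr)
{
    union futex_key k = { .both = { 0, 0, 0 } };
    k.private.mm = mm;
    k.private.address = addr & ~(uint64_t)(PAGE_SIZE_ASSUMED - 1);
    k.private.offset = page_offset(addr);
    return k;
}

/* Shared anonymous page: same data as private, plus a marker bit. */
static union futex_key key_shared_anon(uint64_t mm, uint64_t addr)
{
    union futex_key k = key_private(mm, addr);
    k.private.offset |= KEY_MM_SHARED;
    return k;
}

/* Shared file-backed page: inode sequence number, page index, offset. */
static union futex_key key_shared_file(uint64_t i_seq, uint64_t pgoff,
                                       uint64_t addr)
{
    union futex_key k = { .both = { 0, 0, 0 } };
    k.shared.i_seq = i_seq;
    k.shared.pgoff = pgoff;
    k.shared.offset = page_offset(addr) | KEY_INODE;
    return k;
}

/* Keys are compared as opaque slots through the generic view. */
static int key_equal(const union futex_key *a, const union futex_key *b)
{
    return a->both.ptr == b->both.ptr && a->both.word == b->both.word &&
           a->both.offset == b->both.offset;
}
```

      Two processes mapping the same file page at different virtual
      addresses get equal keys here, because the virtual address only
      contributes its in-page offset.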
      
      Signed-off-by: André Almeida <andrealmeid@collabora.com>
    • futex2: Implement wait and wake functions · c9d8776e
      André Almeida authored
      
      Create a new set of futex syscalls known as futex2. This new interface
      aims to provide more maintainable code, while removing obsolete
      features and expanding it with new functionality.

      Implement wait and wake semantics for futexes, along with the base
      infrastructure for future operations. The whole wait path is designed
      to be used by N waiters, making it easier to implement vectorized wait.
      
      * Syscalls implemented by this patch:
      
      - futex_wait(void *uaddr, unsigned int val, unsigned int flags,
      	     struct timespec *timo)
      
         The user thread is put to sleep, waiting for a futex_wake() at uaddr,
         if the value at *uaddr is the same as val (otherwise, the syscall
         returns immediately with -EAGAIN). timo is an optional timeout value
         for the operation.
      
         Return 0 on success, error code otherwise.
      
      - futex_wake(void *uaddr, unsigned long nr_wake, unsigned int flags)
      
         Wake `nr_wake` threads waiting at uaddr.
      
         Return the number of woken threads on success, error code otherwise.
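      glibc provides no wrapper for futex(), and the futex2 calls proposed
      here are not assumed to be available, so this runnable sketch
      exercises the same wait/wake contract through the existing futex()
      syscall (FUTEX_WAIT_PRIVATE and FUTEX_WAKE_PRIVATE via syscall(2)): a
      wait with a stale expected value fails at once with EAGAIN, and a wake
      with no waiters reports 0. It assumes Linux and a libc that defines
      SYS_futex, such as glibc on x86_64.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Call the existing futex() syscall directly; timeout and uaddr2 unused. */
static long sys_futex(uint32_t *uaddr, int op, uint32_t val)
{
    return syscall(SYS_futex, uaddr, (long)op, (unsigned long)val,
                   NULL, NULL, 0L);
}

/* Wait with a stale expected value: returns at once with errno == EAGAIN. */
static int wait_with_stale_value(void)
{
    uint32_t word = 1;
    if (sys_futex(&word, FUTEX_WAIT_PRIVATE, 0) == -1)
        return errno;
    return 0;
}

/* Wake on a word nobody sleeps on: 0 threads are woken. */
static long wake_without_waiters(void)
{
    uint32_t word = 0;
    return sys_futex(&word, FUTEX_WAKE_PRIVATE, 1);
}
```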
      
      ** The `flags` argument

       The flags argument is used to specify the size of the futex word
       (FUTEX_[8, 16, 32, 64]). It's mandatory to define one, since there's
       no default size.

       By default, the timeout uses a monotonic clock, but it can use a
       realtime one by setting the FUTEX_REALTIME_CLOCK flag.

       By default, futexes are of the private type, which means that the user
       address will be accessed only by threads that share the same memory
       region. This allows for some internal optimizations, so they are
       faster. However, if the address needs to be shared with different
       processes (like using `mmap()` or `shmat()`), it needs to be defined
       as shared, which is done with the FUTEX_SHARED_FLAG flag.
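       A sketch of decoding the flags described above. The numeric values of
       FUTEX_8/16/32/64, FUTEX_SHARED_FLAG and FUTEX_REALTIME_CLOCK below are
       placeholders chosen for illustration, not the patch's uAPI values; the
       property kept from the text is that a word with no size bits has no
       valid size.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Placeholder encodings (hypothetical), not the real uAPI values. */
#define FUTEX_SIZE_MASK      0x7u
#define FUTEX_8              0x1u
#define FUTEX_16             0x2u
#define FUTEX_32             0x3u
#define FUTEX_64             0x4u
#define FUTEX_SHARED_FLAG    0x8u
#define FUTEX_REALTIME_CLOCK 0x10u

/* Size of the futex word in bytes, or -EINVAL when no size was given. */
static int futex_word_size(unsigned int flags)
{
    switch (flags & FUTEX_SIZE_MASK) {
    case FUTEX_8:  return 1;
    case FUTEX_16: return 2;
    case FUTEX_32: return 4;
    case FUTEX_64: return 8;
    default:       return -EINVAL; /* there is no default size */
    }
}

/* Private unless FUTEX_SHARED_FLAG is set. */
static int futex_is_shared(unsigned int flags)
{
    return (flags & FUTEX_SHARED_FLAG) != 0;
}

/* Monotonic unless FUTEX_REALTIME_CLOCK is set. */
static clockid_t futex_clock(unsigned int flags)
{
    return (flags & FUTEX_REALTIME_CLOCK) ? CLOCK_REALTIME : CLOCK_MONOTONIC;
}
```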
      
       By default, the operation has no NUMA-awareness, meaning that the user
       can't choose the memory node where the kernel-side futex data will be
       stored. The user can choose the node where it wants to operate by
       setting FUTEX_NUMA_FLAG and using the following structure (where X
       can be 8, 16, 32, or 64):

         struct futexX_numa {
                 __uX value;
                 __sX hint;
         };

       This structure should be passed as the `void *uaddr` of futex
       functions. The address of the structure is the one waited/woken on,
       and `value` will be compared to `val` as usual. The `hint` member is
       used to define which node the futex will use. When waiting, the futex
       will be registered on a kernel-side table stored on that node; when
       waking, the futex will be searched for in that given table. That means
       there's no redundancy between tables, and a wrong `hint` value will
       lead to undesired behavior. Userspace is responsible for dealing with
       node migration issues that may occur. `hint` can range from
       [0, MAX_NUMA_NODES], to specify a node, or be -1, to use the same node
       the current process is using.
      
       When not using FUTEX_NUMA_FLAG on a NUMA system, the futex will be
       stored on a global table on some node, defined at compilation time.
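       A sketch of resolving the node for a 32-bit NUMA futex. `nr_nodes` and
       `current_node` are hypothetical inputs standing in for the kernel's
       view of the system; valid node IDs are assumed to be 0 to
       nr_nodes - 1, and rejecting other values with -EINVAL is an
       assumption.

```c
#include <errno.h>
#include <stdint.h>

/* struct futexX_numa from the text, with X = 32. */
struct futex32_numa {
    uint32_t value;
    int32_t hint;
};

/* Pick the node whose table holds this futex, or -EINVAL for a bad hint. */
static int futex_numa_node(const struct futex32_numa *f, int nr_nodes,
                           int current_node)
{
    if (f->hint == -1)
        return current_node; /* follow the calling process */
    if (f->hint < 0 || f->hint >= nr_nodes)
        return -EINVAL;
    return f->hint;
}
```

       Because waiters and wakers must resolve to the same table, both sides
       have to pass the same `hint`; the kernel does not search other nodes.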
      
      ** The `timo` argument
      
      As per the Y2038 work done in the kernel, new interfaces shouldn't add
      timeout options known to be buggy. Given that, `timo` should be a 64-bit
      timeout on all platforms, using an absolute timeout value.
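      Callers holding a relative timeout therefore need to turn it into an
      absolute deadline on the chosen clock. A minimal sketch with
      clock_gettime(), assuming a 64-bit time_t (true on 64-bit Linux;
      32-bit glibc builds need 64-bit time support) and a non-negative
      relative value; `deadline_after_ms()` is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/*
 * Absolute deadline rel_ms milliseconds from now on clock clk
 * (CLOCK_MONOTONIC by default, CLOCK_REALTIME if requested).
 * Returns 0, or -1 if clock_gettime() fails.
 */
static int deadline_after_ms(clockid_t clk, int64_t rel_ms,
                             struct timespec *out)
{
    struct timespec now;
    if (clock_gettime(clk, &now) != 0)
        return -1;

    /* Add the sub-second part first, then carry whole seconds. */
    int64_t nsec = (int64_t)now.tv_nsec + (rel_ms % 1000) * 1000000;
    out->tv_sec = now.tv_sec + (time_t)(rel_ms / 1000) +
                  (time_t)(nsec / 1000000000);
    out->tv_nsec = (long)(nsec % 1000000000);
    return 0;
}
```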
      
      Signed-off-by: André Almeida <andrealmeid@collabora.com>
      ---
      
      [RFC Add futex2 syscall 0/0]
      
      Hi,
      
      This patch series introduces the futex2 syscalls.
      
      * What happened to the current futex()?
      
      For some years now, developers have been trying to add new features to
      futex, but maintainers have been reluctant to accept them, given the
      multiplexed interface, which is full of legacy features and makes big
      changes tricky. Some problems that people tried to address with
      patchsets are: NUMA-awareness[0], smaller sized futexes[1], and waiting
      on multiple futexes[2]. NUMA, for instance, just doesn't fit the current
      API in a reasonable way. Considering that, it's not possible to merge
      new features into the current futex.
      
       ** The NUMA problem
      
       In the current implementation, all futex kernel-side infrastructure is
       stored on a single node. Given that, all futex() calls issued by
       processors that aren't located on that node will have a memory access
       penalty.
      
       ** The 32bit sized futex problem
      
       Embedded systems or anything with memory constraints would benefit from
       using smaller sizes for the futex userspace integer. Also, a mutex
       implementation can be done using just three values, so 8 bits is enough
       for various scenarios.
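       As an illustration of the three-value claim, here is the classic
       three-state futex mutex (0 unlocked, 1 locked, 2 locked with possible
       waiters, as described in Ulrich Drepper's "Futexes Are Tricky") on an
       8-bit atomic. No 8-bit futex syscall is assumed to exist, so
       sched_yield() stands in for the wait and the wake is left as a
       comment; it is a sketch, not a production lock.

```c
#define _POSIX_C_SOURCE 200809L
#include <sched.h>
#include <stdatomic.h>

/* 0 = unlocked, 1 = locked, 2 = locked and possibly contended. */
typedef struct {
    atomic_uchar state;
} mutex8;

static void mutex8_lock(mutex8 *m)
{
    unsigned char c = 0;
    /* Fast path: 0 -> 1 when the lock is free. */
    if (atomic_compare_exchange_strong(&m->state, &c, 1))
        return;
    /* Slow path: mark the lock contended (2) until we observe it free. */
    if (c != 2)
        c = atomic_exchange(&m->state, 2);
    while (c != 0) {
        sched_yield(); /* stand-in for a hypothetical 8-bit futex wait(&m->state, 2) */
        c = atomic_exchange(&m->state, 2);
    }
}

static void mutex8_unlock(mutex8 *m)
{
    /* 1 -> 0 means nobody waited; otherwise reset and wake one waiter. */
    if (atomic_fetch_sub(&m->state, 1) != 1) {
        atomic_store(&m->state, 0);
        /* a hypothetical 8-bit futex wake(&m->state, 1) would go here */
    }
}
```

       Keeping the contended state separate lets an uncontended unlock skip
       the wake call entirely, which is where a real futex saves a syscall.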
      
       ** The wait on multiple problem
      
       The use case lies in the Wine implementation of the Windows NT interface
       WaitForMultipleObjects. This Windows API function allows a thread to
       sleep waiting on the first of a set of event sources (mutexes, timers,
       signals, console input, etc.) to signal. Considering this is a primitive
       synchronization operation for Windows applications, being able to
       quickly signal events on the producer side, and quickly go to sleep on
       the consumer side, is essential for the good performance of
       applications running under Wine.
      
      [0] https://lore.kernel.org/lkml/20160505204230.932454245@linutronix.de/
      [1] https://lore.kernel.org/lkml/20191221155659.3159-2-malteskarupke@web.de/
      [2] https://lore.kernel.org/lkml/20200213214525.183689-1-andrealmeid@collabora.com/
      
      * The solution
      
      As proposed by Peter Zijlstra and Florian Weimer[3], a new interface
      is required to solve this, which must be designed with those features in
      mind. futex2() is that interface. As opposed to the current multiplexed
      interface, the new one should have one syscall per operation. This will
      keep the API maintainable as it gets extended, and will help users with
      type checking of arguments.
      
      In particular, the new interface is extended to support the ability to
      wait on any of a list of futexes at a time, which could be seen as a
      vectored extension of the FUTEX_WAIT semantics.
      
      [3] https://lore.kernel.org/lkml/20200303120050.GC2596@hirez.programming.kicks-ass.net/
      
      * The interface
      
      The new interface can be seen in detail in the following patches, but
      this is a high-level summary of what the interface can do:

       - Supports wake/wait semantics, as in futex()
       - Supports requeue operations, similar to FUTEX_CMP_REQUEUE, but with
         individual flags for each address
       - Supports waiting for a vector of futexes, using a new syscall named
         futex_waitv()
       - Supports variable sized futexes (8bits, 16bits, 32bits and 64bits)
       - Supports NUMA-awareness operations, where the user can specify on
         which memory node they would like to operate
      
      * Implementation
      
      The internal implementation follows a similar design to the original futex.
      Given that we want to replicate the same external behavior of current
      futex, this should be somewhat expected. For some functions, like the
      init and the code to get a shared key, I literally copied code and
      comments from kernel/futex.c. I decided to do so instead of exposing the
      original function as a public function since in that way we can freely
      modify our implementation if required, without any impact on old futex.
      Also, the comments precisely describe the details and corner cases of
      the implementation.
      
      Each patch contains a brief description of implementation, but patch 6
      "docs: locking: futex2: Add documentation" adds a more complete document
      about it.
      
      * The patchset
      
      This patchset can be also found at my git tree:
      
      https://gitlab.collabora.com/tonyk/linux/-/tree/futex2-dev
      
        - Patch 1: Implements wait/wake, and the basic foundations of futex2

        - Patches 2-4: Implement the remaining features (shared, waitv, requeue).

        - Patch 5: Adds the x86_x32 ABI handling. I kept it in a separate
          patch since I'm not sure if x86_x32 is still a thing, or if it should
          return -ENOSYS.

        - Patch 6: Adds a documentation file which details the interface and
          the internal implementation.

        - Patches 7-13: Selftests for all operations along with perf
          support for futex2.

        - Patch 14: While working on porting glibc to futex2, I found out
          that there's a futex_wake() call in the user thread exit path, if
          that thread was created with clone(..., CLONE_CHILD_CLEARTID, ...).
          In order to make pthreads work with futex2, it was required to add
          this patch. Note that this is more a proof-of-concept of what we
          will need to do in the future, rather than part of the interface,
          and it shouldn't be merged as it is.
      
      * Testing:
      
      This patchset provides selftests for each operation and their flags.
      Along with that, the following work was done:
      
       ** Stability
      
       To stress the interface in "real world scenarios":
      
       - glibc[4]: nptl's low level locking was modified to use futex2 API
         (except for robust and PI things). All relevant nptl/ tests passed.
      
       - Wine[5]: Proton/Wine was modified to use futex2() for the
         emulation of Windows NT sync mechanisms based on futex, called "fsync".
         Triple-A games with huge CPU loads and tons of parallel jobs worked
         as expected when compared with the previous FUTEX_WAIT_MULTIPLE
         implementation in futex(). Some games issue 42k futex2() calls
         per second.

       - Full GNU/Linux distro: I installed the modified glibc on my host
         machine, so all pthread programs would use futex2(). After tweaking
         systemd[6] to allow futex2() calls in seccomp, everything worked as
         expected (web browsers do some syscall sandboxing and need some
         configuration as well).
      
       - perf: The perf benchmarks tests can also be used to stress the
         interface, and they can be found in this patchset.
      
       ** Performance
      
       - For comparing futex() and futex2() performance, I used the artificial
         benchmarks implemented in perf (wake, wake-parallel, hash and
         requeue). The setup was 200 runs for each test, using 8, 80, 800 and
         8000 threads. Note that for this test, I'm not using patch 14
         ("kernel: Enable waitpid() for futex2"), for reasons explained in
         "The patchset" section.

       - For the first three, I measured an average performance gain of 4%.
         This is not a big step, but it shows that the new interface is at
         least comparable in performance with the current one.

       - For requeue, I measured an average performance decrease of 21%
         compared to the original futex implementation. This is expected given
         the new design with individual flags. The performance trade-offs are
         explained in patch 4 ("futex2: Implement requeue operation").
      
      [4] https://gitlab.collabora.com/tonyk/glibc/-/tree/futex2
      [5] https://gitlab.collabora.com/tonyk/wine/-/tree/proton_5.13
      [6] https://gitlab.collabora.com/tonyk/systemd
      
      * FAQ
      
       ** "Where's the code for NUMA and FUTEX_8/16/64?"
      
       The current code is already complex enough to take some time to
       review, so I believe it's better to split that work out into a future
       iteration of this patchset. Besides that, this RFC is the core part of
       the infrastructure, and the remaining features will not require big
       design changes to it; the work will be more about wiring up the flags
       and modifying some functions.
      
       ** "Where's the PI/robust stuff?"
      
       As said by Peter Zijlstra at [3], all those new features are related to
       the "simple" futex interface, which doesn't use PI or robust futexes. Do
       we want to have this complexity in futex2() and if so, should it be part
       of this patchset or can it be future work?
      
      Thanks,
      	André
      
      * Changelog
      
      Changes from v2:
      - API now supports 64bit futexes, in addition to 8, 16 and 32.
      - Refactored futex2_wait and futex2_waitv selftests
      v2: https://lore.kernel.org/lkml/20210304004219.134051-1-andrealmeid@collabora.com/
      
      Changes from v1:
      - Unified futex_set_timer_and_wait and __futex_wait code
      - Dropped _careful from linked list function calls
      - Fixed typos in docs patch
      - uAPI flags are now added as features are introduced, instead of all flags
        in patch 1
      - Removed struct futex_single_waiter in favor of an anon struct
      v1: https://lore.kernel.org/lkml/20210215152404.250281-1-andrealmeid@collabora.com/
  2. Apr 19, 2021
  3. Apr 18, 2021
    • Linux 5.12-rc8 · bf05bf16
      Linus Torvalds authored
    • Merge tag 'arm-fixes-5.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc · 5ffe04cc
      Linus Torvalds authored
      Pull ARM SoC fixes from Arnd Bergmann:
       "Another smaller set of fixes for three of the Arm platforms:
      
        TI OMAP:
      
           Fix swapped mmc device order also for omap3 that got changed with
           the recent PROBE_PREFER_ASYNCHRONOUS changes. While eventually the
           aliases should be board specific, all the mmc device instances are
           all there in the SoC, and we do probe them by default so that PM
           runtime can idle the devices if left enabled from the bootloader.
      
        Qualcomm Snapdragon:
      
           This bypasses the recently introduced interconnect handling in
           the GENI (serial engine) driver when running off ACPI, as this
           causes the GENI probe to fail and the Lenovo Yoga C630 to boot
           without keyboard and touchpad.
      
        Allwinner:
      
           One 32kHz clock fix for the beelink gs1, a CD polarity fix for the
           SoPine, some MAINTAINERS maintenance, and a clk / reset switch to
           our headers"
      
      * tag 'arm-fixes-5.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
        arm64: dts: allwinner: h6: beelink-gs1: Remove ext. 32 kHz osc reference
        MAINTAINERS: Match on allwinner keyword
        MAINTAINERS: Add our new mailing-list
        arm64: dts: allwinner: Fix SD card CD GPIO for SOPine systems
        arm64: dts: allwinner: h6: Switch to macros for RSB clock/reset indices
        ARM: OMAP2+: Fix uninitialized sr_inst
        ARM: dts: Fix swapped mmc order for omap3
        ARM: OMAP2+: Fix warning for omap_init_time_of()
        soc: qcom: geni: shield geni_icc_get() for ACPI boot
    • Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm · f5ce0466
      Linus Torvalds authored
      Pull ARM fixes from Russell King:
      
       - Halve maximum number of CPUs if DEBUG_KMAP_LOCAL is enabled
      
       - Fix conversion for_each_membock() to for_each_mem_range()
      
       - Fix footbridge PCI mapping
      
       - Avoid uprobes hooking on thumb instructions
      
      * tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
        ARM: 9071/1: uprobes: Don't hook on thumb instructions
        ARM: footbridge: fix PCI interrupt mapping
        ARM: 9069/1: NOMMU: Fix conversion for_each_membock() to for_each_mem_range()
        ARM: 9063/1: mm: reduce maximum number of CPUs if DEBUG_KMAP_LOCAL is enabled
    • ARM: 9071/1: uprobes: Don't hook on thumb instructions · d2f7eca6
      Fredrik Strupe authored
      
      Since uprobes is not supported for thumb, check that the thumb bit is
      not set when matching the uprobes instruction hooks.
      
      The Arm UDF instructions used for uprobes triggering
      (UPROBE_SWBP_ARM_INSN and UPROBE_SS_ARM_INSN) coincidentally share the
      same encoding as a pair of unallocated 32-bit thumb instructions (not
      UDF) when the condition code is 0b1111 (0xf). This in effect makes it
      possible to trigger the uprobes functionality from thumb, and at that
      using two unallocated instructions which are not permanently undefined.
      
      Signed-off-by: Fredrik Strupe <fredrik@strupe.net>
      Cc: stable@vger.kernel.org
      Fixes: c7edc9e3 ("ARM: add uprobes support")
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · c98ff1d0
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Two fixes: the libsas fix is for a problem that occurs when trying to
        change the cache type of an ATA device and the libiscsi one is a
        regression fix from this merge window"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: libsas: Reset num_scatter if libata marks qc as NODATA
        scsi: iscsi: Fix iSCSI cls conn state
    • Merge tag 'drm-fixes-2021-04-18' of git://anongit.freedesktop.org/drm/drm · aba5970c
      Linus Torvalds authored
      Pull vmwgfx fixes from Dave Airlie:
       "This contains two regression fixes for vmwgfx, one due to a refactor
        which meant locks were being used before initialisation, and the other
        in fixing up some warnings from the core when destroying pinned
        buffers.
      
        vmwgfx:
      
         - fixed unpinning before destruction
      
         - lockdep init reordering"
      
      * tag 'drm-fixes-2021-04-18' of git://anongit.freedesktop.org/drm/drm:
        drm/vmwgfx: Make sure bo's are unpinned before putting them back
        drm/vmwgfx: Fix the lockdep breakage
        drm/vmwgfx: Make sure we unpin no longer needed buffers
  4. Apr 17, 2021
    • Merge tag 'vmwgfx-fixes-2021-04-14' of gitlab.freedesktop.org:zack/vmwgfx into drm-fixes · 796b556c
      Dave Airlie authored
      
      vmwgfx fixes for regressions in 5.12
      
      Here's a set of 3 patches fixing ugly regressions
      in the vmwgfx driver. We broke lock initialization
      code and ended up using spinlocks before initialization
      breaking lockdep.
      Also there was a bit of a fallout from drm changes
      which made the core validate that unreferenced buffers
      have been unpinned. vmwgfx pinning code predates a lot
      of the core drm and wasn't written to account for those
      semantics. Fortunately changes required to fix it
      are not too intrusive.
      The changes have been validated by our internal ci.
      
      Signed-off-by: Zack Rusin <zackr@vmware.com>
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      
      From: Zack Rusin <zackr@vmware.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/f7add0a2-162e-3bd2-b1be-344a94f2acbf@vmware.com
    • Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 194cf482
      Linus Torvalds authored
      Pull i2c fix from Wolfram Sang:
       "One more driver bugfix for I2C"
      
      * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: mv64xxx: Fix random system lock caused by runtime PM
    • readdir: make sure to verify directory entry for legacy interfaces too · 0c93ac69
      Linus Torvalds authored
      This does the directory entry name verification for the legacy
      "fillonedir" (and compat) interface that goes all the way back to the
      dark ages before we had a proper dirent, and the readdir() system call
      returned just a single entry at a time.
      
      Nobody should use this interface unless you still have binaries from
      1991, but let's do it right.
      
      This came up during discussions about unsafe_copy_to_user() and proper
      checking of all the inputs to it, as the networking layer is looking to
      use it in a few new places.  So let's make sure the _old_ users do it
      all right and proper, before we add new ones.
      
      See also commit 8a23eb80 ("Make filldir[64]() verify the directory
      entry filename is valid") which did the proper modern interfaces that
      people actually use. It had a note:
      
          Note that I didn't bother adding the checks to any legacy interfaces
          that nobody uses.
      
      which this now corrects.  Note that we really don't care about POSIX and
      the presence of '/' in a directory entry, but verify_dirent_name() also
      ends up doing the proper name length verification which is what the
      input checking discussion was about.
      
      [ Another option would be to remove the support for this particular very
        old interface: any binaries that use it are likely a.out binaries, and
        they will no longer run anyway since we removed a.out binfmt support
        in commit eac61655 ("x86: Deprecate a.out support").
      
        But I'm not sure which came first: getdents() or ELF support, so let's
        pretend somebody might still have a working binary that uses the
        legacy readdir() case. ]
      
      Link: https://lore.kernel.org/lkml/CAHk-=wjbvzCAhAtvG0d81W5o0-KT5PPTHhfJ5ieDFq+bGtgOYg@mail.gmail.com/
      
      
      Acked-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0c93ac69
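The name check described above can be sketched in user-space C. This is a minimal sketch, not the kernel's code: the function name check_dirent_name() is hypothetical, and the exact length bounds (reject a length of zero or less, or of PATH_MAX or more) plus the -EIO return value are assumptions modeled on the "proper name length verification" and '/' check the message mentions.

```c
#include <errno.h>
#include <limits.h>
#include <string.h>

#ifndef PATH_MAX
#define PATH_MAX 4096 /* Linux value; assumed if <limits.h> does not provide it */
#endif

/*
 * Hypothetical user-space model of the directory entry name checks:
 * a sane length and no '/' inside the name. Returns 0 if the name is
 * acceptable, -EIO otherwise (the error code is an assumption).
 */
static int check_dirent_name(const char *name, int len)
{
    /* Reject empty names and names too long to be a path component. */
    if (len <= 0 || len >= PATH_MAX)
        return -EIO;
    /* A single directory entry name must never contain a separator. */
    if (memchr(name, '/', (size_t)len) != NULL)
        return -EIO;
    return 0;
}
```

The length test runs first, so a bogus length is rejected before memchr() ever scans the buffer.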
    • Linus Torvalds
      Merge tag 'net-5.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 88a5af94
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Networking fixes for 5.12-rc8, including fixes from netfilter, and
        bpf. BPF verifier changes stand out; otherwise things have slowed
        down.
      
        Current release - regressions:
      
         - gro: ensure frag0 meets IP header alignment
      
         - Revert "net: stmmac: re-init rx buffers when mac resume back"
      
         - ethernet: macb: fix the restore of cmp registers
      
        Previous releases - regressions:
      
         - ixgbe: Fix NULL pointer dereference in ethtool loopback test
      
         - ixgbe: fix unbalanced device enable/disable in suspend/resume
      
         - phy: marvell: fix detection of PHY on Topaz switches
      
         - make tcp_allowed_congestion_control readonly in non-init netns
      
         - xen-netback: Check for hotplug-status existence before watching
      
        Previous releases - always broken:
      
         - bpf: mitigate a speculative oob read of up to map value size by
           tightening the masking window
      
         - sctp: fix race condition in sctp_destroy_sock
      
         - sit, ip6_tunnel: Unregister catch-all devices
      
         - netfilter: nftables: clone set element expression template
      
         - netfilter: flowtable: fix NAT IPv6 offload mangling
      
         - net: geneve: check skb is large enough for IPv4/IPv6 header
      
         - netlink: don't call ->netlink_bind with table lock held"
      
      * tag 'net-5.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (52 commits)
        netlink: don't call ->netlink_bind with table lock held
        MAINTAINERS: update my email
        bpf: Update selftests to reflect new error states
        bpf: Tighten speculative pointer arithmetic mask
        bpf: Move sanitize_val_alu out of op switch
        bpf: Refactor and streamline bounds check into helper
        bpf: Improve verifier error messages for users
        bpf: Rework ptr_limit into alu_limit and add common error path
        bpf: Ensure off_reg has no mixed signed bounds for all types
        bpf: Move off_reg into sanitize_ptr_alu
        bpf: Use correct permission flag for mixed signed bounds arithmetic
        ch_ktls: do not send snd_una update to TCB in middle
        ch_ktls: tcb close causes tls connection failure
        ch_ktls: fix device connection close
        ch_ktls: Fix kernel panic
        i40e: fix the panic when running bpf in xdpdrv mode
        net/mlx5e: fix ingress_ifindex check in mlx5e_flower_parse_meta
        net/mlx5e: Fix setting of RS FEC mode
        net/mlx5: Fix setting of devlink traps in switchdev mode
        Revert "net: stmmac: re-init rx buffers when mac resume back"
        ...
      88a5af94
    • Linus Torvalds
      Merge tag 'libnvdimm-fixes-for-5.12-rc8' of... · bdfd99e6
      Linus Torvalds authored
      Merge tag 'libnvdimm-fixes-for-5.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
      
      Pull libnvdimm fixes from Dan Williams:
       "The largest change is for a regression that landed during -rc1 for
        block-device read-only handling. Vaibhav found a new use for the
        ability (originally introduced by virtio_pmem) to call back to the
        platform to flush data, but also found an original bug in that
        implementation. Lastly, Arnd cleans up some compile warnings in dax.
      
        This has all appeared in -next with no reported issues.
      
        Summary:
      
         - Fix a regression of read-only handling in the pmem driver
      
         - Fix a compile warning
      
         - Fix support for platform cache flush commands on powerpc/papr"
      
      * tag 'libnvdimm-fixes-for-5.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        libnvdimm/region: Fix nvdimm_has_flush() to handle ND_REGION_ASYNC
        libnvdimm: Notify disk drivers to revalidate region read-only
        dax: avoid -Wempty-body warnings
      bdfd99e6
    • Linus Torvalds
      Merge tag 'cxl-fixes-for-5.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl · 7c226774
      Linus Torvalds authored
      Pull CXL memory class fixes from Dan Williams:
       "A collection of fixes for the CXL memory class driver introduced in
        this release cycle.
      
        The driver was primarily developed on a work-in-progress QEMU
        emulation of the interface and we have since found a couple places
        where it hid spec compliance bugs in the driver, or had a spec
        implementation bug itself.
      
        The biggest change here is replacing a percpu_ref with an rwsem to
        clean up a couple of bugs in the error unwind path during ioctl device
        init. Lastly there were some minor cleanups to not export the
        power-management sysfs-ABI for the ioctl device, use the proper sysfs
        helper for emitting values, and prevent subtle bugs as new
        administration commands are added to the supported list.
      
        The bulk of it has appeared in -next save for the top commit which was
        found today and validated on a fixed-up QEMU model.
      
        Summary:
      
         - Fix support for CXL memory devices with registers offset from the
           BAR base.
      
         - Fix the reporting of device capacity.
      
         - Fix the driver commands list definition to be disconnected from the
           UAPI command list.
      
         - Replace percpu_ref with rwsem to fix initialization error path.
      
         - Fix leaks in the driver initialization error path.
      
         - Drop the power/ directory from CXL device sysfs.
      
         - Use the recommended sysfs helper for attribute 'show'
           implementations"
      
      * tag 'cxl-fixes-for-5.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
        cxl/mem: Fix memory device capacity probing
        cxl/mem: Fix register block offset calculation
        cxl/mem: Force array size of mem_commands[] to CXL_MEM_COMMAND_ID_MAX
        cxl/mem: Disable cxl device power management
        cxl/mem: Do not rely on device_add() side effects for dev_set_name() failures
        cxl/mem: Fix synchronization mechanism for device removal vs ioctl operations
        cxl/mem: Use sysfs_emit() for attribute show routines
      7c226774
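The shortlog entry "Force array size of mem_commands[] to CXL_MEM_COMMAND_ID_MAX" points at a general C technique: size a designated-initializer table by the enum's maximum instead of letting the compiler infer the size from the last initialized entry. Below is a minimal sketch with hypothetical IDs and names (cmd_id, commands[], lookup_cmd()) rather than the driver's real definitions; that this is exactly the bug fixed in that commit is an assumption.

```c
#include <stddef.h>

/* Hypothetical command IDs standing in for a UAPI command list. */
enum cmd_id {
    CMD_INVALID,
    CMD_IDENTIFY,
    CMD_RAW,
    CMD_GET_LOG,   /* present in the ID list, not supported by this table */
    CMD_ID_MAX     /* one past the last valid ID */
};

struct cmd_info {
    const char *name;
};

/*
 * Sizing the table by CMD_ID_MAX means every valid ID indexes inside
 * the array; IDs without an initializer read as zeroed entries. Without
 * the explicit size the array would end at CMD_RAW, and a bounds check
 * against CMD_ID_MAX would still let CMD_GET_LOG index past the end.
 */
static const struct cmd_info commands[CMD_ID_MAX] = {
    [CMD_IDENTIFY] = { "identify" },
    [CMD_RAW]      = { "raw" },
};

/* Return the table entry for id, or NULL if it is unknown or unsupported. */
static const struct cmd_info *lookup_cmd(unsigned int id)
{
    if (id >= CMD_ID_MAX || commands[id].name == NULL)
        return NULL;
    return &commands[id];
}
```

Adding a new ID before CMD_ID_MAX automatically grows the table, so the bounds check and the array size cannot drift apart.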
    • Linus Torvalds
      Merge branch 'akpm' (patches from Andrew) · fdb5d6ca
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "12 patches.
      
        Subsystems affected by this patch series: mm (documentation, kasan,
        and pagemap), csky, ia64, gcov, and lib"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        lib: remove "expecting prototype" kernel-doc warnings
        gcov: clang: fix clang-11+ build
        mm: ptdump: fix build failure
        mm/mapping_dirty_helpers: guard hugepage pud's usage
        ia64: tools: remove duplicate definition of ia64_mf() on ia64
        ia64: tools: remove inclusion of ia64-specific version of errno.h header
        ia64: fix discontig.c section mismatches
        ia64: remove duplicate entries in generic_defconfig
        csky: change a Kconfig symbol name to fix e1000 build error
        kasan: remove redundant config option
        kasan: fix hwasan build for gcc
        mm: eliminate "expecting prototype" kernel-doc warnings
      fdb5d6ca
    • Dan Williams
      cxl/mem: Fix memory device capacity probing · fae8817a
      Dan Williams authored
      
      The CXL Identify Memory Device output payload emits capacity in 256MB
      units. The driver is treating the capacity field as bytes. This was
      missed because QEMU reports bytes when it should report bytes / 256MB.
      
      Fixes: 8adaf747 ("cxl/mem: Find device capabilities")
      Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Ben Widawsky <ben.widawsky@intel.com>
      Link: https://lore.kernel.org/r/161862021044.3259705.7008520073059739760.stgit@dwillia2-desk3.amr.corp.intel.com
      
      
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      fae8817a
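The unit mismatch is easiest to see in arithmetic: a raw capacity field of 4 means 4 * 256 MiB = 1 GiB, not 4 bytes. A minimal C sketch of the conversion follows. It assumes "256MB" means 256 MiB (2^28 bytes), and the names CAPACITY_UNIT_BYTES and capacity_units_to_bytes() are hypothetical, not the driver's.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: "256MB" units are 256 MiB, i.e. 2^28 bytes. Hypothetical name. */
#define CAPACITY_UNIT_BYTES (UINT64_C(256) * 1024 * 1024)

/*
 * Convert a capacity field expressed in 256 MiB units to bytes.
 * Returns false, leaving *bytes untouched, if the result would not
 * fit in 64 bits.
 */
static bool capacity_units_to_bytes(uint64_t units, uint64_t *bytes)
{
    if (units > UINT64_MAX / CAPACITY_UNIT_BYTES)
        return false;
    *bytes = units * CAPACITY_UNIT_BYTES;
    return true;
}
```

Treating the field as bytes, as the original driver did, under-reports capacity by a factor of 2^28.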
    • Florian Westphal
      netlink: don't call ->netlink_bind with table lock held · f2764bd4
      Florian Westphal authored
      When I added support to allow generic netlink multicast groups to be
      restricted to subscribers with CAP_NET_ADMIN I was unaware that a
      genl_bind implementation already existed in the past.
      
      It was reverted due to ABBA deadlock:
      
      1. ->netlink_bind gets called with the table lock held.
      2. genetlink bind callback is invoked, it grabs the genl lock.
      
      But when a new genl subsystem is (un)registered, these two locks are
      taken in reverse order.
      
      One solution would be to revert again and add a comment in genl
      referring to commit 1e82a62f ("genetlink: remove genl_bind").
      
      This would need a second change in mptcp to not expose the raw token
      value anymore, e.g. by hashing the token with a secret key so userspace
      can still associate subflow events with the correct mptcp connection.
      
      However, Paolo Abeni reminded me to double-check why the netlink table is
      locked in the first place.
      
      I can't find one.  netlink_bind() is already called without this lock
      when userspace joins a group via NETLINK_ADD_MEMBERSHIP setsockopt.
      Same holds for the netlink_unbind operation.
      
      Digging through the history, commit f7736080
      ("netlink: access nlk groups safely in netlink bind and getname")
      expanded the lock scope.
      
      commit 3a20773b ("net: netlink: cap max groups which will be considered in netlink_bind()")
      ... removed the nlk->ngroups access that the lock scope
      extension was all about.
      
      Reduce the lock scope again and always call ->netlink_bind without
      the table lock.
      
      The Fixes tag should be vs. the patch mentioned in the link below,
      but that one got squash-merged into the patch that came earlier in the
      series.
      
      Fixes: 4d54cc32 ("mptcp: avoid lock_fast usage in accept path")
      Link: https://lore.kernel.org/mptcp/20210213000001.379332-8-mathew.j.martineau@linux.intel.com/T/#u
      
      
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Xin Long <lucien.xin@gmail.com>
      Cc: Johannes Berg <johannes.berg@intel.com>
      Cc: Sean Tranchetti <stranche@codeaurora.org>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f2764bd4
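The lock-ordering problem and its fix can be modeled in user space with two pthread mutexes. In this hypothetical sketch, table_lock stands in for the netlink table lock and genl_lock for the generic netlink lock; register_group() takes them in the registration order (genl_lock, then table_lock), and sock_bind() follows the fixed bind path, invoking the callback with no table lock held. None of these names are kernel APIs.

```c
#include <pthread.h>

/* Hypothetical stand-ins for the netlink table lock and the genl lock. */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t genl_lock = PTHREAD_MUTEX_INITIALIZER;

static int registered_groups;    /* protected by genl_lock */
static unsigned int joined_mask; /* protected by table_lock; at most 32 groups */

/* Bind callback: like the genl bind callback, it takes genl_lock. */
static int genl_bind_cb(int group)
{
    int err = 0;

    pthread_mutex_lock(&genl_lock);
    if (group < 0 || group >= 32 || group >= registered_groups)
        err = -1;
    pthread_mutex_unlock(&genl_lock);
    return err;
}

/* Registration path: genl_lock first, then table_lock. */
static void register_group(void)
{
    pthread_mutex_lock(&genl_lock);
    pthread_mutex_lock(&table_lock);
    registered_groups++;
    pthread_mutex_unlock(&table_lock);
    pthread_mutex_unlock(&genl_lock);
}

/*
 * Fixed bind path: the callback runs without table_lock, so this path
 * never holds both locks at once and cannot form an ABBA cycle with
 * register_group().
 */
static int sock_bind(int group)
{
    int err = genl_bind_cb(group);

    if (err)
        return err;
    pthread_mutex_lock(&table_lock);
    joined_mask |= 1u << group;
    pthread_mutex_unlock(&table_lock);
    return 0;
}
```

Had sock_bind() held table_lock across genl_bind_cb(), one thread could hold table_lock while waiting for genl_lock as register_group() held genl_lock while waiting for table_lock: the ABBA deadlock the message describes.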
  5. Apr 16, 2021