- Apr 27, 2021
-
-
André Almeida authored
To make pthreads works as expected if they are using futex2, wake clear_child_tid with futex2 as well. This is make applications that uses waitpid() (and clone(CLONE_CHILD_SETTID)) wake while waiting for the child to terminate. Given that apps should not mix futex() and futex2(), any correct app will trigger a harmless noop wakeup on the interface that it isn't using. Signed-off-by: André Almeida <andrealmeid@collabora.com> --- This commit is here for the intend to show what we need to do in order to get a full NPTL working on top of futex2. It should be merged after we talk to glibc folks on the details around the futex_wait() side. For instance, we could use this as an opportunity to use private futexes or 8bit sized futexes, but both sides need to use the exactly same flags.
-
André Almeida authored
Add support at the existing futex benchmarking code base to enable futex2 calls. `perf bench` tests can be used not only as a way to measure the performance of implementation, but also as stress testing for the kernel infrastructure. Signed-off-by: André Almeida <andrealmeid@collabora.com>
-
André Almeida authored
Add testing for futex_requeue(). The first test just requeue from one waiter to another one, and wake it. The second performs both wake and requeue, and we check return values to see if the operation woke/requeued the expected number of waiters. Signed-off-by: André Almeida <andrealmeid@collabora.com>
-
André Almeida authored
Create a new file to test the waitv mechanism. Test both private and shared futexes. Wake the last futex in the array, and check if the return value from futex_waitv() is the right index. Signed-off-by: André Almeida <andrealmeid@collabora.com>
-
André Almeida authored
Adapt existing futex wait wouldblock file to test the same mechanism for futex2. Signed-off-by: André Almeida <andrealmeid@collabora.com>
-
André Almeida authored
Adapt existing futex wait timeout file to test the same mechanism for futex2. futex2 accepts only absolute 64bit timers, but supports both monotonic and realtime clocks. Signed-off-by: André Almeida <andrealmeid@collabora.com>
-
André Almeida authored
Add a simple file to test wake/wait mechanism using futex2 interface. Test three scenarios: using a common local int variable as private futex, a shm futex as shared futex and a file-backed shared memory as a shared futex. This should test all branches of futex_get_key(). Create helper files so more tests can evaluate futex2. While 32bit ABIs from glibc aren't yet able to use 64 bit sized time variables, add a temporary workaround that implements the required types and calls the appropriated syscalls, since futex2 doesn't supports 32 bit sized time. Signed-off-by: André Almeida <andrealmeid@collabora.com>
-
André Almeida authored
Add a new documentation file specifying both userspace API and internal implementation details of futex2 syscalls. Signed-off-by: André Almeida <andrealmeid@collabora.com>
-
André Almeida authored
New syscalls should use the same entry point for x86_64 and x86_x32 paths. Add a wrapper for x32 calls to use parse functions that assumes 32bit pointers. Signed-off-by: André Almeida <andrealmeid@collabora.com>
-
André Almeida authored
Implement requeue interface similarly to FUTEX_CMP_REQUEUE operation. This is the syscall implemented by this patch: futex_requeue(struct futex_requeue *uaddr1, struct futex_requeue *uaddr2, unsigned int nr_wake, unsigned int nr_requeue, u64 cmpval, unsigned int flags) struct futex_requeue { void *uaddr; unsigned int flags; }; If (uaddr1->uaddr == cmpval), wake at uaddr1->uaddr a nr_wake number of waiters and then, remove a number of nr_requeue waiters at uaddr1->uaddr and add them to uaddr2->uaddr list. Each uaddr has its own set of flags, that must be defined at struct futex_requeue (such as size, shared, NUMA). The flags argument of the syscall is there just for the sake of extensibility, and right now it needs to be zero. Return the number of the woken futexes + the number of requeued ones on success, error code otherwise. Signed-off-by: André Almeida <andrealmeid@collabora.com> --- The original FUTEX_CMP_REQUEUE interfaces is such as follows: futex(*uaddr1, FUTEX_CMP_REQUEUE, nr_wake, nr_requeue, *uaddr2, cmpval); Given that when this interface was created they was only one type of futex (as opposed to futex2, where there is shared, sizes, and NUMA), there was no way to specify individual flags for uaddr1 and 2. When FUTEX_PRIVATE was implemented, a new opcode was created as well (FUTEX_CMP_REQUEUE_PRIVATE), but they apply both futexes, so they should be of the same type regarding private/shared. This imposes a limitation on the use cases of the operation, and to overcome that at futex2, `struct futex_requeue` was created, so one can set individual flags for each futex. This flexibility is a trade-off with performance, given that now we need to perform two extra copy_from_user(). One alternative would be to use the upper half of flags bits to the first one, and the bottom half for the second futex, but this would also impose limitations, given that we would limit by half the flags possibilities. If equal futexes are common enough, the following extension could be added to overcome the current performance: - A flag FUTEX_REQUEUE_EQUAL is added to futex2() flags; - If futex_requeue() see this flag, that means that both futexes uses the same set of attributes. - Then, the function parses the flags as of futex_wait/wake(). - *uaddr1 and *uaddr2 are used as void* (instead of struct futex_requeue) just like wait/wake(). In that way, we could avoid the copy_from_user().
-
André Almeida authored
Add support to wait on multiple futexes. This is the interface implemented by this syscall: futex_waitv(struct futex_waitv *waiters, unsigned int nr_futexes, unsigned int flags, struct timespec *timo) struct futex_waitv { __u64 val; void *uaddr; unsigned int flags; }; Given an array of struct futex_waitv, wait on each uaddr. The thread wakes if a futex_wake() is performed at any uaddr. The syscall returns immediately if any waiter has *uaddr != val. *timo is an optional timeout value for the operation. The flags argument of the syscall should be used solely for specifying the timeout as realtime, if needed. Flags for shared futexes, sizes, etc. should be used on the individual flags of each waiter. Returns the array index of one of the awakened futexes. There’s no given information of how many were awakened, or any particular attribute of it (if it’s the first awakened, if it is of the smaller index...). Signed-off-by: André Almeida <andrealmeid@collabora.com>
-
André Almeida authored
Add support for shared futexes for cross-process resources. This design relies on the same approach done in old futex to create an unique id for file-backed shared memory, by using a counter at struct inode. There are two types of futexes: private and shared ones. The private are futexes meant to be used by threads that shares the same memory space, are easier to be uniquely identified an thus can have some performance optimization. The elements for identifying one are: the start address of the page where the address is, the address offset within the page and the current->mm pointer. Now, for uniquely identifying shared futex: - If the page containing the user address is an anonymous page, we can just use the same data used for private futexes (the start address of the page, the address offset within the page and the current->mm pointer) that will be enough for uniquely identifying such futex. We also set one bit at the key to differentiate if a private futex is used on the same address (mixing shared and private calls are not allowed). - If the page is file-backed, current->mm maybe isn't the same one for every user of this futex, so we need to use other data: the page->index, an UUID for the struct inode and the offset within the page. Note that members of futex_key doesn't have any particular meaning after they are part of the struct - they are just bytes to identify a futex. Given that, we don't need to use a particular name or type that matches the original data, we only need to care about the bitsize of each component and make both private and shared data fit in the same memory space. Signed-off-by: André Almeida <andrealmeid@collabora.com>
-
André Almeida authored
Create a new set of futex syscalls known as futex2. This new interface is aimed to implement a more maintainable code, while removing obsolete features and expanding it with new functionalities. Implements wait and wake semantics for futexes, along with the base infrastructure for future operations. The whole wait path is designed to be used by N waiters, thus making easier to implement vectorized wait. * Syscalls implemented by this patch: - futex_wait(void *uaddr, unsigned int val, unsigned int flags, struct timespec *timo) The user thread is put to sleep, waiting for a futex_wake() at uaddr, if the value at *uaddr is the same as val (otherwise, the syscall returns immediately with -EAGAIN). timo is an optional timeout value for the operation. Return 0 on success, error code otherwise. - futex_wake(void *uaddr, unsigned long nr_wake, unsigned int flags) Wake `nr_wake` threads waiting at uaddr. Return the number of woken threads on success, error code otherwise. ** The `flag` argument The flag is used to specify the size of the futex word (FUTEX_[8, 16, 32, 64]). It's mandatory to define one, since there's no default size. By default, the timeout uses a monotonic clock, but can be used as a realtime one by using the FUTEX_REALTIME_CLOCK flag. By default, futexes are of the private type, that means that this user address will be accessed by threads that shares the same memory region. This allows for some internal optimizations, so they are faster. However, if the address needs to be shared with different processes (like using `mmap()` or `shm()`), they need to be defined as shared and the flag FUTEX_SHARED_FLAG is used to set that. By default, the operation has no NUMA-awareness, meaning that the user can't choose the memory node where the kernel side futex data will be stored. The user can choose the node where it wants to operate by setting the FUTEX_NUMA_FLAG and using the following structure (where X can be 8, 16, or 32, 64): struct futexX_numa { __uX value; __sX hint; }; This structure should be passed at the `void *uaddr` of futex functions. The address of the structure will be used to be waited/waken on, and the `value` will be compared to `val` as usual. The `hint` member is used to defined which node the futex will use. When waiting, the futex will be registered on a kernel-side table stored on that node; when waking, the futex will be searched for on that given table. That means that there's no redundancy between tables, and the wrong `hint` value will led to undesired behavior. Userspace is responsible for dealing with node migrations issues that may occur. `hint` can range from [0, MAX_NUMA_NODES], for specifying a node, or -1, to use the same node the current process is using. When not using FUTEX_NUMA_FLAG on a NUMA system, the futex will be stored on a global table on some node, defined at compilation time. ** The `timo` argument As per the Y2038 work done in the kernel, new interfaces shouldn't add timeout options known to be buggy. Given that, `timo` should be a 64bit timeout at all platforms, using an absolute timeout value. Signed-off-by: André Almeida <andrealmeid@collabora.com> --- [RFC Add futex2 syscall 0/0] Hi, This patch series introduces the futex2 syscalls. * What happened to the current futex()? For some years now, developers have been trying to add new features to futex, but maintainers have been reluctant to accept then, given the multiplexed interface full of legacy features and tricky to do big changes. Some problems that people tried to address with patchsets are: NUMA-awareness[0], smaller sized futexes[1], wait on multiple futexes[2]. NUMA, for instance, just doesn't fit the current API in a reasonable way. Considering that, it's not possible to merge new features into the current futex. ** The NUMA problem At the current implementation, all futex kernel side infrastructure is stored on a single node. Given that, all futex() calls issued by processors that aren't located on that node will have a memory access penalty when doing it. ** The 32bit sized futex problem Embedded systems or anything with memory constrains would benefit of using smaller sizes for the futex userspace integer. Also, a mutex implementation can be done using just three values, so 8 bits is enough for various scenarios. ** The wait on multiple problem The use case lies in the Wine implementation of the Windows NT interface WaitMultipleObjects. This Windows API function allows a thread to sleep waiting on the first of a set of event sources (mutexes, timers, signal, console input, etc) to signal. Considering this is a primitive synchronization operation for Windows applications, being able to quickly signal events on the producer side, and quickly go to sleep on the consumer side is essential for good performance of those running over Wine. [0] https://lore.kernel.org/lkml/20160505204230.932454245@linutronix.de/ [1] https://lore.kernel.org/lkml/20191221155659.3159-2-malteskarupke@web.de/ [2] https://lore.kernel.org/lkml/20200213214525.183689-1-andrealmeid@collabora.com/ * The solution As proposed by Peter Zijlstra and Florian Weimer[3], a new interface is required to solve this, which must be designed with those features in mind. futex2() is that interface. As opposed to the current multiplexed interface, the new one should have one syscall per operation. This will allow the maintainability of the API if it gets extended, and will help users with type checking of arguments. In particular, the new interface is extended to support the ability to wait on any of a list of futexes at a time, which could be seen as a vectored extension of the FUTEX_WAIT semantics. [3] https://lore.kernel.org/lkml/20200303120050.GC2596@hirez.programming.kicks-ass.net/ * The interface The new interface can be seen in details in the following patches, but this is a high level summary of what the interface can do: - Supports wake/wait semantics, as in futex() - Supports requeue operations, similarly as FUTEX_CMP_REQUEUE, but with individual flags for each address - Supports waiting for a vector of futexes, using a new syscall named futex_waitv() - Supports variable sized futexes (8bits, 16bits, 32bits and 64bits) - Supports NUMA-awareness operations, where the user can specify on which memory node would like to operate * Implementation The internal implementation follows a similar design to the original futex. Given that we want to replicate the same external behavior of current futex, this should be somewhat expected. For some functions, like the init and the code to get a shared key, I literally copied code and comments from kernel/futex.c. I decided to do so instead of exposing the original function as a public function since in that way we can freely modify our implementation if required, without any impact on old futex. Also, the comments precisely describes the details and corner cases of the implementation. Each patch contains a brief description of implementation, but patch 6 "docs: locking: futex2: Add documentation" adds a more complete document about it. * The patchset This patchset can be also found at my git tree: https://gitlab.collabora.com/tonyk/linux/-/tree/futex2-dev - Patch 1: Implements wait/wake, and the basics foundations of futex2 - Patches 2-4: Implement the remaining features (shared, waitv, requeue). - Patch 5: Adds the x86_x32 ABI handling. I kept it in a separated patch since I'm not sure if x86_x32 is still a thing, or if it should return -ENOSYS. - Patch 6: Add a documentation file which details the interface and the internal implementation. - Patches 7-13: Selftests for all operations along with perf support for futex2. - Patch 14: While working on porting glibc for futex2, I found out that there's a futex_wake() call at the user thread exit path, if that thread was created with clone(..., CLONE_CHILD_SETTID, ...). In order to make pthreads work with futex2, it was required to add this patch. Note that this is more a proof-of-concept of what we will need to do in future, rather than part of the interface and shouldn't be merged as it is. * Testing: This patchset provides selftests for each operation and their flags. Along with that, the following work was done: ** Stability To stress the interface in "real world scenarios": - glibc[4]: nptl's low level locking was modified to use futex2 API (except for robust and PI things). All relevant nptl/ tests passed. - Wine[5]: Proton/Wine was modified in order to use futex2() for the emulation of Windows NT sync mechanisms based on futex, called "fsync". Triple-A games with huge CPU's loads and tons of parallel jobs worked as expected when compared with the previous FUTEX_WAIT_MULTIPLE implementation at futex(). Some games issue 42k futex2() calls per second. - Full GNU/Linux distro: I installed the modified glibc in my host machine, so all pthread's programs would use futex2(). After tweaking systemd[6] to allow futex2() calls at seccomp, everything worked as expected (web browsers do some syscall sandboxing and need some configuration as well). - perf: The perf benchmarks tests can also be used to stress the interface, and they can be found in this patchset. ** Performance - For comparing futex() and futex2() performance, I used the artificial benchmarks implemented at perf (wake, wake-parallel, hash and requeue). The setup was 200 runs for each test and using 8, 80, 800, 8000 for the number of threads, Note that for this test, I'm not using patch 14 ("kernel: Enable waitpid() for futex2") , for reasons explained at "The patchset" section. - For the first three ones, I measured an average of 4% gain in performance. This is not a big step, but it shows that the new interface is at least comparable in performance with the current one. - For requeue, I measured an average of 21% decrease in performance compared to the original futex implementation. This is expected given the new design with individual flags. The performance trade-offs are explained at patch 4 ("futex2: Implement requeue operation"). [4] https://gitlab.collabora.com/tonyk/glibc/-/tree/futex2 [5] https://gitlab.collabora.com/tonyk/wine/-/tree/proton_5.13 [6] https://gitlab.collabora.com/tonyk/systemd * FAQ ** "Where's the code for NUMA and FUTEX_8/16/64?" The current code is already complex enough to take some time for review, so I believe it's better to split that work out to a future iteration of this patchset. Besides that, this RFC is the core part of the infrastructure, and the following features will not pose big design changes to it, the work will be more about wiring up the flags and modifying some functions. ** "Where's the PI/robust stuff?" As said by Peter Zijlstra at [3], all those new features are related to the "simple" futex interface, that doesn't use PI or robust. Do we want to have this complexity at futex2() and if so, should it be part of this patchset or can it be future work? Thanks, André * Changelog Changes from v2: - API now supports 64bit futexes, in addition to 8, 16 and 32. - Refactored futex2_wait and futex2_waitv selftests v2: https://lore.kernel.org/lkml/20210304004219.134051-1-andrealmeid@collabora.com/ Changes from v1: - Unified futex_set_timer_and_wait and __futex_wait code - Dropped _carefull from linked list function calls - Fixed typos on docs patch - uAPI flags are now added as features are introduced, instead of all flags in patch 1 - Removed struct futex_single_waiter in favor of an anon struct v1: https://lore.kernel.org/lkml/20210215152404.250281-1-andrealmeid@collabora.com/
-
- Apr 19, 2021
-
-
Linus Torvalds authored
This reverts commit 04c53de5. Nathan Chancellor points out that it should not have been merged into mainline by itself. It was a fix for "gcov: use kvmalloc()", which is still in -mm/-next. Merging it alone has broken the build. Link: https://github.com/ClangBuiltLinux/continuous-integration2/runs/2384465683?check_suite_focus=true Reported-by: Nathan Chancellor <nathan@kernel.org> Cc: Johannes Berg <johannes.berg@intel.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- Apr 18, 2021
-
-
Linus Torvalds authored
-
git://git.kernel.org/pub/scm/linux/kernel/git/soc/socLinus Torvalds authored
Pull ARM SoC fixes from Arnd Bergmann: "Another smaller set of fixes for three of the Arm platforms: TI OMAP: Fix swapped mmc device order also for omap3 that got changed with the recent PROBE_PREFER_ASYNCHRONOUS changes. While eventually the aliases should be board specific, all the mmc device instances are all there in the SoC, and we do probe them by default so that PM runtime can idle the devices if left enabled from the bootloader. Qualcomm Snapdragon: This bypasses the recently introduced interconnect handling in the GENI (serial engine) driver when running off ACPI, as this causes the GENI probe to fail and the Lenovo Yoga C630 to boot without keyboard and touchpad. Allwinner: One 32kHz clock fix for the beelink gs1, a CD polarity fix for the SoPine, some MAINTAINERS maintainance, and a clk / reset switch to our headers" * tag 'arm-fixes-5.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: arm64: dts: allwinner: h6: beelink-gs1: Remove ext. 32 kHz osc reference MAINTAINERS: Match on allwinner keyword MAINTAINERS: Add our new mailing-list arm64: dts: allwinner: Fix SD card CD GPIO for SOPine systems arm64: dts: allwinner: h6: Switch to macros for RSB clock/reset indices ARM: OMAP2+: Fix uninitialized sr_inst ARM: dts: Fix swapped mmc order for omap3 ARM: OMAP2+: Fix warning for omap_init_time_of() soc: qcom: geni: shield geni_icc_get() for ACPI boot
-
git://git.armlinux.org.uk/~rmk/linux-armLinus Torvalds authored
Pull ARM fixes from Russell King: - Halve maximum number of CPUs if DEBUG_KMAP_LOCAL is enabled - Fix conversion for_each_membock() to for_each_mem_range() - Fix footbridge PCI mapping - Avoid uprobes hooking on thumb instructions * tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: ARM: 9071/1: uprobes: Don't hook on thumb instructions ARM: footbridge: fix PCI interrupt mapping ARM: 9069/1: NOMMU: Fix conversion for_each_membock() to for_each_mem_range() ARM: 9063/1: mm: reduce maximum number of CPUs if DEBUG_KMAP_LOCAL is enabled
-
Fredrik Strupe authored
Since uprobes is not supported for thumb, check that the thumb bit is not set when matching the uprobes instruction hooks. The Arm UDF instructions used for uprobes triggering (UPROBE_SWBP_ARM_INSN and UPROBE_SS_ARM_INSN) coincidentally share the same encoding as a pair of unallocated 32-bit thumb instructions (not UDF) when the condition code is 0b1111 (0xf). This in effect makes it possible to trigger the uprobes functionality from thumb, and at that using two unallocated instructions which are not permanently undefined. Signed-off-by: Fredrik Strupe <fredrik@strupe.net> Cc: stable@vger.kernel.org Fixes: c7edc9e3 ("ARM: add uprobes support") Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
-
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds authored
Pull SCSI fixes from James Bottomley: "Two fixes: the libsas fix is for a problem that occurs when trying to change the cache type of an ATA device and the libiscsi one is a regression fix from this merge window" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: libsas: Reset num_scatter if libata marks qc as NODATA scsi: iscsi: Fix iSCSI cls conn state
-
git://anongit.freedesktop.org/drm/drmLinus Torvalds authored
Pull vmwgfx fixes from Dave Airlie: "This contains two regression fixes for vmwgfx, one due to a refactor which meant locks were being used before initialisation, and the other in fixing up some warnings from the core when destroying pinned buffers. vmwgfx: - fixed unpinning before destruction - lockdep init reordering" * tag 'drm-fixes-2021-04-18' of git://anongit.freedesktop.org/drm/drm: drm/vmwgfx: Make sure bo's are unpinned before putting them back drm/vmwgfx: Fix the lockdep breakage drm/vmwgfx: Make sure we unpin no longer needed buffers
-
- Apr 17, 2021
-
-
Dave Airlie authored
vmwgfx fixes for regressions in 5.12 Here's a set of 3 patches fixing ugly regressions in the vmwgfx driver. We broke lock initialization code and ended up using spinlocks before initialization breaking lockdep. Also there was a bit of a fallout from drm changes which made the core validate that unreferenced buffers have been unpinned. vmwgfx pinning code predates a lot of the core drm and wasn't written to account for those semantics. Fortunately changes required to fix it are not too intrusive. The changes have been validated by our internal ci. Signed-off-by: Zack Rusin <zackr@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com> From: Zack Rusin <zackr@vmware.com> Link: https://patchwork.freedesktop.org/patch/msgid/f7add0a2-162e-3bd2-b1be-344a94f2acbf@vmware.com
-
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linuxLinus Torvalds authored
Pull i2c fix from Wolfram Sang: "One more driver bugfix for I2C" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: mv64xxx: Fix random system lock caused by runtime PM
-
Linus Torvalds authored
This does the directory entry name verification for the legacy "fillonedir" (and compat) interface that goes all the way back to the dark ages before we had a proper dirent, and the readdir() system call returned just a single entry at a time. Nobody should use this interface unless you still have binaries from 1991, but let's do it right. This came up during discussions about unsafe_copy_to_user() and proper checking of all the inputs to it, as the networking layer is looking to use it in a few new places. So let's make sure the _old_ users do it all right and proper, before we add new ones. See also commit 8a23eb80 ("Make filldir[64]() verify the directory entry filename is valid") which did the proper modern interfaces that people actually use. It had a note: Note that I didn't bother adding the checks to any legacy interfaces that nobody uses. which this now corrects. Note that we really don't care about POSIX and the presense of '/' in a directory entry, but verify_dirent_name() also ends up doing the proper name length verification which is what the input checking discussion was about. [ Another option would be to remove the support for this particular very old interface: any binaries that use it are likely a.out binaries, and they will no longer run anyway since we removed a.out binftm support in commit eac61655 ("x86: Deprecate a.out support"). But I'm not sure which came first: getdents() or ELF support, so let's pretend somebody might still have a working binary that uses the legacy readdir() case.. ] Link: https://lore.kernel.org/lkml/CAHk-=wjbvzCAhAtvG0d81W5o0-KT5PPTHhfJ5ieDFq+bGtgOYg@mail.gmail.com/ Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds authored
Pull networking fixes from Jakub Kicinski: "Networking fixes for 5.12-rc8, including fixes from netfilter, and bpf. BPF verifier changes stand out, otherwise things have slowed down. Current release - regressions: - gro: ensure frag0 meets IP header alignment - Revert "net: stmmac: re-init rx buffers when mac resume back" - ethernet: macb: fix the restore of cmp registers Previous releases - regressions: - ixgbe: Fix NULL pointer dereference in ethtool loopback test - ixgbe: fix unbalanced device enable/disable in suspend/resume - phy: marvell: fix detection of PHY on Topaz switches - make tcp_allowed_congestion_control readonly in non-init netns - xen-netback: Check for hotplug-status existence before watching Previous releases - always broken: - bpf: mitigate a speculative oob read of up to map value size by tightening the masking window - sctp: fix race condition in sctp_destroy_sock - sit, ip6_tunnel: Unregister catch-all devices - netfilter: nftables: clone set element expression template - netfilter: flowtable: fix NAT IPv6 offload mangling - net: geneve: check skb is large enough for IPv4/IPv6 header - netlink: don't call ->netlink_bind with table lock held" * tag 'net-5.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (52 commits) netlink: don't call ->netlink_bind with table lock held MAINTAINERS: update my email bpf: Update selftests to reflect new error states bpf: Tighten speculative pointer arithmetic mask bpf: Move sanitize_val_alu out of op switch bpf: Refactor and streamline bounds check into helper bpf: Improve verifier error messages for users bpf: Rework ptr_limit into alu_limit and add common error path bpf: Ensure off_reg has no mixed signed bounds for all types bpf: Move off_reg into sanitize_ptr_alu bpf: Use correct permission flag for mixed signed bounds arithmetic ch_ktls: do not send snd_una update to TCB in middle ch_ktls: tcb close causes tls connection failure ch_ktls: fix device connection close ch_ktls: Fix kernel panic i40e: fix the panic when running bpf in xdpdrv mode net/mlx5e: fix ingress_ifindex check in mlx5e_flower_parse_meta net/mlx5e: Fix setting of RS FEC mode net/mlx5: Fix setting of devlink traps in switchdev mode Revert "net: stmmac: re-init rx buffers when mac resume back" ...
-
Linus Torvalds authored
Merge tag 'libnvdimm-fixes-for-5.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm fixes from Dan Williams: "The largest change is for a regression that landed during -rc1 for block-device read-only handling. Vaibhav found a new use for the ability (originally introduced by virtio_pmem) to call back to the platform to flush data, but also found an original bug in that implementation. Lastly, Arnd cleans up some compile warnings in dax. This has all appeared in -next with no reported issues. Summary: - Fix a regression of read-only handling in the pmem driver - Fix a compile warning - Fix support for platform cache flush commands on powerpc/papr" * tag 'libnvdimm-fixes-for-5.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: libnvdimm/region: Fix nvdimm_has_flush() to handle ND_REGION_ASYNC libnvdimm: Notify disk drivers to revalidate region read-only dax: avoid -Wempty-body warnings
-
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxlLinus Torvalds authored
Pull CXL memory class fixes from Dan Williams: "A collection of fixes for the CXL memory class driver introduced in this release cycle. The driver was primarily developed on a work-in-progress QEMU emulation of the interface and we have since found a couple places where it hid spec compliance bugs in the driver, or had a spec implementation bug itself. The biggest change here is replacing a percpu_ref with an rwsem to cleanup a couple bugs in the error unwind path during ioctl device init. Lastly there were some minor cleanups to not export the power-management sysfs-ABI for the ioctl device, use the proper sysfs helper for emitting values, and prevent subtle bugs as new administration commands are added to the supported list. The bulk of it has appeared in -next save for the top commit which was found today and validated on a fixed-up QEMU model. Summary: - Fix support for CXL memory devices with registers offset from the BAR base. - Fix the reporting of device capacity. - Fix the driver commands list definition to be disconnected from the UAPI command list. - Replace percpu_ref with rwsem to fix initialization error path. - Fix leaks in the driver initialization error path. - Drop the power/ directory from CXL device sysfs. - Use the recommended sysfs helper for attribute 'show' implementations" * tag 'cxl-fixes-for-5.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: cxl/mem: Fix memory device capacity probing cxl/mem: Fix register block offset calculation cxl/mem: Force array size of mem_commands[] to CXL_MEM_COMMAND_ID_MAX cxl/mem: Disable cxl device power management cxl/mem: Do not rely on device_add() side effects for dev_set_name() failures cxl/mem: Fix synchronization mechanism for device removal vs ioctl operations cxl/mem: Use sysfs_emit() for attribute show routines
-
Linus Torvalds authored
Merge misc fixes from Andrew Morton: "12 patches. Subsystems affected by this patch series: mm (documentation, kasan, and pagemap), csky, ia64, gcov, and lib" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: lib: remove "expecting prototype" kernel-doc warnings gcov: clang: fix clang-11+ build mm: ptdump: fix build failure mm/mapping_dirty_helpers: guard hugepage pud's usage ia64: tools: remove duplicate definition of ia64_mf() on ia64 ia64: tools: remove inclusion of ia64-specific version of errno.h header ia64: fix discontig.c section mismatches ia64: remove duplicate entries in generic_defconfig csky: change a Kconfig symbol name to fix e1000 build error kasan: remove redundant config option kasan: fix hwasan build for gcc mm: eliminate "expecting prototype" kernel-doc warnings
-
Dan Williams authored
The CXL Identify Memory Device output payload emits capacity in 256MB units. The driver is treating the capacity field as bytes. This was missed because QEMU reports bytes when it should report bytes / 256MB. Fixes: 8adaf747 ("cxl/mem: Find device capabilities") Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Cc: Ben Widawsky <ben.widawsky@intel.com> Link: https://lore.kernel.org/r/161862021044.3259705.7008520073059739760.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-
Florian Westphal authored
When I added support to allow generic netlink multicast groups to be restricted to subscribers with CAP_NET_ADMIN I was unaware that a genl_bind implementation already existed in the past. It was reverted due to ABBA deadlock: 1. ->netlink_bind gets called with the table lock held. 2. genetlink bind callback is invoked, it grabs the genl lock. But when a new genl subsystem is (un)registered, these two locks are taken in reverse order. One solution would be to revert again and add a comment in genl referring 1e82a62f, "genetlink: remove genl_bind"). This would need a second change in mptcp to not expose the raw token value anymore, e.g. by hashing the token with a secret key so userspace can still associate subflow events with the correct mptcp connection. However, Paolo Abeni reminded me to double-check why the netlink table is locked in the first place. I can't find one. netlink_bind() is already called without this lock when userspace joins a group via NETLINK_ADD_MEMBERSHIP setsockopt. Same holds for the netlink_unbind operation. Digging through the history, commit f7736080 ("netlink: access nlk groups safely in netlink bind and getname") expanded the lock scope. commit 3a20773b ("net: netlink: cap max groups which will be considered in netlink_bind()") ... removed the nlk->ngroups access that the lock scope extension was all about. Reduce the lock scope again and always call ->netlink_bind without the table lock. The Fixes tag should be vs. the patch mentioned in the link below, but that one got squash-merged into the patch that came earlier in the series. Fixes: 4d54cc32 ("mptcp: avoid lock_fast usage in accept path") Link: https://lore.kernel.org/mptcp/20210213000001.379332-8-mathew.j.martineau@linux.intel.com/T/#u Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Xin Long <lucien.xin@gmail.com> Cc: Johannes Berg <johannes.berg@intel.com> Cc: Sean Tranchetti <stranche@codeaurora.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- Apr 16, 2021
-
-
git://git.kernel.dk/linux-blockLinus Torvalds authored
Pull io_uring fix from Jens Axboe: "Fix for a potential hang at exit with SQPOLL from Pavel" * tag 'io_uring-5.12-2021-04-16' of git://git.kernel.dk/linux-block: io_uring: fix early sqd_list removal sqpoll hangs
-
Randy Dunlap authored
Fix various kernel-doc warnings in lib/ due to missing or erroneous function names. Add kernel-doc for some function parameters that was missing. Use kernel-doc "Return:" notation in earlycpio.c. Quietens the following warnings: lib/earlycpio.c:61: warning: expecting prototype for cpio_data find_cpio_data(). Prototype was for find_cpio_data() instead lib/lru_cache.c:640: warning: expecting prototype for lc_dump(). Prototype was for lc_seq_dump_details() instead lru_cache.c:90: warning: Function parameter or member 'cache' not described in 'lc_create' lib/parman.c:368: warning: expecting prototype for parman_item_del(). Prototype was for parman_item_remove() instead parman.c:309: warning: Excess function parameter 'prority' description in 'parman_prio_init' lib/radix-tree.c:703: warning: expecting prototype for __radix_tree_insert(). Prototype was for radix_tree_insert() instead radix-tree.c:180: warning: Excess function parameter 'addr' description in 'radix_tree_find_next_bit' radix-tree.c:180: warning: Excess function parameter 'size' description in 'radix_tree_find_next_bit' radix-tree.c:931: warning: Function parameter or member 'iter' not described in 'radix_tree_iter_replace' Link: https://lkml.kernel.org/r/20210411221756.15461-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Lars Ellenberg <lars.ellenberg@linbit.com> Cc: Jiri Pirko <jiri@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Johannes Berg authored
With clang-11+, the code is broken due to my kvmalloc() conversion (which predated the clang-11 support code) leaving one vmalloc() in place. Fix that. Link: https://lkml.kernel.org/r/20210412214210.6e1ecca9cdc5.I24459763acf0591d5e6b31c7e3a59890d802f79c@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Christophe Leroy authored
READ_ONCE() cannot be used for reading PTEs. Use ptep_get() instead, to avoid the following errors: CC mm/ptdump.o In file included from <command-line>: mm/ptdump.c: In function 'ptdump_pte_entry': include/linux/compiler_types.h:320:38: error: call to '__compiletime_assert_207' declared with attribute error: Unsupported access size for {READ,WRITE}_ONCE(). 320 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) | ^ include/linux/compiler_types.h:301:4: note: in definition of macro '__compiletime_assert' 301 | prefix ## suffix(); \ | ^~~~~~ include/linux/compiler_types.h:320:2: note: in expansion of macro '_compiletime_assert' 320 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) | ^~~~~~~~~~~~~~~~~~~ include/asm-generic/rwonce.h:36:2: note: in expansion of macro 'compiletime_assert' 36 | compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \ | ^~~~~~~~~~~~~~~~~~ include/asm-generic/rwonce.h:49:2: note: in expansion of macro 'compiletime_assert_rwonce_type' 49 | compiletime_assert_rwonce_type(x); \ | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ mm/ptdump.c:114:14: note: in expansion of macro 'READ_ONCE' 114 | pte_t val = READ_ONCE(*pte); | ^~~~~~~~~ make[2]: *** [mm/ptdump.o] Error 1 See commit 481e980a ("mm: Allow arches to provide ptep_get()") and commit c0e1c8c2 ("powerpc/8xx: Provide ptep_get() with 16k pages") for details. Link: https://lkml.kernel.org/r/912b349e2bcaa88939904815ca0af945740c6bd4.1618478922.git.christophe.leroy@csgroup.eu Fixes: 30d621f6 ("mm: add generic ptdump") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Steven Price <steven.price@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Zack Rusin authored
Mapping dirty helpers have, so far, been only used on X86, but a port of vmwgfx to ARM64 exposed a problem which results in a compilation error on ARM64 systems: mm/mapping_dirty_helpers.c: In function `wp_clean_pud_entry': mm/mapping_dirty_helpers.c:172:32: error: implicit declaration of function `pud_dirty'; did you mean `pmd_dirty'? [-Werror=implicit-function-declaration] This is due to the fact that mapping_dirty_helpers code assumes that pud_dirty is always defined, which is not the case for architectures that don't define CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD. ARM64 arch is a little inconsistent when it comes to PUD hugepage helpers, e.g. it defines pud_young but not pud_dirty but regardless of that the core kernel code shouldn't assume that any of the PUD hugepage helpers are available unless CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD is defined. This prevents compilation errors whenever one of the drivers is ported to new architectures. Link: https://lkml.kernel.org/r/20210409165151.694574-1-zackr@vmware.com Signed-off-by: Zack Rusin <zackr@vmware.com> Reviewed-by: Thomas Hellstrm (Intel) <thomas_os@shipmail.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
John Paul Adrian Glaubitz authored
The ia64_mf() macro defined in tools/arch/ia64/include/asm/barrier.h is already defined in <asm/gcc_intrin.h> on ia64 which causes libbpf failing to build: CC /usr/src/linux/tools/bpf/bpftool//libbpf/staticobjs/libbpf.o In file included from /usr/src/linux/tools/include/asm/barrier.h:24, from /usr/src/linux/tools/include/linux/ring_buffer.h:4, from libbpf.c:37: /usr/src/linux/tools/include/asm/../../arch/ia64/include/asm/barrier.h:43: error: "ia64_mf" redefined [-Werror] 43 | #define ia64_mf() asm volatile ("mf" ::: "memory") | In file included from /usr/include/ia64-linux-gnu/asm/intrinsics.h:20, from /usr/include/ia64-linux-gnu/asm/swab.h:11, from /usr/include/linux/swab.h:8, from /usr/include/linux/byteorder/little_endian.h:13, from /usr/include/ia64-linux-gnu/asm/byteorder.h:5, from /usr/src/linux/tools/include/uapi/linux/perf_event.h:20, from libbpf.c:36: /usr/include/ia64-linux-gnu/asm/gcc_intrin.h:382: note: this is the location of the previous definition 382 | #define ia64_mf() __asm__ volatile ("mf" ::: "memory") | cc1: all warnings being treated as errors Thus, remove the definition from tools/arch/ia64/include/asm/barrier.h. Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
John Paul Adrian Glaubitz authored
There is no longer an ia64-specific version of the errno.h header below arch/ia64/include/uapi/asm/, so trying to build tools/bpf fails with: CC /usr/src/linux/tools/bpf/bpftool/btf_dumper.o In file included from /usr/src/linux/tools/include/linux/err.h:8, from btf_dumper.c:11: /usr/src/linux/tools/include/uapi/asm/errno.h:13:10: fatal error: ../../../arch/ia64/include/uapi/asm/errno.h: No such file or directory 13 | #include "../../../arch/ia64/include/uapi/asm/errno.h" | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ compilation terminated. Thus, just remove the inclusion of the ia64-specific errno.h so that the build will use the generic errno.h header on this target which was used there anyway as the ia64-specific errno.h was just a wrapper for the generic header. Fixes: c25f867d ("ia64: remove unneeded uapi asm-generic wrappers") Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Randy Dunlap authored
Fix IA64 discontig.c Section mismatch warnings. When CONFIG_SPARSEMEM=y and CONFIG_MEMORY_HOTPLUG=y, the functions computer_pernodesize() and scatter_node_data() should not be marked as __meminit because they are needed after init, on any memory hotplug event. Also, early_nr_cpus_node() is called by compute_pernodesize(), so early_nr_cpus_node() cannot be __meminit either. WARNING: modpost: vmlinux.o(.text.unlikely+0x1612): Section mismatch in reference from the function arch_alloc_nodedata() to the function .meminit.text:compute_pernodesize() The function arch_alloc_nodedata() references the function __meminit compute_pernodesize(). This is often because arch_alloc_nodedata lacks a __meminit annotation or the annotation of compute_pernodesize is wrong. WARNING: modpost: vmlinux.o(.text.unlikely+0x1692): Section mismatch in reference from the function arch_refresh_nodedata() to the function .meminit.text:scatter_node_data() The function arch_refresh_nodedata() references the function __meminit scatter_node_data(). This is often because arch_refresh_nodedata lacks a __meminit annotation or the annotation of scatter_node_data is wrong. WARNING: modpost: vmlinux.o(.text.unlikely+0x1502): Section mismatch in reference from the function compute_pernodesize() to the function .meminit.text:early_nr_cpus_node() The function compute_pernodesize() references the function __meminit early_nr_cpus_node(). This is often because compute_pernodesize lacks a __meminit annotation or the annotation of early_nr_cpus_node is wrong. Link: https://lkml.kernel.org/r/20210411001201.3069-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Mike Rapoport <rppt@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Randy Dunlap authored
Fix ia64 generic_defconfig duplicate entries, as warned by: arch/ia64/configs/generic_defconfig: warning: override: reassigning to symbol ATA: => 58 arch/ia64/configs/generic_defconfig: warning: override: reassigning to symbol ATA_PIIX: => 59 These 2 symbols still have the same value as in the removed lines. Link: https://lkml.kernel.org/r/20210411020255.18052-1-rdunlap@infradead.org Fixes: c331649e ("ia64: Use libata instead of the legacy ide driver in defconfigs") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Randy Dunlap authored
e1000's #define of CONFIG_RAM_BASE conflicts with a Kconfig symbol in arch/csky/Kconfig. The symbol in e1000 has been around longer, so change arch/csky/ to use DRAM_BASE instead of RAM_BASE to remove the conflict. (although e1000 is also a 2-line change) Link: https://lkml.kernel.org/r/20210411055335.7111-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Acked-by: Guo Ren <guoren@kernel.org> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Walter Wu authored
CONFIG_KASAN_STACK and CONFIG_KASAN_STACK_ENABLE both enable KASAN stack instrumentation, but we should only need one config, so that we remove CONFIG_KASAN_STACK_ENABLE and make CONFIG_KASAN_STACK workable. see [1]. When enable KASAN stack instrumentation, then for gcc we could do no prompt and default value y, and for clang prompt and default value n. This patch fixes the following compilation warning: include/linux/kasan.h:333:30: warning: 'CONFIG_KASAN_STACK' is not defined, evaluates to 0 [-Wundef] [akpm@linux-foundation.org: fix merge snafu] Link: https://bugzilla.kernel.org/show_bug.cgi?id=210221 [1] Link: https://lkml.kernel.org/r/20210226012531.29231-1-walter-zh.wu@mediatek.com Fixes: d9b571c8 ("kasan: fix KASAN_STACK dependency for HW_TAGS") Signed-off-by: Walter Wu <walter-zh.wu@mediatek.com> Suggested-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-