diff --git a/Documentation/userspace-api/futex2.rst b/Documentation/userspace-api/futex2.rst
new file mode 100644
index 0000000000000000000000000000000000000000..dc3131b3df7ce7ce49352e165483371e9f40a1c5
--- /dev/null
+++ b/Documentation/userspace-api/futex2.rst
@@ -0,0 +1,90 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======
+futex2
+======
+
+:Author: André Almeida <andrealmeid@collabora.com>
+
+futex, or fast userspace mutex, is a set of syscalls that allow userspace to
+create performant synchronization mechanisms, such as mutexes, semaphores and
+condition variables. C standard libraries, like glibc, use it as a means to
+implement higher-level interfaces like pthreads.
+
+futex2 is a follow-up version of the initial futex syscall, designed to overcome
+limitations of the original interface.
+
+User API
+========
+
+``futex_waitv()``
+-----------------
+
+Wait on an array of futexes, wake on any::
+
+  futex_waitv(struct futex_waitv *waiters, unsigned int nr_futexes, unsigned int flags, struct timespec *timo)
+
+  struct futex_waitv {
+        __u64 val;
+        __u64 uaddr;
+        __u32 flags;
+        __u32 __reserved;
+  };
+
+Userspace sets an array of struct futex_waitv (up to a max of 128 entries),
+using ``uaddr`` for the address to wait for, ``val`` for the expected value
+and ``flags`` to specify the type (shared, private) and size of futex.
+``__reserved`` is needed for explicit padding on 32 and 64-bit platforms. It
+can be used for future extensions, but currently it should always be 0.
+
+``uaddr`` uses a 64-bit unsigned integer to store the userspace address. Given
+sign extension and promotion rules, platforms that use 32-bit pointers should
+explicitly cast the address value. Casting using ``(uintptr_t)`` solves the
+problem and makes the code work on both 64 and 32-bit platforms without further
+tweaks::
+
+  waitv[i].uaddr = (uintptr_t) uaddr;
+
+The pointer to the first item of the array is passed as ``waiters``.
+An invalid address for ``waiters`` or for any ``uaddr`` returns ``-EFAULT``.
+
+``nr_futexes`` specifies the size of the array. Values outside the [1, 128]
+interval make the syscall return ``-EINVAL``.
+
+``flags`` can be used to set the timeout clock.
+
+For each entry in the ``waiters`` array, the current value at ``uaddr`` is
+compared to ``val``. If they differ, the syscall undoes all the work done so far
+and returns ``-EAGAIN``. If all tests and verifications succeed, the syscall
+waits until one of the following happens:
+
+- The timeout expires, returning ``-ETIMEDOUT``.
+- A signal is sent to the sleeping task, returning ``-ERESTARTSYS``.
+- A futex in the list is woken, returning the index of one woken futex.
+
+An example of how to use the interface can be found at ``tools/testing/selftests/futex/functional/futex_waitv.c``.
+
+Timeout
+-------
+
+In every operation that takes a ``struct timespec timo`` argument, the argument
+is optional and points to an absolute timeout. By default, it is measured
+against ``CLOCK_MONOTONIC``, but the flag ``FUTEX_CLOCK_REALTIME`` can be used
+to measure it against ``CLOCK_REALTIME``. This syscall accepts only 64-bit
+timespec structs.
+
+Types of futex
+--------------
+
+A futex can be either private or shared. Private futexes are used by processes
+that share the same memory space, so the virtual address of the futex is the
+same for all of them. This allows for optimizations in the kernel. To use
+private futexes, set the flag ``FUTEX_PRIVATE_FLAG``. Shared futexes are used
+by processes that don't share the same memory space and can therefore have
+different virtual addresses for the same futex (using, for instance,
+file-backed shared memory). They require different internal mechanisms to be
+properly enqueued in the kernel. This is the default mode.
+
+Futexes can be of different sizes: 8, 16, 32 or 64 bits.
+Currently, the only supported size is 32 bits, and it needs to be specified
+using the ``FUTEX_32`` flag.
diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst
index c432be070f67cbd8f39e0b2721129786634da2b2..a61eac0c73f8252dbb5da0afae2457ecbb4b41b1 100644
--- a/Documentation/userspace-api/index.rst
+++ b/Documentation/userspace-api/index.rst
@@ -28,6 +28,7 @@ place where this information is gathered.
    media/index
    sysfs-platform_profile
    vduse
+   futex2
 
 .. only:: subproject and html
 
diff --git a/MAINTAINERS b/MAINTAINERS
index 7b756d96f09fa500289e5b2965ff66cb5a432eee..5641e8e9b92453b9f946d39b3061f88b9240ee1d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7723,6 +7723,7 @@ L: linux-kernel@vger.kernel.org
 S:	Maintained
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git locking/core
 F:	Documentation/locking/*futex*
+F:	Documentation/userspace-api/futex2.rst
 F:	include/asm-generic/futex.h
 F:	include/linux/futex.h
 F:	include/uapi/linux/futex.h