Commit fd468043 authored by Linus Torvalds

x86: avoid per-cpu system call trampoline


The per-cpu system call trampoline was a clever trick, and allows us to
have percpu data even before swapgs is done by just doing %rip-relative
addressing.  And that was important, because syscall doesn't have a
kernel stack, so we needed that percpu data very very early, just to get
a temporary register to switch the page tables around.

However, it turns out to be unnecessary.  Because we actually have a
temporary register that we can use: %r11 is destroyed by the 'syscall'
instruction anyway.

Ok, technically it contains the user mode flags register, but we *have*
that information anyway: it's still in %rflags, we've just masked off a
few unimportant bits.  We'll destroy the rest too when we do the "and"
of the CR3 value, but who cares? It's a system call.
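A small C model of the SYSCALL side effects relied on above (RCX gets the return RIP, R11 gets the old RFLAGS, and RFLAGS is ANDed with the complement of the mask MSR) shows why overwriting %r11 only loses the masked bits. This is a sketch, not kernel code: the struct, the function name and the example mask value are hypothetical, and everything SYSCALL does to RIP, CS and SS is left out.

```c
#include <assert.h>
#include <stdint.h>

/* RFLAGS bits used below (Intel SDM numbering). */
#define RFLAGS_FIXED (1ULL << 1)   /* reserved, always reads as 1 */
#define RFLAGS_ZF    (1ULL << 6)
#define RFLAGS_IF    (1ULL << 9)
#define RFLAGS_DF    (1ULL << 10)

static_assert(RFLAGS_IF == 0x200, "IF is bit 9 of RFLAGS");

/* Hypothetical, simplified register file: only what matters here. */
struct syscall_regs {
    uint64_t rcx;     /* return address saved by SYSCALL */
    uint64_t r11;     /* user RFLAGS saved by SYSCALL */
    uint64_t rflags;  /* live flags register */
};

/* Models the register side effects of SYSCALL:
 * RCX <- next RIP, R11 <- RFLAGS, RFLAGS <- RFLAGS & ~fmask. */
static void model_syscall(struct syscall_regs *r, uint64_t next_rip,
                          uint64_t fmask)
{
    r->rcx = next_rip;
    r->r11 = r->rflags;
    r->rflags &= ~fmask;
}
```

With an example mask of IF|DF, every other user flag bit is still present in RFLAGS after the model runs, so only the masked bits depend on the copy in %r11.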

Btw, there are a few bits in eflags that might matter to user space: DF
and AC.  Right now this clears them, but that is fixable by just
changing the MSR_SYSCALL_MASK value to not include them, and clearing
them by hand the way we do for all other kernel entry points anyway.
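To illustrate the DF/AC point, the sketch below compares two hypothetical mask values (neither is claimed to be the kernel's actual MSR_SYSCALL_MASK). When DF and AC are left out of the mask, they are still visible in RFLAGS at the entry point and can be captured before the entry code clears them itself; real code would use cld and clac, while this model only does the bit operations. All names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define X_TF (1ULL << 8)
#define X_IF (1ULL << 9)
#define X_DF (1ULL << 10)
#define X_AC (1ULL << 18)

/* Hypothetical masks: one clears DF and AC in hardware, one leaves them. */
#define MASK_WITH_DF_AC    (X_TF | X_IF | X_DF | X_AC)
#define MASK_WITHOUT_DF_AC (X_TF | X_IF)

static_assert((MASK_WITHOUT_DF_AC & (X_DF | X_AC)) == 0,
              "second mask must not touch DF or AC");

/* RFLAGS as seen by the first kernel instruction after SYSCALL. */
static uint64_t entry_rflags(uint64_t user_rflags, uint64_t fmask)
{
    return user_rflags & ~fmask;
}

/* The user's DF/AC bits, as the entry code could save them. */
static uint64_t capture_df_ac(uint64_t rflags)
{
    return rflags & (X_DF | X_AC);
}

/* Stand-in for "cld; clac" done by hand in the entry code. */
static uint64_t clear_df_ac_by_hand(uint64_t rflags)
{
    return rflags & ~(X_DF | X_AC);
}
```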

So the only _real_ flags we'd destroy are IF and the arithmetic flags
that get trampled on by the arithmetic instructions that are part of the
%cr3 reload logic.

However, if we really end up caring, we can save off even those: we'd
take advantage of the fact that %rcx - which contains the returning IP
of the system call - also has 8 bits free.

Why 8? Even with 5-level paging, we only have 57 bits of virtual address
space, and the high address space is for the kernel (and vsyscall, but
we'd just disable native vsyscall).  So the %rip value saved in %rcx can
have only 56 valid bits, which means that we have 8 bits free.
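The bit counting can be checked with a few constants. This is a minimal sketch, assuming user addresses live in the lower canonical half of a 57-bit address space; the helper names are made up for illustration. The vsyscall page at 0xffffffffff600000 falls in the upper half, which is why native vsyscall would have to go.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { VA_BITS = 57 };                        /* 5-level paging */
enum { USER_VA_BITS = VA_BITS - 1 };          /* lower half: bit 56 is 0 */
enum { FREE_HIGH_BITS = 64 - USER_VA_BITS };  /* bits left over in %rcx */

static_assert(FREE_HIGH_BITS == 8, "8 spare bits with 5-level paging");

/* True if the address has nothing set above the user half. */
static bool fits_in_user_half(uint64_t addr)
{
    return (addr >> USER_VA_BITS) == 0;
}

/* Shifting up by the spare bits and back is lossless for such addresses. */
static uint64_t roundtrip(uint64_t addr)
{
    return (addr << FREE_HIGH_BITS) >> FREE_HIGH_BITS;
}
```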

So *if* we care about IF and the arithmetic flags being saved over a
system call, we'd do:

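        shlq $8,%rcx            # free up the low byte of the return RIP
        movb %r11b,%cl          # low byte of RFLAGS: CF, PF, AF, ZF, SF
        shrl $8,%r11d           # OF (bit 11) moves down to bit 3
        andl $8,%r11d           # keep only that bit
        orb %r11b,%cl           # store OF in bit 3 (reserved, always 0)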

to save those bits off before we then use %r11 as a temporary register
(we'd obviously need to then undo that as we save the user space state
on the stack).
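A C model of the five instructions, plus a hypothetical inverse for the "undo" step, makes the bit layout easier to check. It is only a sketch: the function names are invented, and the 32-bit operations on %r11d are modelled on a uint32_t copy. Going by the bit positions, andl $8 keeps only bit 3, which is OF (RFLAGS bit 11 shifted down by 8); IF lands in bit 1 and is dropped, so the sequence as written preserves the arithmetic flags but not IF.

```c
#include <assert.h>
#include <stdint.h>

#define RFLAGS_OF (1ULL << 11)   /* overflow flag */

/* Model of: shlq $8,%rcx; movb %r11b,%cl; shrl $8,%r11d;
 *           andl $8,%r11d; orb %r11b,%cl
 * with rcx = user RIP and r11 = user RFLAGS on entry. */
static uint64_t pack_rip_flags(uint64_t rip, uint64_t rflags)
{
    assert((rip >> 56) == 0);                  /* top 8 bits must be free */
    uint64_t rcx = rip << 8;                   /* shlq $8,%rcx */
    rcx = (rcx & ~0xffULL) | (rflags & 0xff);  /* movb %r11b,%cl */
    uint32_t r11d = (uint32_t)rflags;          /* 32-bit view of %r11 */
    r11d >>= 8;                                /* shrl: OF -> bit 3 */
    r11d &= 8;                                 /* andl: keep only bit 3 */
    rcx |= (uint8_t)r11d;                      /* orb %r11b,%cl */
    return rcx;
}

/* Hypothetical inverse, for when the user state is saved on the stack. */
static void unpack_rip_flags(uint64_t rcx, uint64_t *rip, uint64_t *rflags)
{
    uint64_t lo = rcx & 0xff;
    *rip = rcx >> 8;
    /* RFLAGS bit 3 is reserved (always 0), so here it only ever holds OF. */
    *rflags = (lo & ~0x08ULL) | ((lo & 0x08) ? RFLAGS_OF : 0);
}
```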

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
parent 6f70eb2b