    x86/mm/64: Fix vmapped stack syncing on very-large-memory 4-level systems · 5beda7d5
    Andy Lutomirski authored
    Neil Berrington reported a double-fault on a VM with 768GB of RAM that uses
    large amounts of vmalloc space with PTI enabled.
    
    The cause is that load_new_mm_cr3() was never fixed to take the 5-level pgd
    folding code into account, so, on a 4-level kernel, the pgd synchronization
    logic compiles away to exactly nothing.
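
    A minimal userspace sketch of why the guard can compile away, assuming
    the generic folded-p4d behaviour in which pgd_none() is a constant
    "false" on a 4-level kernel.  The names toy_pgd_t, folded_pgd_none(),
    folded_p4d_none(), sync_slot_buggy() and sync_slot_fixed() are
    hypothetical and exist only for this illustration; this is not the
    kernel code touched by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PTRS_PER_PGD 512

/* Toy top-level table: one 64-bit entry per slot, 0 means "not present". */
typedef struct {
    uint64_t entry[PTRS_PER_PGD];
} toy_pgd_t;

/*
 * Assumption: with the p4d level folded into the pgd, the pgd-level
 * "is this empty?" check is hard-wired to false.
 */
static bool folded_pgd_none(uint64_t pgd_entry)
{
    (void)pgd_entry;
    return false;
}

/* The folded p4d shares storage with the pgd slot, so this looks at real data. */
static bool folded_p4d_none(uint64_t p4d_entry)
{
    return p4d_entry == 0;
}

/* Pre-fix shape: the guard is constant false, so the copy is dead code. */
static void sync_slot_buggy(toy_pgd_t *mm, const toy_pgd_t *init, size_t idx)
{
    assert(idx < PTRS_PER_PGD);
    if (folded_pgd_none(mm->entry[idx]))
        mm->entry[idx] = init->entry[idx];
}

/* Post-fix shape: test the level that actually holds the entry. */
static void sync_slot_fixed(toy_pgd_t *mm, const toy_pgd_t *init, size_t idx)
{
    assert(idx < PTRS_PER_PGD);
    if (folded_p4d_none(mm->entry[idx]))
        mm->entry[idx] = init->entry[idx];
}
```

    Calling sync_slot_buggy() on an empty slot leaves it empty, while
    sync_slot_fixed() copies the reference entry into it.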
    
    Interestingly, the problem doesn't trigger with nopti.  I assume this is
    because the kernel is mapped with global pages if we boot with nopti.  The
    sequence of operations when we create a new task is that we first load its
    mm while still running on the old stack (which crashes if the old stack is
    unmapped in the new mm unless the TLB saves us), then we call
    prepare_switch_to(), and then we switch to the new stack.
    prepare_switch_to() pokes the new stack directly, which will populate the
    mapping through vmalloc_fault().  I assume that we're getting lucky on
    non-PTI systems -- the old stack's TLB entry stays alive long enough to
    make it all the way through prepare_switch_to() and switch_to() so that we
    make it to a valid stack.
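
    The sequence above can be modelled roughly as follows.  This is a toy
    sketch of the hypothesis stated in this message, not a description of
    real TLB behaviour: it assumes a global TLB entry always survives the
    mm switch and a non-global one never does, and it ignores PCID and
    ordinary TLB eviction.  simulate_task_switch() and its parameters are
    hypothetical names.

```c
#include <stdbool.h>

enum outcome { SWITCH_OK, DOUBLE_FAULT };

/*
 * old_stack_synced:    the old stack's top-level entry is present in the
 *                      new mm's page tables.
 * kernel_pages_global: kernel mappings are global (the nopti case in this
 *                      model), so their TLB entries survive the mm switch.
 */
static enum outcome simulate_task_switch(bool old_stack_synced,
                                         bool kernel_pages_global)
{
    /* Step 1: load the new mm while still running on the old stack. */
    bool tlb_has_old_stack = kernel_pages_global;

    /*
     * Any old-stack access must now be served by the TLB or by the new
     * page tables.  If neither works, the fault handler itself has no
     * usable stack, which this model reports as a double fault.
     */
    if (!tlb_has_old_stack && !old_stack_synced)
        return DOUBLE_FAULT;

    /*
     * Step 2: touch the new stack.  A missing mapping here is assumed to
     * be recoverable, because the fault is handled on the still-usable
     * current stack.
     * Step 3: switch to the new stack.
     */
    return SWITCH_OK;
}
```

    In this model only the combination "not synced, not global" fails,
    which matches the observation that the crash needs PTI enabled.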
    
    Fixes: b50858ce ("x86/mm/vmalloc: Add 5-level paging support")
    Reported-and-tested-by: Neil Berrington <neil.berrington@datacore.com>
    Signed-off-by: Andy Lutomirski <luto@kernel.org>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
    Cc: stable@vger.kernel.org
    Cc: Dave Hansen <dave.hansen@intel.com>
    Cc: Borislav Petkov <bp@alien8.de>
    Link: https://lkml.kernel.org/r/346541c56caed61abbe693d7d2742b4a380c5001.1516914529.git.luto@kernel.org