    mm/hugetlb: fix a addressing exception caused by huge_pte_offset · 3c1d7e6c
    Longpeng authored
    
    
    Our machine hit a panic (addressing exception) after running for a
    long time.  The call trace is:
    
        RIP: hugetlb_fault+0x307/0xbe0
        RSP: 0018:ffff9567fc27f808  EFLAGS: 00010286
        RAX: e800c03ff1258d48 RBX: ffffd3bb003b69c0 RCX: e800c03ff1258d48
        RDX: 17ff3fc00eda72b7 RSI: 00003ffffffff000 RDI: e800c03ff1258d48
        RBP: ffff9567fc27f8c8 R08: e800c03ff1258d48 R09: 0000000000000080
        R10: ffffaba0704c22a8 R11: 0000000000000001 R12: ffff95c87b4b60d8
        R13: 00005fff00000000 R14: 0000000000000000 R15: ffff9567face8074
        FS:  00007fe2d9ffb700(0000) GS:ffff956900e40000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: ffffd3bb003b69c0 CR3: 000000be67374000 CR4: 00000000003627e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
          follow_hugetlb_page+0x175/0x540
          __get_user_pages+0x2a0/0x7e0
          __get_user_pages_unlocked+0x15d/0x210
          __gfn_to_pfn_memslot+0x3c5/0x460 [kvm]
          try_async_pf+0x6e/0x2a0 [kvm]
          tdp_page_fault+0x151/0x2d0 [kvm]
         ...
          kvm_arch_vcpu_ioctl_run+0x330/0x490 [kvm]
          kvm_vcpu_ioctl+0x309/0x6d0 [kvm]
          do_vfs_ioctl+0x3f0/0x540
          SyS_ioctl+0xa1/0xc0
          system_call_fastpath+0x22/0x27
    
    For 1G hugepages, huge_pte_offset() should return either NULL or pudp,
    but it may return a wrong 'pmdp' if it races with a concurrent update
    of the PUD entry.  Consider the following code snippet:
    
        ...
        pud = pud_offset(p4d, addr);
        if (sz != PUD_SIZE && pud_none(*pud))
            return NULL;
        /* hugepage or swap? */
        if (pud_huge(*pud) || !pud_present(*pud))
            return (pte_t *)pud;
    
        pmd = pmd_offset(pud, addr);
        if (sz != PMD_SIZE && pmd_none(*pmd))
            return NULL;
        /* hugepage or swap? */
        if (pmd_huge(*pmd) || !pmd_present(*pmd))
            return (pte_t *)pmd;
        ...
    
    The following sequence would trigger this bug:
    
     - CPU0: sz == PUD_SIZE and *pud == 0, so the first check does not
       return NULL
     - CPU0: "pud_huge(*pud)" is false
     - CPU1: calls hugetlb_no_page() and sets *pud to xxxx8e7 (PRESENT)
     - CPU0: "!pud_present(*pud)" is now false, so it continues
     - CPU0: pmd = pmd_offset(pud, addr), which may return a wrong pmdp
       because *pud now maps a 1G hugepage rather than a PMD table
    
    However, we want CPU0 to return NULL or pudp in this case.
    
    We must make sure there is exactly one dereference of pud and one of
    pmd, so that every check operates on the same value.
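
    The single-dereference idea can be illustrated outside the kernel.
    The sketch below is a user-space model, not the patch itself: entry_t,
    the ENTRY_* bits, the walk_* functions and the race hook are all
    hypothetical stand-ins, and the hook replays CPU1's write at a fixed
    point so the interleaving is deterministic.  In kernel code the
    snapshot would typically be taken with READ_ONCE(); whether this
    commit does exactly that is not shown in the message.

```c
#include <stdint.h>

/* Hypothetical stand-in for a page-table entry; not the kernel layout. */
typedef uint64_t entry_t;
#define ENTRY_PRESENT ((entry_t)0x001)
#define ENTRY_HUGE    ((entry_t)0x080)

enum walk_result { WALK_NULL, WALK_PUD, WALK_PMD };

/* Models CPU1 touching the entry while CPU0 is in the middle of a walk. */
typedef void (*race_hook_t)(volatile entry_t *slot);

static inline int entry_none(entry_t e)    { return e == 0; }
static inline int entry_present(entry_t e) { return (e & ENTRY_PRESENT) != 0; }
static inline int entry_huge(entry_t e)
{
    return entry_present(e) && (e & ENTRY_HUGE) != 0;
}

void no_race(volatile entry_t *slot) { (void)slot; }

void cpu1_maps_huge_page(volatile entry_t *slot)
{
    *slot = ENTRY_PRESENT | ENTRY_HUGE;   /* like setting *pud to xxxx8e7 */
}

/* Racy shape: every test dereferences the slot again. */
enum walk_result walk_multi_read(volatile entry_t *slot, int want_pud_size,
                                 race_hook_t hook)
{
    if (!want_pud_size && entry_none(*slot))
        return WALK_NULL;
    if (entry_huge(*slot))                /* first read: still empty */
        return WALK_PUD;
    hook(slot);                           /* CPU1 runs between the reads */
    if (!entry_present(*slot))            /* second read: now present */
        return WALK_PUD;
    return WALK_PMD;                      /* descends into a bogus table */
}

/* Single snapshot: all tests look at the same value. */
enum walk_result walk_single_read(volatile entry_t *slot, int want_pud_size,
                                  race_hook_t hook)
{
    entry_t snap = *slot;                 /* the only dereference */
    hook(slot);                           /* a later write cannot split the checks */
    if (!want_pud_size && entry_none(snap))
        return WALK_NULL;
    if (entry_huge(snap) || !entry_present(snap))
        return WALK_PUD;
    return WALK_PMD;
}
```

    Starting from an empty slot with cpu1_maps_huge_page as the hook,
    walk_multi_read() ends in WALK_PMD, which corresponds to the wrong
    pmdp above, while walk_single_read() ends in WALK_PUD.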
    
    Signed-off-by: Longpeng <longpeng2@huawei.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Sean Christopherson <sean.j.christopherson@intel.com>
    Cc: <stable@vger.kernel.org>
    Link: http://lkml.kernel.org/r/20200413010342.771-1-longpeng2@huawei.com
    
    
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>