    arm64: mm: Use READ_ONCE when dereferencing pointer to pte table · f069faba
    Will Deacon authored
    On kernels built with support for transparent huge pages, different CPUs
    can access the PMD concurrently due to e.g. fast GUP or page_vma_mapped_walk
    and they must take care to use READ_ONCE to avoid value tearing or caching
    of stale values by the compiler. Unfortunately, these functions call into
    our pgtable macros, which don't use READ_ONCE, and compiler caching has
    been observed to cause the following crash during ext4 writeback:
    
    PC is at check_pte+0x20/0x170
    LR is at page_vma_mapped_walk+0x2e0/0x540
    [...]
    Process doio (pid: 2463, stack limit = 0xffff00000f2e8000)
    Call trace:
    [<ffff000008233328>] check_pte+0x20/0x170
    [<ffff000008233758>] page_vma_mapped_walk+0x2e0/0x540
    [<ffff000008234adc>] page_mkclean_one+0xac/0x278
    [<ffff000008234d98>] rmap_walk_file+0xf0/0x238
    [<ffff000008236e74>] rmap_walk+0x64/0xa0
    [<ffff0000082370c8>] page_mkclean+0x90/0xa8
    [<ffff0000081f3c64>] clear_page_dirty_for_io+0x84/0x2a8
    [<ffff00000832f984>] mpage_submit_page+0x34/0x98
    [<ffff00000832fb4c>] mpage_process_page_bufs+0x164/0x170
    [<ffff00000832fc8c>] mpage_prepare_extent_to_map+0x134/0x2b8
    [<ffff00000833530c>] ext4_writepages+0x484/0xe30
    [<ffff0000081f6ab4>] do_writepages+0x44/0xe8
    [<ffff0000081e5bd4>] __filemap_fdatawrite_range+0xbc/0x110
    [<ffff0000081e5e68>] file_write_and_wait_range+0x48/0xd8
    [<ffff000008324310>] ext4_sync_file+0x80/0x4b8
    [<ffff0000082bd434>] vfs_fsync_range+0x64/0xc0
    [<ffff0000082332b4>] SyS_msync+0x194/0x1e8
    
    This is because page_vma_mapped_walk loads the PMD twice before calling
    pte_offset_map: the first time without READ_ONCE (where it gets all zeroes
    due to a concurrent pmdp_invalidate) and the second time with READ_ONCE
    (where it sees a valid table pointer due to a concurrent pmd_populate).
    However, the compiler inlines everything and caches the first value in
    a register. That stale value is subsequently used in pte_offset_phys,
    which returns a junk pointer that is later dereferenced when attempting
    to access the relevant pte.
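
    The sequence above can be modelled in userspace without real threads.
    The following is a minimal sketch, not kernel code: every name in it
    (pmd_model_t, model_walk_stale, model_walk_fresh, MODEL_PTE_SIZE) is
    hypothetical, a PMD entry is reduced to the bare address of a pte table,
    and the concurrent pmd_populate is simulated by a plain store between
    the two loads.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: a PMD entry is just the address of a pte table,
 * and 0 stands for an invalidated entry. */
typedef uint64_t pmd_model_t;

#define MODEL_PTE_SIZE 8u /* assumed size of one pte in bytes */

/* Address of the pte slot `index` inside the table that `pmd` points at. */
static uint64_t model_pte_offset(pmd_model_t pmd, size_t index)
{
    return pmd + (uint64_t)index * MODEL_PTE_SIZE;
}

/*
 * Deterministic model of the failing walk:
 *   1. the first load of *pmdp sees 0 (after a pmdp_invalidate);
 *   2. the entry is repopulated (stand-in for a concurrent pmd_populate);
 *   3. the second load sees a valid table and passes the check;
 *   4. the pte address is computed from the value loaded in step 1,
 *      standing in for the compiler reusing a cached register.
 */
static uint64_t model_walk_stale(pmd_model_t *pmdp, pmd_model_t new_table,
                                 size_t index)
{
    pmd_model_t cached = *pmdp;   /* step 1 */
    *pmdp = new_table;            /* step 2 */
    pmd_model_t checked = *pmdp;  /* step 3 */
    if (checked == 0)
        return 0;                 /* the walk would bail out here */
    return model_pte_offset(cached, index); /* step 4: junk address */
}

/* Same sequence, but the offset is computed from a fresh load. */
static uint64_t model_walk_fresh(pmd_model_t *pmdp, pmd_model_t new_table,
                                 size_t index)
{
    pmd_model_t first = *pmdp;    /* first load, value not reused */
    (void)first;
    *pmdp = new_table;
    pmd_model_t checked = *pmdp;
    if (checked == 0)
        return 0;
    return model_pte_offset(*pmdp, index);
}
```

    Starting from an entry of 0, model_walk_stale computes an address
    relative to 0 rather than to the new table, while model_walk_fresh
    lands inside the new table.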
    
    This patch fixes the issue by using READ_ONCE in pte_offset_phys to ensure
    that a stale value is not used. Whilst this is a point fix for a known
    failure (and simple to backport), a full fix moving all of our page table
    accessors over to {READ,WRITE}_ONCE and consistently using READ_ONCE in
    page_vma_mapped_walk is in the works for a future kernel release.
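
    As an illustration of the idea, the following userspace sketch forces a
    single load of the PMD entry through a volatile access and derives the
    pte address only from that loaded value. It is not the arm64 code: all
    names are hypothetical, read_once_u64 is only a stand-in for the
    kernel's READ_ONCE, and the layout (4 KiB pages, 512 ptes per table,
    8-byte ptes, output address in bits 47:12) is an assumption.

```c
#include <stdint.h>

/* Hypothetical PMD entry type for this sketch. */
typedef struct { uint64_t val; } pmd_sketch_t;

#define SKETCH_PAGE_SHIFT   12                     /* assumed 4 KiB pages */
#define SKETCH_PTRS_PER_PTE 512                    /* assumed table size */
#define SKETCH_ADDR_MASK    0x0000fffffffff000ULL  /* assumed bits 47:12 */

/* Userspace stand-in for READ_ONCE: the volatile access makes the
 * compiler emit a load from memory here instead of reusing a value
 * it obtained from an earlier plain access. */
static inline uint64_t read_once_u64(const uint64_t *p)
{
    return *(const volatile uint64_t *)p;
}

/* Index of the pte for `addr` within its table. */
static inline uint64_t sketch_pte_index(uint64_t addr)
{
    return (addr >> SKETCH_PAGE_SHIFT) & (SKETCH_PTRS_PER_PTE - 1);
}

/* Physical address of the pte table that a PMD entry points at. */
static inline uint64_t sketch_pmd_page_paddr(pmd_sketch_t pmd)
{
    return pmd.val & SKETCH_ADDR_MASK;
}

/* Physical address of the pte for `addr`, based on one load of *pmdp. */
static inline uint64_t sketch_pte_offset_phys(const pmd_sketch_t *pmdp,
                                              uint64_t addr)
{
    pmd_sketch_t pmd = { read_once_u64(&pmdp->val) };

    return sketch_pmd_page_paddr(pmd) +
           sketch_pte_index(addr) * sizeof(uint64_t);
}
```

    With an entry of 0x80001003 and an address of 0x5000, the sketch yields
    0x80001028: the table base 0x80001000 plus index 5 times 8 bytes.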
    
    Cc: Jon Masters <jcm@redhat.com>
    Cc: Timur Tabi <timur@codeaurora.org>
    Cc: <stable@vger.kernel.org>
    Fixes: f27176cf ("mm: convert page_mkclean_one() to use page_vma_mapped_walk()")
    Tested-by: Richard Ruigrok <rruigrok@codeaurora.org>
    Signed-off-by: Will Deacon <will.deacon@arm.com>
    Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>