Commit b3a81d08 authored by Minchan Kim's avatar Minchan Kim Committed by Linus Torvalds
Browse files

mm: fix KSM data corruption

Nadav reported KSM can corrupt the user data by the TLB batching
race[1].  That means data user written can be lost.

Quote from Nadav Amit:
 "For this race we need 4 CPUs:

  CPU0: Caches a writable and dirty PTE entry, and uses the stale value
  for write later.

  CPU1: Runs madvise_free on the range that includes the PTE. It would
  clear the dirty-bit. It batches TLB flushes.

  CPU2: Writes 4 to /proc/PID/clear_refs , clearing the PTEs soft-dirty.
  We care about the fact that it clears the PTE write-bit, and of
  course, batches TLB flushes.

  CPU3: Runs KSM. Our purpose is to pass the following test in

	if (pte_write(*pvmw.pte) || pte_dirty(*pvmw.pte) ||
	    (pte_protnone(*pvmw.pte) && pte_savedwrite(*pvmw.pte)))

  Since it will avoid TLB flush. And we want to do it while the PTE is
  stale. Later, and before replacing the page, we would be able to
  change the page.

  Note that all the operations the CPU1-3 perform canhappen in parallel
parent 99baac21
......@@ -16,9 +16,10 @@
#include <linux/mmu_notifier.h>
#include <linux/page_idle.h>
#include <linux/shmem_fs.h>
#include <linux/uaccess.h>
#include <asm/elf.h>
#include <linux/uaccess.h>
#include <asm/tlb.h>
#include <asm/tlbflush.h>
#include "internal.h"
......@@ -1008,6 +1009,7 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
struct mm_struct *mm;
struct vm_area_struct *vma;
enum clear_refs_types type;
struct mmu_gather tlb;
int itype;
int rv;
......@@ -1054,6 +1056,7 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
tlb_gather_mmu(&tlb, mm, 0, -1);
if (type == CLEAR_REFS_SOFT_DIRTY) {
for (vma = mm->mmap; vma; vma = vma->vm_next) {
if (!(vma->vm_flags & VM_SOFTDIRTY))
......@@ -1075,7 +1078,7 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
walk_page_range(0, mm->highest_vm_end, &clear_refs_walk);
mmu_notifier_invalidate_range_end(mm, 0, -1);
tlb_finish_mmu(&tlb, 0, -1);
......@@ -1038,7 +1038,8 @@ static int write_protect_page(struct vm_area_struct *vma, struct page *page,
goto out_unlock;
if (pte_write(*pvmw.pte) || pte_dirty(*pvmw.pte) ||
(pte_protnone(*pvmw.pte) && pte_savedwrite(*pvmw.pte))) {
(pte_protnone(*pvmw.pte) && pte_savedwrite(*pvmw.pte)) ||
mm_tlb_flush_pending(mm)) {
pte_t entry;
swapped = PageSwapCache(page);
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment