1. 15 Dec, 2009 5 commits
  2. 10 Dec, 2009 1 commit
    • Christoph Hellwig's avatar
      vfs: Implement proper O_SYNC semantics · 6b2f3d1f
      Christoph Hellwig authored
      
      
      While Linux provided an O_SYNC flag basically since day 1, it took until
      Linux 2.4.0-test12pre2 to actually get it implemented for filesystems,
      since that day we had generic_osync_around with only minor changes and the
      great "For now, when the user asks for O_SYNC, we'll actually give
      O_DSYNC" comment.  This patch intends to actually give us real O_SYNC
      semantics in addition to the O_DSYNC semantics.  After Jan's O_SYNC
      patches which are required before this patch it's actually surprisingly
      simple, we just need to figure out when to set the datasync flag to
      vfs_fsync_range and when not.
      
      This patch renames the existing O_SYNC flag to O_DSYNC while keeping it's
      numerical value to keep binary compatibility, and adds a new real O_SYNC
      flag.  To guarantee backwards compatiblity it is defined as expanding to
      both the O_DSYNC and the new additional binary flag (__O_SYNC) to make
      sure we are backwards-compatible when compiled against the new headers.
      
      This also means that all places that don't care about the differences can
      just check O_DSYNC and get the right behaviour for O_SYNC, too - only
      places that actuall care need to check __O_SYNC in addition.  Drivers and
      network filesystems have been updated in a fail safe way to always do the
      full sync magic if O_DSYNC is set.  The few places setting O_SYNC for
      lower layers are kept that way for now to stay failsafe.
      
      We enforce that O_DSYNC is set when __O_SYNC is set early in the open path
      to make sure we always get these sane options.
      
      Note that parisc really screwed up their headers as they already define a
      O_DSYNC that has always been a no-op.  We try to repair it by using it for
      the new O_DSYNC and redefinining O_SYNC to send both the traditional
      O_SYNC numerical value _and_ the O_DSYNC one.
      
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Grant Grundler <grundler@parisc-linux.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andreas Dilger <adilger@sun.com>
      Acked-by: default avatarTrond Myklebust <Trond.Myklebust@netapp.com>
      Acked-by: default avatarKyle McMartin <kyle@mcmartin.ca>
      Acked-by: default avatarUlrich Drepper <drepper@redhat.com>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      6b2f3d1f
  3. 04 Dec, 2009 1 commit
  4. 14 Oct, 2009 1 commit
    • Frederic Weisbecker's avatar
      mem_class: Drop the bkl from memory_open() · 205153aa
      Frederic Weisbecker authored
      
      
      The generic open callback for the mem class devices is "protected" by
      the bkl.
      
      Let's look at the datas manipulated inside memory_open:
      
      - inode and file: safe
      - the devlist: safe because it is constant
      - the memdev classes inside this array are safe too (constant)
      
      After we find out which memdev file operation we need to use, we call
      its open callback. Depending on the targeted memdev, we call either
      open_port() that doesn't manipulate any racy data (just a capable()
      check), or we call nothing.
      
      So it's safe to remove the big kernel lock there.
      
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <1255113062-5835-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      205153aa
  5. 27 Sep, 2009 1 commit
  6. 24 Sep, 2009 1 commit
  7. 19 Sep, 2009 1 commit
  8. 15 Sep, 2009 2 commits
  9. 11 Sep, 2009 1 commit
  10. 18 Jun, 2009 1 commit
  11. 10 Jun, 2009 1 commit
    • Linus Torvalds's avatar
      Make /dev/zero reads interruptible by signals · 2b838687
      Linus Torvalds authored
      
      
      This helps with bad latencies for large reads from /dev/zero, but might
      conceivably break some application that "knows" that a read of /dev/zero
      cannot return early.  So do this early in the merge window to give us
      maximal test coverage, even if the patch is totally trivial.
      
      Obviously, no well-behaved application should ever depend on the read
      being uninterruptible, but hey, bugs happen.
      
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2b838687
  12. 04 Jun, 2009 1 commit
    • Salman Qazi's avatar
      drivers/char/mem.c: avoid OOM lockup during large reads from /dev/zero · 730c586a
      Salman Qazi authored
      While running 20 parallel instances of dd as follows:
      
        #!/bin/bash
        for i in `seq 1 20`; do
                 dd if=/dev/zero of=/export/hda3/dd_$i bs=1073741824 count=1 &
        done
        wait
      
      on a 16G machine, we noticed that rather than just killing the processes,
      the entire kernel went down.  Stracing dd reveals that it first does an
      mmap2, which makes 1GB worth of zero page mappings.  Then it performs a
      read on those pages from /dev/zero, and finally it performs a write.
      
      The machine died during the reads.  Looking at the code, it was noticed
      that /dev/zero's read operation had been changed by
      557ed1fa
      
       ("remove ZERO_PAGE") from giving
      zero page mappings to actually zeroing the page.
      
      The zeroing of the pages causes physical pages to be allocated to the
      process.  But, when the process exhausts all the memory that it can, the
      kernel cannot kill it, as it is still in the kernel mode allocating more
      memory.  Consequently, the kernel eventually crashes.
      
      To fix this, I propose that when a fatal signal is pending during
      /dev/zero read operation, we simply return and let the user process die.
      
      Signed-off-by: default avatarSalman Qazi <sqazi@google.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      [ Modified error return and comment trivially.  - Linus]
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      730c586a
  13. 10 Apr, 2009 1 commit
  14. 06 Jan, 2009 1 commit
  15. 16 Oct, 2008 1 commit
  16. 24 Jul, 2008 1 commit
  17. 22 Jul, 2008 1 commit
  18. 20 Jul, 2008 1 commit
    • Ingo Molnar's avatar
      Subject: devmem, x86: fix rename of CONFIG_NONPROMISC_DEVMEM · d092633b
      Ingo Molnar authored
      
      From: Arjan van de Ven <arjan@infradead.org>
      Date: Sat, 19 Jul 2008 15:47:17 -0700
      
      CONFIG_NONPROMISC_DEVMEM was a rather confusing name - but renaming it
      to CONFIG_PROMISC_DEVMEM causes problems on architectures that do not
      support this feature; this patch renames it to CONFIG_STRICT_DEVMEM,
      so that architectures can opt-in into it.
      
      ( the polarity of the option is still the same as it was originally; it
        needs to be for now to not break architectures that don't have the
        infastructure yet to support this feature)
      
      Signed-off-by: default avatarArjan van de Ven <arjan@linux.intel.com>
      Cc: "V.Radhakrishnan" <rk@atr-labs.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      ---
      d092633b
  19. 17 Jul, 2008 1 commit
    • Ingo Molnar's avatar
      x86: rename CONFIG_NONPROMISC_DEVMEM to CONFIG_PROMISC_DEVMEM · 64d206d8
      Ingo Molnar authored
      
      
      Linus observed:
      
      > The real bug is that we shouldn't have "double negatives", and
      > certainly not negative config options. Making that "promiscuous
      > /dev/mem" option a negated thing as a config option was bad.
      
      right ... lets rename this option. There should never be a negation
      in config options.
      
      [ that reminds me of CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER, but that
        is for another commit ;-) ]
      
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      64d206d8
  20. 20 Jun, 2008 1 commit
  21. 29 Apr, 2008 1 commit
  22. 24 Apr, 2008 5 commits
  23. 29 Oct, 2007 1 commit
  24. 17 Oct, 2007 1 commit
  25. 16 Oct, 2007 1 commit
    • Nick Piggin's avatar
      remove ZERO_PAGE · 557ed1fa
      Nick Piggin authored
      The commit b5810039
      
       contains the note
      
        A last caveat: the ZERO_PAGE is now refcounted and managed with rmap
        (and thus mapcounted and count towards shared rss).  These writes to
        the struct page could cause excessive cacheline bouncing on big
        systems.  There are a number of ways this could be addressed if it is
        an issue.
      
      And indeed this cacheline bouncing has shown up on large SGI systems.
      There was a situation where an Altix system was essentially livelocked
      tearing down ZERO_PAGE pagetables when an HPC app aborted during startup.
      This situation can be avoided in userspace, but it does highlight the
      potential scalability problem with refcounting ZERO_PAGE, and corner
      cases where it can really hurt (we don't want the system to livelock!).
      
      There are several broad ways to fix this problem:
      1. add back some special casing to avoid refcounting ZERO_PAGE
      2. per-node or per-cpu ZERO_PAGES
      3. remove the ZERO_PAGE completely
      
      I will argue for 3. The others should also fix the problem, but they
      result in more complex code than does 3, with little or no real benefit
      that I can see.
      
      Why? Inserting a ZERO_PAGE for anonymous read faults appears to be a
      false optimisation: if an application is performance critical, it would
      not be doing many read faults of new memory, or at least it could be
      expected to write to that memory soon afterwards. If cache or memory use
      is critical, it should not be working with a significant number of
      ZERO_PAGEs anyway (a more compact representation of zeroes should be
      used).
      
      As a sanity check -- mesuring on my desktop system, there are never many
      mappings to the ZERO_PAGE (eg. 2 or 3), thus memory usage here should not
      increase much without it.
      
      When running a make -j4 kernel compile on my dual core system, there are
      about 1,000 mappings to the ZERO_PAGE created per second, but about 1,000
      ZERO_PAGE COW faults per second (less than 1 ZERO_PAGE mapping per second
      is torn down without being COWed). So removing ZERO_PAGE will save 1,000
      page faults per second when running kbuild, while keeping it only saves
      less than 1 page clearing operation per second. 1 page clear is cheaper
      than a thousand faults, presumably, so there isn't an obvious loss.
      
      Neither the logical argument nor these basic tests give a guarantee of no
      regressions. However, this is a reasonable opportunity to try to remove
      the ZERO_PAGE from the pagefault path. If it is found to cause regressions,
      we can reintroduce it and just avoid refcounting it.
      
      The /dev/zero ZERO_PAGE usage and TLB tricks also get nuked.  I don't see
      much use to them except on benchmarks.  All other users of ZERO_PAGE are
      converted just to use ZERO_PAGE(0) for simplicity. We can look at
      replacing them all and maybe ripping out ZERO_PAGE completely when we are
      more satisfied with this solution.
      
      Signed-off-by: default avatarNick Piggin <npiggin@suse.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus "snif" Torvalds <torvalds@linux-foundation.org>
      557ed1fa
  26. 10 Jul, 2007 2 commits
  27. 08 May, 2007 2 commits
  28. 17 Apr, 2007 1 commit
  29. 22 Jan, 2007 1 commit
    • Linus Torvalds's avatar
      Revert "[PATCH] Fix up mmap_kmem" · 6d3154cc
      Linus Torvalds authored
      This reverts commit 99a10a60
      
      .
      
      As per Hugh Dickins:
      
        "Nadia Derbey has reported that mmap of /dev/kmem no longer works with
         the kernel virtual address as offset, and Franck has confirmed that
         his patch came from a misunderstanding of what an offset means to
         /dev/kmem - whereas his patch description seems to say that he was
         correcting the offset on a few plaforms, there was no such problem to
         correct, and his patch was in fact changing its API on all platforms."
      
      Suggested-by: default avatarHugh Dickins <hugh@veritas.com>
      Cc: Franck Bui-Huu <fbuihuu@gmail.com>
      Cc: Nadia Derbey <Nadia.Derbey@bull.net>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Andrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6d3154cc