- Aug 30, 2005
-
-
Nick Piggin authored
Defer copying of ptes until fault time when it is possible to reconstruct the pte from backing store. Idea from Andi Kleen and Nick Piggin. Thanks to input from Rik van Riel and Linus and to Hugh for correcting my blundering. Ray Fucillo <fucillo@intersystems.com> reports: "I applied this latest patch to a 2.6.12 kernel and found that it does resolve the problem. Prior to the patch on this machine, I was seeing about 23ms spent in fork for ever 100MB of shared memory segment. After applying the patch, fork is taking about 1ms regardless of the shared memory size." Signed-off-by:
Nick Piggin <npiggin@suse.de> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
- Aug 20, 2005
-
-
Linus Torvalds authored
This bug could cause oopses and page state corruption, because ncpfs used the generic page-cache symlink handlign functions. But those functions only work if the page cache is guaranteed to be "stable", ie a page that was installed when the symlink walk was started has to still be installed in the page cache at the end of the walk. We could have fixed ncpfs to not use the generic helper routines, but it is in many ways much cleaner to instead improve on the symlink walking helper routines so that they don't require that absolute stability. We do this by allowing "follow_link()" to return a error-pointer as a cookie, which is fed back to the cleanup "put_link()" routine. This also simplifies NFS symlink handling. Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
- Aug 05, 2005
-
-
David Gibson authored
This patch fixes a crash in the hugepage code. unmap_hugepage_area() was assuming that (due to prefault) PTEs must exist for all the area in question. However, this may not be the case, if mmap() encounters an error before the prefault and calls unmap_region() to clean up any partial mapping. Depending on the hugepage configuration, this crash can be triggered by an unpriveleged user. Signed-off-by:
David Gibson <david@gibson.dropbear.id.au> Cc: William Lee Irwin III <wli@holomorphy.com> Signed-off-by:
Andrew Morton <akpm@osdl.org> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Simon Derr authored
We have found what seems to be a small bug in __vm_enough_memory() when sysctl_overcommit_memory is set to OVERCOMMIT_NEVER. When this bug occurs the systems fails to boot, with /sbin/init whining about fork() returning ENOMEM. We hunted down the problem to this: The deferred update mecanism used in vm_acct_memory(), on a SMP system, allows the vm_committed_space counter to have a negative value. This should not be a problem since this counter is known to be inaccurate. But in __vm_enough_memory() this counter is compared to the `allowed' variable, which is an unsigned long. This comparison is broken since it will consider the negative values of vm_committed_space to be huge positive values, resulting in a memory allocation failure. Signed-off-by:
<Jean-Marc.Saffroy@ext.bull.net> Signed-off-by:
<Simon.Derr@bull.net> Signed-off-by:
Andrew Morton <akpm@osdl.org> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
- Aug 04, 2005
-
-
Hugh Dickins authored
mremap's move_vma is applying __vm_stat_account to the old vma which may have already been freed: move it to just before the do_munmap. mremapping to and fro with CONFIG_DEBUG_SLAB=y showed /proc/<pid>/status VmSize and VmData wrapping just like in kernel bugzilla #4842, and fixed by this patch - worth including in 2.6.13, though not yet confirmed that it fixes that specific report from Frank van Maarseveen. Signed-off-by:
Hugh Dickins <hugh@veritas.com> Signed-off-by:
Andrew Morton <akpm@osdl.org> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
- Aug 03, 2005
-
-
Linus Torvalds authored
The VM_FAULT_WRITE thing is an extra bit, not a valid return value, and has to be treated as such by get_user_pages(). Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Nick Piggin authored
Checking pte_dirty instead of pte_write in __follow_page is problematic for s390, and for copy_one_pte which leaves dirty when clearing write. So revert __follow_page to check pte_write as before, and make do_wp_page pass back a special extra VM_FAULT_WRITE bit to say it has done its full job: once get_user_pages receives this value, it no longer requires pte_write in __follow_page. But most callers of handle_mm_fault, in the various architectures, have switch statements which do not expect this new case. To avoid changing them all in a hurry, make an inline wrapper function (using the old name) that masks off the new bit, and use the extended interface with double underscores. Yes, we do have a call to do_wp_page from do_swap_page, but no need to change that: in rare case it's needed, another do_wp_page will follow. Signed-off-by:
Hugh Dickins <hugh@veritas.com> [ Cleanups by Nick Piggin ] Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
- Aug 02, 2005
-
-
Eric Dumazet authored
A kernel BUG() is triggered by a call to set_mempolicy() with a negative first argument. This is because the mode is declared as an int, and the validity check doesnt check < 0 values. Alternatively, mode could be declared as unsigned int or unsigned long. Signed-off-by:
Eric Dumazet <dada1@cosmosbay.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by:
Andrew Morton <akpm@osdl.org> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Hugh Dickins authored
x86_64 has a large sparse gate area between VSYSCALL_START and VSYSCALL_END, not all of it presently backed by pmds. Alexander Nyberg has found that in some circumstances gdb may try to ptrace here, and hit get_user_pages BUG_ON. It seems odd that gdb should be accessing here, but it certainly shouldn't crash in this way: relax BUG_ON to -EFAULT. Fixes kernel bugzilla #4801. Signed-off-by:
Hugh Dickins <hugh@veritas.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by:
Andrew Morton <akpm@osdl.org> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
- Aug 01, 2005
-
-
Linus Torvalds authored
There's no real guarantee that handle_mm_fault() will always be able to break a COW situation - if an update from another thread ends up modifying the page table some way, handle_mm_fault() may end up requiring us to re-try the operation. That's normally fine, but get_user_pages() ended up re-trying it as a read, and thus a write access could in theory end up losing the dirty bit or be done on a page that had not been properly COW'ed. This makes get_user_pages() always retry write accesses as write accesses by making "follow_page()" require that a writable follow has the dirty bit set. That simplifies the code and solves the race: if the COW break fails for some reason, we'll just loop around and try again. Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
- Jul 30, 2005
-
-
Martin J. Bligh authored
We are iterating over all nodes in nr_free_zone_pages(). Because the fallback zonelists contain all nodes in the system, and we walk all the zonelists, we're counting memory multiple times (once for each node). This caused us to make a size estimate of 32GB for an 8GB AMD64 box, which makes all the dirty ratio calculations, etc incorrect. There's still a further bug to fix from e820 holes causing overestimation as well, but this fix is separate, and good as is, and fixes one class of problems. Problem found by Badari, and tested by Ram Pai - thanks! Signed-off-by:
Martin J. Bligh <mbligh@mbligh.org> Signed-off-by:
Matt Dobson <colpatch@us.ibm.com> Signed-off-by:
Andrew Morton <akpm@osdl.org> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
- Jul 27, 2005
-
-
Andy Whitcroft authored
Originally __free_pages_bulk used the relative page number within a zone to define its buddies. This meant that to maintain the "maximally aligned" requirements (that an allocation of size N will be aligned at least to N physically) zones had to also be aligned to 1<<MAX_ORDER pages. When __free_pages_bulk was updated to use the relative page frame numbers of the free'd pages to pair buddies this released the alignment constraint on the 'left' edge of the zone. This allows _either_ edge of the zone to contain partial MAX_ORDER sized buddies. These simply never will have matching buddies and thus will never make it to the 'top' of the pyramid. The patch below removes a now redundant check ensuring that the mem_map was aligned to MAX_ORDER. Signed-off-by:
Andy Whitcroft <apw@shadowen.org> Cc: Christoph Lameter <christoph@lameter.com> Signed-off-by:
Andrew Morton <akpm@osdl.org> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
suzuki authored
The madvise() system call returns -EBADF for areas which does not map to files, only for *behaviour* request MADV_WILLNEED. According to man pages, madvise returns : EBADF - the map exists, but the area maps something that isn't a file. Fixes bug 2995. Signed-off-by:
Suzuki K P <suzuki@in.ibm.com> Signed-off-by:
Andrew Morton <akpm@osdl.org> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Andrew Morton authored
Fix bug identifued by Richard Purdie <rpurdie@rpsys.net>. oprofile calls check_user_page_readable() from interrupt context, so we deadlock over various VFS locks. But check_user_page_readable() doesn't imply either a read or a write of the page's contents. Change __follow_page() so that check_user_page_readable() can tell __follow_page() that we're not accessing the page's contents, and use that info to avoid the troublesome lock-takings. Also, make follow_page() inline for the single callsite in memory.c to save a bit of stack space. Signed-off-by:
Andrew Morton <akpm@osdl.org> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Andi Kleen authored
All mempolicy changes must be inside the spinlock and readding the rb_erase prevents a crash while doing: > echo "1" > /tmp/numatest > numactl --length=0x4000 --shm /tmp/numatest --localalloc > numactl --length=0x2000 --offset=0 --shm /tmp/numatest --membind=0 > numactl --length=0x2000 --offset=0x2000 --shm /tmp/numatest --membind=1 > ipcs > ipcrm -M "the_key_value_of_this_shm_area" Based on a patch by John Blackwood Cc: <john.blackwood@ccur.com> Cc: <andrea@suse.de> Signed-off-by:
Andi Kleen <ak@suse.de> Signed-off-by:
Andrew Morton <akpm@osdl.org> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
- Jul 15, 2005
-
-
Carsten Otte authored
This patch includes feedback from Andrew and Christoph. Thanks for taking time to review. Use of empty_zero_page was eliminated to fix compilation for architectures that don't have it. This patch removes setting pages up-to-date in ext2_get_xip_page and all bug checks to verify that the page is indeed up to date. Setting the page state on mapping to userland is bogus. None of the code patchs involved with these pages in mm cares about the page state. still on my ToDo list: identify a place outside second extended where __inode_direct_access should reside Signed-off-by:
Carsten Otte <cotte@de.ibm.com> Signed-off-by:
Andrew Morton <akpm@osdl.org> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
- Jul 12, 2005
-
-
Geert Uytterhoeven authored
mm/filemap_xip.c: In function `__xip_unmap': mm/filemap_xip.c:194: request for member `pte' in something not a structure or union Apparently pte_pfn() takes a pte_t, not a pointer to a pte_t. From looking at asm/page.h, it seems to be the same on ia32 or ppc (iff STRICT_MM_TYPECHECKS is enabled, which is disabled by default on ppc). Acked-by:
Carsten Otte <cotte@de.ibm.com> Signed-off-by:
Andrew Morton <akpm@osdl.org> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
- Jul 08, 2005
-
-
Alexey Dobriyan authored
Signed-off-by:
Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by:
Andrew Morton <akpm@osdl.org> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Anton Blanchard authored
We now print statistics when invoking the OOM killer, however this information is not rate limited and you can get into situations where the console is continually spammed. For example, when a task is exiting the OOM killer will simply return (waiting for that task to exit and clear up memory). If the VM continually calls back into the OOM killer we get thousands of copies of show_mem() on the console. Use printk_ratelimit() to quieten it. Signed-off-by:
Anton Blanchard <anton@samba.org> Signed-off-by:
Andrew Morton <akpm@osdl.org> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Marcelo Tosatti authored
Remove completly bogus comment from did_some_progress != 0 handling (that same comment is a few lines below on did_some_progress = 0 case, where it belongs). Signed-off-by:
Andrew Morton <akpm@osdl.org> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Marcelo Tosatti authored
Dump the current allocation order when OOM killing. Signed-off-by:
Andrew Morton <akpm@osdl.org> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
- Jul 06, 2005
-
-
Christoph Lameter authored
This patch used to be in Andrew's tree before the NUMA slab allocator went in. Either this patch or the NUMA slab allocator is needed in order for kmalloc_node to work correctly. pcibus_to_node may be used to generate the node information passed to kmalloc_node. pcibus_to_node returns -1 if it was not able to determine on which node a pcibus is located. For that case kmalloc_node must work like kmalloc. Signed-off-by:
Christoph Lameter <christoph@lameter.com> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
- Jun 29, 2005
-
-
Pekka J Enberg authored
Signed-off-by:
Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by:
Andrew Morton <akpm@osdl.org> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
- Jun 27, 2005
-
-
Bob Picco authored
I spotted this issue while in memmap_init last week. I can't say the change has any test coverage by me. start_pfn was formerly used in main "for" loop. The fix is replace start_pfn with pfn. Signed-off-by:
Bob Picco <bob.picco@hp.com> Signed-off-by:
Andrew Morton <akpm@osdl.org> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
- Jun 26, 2005
-
-
Christoph Lameter authored
1. Establish a simple API for process freezing defined in linux/include/sched.h: frozen(process) Check for frozen process freezing(process) Check if a process is being frozen freeze(process) Tell a process to freeze (go to refrigerator) thaw_process(process) Restart process frozen_process(process) Process is frozen now 2. Remove all references to PF_FREEZE and PF_FROZEN from all kernel sources except sched.h 3. Fix numerous locations where try_to_freeze is manually done by a driver 4. Remove the argument that is no longer necessary from two function calls. 5. Some whitespace cleanup 6. Clear potential race in refrigerator (provides an open window of PF_FREEZE cleared before setting PF_FROZEN, recalc_sigpending does not check PF_FROZEN). This patch does not address the problem of freeze_processes() violating the rule that a task may only modify its own flags by setting PF_FREEZE. This is not clean in an SMP environment. freeze(process) is therefore not SMP safe! Signed-off-by:
Christoph Lameter <christoph@lameter.com> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
- Jun 25, 2005
-
-
Nick Wilson authored
This patch makes use of ALIGN() to remove duplicate round-up code. Signed-off-by:
Nick Wilson <njw@osdl.org> Signed-off-by:
Andrew Morton <akpm@osdl.org> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Vivek Goyal authored
This patch retrieves the max_pfn being used by previous kernel and stores it in a safe location (saved_max_pfn) before it is overwritten due to user defined memory map. This pfn is used to make sure that user does not try to read the physical memory beyond saved_max_pfn. Signed-off-by:
Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by:
Andrew Morton <akpm@osdl.org> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Badari Pulavarty authored
Here is the fix for the problem described in http://bugzilla.kernel.org/show_bug.cgi?id=4721 Basically, problem is generic_file_buffered_write() is accessing beyond end of the iov[] vector after handling the last vector. If we happen to cross page boundary, we get a fault. I think this simple patch is good enough. If we really don't want to depend on the "count", then we need pass nr_segs to filemap_set_next_iovec() and decrement it and check it. Signed-off-by:
Andrew Morton <akpm@osdl.org> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Pavel Machek authored
CONFIG_PM_DISK is long gone, but it still managed to survived at few places. Signed-off-by:
Pavel Machek <pavel@suse.cz> Signed-off-by:
Andrew Morton <akpm@osdl.org> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Hugh Dickins authored
Out-of-tree user of remap_pfn_range hit kernel BUG at mm/memory.c:1112! It passes an unrounded size to remap_pfn_range, which was okay before 2.6.12, but misses remap_pte_range's new end condition. An audit of all the other ptwalks confirms that this is the only one so exposed. Signed-off-by:
Hugh Dickins <hugh@veritas.com> Signed-off-by:
Andrew Morton <akpm@osdl.org> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Hifumi Hisashi authored
Fix a bug on error handling in the direct I/O function. Currently, if a file is opened with the O_DIRECT|O_SYNC flag, the write() syscall cannot receive the EIO error after an I/O error (SCSI cable is disconnected etc.). Return values of other points that call generic_osync_inode() are treated appropriately. Signed-off-by:
Hisashi Hifumi <hifumi.hisashi@lab.ntt.co.jp> Signed-off-by:
Andrew Morton <akpm@osdl.org> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
- Jun 24, 2005
-
-
Carsten Otte authored
Make sys_madvice/fadvice return sane with xip. Signed-off-by:
Carsten Otte <cotte@de.ibm.com> Signed-off-by:
Andrew Morton <akpm@osdl.org> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Carsten Otte authored
This patch reworks filemap_xip.c with the goal to reduce code duplication from mm/filemap.c. It applies agains 2.6.12-rc6-mm1. Instead of implementing the aio functions, this one implements the synchronous read/write functions only. For readv and writev, the generic fallback is used. For aio, we rely on the application doing the fallback. Since our "synchronous" function does memcpy immediately anyway, there is no performance difference between using the fallbacks or implementing each operation. Signed-off-by:
Carsten Otte <cotte@de.ibm.com> Signed-off-by:
Andrew Morton <akpm@osdl.org> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Carsten Otte authored
- generic_file* file operations do no longer have a xip/non-xip split - filemap_xip.c implements a new set of fops that require get_xip_page aop to work proper. all new fops are exported GPL-only (don't like to see whatever code use those except GPL modules) - __xip_unmap now uses page_check_address, which is no longer static in rmap.c, and defined in linux/rmap.h - mm/filemap.h is now much more clean, plainly having just Linus' inline funcs moved here from filemap.c - fix includes in filemap_xip to make it build cleanly on i386 Signed-off-by:
Carsten Otte <cotte@de.ibm.com> Signed-off-by:
Andrew Morton <akpm@osdl.org> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Martin Waitz authored
This patch updates some comments to match code changes. Signed-off-by:
Martin Waitz <tali@admingilde.org> Signed-off-by:
Andrew Morton <akpm@osdl.org> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
- Jun 23, 2005
-
-
Christoph Lameter authored
The following patch removes the f_error field and all checks of f_error. Trond said: f_error was introduced for NFS, and made sense when we were guaranteed always to have a file pointer around when write errors occurred. Since then, we have (for various reasons) had to introduce the nfs_open_context in order to track the file read/write state, and it made sense to move our f_error tracking there too. Signed-off-by:
Christoph Lameter <christoph@lameter.com> Acked-by:
Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by:
Andrew Morton <akpm@osdl.org> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Benjamin LaHaise authored
Here's a small patch to improve the performance of mempool_alloc by only initializing the wait queue when we're about to wait. Signed-off-by:
Benjamin LaHaise <benjamin.c.lahaise@intel.com> Signed-off-by:
Andrew Morton <akpm@osdl.org> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Pekka Enberg authored
This patch removes redundant VM_ClearReadHint from mm/madvice.c which was left there by Prasanna's patch. Signed-off-by:
Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by:
Andrew Morton <akpm@osdl.org> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Paulo Marques authored
This patch creates a new kstrdup library function and changes the "local" implementations in several places to use this function. Most of the changes come from the sound and net subsystems. The sound part had already been acknowledged by Takashi Iwai and the net part by David S. Miller. I left UML alone for now because I would need more time to read the code carefully before making changes there. Signed-off-by:
Paulo Marques <pmarques@grupopie.com> Signed-off-by:
Andrew Morton <akpm@osdl.org> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-
Christoph Lameter authored
Patch to allocate the control structures for for ide devices on the node of the device itself (for NUMA systems). The patch depends on the Slab API change patch by Manfred and me (in mm) and the pcidev_to_node patch that I posted today. Does some realignment too. Signed-off-by:
Justin M. Forbes <jmforbes@linuxtx.org> Signed-off-by:
Christoph Lameter <christoph@lameter.com> Signed-off-by:
Pravin Shelar <pravin@calsoftinc.com> Signed-off-by:
Shobhit Dayal <shobhit@calsoftinc.com> Signed-off-by:
Andrew Morton <akpm@osdl.org> Signed-off-by:
Linus Torvalds <torvalds@osdl.org>
-