1. 09 May, 2017 1 commit
  2. 08 May, 2017 2 commits
    • x86/kexec/64: Use gbpages for identity mappings if available · 8638100c
      Xunlei Pang authored

      Kexec sets up identity mappings for all of memory before booting
      into the new kernel, and the paging structures for those mappings
      consume a considerable amount of memory on modern machines with
      huge memory sizes.
      
      E.g. on a 32TB machine that is kdumping, setting up all the identity
      mappings with the current 2MB pages can consume around 128MB (about
      4MB/TB) of the reserved memory.
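      The 4MB/TB figure follows from x86-64 paging arithmetic: one 4KB PMD
      page holds 512 entries, so with 2MB pages it maps 1GB, and one 4KB PUD
      page maps 512GB.  The user-space sketch below (the function and macro
      names are illustrative, not kernel APIs) estimates the paging-structure
      memory needed to identity-map RAM starting at physical address 0,
      ignoring the single top-level page and any holes in the memory map.

```c
#include <assert.h>
#include <stdint.h>

#define PT_PAGE_SIZE  4096ULL        /* size of one paging-structure page */
#define PT_ENTRIES    512ULL         /* 8-byte entries per paging-structure page */
#define MAP_2M        (2ULL << 20)   /* memory mapped by one PMD leaf entry */
#define MAP_1G        (1ULL << 30)   /* memory mapped by one PUD entry */

static uint64_t div_round_up(uint64_t n, uint64_t d)
{
	return (n + d - 1) / d;
}

/*
 * Bytes of PUD and PMD pages needed to identity-map 'mem' bytes of RAM.
 * With gbpages the PUD entries are 1GB leaves, so no PMD pages are needed.
 */
uint64_t ident_map_overhead(uint64_t mem, int use_gbpages)
{
	uint64_t pud_pages = div_round_up(mem, MAP_1G * PT_ENTRIES);
	uint64_t pmd_pages = 0;

	if (!use_gbpages)
		pmd_pages = div_round_up(mem, MAP_2M * PT_ENTRIES);

	return (pud_pages + pmd_pages) * PT_PAGE_SIZE;
}
```

      For 32TB this gives 128MB of PMD pages plus 256KB of PUD pages with
      2MB mappings, versus only the 256KB of PUD pages (8KB/TB) with gbpages.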
      
      Add to that the memory needed for the loaded kdump kernel, initramfs,
      etc., and the kexec syscall fails with -ENOMEM.
      
      As a result, we had to enlarge reserved memory via "crashkernel=X"
      to work around this problem.
      
      This causes some trouble for distributions that use policies
      to evaluate the proper "crashkernel=X" value for users.
      
      So enable gbpages for kexec mappings.

      Signed-off-by: Xunlei Pang <xlpang@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: akpm@linux-foundation.org
      Cc: kexec@lists.infradead.org
      Link: http://lkml.kernel.org/r/1493862171-8799-2-git-send-email-xlpang@redhat.com
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mm: Add support for gbpages to kernel_ident_mapping_init() · 66aad4fd
      Xunlei Pang authored

      Kernel identity mappings on x86-64 kernels are created in two
      ways: by the early x86 boot code, or by kernel_ident_mapping_init().
      
      Native kernels (the dominant use case) use the former, but the
      kexec and hibernation code use kernel_ident_mapping_init().
      
      There's a subtle difference between these two ways of creating
      identity mappings: the current kernel_ident_mapping_init() code
      always creates them using 2MB pages (PMD level), while the native
      kernel boot path also uses gbpages where available.
      
      This difference is suboptimal both for performance and for memory
      usage: kernel_ident_mapping_init() needs to allocate pages for the
      page tables when creating the new identity mappings.
      
      This patch adds 1GB page (PUD level) support to kernel_ident_mapping_init()
      to address these concerns.
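      Conceptually, the change is a per-1GB decision when filling a PUD page:
      if gbpages may be used, the PUD entry itself becomes a 1GB leaf (page
      size bit set) instead of pointing to a freshly allocated PMD page.  The
      user-space model below is only loosely based on the kernel's
      ident_pud_init(); the struct, the flag values and the function name are
      simplifying assumptions, and a host pointer stands in for the physical
      address of a PMD page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PUD_SHIFT   30
#define PUD_SIZE    (1ULL << PUD_SHIFT)  /* 1GB per PUD entry */
#define PMD_SIZE    (1ULL << 21)         /* 2MB per PMD entry */
#define ENTRIES     512
#define F_PRESENT   0x001ULL             /* entry is valid */
#define F_PSE       0x080ULL             /* entry maps a large page */

struct ident_info {
	bool direct_gbpages;   /* may PUD entries map 1GB directly? */
	int pmd_pages_used;    /* PMD pages taken from the pool */
};

/*
 * Identity-map [addr, end) in one PUD page. addr and end are assumed to be
 * 1GB aligned and to lie within the 512GB covered by this PUD page.
 */
void ident_pud_fill(struct ident_info *info, uint64_t *pud,
		    uint64_t (*pmd_pool)[ENTRIES], uint64_t addr, uint64_t end)
{
	for (; addr < end; addr += PUD_SIZE) {
		unsigned int idx = (addr >> PUD_SHIFT) % ENTRIES;

		if (info->direct_gbpages) {
			/* 1GB leaf: no lower-level page needed */
			pud[idx] = addr | F_PSE | F_PRESENT;
			continue;
		}

		/* 2MB path: one PMD page of 512 x 2MB leaves per 1GB */
		uint64_t *pmd = pmd_pool[info->pmd_pages_used++];
		for (unsigned int i = 0; i < ENTRIES; i++)
			pmd[i] = (addr + i * PMD_SIZE) | F_PSE | F_PRESENT;
		pud[idx] = (uint64_t)(uintptr_t)pmd | F_PRESENT;
	}
}
```

      Mapping 4GB this way takes no PMD pages with gbpages and four PMD
      pages without them.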
      
      The primary advantage would be better TLB coverage/performance,
      because we'd utilize 1GB TLBs instead of 2MB ones.
      
      It is also useful for machines with large amounts of memory, as it
      saves paging-structure allocations when setting up identity mappings
      for all of memory: around 4MB/TB with 2MB pages, but only 8KB/TB
      with 1GB pages.
      
      ( Note that this change alone does not activate gbpages in kexec,
        we are doing that in a separate patch. )

      Signed-off-by: Xunlei Pang <xlpang@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: akpm@linux-foundation.org
      Cc: kexec@lists.infradead.org
      Link: http://lkml.kernel.org/r/1493862171-8799-1-git-send-email-xlpang@redhat.com
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  3. 27 Mar, 2017 1 commit
  4. 10 Mar, 2017 1 commit
    • kexec, x86/purgatory: Unbreak it and clean it up · 40c50c1f
      Thomas Gleixner authored
      The purgatory code defines global variables which are referenced via a
      symbol lookup in the kexec code (core and arch).
      
      A recent commit addressing sparse warnings made these static and thereby
      broke kexec_file.
      
      Why did this happen? Simply because the whole machinery is undocumented and
      lacks any form of forward declarations. The variable names are unspecific
      and lack a prefix, so adding forward declarations creates shadow variables
      in the core code. Aside from that, the code relies on magic constants and
      duplicate struct definitions with no way to ensure that these things stay
      in sync. The section placement of the purgatory variables happened by
      chance and not by design.
      
      Unbreak kexec and clean up the mess:
      
       - Add proper forward declarations and document the usage
       - Use common struct definition
       - Use the proper common defines instead of magic constants
       - Add a purgatory_ prefix to have a proper name space
       - Use ARRAY_SIZE() instead of a home-brewed reimpl...
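      A user-space sketch of the pattern this list describes: prefixed
      globals with a single shared declaration, one common struct
      definition, named constants instead of magic numbers, and ARRAY_SIZE()
      for iteration.  The struct layout and the value 16 for
      KEXEC_SEGMENT_MAX are assumed to mirror the kernel headers, and
      purgatory_total_len() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Kernel-style helper: element count of a true array (not a pointer). */
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

#define KEXEC_SEGMENT_MAX  16   /* assumed value; named instead of a magic 16 */
#define SHA256_DIGEST_SIZE 32

/* Common struct definition shared by kexec core and purgatory. */
struct kexec_sha_region {
	unsigned long start;
	unsigned long len;
};

/*
 * Forward declarations (would live in a shared header). The purgatory_
 * prefix keeps them from shadowing unrelated names in the core code.
 */
extern struct kexec_sha_region purgatory_sha_regions[KEXEC_SEGMENT_MAX];
extern uint8_t purgatory_sha256_digest[SHA256_DIGEST_SIZE];

/* Definitions (would live in the purgatory object). */
struct kexec_sha_region purgatory_sha_regions[KEXEC_SEGMENT_MAX];
uint8_t purgatory_sha256_digest[SHA256_DIGEST_SIZE];

/* Hypothetical helper: total bytes covered by all checksummed regions. */
unsigned long purgatory_total_len(void)
{
	unsigned long total = 0;
	size_t i;

	for (i = 0; i < ARRAY_SIZE(purgatory_sha_regions); i++)
		total += purgatory_sha_regions[i].len;
	return total;
}
```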
  5. 15 Dec, 2016 2 commits
    • kexec: export the value of phys_base instead of symbol address · 401721ec
      Baoquan He authored
      Currently on x86_64, the symbol address of phys_base is exported to
      vmcoreinfo.  Dave Anderson complained that this is useless for his
      Crash implementation: the user-space utilities Crash and Makedumpfile,
      which are the main consumers of the exported vmcore information, need
      the value of phys_base to convert the virtual address of an exported
      kernel symbol to a physical address.  Especially for init_level4_pgt:
      if we want to walk the page table to look up the PA corresponding to
      a VA, we first need to calculate
      
        page_dir = SYMBOL(init_level4_pgt) - __START_KERNEL_map + phys_base;
      
      Now in Crash and Makedumpfile, we have to analyze the vmcore ELF program
      headers to get the value of phys_base.  As Dave said, it would be
      preferable if it were readily available in vmcoreinfo rather than
      depending upon the PT_LOAD semantics.
      
      Hence this patch exports the value of phys_base instead of its
      virtual address.
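      With the value exported, a dump tool can apply the formula above using
      vmcoreinfo alone.  A minimal user-space sketch, assuming an LP64 host,
      the x86_64 __START_KERNEL_map value of 0xffffffff80000000, and
      vmcoreinfo lines of the form SYMBOL(name)=<hex> and
      NUMBER(name)=<decimal>; the helper names are hypothetical and not taken
      from Crash or makedumpfile.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed x86_64 virtual base of the kernel text mapping (__START_KERNEL_map). */
#define START_KERNEL_MAP 0xffffffff80000000UL

/* PA of a kernel-text symbol: VA - __START_KERNEL_map + phys_base */
unsigned long kernel_sym_to_phys(unsigned long sym_va, long phys_base)
{
	return sym_va - START_KERNEL_MAP + (unsigned long)phys_base;
}

/*
 * Parse one vmcoreinfo line, assuming the SYMBOL(name)=<hex> and
 * NUMBER(name)=<decimal> formats. Returns 1 if the line was recognised.
 */
int parse_vmcoreinfo_line(const char *line, unsigned long *pgt_va,
			  long *phys_base)
{
	if (sscanf(line, "SYMBOL(init_level4_pgt)=%lx", pgt_va) == 1)
		return 1;
	if (sscanf(line, "NUMBER(phys_base)=%ld", phys_base) == 1)
		return 1;
	return 0;
}
```

      Feeding every vmcoreinfo line through the parser and then calling
      kernel_sym_to_phys() yields page_dir without touching the PT_LOAD
      headers.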
      
      People also complained that the KERNEL_IMAGE_SIZE export is x86_64
      only and should be moved into the arch-dependent function
      arch_crash_save_vmcoreinfo().  This patch does that move as well.
      
      Link: http://lkml.kernel.org/r/1478568596-30060-2-git-send-email-bhe@redhat.com
      
      Signed-off-by: Baoquan He <bhe@redhat.com>
      Cc: Thomas Garnier <thgarnie@google.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H . Peter Anvin" <hpa@zytor.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Xunlei Pang <xlpang@redhat.com>
      Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Eugene Surovegin <surovegin@google.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Cc: Atsushi Kumagai <ats-kumagai@wm.jp.nec.com>
      Cc: Dave Anderson <anderson@redhat.com>
      Cc: Pratyush Anand <panand@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Revert "kdump, vmcoreinfo: report memory sections virtual addresses" · 69f58384
      Baoquan He authored
      This reverts commit 0549a3c0 ("kdump, vmcoreinfo: report memory
      sections virtual addresses").
      
      Commit 0549a3c0 tells the userspace utility makedumpfile the
      randomized base addresses of these memory sections when mm KASLR is
      enabled.  However the following patch "kexec: export the value of
      phys_base instead of symbol address" makes makedumpfile not need these
      addresses any more.
      
      Besides, we should use VMCOREINFO_NUMBER to export the value of the
      variable so that we can use the existing number_table mechanism of
      Makedumpfile to fetch it.  So revert it now.  If needed we can add it
      later.
      
      http://lists.infradead.org/pipermail/kexec/2016-October/017540.html
      Link: http://lkml.kernel.org/r/1478568596-30060-1-git-send-email-bhe@redhat.com
      
      Signed-off-by: Baoquan He <bhe@redhat.com>
      Cc: Thomas Garnier <thgarnie@google.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H . Peter Anvin" <hpa@zytor.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Xunlei Pang <xlpang@redhat.com>
      Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Eugene Surovegin <surovegin@google.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Cc: Atsushi Kumagai <ats-kumagai@wm.jp.nec.com>
      Cc: Dave Anderson <anderson@redhat.com>
      Cc: Pratyush Anand <panand@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 11 Oct, 2016 1 commit
  7. 24 May, 2016 1 commit
  8. 21 Jan, 2016 1 commit
  9. 03 Jun, 2015 1 commit
    • x86/mm: Decouple <linux/vmalloc.h> from <asm/io.h> · d6472302
      Stephen Rothwell authored

      Nothing in <asm/io.h> uses anything from <linux/vmalloc.h>, so
      remove it from there and fix up the resulting build problems
      triggered on x86 {64|32}-bit {def|allmod|allno}configs.
      
      The breakages were triggering in places where x86 builds relied
      on vmalloc() facilities but did not include <linux/vmalloc.h>
      explicitly and relied on the implicit inclusion via <asm/io.h>.
      
      Also add:
      
        - <linux/init.h> to <linux/io.h>
        - <asm/pgtable_types.h> to <asm/io.h>
      
      ... which were two other implicit header file dependencies.

      Suggested-by: David Miller <davem@davemloft.net>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      [ Tidied up the changelog. ]
      Acked-by: David Miller <davem@davemloft.net>
      Acked-by: Takashi Iwai <tiwai@suse.de>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Acked-by: Vinod Koul <vinod.koul@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Anton Vorontsov <anton@enomsg.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Colin Cross <ccross@android.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: James E.J. Bottomley <JBottomley@odin.com>
      Cc: Jaroslav Kysela <perex@perex.cz>
      Cc: K. Y. Srinivasan <kys@microsoft.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Kristen Carlson Accardi <kristen@linux.intel.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Suma Ramars <sramars@cisco.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  10. 29 Apr, 2015 1 commit
    • x86: introduce kaslr_offset() · 4545c898
      Jiri Kosina authored

      The offset chosen for KASLR during kernel decompression can easily be
      computed as the difference between _text and __START_KERNEL. We are
      already making use of this in the dump_kernel_offset() notifier and in
      arch_crash_save_vmcoreinfo().

      Introduce kaslr_offset(), which performs this computation instead of
      hard-coding it, so that other kernel code (such as live patching) can
      make use of it. Also convert the existing users to use it.
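      The helper boils down to one subtraction of the link-time start from
      the run-time address of _text.  In the sketch below, which is a
      user-space illustration rather than the kernel code, the run-time
      address of _text is passed in as a parameter; the constants assume the
      x86_64 defaults of __START_KERNEL_map = 0xffffffff80000000 and
      CONFIG_PHYSICAL_START = 0x1000000, and the function name is
      hypothetical.

```c
#include <assert.h>

/*
 * Assumed link-time start of kernel text: __START_KERNEL_map plus the
 * default CONFIG_PHYSICAL_START of 16MB on x86_64.
 */
#define START_KERNEL_MAP 0xffffffff80000000UL
#define PHYSICAL_START   0x1000000UL
#define START_KERNEL     (START_KERNEL_MAP + PHYSICAL_START)

/* KASLR offset: where _text ended up minus where it was linked. */
unsigned long kaslr_offset_of(unsigned long text_addr)
{
	return text_addr - START_KERNEL;
}
```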
      
      This patch is an equivalent transformation without any effect on the
      resulting code:
      
      	$ diff -u vmlinux.old.asm vmlinux.new.asm
      	--- vmlinux.old.asm     2015-04-28 17:55:19.520983368 +0200
      	+++ vmlinux.new.asm     2015-04-28 17:55:24.141206072 +0200
      	@@ -1,5 +1,5 @@
      
      	-vmlinux.old:     file format elf64-x86-64
      	+vmlinux.new:     file format elf64-x86-64
      
      	Disassembly of section .text:
      	$

      Acked-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
  11. 16 Dec, 2014 1 commit
    • x86, irq: Move IOAPIC related declarations from hw_irq.h into io_apic.h · 8643e28d
      Jiang Liu authored

      Clean up code by moving IOAPIC related declarations from hw_irq.h into
      io_apic.h.

      Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Grant Likely <grant.likely@linaro.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Matt Fleming <matt.fleming@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
      Cc: Aubrey <aubrey.li@linux.intel.com>
      Cc: Ryan Desfosses <ryan@de...
  12. 29 Aug, 2014 1 commit
    • kexec: create a new config option CONFIG_KEXEC_FILE for new syscall · 74ca317c
      Vivek Goyal authored

      Currently the new system call kexec_file_load() and all the associated
      code compile if CONFIG_KEXEC=y.  But the new syscall also compiles
      purgatory code which currently uses the gcc option -mcmodel=large.
      This option seems to be available only from gcc 4.4 onwards.
      
      Hiding the new functionality behind a new config option will not break
      existing users of old gcc.  Those who wish to enable the new
      functionality will require a newer gcc.  Having said that, I am trying
      to figure out how I can move away from using -mcmodel=large, but that
      can take a while.
      
      I think there are other advantages of introducing this new config
      option.  As this option will be enabled only on x86_64, other arches
      don't have to compile generic kexec code which will never be used.  This
      new code selects CRYPTO=y and CRYPTO_SHA256=y, and all other arches had
      to do this for CONFIG_KEXEC.  Now with the introduction of the new
      config option, we can remove the crypto dependency from other arches.
      
      Now CONFIG_KEXEC_FILE is available only on x86_64, so wherever I had
      CONFIG_X86_64 defined, I got rid of it.
      
      For CONFIG_KEXEC_FILE, instead of doing select CRYPTO=y, I changed it to
      "depends on CRYPTO=y".  This should be safer as "select" is not
      recursive.

      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Tested-by: Shaun Ruffell <sruffell@digium.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  13. 08 Aug, 2014 5 commits
    • kexec: verify the signature of signed PE bzImage · 8e7d8381
      Vivek Goyal authored

      This is the final piece of the puzzle of verifying kernel image signature
      during kexec_file_load() syscall.
      
      This patch calls into the PE file routines to verify the signature of
      the bzImage.  If the signature is valid, kexec_file_load() succeeds;
      otherwise it fails.
      
      Two new config options have been introduced.  The first one is
      CONFIG_KEXEC_VERIFY_SIG.  This option enforces that the kernel has to
      be validly signed, otherwise the kernel load will fail.  If this option
      is not set, no signature verification will be done.  The only exception
      will be when secureboot is enabled: in that case signature verification
      should be enforced automatically, but that will happen when the
      secureboot patches are merged.
      
      The second config option is CONFIG_KEXEC_BZIMAGE_VERIFY_SIG.  This
      option enables signature verification support for bzImage.  If this
      option is not set and the previous one is set, kernel image loading
      will fail because the kernel has no support for verifying the
      signature of a bzImage.
      
      I tested these patches with both "pesign" and "sbsign" signed bzImages.
      
      I used signing_key.priv key and signing_key.x509 cert for signing as
      generated during kernel build process (if module signing is enabled).
      
      I used the following method to sign the bzImage.
      
      pesign
      ======
      - Convert DER format cert to PEM format cert
      openssl x509 -in signing_key.x509 -inform DER -out signing_key.x509.PEM -outform PEM

      - Generate a .p12 file from existing cert and private key file
      openssl pkcs12 -export -out kernel-key.p12 -inkey signing_key.priv -in signing_key.x509.PEM

      - Import .p12 file into pesign db
      pk12util -i /tmp/kernel-key.p12 -d /etc/pki/pesign

      - Sign bzImage
      pesign -i /boot/vmlinuz-3.16.0-rc3+ -o /boot/vmlinuz-3.16.0-rc3+.signed.pesign -c "Glacier signing key - Magrathea" -s

      sbsign
      ======
      sbsign --key signing_key.priv --cert signing_key.x509.PEM --output /boot/vmlinuz-3.16.0-rc3+.signed.sbsign /boot/vmlinuz-3.16.0-rc3+
      
      Patch details:
      
      Well, all the hard work is done in the previous patches.  Now the
      bzImage loader just has to call into that code and verify whether the
      bzImage signature is valid.
      
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Greg Kroah-Hartman <greg@kroah.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Matt Fleming <matt@console-pimps.org>
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kexec: support for kexec on panic using new system call · dd5f7260
      Vivek Goyal authored

      This patch adds support for loading a kexec on panic (kdump) kernel using
      the new system call.
      
      It prepares ELF headers for the memory areas to be dumped and for the
      saved CPU registers.  It also prepares the memory map for the second
      kernel and limits its boot to the reserved areas only.

      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Greg Kroah-Hartman <greg@kroah.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kexec-bzImage64: support for loading bzImage using 64bit entry · 27f48d3e
      Vivek Goyal authored

      This is loader-specific code which can load a bzImage and set it up for
      64bit entry.  It does not take care of 32bit entry or real mode entry.
      
      32bit mode entry can be implemented if somebody needs it.

      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Greg Kroah-Hartman <greg@kroah.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kexec: load and relocate purgatory at kernel load time · 12db5562
      Vivek Goyal authored

      Load purgatory code in RAM and relocate it based on its location.
      The relocation code has been inspired by the module relocation code
      and the purgatory relocation code in kexec-tools.
      
      Also compute the checksums of loaded kexec segments and store them in
      purgatory.
      
      Arch independent code provides this functionality so that arch dependent
      bootloaders can make use of it.
      
      Helper functions are provided to get/set symbol values in purgatory,
      which are later used by bootloaders to set things like the stack and
      entry point of the second kernel.
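      Relocating purgatory mostly means patching each relocation site with
      the final address arithmetic.  The user-space sketch below handles a
      few common x86-64 RELA types using the <elf.h> definitions; the
      function is hypothetical, it assumes a little-endian host (memcpy
      stores host-order bytes), and it leaves out symbol lookup,
      R_X86_64_32S and section placement.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>
#include <string.h>

/*
 * Apply one RELA entry to 'buf', a local copy of a purgatory section that
 * will run at 'load_addr'. 'sym_value' is the final address of the symbol.
 * Returns 0 on success, -1 on an unsupported type or a value that does not fit.
 */
int apply_rela(uint8_t *buf, uint64_t load_addr, const Elf64_Rela *rel,
	       uint64_t sym_value)
{
	uint8_t *loc = buf + rel->r_offset;
	uint64_t val = sym_value + (uint64_t)rel->r_addend;

	switch (ELF64_R_TYPE(rel->r_info)) {
	case R_X86_64_NONE:
		return 0;
	case R_X86_64_64:                 /* S + A, 64-bit absolute */
		memcpy(loc, &val, sizeof(val));
		return 0;
	case R_X86_64_32: {               /* S + A, must fit zero-extended */
		uint32_t v32 = (uint32_t)val;
		if (v32 != val)
			return -1;
		memcpy(loc, &v32, sizeof(v32));
		return 0;
	}
	case R_X86_64_PC32: {             /* S + A - P, must fit sign-extended */
		int64_t pc_rel = (int64_t)(val - (load_addr + rel->r_offset));
		int32_t v32 = (int32_t)pc_rel;
		if (v32 != pc_rel)
			return -1;
		memcpy(loc, &v32, sizeof(v32));
		return 0;
	}
	default:
		return -1;
	}
}
```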

      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Greg Kroah-Hartman <greg@kroah.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kexec: implementation of new syscall kexec_file_load · cb105258
      Vivek Goyal authored

      The previous patch provided the interface definition and this patch
      provides the implementation of the new syscall.
      
      Previously the segment list was prepared in user space.  Now user space
      just passes the kernel fd, the initrd fd and the command line, and the
      kernel creates the segment list internally.
      
      This patch contains the generic part of the code.  Actual segment
      preparation and loading is done by the arch- and image-specific loader,
      which comes in the next patch.
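      From user space, the new interface can be exercised roughly as in the
      sketch below.  glibc provides no wrapper, so the raw syscall(2) is
      used; load_kernel_file() is a hypothetical helper, a successful load
      requires CAP_SYS_BOOT, and when no initrd is given the caller is
      expected to pass KEXEC_FILE_NO_INITRAMFS from <linux/kexec.h> in
      flags.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: load 'kernel' (and optionally 'initrd') for a later
 * kexec via kexec_file_load(2). 'cmdline' must be non-NULL; 'flags' may hold
 * KEXEC_FILE_ON_CRASH or KEXEC_FILE_NO_INITRAMFS. Returns 0 or -1 with errno.
 */
int load_kernel_file(const char *kernel, const char *initrd,
		     const char *cmdline, unsigned long flags)
{
	int kfd, ifd = -1, ret, saved;

	kfd = open(kernel, O_RDONLY | O_CLOEXEC);
	if (kfd < 0)
		return -1;
	if (initrd) {
		ifd = open(initrd, O_RDONLY | O_CLOEXEC);
		if (ifd < 0) {
			saved = errno;
			close(kfd);
			errno = saved;
			return -1;
		}
	}
#ifdef SYS_kexec_file_load
	/* cmdline_len counts the terminating NUL byte */
	ret = (int)syscall(SYS_kexec_file_load, kfd, ifd,
			   (unsigned long)strlen(cmdline) + 1, cmdline, flags);
#else
	errno = ENOSYS;   /* headers too old to know the syscall number */
	ret = -1;
#endif
	saved = errno;    /* keep close() from clobbering the syscall's errno */
	close(kfd);
	if (ifd >= 0)
		close(ifd);
	errno = saved;
	return ret;
}
```

      Note that the kernel, not the caller, now parses the image and builds
      the segments; user space only hands over file descriptors.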
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Greg Kroah-Hartman <greg@kroah.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  14. 26 Feb, 2014 1 commit
  15. 29 Jan, 2013 3 commits
  16. 22 Sep, 2010 1 commit
  17. 30 Mar, 2010 1 commit
    • include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo authored
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      The percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include
      those headers directly instead of assuming availability.  As this
      conversion needs to touch a large number of source files, the following
      script is used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  i.e. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      7, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable on most builds of the
      specific arch.

      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
  18. 02 Jun, 2009 1 commit
  19. 08 May, 2009 1 commit
  20. 11 Mar, 2009 3 commits
  21. 04 Feb, 2009 1 commit
    • x86: kexec: Use one page table in x86_64 machine_kexec · f5deb796
      Huang Ying authored

      Impact: reduce kernel BSS size by 7 pages, improve code readability
      
      Two page tables are used in the current x86_64 kexec implementation.
      One is used to jump from the kernel virtual address to the identity
      map address, the other is used to map all physical memory. In fact, on
      x86_64, there is no conflict between the kernel virtual address space
      and the physical memory space, so just one page table is sufficient.
      The page table pages used to map the control page are dynamically
      allocated to save memory if no kexec image is loaded. The ASM code
      used to map the control page is replaced by C code too.
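      The "no conflict" argument can be checked with 4-level paging
      arithmetic: bits 47-39 of an address select one of 512 top-level
      slots, so identity-mapping the first N bytes of RAM only touches slots
      0 to (N-1)>>39, while the kernel text mapping at __START_KERNEL_map
      (0xffffffff80000000) uses slot 511.  A user-space sketch; the helper
      names are hypothetical and only 4-level paging with N of at most 128TB
      is modelled.

```c
#include <assert.h>
#include <stdint.h>

#define PGDIR_SHIFT   39    /* each top-level entry spans 512GB */
#define PTRS_PER_PGD  512

/* Top-level (PGD) slot used to translate virtual address va. */
unsigned int pgd_slot(uint64_t va)
{
	return (unsigned int)((va >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1));
}

/*
 * Would identity-mapping [0, max_pa) need the slot that also maps
 * kernel_va? If not, one page table can hold both mappings.
 */
int ident_map_conflicts(uint64_t max_pa, uint64_t kernel_va)
{
	return max_pa > 0 && pgd_slot(max_pa - 1) >= pgd_slot(kernel_va);
}
```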
      Signed-off-by: Huang Ying <ying.huang@intel.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  22. 26 Jul, 2008 1 commit
    • kexec jump · 3ab83521
      Huang Ying authored
      This patch provides an enhancement to kexec/kdump.  It implements the
      following features:
      
      - Backup/restore memory used by the original kernel before/after
        kexec.
      
      - Save/restore CPU state before/after kexec.
      
      The features of this patch can be used as a general method to call a
      program in physical mode (with paging turned off). This can be used to
      call BIOS code under Linux.
      
      kexec-tools needs to be patched to support kexec jump. The patches and
      the precompiled kexec can be downloaded from the following URLs:
      
             source: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-src_git_kh10.tar.bz2
             patches: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-patches_git_kh10.tar.bz2
             binary: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec_git_kh10
      
      
      
      Usage example of calling some physical-mode code and returning:
      
      1. Compile and install the patched kernel with the following options selected:
      
      CONFIG_X86_32=y
      CONFIG_KEXEC=y
      CONFIG_PM=y
      CONFIG_KEXEC_JUMP=y
      
      2. Build the patched kexec-tools or download the pre-built one.
      
      3. Build a physical-mode executable, named for example "phy_mode".
      
      4. Boot the kernel compiled in step 1.
      
      5. Load the physical-mode executable with /sbin/kexec. The shell command
         line can be as follows:
      
         /sbin/kexec --load-preserve-context --args-none phy_mode
      
      6. Call the physical-mode executable with the following shell command line:
      
         /sbin/kexec -e
      
      Implementation point:
      
      To support jumping without reserving memory, one shadow backup page
      (source page) is allocated for each page used by the kexeced code image
      (destination page). During kexec_load, the image of the kexeced code is
      loaded into the source pages, and before executing, the destination
      pages and the source pages are swapped, so the contents of the
      destination pages are backed up. Before jumping to the kexeced code
      image and after jumping back to the original kernel, the destination
      pages and the source pages are swapped too.
      
      The C ABI (calling convention) is used as the communication protocol
      between the kernel and the called code.
      
      A flag named KEXEC_PRESERVE_CONTEXT for sys_kexec_load is added to
      indicate that the loaded kernel image is used for jumping back.
      
      Currently, only the i386 architecture is supported.
      Signed-off-by: Huang Ying <ying.huang@intel.com>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Nigel Cunningham <nigel@nigel.suspend2.net>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  23. 08 Jul, 2008 1 commit
  24. 23 May, 2008 1 commit
  25. 02 Apr, 2008 1 commit
    • vmcoreinfo: add the symbol "phys_base" · 629c8b4c
      Ken'ichi Ohmichi authored

      Fix a problem where makedumpfile sometimes fails on x86_64 machines.
      
      This patch adds the symbol "phys_base" to the vmcoreinfo data. The
      vmcoreinfo data contains the minimum debugging information needed for
      dump filtering. makedumpfile (a dump filtering command) reads it to
      identify unnecessary pages, and so creates a small dumpfile.
      
      On an x86_64 kernel compiled with CONFIG_PHYSICAL_START=0x0 and
      CONFIG_RELOCATABLE=y, makedumpfile fails as follows:
      
       # makedumpfile -d31 /proc/vmcore dumpfile
       The kernel version is not supported.
       The created dumpfile may be incomplete.
       _exclude_free_page: Can't get next online node.
      
       makedumpfile Failed.
       #
      
      The cause is the lack of the symbol "phys_base" in the vmcoreinfo data.
      If the symbol "phys_base" does not exist, makedumpfile treats an x86_64
      kernel as non-relocatable. As a result, makedumpfile misidentifies the
      physical address where the kernel is loaded, and it cannot translate a
      kernel virtual address to a physical address correctly.
      
      To fix this problem, this patch adds the symbol "phys_base" to the
      vmcoreinfo data.
      Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: <stable@kernel.org>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  26. 07 Feb, 2008 1 commit
    • vmcoreinfo: fix the configuration dependencies · 92df5c3e
      Ken'ichi Ohmichi authored

      This patch fixes the configuration dependencies in the vmcoreinfo data.
      
      i386's "node_data" is defined in arch/x86/mm/discontig_32.c,
      and x86_64's is defined in arch/x86/mm/numa_64.c.
      They depend on CONFIG_NUMA:
        arch/x86/mm/Makefile_32:7
          obj-$(CONFIG_NUMA) += discontig_32.o
        arch/x86/mm/Makefile_64:7
          obj-$(CONFIG_NUMA) += numa_64.o
      
      ia64's "pgdat_list" is defined in arch/ia64/mm/discontig.c,
      and it depends on CONFIG_DISCONTIGMEM and CONFIG_SPARSEMEM:
        arch/ia64/mm/Makefile:9-10
          obj-$(CONFIG_DISCONTIGMEM) += discontig.o
          obj-$(CONFIG_SPARSEMEM)    += discontig.o
      
      ia64's "node_memblk" is defined in arch/ia64/mm/numa.c,
      and it depends on CONFIG_NUMA:
        arch/ia64/mm/Makefile:8
          obj-$(CONFIG_NUMA)         += numa.o
      Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
      Acked-by: Simon Horman <horms@verge.net.au>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  27. 30 Jan, 2008 1 commit
    • x86: 64-bit, make sparsemem vmemmap the only memory model · b263295d
      Christoph Lameter authored

      Use sparsemem as the only memory model for UP, SMP and NUMA. Measurements
      indicate that DISCONTIGMEM has a higher overhead than sparsemem, and
      FLATMEM's benefits are minimal. So I think it's best to simply
      standardize on sparsemem.
      
      Results of page allocator tests (the tests are available via git from
      the "tests" branch of the slab git tree).
      
      Measurements are in cycle counts: 1000 allocations were performed and
      then the average cycle count was calculated.
      
      Order   FlatMem   Discontig   SparseMem
          0       639         665         641
          1       567         647         593
          2       679         774         692
          3       763         967         781
          4       961        1501         962
          5      1356        2344        1392
          6      2224        3982        2336
          7      4869        7225        5074
          8     12500       14048       12732
          9     27926       28223       28165
         10     58578       58714       58682
      
      (Note that FlatMem is an SMP configuration and the others are NUMA
      configurations.)
      
      Memory use:
      
      SMP Sparsemem
      -------------
      
      Kernel size:
      
         text    data     bss     dec     hex filename
      3849268  397739 1264856 5511863  541ab7 vmlinux
      
                   total       used       free     shared    buffers     cached
      Mem:       8242252      41164    8201088          0        352      11512
      -/+ buffers/cache:      29300    8212952
      Swap:      9775512          0    9775512
      
      SMP Flatmem
      -----------
      
      Kernel size:
      
         text    data     bss     dec     hex filename
      3844612  397739 1264536 5506887  540747 vmlinux
      
      So 4.5k growth in text size vs. FLATMEM.
      
                   total       used       free     shared    buffers     cached
      Mem:       8244052      40544    8203508          0        352      11484
      -/+ buffers/cache:      28708    8215344
      
      2k growth in overall memory use after boot.
      
      NUMA discontig:
      
         text    data     bss     dec     hex filename
      3888124  470659 1276504 5635287  55fcd7 vmlinux
      
                   total       used       free     shared    buffers     cached
      Mem:       8256256      56908    8199348          0        352      11496
      -/+ buffers/cache:      45060    8211196
      Swap:      9775512          0    9775512
      
      NUMA sparse:
      
         text    data     bss     dec     hex filename
      3896428  470659 1276824 5643911  561e87 vmlinux
      
      8k text growth. Given that we now fully inline virt_to_page and friends,
      that is rather good.
      
                   total       used       free     shared    buffers     cached
      Mem:       8264720      57240    8207480          0        352      11516
      -/+ buffers/cache:      45372    8219348
      Swap:      9775512          0    9775512
      
      The total available memory is increased by 8k.
      
      This patch makes sparsemem the default and removes discontig and
      flatmem support from x86.
      
      [ akpm@linux-foundation.org: allnoconfig build fix ]
      Acked-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  28. 27 Oct, 2007 1 commit
    • x86: Dump filtering supports x86_64 sparsemem · 69243f91
      Ken'ichi Ohmichi authored

      This patch adds the symbol "init_level4_pgt" to the vmcoreinfo data so
      that makedumpfile (a dump filtering command) can support the x86_64
      sparsemem kernel of linux-2.6.24.
      
      makedumpfile creates a small dumpfile by excluding pages that are
      unnecessary for the analysis. It checks attributes in page structures
      and distinguishes necessary pages from unnecessary ones. To check them,
      makedumpfile reads the vmcoreinfo data, which contains the minimum
      debugging information needed for dump filtering.
      
      For older x86_64 kernels (linux-2.6.23 or before), makedumpfile
      translates the virtual address of a page structure into a physical
      address by subtracting PAGE_OFFSET from the virtual address, but this
      translation doesn't work for the linux-2.6.24 sparsemem kernel, because
      its page structures are in the virtual memmap area. makedumpfile must
      translate their virtual addresses through the 4-level page tables, and
      for that it needs the symbol "init_level4_pgt".
      Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  29. 19 Oct, 2007 1 commit
  30. 17 Oct, 2007 1 commit