- Feb 28, 2019
-
-
Greg Kroah-Hartman authored
debugfs can now report an error code if something went wrong instead of just NULL. So if the return value is to be used as a "real" dentry, it needs to be checked if it is an error before dereferencing it. This is now happening because of ff9fb72b ("debugfs: return error values, not NULL"). syzbot has found a way to trigger multiple debugfs files attempting to be created, which fails, and then the error code gets passed to dentry_path_raw() which obviously does not like it. Reported-by:
Eric Biggers <ebiggers@kernel.org> Reported-and-tested-by:
<syzbot+7857962b4d45e602b8ad@syzkaller.appspotmail.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: kvm@vger.kernel.org Acked-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Feb 07, 2019
-
-
James Morse authored
To split up APEIs in_nmi() path, the caller needs to always be in_nmi(). KVM shouldn't have to know about this, pull the RAS plumbing out into a header file. Currently guest synchronous external aborts are claimed as RAS notifications by handle_guest_sea(), which is hidden in the arch codes mm/fault.c. 32bit gets a dummy declaration in system_misc.h. There is going to be more of this in the future if/when the kernel supports the SError-based firmware-first notification mechanism and/or kernel-first notifications for both synchronous external abort and SError. Each of these will come with some Kconfig symbols and a handful of header files. Create a header file for all this. This patch gives handle_guest_sea() a 'kvm_' prefix, and moves the declarations to kvm_ras.h as preparation for a future patch that moves the ACPI-specific RAS code out of mm/fault.c. Signed-off-by:
James Morse <james.morse@arm.com> Reviewed-by:
Punit Agrawal <punit.agrawal@arm.com> Acked-by:
Marc Zyngier <marc.zyngier@arm.com> Tested-by:
Tyler Baicar <tbaicar@codeaurora.org> Acked-by:
Catalin Marinas <catalin.marinas@arm.com> Signed-off-by:
Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-
Jann Horn authored
kvm_ioctl_create_device() does the following: 1. creates a device that holds a reference to the VM object (with a borrowed reference, the VM's refcount has not been bumped yet) 2. initializes the device 3. transfers the reference to the device to the caller's file descriptor table 4. calls kvm_get_kvm() to turn the borrowed reference to the VM into a real reference The ownership transfer in step 3 must not happen before the reference to the VM becomes a proper, non-borrowed reference, which only happens in step 4. After step 3, an attacker can close the file descriptor and drop the borrowed reference, which can cause the refcount of the kvm object to drop to zero. This means that we need to grab a reference for the device before anon_inode_getfd(), otherwise the VM can disappear from under us. Fixes: 852b6d57 ("kvm: add device control API") Cc: stable@kernel.org Signed-off-by:
Jann Horn <jannh@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Suzuki K Poulose authored
We restrict mapping the PUD huge pages in stage2 to only when the stage2 has 4 level page table, leaving the feature unused with the default IPA size. But we could use it even with a 3 level page table, i.e, when the PUD level is folded into PGD, just like the stage1. Relax the condition to allow using the PUD huge page mappings at stage2 when it is possible. Cc: Christoffer Dall <christoffer.dall@arm.com> Reviewed-by:
Marc Zyngier <marc.zyngier@arm.com> Signed-off-by:
Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Christoffer Dall authored
We currently initialize the group of private IRQs during kvm_vgic_vcpu_init, and the value of the group depends on the GIC model we are emulating. However, CPUs created before creating (and initializing) the VGIC might end up with the wrong group if the VGIC is created as GICv3 later. Since we have no enforced ordering of creating the VGIC and creating VCPUs, we can end up with part the VCPUs being properly intialized and the remaining incorrectly initialized. That also means that we have no single place to do the per-cpu data structure initialization which depends on knowing the emulated GIC model (which is only the group field). This patch removes the incorrect comment from kvm_vgic_vcpu_init and initializes the group of all previously created VCPUs's private interrupts in vgic_init in addition to the existing initialization in kvm_vgic_vcpu_init. Signed-off-by:
Christoffer Dall <christoffer.dall@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Marc Zyngier authored
The current kvm_psci_vcpu_on implementation will directly try to manipulate the state of the VCPU to reset it. However, since this is not done on the thread that runs the VCPU, we can end up in a strangely corrupted state when the source and target VCPUs are running at the same time. Fix this by factoring out all reset logic from the PSCI implementation and forwarding the required information along with a request to the target VCPU. Reviewed-by:
Andrew Jones <drjones@redhat.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com> Signed-off-by:
Christoffer Dall <christoffer.dall@arm.com>
-
- Jan 25, 2019
-
-
Paul E. McKenney authored
lockdep_assert_held() is better suited to checking locking requirements, since it only checks if the current thread holds the lock regardless of whether someone else does. This is also a step towards possibly removing spin_is_locked(). Signed-off-by:
Paul E. McKenney <paulmck@linux.ibm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: <kvm@vger.kernel.org> Acked-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- Jan 24, 2019
-
-
Julien Thierry authored
vgic_cpu->ap_list_lock must always be taken with interrupts disabled as it is used in interrupt context. For configurations such as PREEMPT_RT_FULL, this means that it should be a raw_spinlock since RT spinlocks are interruptible. Signed-off-by:
Julien Thierry <julien.thierry@arm.com> Acked-by:
Christoffer Dall <christoffer.dall@arm.com> Acked-by:
Marc Zyngier <marc.zyngier@arm.com> Signed-off-by:
Christoffer Dall <christoffer.dall@arm.com>
-
Julien Thierry authored
vgic_dist->lpi_list_lock must always be taken with interrupts disabled as it is used in interrupt context. For configurations such as PREEMPT_RT_FULL, this means that it should be a raw_spinlock since RT spinlocks are interruptible. Signed-off-by:
Julien Thierry <julien.thierry@arm.com> Acked-by:
Christoffer Dall <christoffer.dall@arm.com> Acked-by:
Marc Zyngier <marc.zyngier@arm.com> Signed-off-by:
Christoffer Dall <christoffer.dall@arm.com>
-
Julien Thierry authored
vgic_irq->irq_lock must always be taken with interrupts disabled as it is used in interrupt context. For configurations such as PREEMPT_RT_FULL, this means that it should be a raw_spinlock since RT spinlocks are interruptible. Signed-off-by:
Julien Thierry <julien.thierry@arm.com> Acked-by:
Christoffer Dall <christoffer.dall@arm.com> Acked-by:
Marc Zyngier <marc.zyngier@arm.com> Signed-off-by:
Christoffer Dall <christoffer.dall@arm.com>
-
- Jan 11, 2019
-
-
Tomas Bortoli authored
The function at issue does not fully validate the content of the structure pointed by the log parameter, though its content has just been copied from userspace and lacks validation. Fix that. Moreover, change the type of n to unsigned long as that is the type returned by kvm_dirty_bitmap_bytes(). Signed-off-by:
Tomas Bortoli <tomasbortoli@gmail.com> Reported-by:
<syzbot+028366e52c9ace67deb3@syzkaller.appspotmail.com> [Squashed the fix from Paolo. - Radim.] Signed-off-by:
Radim Krčmář <rkrcmar@redhat.com>
-
- Jan 04, 2019
-
-
Joel Fernandes (Google) authored
Patch series "Add support for fast mremap". This series speeds up the mremap(2) syscall by copying page tables at the PMD level even for non-THP systems. There is concern that the extra 'address' argument that mremap passes to pte_alloc may do something subtle architecture related in the future that may make the scheme not work. Also we find that there is no point in passing the 'address' to pte_alloc since its unused. This patch therefore removes this argument tree-wide resulting in a nice negative diff as well. Also ensuring along the way that the enabled architectures do not do anything funky with the 'address' argument that goes unnoticed by the optimization. Build and boot tested on x86-64. Build tested on arm64. The config enablement patch for arm64 will be posted in the future after more testing. The changes were obtained by applying the following Coccinelle script. (thanks Julia for answering all Coccinelle questions!). Following fix ups were done manually: * Removal of address argument from pte_fragment_alloc * Removal of pte_alloc_one_fast definitions from m68k and microblaze. // Options: --include-headers --no-includes // Note: I split the 'identifier fn' line, so if you are manually // running it, please unsplit it so it runs for you. virtual patch @pte_alloc_func_def depends on patch exists@ identifier E2; identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$"; type T2; @@ fn(... - , T2 E2 ) { ... } @pte_alloc_func_proto_noarg depends on patch exists@ type T1, T2, T3, T4; identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$"; @@ ( - T3 fn(T1, T2); + T3 fn(T1); | - T3 fn(T1, T2, T4); + T3 fn(T1, T2); ) @pte_alloc_func_proto depends on patch exists@ identifier E1, E2, E4; type T1, T2, T3, T4; identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$"; @@ ( - T3 fn(T1 E1, T2 E2); + T3 fn(T1 E1); | - T3 fn(T1 E1, T2 E2, T4 E4); + T3 fn(T1 E1, T2 E2); ) @pte_alloc_func_call depends on patch exists@ expression E2; identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$"; @@ fn(... -, E2 ) @pte_alloc_macro depends on patch exists@ identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$"; identifier a, b, c; expression e; position p; @@ ( - #define fn(a, b, c) e + #define fn(a, b) e | - #define fn(a, b) e + #define fn(a) e ) Link: http://lkml.kernel.org/r/20181108181201.88826-2-joelaf@google.com Signed-off-by:
Joel Fernandes (Google) <joel@joelfernandes.org> Suggested-by:
Kirill A. Shutemov <kirill@shutemov.name> Acked-by:
Kirill A. Shutemov <kirill@shutemov.name> Cc: Michal Hocko <mhocko@kernel.org> Cc: Julia Lawall <Julia.Lawall@lip6.fr> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: William Kucharski <william.kucharski@oracle.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Linus Torvalds authored
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument of the user address range verification function since we got rid of the old racy i386-only code to walk page tables by hand. It existed because the original 80386 would not honor the write protect bit when in kernel mode, so you had to do COW by hand before doing any user access. But we haven't supported that in a long time, and these days the 'type' argument is a purely historical artifact. A discussion about extending 'user_access_begin()' to do the range checking resulted this patch, because there is no way we're going to move the old VERIFY_xyz interface to that model. And it's best done at the end of the merge window when I've done most of my merges, so let's just get this done once and for all. This patch was mostly done with a sed-script, with manual fix-ups for the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form. There were a couple of notable cases: - csky still had the old "verify_area()" name as an alias. - the iter_iov code had magical hardcoded knowledge of the actual values of VERIFY_{READ,WRITE} (not that they mattered, since nothing really used it) - microblaze used the type argument for a debug printout but other than those oddities this should be a total no-op patch. I tried to fix up all architectures, did fairly extensive grepping for access_ok() uses, and the changes are trivial, but I may have missed something. Any missed conversion should be trivially fixable, though. Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Dec 28, 2018
-
-
Jérôme Glisse authored
Patch series "mmu notifier contextual informations", v2. This patchset adds contextual information, why an invalidation is happening, to mmu notifier callback. This is necessary for user of mmu notifier that wish to maintains their own data structure without having to add new fields to struct vm_area_struct (vma). For instance device can have they own page table that mirror the process address space. When a vma is unmap (munmap() syscall) the device driver can free the device page table for the range. Today we do not have any information on why a mmu notifier call back is happening and thus device driver have to assume that it is always an munmap(). This is inefficient at it means that it needs to re-allocate device page table on next page fault and rebuild the whole device driver data structure for the range. Other use case beside munmap() also exist, for instance it is pointless for device driver to invalidate the device page table when the invalidation is for the soft dirtyness tracking. Or device driver can optimize away mprotect() that change the page table permission access for the range. This patchset enables all this optimizations for device drivers. I do not include any of those in this series but another patchset I am posting will leverage this. The patchset is pretty simple from a code point of view. The first two patches consolidate all mmu notifier arguments into a struct so that it is easier to add/change arguments. The last patch adds the contextual information (munmap, protection, soft dirty, clear, ...). This patch (of 3): To avoid having to change many callback definition everytime we want to add a parameter use a structure to group all parameters for the mmu_notifier invalidate_range_start/end callback. No functional changes with this patch. [akpm@linux-foundation.org: fix drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c kerneldoc] Link: http://lkml.kernel.org/r/20181205053628.3210-2-jglisse@redhat.com Signed-off-by:
Jérôme Glisse <jglisse@redhat.com> Acked-by:
Jan Kara <jack@suse.cz> Acked-by: Jason Gunthorpe <jgg@mellanox.com> [infiniband] Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Ross Zwisler <zwisler@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krcmar <rkrcmar@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Christian Koenig <christian.koenig@amd.com> Cc: Felix Kuehling <felix.kuehling@amd.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Dec 21, 2018
-
-
Lan Tianyu authored
This patch is to move tlb flush in kvm_set_pte_rmapp() to kvm_mmu_notifier_change_pte() in order to avoid redundant tlb flush. Signed-off-by:
Lan Tianyu <Tianyu.Lan@microsoft.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Lan Tianyu authored
The patch is to make kvm_set_spte_hva() return int and caller can check return value to determine flush tlb or not. Signed-off-by:
Lan Tianyu <Tianyu.Lan@microsoft.com> Acked-by:
Paul Mackerras <paulus@ozlabs.org> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Wei Yang authored
Signed-off-by:
Wei Yang <richard.weiyang@gmail.com> [Preserved the iff and a probably intentional weird bracket notation. Also dropped the style change to make a single-purpose patch. - Radim] Signed-off-by:
Radim Krčmář <rkrcmar@redhat.com>
-
Jim Mattson authored
Since the offset is added directly to the hva from the gfn_to_hva_cache, a negative offset could result in an out of bounds write. The existing BUG_ON only checks for addresses beyond the end of the gfn_to_hva_cache, not for addresses before the start of the gfn_to_hva_cache. Note that all current call sites have non-negative offsets. Fixes: 4ec6e863 ("kvm: Introduce kvm_write_guest_offset_cached()") Reported-by:
Cfir Cohen <cfir@google.com> Signed-off-by:
Jim Mattson <jmattson@google.com> Reviewed-by:
Cfir Cohen <cfir@google.com> Reviewed-by:
Peter Shier <pshier@google.com> Reviewed-by:
Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by:
Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by:
Radim Krčmář <rkrcmar@redhat.com>
-
Jim Mattson authored
Previously, in the case where (gpa + len) wrapped around, the entire region was not validated, as the comment claimed. It doesn't actually seem that wraparound should be allowed here at all. Furthermore, since some callers don't check the return code from this function, it seems prudent to clear ghc->memslot in the event of an error. Fixes: 8f964525 ("KVM: Allow cross page reads and writes from cached translations.") Reported-by:
Cfir Cohen <cfir@google.com> Signed-off-by:
Jim Mattson <jmattson@google.com> Reviewed-by:
Cfir Cohen <cfir@google.com> Reviewed-by:
Marc Orr <marcorr@google.com> Cc: Andrew Honig <ahonig@google.com> Signed-off-by:
Radim Krčmář <rkrcmar@redhat.com>
-
- Dec 19, 2018
-
-
Marc Zyngier authored
32 and 64bit use different symbols to identify the traps. 32bit has a fine grained approach (prefetch abort, data abort and HVC), while 64bit is pretty happy with just "trap". This has been fine so far, except that we now need to decode some of that in tracepoints that are common to both architectures. Introduce ARM_EXCEPTION_IS_TRAP which abstracts the trap symbols and make the tracepoint use it. Acked-by:
Christoffer Dall <christoffer.dall@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Christoffer Dall authored
There are two things we need to take care of when we create block mappings in the stage 2 page tables: (1) The alignment within a PMD between the host address range and the guest IPA range must be the same, since otherwise we end up mapping pages with the wrong offset. (2) The head and tail of a memory slot may not cover a full block size, and we have to take care to not map those with block descriptors, since we could expose memory to the guest that the host did not intend to expose. So far, we have been taking care of (1), but not (2), and our commentary describing (1) was somewhat confusing. This commit attempts to factor out the checks of both into a common function, and if we don't pass the check, we won't attempt any PMD mappings for neither hugetlbfs nor THP. Note that we used to only check the alignment for THP, not for hugetlbfs, but as far as I can tell the check needs to be applied to both scenarios. Cc: Ralph Palutke <ralph.palutke@fau.de> Cc: Lukas Braun <koomi@moshbit.net> Reported-by:
Lukas Braun <koomi@moshbit.net> Signed-off-by:
Christoffer Dall <christoffer.dall@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Marc Zyngier authored
We currently only halt the guest when a vCPU messes with the active state of an SPI. This is perfectly fine for GICv2, but isn't enough for GICv3, where all vCPUs can access the state of any other vCPU. Let's broaden the condition to include any GICv3 interrupt that has an active state (i.e. all but LPIs). Cc: stable@vger.kernel.org Reviewed-by:
Christoffer Dall <christoffer.dall@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Christoffer Dall authored
kvm_timer_vcpu_terminate can only be called in two scenarios: 1. As part of cleanup during a failed VCPU create 2. As part of freeing the whole VM (struct kvm refcount == 0) In the first case, we cannot have programmed any timers or mapped any IRQs, and therefore we do not have to cancel anything or unmap anything. In the second case, the VCPU will have gone through kvm_timer_vcpu_put, which will have canceled the emulated physical timer's hrtimer, and we do not need to that here as well. We also do not care if the irq is recorded as mapped or not in the VGIC data structure, because the whole VM is going away. That leaves us only with having to ensure that we cancel the bg_timer if we were blocking the last time we called kvm_timer_vcpu_put(). Signed-off-by:
Christoffer Dall <christoffer.dall@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Christoffer Dall authored
The use of a work queue in the hrtimer expire function for the bg_timer is a leftover from the time when we would inject interrupts when the bg_timer expired. Since we are no longer doing that, we can instead call kvm_vcpu_wake_up() directly from the hrtimer function and remove all workqueue functionality from the arch timer code. Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com> Signed-off-by:
Christoffer Dall <christoffer.dall@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Christoffer Dall authored
The kvm_exit tracepoint strangely always reported exits as being IRQs. This seems to be because either the __print_symbolic or the tracepoint macros use a variable named idx. Take this chance to update the fields in the tracepoint to reflect the concepts in the arm64 architecture that we pass to the tracepoint and move the exception type table to the same location and header files as the exits code. We also clear out the exception code to 0 for IRQ exits (which translates to UNKNOWN in text) to make it slighyly less confusing to parse the trace output. Signed-off-by:
Christoffer Dall <christoffer.dall@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Christoffer Dall authored
When checking if there are any pending IRQs for the VM, consider the active state and priority of the IRQs as well. Otherwise we could be continuously scheduling a guest hypervisor without it seeing an IRQ. Signed-off-by:
Christoffer Dall <christoffer.dall@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Gustavo A. R. Silva authored
When using the nospec API, it should be taken into account that: "...if the CPU speculates past the bounds check then * array_index_nospec() will clamp the index within the range of [0, * size)." The above is part of the header for macro array_index_nospec() in linux/nospec.h Now, in this particular case, if intid evaluates to exactly VGIC_MAX_SPI or to exaclty VGIC_MAX_PRIVATE, the array_index_nospec() macro ends up returning VGIC_MAX_SPI - 1 or VGIC_MAX_PRIVATE - 1 respectively, instead of VGIC_MAX_SPI or VGIC_MAX_PRIVATE, which, based on the original logic: /* SGIs and PPIs */ if (intid <= VGIC_MAX_PRIVATE) return &vcpu->arch.vgic_cpu.private_irqs[intid]; /* SPIs */ if (intid <= VGIC_MAX_SPI) return &kvm->arch.vgic.spis[intid - VGIC_NR_PRIVATE_IRQS]; are valid values for intid. Fix this by calling array_index_nospec() macro with VGIC_MAX_PRIVATE + 1 and VGIC_MAX_SPI + 1 as arguments for its parameter size. Fixes: 41b87599 ("KVM: arm/arm64: vgic: fix possible spectre-v1 in vgic_get_irq()") Cc: stable@vger.kernel.org Signed-off-by:
Gustavo A. R. Silva <gustavo@embeddedor.com> [dropped the SPI part which was fixed separately] Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
- Dec 18, 2018
-
-
Eric Biggers authored
If you register a kvm_coalesced_mmio_zone with '.pio = 0' but then unregister it with '.pio = 1', KVM_UNREGISTER_COALESCED_MMIO will try to unregister it from KVM_PIO_BUS rather than KVM_MMIO_BUS, which is a no-op. But it frees the kvm_coalesced_mmio_dev anyway, causing a use-after-free. Fix it by only unregistering and freeing the zone if the correct value of 'pio' is provided. Reported-by:
<syzbot+f87f60bb6f13f39b54e3@syzkaller.appspotmail.com> Fixes: 0804c849 ("kvm/x86 : add coalesced pio support") Signed-off-by:
Eric Biggers <ebiggers@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Marc Zyngier authored
SPIs should be checked against the VMs specific configuration, and not the architectural maximum. Cc: stable@vger.kernel.org Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Christoffer Dall authored
In attempting to re-construct the logic for our stage 2 page table layout I found the reasoning in the comment explaining how we calculate the number of levels used for stage 2 page tables a bit backwards. This commit attempts to clarify the comment, to make it slightly easier to read without having the Arm ARM open on the right page. While we're at it, fixup a typo in a comment that was recently changed. Reviewed-by:
Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by:
Christoffer Dall <christoffer.dall@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Julien Thierry authored
To change the active state of an MMIO, halt is requested for all vcpus of the affected guest before modifying the IRQ state. This is done by calling cond_resched_lock() in vgic_mmio_change_active(). However interrupts are disabled at this point and we cannot reschedule a vcpu. We actually don't need any of this, as kvm_arm_halt_guest ensures that all the other vcpus are out of the guest. Let's just drop that useless code. Signed-off-by:
Julien Thierry <julien.thierry@arm.com> Suggested-by:
Christoffer Dall <christoffer.dall@arm.com> Cc: stable@vger.kernel.org Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Punit Agrawal authored
KVM only supports PMD hugepages at stage 2. Now that the various page handling routines are updated, extend the stage 2 fault handling to map in PUD hugepages. Addition of PUD hugepage support enables additional page sizes (e.g., 1G with 4K granule) which can be useful on cores that support mapping larger block sizes in the TLB entries. Signed-off-by:
Punit Agrawal <punit.agrawal@arm.com> Reviewed-by:
Christoffer Dall <christoffer.dall@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> [ Replace BUG() => WARN_ON(1) for arm32 PUD helpers ] Signed-off-by:
Suzuki Poulose <suzuki.poulose@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Punit Agrawal authored
In preparation for creating larger hugepages at Stage 2, add support to the age handling notifiers for PUD hugepages when encountered. Provide trivial helpers for arm32 to allow sharing code. Signed-off-by:
Punit Agrawal <punit.agrawal@arm.com> Reviewed-by:
Christoffer Dall <christoffer.dall@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> [ Replaced BUG() => WARN_ON(1) for arm32 PUD helpers ] Signed-off-by:
Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Punit Agrawal authored
In preparation for creating larger hugepages at Stage 2, extend the access fault handling at Stage 2 to support PUD hugepages when encountered. Provide trivial helpers for arm32 to allow sharing of code. Signed-off-by:
Punit Agrawal <punit.agrawal@arm.com> Reviewed-by:
Christoffer Dall <christoffer.dall@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> [ Replaced BUG() => WARN_ON(1) in PUD helpers ] Signed-off-by:
Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Punit Agrawal authored
In preparation for creating PUD hugepages at stage 2, add support for detecting execute permissions on PUD page table entries. Faults due to lack of execute permissions on page table entries is used to perform i-cache invalidation on first execute. Provide trivial implementations of arm32 helpers to allow sharing of code. Signed-off-by:
Punit Agrawal <punit.agrawal@arm.com> Reviewed-by:
Christoffer Dall <christoffer.dall@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> [ Replaced BUG() => WARN_ON(1) in arm32 PUD helpers ] Signed-off-by:
Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Punit Agrawal authored
In preparation for creating PUD hugepages at stage 2, add support for write protecting PUD hugepages when they are encountered. Write protecting guest tables is used to track dirty pages when migrating VMs. Also, provide trivial implementations of required kvm_s2pud_* helpers to allow sharing of code with arm32. Signed-off-by:
Punit Agrawal <punit.agrawal@arm.com> Reviewed-by:
Christoffer Dall <christoffer.dall@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> [ Replaced BUG() => WARN_ON() in arm32 pud helpers ] Signed-off-by:
Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Punit Agrawal authored
Introduce helpers to abstract architectural handling of the conversion of pfn to page table entries and marking a PMD page table entry as a block entry. The helpers are introduced in preparation for supporting PUD hugepages at stage 2 - which are supported on arm64 but do not exist on arm. Signed-off-by:
Punit Agrawal <punit.agrawal@arm.com> Reviewed-by:
Suzuki K Poulose <suzuki.poulose@arm.com> Acked-by:
Christoffer Dall <christoffer.dall@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Reviewed-by:
Marc Zyngier <marc.zyngier@arm.com> Signed-off-by:
Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Punit Agrawal authored
Stage 2 fault handler marks a page as executable if it is handling an execution fault or if it was a permission fault in which case the executable bit needs to be preserved. The logic to decide if the page should be marked executable is duplicated for PMD and PTE entries. To avoid creating another copy when support for PUD hugepages is introduced refactor the code to share the checks needed to mark a page table entry as executable. Signed-off-by:
Punit Agrawal <punit.agrawal@arm.com> Reviewed-by:
Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by:
Christoffer Dall <christoffer.dall@arm.com> Signed-off-by:
Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Punit Agrawal authored
The code for operations such as marking the pfn as dirty, and dcache/icache maintenance during stage 2 fault handling is duplicated between normal pages and PMD hugepages. Instead of creating another copy of the operations when we introduce PUD hugepages, let's share them across the different pagesizes. Signed-off-by:
Punit Agrawal <punit.agrawal@arm.com> Reviewed-by:
Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by:
Christoffer Dall <christoffer.dall@arm.com> Signed-off-by:
Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-
Christoffer Dall authored
When restoring the active state from userspace, we don't know which CPU was the source for the active state, and this is not architecturally exposed in any of the register state. Set the active_source to 0 in this case. In the future, we can expand on this and exposse the information as additional information to userspace for GICv2 if anyone cares. Cc: stable@vger.kernel.org Signed-off-by:
Christoffer Dall <christoffer.dall@arm.com> Signed-off-by:
Marc Zyngier <marc.zyngier@arm.com>
-