- Nov 14, 2024
-
-
Paolo Bonzini authored
kvm_vm_create_worker_thread() is meant to be used for kthreads that can consume significant amounts of CPU time on behalf of a VM or in response to how the VM behaves (for example how it accesses its memory). Therefore it wants to charge the CPU time consumed by that work to the VM's container.

However, because of these threads, cgroups which have kvm instances inside never complete freezing. This can be trivially reproduced:

  root@test ~# mkdir /sys/fs/cgroup/test
  root@test ~# echo $$ > /sys/fs/cgroup/test/cgroup.procs
  root@test ~# qemu-system-x86_64 -nographic -enable-kvm

and in another terminal:

  root@test ~# echo 1 > /sys/fs/cgroup/test/cgroup.freeze
  root@test ~# cat /sys/fs/cgroup/test/cgroup.events
  populated 1
  frozen 0

The cgroup freezing happens in the signal delivery path, but kvm_nx_huge_page_recovery_worker, while joining non-root cgroups, never calls into the signal delivery path and thus never gets frozen. Because the cgroup freezer determines whether a given cgroup is frozen by comparing the number of frozen threads to the total number of threads in the cgroup, the cgroup never becomes frozen and users waiting for the state transition may hang indefinitely.

Since the worker kthread is tied to a user process, it's better if it behaves similarly to user tasks as much as possible, including being able to send SIGSTOP and SIGCONT. In fact, vhost_task is all that kvm_vm_create_worker_thread() wanted to be and more: not only does it inherit the userspace process's cgroups, it has other niceties like being parented properly in the process tree. Use it instead of the homegrown alternative.

Incidentally, the new code is also better behaved when you flip recovery back and forth to disabled and back to enabled. If your recovery period is 1 minute, it will run the next recovery after 1 minute independent of how many times you flipped the parameter.

(Commit message based on emails from Tejun.)

Reported-by: Tejun Heo <tj@kernel.org>
Reported-by: Luca Boccassi <bluca@debian.org>
Acked-by: Tejun Heo <tj@kernel.org>
Tested-by: Luca Boccassi <bluca@debian.org>
Cc: stable@vger.kernel.org
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- Nov 03, 2024
-
-
Al Viro authored
In all of those, failure exits prior to fdget() are plain returns, and the only thing done after fdput() is (on failure exits) a kfree(), which can be done before fdput() just fine.

NOTE: in acrn_irqfd_assign() the 'fail:' failure exit is wrong for eventfd_ctx_fileget() failure (we only want fdput() there), and once we stop doing that, it doesn't need to check whether eventfd is NULL or ERR_PTR(...) there.

NOTE: in privcmd we move fdget() up before the allocation - more to the point, before the copy_from_user() attempt.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
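These fdget()/fdput() cleanups pave the way for scope-based fd handling; below is a minimal sketch of that pattern, assuming the kernel's CLASS(fd) guard and the fd_empty()/fd_file() accessors (general shape only, not code taken from these commits):

  static int example_op(int fd)                  /* hypothetical caller */
  {
          CLASS(fd, f)(fd);                      /* fdput() runs automatically at scope exit */

          if (fd_empty(f))
                  return -EBADF;

          /* operate on fd_file(f); early returns need no explicit fdput() */
          return vfs_fsync(fd_file(f), 0);
  }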
-
Al Viro authored
All failure exits prior to fdget() leave the scope, and all matching fdput() calls are immediately followed by leaving the scope. [xfs_ioc_commit_range() chunk moved here as well]

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
fdget() is the first thing done in scope, and all matching fdput() calls are immediately followed by leaving the scope.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- Oct 30, 2024
-
-
Sean Christopherson authored
Add a Kconfig to allow architectures to opt-out of a TLB flush when a young page is aged, as invalidating TLB entries is not functionally required on most KVM-supported architectures. Stale TLB entries can result in false negatives and theoretically lead to suboptimal reclaim, but in practice all observations have been that the performance gained by skipping TLB flushes outweighs any performance lost by reclaiming hot pages.

E.g. the primary MMUs for x86, RISC-V, s390, and PPC Book3S elide the TLB flush for ptep_clear_flush_young(), and arm64's MMU skips the trailing DSB that's required for ordering (presumably because there are optimizations related to eliding other TLB flushes when doing make-before-break).

Link: https://lore.kernel.org/r/20241011021051.1557902-18-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
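As a rough illustration of how an opt-out like this is typically consumed, the aging path can gate its flush on the symbol; the Kconfig name and the arch hook below are assumptions, not taken from the commit:

  /* Sketch: CONFIG_KVM_ELIDE_TLB_FLUSH_IF_YOUNG is an assumed symbol name. */
  bool young = kvm_age_gfn_range(kvm, range);   /* hypothetical arch hook */

  if (young && !IS_ENABLED(CONFIG_KVM_ELIDE_TLB_FLUSH_IF_YOUNG))
          kvm_flush_remote_tlbs(kvm);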
-
Sean Christopherson authored
To avoid jitter on KVM_RUN due to synchronize_rcu(), use a rwlock instead of RCU to protect vcpu->pid, a.k.a. the pid of the task last used to run a vCPU.

When userspace is doing M:N scheduling of tasks to vCPUs, e.g. to run SEV migration helper vCPUs during post-copy, the synchronize_rcu() needed to change the PID associated with the vCPU can stall for hundreds of milliseconds, which is problematic for latency sensitive post-copy operations.

In the directed yield path, do not acquire the lock if it's contended, i.e. if the associated PID is changing, as that means the vCPU's task is already running.

Reported-by: Steve Rutherford <srutherford@google.com>
Reviewed-by: Steve Rutherford <srutherford@google.com>
Acked-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240802200136.329973-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
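A minimal sketch of the locking scheme described above, assuming an rwlock named pid_lock sitting next to vcpu->pid (the lock name is an assumption for illustration):

  struct pid *old;

  /* Writer: slow path when a different task starts running the vCPU. */
  write_lock(&vcpu->pid_lock);
  old = vcpu->pid;
  vcpu->pid = get_task_pid(current, PIDTYPE_PID);
  write_unlock(&vcpu->pid_lock);
  put_pid(old);

  /* Reader in the directed-yield path: if the lock is contended the PID is
     being changed, i.e. the vCPU's task is already running, so skip it. */
  if (read_trylock(&vcpu->pid_lock)) {
          struct task_struct *task = get_pid_task(vcpu->pid, PIDTYPE_PID);

          read_unlock(&vcpu->pid_lock);
          /* ... yield to task if non-NULL, then put_task_struct(task) ... */
  }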
-
Sean Christopherson authored
Do "return 0" instead of initializing and returning a local variable in kvm_vcpu_yield_to(), e.g. so that it's more obvious what the function returns if there is no task. No functional change intended. Acked-by:
Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240802200136.329973-2-seanjc@google.com Signed-off-by:
Sean Christopherson <seanjc@google.com>
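The cleanup in before/after form, sketched rather than quoted from the diff:

  /* Before: the early-exit value is carried in a local. */
  int ret = 0;

  if (!task)
          goto out;

  /* After: the no-task case returns 0 right at the check. */
  if (!task)
          return 0;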
-
Sean Christopherson authored
Rework kvm_vcpu_on_spin() to use a single for-loop instead of making "two" passes over all vCPUs. Given N=kvm->last_boosted_vcpu, the logic is to iterate from vcpu[N+1]..vcpu[N-1], i.e. using two loops is just a kludgy way of handling the wrap from the last vCPU to vCPU0 when a boostable vCPU isn't found in vcpu[N+1]..vcpu[MAX].

Open code the xa_load() instead of using kvm_get_vcpu() to avoid reading online_vcpus in every loop, as well as the accompanying smp_rmb(), i.e. make it a custom kvm_for_each_vcpu(), for all intents and purposes.

Opportunistically clean up the comment explaining the logic.

Link: https://lore.kernel.org/r/20240802202121.341348-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
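A sketch of the single-loop, wrap-around iteration (variable names are illustrative; the one-time snapshot of online_vcpus and the open-coded xa_load() are the parts called out above):

  int start = READ_ONCE(kvm->last_boosted_vcpu);
  int nr_vcpus = atomic_read(&kvm->online_vcpus);
  int i;

  for (i = 0; i < nr_vcpus; i++) {
          /* Start at vcpu[N+1]; the modulo wraps from the last vCPU back to vcpu[0]. */
          int idx = (start + i + 1) % nr_vcpus;
          struct kvm_vcpu *vcpu = xa_load(&kvm->vcpu_array, idx);

          if (!vcpu || !READ_ONCE(vcpu->ready))
                  continue;
          /* ... attempt to boost/yield to this vCPU ... */
  }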
-
Christophe JAILLET authored
'struct kvm_device_ops' is not modified in this driver. Constifying this structure moves some data to a read-only section, so increases overall security, especially when the structure holds some function pointers.

On a x86_64, with allmodconfig:

Before:
======
   text    data     bss     dec     hex filename
   2605     169      16    2790     ae6 virt/kvm/vfio.o

After:
=====
   text    data     bss     dec     hex filename
   2685      89      16    2790     ae6 virt/kvm/vfio.o

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/e7361a1bb7defbb0f7056b884e83f8d75ac9fe21.1727517084.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
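The change itself is small; a sketch of a const-qualified ops table (handler names here are placeholders, not the driver's):

  static int example_create(struct kvm_device *dev, u32 type);
  static void example_release(struct kvm_device *dev);

  /* "const" moves the ops table out of .data and into a read-only section. */
  static const struct kvm_device_ops example_device_ops = {
          .name    = "example",
          .create  = example_create,
          .release = example_release,
  };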
-
- Oct 25, 2024
-
-
Sean Christopherson authored
Now that KVM no longer relies on an ugly heuristic to find its struct page references, i.e. now that KVM can't get false positives on VM_MIXEDMAP pfns, remove KVM's hack to elevate the refcount for pfns that happen to have a valid struct page.

In addition to removing a long-standing wart in KVM, this allows KVM to map non-refcounted struct page memory into the guest, e.g. for exposing GPU TTM buffers to KVM guests.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-86-seanjc@google.com>
-
Sean Christopherson authored
Remove all kvm_{release,set}_pfn_*() APIs now that all users are gone. No functional change intended.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-85-seanjc@google.com>
-
Sean Christopherson authored
Now that the legacy gfn_to_pfn() APIs are gone, and all callers of hva_to_pfn() pass in a refcounted_page pointer, make it a required field to ensure all future usage in KVM plays nice.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-82-seanjc@google.com>
-
Sean Christopherson authored
Drop gfn_to_pfn() and all its variants now that all users are gone. No functional change intended.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-80-seanjc@google.com>
-
Sean Christopherson authored
Rework gfn_to_page() to support read-only accesses so that it can be used by arm64 to get MTE tags out of guest memory.

Opportunistically rewrite the comment to be even more stern about using gfn_to_page(), as there are very few scenarios where requiring a struct page is actually the right thing to do (though there are such scenarios). Add a FIXME to call out that KVM probably should be pinning pages, not just getting pages.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-77-seanjc@google.com>
-
Sean Christopherson authored
Convert gfn_to_page() to the new kvm_follow_pfn() internal API, which will eventually allow removing gfn_to_pfn() and kvm_pfn_to_refcounted_page().

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-76-seanjc@google.com>
-
Sean Christopherson authored
Provide the "struct page" associated with a guest_memfd pfn as an output from __kvm_gmem_get_pfn() so that KVM guest page fault handlers can directly put the page instead of having to rely on kvm_pfn_to_refcounted_page(). Tested-by:
Alex Bennée <alex.bennee@linaro.org> Signed-off-by:
Sean Christopherson <seanjc@google.com> Tested-by:
Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-47-seanjc@google.com>
-
Sean Christopherson authored
Refactor guest_memfd usage of __kvm_gmem_get_pfn() to pass the index into the guest_memfd file instead of the gfn, i.e. resolve the index based on the slot+gfn in the caller instead of in __kvm_gmem_get_pfn(). This will allow kvm_gmem_get_pfn() to retrieve and return the specific "struct page", which requires the index into the folio, without redoing the index calculation multiple times (which isn't costly, just hard to follow).

Opportunistically add a kvm_gmem_get_index() helper to make the copy+pasted code easier to understand.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-46-seanjc@google.com>
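For reference, such an index helper boils down to offsetting the gfn into the memslot and adding the slot's guest_memfd file offset; a sketch, with field names that reflect my reading of the guest_memfd code (treat them as assumptions):

  static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
  {
          /* gfn's offset within the memslot, shifted by the slot's gmem file offset */
          return gfn - slot->base_gfn + slot->gmem.pgoff;
  }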
-
Sean Christopherson authored
Add a new dedicated API, kvm_faultin_pfn(), for servicing guest page faults, i.e. for getting pages/pfns that will be mapped into the guest via an mmu_notifier-protected KVM MMU. Keep struct kvm_follow_pfn buried in internal code, as having __kvm_faultin_pfn() take "out" params is actually cleaner for several architectures, e.g. it allows the caller to have its own "page fault" structure without having to marshal data to/from kvm_follow_pfn.

Long term, common KVM would ideally provide a kvm_page_fault structure, a la x86's struct of the same name. But all architectures need to be converted to a common API before that can happen.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-44-seanjc@google.com>
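A hedged sketch of what a caller of such a fault-path API might look like; the exact signature is an assumption based on the description above, not quoted from the commit:

  /* Sketch only; parameter order and names are assumed. */
  struct page *refcounted_page;
  bool writable;
  kvm_pfn_t pfn;

  pfn = kvm_faultin_pfn(vcpu, gfn, fault_is_write, &writable, &refcounted_page);
  if (is_error_noslot_pfn(pfn))
          return -EFAULT;

  /* ... install the translation under mmu_lock, then drop the page reference
     that was taken on behalf of the fault (if refcounted_page is non-NULL) ... */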
-
Sean Christopherson authored
Add an off-by-default module param to control whether or not KVM is allowed to map memory that isn't pinned, i.e. that KVM can't guarantee won't be freed while it is mapped into KVM and/or the guest. Don't remove the functionality entirely, as there are use cases where mapping unpinned memory is safe (as defined by the platform owner), e.g. when memory is hidden from the kernel and managed by userspace, in which case userspace is already fully trusted to not muck with guest memory mappings.

But for more typical setups, mapping unpinned memory is wildly unsafe, and unnecessary. The APIs are used exclusively by x86's nested virtualization support, and there is no known (or sane) use case for mapping PFN-mapped memory into a KVM guest _and_ letting the guest use it for virtualization structures.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-36-seanjc@google.com>
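An off-by-default module parameter of this sort is typically declared as below; the parameter name is an assumption for illustration, not necessarily the one the commit adds:

  /* Sketch: "allow_unsafe_mappings" is an assumed name; defaults to false (off). */
  static bool allow_unsafe_mappings;
  module_param(allow_unsafe_mappings, bool, 0444);
  MODULE_PARM_DESC(allow_unsafe_mappings,
                   "Allow mapping memory that KVM cannot guarantee will stay pinned");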
-
Sean Christopherson authored
When creating a memory map for read, don't request a writable pfn from the primary MMU. While creating read-only mappings can be theoretically slower, as they don't play nice with fast GUP due to the need to break CoW before mapping the underlying PFN, practically speaking, creating a mapping isn't a super hot path, and getting a writable mapping for reading is weird and confusing.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-35-seanjc@google.com>
-
Sean Christopherson authored
Now that all kvm_vcpu_{,un}map() users pass "true" for @dirty, have them pass "true" as a @writable param to kvm_vcpu_map(), and thus create a read-only mapping when possible.

Note, creating read-only mappings can be theoretically slower, as they don't play nice with fast GUP due to the need to break CoW before mapping the underlying PFN. But practically speaking, creating a mapping isn't a super hot path, and getting a writable mapping for reading is weird and confusing.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-34-seanjc@google.com>
-
Sean Christopherson authored
Pin, as in FOLL_PIN, pages when mapping them for direct access by KVM. As per Documentation/core-api/pin_user_pages.rst, writing to a page that was gotten via FOLL_GET is explicitly disallowed.

  Correct (uses FOLL_PIN calls):
      pin_user_pages()
      write to the data within the pages
      unpin_user_pages()

  INCORRECT (uses FOLL_GET calls):
      get_user_pages()
      write to the data within the pages
      put_page()

Unfortunately, FOLL_PIN is a "private" flag, and so kvm_follow_pfn must use a one-off bool instead of being able to piggyback the "flags" field.

Link: https://lwn.net/Articles/930667
Link: https://lore.kernel.org/all/cover.1683044162.git.lstoakes@gmail.com
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-32-seanjc@google.com>
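For context, the generic FOLL_PIN pattern the documentation prescribes looks roughly like this (a standalone illustration of pin_user_pages_fast()/unpin_user_pages_dirty_lock(), not KVM's code, which routes the pin request through kvm_follow_pfn):

  #include <linux/mm.h>
  #include <linux/highmem.h>

  /* Illustrative helper; addr/data/len are caller-provided in this sketch. */
  static int write_to_user_page(unsigned long addr, const void *data, size_t len)
  {
          struct page *page;
          void *kaddr;

          if (pin_user_pages_fast(addr, 1, FOLL_WRITE, &page) != 1)
                  return -EFAULT;

          kaddr = kmap_local_page(page);
          memcpy(kaddr, data, len);             /* write to the pinned page */
          kunmap_local(kaddr);

          unpin_user_pages_dirty_lock(&page, 1, true);
          return 0;
  }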
-
David Stevens authored
Migrate kvm_vcpu_map() to kvm_follow_pfn(), and have it track whether or not the map holds a refcounted struct page. Precisely tracking struct page references will eventually allow removing kvm_pfn_to_refcounted_page() and its various wrappers.

Signed-off-by: David Stevens <stevensd@chromium.org>
[sean: use a pointer instead of a boolean]
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-31-seanjc@google.com>
-
Sean Christopherson authored
Track refcounted struct page memory using kvm_follow_pfn.refcounted_page instead of relying on kvm_release_pfn_clean() to correctly detect that the pfn is associated with a struct page.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-30-seanjc@google.com>
-
Sean Christopherson authored
Hoist the kvm_{set,release}_page_{clean,dirty}() APIs further up in kvm_main.c so that they can be used by the kvm_follow_pfn family of APIs. No functional change intended.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-29-seanjc@google.com>
-
Sean Christopherson authored
Add kvm_follow_pfn.refcounted_page as an output for the "to pfn" APIs to "return" the struct page that is associated with the returned pfn (if KVM acquired a reference to the page). This will eventually allow removing KVM's hacky kvm_pfn_to_refcounted_page() code, which is error prone and can't detect pfns that are valid, but aren't (currently) refcounted.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-28-seanjc@google.com>
-
Sean Christopherson authored
Use a single pointer instead of a single-entry array for the struct page pointer in hva_to_pfn_fast(). Using an array makes the code unnecessarily annoying to read and update. No functional change intended.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-27-seanjc@google.com>
-
Sean Christopherson authored
Drop yet another unnecessary magic page value from KVM, as there's zero reason to use a poisoned pointer to indicate "no page". If KVM uses a NULL page pointer, the kernel will explode just as quickly as if KVM uses a poisoned pointer. Never mind the fact that such usage would be a blatant and egregious KVM bug.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-23-seanjc@google.com>
-
Sean Christopherson authored
Explicitly initialize the entire kvm_host_map structure when mapping a pfn, as some callers declare their struct on the stack, i.e. don't zero-initialize the struct, which makes the map->hva in kvm_vcpu_unmap() *very* suspect.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-22-seanjc@google.com>
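The kind of full initialization described can be done with a compound-literal assignment so that unlisted fields are zeroed; the field names below reflect my reading of struct kvm_host_map and should be treated as assumptions:

  /* Anything not named here is implicitly zeroed, so a stack-declared map
     cannot leak stale hva/page values into kvm_vcpu_unmap(). */
  *map = (struct kvm_host_map) {
          .pfn  = pfn,
          .page = page,
          .hva  = hva,
  };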
-
Sean Christopherson authored
Drop kvm_vcpu_{,un}map()'s useless checks on @map being non-NULL. The map is 100% kernel controlled, any caller that passes a NULL pointer is broken and needs to be fixed, i.e. a crash due to a NULL pointer dereference is desirable (though obviously not as desirable as not having a bug in the first place).

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-21-seanjc@google.com>
-
David Stevens authored
Introduce kvm_follow_pfn() to eventually supplant the various "gfn_to_pfn" APIs, albeit by adding more wrappers. The primary motivation of the new helper is to pass a structure instead of an ever changing set of parameters, e.g. so that tweaking the behavior, inputs, and/or outputs of the "to pfn" helpers doesn't require churning half of KVM.

In the more distant future, the APIs exposed to arch code could also follow suit, e.g. by adding something akin to x86's "struct kvm_page_fault" when faulting in guest memory. But for now, the goal is purely to clean up KVM's "internal" MMU code.

As part of the conversion, replace the write_fault, interruptible, and no-wait boolean flags with FOLL_WRITE, FOLL_INTERRUPTIBLE, and FOLL_NOWAIT respectively. Collecting the various FOLL_* flags into a single field will again ease the pain of passing new flags.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: David Stevens <stevensd@chromium.org>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-20-seanjc@google.com>
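The core idea is a single parameter structure in place of a long argument list; a rough sketch of what such a structure carries, with field names partly assumed rather than quoted:

  /* Sketch of the parameter-struct idea; not the verbatim definition. */
  struct kvm_follow_pfn {
          const struct kvm_memory_slot *slot;   /* in: memslot to resolve the gfn against */
          gfn_t gfn;                            /* in: guest frame to translate */
          unsigned int flags;                   /* in: FOLL_WRITE, FOLL_INTERRUPTIBLE, FOLL_NOWAIT, ... */
          bool *map_writable;                   /* out: whether a writable mapping was obtained */
          struct page **refcounted_page;        /* out: page KVM took a reference on, if any */
  };

  /* Helpers then take one struct, so adding an input/output doesn't churn every caller. */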
-
Sean Christopherson authored
Drop @hva from __gfn_to_pfn_memslot() now that all callers pass NULL. No functional change intended.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-19-seanjc@google.com>
-
David Stevens authored
Add a pfn error code to communicate that hva_to_pfn() failed because I/O was needed and disallowed, and convert @async to a constant @no_wait boolean. This will allow eliminating the @no_wait param by having callers pass in FOLL_NOWAIT along with other FOLL_* flags.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: David Stevens <stevensd@chromium.org>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-17-seanjc@google.com>
-
Sean Christopherson authored
Remove check_user_page_hwpoison() as it's effectively dead code. Prior to commit 234b239b ("kvm: Faults which trigger IO release the mmap_sem"), hva_to_pfn_slow() wasn't actually a slow path in all cases, i.e. would do get_user_pages_fast() without ever doing slow GUP with FOLL_HWPOISON. Now that hva_to_pfn_slow() is a straight shot to get_user_pages_unlocked(), and unconditionally passes FOLL_HWPOISON, it is impossible for hva_to_pfn() to get an -errno that needs to be morphed to -EHWPOISON.

There are essentially four cases in KVM:

 - npages == 0, then FOLL_NOWAIT, a.k.a. @async, must be true, and thus check_user_page_hwpoison() will not be called
 - npages == 1 || npages == -EHWPOISON, all good
 - npages == -EINTR || npages == -EAGAIN, bail early, all good
 - everything else, including -EFAULT, can go down the vma_lookup() path, as npages < 0 means KVM went through hva_to_pfn_slow() which passes FOLL_HWPOISON

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-16-seanjc@google.com>
-
Sean Christopherson authored
Treat an -EAGAIN return from GUP the same as -EINTR and immediately report to the caller that a signal is pending. GUP only returns -EAGAIN if the _initial_ mmap_read_lock_killable() fails, which in turn only fails if a signal is pending.

Note, rwsem_down_read_slowpath() actually returns -EINTR, so GUP is really just making life harder than it needs to be. And the call to mmap_read_lock_killable() in the retry path returns its -errno verbatim, i.e. GUP (and thus KVM) is already handling locking failure this way, but only some of the time.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-15-seanjc@google.com>
-
Sean Christopherson authored
Now that hva_to_pfn() no longer supports being called in atomic context, move the might_sleep() annotation from hva_to_pfn_slow() to hva_to_pfn().

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-14-seanjc@google.com>
-
Sean Christopherson authored
Drop @atomic from the myriad "to_pfn" APIs now that all callers pass "false", and remove a comment blurb about KVM running only the "GUP fast" part in atomic context. No functional change intended.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-13-seanjc@google.com>
-
Sean Christopherson authored
Rename gfn_to_page_many_atomic() to kvm_prefetch_pages() to try and communicate its true purpose, as the "atomic" aspect is essentially a side effect of the fact that x86 uses the API while holding mmu_lock. E.g. even if mmu_lock weren't held, KVM wouldn't want to fault-in pages, as the goal is to opportunistically grab surrounding pages that have already been accessed and/or dirtied by the host, and to do so quickly.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-12-seanjc@google.com>
-
Sean Christopherson authored
Allow passing a NULL @page to kvm_release_page_{clean,dirty}(); there's no tangible benefit to forcing the callers to pre-check @page, and it ends up generating a lot of duplicate boilerplate code.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-3-seanjc@google.com>
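The resulting call-site simplification, sketched in before/after form:

  /* Before: every caller guards the release. */
  if (page)
          kvm_release_page_clean(page);

  /* After: the helper tolerates NULL, in the spirit of kfree(). */
  kvm_release_page_clean(page);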
-
Sean Christopherson authored
Remove KVM_ERR_PTR_BAD_PAGE and instead return NULL, as "bad page" is just a leftover bit of weirdness from days of old when KVM stuffed a "bad" page into the guest instead of actually handling missing pages. See commit cea7bb21 ("KVM: MMU: Make gfn_to_page() always safe").

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-2-seanjc@google.com>
-