  4. Dec 23, 2024
    • KVM: Add member to struct kvm_gfn_range to indicate private/shared · dca6c885
      Isaku Yamahata authored
      Add a new member to struct kvm_gfn_range to indicate which mapping
      (private vs. shared) to operate on: enum kvm_gfn_range_filter
      attr_filter. Update the core zapping operations to set it appropriately.
      
      TDX utilizes two GPA aliases for the same memslots: one for private
      memory and one for shared memory. For private memory, KVM cannot always
      perform the same operations it does on memory for default VMs, such as
      zapping pages and having them be faulted back in, as this requires
      guest coordination. However, some operations, such as guest-driven
      conversion of memory between private and shared, should zap private
      memory.
      
      Internally to the MMU, private and shared mappings are tracked on
      separate roots. Mapping and zapping operations operate on the
      respective GFN alias for each root (private or shared), so zapping
      operations will by default zap both aliases. Add a field to struct
      kvm_gfn_range so callers can specify which aliases to target, allowing
      an operation to be restricted to only the aliases appropriate for it.
      
      There was feedback that target aliases should be specified such that
      the default value (0) operates on both aliases. Several options were
      considered, including variations with separate bools whose default
      behavior was to process both aliases; these either allowed nonsensical
      configurations or were confusing for the caller. A simple enum was also
      explored and came close, but was hard to handle in the caller. Instead,
      use an enum with the default value (0) reserved as a disallowed value,
      and catch ranges that didn't specify a target alias by looking for that
      specific value.
      
      Set target alias with enum appropriately for these MMU operations:
       - For KVM's mmu notifier callbacks, zap shared pages only because private
         pages won't have a userspace mapping
       - For setting memory attributes, kvm_arch_pre_set_memory_attributes()
         chooses the aliases based on the attribute.
       - For guest_memfd invalidations, zap private only.
      
      Link: https://lore.kernel.org/kvm/ZivIF9vjKcuGie3s@google.com/
      
      
      Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
      Co-developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
      Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
      Message-ID: <20240718211230.1492011-3-rick.p.edgecombe@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: guest_memfd: Remove RCU-protected attribute from slot->gmem.file · 67b43038
      Yan Zhao authored
      
      Remove the RCU-protected attribute from slot->gmem.file. There is no
      need to use the RCU primitives rcu_assign_pointer()/synchronize_rcu()
      to update this pointer.
      
      - slot->gmem.file is updated in 3 places:
        kvm_gmem_bind(), kvm_gmem_unbind(), kvm_gmem_release().
        All of them are protected by kvm->slots_lock.
      
      - slot->gmem.file is read in 2 paths:
        (1) kvm_gmem_populate
              kvm_gmem_get_file
              __kvm_gmem_get_pfn

        (2) kvm_gmem_get_pfn
              kvm_gmem_get_file
              __kvm_gmem_get_pfn
      
        Path (1) kvm_gmem_populate() requires holding kvm->slots_lock, so
        slot->gmem.file is protected by the kvm->slots_lock in this path.
      
        Path (2) kvm_gmem_get_pfn() does not require holding kvm->slots_lock.
        However, it is also not guarded by rcu_read_lock()/rcu_read_unlock(),
        so the synchronize_rcu() in kvm_gmem_unbind()/kvm_gmem_release() will
        not actually wait for readers in kvm_gmem_get_pfn(): there is no RCU
        read-side critical section to wait for.

        Path (2) kvm_gmem_get_pfn() is nevertheless safe without RCU
        protection because:
        a) kvm_gmem_bind() is called on a new memslot, before the memslot is
           visible to kvm_gmem_get_pfn().
        b) kvm->srcu ensures that kvm_gmem_unbind() and freeing of a memslot
           occur after the memslot is no longer visible to kvm_gmem_get_pfn().
        c) get_file_active() ensures that kvm_gmem_get_pfn() will not access the
           stale file if kvm_gmem_release() sets it to NULL.  This is because if
           kvm_gmem_release() occurs before kvm_gmem_get_pfn(), get_file_active()
           will return NULL; if get_file_active() does not return NULL,
           kvm_gmem_release() should not occur until after kvm_gmem_get_pfn()
           releases the file reference.
      
      Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
      Message-ID: <20241104084303.29909-1-yan.y.zhao@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  6. Nov 14, 2024
    • KVM: x86: switch hugepage recovery thread to vhost_task · d96c77bd
      Paolo Bonzini authored
      
      kvm_vm_create_worker_thread() is meant to be used for kthreads that
      can consume significant amounts of CPU time on behalf of a VM or in
      response to how the VM behaves (for example how it accesses its memory).
      Therefore it wants to charge the CPU time consumed by that work to
      the VM's container.
      
      However, because of these threads, cgroups that contain KVM instances
      never complete freezing.  This can be trivially reproduced:
      
        root@test ~# mkdir /sys/fs/cgroup/test
        root@test ~# echo $$ > /sys/fs/cgroup/test/cgroup.procs
        root@test ~# qemu-system-x86_64 -nographic -enable-kvm
      
      and in another terminal:
      
        root@test ~# echo 1 > /sys/fs/cgroup/test/cgroup.freeze
        root@test ~# cat /sys/fs/cgroup/test/cgroup.events
        populated 1
        frozen 0
      
      Cgroup freezing happens in the signal delivery path, but
      kvm_nx_huge_page_recovery_worker, even though it joins non-root
      cgroups, never calls into the signal delivery path and thus never gets
      frozen. Because the cgroup freezer determines whether a given cgroup is
      frozen by comparing the number of frozen threads to the total number of
      threads in the cgroup, the cgroup never becomes frozen and users
      waiting for the state transition may hang indefinitely.
      
      Since the worker kthread is tied to a user process, it is better if it
      behaves like a user task as much as possible, including being able to
      receive SIGSTOP and SIGCONT.  In fact, vhost_task is everything
      kvm_vm_create_worker_thread() wanted to be and more: not only does it
      inherit the userspace process's cgroups, it has other niceties like
      being parented properly in the process tree.  Use it instead of the
      homegrown alternative.
      
      Incidentally, the new code also behaves better when you flip recovery
      off and back on.  If your recovery period is 1 minute, the next
      recovery runs 1 minute after the last one, independent of how many
      times you flipped the parameter.
      
      (Commit message based on emails from Tejun).
      
      Reported-by: Tejun Heo <tj@kernel.org>
      Reported-by: Luca Boccassi <bluca@debian.org>
      Acked-by: Tejun Heo <tj@kernel.org>
      Tested-by: Luca Boccassi <bluca@debian.org>
      Cc: stable@vger.kernel.org
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>