- Aug 15, 2024
-
-
when pd on or resume, need run hack code first. Change-Id: Ic61a301ec7b34a29b2fb3fe09b557dad4fab8dc7 Signed-off-by: Ding Wei <leo.ding@rock-chips.com> Signed-off-by: Tao Huang <huangtao@rock-chips.com>
-
decoder version: vdpu383 current commit: support register config mode Signed-off-by: Hongjin Li <vic.hong@rock-chips.com> Change-Id: I5d188736425070bc2cce68d52c8725224a4b33ee Signed-off-by: Ding Wei <leo.ding@rock-chips.com> Signed-off-by: Chandler Chen <chandler.chen@rock-chips.com>
-
Detlev Casanova authored
-
Detlev Casanova authored
-
Detlev Casanova authored
-
Detlev Casanova authored
-
Detlev Casanova authored
-
Detlev Casanova authored
-
Detlev Casanova authored
-
Detlev Casanova authored
-
Detlev Casanova authored
-
Detlev Casanova authored
-
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linuxLinus Torvalds authored
Pull btrfs fixes from David Sterba: - extend tree-checker verification of directory item type - fix regression in page/folio and extent state tracking in xarray, the dirty status can get out of sync and can cause problems e.g. a hang - in send, detect last extent and allow to clone it instead of sending it as write, reduces amount of data transferred in the stream - fix checking extent references when cleaning deleted subvolumes - fix one more case in the extent map shrinker, let it run only in the kswapd context so it does not cause latency spikes during other operations * tag 'for-6.11-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix invalid mapping of extent xarray state btrfs: send: allow cloning non-aligned extent if it ends at i_size btrfs: only run the extent map shrinker from kswapd tasks btrfs: tree-checker: reject BTRFS_FT_UNKNOWN dir type btrfs: check delayed refs when we're checking if a ref exists
-
- Aug 14, 2024
-
-
git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds authored
Pull kvm fixes from Paolo Bonzini: "s390: - Fix failure to start guests with kvm.use_gisa=0 - Panic if (un)share fails to maintain security. ARM: - Use kvfree() for the kvmalloc'd nested MMUs array - Set of fixes to address warnings in W=1 builds - Make KVM depend on assembler support for ARMv8.4 - Fix for vgic-debug interface for VMs without LPIs - Actually check ID_AA64MMFR3_EL1.S1PIE in get-reg-list selftest - Minor code / comment cleanups for configuring PAuth traps - Take kvm->arch.config_lock to prevent destruction / initialization race for a vCPU's CPUIF which may lead to a UAF x86: - Disallow read-only memslots for SEV-ES and SEV-SNP (and TDX) - Fix smatch issues - Small cleanups - Make x2APIC ID 100% readonly - Fix typo in uapi constant Generic: - Use synchronize_srcu_expedited() on irqfd shutdown" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (21 commits) KVM: SEV: uapi: fix typo in SEV_RET_INVALID_CONFIG KVM: x86: Disallow read-only memslots for SEV-ES and SEV-SNP (and TDX) KVM: eventfd: Use synchronize_srcu_expedited() on shutdown KVM: selftests: Add a testcase to verify x2APIC is fully readonly KVM: x86: Make x2APIC ID 100% readonly KVM: x86: Use this_cpu_ptr() instead of per_cpu_ptr(smp_processor_id()) KVM: x86: hyper-v: Remove unused inline function kvm_hv_free_pa_page() KVM: SVM: Fix an error code in sev_gmem_post_populate() KVM: SVM: Fix uninitialized variable bug KVM: arm64: vgic: Hold config_lock while tearing down a CPU interface KVM: selftests: arm64: Correct feature test for S1PIE in get-reg-list KVM: arm64: Tidying up PAuth code in KVM KVM: arm64: vgic-debug: Exit the iterator properly w/o LPI KVM: arm64: Enforce dependency on an ARMv8.4-aware toolchain s390/uv: Panic for set and remove shared access UVC errors KVM: s390: fix validity interception issue when gisa is switched off docs: KVM: Fix register ID of SPSR_FIQ KVM: arm64: vgic: fix unexpected unlock sparse warnings KVM: arm64: fix kdoc warnings in W=1 builds KVM: arm64: fix override-init warnings in W=1 builds ...
-
Some Rockchip devices put the phase settings into the dw_mmc controller. The feature is implemented in devices where the USRID register contains 0x20230002. Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>
-
v2 tuning will inherit pre-stage loader's phase settings for the first time, and do re-tune if necessary. Re-tune will still try the rough degrees, for instance, 90, 180, 270, 360 but continue to do the fine tuning if sample window isn't good enough. Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>
-
Detlev Casanova authored
Add the compatible string for rockchip,rk3576-dw-mshc and add support for the rockchip,v2-tuning flag, a new feature of this core. Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>
-
Add support for the 5 rk3576 GPIO banks. This also adds support for optionnal support of the sys-grf syscon, used for i3c software controlled weak pull-up. Signed-off-by: Steven Liu <steven.liu@rock-chips.com> [rebase, reword commit message] Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>
-
Detlev Casanova authored
Add the compatible string as well as the rockchip,sys-grf field which is only used on this SoC. Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>
-
Add configuration for RK3576 SoC and list the power domains. Signed-off-by: Finley Xiao <finley.xiao@rock-chips.com> Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>
-
Define power domain IDs as described in the TRM and add compatible for rockchip,rk3576-power-controller Signed-off-by: Finley Xiao <finley.xiao@rock-chips.com> Co-Developed-by: Detlev Casanova <detlev.casanova@collabora.com> Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com> Acked-by: Conor Dooley <conor.dooley@microchip.com>
-
Amit Shah authored
"INVALID" is misspelt in "SEV_RET_INAVLID_CONFIG". Since this is part of the UAPI, keep the current definition and add a new one with the fix. Fix-suggested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Amit Shah <amit.shah@amd.com> Message-ID: <20240814083113.21622-1-amit@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Disallow read-only memslots for SEV-{ES,SNP} VM types, as KVM can't directly emulate instructions for ES/SNP, and instead the guest must explicitly request emulation. Unless the guest explicitly requests emulation without accessing memory, ES/SNP relies on KVM creating an MMIO SPTE, with the subsequent #NPF being reflected into the guest as a #VC. But for read-only memslots, KVM deliberately doesn't create MMIO SPTEs, because except for ES/SNP, doing so requires setting reserved bits in the SPTE, i.e. the SPTE can't be readable while also generating a #VC on writes. Because KVM never creates MMIO SPTEs and jumps directly to emulation, the guest never gets a #VC. And since KVM simply resumes the guest if ES/SNP guests trigger emulation, KVM effectively puts the vCPU into an infinite #NPF loop if the vCPU attempts to write read-only memory. Disallow read-only memory for all VMs with protected state, i.e. for upcoming TDX VMs as well as ES/SNP VMs. For TDX, it's actually possible to support read-only memory, as TDX uses EPT Violation #VE to reflect the fault into the guest, e.g. KVM could configure read-only SPTEs with RX protections and SUPPRESS_VE=0. But there is no strong use case for supporting read-only memslots on TDX, e.g. the main historical usage is to emulate option ROMs, but TDX disallows executing from shared memory. And if someone comes along with a legitimate, strong use case, the restriction can always be lifted for TDX. Don't bother trying to retroactively apply the restriction to SEV-ES VMs that are created as type KVM_X86_DEFAULT_VM. Read-only memslots can't possibly work for SEV-ES, i.e. disallowing such memslots is really just means reporting an error to userspace instead of silently hanging vCPUs. Trying to deal with the ordering between KVM_SEV_INIT and memslot creation isn't worth the marginal benefit it would provide userspace. Fixes: 26c44aa9 ("KVM: SEV: define VM types for SEV and SEV-ES") Fixes: 1dfe571c ("KVM: SEV: Add initial SEV-SNP support") Cc: Peter Gonda <pgonda@google.com> Cc: Michael Roth <michael.roth@amd.com> Cc: Vishal Annapurve <vannapurve@google.com> Cc: Ackerly Tng <ackerleytng@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20240809190319.1710470-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinuxLinus Torvalds authored
Pull selinux fixes from Paul Moore: - Fix a xperms counting problem where we adding to the xperms count even if we failed to add the xperm. - Propogate errors from avc_add_xperms_decision() back to the caller so that we can trigger the proper cleanup and error handling. - Revert our use of vma_is_initial_heap() in favor of our older logic as vma_is_initial_heap() doesn't correctly handle the no-heap case and it is causing issues with the SELinux process/execheap access control. While the older SELinux logic may not be perfect, it restores the expected user visible behavior. Hopefully we will be able to resolve the problem with the vma_is_initial_heap() macro with the mm folks, but we need to fix this in the meantime. * tag 'selinux-pr-20240814' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinux: revert our use of vma_is_initial_heap() selinux: add the processing of the failure of avc_add_xperms_decision() selinux: fix potential counting error in avc_add_xperms_decision()
-
Add the clock and reset tree definitions for the new RK3576 SoC. Signed-off-by: Elaine Zhang <zhangqing@rock-chips.com> Signed-off-by: Finley Xiao <finley.xiao@rock-chips.com> Signed-off-by: YouMin Chen <cym@rock-chips.com> Signed-off-by: Liang Chen <cl@rock-chips.com> Signed-off-by: Sugar Zhang <sugar.zhang@rock-chips.com> Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com> Reviewed-by: Elaine Zhang <zhangqing@rock-chips.com> Tested-by: Shawn Lin <shawn.lin@rock-chips.com>
-
Detlev Casanova authored
Add clock and reset ID defines for rk3576. Compared to the downstream bindings written by Elaine, this uses continous gapless IDs starting at 0. Thus all numbers are different between downstream and upstream, but names are kept exactly the same. Also add documentation for the rk3576 CRU core. Signed-off-by: Elaine Zhang <zhangqing@rock-chips.com> Signed-off-by: Sugar Zhang <sugar.zhang@rock-chips.com> Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>
-
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfsLinus Torvalds authored
Pull vfs fixes from Christian Brauner: "VFS: - Fix the name of file lease slab cache. When file leases were split out of file locks the name of the file lock slab cache was used for the file leases slab cache as well. - Fix a type in take_fd() helper. - Fix infinite directory iteration for stable offsets in tmpfs. - When the icache is pruned all reclaimable inodes are marked with I_FREEING and other processes that try to lookup such inodes will block. But some filesystems like ext4 can trigger lookups in their inode evict callback causing deadlocks. Ext4 does such lookups if the ea_inode feature is used whereby a separate inode may be used to store xattrs. Introduce I_LRU_ISOLATING which pins the inode while its pages are reclaimed. This avoids inode deletion during inode_lru_isolate() avoiding the deadlock and evict is made to wait until I_LRU_ISOLATING is done. netfs: - Fault in smaller chunks for non-large folio mappings for filesystems that haven't been converted to large folios yet. - Fix the CONFIG_NETFS_DEBUG config option. The config option was renamed a short while ago and that introduced two minor issues. First, it depended on CONFIG_NETFS whereas it wants to depend on CONFIG_NETFS_SUPPORT. The former doesn't exist, while the latter does. Second, the documentation for the config option wasn't fixed up. - Revert the removal of the PG_private_2 writeback flag as ceph is using it and fix how that flag is handled in netfs. - Fix DIO reads on 9p. A program watching a file on a 9p mount wouldn't see any changes in the size of the file being exported by the server if the file was changed directly in the source filesystem. Fix this by attempting to read the full size specified when a DIO read is requested. - Fix a NULL pointer dereference bug due to a data race where a cachefiles cookies was retired even though it was still in use. Check the cookie's n_accesses counter before discarding it. nsfs: - Fix ioctl declaration for NS_GET_MNTNS_ID from _IO() to _IOR() as the kernel is writing to userspace. pidfs: - Prevent the creation of pidfds for kthreads until we have a use-case for it and we know the semantics we want. It also confuses userspace why they can get pidfds for kthreads. squashfs: - Fix an unitialized value bug reported by KMSAN caused by a corrupted symbolic link size read from disk. Check that the symbolic link size is not larger than expected" * tag 'vfs-6.11-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: Squashfs: sanity check symbolic link size 9p: Fix DIO read through netfs vfs: Don't evict inode under the inode lru traversing context netfs: Fix handling of USE_PGPRIV2 and WRITE_TO_CACHE flags netfs, ceph: Revert "netfs: Remove deprecated use of PG_private_2 as a second writeback flag" file: fix typo in take_fd() comment pidfd: prevent creation of pidfds for kthreads netfs: clean up after renaming FSCACHE_DEBUG config libfs: fix infinite directory reads for offset dir nsfs: fix ioctl declaration fs/netfs/fscache_cookie: add missing "n_accesses" check filelock: fix name of file_lease slab cache netfs: Fault in smaller chunks for non-large folio mappings
-
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfLinus Torvalds authored
Pull bpf fixes from Alexei Starovoitov: - Fix bpftrace regression from Kyle Huey. Tracing bpf prog was called with perf_event input arguments causing bpftrace produce garbage output. - Fix verifier crash in stacksafe() from Yonghong Song. Daniel Hodges reported verifier crash when playing with sched-ext. The stack depth in the known verifier state was larger than stack depth in being explored state causing out-of-bounds access. - Fix update of freplace prog in prog_array from Leon Hwang. freplace prog type wasn't recognized correctly. * tag 'bpf-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: perf/bpf: Don't call bpf_overflow_handler() for tracing events selftests/bpf: Add a test to verify previous stacksafe() fix bpf: Fix a kernel verifier crash in stacksafe() bpf: Fix updating attached freplace prog in prog_array map
-
- Aug 13, 2024
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linuxLinus Torvalds authored
Pull execve fixes from Kees Cook: - binfmt_flat: Fix corruption when not offsetting data start - exec: Fix ToCToU between perm check and set-uid/gid usage * tag 'execve-v6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: exec: Fix ToCToU between perm check and set-uid/gid usage binfmt_flat: Fix corruption when not offsetting data start
-
Kees Cook authored
When opening a file for exec via do_filp_open(), permission checking is done against the file's metadata at that moment, and on success, a file pointer is passed back. Much later in the execve() code path, the file metadata (specifically mode, uid, and gid) is used to determine if/how to set the uid and gid. However, those values may have changed since the permissions check, meaning the execution may gain unintended privileges. For example, if a file could change permissions from executable and not set-id: ---------x 1 root root 16048 Aug 7 13:16 target to set-id and non-executable: ---S------ 1 root root 16048 Aug 7 13:16 target it is possible to gain root privileges when execution should have been disallowed. While this race condition is rare in real-world scenarios, it has been observed (and proven exploitable) when package managers are updating the setuid bits of installed programs. Such files start with being world-executable but then are adjusted to be group-exec with a set-uid bit. For example, "chmod o-x,u+s target" makes "target" executable only by uid "root" and gid "cdrom", while also becoming setuid-root: -rwxr-xr-x 1 root cdrom 16048 Aug 7 13:16 target becomes: -rwsr-xr-- 1 root cdrom 16048 Aug 7 13:16 target But racing the chmod means users without group "cdrom" membership can get the permission to execute "target" just before the chmod, and when the chmod finishes, the exec reaches brpm_fill_uid(), and performs the setuid to root, violating the expressed authorization of "only cdrom group members can setuid to root". Re-check that we still have execute permissions in case the metadata has changed. It would be better to keep a copy from the perm-check time, but until we can do that refactoring, the least-bad option is to do a full inode_permission() call (under inode lock). It is understood that this is safe against dead-locks, but hardly optimal. Reported-by: Marco Vanotti <mvanotti@google.com> Tested-by: Marco Vanotti <mvanotti@google.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: stable@vger.kernel.org Cc: Eric Biederman <ebiederm@xmission.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Signed-off-by: Kees Cook <kees@kernel.org>
-
Kyle Huey authored
The regressing commit is new in 6.10. It assumed that anytime event->prog is set bpf_overflow_handler() should be invoked to execute the attached bpf program. This assumption is false for tracing events, and as a result the regressing commit broke bpftrace by invoking the bpf handler with garbage inputs on overflow. Prior to the regression the overflow handlers formed a chain (of length 0, 1, or 2) and perf_event_set_bpf_handler() (the !tracing case) added bpf_overflow_handler() to that chain, while perf_event_attach_bpf_prog() (the tracing case) did not. Both set event->prog. The chain of overflow handlers was replaced by a single overflow handler slot and a fixed call to bpf_overflow_handler() when appropriate. This modifies the condition there to check event->prog->type == BPF_PROG_TYPE_PERF_EVENT, restoring the previous behavior and fixing bpftrace. Signed-off-by: Kyle Huey <khuey@kylehuey.com> Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com> Reported-by: Joe Damato <jdamato@fastly.com> Closes: https://lore.kernel.org/lkml/ZpFfocvyF3KHaSzF@LQ3V64L9R2/ Fixes: f11f10bf ("perf/bpf: Call BPF handler directly, not through overflow machinery") Cc: stable@vger.kernel.org Tested-by: Joe Damato <jdamato@fastly.com> # bpftrace Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240813151727.28797-1-jdamato@fastly.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Li RongQing authored
When hot-unplug a device which has many queues, and guest CPU will has huge jitter, and unplugging is very slow. It turns out synchronize_srcu() in irqfd_shutdown() caused the guest jitter and unplugging latency, so replace synchronize_srcu() with synchronize_srcu_expedited(), to accelerate the unplugging, and reduce the guest OS jitter, this accelerates the VM reboot too. Signed-off-by: Li RongQing <lirongqing@baidu.com> Message-ID: <20240711121130.38917-1-lirongqing@baidu.com> [Call it just once in irqfd_resampler_shutdown. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
git://git.samba.org/ksmbdLinus Torvalds authored
Pull smb server fixes from Steve French: "Two smb3 server fixes for access denied problem on share path checks" * tag '6.11-rc3-ksmbd-fixes' of git://git.samba.org/ksmbd: ksmbd: override fsids for smb2_query_info() ksmbd: override fsids for share path check
-
Michal Luczaj authored
Add a test to verify that userspace can't change a vCPU's x2APIC ID by abusing KVM_SET_LAPIC. KVM models the x2APIC ID (and x2APIC LDR) as readonly, and silently ignores userspace attempts to change the x2APIC ID for backwards compatibility. Signed-off-by: Michal Luczaj <mhal@rbox.co> [sean: write changelog, add to existing test] Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20240802202941.344889-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Sean Christopherson authored
Ignore the userspace provided x2APIC ID when fixing up APIC state for KVM_SET_LAPIC, i.e. make the x2APIC fully readonly in KVM. Commit a92e2543 ("KVM: x86: use hardware-compatible format for APIC ID register"), which added the fixup, didn't intend to allow userspace to modify the x2APIC ID. In fact, that commit is when KVM first started treating the x2APIC ID as readonly, apparently to fix some race: static inline u32 kvm_apic_id(struct kvm_lapic *apic) { - return (kvm_lapic_get_reg(apic, APIC_ID) >> 24) & 0xff; + /* To avoid a race between apic_base and following APIC_ID update when + * switching to x2apic_mode, the x2apic mode returns initial x2apic id. + */ + if (apic_x2apic_mode(apic)) + return apic->vcpu->vcpu_id; + + return kvm_lapic_get_reg(apic, APIC_ID) >> 24; } Furthermore, KVM doesn't support delivering interrupts to vCPUs with a modified x2APIC ID, but KVM *does* return the modified value on a guest RDMSR and for KVM_GET_LAPIC. I.e. no remotely sane setup can actually work with a modified x2APIC ID. Making the x2APIC ID fully readonly fixes a WARN in KVM's optimized map calculation, which expects the LDR to align with the x2APIC ID. WARNING: CPU: 2 PID: 958 at arch/x86/kvm/lapic.c:331 kvm_recalculate_apic_map+0x609/0xa00 [kvm] CPU: 2 PID: 958 Comm: recalc_apic_map Not tainted 6.4.0-rc3-vanilla+ #35 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.2-1-1 04/01/2014 RIP: 0010:kvm_recalculate_apic_map+0x609/0xa00 [kvm] Call Trace: <TASK> kvm_apic_set_state+0x1cf/0x5b0 [kvm] kvm_arch_vcpu_ioctl+0x1806/0x2100 [kvm] kvm_vcpu_ioctl+0x663/0x8a0 [kvm] __x64_sys_ioctl+0xb8/0xf0 do_syscall_64+0x56/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 RIP: 0033:0x7fade8b9dd6f Unfortunately, the WARN can still trigger for other CPUs than the current one by racing against KVM_SET_LAPIC, so remove it completely. Reported-by: Michal Luczaj <mhal@rbox.co> Closes: https://lore.kernel.org/all/814baa0c-1eaa-4503-129f-059917365e80@rbox.co Reported-by: Haoyu Wu <haoyuwu254@gmail.com> Closes: https://lore.kernel.org/all/20240126161633.62529-1-haoyuwu254@gmail.com Reported-by: <syzbot+545f1326f405db4e1c3e@syzkaller.appspotmail.com> Closes: https://lore.kernel.org/all/000000000000c2a6b9061cbca3c3@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20240802202941.344889-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Isaku Yamahata authored
Use this_cpu_ptr() instead of open coding the equivalent in various user return MSR helpers. Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Reviewed-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Yuan Yao <yuan.yao@intel.com> [sean: massage changelog] Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> Message-ID: <20240802201630.339306-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Naohiro Aota authored
In __extent_writepage_io(), we call btrfs_set_range_writeback() -> folio_start_writeback(), which clears PAGECACHE_TAG_DIRTY mark from the mapping xarray if the folio is not dirty. This worked fine before commit 97713b1a ("btrfs: do not clear page dirty inside extent_write_locked_range()"). After the commit, however, the folio is still dirty at this point, so the mapping DIRTY tag is not cleared anymore. Then, __extent_writepage_io() calls btrfs_folio_clear_dirty() to clear the folio's dirty flag. That results in the page being unlocked with a "strange" state. The page is not PageDirty, but the mapping tag is set as PAGECACHE_TAG_DIRTY. This strange state looks like causing a hang with a call trace below when running fstests generic/091 on a null_blk device. It is waiting for a folio lock. While I don't have an exact relation between this hang and the strange state, fixing the state also fixes the hang. And, that state is worth fixing anyway. This commit reorders btrfs_folio_clear_dirty() and btrfs_set_range_writeback() in __extent_writepage_io(), so that the PAGECACHE_TAG_DIRTY tag is properly removed from the xarray. [464.274] task:fsx state:D stack:0 pid:3034 tgid:3034 ppid:2853 flags:0x00004002 [464.286] Call Trace: [464.291] <TASK> [464.295] __schedule+0x10ed/0x6260 [464.301] ? __pfx___blk_flush_plug+0x10/0x10 [464.308] ? __submit_bio+0x37c/0x450 [464.314] ? __pfx___schedule+0x10/0x10 [464.321] ? lock_release+0x567/0x790 [464.327] ? __pfx_lock_acquire+0x10/0x10 [464.334] ? __pfx_lock_release+0x10/0x10 [464.340] ? __pfx_lock_acquire+0x10/0x10 [464.347] ? __pfx_lock_release+0x10/0x10 [464.353] ? do_raw_spin_lock+0x12e/0x270 [464.360] schedule+0xdf/0x3b0 [464.365] io_schedule+0x8f/0xf0 [464.371] folio_wait_bit_common+0x2ca/0x6d0 [464.378] ? folio_wait_bit_common+0x1cc/0x6d0 [464.385] ? __pfx_folio_wait_bit_common+0x10/0x10 [464.392] ? __pfx_filemap_get_folios_tag+0x10/0x10 [464.400] ? __pfx_wake_page_function+0x10/0x10 [464.407] ? __pfx___might_resched+0x10/0x10 [464.414] ? do_raw_spin_unlock+0x58/0x1f0 [464.420] extent_write_cache_pages+0xe49/0x1620 [btrfs] [464.428] ? lock_acquire+0x435/0x500 [464.435] ? __pfx_extent_write_cache_pages+0x10/0x10 [btrfs] [464.443] ? btrfs_do_write_iter+0x493/0x640 [btrfs] [464.451] ? orc_find.part.0+0x1d4/0x380 [464.457] ? __pfx_lock_release+0x10/0x10 [464.464] ? __pfx_lock_release+0x10/0x10 [464.471] ? btrfs_do_write_iter+0x493/0x640 [btrfs] [464.478] btrfs_writepages+0x1cc/0x460 [btrfs] [464.485] ? __pfx_btrfs_writepages+0x10/0x10 [btrfs] [464.493] ? is_bpf_text_address+0x6e/0x100 [464.500] ? kernel_text_address+0x145/0x160 [464.507] ? unwind_get_return_address+0x5e/0xa0 [464.514] ? arch_stack_walk+0xac/0x100 [464.521] do_writepages+0x176/0x780 [464.527] ? lock_release+0x567/0x790 [464.533] ? __pfx_do_writepages+0x10/0x10 [464.540] ? __pfx_lock_acquire+0x10/0x10 [464.546] ? __pfx_stack_trace_save+0x10/0x10 [464.553] ? do_raw_spin_lock+0x12e/0x270 [464.560] ? do_raw_spin_unlock+0x58/0x1f0 [464.566] ? _raw_spin_unlock+0x23/0x40 [464.573] ? wbc_attach_and_unlock_inode+0x3da/0x7d0 [464.580] filemap_fdatawrite_wbc+0x113/0x180 [464.587] ? prepare_pages.constprop.0+0x13c/0x5c0 [btrfs] [464.596] __filemap_fdatawrite_range+0xaf/0xf0 [464.603] ? __pfx___filemap_fdatawrite_range+0x10/0x10 [464.611] ? trace_irq_enable.constprop.0+0xce/0x110 [464.618] ? kasan_quarantine_put+0xd7/0x1e0 [464.625] btrfs_start_ordered_extent+0x46f/0x570 [btrfs] [464.633] ? __pfx_btrfs_start_ordered_extent+0x10/0x10 [btrfs] [464.642] ? __clear_extent_bit+0x2c0/0x9d0 [btrfs] [464.650] btrfs_lock_and_flush_ordered_range+0xc6/0x180 [btrfs] [464.659] ? __pfx_btrfs_lock_and_flush_ordered_range+0x10/0x10 [btrfs] [464.669] btrfs_read_folio+0x12a/0x1d0 [btrfs] [464.676] ? __pfx_btrfs_read_folio+0x10/0x10 [btrfs] [464.684] ? __pfx_filemap_add_folio+0x10/0x10 [464.691] ? __pfx___might_resched+0x10/0x10 [464.698] ? __filemap_get_folio+0x1c5/0x450 [464.705] prepare_uptodate_page+0x12e/0x4d0 [btrfs] [464.713] prepare_pages.constprop.0+0x13c/0x5c0 [btrfs] [464.721] ? fault_in_iov_iter_readable+0xd2/0x240 [464.729] btrfs_buffered_write+0x5bd/0x12f0 [btrfs] [464.737] ? __pfx_btrfs_buffered_write+0x10/0x10 [btrfs] [464.745] ? __pfx_lock_release+0x10/0x10 [464.752] ? generic_write_checks+0x275/0x400 [464.759] ? down_write+0x118/0x1f0 [464.765] ? up_write+0x19b/0x500 [464.770] btrfs_direct_write+0x731/0xba0 [btrfs] [464.778] ? __pfx_btrfs_direct_write+0x10/0x10 [btrfs] [464.785] ? __pfx___might_resched+0x10/0x10 [464.792] ? lock_acquire+0x435/0x500 [464.798] ? lock_acquire+0x435/0x500 [464.804] btrfs_do_write_iter+0x494/0x640 [btrfs] [464.811] ? __pfx_btrfs_do_write_iter+0x10/0x10 [btrfs] [464.819] ? __pfx___might_resched+0x10/0x10 [464.825] ? rw_verify_area+0x6d/0x590 [464.831] vfs_write+0x5d7/0xf50 [464.837] ? __might_fault+0x9d/0x120 [464.843] ? __pfx_vfs_write+0x10/0x10 [464.849] ? btrfs_file_llseek+0xb1/0xfb0 [btrfs] [464.856] ? lock_release+0x567/0x790 [464.862] ksys_write+0xfb/0x1d0 [464.867] ? __pfx_ksys_write+0x10/0x10 [464.873] ? _raw_spin_unlock+0x23/0x40 [464.879] ? btrfs_getattr+0x4af/0x670 [btrfs] [464.886] ? vfs_getattr_nosec+0x79/0x340 [464.892] do_syscall_64+0x95/0x180 [464.898] ? __do_sys_newfstat+0xde/0xf0 [464.904] ? __pfx___do_sys_newfstat+0x10/0x10 [464.911] ? trace_irq_enable.constprop.0+0xce/0x110 [464.918] ? syscall_exit_to_user_mode+0xac/0x2a0 [464.925] ? do_syscall_64+0xa1/0x180 [464.931] ? trace_irq_enable.constprop.0+0xce/0x110 [464.939] ? trace_irq_enable.constprop.0+0xce/0x110 [464.946] ? syscall_exit_to_user_mode+0xac/0x2a0 [464.953] ? btrfs_file_llseek+0xb1/0xfb0 [btrfs] [464.960] ? do_syscall_64+0xa1/0x180 [464.966] ? btrfs_file_llseek+0xb1/0xfb0 [btrfs] [464.973] ? trace_irq_enable.constprop.0+0xce/0x110 [464.980] ? syscall_exit_to_user_mode+0xac/0x2a0 [464.987] ? __pfx_btrfs_file_llseek+0x10/0x10 [btrfs] [464.995] ? trace_irq_enable.constprop.0+0xce/0x110 [465.002] ? __pfx_btrfs_file_llseek+0x10/0x10 [btrfs] [465.010] ? do_syscall_64+0xa1/0x180 [465.016] ? lock_release+0x567/0x790 [465.022] ? __pfx_lock_acquire+0x10/0x10 [465.028] ? __pfx_lock_release+0x10/0x10 [465.034] ? trace_irq_enable.constprop.0+0xce/0x110 [465.042] ? syscall_exit_to_user_mode+0xac/0x2a0 [465.049] ? do_syscall_64+0xa1/0x180 [465.055] ? syscall_exit_to_user_mode+0xac/0x2a0 [465.062] ? do_syscall_64+0xa1/0x180 [465.068] ? syscall_exit_to_user_mode+0xac/0x2a0 [465.075] ? do_syscall_64+0xa1/0x180 [465.081] ? clear_bhb_loop+0x25/0x80 [465.087] ? clear_bhb_loop+0x25/0x80 [465.093] ? clear_bhb_loop+0x25/0x80 [465.099] entry_SYSCALL_64_after_hwframe+0x76/0x7e [465.106] RIP: 0033:0x7f093b8ee784 [465.111] RSP: 002b:00007ffc29d31b28 EFLAGS: 00000202 ORIG_RAX: 0000000000000001 [465.122] RAX: ffffffffffffffda RBX: 0000000000006000 RCX: 00007f093b8ee784 [465.131] RDX: 000000000001de00 RSI: 00007f093b6ed200 RDI: 0000000000000003 [465.141] RBP: 000000000001de00 R08: 0000000000006000 R09: 0000000000000000 [465.150] R10: 0000000000023e00 R11: 0000000000000202 R12: 0000000000006000 [465.160] R13: 0000000000023e00 R14: 0000000000023e00 R15: 0000000000000001 [465.170] </TASK> [465.174] INFO: lockdep is turned off. Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> Fixes: 97713b1a ("btrfs: do not clear page dirty inside extent_write_locked_range()") Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Yue Haibing authored
There is no caller in tree since introduction in commit b4f69df0 ("KVM: x86: Make Hyper-V emulation optional") Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Message-ID: <20240803113233.128185-1-yuehaibing@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Phillip Lougher authored
Syzkiller reports a "KMSAN: uninit-value in pick_link" bug. This is caused by an uninitialised page, which is ultimately caused by a corrupted symbolic link size read from disk. The reason why the corrupted symlink size causes an uninitialised page is due to the following sequence of events: 1. squashfs_read_inode() is called to read the symbolic link from disk. This assigns the corrupted value 3875536935 to inode->i_size. 2. Later squashfs_symlink_read_folio() is called, which assigns this corrupted value to the length variable, which being a signed int, overflows producing a negative number. 3. The following loop that fills in the page contents checks that the copied bytes is less than length, which being negative means the loop is skipped, producing an uninitialised page. This patch adds a sanity check which checks that the symbolic link size is not larger than expected. -- Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Link: https://lore.kernel.org/r/20240811232821.13903-1-phillip@squashfs.org.uk Reported-by: Lizhi Xu <lizhi.xu@windriver.com> Reported-by: <syzbot+24ac24ff58dc5b0d26b9@syzkaller.appspotmail.com> Closes: https://lore.kernel.org/all/000000000000a90e8c061e86a76b@google.com/ V2: fix spelling mistake. Signed-off-by: Christian Brauner <brauner@kernel.org>
-
Dominique Martinet authored
If a program is watching a file on a 9p mount, it won't see any change in size if the file being exported by the server is changed directly in the source filesystem, presumably because 9p doesn't have change notifications, and because netfs skips the reads if the file is empty. Fix this by attempting to read the full size specified when a DIO read is requested (such as when 9p is operating in unbuffered mode) and dealing with a short read if the EOF was less than the expected read. To make this work, filesystems using netfslib must not set NETFS_SREQ_CLEAR_TAIL if performing a DIO read where that read hit the EOF. I don't want to mandatorily clear this flag in netfslib for DIO because, say, ceph might make a read from an object that is not completely filled, but does not reside at the end of file - and so we need to clear the excess. This can be tested by watching an empty file over 9p within a VM (such as in the ktest framework): while true; do read content; if [ -n "$content" ]; then echo $content; break; fi; done < /host/tmp/foo then writing something into the empty file. The watcher should immediately display the file content and break out of the loop. Without this fix, it remains in the loop indefinitely. Fixes: 80105ed2 ("9p: Use netfslib read/write_iter") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218916 Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/1229195.1723211769@warthog.procyon.org.uk cc: Eric Van Hensbergen <ericvh@kernel.org> cc: Latchesar Ionkov <lucho@ionkov.net> cc: Christian Schoenebeck <linux_oss@crudebyte.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Ilya Dryomov <idryomov@gmail.com> cc: Steve French <sfrench@samba.org> cc: Paulo Alcantara <pc@manguebit.com> cc: Trond Myklebust <trond.myklebust@hammerspace.com> cc: v9fs@lists.linux.dev cc: linux-afs@lists.infradead.org cc: ceph-devel@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: linux-nfs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Dominique Martinet <asmadeus@codewreck.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
-