- Jun 09, 2023
-
-
Yosry Ahmed authored
Patch series "cgroup: eliminate atomic rstat flushing", v5. A previous patch series [1] changed most atomic rstat flushing contexts to become non-atomic. This was done to avoid an expensive operation that scales with # cgroups and # cpus to happen with irqs disabled and scheduling not permitted. There were two remaining atomic flushing contexts after that series. This series tries to eliminate them as well, eliminating atomic rstat flushing completely. The two remaining atomic flushing contexts are: (a) wb_over_bg_thresh()->mem_cgroup_wb_stats() (b) mem_cgroup_threshold()->mem_cgroup_usage() For (a), flushing needs to be atomic as wb_writeback() calls wb_over_bg_thresh() with a spinlock held. However, it seems like the call to wb_over_bg_thresh() doesn't need to be protected by that spinlock, so this series proposes a refactoring that moves the call outside the lock criticial section and makes the stats flushing in mem_cgroup_wb_stats() non-atomic. For (b), flushing needs to be atomic as mem_cgroup_threshold() is called with irqs disabled. We only flush the stats when calculating the root usage, as it is approximated as the sum of some memcg stats (file, anon, and optionally swap) instead of the conventional page counter. This series proposes changing this calculation to use the global stats instead, eliminating the need for a memcg stat flush. After these 2 contexts are eliminated, we no longer need mem_cgroup_flush_stats_atomic() or cgroup_rstat_flush_atomic(). We can remove them and simplify the code. [1] https://lore.kernel.org/linux-mm/20230330191801.1967435-1-yosryahmed@google.com/ This patch (of 5): wb_over_bg_thresh() calls mem_cgroup_wb_stats() which invokes an rstat flush, which can be expensive on large systems. Currently, wb_writeback() calls wb_over_bg_thresh() within a lock section, so we have to do the rstat flush atomically. On systems with a lot of cpus and/or cgroups, this can cause us to disable irqs for a long time, potentially causing problems. Move the call to wb_over_bg_thresh() outside the lock section in preparation to make the rstat flush in mem_cgroup_wb_stats() non-atomic. The list_empty(&wb->work_list) check should be okay outside the lock section of wb->list_lock as it is protected by a separate lock (wb->work_lock), and wb_over_bg_thresh() doesn't seem like it is modifying any of wb->b_* lists the wb->list_lock is protecting. Also, the loop seems to be already releasing and reacquring the lock, so this refactoring looks safe. Link: https://lkml.kernel.org/r/20230421174020.2994750-1-yosryahmed@google.com Link: https://lkml.kernel.org/r/20230421174020.2994750-2-yosryahmed@google.com Signed-off-by:
Yosry Ahmed <yosryahmed@google.com> Reviewed-by:
Michal Koutný <mkoutny@suse.com> Reviewed-by:
Jan Kara <jack@suse.cz> Acked-by:
Shakeel Butt <shakeelb@google.com> Acked-by:
Tejun Heo <tj@kernel.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
- May 24, 2023
-
-
Steve French authored
The fs/cifs directory has moved to fs/smb/client, correct mentions of this in Documentation and comments. Acked-by:
Namjae Jeon <linkinjeon@kernel.org> Signed-off-by:
Steve French <stfrench@microsoft.com>
-
Steve French authored
Move CIFS/SMB3 related client and server files (cifs.ko and ksmbd.ko and helper modules) to new fs/smb subdirectory: fs/cifs --> fs/smb/client fs/ksmbd --> fs/smb/server fs/smbfs_common --> fs/smb/common Suggested-by:
Linus Torvalds <torvalds@linux-foundation.org> Acked-by:
Namjae Jeon <linkinjeon@kernel.org> Signed-off-by:
Steve French <stfrench@microsoft.com>
-
Steve French authored
There are two ways that special characters (not allowed in some other operating systems like Windows, but allowed in POSIX) have been mapped in the past ("SFU" and "SFM" mappings) to allow them to be stored in a range reserved for special chars. The default for Linux has been to use "mapposix" (ie the SFM mapping) but the conversion to the new mount API in the 5.11 kernel broke the ability to override the default mapping of the reserved characters (like '?' and '*' and '\') via "mapchars" mount option. This patch fixes that - so can now mount with "mapchars" mount option to override the default ("mapposix" ie SFM) mapping. Reported-by:
Tyler Spivey <tspivey8@gmail.com> Fixes: 24e0a1ef ("cifs: switch to new mount api") Signed-off-by:
Steve French <stfrench@microsoft.com>
-
Steve French authored
Fix /proc/fs/cifs/DebugData to use the same case for "encryption" (ie "Encryption" with init capital letter was used in one place). In addition, if gcm256 encryption (intead of gcm128) is used on a connection to a server, note that in the DebugData as well. It now displays (when gcm256 negotiated): Security type: RawNTLMSSP SessionId: 0x86125800bc000b0d encrypted(gcm256) Acked-by:
Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by:
Steve French <stfrench@microsoft.com>
-
Paulo Alcantara authored
cifs.ko maps NT_STATUS_NOT_FOUND to -EIO when SMB1 servers couldn't resolve referral paths. Proceed to tree connect when we get -EIO from dfs_get_referral() as well. Reported-by:
Kris Karas (Bug Reporting) <bugs-a21@moonlit-rail.com> Tested-by:
Woody Suwalski <terraluna977@gmail.com> Fixes: 8e355415 ("cifs: fix sharing of DFS connections") Cc: stable@vger.kernel.org # v6.2+ Signed-off-by:
Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by:
Steve French <stfrench@microsoft.com>
-
- May 23, 2023
-
-
David Howells authored
Fix cifs_limit_bvec_subset() so that it limits the span to the maximum specified and won't return with a size greater than max_size. Fixes: d08089f6 ("cifs: Change the I/O paths to use an iterator rather than a page list") Cc: stable@vger.kernel.org # 6.3 Reported-by:
Shyam Prasad N <sprasad@microsoft.com> Reviewed-by:
Shyam Prasad N <sprasad@microsoft.com> Signed-off-by:
David Howells <dhowells@redhat.com> cc: Steve French <smfrench@gmail.com> cc: Rohith Surabattula <rohiths.msft@gmail.com> cc: Paulo Alcantara <pc@manguebit.com> cc: Tom Talpey <tom@talpey.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: linux-fsdevel@vger.kernel.org Signed-off-by:
Steve French <stfrench@microsoft.com>
-
Gao Xiang authored
As Sandeep shown [1], high priority RT per-cpu kthreads are typically helpful for Android scenarios to minimize the scheduling latencies. Switch EROFS_FS_PCPU_KTHREAD_HIPRI on by default if EROFS_FS_PCPU_KTHREAD is on since it's the typical use cases for EROFS_FS_PCPU_KTHREAD. Also clean up unneeded sched_set_normal(). [1] https://lore.kernel.org/r/CAB=BE-SBtO6vcoyLNA9F-9VaN5R0t3o_Zn+FW8GbO6wyUqFneQ@mail.gmail.com Reviewed-by:
Yue Hu <huyue2@coolpad.com> Reviewed-by:
Sandeep Dhavale <dhavale@google.com> Reviewed-by:
Chao Yu <chao@kernel.org> Signed-off-by:
Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20230522092141.124290-1-hsiangkao@linux.alibaba.com
-
Yue Hu authored
The function of pcpubuf.c is just for low-latency decompression algorithms (e.g. lz4). Signed-off-by:
Yue Hu <huyue2@coolpad.com> Reviewed-by:
Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by:
Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230515095758.10391-1-zbestahu@gmail.com Signed-off-by:
Gao Xiang <hsiangkao@linux.alibaba.com>
-
Jingbo Xu authored
Fragments and dedupe share one feature bit, and thus packed inode may not exist when fragment feature bit (dedupe feature bit exactly) is set, e.g. when deduplication feature is in use while fragments feature is not. In this case, sbi->packed_inode could be NULL while fragments feature bit is set. Fix this by accessing packed inode only when it exists. Reported-by:
<syzbot+902d5a9373ae8f748a94@syzkaller.appspotmail.com> Link: https://syzkaller.appspot.com/bug?extid=902d5a9373ae8f748a94 Reported-and-tested-by:
<syzbot+bbb353775d51424087f2@syzkaller.appspotmail.com> Fixes: 9e382914 ("erofs: add helpers to load long xattr name prefixes") Fixes: 6a318ccd ("erofs: enable long extended attribute name prefixes") Signed-off-by:
Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by:
Yue Hu <huyue2@coolpad.com> Reviewed-by:
Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230515103941.129784-1-jefflexu@linux.alibaba.com Signed-off-by:
Gao Xiang <hsiangkao@linux.alibaba.com>
-
- May 19, 2023
-
-
Anna Schumaker authored
kfree()-ing the scratch page isn't enough, we also need to set the pointer back to NULL to avoid a double-free in the case of a resend. Fixes: fbd2a05f (NFSv4.2: Rework scratch handling for READ_PLUS) Signed-off-by:
Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Fabio M. De Francesco authored
kmap_atomic() is deprecated in favor of kmap_local_{folio,page}(). Therefore, replace kmap_atomic() with kmap_local_folio() in nfs_readdir_folio_array_append(). kmap_atomic() disables page-faults and preemption (the latter only for !PREEMPT_RT kernels), However, the code within the mapping/un-mapping in nfs_readdir_folio_array_append() does not depend on the above-mentioned side effects. Therefore, a mere replacement of the old API with the new one is all that is required (i.e., there is no need to explicitly add any calls to pagefault_disable() and/or preempt_disable()). Tested with (x)fstests in a QEMU/KVM x86_32 VM, 6GB RAM, booting a kernel with HIGHMEM64GB enabled. Cc: Ira Weiny <ira.weiny@intel.com> Signed-off-by:
Fabio M. De Francesco <fmdefrancesco@gmail.com> Fixes: ec108d3c ("NFS: Convert readdir page array functions to use a folio") Signed-off-by:
Anna Schumaker <Anna.Schumaker@Netapp.com>
-
- May 18, 2023
-
-
Xiubo Li authored
When the MClientSnap reqeust's op is not CEPH_SNAP_OP_SPLIT the request may still contain a list of 'split_realms', and we need to skip it anyway. Or it will be parsed as a corrupt snaptrace. Cc: stable@vger.kernel.org Link: https://tracker.ceph.com/issues/61200 Reported-by:
Frank Schilder <frans@dtu.dk> Signed-off-by:
Xiubo Li <xiubli@redhat.com> Reviewed-by:
Ilya Dryomov <idryomov@gmail.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
Xiubo Li authored
Smatch static checker warning: fs/ceph/mds_client.c:3968 reconnect_caps_cb() warn: missing error code here? '__get_cap_for_mds()' failed. 'err' = '0' [ idryomov: Dan says that Smatch considers it intentional only if the "ret = 0;" assignment is within 4 or 5 lines of the goto. ] Reported-by:
Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by:
Xiubo Li <xiubli@redhat.com> Reviewed-by:
Ilya Dryomov <idryomov@gmail.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
- May 17, 2023
-
-
Ryusuke Konishi authored
During unmount process of nilfs2, nothing holds nilfs_root structure after nilfs2 detaches its writer in nilfs_detach_log_writer(). However, since nilfs_evict_inode() uses nilfs_root for some cleanup operations, it may cause use-after-free read if inodes are left in "garbage_list" and released by nilfs_dispose_list() at the end of nilfs_detach_log_writer(). Fix this issue by modifying nilfs_evict_inode() to only clear inode without additional metadata changes that use nilfs_root if the file system is degraded to read-only or the writer is detached. Link: https://lkml.kernel.org/r/20230509152956.8313-1-konishi.ryusuke@gmail.com Signed-off-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by:
<syzbot+78d4495558999f55d1da@syzkaller.appspotmail.com> Closes: https://lkml.kernel.org/r/00000000000099e5ac05fb1c3b85@google.com Tested-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Bharath SM authored
In cifs_oplock_break function we drop reference to a cfile at the end of function, due to which close command goes on wire after lease break acknowledgment even if file is already closed by application but we had deferred the handle close. If other client with limited file shareaccess waiting on lease break ack proceeds operation on that file as soon as first client sends ack, then we may encounter status sharing violation error because of open handle. Solution is to put reference to cfile(send close on wire if last ref) and then send oplock acknowledgment to server. Fixes: 9e31678f ("SMB3: fix lease break timeout when multiple deferred close handles for the same file.") Cc: stable@kernel.org Signed-off-by:
Bharath SM <bharathsm@microsoft.com> Reviewed-by:
Shyam Prasad N <sprasad@microsoft.com> Signed-off-by:
Steve French <stfrench@microsoft.com>
-
Bharath SM authored
Oplock break may occur for different file handle than the deferred handle. Check for inode deferred closes list, if it's not empty then close all the deferred handles of inode because we should not cache handles if we dont have handle lease. Eg: If openfilelist has one deferred file handle and another open file handle from app for a same file, then on a lease break we choose the first handle in openfile list. The first handle in list can be deferred handle or actual open file handle from app. In case if it is actual open handle then today, we don't close deferred handles if we lose handle lease on a file. Problem with this is, later if app decides to close the existing open handle then we still be caching deferred handles until deferred close timeout. Leaving open handle may result in sharing violation when windows client tries to open a file with limited file share access. So we should check for deferred list of inode and walk through the list of deferred files in inode and close all deferred files. Fixes: 9e31678f ("SMB3: fix lease break timeout when multiple deferred close handles for the same file.") Cc: stable@kernel.org Signed-off-by:
Bharath SM <bharathsm@microsoft.com> Signed-off-by:
Steve French <stfrench@microsoft.com>
-
Jeff Layton authored
Commit f2620f16 caused the kernel to start emitting POSIX ACL xattrs for NFSv4 inodes, which it doesn't support. The only other user of generic_listxattr is HFS (classic) and it doesn't support POSIX ACLs either. Fixes: f2620f16 xattr: simplify listxattr helpers Reported-by:
Ondrej Valousek <ondrej.valousek.xm@renesas.com> Signed-off-by:
Jeff Layton <jlayton@kernel.org> Message-Id: <20230516124655.82283-1-jlayton@kernel.org> Signed-off-by:
Christian Brauner <brauner@kernel.org>
-
Ilya Leoshkevich authored
s390's struct statfs and struct statfs64 contain padding, which field-by-field copying does not set. Initialize the respective structs with zeros before filling them and copying them to userspace, like it's already done for the compat versions of these structs. Found by KMSAN. [agordeev@linux.ibm.com: fixed typo in patch description] Acked-by:
Heiko Carstens <hca@linux.ibm.com> Cc: stable@vger.kernel.org # v4.14+ Signed-off-by:
Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by:
Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/r/20230504144021.808932-2-iii@linux.ibm.com Signed-off-by:
Alexander Gordeev <agordeev@linux.ibm.com>
-
Josef Bacik authored
Our CI system caught a lockdep splat: ====================================================== WARNING: possible circular locking dependency detected 6.3.0-rc7+ #1167 Not tainted ------------------------------------------------------ kswapd0/46 is trying to acquire lock: ffff8c6543abd650 (sb_internal#2){++++}-{0:0}, at: btrfs_commit_inode_delayed_inode+0x5f/0x120 but task is already holding lock: ffffffffabe61b40 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x4aa/0x7a0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (fs_reclaim){+.+.}-{0:0}: fs_reclaim_acquire+0xa5/0xe0 kmem_cache_alloc+0x31/0x2c0 alloc_extent_state+0x1d/0xd0 __clear_extent_bit+0x2e0/0x4f0 try_release_extent_mapping+0x216/0x280 btrfs_release_folio+0x2e/0x90 invalidate_inode_pages2_range+0x397/0x470 btrfs_cleanup_dirty_bgs+0x9e/0x210 btrfs_cleanup_one_transaction+0x22/0x760 btrfs_commit_transaction+0x3b7/0x13a0 create_subvol+0x59b/0x970 btrfs_mksubvol+0x435/0x4f0 __btrfs_ioctl_snap_create+0x11e/0x1b0 btrfs_ioctl_snap_create_v2+0xbf/0x140 btrfs_ioctl+0xa45/0x28f0 __x64_sys_ioctl+0x88/0xc0 do_syscall_64+0x38/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc -> #0 (sb_internal#2){++++}-{0:0}: __lock_acquire+0x1435/0x21a0 lock_acquire+0xc2/0x2b0 start_transaction+0x401/0x730 btrfs_commit_inode_delayed_inode+0x5f/0x120 btrfs_evict_inode+0x292/0x3d0 evict+0xcc/0x1d0 inode_lru_isolate+0x14d/0x1e0 __list_lru_walk_one+0xbe/0x1c0 list_lru_walk_one+0x58/0x80 prune_icache_sb+0x39/0x60 super_cache_scan+0x161/0x1f0 do_shrink_slab+0x163/0x340 shrink_slab+0x1d3/0x290 shrink_node+0x300/0x720 balance_pgdat+0x35c/0x7a0 kswapd+0x205/0x410 kthread+0xf0/0x120 ret_from_fork+0x29/0x50 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(fs_reclaim); lock(sb_internal#2); lock(fs_reclaim); lock(sb_internal#2); *** DEADLOCK *** 3 locks held by kswapd0/46: #0: ffffffffabe61b40 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x4aa/0x7a0 #1: ffffffffabe50270 (shrinker_rwsem){++++}-{3:3}, at: shrink_slab+0x113/0x290 #2: ffff8c6543abd0e0 (&type->s_umount_key#44){++++}-{3:3}, at: super_cache_scan+0x38/0x1f0 stack backtrace: CPU: 0 PID: 46 Comm: kswapd0 Not tainted 6.3.0-rc7+ #1167 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x58/0x90 check_noncircular+0xd6/0x100 ? save_trace+0x3f/0x310 ? add_lock_to_list+0x97/0x120 __lock_acquire+0x1435/0x21a0 lock_acquire+0xc2/0x2b0 ? btrfs_commit_inode_delayed_inode+0x5f/0x120 start_transaction+0x401/0x730 ? btrfs_commit_inode_delayed_inode+0x5f/0x120 btrfs_commit_inode_delayed_inode+0x5f/0x120 btrfs_evict_inode+0x292/0x3d0 ? lock_release+0x134/0x270 ? __pfx_wake_bit_function+0x10/0x10 evict+0xcc/0x1d0 inode_lru_isolate+0x14d/0x1e0 __list_lru_walk_one+0xbe/0x1c0 ? __pfx_inode_lru_isolate+0x10/0x10 ? __pfx_inode_lru_isolate+0x10/0x10 list_lru_walk_one+0x58/0x80 prune_icache_sb+0x39/0x60 super_cache_scan+0x161/0x1f0 do_shrink_slab+0x163/0x340 shrink_slab+0x1d3/0x290 shrink_node+0x300/0x720 balance_pgdat+0x35c/0x7a0 kswapd+0x205/0x410 ? __pfx_autoremove_wake_function+0x10/0x10 ? __pfx_kswapd+0x10/0x10 kthread+0xf0/0x120 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x29/0x50 </TASK> This happens because when we abort the transaction in the transaction commit path we call invalidate_inode_pages2_range on our block group cache inodes (if we have space cache v1) and any delalloc inodes we may have. The plain invalidate_inode_pages2_range() call passes through GFP_KERNEL, which makes sense in most cases, but not here. Wrap these two invalidate callees with memalloc_nofs_save/memalloc_nofs_restore to make sure we don't end up with the fs reclaim dependency under the transaction dependency. CC: stable@vger.kernel.org # 4.14+ Signed-off-by:
Josef Bacik <josef@toxicpanda.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Johannes Thumshirn authored
Since f8a53bb5 ("btrfs: handle checksum generation in the storage layer") the failures of btrfs_csum_one_bio() are handled via bio_end_io(). This means, we can return BLK_STS_RESOURCE from btrfs_csum_one_bio() in case the allocation of the ordered sums fails. This also fixes a syzkaller report, where injecting a failure into the kvzalloc() call results in a BUG_ON(). Reported-by:
<syzbot+d8941552e21eac774778@syzkaller.appspotmail.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Anand Jain <anand.jain@oracle.com> Signed-off-by:
Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
Qu Wenruo authored
Currently we allow a block group not to be marked read-only for scrub. But for RAID56 block groups if we require the block group to be read-only, then we're allowed to use cached content from scrub stripe to reduce unnecessary RAID56 reads. So this patch would: - Make btrfs_inc_block_group_ro() try harder During my tests, for cases like btrfs/061 and btrfs/064, we can hit ENOSPC from btrfs_inc_block_group_ro() calls during scrub. The reason is if we only have one single data chunk, and trying to scrub it, we won't have any space left for any newer data writes. But this check should be done by the caller, especially for scrub cases we only temporarily mark the chunk read-only. And newer data writes would always try to allocate a new data chunk when needed. - Return error for scrub if we failed to mark a RAID56 chunk read-only Signed-off-by:
Qu Wenruo <wqu@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com>
-
- May 16, 2023
-
-
Gustav Johansson authored
clc length is now accepted to <= 8 less than length, rather than < 8. Solve issues on some of Axis's smb clients which send messages where clc length is 8 bytes less than length. The specific client was running kernel 4.19.217 with smb dialect 3.0.2 on armv7l. Cc: stable@vger.kernel.org Signed-off-by:
Gustav Johansson <gustajo@axis.com> Acked-by:
Namjae Jeon <linkinjeon@kernel.org> Signed-off-by:
Steve French <stfrench@microsoft.com>
-
Chih-Yen Chang authored
ksmbd_smb2_check_message allows client to return one byte more, so we need to allocate additional memory in ksmbd_conn_handler_loop to avoid out-of-bound access. Cc: stable@vger.kernel.org Signed-off-by:
Chih-Yen Chang <cc85nod@gmail.com> Acked-by:
Namjae Jeon <linkinjeon@kernel.org> Signed-off-by:
Steve French <stfrench@microsoft.com>
-
Chih-Yen Chang authored
The offset of UserName is related to the address of security buffer. To ensure the validaty of UserName, we need to compare name_off + name_len with secbuf_len instead of auth_msg_len. [ 27.096243] ================================================================== [ 27.096890] BUG: KASAN: slab-out-of-bounds in smb_strndup_from_utf16+0x188/0x350 [ 27.097609] Read of size 2 at addr ffff888005e3b542 by task kworker/0:0/7 ... [ 27.099950] Call Trace: [ 27.100194] <TASK> [ 27.100397] dump_stack_lvl+0x33/0x50 [ 27.100752] print_report+0xcc/0x620 [ 27.102305] kasan_report+0xae/0xe0 [ 27.103072] kasan_check_range+0x35/0x1b0 [ 27.103757] smb_strndup_from_utf16+0x188/0x350 [ 27.105474] smb2_sess_setup+0xaf8/0x19c0 [ 27.107935] handle_ksmbd_work+0x274/0x810 [ 27.108315] process_one_work+0x419/0x760 [ 27.108689] worker_thread+0x2a2/0x6f0 [ 27.109385] kthread+0x160/0x190 [ 27.110129] ret_from_fork+0x1f/0x30 [ 27.110454] </TASK> Cc: stable@vger.kernel.org Signed-off-by:
Chih-Yen Chang <cc85nod@gmail.com> Acked-by:
Namjae Jeon <linkinjeon@kernel.org> Signed-off-by:
Steve French <stfrench@microsoft.com>
-
Chih-Yen Chang authored
Add tag_len argument in smb2_find_context_vals() to avoid out-of-bound read when create_context's name_len is larger than tag length. [ 7.995411] ================================================================== [ 7.995866] BUG: KASAN: global-out-of-bounds in memcmp+0x83/0xa0 [ 7.996248] Read of size 8 at addr ffffffff8258d940 by task kworker/0:0/7 ... [ 7.998191] Call Trace: [ 7.998358] <TASK> [ 7.998503] dump_stack_lvl+0x33/0x50 [ 7.998743] print_report+0xcc/0x620 [ 7.999458] kasan_report+0xae/0xe0 [ 7.999895] kasan_check_range+0x35/0x1b0 [ 8.000152] memcmp+0x83/0xa0 [ 8.000347] smb2_find_context_vals+0xf7/0x1e0 [ 8.000635] smb2_open+0x1df2/0x43a0 [ 8.006398] handle_ksmbd_work+0x274/0x810 [ 8.006666] process_one_work+0x419/0x760 [ 8.006922] worker_thread+0x2a2/0x6f0 [ 8.007429] kthread+0x160/0x190 [ 8.007946] ret_from_fork+0x1f/0x30 [ 8.008181] </TASK> Cc: stable@vger.kernel.org Signed-off-by:
Chih-Yen Chang <cc85nod@gmail.com> Acked-by:
Namjae Jeon <linkinjeon@kernel.org> Signed-off-by:
Steve French <stfrench@microsoft.com>
-
- May 15, 2023
-
-
Azeem Shaikh authored
Instead of open coding a __dynamic_array(), use the __string() and __assign_str() helper macros that exist for this kind of use case. Part of an effort to remove deprecated strlcpy() [1] completely from the kernel[2]. [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy [2] https://github.com/KSPP/linux/issues/89 Fixes: 3c92fba5 ("NFSD: Enhance the nfsd_cb_setup tracepoint") Signed-off-by:
Azeem Shaikh <azeemshaikh38@gmail.com> Reviewed-by:
Jeff Layton <jlayton@kernel.org> Signed-off-by:
Chuck Lever <chuck.lever@oracle.com>
-
- May 13, 2023
-
-
Theodore Ts'o authored
In ext4_update_inline_data(), if ext4_xattr_ibody_get() fails for any reason, it's best if we just fail as opposed to stumbling on, especially if the failure is EFSCORRUPTED. Cc: stable@kernel.org Signed-off-by:
Theodore Ts'o <tytso@mit.edu>
-
Theodore Ts'o authored
Normally the extended attributes in the inode body would have been checked when the inode is first opened, but if someone is writing to the block device while the file system is mounted, it's possible for the inode table to get corrupted. Add bounds checking to avoid reading beyond the end of allocated memory if this happens. Reported-by:
<syzbot+1966db24521e5f6e23f7@syzkaller.appspotmail.com> Link: https://syzkaller.appspot.com/bug?extid=1966db24521e5f6e23f7 Cc: stable@kernel.org Signed-off-by:
Theodore Ts'o <tytso@mit.edu>
-
Theodore Ts'o authored
Whether the file system is mounted read-only or read/write is more important than the quota mode, which we are already printing. Add the ro vs r/w indication since this can be helpful in debugging problems from the console log. Signed-off-by:
Theodore Ts'o <tytso@mit.edu>
-
Theodore Ts'o authored
In no journal mode, ext4_finish_convert_inline_dir() can self-deadlock by calling ext4_handle_dirty_dirblock() when it already has taken the directory lock. There is a similar self-deadlock in ext4_incvert_inline_data_nolock() for data files which we'll fix at the same time. A simple reproducer demonstrating the problem: mke2fs -Fq -t ext2 -O inline_data -b 4k /dev/vdc 64 mount -t ext4 -o dirsync /dev/vdc /vdc cd /vdc mkdir file0 cd file0 touch file0 touch file1 attr -s BurnSpaceInEA -V abcde . touch supercalifragilisticexpialidocious Cc: stable@kernel.org Link: https://lore.kernel.org/r/20230507021608.1290720-1-tytso@mit.edu Reported-by:
<syzbot+91dccab7c64e2850a4e5@syzkaller.appspotmail.com> Link: https://syzkaller.appspot.com/bug?id=ba84cc80a9491d65416bc7877e1650c87530fe8a Signed-off-by:
Theodore Ts'o <tytso@mit.edu>
-
Theodore Ts'o authored
If there are failures while changing the mount options in __ext4_remount(), we need to restore the old mount options. This commit fixes two problem. The first is there is a chance that we will free the old quota file names before a potential failure leading to a use-after-free. The second problem addressed in this commit is if there is a failed read/write to read-only transition, if the quota has already been suspended, we need to renable quota handling. Cc: stable@kernel.org Link: https://lore.kernel.org/r/20230506142419.984260-2-tytso@mit.edu Signed-off-by:
Theodore Ts'o <tytso@mit.edu>
-
Theodore Ts'o authored
The ext4_dirhash() will *almost* never fail, especially when the hash tree feature was first introduced. However, with the addition of support of encrypted, casefolded file names, that function can most certainly fail today. So make sure the callers of ext4_dirhash() properly check for failures, and reflect the errors back up to their callers. Cc: stable@kernel.org Link: https://lore.kernel.org/r/20230506142419.984260-1-tytso@mit.edu Reported-by:
<syzbot+394aa8a792cb99dbc837@syzkaller.appspotmail.com> Reported-by:
<syzbot+344aaa8697ebd232bfc8@syzkaller.appspotmail.com> Link: https://syzkaller.appspot.com/bug?id=db56459ea4ac4a676ae4b4678f633e55da005a9b Signed-off-by:
Theodore Ts'o <tytso@mit.edu>
-
Theodore Ts'o authored
When a file system currently mounted read/only is remounted read/write, if we clear the SB_RDONLY flag too early, before the quota is initialized, and there is another process/thread constantly attempting to create a directory, it's possible to trigger the WARN_ON_ONCE(dquot_initialize_needed(inode)); in ext4_xattr_block_set(), with the following stack trace: WARNING: CPU: 0 PID: 5338 at fs/ext4/xattr.c:2141 ext4_xattr_block_set+0x2ef2/0x3680 RIP: 0010:ext4_xattr_block_set+0x2ef2/0x3680 fs/ext4/xattr.c:2141 Call Trace: ext4_xattr_set_handle+0xcd4/0x15c0 fs/ext4/xattr.c:2458 ext4_initxattrs+0xa3/0x110 fs/ext4/xattr_security.c:44 security_inode_init_security+0x2df/0x3f0 security/security.c:1147 __ext4_new_inode+0x347e/0x43d0 fs/ext4/ialloc.c:1324 ext4_mkdir+0x425/0xce0 fs/ext4/namei.c:2992 vfs_mkdir+0x29d/0x450 fs/namei.c:4038 do_mkdirat+0x264/0x520 fs/namei.c:4061 __do_sys_mkdirat fs/namei.c:4076 [inline] __se_sys_mkdirat fs/namei.c:4074 [inline] __x64_sys_mkdirat+0x89/0xa0 fs/namei.c:4074 Cc: stable@kernel.org Link: https://lore.kernel.org/r/20230506142419.984260-1-tytso@mit.edu Reported-by:
<syzbot+6385d7d3065524c5ca6d@syzkaller.appspotmail.com> Link: https://syzkaller.appspot.com/bug?id=6513f6cb5cd6b5fc9f37e3bb70d273b94be9c34c Signed-off-by:
Theodore Ts'o <tytso@mit.edu>
-
Baokun Li authored
When ext4_iomap_overwrite_begin() calls ext4_iomap_begin() map blocks may fail for some reason (e.g. memory allocation failure, bare disk write), and later because "iomap->type ! = IOMAP_MAPPED" triggers WARN_ON(). When ext4 iomap_begin() returns an error, it is normal that the type of iomap->type may not match the expectation. Therefore, we only determine if iomap->type is as expected when ext4_iomap_begin() is executed successfully. Cc: stable@kernel.org Reported-by:
<syzbot+08106c4b7d60702dbc14@syzkaller.appspotmail.com> Link: https://lore.kernel.org/all/00000000000015760b05f9b4eee9@google.com Signed-off-by:
Baokun Li <libaokun1@huawei.com> Reviewed-by:
Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230505132429.714648-1-libaokun1@huawei.com Signed-off-by:
Theodore Ts'o <tytso@mit.edu>
-
Tudor Ambarus authored
When modifying the block device while it is mounted by the filesystem, syzbot reported the following: BUG: KASAN: slab-out-of-bounds in crc16+0x206/0x280 lib/crc16.c:58 Read of size 1 at addr ffff888075f5c0a8 by task syz-executor.2/15586 CPU: 1 PID: 15586 Comm: syz-executor.2 Not tainted 6.2.0-rc5-syzkaller-00205-gc96618275234 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/12/2023 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1b1/0x290 lib/dump_stack.c:106 print_address_description+0x74/0x340 mm/kasan/report.c:306 print_report+0x107/0x1f0 mm/kasan/report.c:417 kasan_report+0xcd/0x100 mm/kasan/report.c:517 crc16+0x206/0x280 lib/crc16.c:58 ext4_group_desc_csum+0x81b/0xb20 fs/ext4/super.c:3187 ext4_group_desc_csum_set+0x195/0x230 fs/ext4/super.c:3210 ext4_mb_clear_bb fs/ext4/mballoc.c:6027 [inline] ext4_free_blocks+0x191a/0x2810 fs/ext4/mballoc.c:6173 ext4_remove_blocks fs/ext4/extents.c:2527 [inline] ext4_ext_rm_leaf fs/ext4/extents.c:2710 [inline] ext4_ext_remove_space+0x24ef/0x46a0 fs/ext4/extents.c:2958 ext4_ext_truncate+0x177/0x220 fs/ext4/extents.c:4416 ext4_truncate+0xa6a/0xea0 fs/ext4/inode.c:4342 ext4_setattr+0x10c8/0x1930 fs/ext4/inode.c:5622 notify_change+0xe50/0x1100 fs/attr.c:482 do_truncate+0x200/0x2f0 fs/open.c:65 handle_truncate fs/namei.c:3216 [inline] do_open fs/namei.c:3561 [inline] path_openat+0x272b/0x2dd0 fs/namei.c:3714 do_filp_open+0x264/0x4f0 fs/namei.c:3741 do_sys_openat2+0x124/0x4e0 fs/open.c:1310 do_sys_open fs/open.c:1326 [inline] __do_sys_creat fs/open.c:1402 [inline] __se_sys_creat fs/open.c:1396 [inline] __x64_sys_creat+0x11f/0x160 fs/open.c:1396 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f72f8a8c0c9 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f72f97e3168 EFLAGS: 00000246 ORIG_RAX: 0000000000000055 RAX: ffffffffffffffda RBX: 00007f72f8bac050 RCX: 00007f72f8a8c0c9 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000280 RBP: 00007f72f8ae7ae9 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007ffd165348bf R14: 00007f72f97e3300 R15: 0000000000022000 Replace le16_to_cpu(sbi->s_es->s_desc_size) with sbi->s_desc_size It reduces ext4's compiled text size, and makes the code more efficient (we remove an extra indirect reference and a potential byte swap on big endian systems), and there is no downside. It also avoids the potential KASAN / syzkaller failure, as a bonus. Reported-by:
<syzbot+fc51227e7100c9294894@syzkaller.appspotmail.com> Reported-by:
<syzbot+8785e41224a3afd04321@syzkaller.appspotmail.com> Link: https://syzkaller.appspot.com/bug?id=70d28d11ab14bd7938f3e088365252aa923cff42 Link: https://syzkaller.appspot.com/bug?id=b85721b38583ecc6b5e72ff524c67302abbc30f3 Link: https://lore.kernel.org/all/000000000000ece18705f3b20934@google.com/ Fixes: 717d50e4 ("Ext4: Uninitialized Block Groups") Cc: stable@vger.kernel.org Signed-off-by:
Tudor Ambarus <tudor.ambarus@linaro.org> Link: https://lore.kernel.org/r/20230504121525.3275886-1-tudor.ambarus@linaro.org Signed-off-by:
Theodore Ts'o <tytso@mit.edu>
-
Jan Kara authored
When using cached extent stored in extent status tree in tree->cache_es another process holding ei->i_es_lock for reading can be racing with us setting new value of tree->cache_es. If the compiler would decide to refetch tree->cache_es at an unfortunate moment, it could result in a bogus in_range() check. Fix the possible race by using READ_ONCE() when using tree->cache_es only under ei->i_es_lock for reading. Cc: stable@kernel.org Reported-by:
<syzbot+4a03518df1e31b537066@syzkaller.appspotmail.com> Link: https://lore.kernel.org/all/000000000000d3b33905fa0fd4a6@google.com Suggested-by:
Dmitry Vyukov <dvyukov@google.com> Signed-off-by:
Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230504125524.10802-1-jack@suse.cz Signed-off-by:
Theodore Ts'o <tytso@mit.edu>
-
Jan Kara authored
Ext4 has a filesystem wide lock protecting ext4_writepages() calls to avoid races with switching of journalled data flag or inode format. This lock can however cause a deadlock like: CPU0 CPU1 ext4_writepages() percpu_down_read(sbi->s_writepages_rwsem); ext4_change_inode_journal_flag() percpu_down_write(sbi->s_writepages_rwsem); - blocks, all readers block from now on ext4_do_writepages() ext4_init_io_end() kmem_cache_zalloc(io_end_cachep, GFP_KERNEL) fs_reclaim frees dentry... dentry_unlink_inode() iput() - last ref => iput_final() - inode dirty => write_inode_now()... ext4_writepages() tries to acquire sbi->s_writepages_rwsem and blocks forever Make sure we cannot recurse into filesystem reclaim from writeback code to avoid the deadlock. Reported-by:
<syzbot+6898da502aef574c5f8a@syzkaller.appspotmail.com> Link: https://lore.kernel.org/all/0000000000004c66b405fa108e27@google.com Fixes: c8585c6f ("ext4: fix races between changing inode journal mode and ext4_writepages") CC: stable@vger.kernel.org Signed-off-by:
Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230504124723.20205-1-jack@suse.cz Signed-off-by:
Theodore Ts'o <tytso@mit.edu>
-
Theodore Ts'o authored
In ext4_xattr_move_to_block(), the value of the extended attribute which we need to move to an external block may be allocated by kvmalloc() if the value is stored in an external inode. So at the end of the function the code tried to check if this was the case by testing entry->e_value_inum. However, at this point, the pointer to the xattr entry is no longer valid, because it was removed from the original location where it had been stored. So we could end up calling kvfree() on a pointer which was not allocated by kvmalloc(); or we could also potentially leak memory by not freeing the buffer when it should be freed. Fix this by storing whether it should be freed in a separate variable. Cc: stable@kernel.org Link: https://lore.kernel.org/r/20230430160426.581366-1-tytso@mit.edu Link: https://syzkaller.appspot.com/bug?id=5c2aee8256e30b55ccf57312c16d88417adbd5e1 Link: https://syzkaller.appspot.com/bug?id=41a6b5d4917c0412eb3b3c3c604965bed7d7420b Reported-by:
<syzbot+64b645917ce07d89bde5@syzkaller.appspotmail.com> Reported-by:
<syzbot+0d042627c4f2ad332195@syzkaller.appspotmail.com> Signed-off-by:
Theodore Ts'o <tytso@mit.edu>
-
Theodore Ts'o authored
If a malicious fuzzer overwrites the ext4 superblock while it is mounted such that the s_first_data_block is set to a very large number, the calculation of the block group can underflow, and trigger a BUG_ON check. Change this to be an ext4_warning so that we don't crash the kernel. Cc: stable@kernel.org Link: https://lore.kernel.org/r/20230430154311.579720-3-tytso@mit.edu Reported-by:
<syzbot+e2efa3efc15a1c9e95c3@syzkaller.appspotmail.com> Link: https://syzkaller.appspot.com/bug?id=69b28112e098b070f639efb356393af3ffec4220 Signed-off-by:
Theodore Ts'o <tytso@mit.edu>
-