Commits · 12207f49ef41d5599fb313d103f2c7b485848c9d · hardware-enablement / Rockchip upstream enablement efforts / linux

Jan 21, 2024

bcachefs: comment bch_subvolume · 12207f49
Kent Overstreet authored 1 year ago
```
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
```
12207f49

bcachefs: bch_snapshot::btime · d32088f2


Add a field to bch_snapshot for creation time; this will be important
when we start exposing the snapshot tree to userspace.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

d32088f2

Jan 06, 2024

bcachefs: Upgrades now specify errors to fix, like downgrades · 15eaaa4c
Kent Overstreet authored 1 year ago
```
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
```
15eaaa4c

bcachefs: bch_member->seq · 6b00de06

Kent Overstreet authored 1 year ago


Add new fields for split brain detection:

 - bch_member->seq, which tracks the sequence number of the last superblock
   write that happened to each member device

 - bch_sb->write_time, which tracks the time of the last superblock write,
   to allow detection of when two members have diverged but had the same
   number of superblock writes.
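
The two checks described above can be sketched roughly as follows. This is a simplified illustration, not the kernel code: `struct sb_view`, `split_brain()` and the fixed four-member array are hypothetical names and sizes chosen for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of one superblock copy. */
struct sb_view {
	uint64_t seq;           /* this copy's own superblock write sequence number */
	uint64_t write_time;    /* time of this copy's last superblock write */
	uint64_t member_seq[4]; /* per member: seq of the last write that reached it */
};

/*
 * fs:  the superblock copy treated as authoritative
 * dev: the copy read from member dev_idx
 */
static bool split_brain(const struct sb_view *fs, const struct sb_view *dev,
			unsigned dev_idx)
{
	assert(dev_idx < 4);

	/* Same number of writes but different timestamps: the copies diverged. */
	if (fs->seq == dev->seq && fs->write_time != dev->write_time)
		return true;

	/* The member saw writes that the authoritative copy never recorded. */
	uint64_t recorded = fs->member_seq[dev_idx];
	return recorded != 0 && recorded < dev->seq;
}
```

A member that is merely behind (its own seq equals what the authoritative copy recorded for it) is not flagged; only a member that moved ahead on its own, or that has the same seq with a different timestamp, is.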

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

6b00de06

bcachefs: Check journal entries for invalid keys in trans commit path · 5e329145
Kent Overstreet authored 1 year ago
```
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
```
5e329145

Jan 01, 2024

bcachefs: fix userspace build errors · e06af207
Kent Overstreet authored 1 year ago
```
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
```
e06af207

bcachefs: btree write buffer now slurps keys from journal · 09caeabe

Kent Overstreet authored 1 year ago


Previously, the transaction commit path would have to add keys to the
btree write buffer as a separate operation, requiring additional global
synchronization.

This patch introduces a new journal entry type, which indicates that the
keys need to be copied into the btree write buffer prior to being
written out. We switch the journal entry type back to
JSET_ENTRY_btree_keys prior to write, so this is not an on disk format
change.

Flushing the btree write buffer may require pulling keys out of journal
entries yet to be written, and quiescing outstanding journal
reservations; we previously added journal->buf_lock for synchronization
with the journal write path.

We also can't put strict bounds on the number of keys in the journal
destined for the write buffer, which means we might overflow the size of
the preallocated buffer and have to reallocate - this introduces a
potentially fatal memory allocation failure. This is something we'll
have to watch for; if it becomes an issue in practice we can do
additional mitigation.

The transaction commit path no longer has to explicitly check if the
write buffer is full and wait on flushing; this is another performance
optimization. Instead, when the btree write buffer is close to full we
change the journal watermark, so that only reservations for journal
reclaim are allowed.
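
A minimal sketch of the watermark idea, under stated assumptions: the two-level enum, the `write_buffer` struct and the 3/4 "close to full" threshold are all invented for illustration and do not come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical watermark levels, lowest priority first. */
enum journal_watermark {
	WATERMARK_normal,
	WATERMARK_reclaim,
};

struct write_buffer {
	size_t nr_keys;
	size_t capacity;
};

/* Assumed threshold: treat the buffer as "close to full" at 3/4 capacity. */
static enum journal_watermark journal_watermark_for(const struct write_buffer *wb)
{
	assert(wb->capacity > 0);
	return wb->nr_keys >= wb->capacity / 4 * 3
		? WATERMARK_reclaim
		: WATERMARK_normal;
}

/* A reservation is granted only if its priority meets the current watermark. */
static bool journal_res_allowed(enum journal_watermark current,
				enum journal_watermark requested)
{
	return requested >= current;
}
```

With this shape, ordinary commits never check the write buffer themselves; they simply fail to get a journal reservation while the watermark is raised.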

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

09caeabe

bcachefs: Improve btree write buffer tracepoints · 56db2429

Kent Overstreet authored 1 year ago


 - add a tracepoint for write_buffer_flush_sync; this is expensive
 - fix the write_buffer_flush_slowpath tracepoint

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

56db2429

bcachefs: Kill dev_usage->buckets_ec · 9b34f02c

Kent Overstreet authored 1 year ago


This counter is redundant; it's simply the sum of BCH_DATA_stripe and
BCH_DATA_parity buckets.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

9b34f02c

bcachefs: Rename bch_replicas_entry -> bch_replicas_entry_v1 · 086a52f7
Kent Overstreet authored 1 year ago
```
Prep work for introducing bch_replicas_entry_v2

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
```
086a52f7

bcachefs: Kill memset() in bch2_btree_iter_init() · 1ae8a090
Kent Overstreet authored 1 year ago
```
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
```
1ae8a090

bcachefs: bch_sb_field_downgrade · 84f16387

Kent Overstreet authored 1 year ago


Add a new superblock section that contains a list of
  { minor version, recovery passes, errors_to_fix }

that is - a list of recovery passes that must be run when downgrading
past a given version, and a list of errors to silently fix.
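
As a rough illustration of how such a table could be consulted, the sketch below ORs together the entries crossed on the way down. The struct layout is hypothetical, and `errors_to_fix` is simplified to a bitmask here; the on-disk section is described above only as a list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of one downgrade entry. */
struct downgrade_entry {
	uint16_t minor_version;   /* applies when downgrading below this version */
	uint64_t recovery_passes; /* bitmask of passes to run */
	uint64_t errors_to_fix;   /* simplified: bitmask of errors to fix silently */
};

struct downgrade_plan {
	uint64_t recovery_passes;
	uint64_t errors_to_fix;
};

/* Collect everything required when going from version `from` down to `to`. */
static struct downgrade_plan downgrade_plan(const struct downgrade_entry *tbl,
					    size_t nr, uint16_t from, uint16_t to)
{
	struct downgrade_plan plan = { 0, 0 };

	assert(to <= from);
	for (size_t i = 0; i < nr; i++)
		if (tbl[i].minor_version > to && tbl[i].minor_version <= from) {
			plan.recovery_passes |= tbl[i].recovery_passes;
			plan.errors_to_fix   |= tbl[i].errors_to_fix;
		}
	return plan;
}
```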

The upcoming disk accounting rewrite is not going to be fully
compatible: we're going to have to regenerate accounting both when
upgrading to the new version and when downgrading from the new
version, since the new method of doing disk space accounting is a
completely different architecture based on deltas, and synchronizing
them for every journal entry write to maintain compatibility is going to
be too expensive and impractical.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

84f16387

bcachefs: bch_sb.recovery_passes_required · 8b16413c

Kent Overstreet authored 1 year ago


Add two new superblock fields. Since the main section of the superblock
is now full, we have to add a new variable length section for them -
bch_sb_field_ext.

 - recovery_passes_required: recovery passes that must be run on the
   next mount
 - errors_silent: errors that will be silently fixed

These are to improve upgrading and downgrading: these fields won't be
cleared until after recovery successfully completes, so there won't be
any issues with crashing partway through an upgrade or a downgrade.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

8b16413c

Nov 28, 2023

bcachefs: trace_move_extent_start_fail() now includes errcode · ae4d612c

Kent Overstreet authored 1 year ago


Renamed from trace_move_extent_alloc_mem_fail, because there are other
reasons we could fail (disk space allocation failure).

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

ae4d612c

Nov 26, 2023

bcachefs: bpos is misaligned on big endian · 3f3ae125

Kent Overstreet authored 1 year ago

bkey embeds a bpos that is misaligned on big endian; this is so that
bch2_bkey_swab() works correctly without having to differentiate between
packed and non-packed keys (a debatable design decision).

This means it can't have the __aligned() tag on big endian.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

3f3ae125

Nov 05, 2023

bcachefs: x-macro-ify inode flags enum · 103ffe9a

Kent Overstreet authored 1 year ago


This lets us use bch2_prt_bitflags to print them out.
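
For readers unfamiliar with the x-macro pattern, here is a small self-contained sketch. The flag names, bit numbers and `example_prt_bitflags()` are made up for illustration; the latter is only a stand-in for a bitflags printer, not the bcachefs helper itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative flag list; the names and bit numbers here are made up. */
#define EXAMPLE_INODE_FLAGS()	\
	x(sync,      0)		\
	x(immutable, 1)		\
	x(append,    2)

/* Expand the list once into an enum of bit masks... */
enum example_inode_flags {
#define x(name, bit)	EXAMPLE_INODE_##name = 1U << bit,
	EXAMPLE_INODE_FLAGS()
#undef x
};

/* ...and once into a NULL-terminated table of names, indexed by bit number. */
static const char * const example_inode_flag_strs[] = {
#define x(name, bit)	[bit] = #name,
	EXAMPLE_INODE_FLAGS()
#undef x
	NULL
};

/* Simplified stand-in for a bitflags printer: writes "a,b,c" into buf. */
static void example_prt_bitflags(char *buf, size_t len,
				 const char * const names[], unsigned flags)
{
	assert(len > 0);
	buf[0] = '\0';
	for (unsigned bit = 0; names[bit]; bit++)
		if (flags & (1U << bit)) {
			if (buf[0])
				strncat(buf, ",", len - strlen(buf) - 1);
			strncat(buf, names[bit], len - strlen(buf) - 1);
		}
}
```

Because the enum and the string table are generated from the same list, adding a flag in one place keeps both in sync.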

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

103ffe9a

bcachefs: rebalance_work btree is not a snapshots btree · d3c7727b

Kent Overstreet authored 1 year ago


rebalance_work entries may refer to entries in the extents btree, which
is a snapshots btree, or they may also refer to entries in the reflink
btree, which is not.

Hence rebalance_work keys may use the snapshot field but it's not
required to be nonzero - add a new btree flag to reflect this.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

d3c7727b

Nov 04, 2023

bcachefs: Fix build errors with gcc 10 · 6dfa10ab

Kent Overstreet authored 1 year ago


gcc 10 seems to complain about array bounds in situations where gcc 11
does not - curious.

This unfortunately requires adding some casts for now; we may
investigate getting rid of our __u64 _data[] VLA in a future patch so
that our start[0] members can be VLAs.

Reported-by: John Stoffel <john@stoffel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

6dfa10ab

Nov 02, 2023

bcachefs: bch_sb_field_errors · f5d26fa3

Kent Overstreet authored 1 year ago


Add a new superblock section to keep counts of errors seen since
filesystem creation: we'll be adding counters for every distinct fsck
error.

The new superblock section has entries of the form [ id, count,
time_of_last_error ]; this is intended to let us see what errors are
occurring - and getting fixed - via show-super output.
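
A minimal sketch of maintaining entries of that form, assuming a fixed-size in-memory table; `struct error_entry`, `MAX_ERROR_ENTRIES` and `error_count_inc()` are hypothetical names, and the real section is variable length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory entry: [ id, count, time_of_last_error ]. */
struct error_entry {
	uint16_t id;
	uint64_t count;
	uint64_t last_error_time;
};

#define MAX_ERROR_ENTRIES 16

struct error_table {
	struct error_entry e[MAX_ERROR_ENTRIES];
	size_t nr;
};

/*
 * Bump the counter for `id`, adding a new entry on first occurrence.
 * Returns 0 on success, -1 if the fixed-size table is full.
 */
static int error_count_inc(struct error_table *t, uint16_t id, uint64_t now)
{
	assert(t != NULL);
	for (size_t i = 0; i < t->nr; i++)
		if (t->e[i].id == id) {
			t->e[i].count++;
			t->e[i].last_error_time = now;
			return 0;
		}

	if (t->nr == MAX_ERROR_ENTRIES)
		return -1;

	t->e[t->nr++] = (struct error_entry) { id, 1, now };
	return 0;
}
```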

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

f5d26fa3

bcachefs: Add IO error counts to bch_member · 94119eeb

Kent Overstreet authored 1 year ago


We now track IO errors per device since filesystem creation.

IO error counts can be viewed in sysfs, or with the 'bcachefs
show-super' command.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

94119eeb

bcachefs: rebalance_work · fb3f57bb

Kent Overstreet authored 1 year ago


This adds a new btree, rebalance_work, to eliminate scanning required
for finding extents that need work done on them in the background - i.e.
for the background_target and background_compression options.

rebalance_work is a bitset btree, where a KEY_TYPE_set corresponds to an
extent in the extents or reflink btree at the same pos.

A new extent field is added, bch_extent_rebalance, which indicates that
this extent has work that needs to be done in the background - and which
options to use. This allows per-inode options to be propagated to
indirect extents - at least in some circumstances. In this patch,
changing IO options on a file will not propagate the new options to
indirect extents pointed to by that file.

Updating (setting/clearing) the rebalance_work btree is done by the
extent trigger, which looks at the bch_extent_rebalance field.

Scanning is still required after changing IO path options - either just
for a given inode, or for the whole filesystem. We indicate that
scanning is required by adding a KEY_TYPE_cookie key to the
rebalance_work btree: the cookie counter is so that we can detect that
scanning is still required when an option has been flipped mid-way
through an existing scan.
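
The cookie counter protocol can be sketched as below. This is an illustration of the idea only: `struct scan_cookie` and both functions are hypothetical and stand in for a key in the rebalance_work btree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the cookie key stored in the rebalance_work btree. */
struct scan_cookie {
	uint64_t counter; /* bumped every time an IO option changes */
	bool     present; /* a cookie key exists, i.e. a scan is pending */
};

/* Called when an IO path option is changed. */
static void option_changed(struct scan_cookie *c)
{
	c->counter++;
	c->present = true;
}

/*
 * Called when a scan finishes; `seen` is the counter value read when the
 * scan began. Returns true if the scan is complete, false if it must be
 * repeated because an option was flipped while it was running.
 */
static bool scan_finish(struct scan_cookie *c, uint64_t seen)
{
	assert(c->present);
	if (c->counter != seen)
		return false;   /* changed mid-scan: keep the cookie */
	c->present = false;     /* nothing changed: delete the cookie */
	return true;
}
```

A plain "scan needed" flag could not distinguish a change made before the scan started from one made halfway through it; comparing counters can.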

Future possible work:
 - Propagate options to indirect extents when being changed
 - Add other IO path options - nr_replicas, ec, to rebalance_work so
   they can be applied in the background when they change
 - Add a counter, for bcachefs fs usage output, showing the pending
   amount of rebalance work: we'll probably want to do this after the
   disk space accounting rewrite (moving it to a new btree)

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

fb3f57bb

Oct 22, 2023

bcachefs: Add iops fields to bch_member · 40f7914e

Hunter Shaffer authored 1 year ago


Signed-off-by: Hunter Shaffer <huntershaffer182456@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

40f7914e

bcachefs: Rename bch_sb_field_members -> bch_sb_field_members_v1 · 9af26120

Hunter Shaffer authored 1 year ago


Signed-off-by: Hunter Shaffer <huntershaffer182456@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

9af26120

bcachefs: New superblock section members_v2 · 3f7b9713

Hunter Shaffer authored 1 year ago

members_v2 has dynamically resizable entries so that we can extend
bch_member. The members can no longer be accessed with simple array
indexing. Instead, members_v2_get is used to find a member's exact
location within the array and returns a copy of that member.
Alternatively, member_v2_get_mut retrieves a mutable pointer to a member.
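
To show why plain array indexing no longer works, here is a hedged sketch of entries whose on-disk size is stored in the section header. The struct layouts and the `example_*` helpers are invented for illustration and are not the bcachefs definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* The member layout this code was built against (fields are illustrative). */
struct member {
	uint64_t nbuckets;
	uint64_t errors;       /* imagine this was added in a later version */
};

/* Hypothetical section header followed by packed, variable-size entries. */
struct members_v2 {
	uint16_t member_bytes; /* on-disk size of each entry */
	uint16_t nr;
	uint8_t  data[];       /* nr * member_bytes bytes */
};

static struct members_v2 *example_members_alloc(uint16_t nr, uint16_t member_bytes)
{
	struct members_v2 *m = calloc(1, sizeof(*m) + (size_t) nr * member_bytes);

	if (m) {
		m->nr = nr;
		m->member_bytes = member_bytes;
	}
	return m;
}

/* Mutable pointer to the raw on-disk entry: stride is member_bytes, not sizeof. */
static void *example_member_get_mut(struct members_v2 *m, unsigned i)
{
	assert(i < m->nr);
	return m->data + (size_t) i * m->member_bytes;
}

/* Copy of entry i; fields the on-disk entry is too short to hold read as zero. */
static struct member example_member_get(struct members_v2 *m, unsigned i)
{
	struct member ret;
	size_t n = m->member_bytes < sizeof(ret) ? m->member_bytes : sizeof(ret);

	memset(&ret, 0, sizeof(ret));
	memcpy(&ret, example_member_get_mut(m, i), n);
	return ret;
}
```

Returning a zero-padded copy lets newer code read sections written with a smaller entry size without touching memory past the entry.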

Signed-off-by: Hunter Shaffer <huntershaffer182456@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

3f7b9713

bcachefs: Fix W=12 build errors · 96dea3d5
Kent Overstreet authored 1 year ago
```
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
```
96dea3d5

bcachefs: Log finsert/fcollapse operations · f3e374ef

Kent Overstreet authored 1 year ago

Now that we have the logged operations btree, we can make
finsert/fcollapse atomic w.r.t. unclean shutdown as well.

This adds bch_logged_op_finsert to represent the state of an finsert or
fcollapse, which is a bit more complicated than truncate since we need
to track our position in the "shift extents" operation.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

f3e374ef

bcachefs: Log truncate operations · b030e262

Kent Overstreet authored 1 year ago


Previously, we guaranteed atomicity of truncate after unclean shutdown
with the BCH_INODE_I_SIZE_DIRTY flag - which required a full scan of the
inodes btree.

Recently the deleted inodes btree was added so that we no longer have to
scan for deleted inodes, but truncate was unfinished and that change
left it broken.

This patch uses the new logged operations btree to fix truncate
atomicity; we now log an operation that can be replayed at the start of
a truncate.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

b030e262

bcachefs: BTREE_ID_logged_ops · aaad530a

Kent Overstreet authored 1 year ago


Add a new btree for long running logged operations - i.e. for logging
operations that we can't do within a single btree transaction, so that
they can be resumed if we crash.

Keys in the logged operations btree will represent operations in
progress, with the state of the operation stored in the value.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

aaad530a

bcachefs: Array bounds fixes · 5cfd6977

Kent Overstreet authored 1 year ago


It's no longer legal to use a zero size array as a flexible array
member - this causes UBSAN to complain.

This patch switches our zero size arrays to normal flexible array
members when possible, and inserts casts in other places (e.g. where we
use the zero size array as a marker partway through an array).
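
For reference, a generic sketch of the difference between the two declarations (the struct here is illustrative, not one of the bcachefs structures touched by the patch):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Old style, now diagnosed by bounds checkers:
 *	struct entry { uint16_t u64s; uint64_t data[0]; };
 */

/* A C99 flexible array member: must be last; its size is set at allocation. */
struct entry {
	uint16_t u64s;   /* number of valid elements in data[] */
	uint64_t data[];
};

static struct entry *entry_alloc(uint16_t u64s)
{
	struct entry *e = calloc(1, sizeof(*e) + u64s * sizeof(e->data[0]));

	if (e)
		e->u64s = u64s;
	return e;
}

static uint64_t entry_sum(const struct entry *e)
{
	uint64_t sum = 0;

	assert(e != NULL);
	for (uint16_t i = 0; i < e->u64s; i++)
		sum += e->data[i];
	return sum;
}
```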

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

5cfd6977

bcachefs: Cleanup redundant snapshot nodes · f55d6e07

Kent Overstreet authored 1 year ago


After deleting snapshots, we may be left with a snapshot tree where
some nodes only have one child, and we have a linear chain.

Interior snapshot nodes are never used directly (i.e. they never have
subvolumes that point to them), they are only referred to by child
snapshot nodes - hence, they are redundant.

The existing code talks about redundant snapshot nodes as forming an
equivalence class; i.e. nodes for which snapshot_t->equiv is equal. In a
given equivalence class, we only ever need a single key at a given
position - i.e. multiple versions with different snapshot fields are
redundant.

The existing snapshot cleanup code deletes these redundant keys, but not
redundant nodes. It turns out this is buggy, because we assume that
after snapshot deletion finishes we should only have a single key per
equivalence class, but the btree update path doesn't preserve this -
overwriting keys in old snapshots doesn't check for the equivalence
class being equal, and thus we can end up with duplicate keys in the
same equivalence class and fsck complaining about snapshot deletion not
having run correctly.

The equivalence class notion has been leaking out of the core snapshots
code and into too much other code, i.e. fsck, so this patch takes a
different approach: snapshot deletion now moves keys to the node in an
equivalence class being kept (the leafiest node) and then deletes the
redundant nodes in the equivalence class.

Some work has to be done to correctly delete interior snapshot nodes;
snapshot node depth and skiplist fields for descendent nodes have to be
fixed.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

f55d6e07

bcachefs: Split out snapshot.c · 8e877caa

Kent Overstreet authored 1 year ago


subvolume.c has gotten a bit large; this splits out a separate file just
for managing snapshot trees - BTREE_ID_snapshots.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

8e877caa

bcachefs: Lower BCH_NAME_MAX to 512 · a125c074

Joshua Ashton authored 1 year ago


To avoid boxing ourselves in after merge, in case we later revise the
dirent format or store multiple names for casefolding, limit this to
512 for now.

Previously this define was linked to the max size a d_name in
bch_dirent could be.

Signed-off-by: Joshua Ashton <joshua@froggi.es>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

a125c074

bcachefs: bcachefs_metadata_version_deleted_inodes · dde8cb11

Kent Overstreet authored 1 year ago


Add a new bitset btree for inodes pending deletion; this means we no
longer have to scan the full inodes btree after an unclean shutdown.

Specifically, this adds:
 - a trigger to update the deleted_inodes btree based on changes to the
   inodes btree
 - a new recovery pass
 - and check_inodes is now only a fsck pass.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

dde8cb11

bcachefs: Assorted fixes for clang · bf5a261c

Kent Overstreet authored 1 year ago


clang had a few more warnings about enum conversion, and also didn't
like the opts.c initializer.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

bf5a261c

bcachefs: Consolidate btree id properties · e8d2fe3b

Kent Overstreet authored 1 year ago


This refactoring centralizes defining per-btree properties.

bch2_key_types_allowed was also about to overflow a u32, so expand that
to a u64.
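
The overflow concern is the usual one for one-bit-per-type masks: once there are more than 32 types, a 32-bit mask cannot represent them. A generic sketch, with hypothetical helper names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One bit per key type; with more than 32 key types a u32 mask no longer fits. */
static inline uint64_t key_type_bit(unsigned type)
{
	assert(type < 64);
	/* With a 32-bit unsigned, 1U << type would be undefined for type >= 32. */
	return UINT64_C(1) << type;
}

static inline bool key_type_allowed(uint64_t allowed, unsigned type)
{
	return type < 64 && (allowed & key_type_bit(type));
}
```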

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

e8d2fe3b

bcachefs: Extend sb compression type fields to 8 bits · e86e9124

Kent Overstreet authored 1 year ago


The upper 4 bits are for compression level.
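
A sketch of the resulting 8-bit encoding, assuming the lower 4 bits keep holding the compression type (the message only states what the upper 4 bits are for); the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: bits 0-3 = compression type, bits 4-7 = compression level. */
static inline uint8_t compression_opt_pack(unsigned type, unsigned level)
{
	assert(type < 16 && level < 16);
	return (uint8_t) (level << 4 | type);
}

static inline unsigned compression_opt_type(uint8_t v)
{
	return v & 0xf;
}

static inline unsigned compression_opt_level(uint8_t v)
{
	return v >> 4;
}
```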

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

e86e9124

bcachefs: bcachefs_format.h should be using __u64 · a5cf5a4b
Kent Overstreet authored 1 year ago
```
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
```
a5cf5a4b

bcachefs: Snapshot depth, skiplist fields · f26c67f4

Kent Overstreet authored 1 year ago


This extends KEY_TYPE_snapshot to include some new fields:
 - depth, to indicate depth of this particular node from the root
 - skip[3], skiplist entries for quickly walking back up to the root

These are to improve bch2_snapshot_is_ancestor(), making it O(ln(n))
instead of O(n) in the snapshot tree depth.

Skiplist nodes are picked at random from the set of ancestor nodes, not
some fixed fraction.
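
The ancestor walk can be sketched as follows. This is an illustration under assumptions, not the bcachefs implementation: nodes live in a plain array indexed by id, `NO_NODE` marks an unused slot, and every `skip[]` entry is assumed to be a genuine ancestor of its node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NO_NODE UINT32_MAX

/* Hypothetical in-memory snapshot node; array indices stand in for ids. */
struct snap_node {
	uint32_t parent;  /* NO_NODE for the root */
	uint32_t depth;   /* distance from the root */
	uint32_t skip[3]; /* some ancestors further up, or NO_NODE */
};

/* Jump as far up as possible without going above min_depth. */
static uint32_t step_up(const struct snap_node *t, uint32_t id, uint32_t min_depth)
{
	uint32_t best = t[id].parent;

	assert(best != NO_NODE);
	for (int i = 0; i < 3; i++) {
		uint32_t s = t[id].skip[i];

		if (s != NO_NODE &&
		    t[s].depth >= min_depth &&
		    t[s].depth < t[best].depth)
			best = s;
	}
	return best;
}

/* True if `ancestor` is on the path from `id` to the root (or is `id` itself). */
static bool is_ancestor(const struct snap_node *t, uint32_t id, uint32_t ancestor)
{
	while (t[id].depth > t[ancestor].depth)
		id = step_up(t, id, t[ancestor].depth);
	return id == ancestor;
}
```

Since a node has exactly one ancestor at any given depth, it is enough to climb to the candidate's depth and compare ids; the skip entries only shorten the climb.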

This introduces bcachefs_metadata_version 1.1, snapshot_skiplists.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

f26c67f4

bcachefs: Version table now lists required recovery passes · 065bd335

Kent Overstreet authored 1 year ago


Now that we've got forward compatibility sorted out, we should be doing
more frequent version upgrades in the future.

To avoid having to run a full fsck for every version upgrade, this
improves the BCH_METADATA_VERSIONS() table to explicitly specify a
bitmask of recovery passes to run when upgrading to or past a given
version.
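
A rough sketch of a version table that carries a recovery-pass bitmask per entry. The version names, numbers and pass bits below are invented for illustration; only the overall shape (one x-macro row per version, passes ORed together for every version crossed) follows the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative recovery pass bits. */
#define EXAMPLE_PASS_check_alloc     (UINT64_C(1) << 0)
#define EXAMPLE_PASS_check_snapshots (UINT64_C(1) << 1)
#define EXAMPLE_PASS_check_inodes    (UINT64_C(1) << 2)

/* x(name, version, passes to run when upgrading to or past this version) */
#define EXAMPLE_VERSIONS()						\
	x(base,       10, 0)						\
	x(new_btree,  11, EXAMPLE_PASS_check_alloc)			\
	x(new_fields, 12, EXAMPLE_PASS_check_snapshots |		\
			  EXAMPLE_PASS_check_inodes)

struct version_entry {
	unsigned version;
	uint64_t recovery_passes;
};

static const struct version_entry example_versions[] = {
#define x(name, ver, passes)	{ ver, passes },
	EXAMPLE_VERSIONS()
#undef x
};

/* OR together the passes of every version crossed going from old_v to new_v. */
static uint64_t upgrade_recovery_passes(unsigned old_v, unsigned new_v)
{
	uint64_t ret = 0;
	size_t nr = sizeof(example_versions) / sizeof(example_versions[0]);

	assert(old_v <= new_v);
	for (size_t i = 0; i < nr; i++)
		if (example_versions[i].version > old_v &&
		    example_versions[i].version <= new_v)
			ret |= example_versions[i].recovery_passes;
	return ret;
}
```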

This means we can also delete PASS_UPGRADE().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

065bd335

bcachefs: bcachefs_metadata_version_major_minor · ba8eeae8

Kent Overstreet authored 1 year ago

This introduces major/minor versioning to the superblock version number.
Major version number changes indicate incompatible releases; we can move
forward to a new major version number, but not backwards. Minor version
numbers indicate compatible changes - these add features, but can still
be mounted and used by old versions.
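
The compatibility rule can be sketched like this. The 10-bit minor field and all names are assumptions made for the example; the message above does not specify the encoding.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed encoding: minor in the low 10 bits, major above it. */
#define EXAMPLE_VERSION(major, minor)	(((major) << 10) | (minor))

static inline unsigned example_version_major(unsigned v)
{
	return v >> 10;
}

static inline unsigned example_version_minor(unsigned v)
{
	return v & 0x3ff;
}

/*
 * An implementation supporting `supported` can use a filesystem at
 * `fs_version` as long as the filesystem's major version is not newer;
 * a newer minor version alone does not prevent mounting.
 */
static inline bool example_can_mount(unsigned fs_version, unsigned supported)
{
	return example_version_major(fs_version) <=
	       example_version_major(supported);
}
```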

With the recent patches that make it possible to roll out new btrees and
key types without breaking compatibility, we should be able to roll out
most new features without incompatible changes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

ba8eeae8