aboutsummaryrefslogtreecommitdiff
path: root/fs/bcachefs/recovery.c
AgeCommit message (Collapse)AuthorFilesLines
2024-01-21bcachefs: bch_fs_usage_baseGravatar Kent Overstreet 1-1/+1
Split out base filesystem usage into its own type; prep work for breaking up bch2_trans_fs_usage_apply(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05bcachefs: Restart recovery passes more reliablyGravatar Kent Overstreet 1-1/+4
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05bcachefs: Upgrades now specify errors to fix, like downgradesGravatar Kent Overstreet 1-9/+9
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05bcachefs: Online fsck can now fix errorsGravatar Kent Overstreet 1-2/+5
BCH_FS_fsck_done -> BCH_FS_fsck_running; set when we might be fixing fsck errors. Also; set fix_errors to ask by default when fsck is running. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05bcachefs: Upgrading uses bch_sb.recovery_passes_requiredGravatar Kent Overstreet 1-8/+6
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05bcachefs: Fix nochanges/read_only interactionGravatar Kent Overstreet 1-1/+3
nochanges means "we cannot issue writes at all"; it's possible to go into a pseudo read-write mode where we pin dirty metadata in memory, which is used for fsck in dry run mode and doing journal replay on a read only mount, but we do not want to allow an actual read-write mount in nochanges mode. But we do always want to allow early read-write, during recovery - this patch clarifies that. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: vstruct_for_each() now declares loop iterGravatar Kent Overstreet 1-6/+3
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: for_each_member_device() now declares loop iterGravatar Kent Overstreet 1-6/+4
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: darray_for_each() now declares loop iterGravatar Kent Overstreet 1-1/+0
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: bch_err_(fn|msg) check if should printGravatar Kent Overstreet 1-15/+9
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Fix snapshot.c assertion for online fsckGravatar Kent Overstreet 1-0/+3
c->curr_recovery_pass can go backwards; this adds a non rewinding version, c->recovery_pass_done. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Kill for_each_btree_key()Gravatar Kent Overstreet 1-1/+2
for_each_btree_key() handles transaction restarts, like for_each_btree_key2(), but only calls bch2_trans_begin() after a transaction restart - for_each_btree_key2() wraps every loop iteration in a transaction. The for_each_btree_key() behaviour is problematic when it leads to holding the SRCU lock that prevents key cache reclaim for an unbounded amount of time - there's no real need to keep it around. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: bch2_run_online_recovery_passes()Gravatar Kent Overstreet 1-19/+38
Add a new helper for running online recovery passes - i.e. online fsck. This is a subset of our normal recovery passes, and does not - for now - use or follow c->curr_recovery_pass. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Add ability to redirect log outputGravatar Kent Overstreet 1-3/+3
Upcoming patches are going to add two new ioctls for running fsck in the kernel, but pretending that we're running our normal userspace fsck. This patch adds some plumbing for redirecting our normal log messages away from the dmesg log to a thread_with_file file descriptor - via a struct log_output, which will be consumed by the fsck f_op's read method. The new ioctls will allow for running fsck in the kernel against an offline filesystem (without mounting it), and an online filesystem. For an offline filesystem we need a way to pass in a pointer to the log_output, which is done via a new hidden opts.h option. For online fsck, we can set c->output directly, but only want to redirect log messages from the thread running fsck - hence the new c->output_filter method. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Explicity go RW for fsckGravatar Kent Overstreet 1-1/+1
This eliminates a lot of BCH_TRANS_COMMIT_lazy_rw flags, and is less error prone. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: convert bch_fs_flags to x-macroGravatar Kent Overstreet 1-17/+17
Now we can print out filesystem flags in sysfs, useful for debugging various "what's my filesystem doing" issues. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Kill dev_usage->buckets_ecGravatar Kent Overstreet 1-2/+0
This counter is redundant; it's simply the sum of BCH_DATA_stripe and BCH_DATA_parity buckets. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Don't flush journal after replayGravatar Kent Overstreet 1-3/+1
The flush_all_pins() after journal replay was unecessary, and trying to completely flush the journal while RW is not a great idea - it's not guaranteed to terminate if other threads keep adding things to the jorunal. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Rename BTREE_INSERT flagsGravatar Kent Overstreet 1-6/+6
BTREE_INSERT flags are actually transaction commit flags - rename them for clarity. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Make journal replay more efficientGravatar Kent Overstreet 1-28/+62
Journal replay now first attempts to replay keys in sorted order, similar to how the btree write buffer flush path works. Any keys that can not be replayed due to journal deadlock are then left for later and replayed in journal order, unpinning journal entries as we go. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Go rw before journal replayGravatar Kent Overstreet 1-4/+12
This gets us slightly nicer log messages. Also, this slightly clarifies synchronization of c->journal_keys; after we go RW it's in use by multiple threads (so that the btree iterator code can overlay keys from the journal); so it has to be prepped before that point. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: BTREE_INSERT_JOURNAL_REPLAY now "don't init trans->journal_res"Gravatar Kent Overstreet 1-0/+2
This slightly changes how trans->journal_res works, in preparation for changing the btree write buffer flush path to use it. Now, BTREE_INSERT_JOURNAL_REPLAY means "don't take a journal reservation; trans->journal_res.seq already refers to the journal sequence number to pin". Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Print old version when scanning for old metadataGravatar Kent Overstreet 1-2/+6
Also: we should be using bch2_fs_read_write_early() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Flush fsck errors before running twiceGravatar Kent Overstreet 1-0/+2
It's confusing if we run fsck a second time (in debug mode, to verify the second run is clean), but errors are still ratelimited from the first run. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: bch_sb_field_downgradeGravatar Kent Overstreet 1-1/+23
Add a new superblock section that contains a list of { minor version, recovery passes, errors_to_fix } that is - a list of recovery passes that must be run when downgrading past a given version, and a list of errors to silently fix. The upcoming disk accounting rewrite is not going to be fully compatible: we're going to have to regenerate accounting both when upgrading to the new version, and also from downgrading from the new version, since the new method of doing disk space accounting is a completely different architecture based on deltas, and synchronizing them for every jounal entry write to maintain compatibility is going to be too expensive and impractical. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: bch_sb.recovery_passes_requiredGravatar Kent Overstreet 1-15/+60
Add two new superblock fields. Since the main section of the superblock is now fully, we have to add a new variable length section for them - bch_sb_field_ext. - recovery_passes_requried: recovery passes that must be run on the next mount - errors_silent: errors that will be silently fixed These are to improve upgrading and dwongrading: these fields won't be cleared until after recovery successfully completes, so there won't be any issues with crashing partway through an upgrade or a downgrade. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Add persistent identifiers for recovery passesGravatar Kent Overstreet 1-2/+32
The next patch will start to refer to recovery passes from the superblock; naturally, we now need identifiers that don't change, since the existing enum is in the order in which they are run and is not fixed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: fix setting version_upgrade_completeGravatar Kent Overstreet 1-2/+2
If a superblock write hasn't happened (i.e. we never had to go rw), then c->sb.version will be out of date w.r.t. c->disk_sb.sb->version. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-10bcachefs: Fix uninitialized var in bch2_journal_replay()Gravatar Kent Overstreet 1-1/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-24bcachefs: Proper refcounting for journal_keysGravatar Kent Overstreet 1-4/+7
The btree iterator code overlays keys from the journal until journal replay is finished; since we're now starting copygc/rebalance etc. before replay is finished, this is multithreaded access and thus needs refcounting. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-04bcachefs: Fix recovery when forced to use JSET_NO_FLUSH journal entryGravatar Kent Overstreet 1-0/+7
When we didn't find anything in the journal that we'd like to use, and we're forced to use whatever we can find - that entry will have been a JSET_NO_FLUSH entry with a garbage last_seq value, since it's not normally used. Initialize it to something sane, for bch2_fs_journal_start(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-04bcachefs: Fix build errors with gcc 10Gravatar Kent Overstreet 1-1/+1
gcc 10 seems to complain about array bounds in situations where gcc 11 does not - curious. This unfortunately requires adding some casts for now; we may investigate getting rid of our __u64 _data[] VLA in a future patch so that our start[0] members can be VLAs. Reported-by: John Stoffel <john@stoffel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-01bcachefs: Enumerate fsck errorsGravatar Kent Overstreet 1-2/+8
This patch adds a superblock error counter for every distinct fsck error; this means that when analyzing filesystems out in the wild we'll be able to see what sorts of inconsistencies are being found and repair, and hence what bugs to look for. Errors validating bkeys are not yet considered distinct fsck errors, but this patch adds a new helper, bkey_fsck_err(), in order to add distinct error types for them as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-01bcachefs: rebalance_workGravatar Kent Overstreet 1-0/+1
This adds a new btree, rebalance_work, to eliminate scanning required for finding extents that need work done on them in the background - i.e. for the background_target and background_compression options. rebalance_work is a bitset btree, where a KEY_TYPE_set corresponds to an extent in the extents or reflink btree at the same pos. A new extent field is added, bch_extent_rebalance, which indicates that this extent has work that needs to be done in the background - and which options to use. This allows per-inode options to be propagated to indirect extents - at least in some circumstances. In this patch, changing IO options on a file will not propagate the new options to indirect extents pointed to by that file. Updating (setting/clearing) the rebalance_work btree is done by the extent trigger, which looks at the bch_extent_rebalance field. Scanning is still requrired after changing IO path options - either just for a given inode, or for the whole filesystem. We indicate that scanning is required by adding a KEY_TYPE_cookie key to the rebalance_work btree: the cookie counter is so that we can detect that scanning is still required when an option has been flipped mid-way through an existing scan. Future possible work: - Propagate options to indirect extents when being changed - Add other IO path options - nr_replicas, ec, to rebalance_work so they can be applied in the background when they change - Add a counter, for bcachefs fs usage output, showing the pending amount of rebalance work: we'll probably want to do this after the disk space accounting rewrite (moving it to a new btree) Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-31bcachefs: Ensure devices are always correctly initializedGravatar Kent Overstreet 1-15/+9
We can't mark device superblocks or allocate journal on a device that isn't online. That means we may need to do this on every mount, because we may have formatted a new filesystem and then done the first mount (bch2_fs_initialize()) in degraded mode. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-31bcachefs: bch2_btree_id_str()Gravatar Kent Overstreet 1-3/+3
Since we can run with unknown btree IDs, we can't directly index btree IDs into fixed size arrays. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-31bcachefs: Don't run bch2_delete_dead_snapshots() unnecessarilyGravatar Kent Overstreet 1-1/+1
Be a bit more careful about when bch2_delete_dead_snapshots needs to run: it only needs to run synchronously if we're running fsck, and it only needs to run at all if we have snapshot nodes to delete or if fsck has noticed that it needs to run. Also: Rename BCH_FS_HAVE_DELETED_SNAPSHOTS -> BCH_FS_NEED_DELETE_DEAD_SNAPSHOTS Kill bch2_delete_dead_snapshots_hook(), move functionality to bch2_mark_snapshot() Factor out bch2_check_snapshot_needs_deletion(), to explicitly check if we need to be running snapshot deletion. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix drop_alloc_keys()Gravatar Kent Overstreet 1-15/+15
For consistency with the rest of the reconstruct_alloc option, we should be skipping all alloc keys. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix another smatch complaintGravatar Kent Overstreet 1-1/+1
This should be harmless, but initialize last_seq anyways. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Make btree root read errors recoverableGravatar Kent Overstreet 1-5/+4
The entire btree will be lost, but that is better than the entire filesystem not being recoverable. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Heap allocate btree_transGravatar Kent Overstreet 1-3/+3
We're using more stack than we'd like in a number of functions, and btree_trans is the biggest object that we stack allocate. But we have to do a heap allocatation to initialize it anyways, so there's no real downside to heap allocating the entire thing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix W=12 build errorsGravatar Kent Overstreet 1-12/+4
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix a handful of spelling mistakes in various messagesGravatar Colin Ian King 1-1/+1
There are several spelling mistakes in error messages. Fix these. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: BTREE_ID_logged_opsGravatar Kent Overstreet 1-0/+1
Add a new btree for long running logged operations - i.e. for logging operations that we can't do within a single btree transaction, so that they can be resumed if we crash. Keys in the logged operations btree will represent operations in progress, with the state of the operation stored in the value. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Split out snapshot.cGravatar Kent Overstreet 1-0/+1
subvolume.c has gotten a bit large, this splits out a separate file just for managing snapshot trees - BTREE_ID_snapshots. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix check_version_upgrade()Gravatar Kent Overstreet 1-1/+1
We were failing to upgrade to the latest compatible version - whoops. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: btree_journal_iter.cGravatar Kent Overstreet 1-519/+2
Split out a new file from recovery.c for managing the list of keys we read from the journal: before journal replay finishes the btree iterator code needs to be able to iterate over and return keys from the journal as well, so there's a fair bit of code here. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: sb-clean.cGravatar Kent Overstreet 1-136/+9
Pull code for bch_sb_field_clean out into its own file. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix assorted checkpatch nitsGravatar Kent Overstreet 1-2/+2
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: BCH_COMPAT_bformat_overflow_done no longer requiredGravatar Kent Overstreet 1-7/+0
Awhile back, we changed bkey_format generation to ensure that the packed representation could never represent fields larger than the unpacked representation. This was to ensure that bkey_packed_successor() always gave a sensible result, but in the current code bkey_packed_successor() is only used in a debug assertion - not for anything important. This kills the requirement that we've gotten rid of those weird bkey formats, and instead changes the assertion to check if we're dealing with an old weird bkey format. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>