aboutsummaryrefslogtreecommitdiff
path: root/fs/bcachefs/bcachefs.h
AgeCommit message (Collapse)AuthorFilesLines
2024-02-13bcachefs: Clamp replicas_required to replicasGravatar Kent Overstreet 1-0/+12
This prevents going emergency read only when the user has specified replicas_required > replicas. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: Prep work for variable size btree node buffersGravatar Kent Overstreet 1-5/+0
bcachefs btree nodes are big - typically 256k - and btree roots are pinned in memory. As we're now up to 18 btrees, we now have significant memory overhead in mostly empty btree roots. And in the future we're going to start enforcing that certain btree node boundaries exist, to solve lock contention issues - analagous to XFS's AGIs. Thus, we need to start allocating smaller btree node buffers when we can. This patch changes code that refers to the filesystem constant c->opts.btree_node_size to refer to the btree node buffer size - btree_buf_bytes() - where appropriate. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05bcachefs: add time_stats for btree_node_read_done()Gravatar Kent Overstreet 1-0/+1
Seeing weird latency issues in the btree node read path - add one bch2_btree_node_read_done(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05bcachefs: Online fsck can now fix errorsGravatar Kent Overstreet 1-4/+1
BCH_FS_fsck_done -> BCH_FS_fsck_running; set when we might be fixing fsck errors. Also; set fix_errors to ask by default when fsck is running. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05bcachefs: factor out thread_with_file, thread_with_stdioGravatar Kent Overstreet 1-8/+12
thread_with_stdio now knows how to handle input - fsck can now prompt to fix errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05bcachefs: track transaction durationsGravatar Kent Overstreet 1-0/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Make sure allocation failure errors are loggedGravatar Kent Overstreet 1-0/+6
The previous patch fixed a bug in allocation path error handling, and it would've been noticed sooner had it been logged properly. Generally speaking, errors that shouldn't happen in normal operation and are being returned up the stack should be logged: the write path was already logging IO errors, but non IO errors were missed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: bch_err_(fn|msg) check if should printGravatar Kent Overstreet 1-2/+7
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: fix userspace build errorsGravatar Kent Overstreet 1-0/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: kill btree_trans->wb_updatesGravatar Kent Overstreet 1-1/+0
the btree write buffer path now creates a journal entry directly Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: btree write buffer now slurps keys from journalGravatar Kent Overstreet 1-0/+11
Previosuly, the transaction commit path would have to add keys to the btree write buffer as a separate operation, requiring additional global synchronization. This patch introduces a new journal entry type, which indicates that the keys need to be copied into the btree write buffer prior to being written out. We switch the journal entry type back to JSET_ENTRY_btree_keys prior to write, so this is not an on disk format change. Flushing the btree write buffer may require pulling keys out of journal entries yet to be written, and quiescing outstanding journal reservations; we previously added journal->buf_lock for synchronization with the journal write path. We also can't put strict bounds on the number of keys in the journal destined for the write buffer, which means we might overflow the size of the preallocated buffer and have to reallocate - this introduces a potentially fatal memory allocation failure. This is something we'll have to watch for, if it becomes an issue in practice we can do additional mitigation. The transaction commit path no longer has to explicitly check if the write buffer is full and wait on flushing; this is another performance optimization. Instead, when the btree write buffer is close to full we change the journal watermark, so that only reservations for journal reclaim are allowed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Improve trans->extra_journal_entriesGravatar Kent Overstreet 1-0/+1
Instead of using a darray, we now allocate journal entries for the transaction commit path with our normal bump allocator - with an inlined fastpath, and using btree_transaction_stats to remember how much to initially allocate so as to avoid transaction restarts. This is prep work for converting write buffer updates to use this mechanism. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Fix snapshot.c assertion for online fsckGravatar Kent Overstreet 1-0/+3
c->curr_recovery_pass can go backwards; this adds a non rewinding version, c->recovery_pass_done. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: BCH_IOCTL_FSCK_ONLINEGravatar Kent Overstreet 1-0/+1
This adds a new ioctl for running fsck on a mounted, in use filesystem. This reuses the fsck_thread code from the previous patch for running fsck on an offline, unmounted filesystem, so that log messages for the fsck thread are redirected to userspace. Only one running fsck instance is allowed at a time; a new semaphore (since the lock will be taken by one thread and released by another) is added for this. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: bch2_run_online_recovery_passes()Gravatar Kent Overstreet 1-0/+7
Add a new helper for running online recovery passes - i.e. online fsck. This is a subset of our normal recovery passes, and does not - for now - use or follow c->curr_recovery_pass. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Add ability to redirect log outputGravatar Kent Overstreet 1-14/+41
Upcoming patches are going to add two new ioctls for running fsck in the kernel, but pretending that we're running our normal userspace fsck. This patch adds some plumbing for redirecting our normal log messages away from the dmesg log to a thread_with_file file descriptor - via a struct log_output, which will be consumed by the fsck f_op's read method. The new ioctls will allow for running fsck in the kernel against an offline filesystem (without mounting it), and an online filesystem. For an offline filesystem we need a way to pass in a pointer to the log_output, which is done via a new hidden opts.h option. For online fsck, we can set c->output directly, but only want to redirect log messages from the thread running fsck - hence the new c->output_filter method. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: c->ro_refGravatar Kent Overstreet 1-0/+21
Add a new refcount for async ops that don't necessarily need the fs to be RW, with similar lifetime/rules otherwise as c->writes. To be used by online fsck. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: count_event()Gravatar Kent Overstreet 1-1/+3
Small helper for event counters. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Clean up btree write buffer write ref handlingGravatar Kent Overstreet 1-1/+2
__bch2_btree_write_buffer_flush() now assumes a write ref is already held (as called by the transaction commit path); and the wrappers bch2_write_buffer_flush() and flush_sync() take an explicit write ref. This means internally the write buffer code can always use BTREE_INSERT_NOCHECK_RW, instead of in the previous code passing flags around and hoping the NOCHECK_RW flag was always carried around correctly. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: convert bch_fs_flags to x-macroGravatar Kent Overstreet 1-28/+34
Now we can print out filesystem flags in sysfs, useful for debugging various "what's my filesystem doing" issues. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: track_event_change()Gravatar Kent Overstreet 1-1/+3
This introduces a new helper for connecting time_stats to state changes, i.e. when taking journal reservations is blocked for some reason. We use this to track separately the different reasons the journal might be blocked - i.e. space in the journal full, or the journal pin fifo full. Also do some cleanup and improvements on the time stats code. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: bch_sb.recovery_passes_requiredGravatar Kent Overstreet 1-0/+1
Add two new superblock fields. Since the main section of the superblock is now fully, we have to add a new variable length section for them - bch_sb_field_ext. - recovery_passes_requried: recovery passes that must be run on the next mount - errors_silent: errors that will be silently fixed These are to improve upgrading and dwongrading: these fields won't be cleared until after recovery successfully completes, so there won't be any issues with crashing partway through an upgrade or a downgrade. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-28bcachefs: Fix zstd compress workspace sizeGravatar Kent Overstreet 1-1/+1
zstd apparently lies about the size of the compression workspace it requires; if we double it compression succeeds. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-24bcachefs: Proper refcounting for journal_keysGravatar Kent Overstreet 1-0/+2
The btree iterator code overlays keys from the journal until journal replay is finished; since we're now starting copygc/rebalance etc. before replay is finished, this is multithreaded access and thus needs refcounting. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-13bcachefs: Fix multiple -Warray-bounds warningsGravatar Gustavo A. R. Silva 1-1/+1
Transform zero-length array `entries` into a proper flexible-array member in `struct journal_seq_blacklist_table`; and fix the following -Warray-bounds warnings: fs/bcachefs/journal_seq_blacklist.c:148:26: warning: array subscript idx is outside array bounds of 'struct journal_seq_blacklist_table_entry[0]' [-Warray-bounds=] fs/bcachefs/journal_seq_blacklist.c:150:30: warning: array subscript idx is outside array bounds of 'struct journal_seq_blacklist_table_entry[0]' [-Warray-bounds=] fs/bcachefs/journal_seq_blacklist.c:154:27: warning: array subscript idx is outside array bounds of 'struct journal_seq_blacklist_table_entry[0]' [-Warray-bounds=] fs/bcachefs/journal_seq_blacklist.c:176:27: warning: array subscript i is outside array bounds of 'struct journal_seq_blacklist_table_entry[0]' [-Warray-bounds=] fs/bcachefs/journal_seq_blacklist.c:177:27: warning: array subscript i is outside array bounds of 'struct journal_seq_blacklist_table_entry[0]' [-Warray-bounds=] fs/bcachefs/journal_seq_blacklist.c:297:34: warning: array subscript i is outside array bounds of 'struct journal_seq_blacklist_table_entry[0]' [-Warray-bounds=] fs/bcachefs/journal_seq_blacklist.c:298:34: warning: array subscript i is outside array bounds of 'struct journal_seq_blacklist_table_entry[0]' [-Warray-bounds=] fs/bcachefs/journal_seq_blacklist.c:300:31: warning: array subscript i is outside array bounds of 'struct journal_seq_blacklist_table_entry[0]' [-Warray-bounds=] This results in no differences in binary output. This helps with the ongoing efforts to globally enable -Warray-bounds. Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-01bcachefs: bch_sb_field_errorsGravatar Kent Overstreet 1-5/+9
Add a new superblock section to keep counts of errors seen since filesystem creation: we'll be addingcounters for every distinct fsck error. The new superblock section has entries of the for [ id, count, time_of_last_error ]; this is intended to let us see what errors are occuring - and getting fixed - via show-super output. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-01bcachefs: Add IO error counts to bch_memberGravatar Kent Overstreet 1-0/+2
We now track IO errors per device since filesystem creation. IO error counts can be viewed in sysfs, or with the 'bcachefs show-super' command. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-01bcachefs: rebalance_workGravatar Kent Overstreet 1-0/+1
This adds a new btree, rebalance_work, to eliminate scanning required for finding extents that need work done on them in the background - i.e. for the background_target and background_compression options. rebalance_work is a bitset btree, where a KEY_TYPE_set corresponds to an extent in the extents or reflink btree at the same pos. A new extent field is added, bch_extent_rebalance, which indicates that this extent has work that needs to be done in the background - and which options to use. This allows per-inode options to be propagated to indirect extents - at least in some circumstances. In this patch, changing IO options on a file will not propagate the new options to indirect extents pointed to by that file. Updating (setting/clearing) the rebalance_work btree is done by the extent trigger, which looks at the bch_extent_rebalance field. Scanning is still requrired after changing IO path options - either just for a given inode, or for the whole filesystem. We indicate that scanning is required by adding a KEY_TYPE_cookie key to the rebalance_work btree: the cookie counter is so that we can detect that scanning is still required when an option has been flipped mid-way through an existing scan. Future possible work: - Propagate options to indirect extents when being changed - Add other IO path options - nr_replicas, ec, to rebalance_work so they can be applied in the background when they change - Add a counter, for bcachefs fs usage output, showing the pending amount of rebalance work: we'll probably want to do this after the disk space accounting rewrite (moving it to a new btree) Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-31bcachefs: move: move_stats refactoringGravatar Kent Overstreet 1-3/+0
data_progress_list is gone - it was redundant with moving_context_list The upcoming rebalance rewrite is going to have it using two different move_stats objects with the same moving_context, depending on whether it's scanning or using the rebalance_work btree - this patch plumbs stats around a bit differently so that will work. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-31bcachefs: Split out disk_groups_types.hGravatar Kent Overstreet 1-0/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-31bcachefs: Don't run bch2_delete_dead_snapshots() unnecessarilyGravatar Kent Overstreet 1-1/+1
Be a bit more careful about when bch2_delete_dead_snapshots needs to run: it only needs to run synchronously if we're running fsck, and it only needs to run at all if we have snapshot nodes to delete or if fsck has noticed that it needs to run. Also: Rename BCH_FS_HAVE_DELETED_SNAPSHOTS -> BCH_FS_NEED_DELETE_DEAD_SNAPSHOTS Kill bch2_delete_dead_snapshots_hook(), move functionality to bch2_mark_snapshot() Factor out bch2_check_snapshot_needs_deletion(), to explicitly check if we need to be running snapshot deletion. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: snapshot_create_lockGravatar Kent Overstreet 1-0/+1
Add a new lock for snapshot creation - this addresses a few races with logged operations and snapshot deletion. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bch_err_msg(), bch_err_fn() now filters out transaction restart errorsGravatar Kent Overstreet 1-2/+10
These errors aren't actual errors, and should never be printed - do this in the common helpers. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Heap allocate btree_transGravatar Kent Overstreet 1-4/+4
We're using more stack than we'd like in a number of functions, and btree_trans is the biggest object that we stack allocate. But we have to do a heap allocatation to initialize it anyways, so there's no real downside to heap allocating the entire thing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix W=12 build errorsGravatar Kent Overstreet 1-1/+1
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: BTREE_ID_logged_opsGravatar Kent Overstreet 1-0/+1
Add a new btree for long running logged operations - i.e. for logging operations that we can't do within a single btree transaction, so that they can be resumed if we crash. Keys in the logged operations btree will represent operations in progress, with the state of the operation stored in the value. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add logging to bch2_inode_peek() & relatedGravatar Kent Overstreet 1-2/+2
Add error messages when we fail to lookup an inode, and also add a few missing bch2_err_class() calls. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Ensure topology repair runsGravatar Kent Overstreet 1-0/+1
This fixes should_restart_for_topology_repair() - previously it was returning false if the btree io path had already seleceted topology repair to run, even if it hadn't run yet. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Log a message when running an explicit recovery passGravatar Kent Overstreet 1-16/+0
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bcachefs_metadata_version_deleted_inodesGravatar Kent Overstreet 1-0/+1
Add a new bitset btree for inodes pending deletion; this means we no longer have to scan the full inodes btree after an unclean shutdown. Specifically, this adds: - a trigger to update the deleted_inodes btree based on changes to the inodes btree - a new recovery pass - and check_inodes is now only a fsck pass. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: recovery_types.hGravatar Kent Overstreet 1-42/+1
Move some code out of bcachefs.h, which is too much of an everything header. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Make topology repair a normal recovery passGravatar Kent Overstreet 1-5/+8
This adds bch2_run_explicit_recovery_pass(), for rewinding recovery and explicitly running a specific recovery pass - this is a more general replacement for how we were running topology repair before. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bch2_run_explicit_recovery_pass()Gravatar Kent Overstreet 1-1/+14
This introduces bch2_run_explicit_recovery_pass() and uses it for when fsck detects that we need to re-run dead snaphots cleanup, and makes dead snapshot cleanup more like a normal recovery pass. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Upgrade path fixesGravatar Kent Overstreet 1-1/+1
Some minor fixes to not print errors that are actually due to a verson upgrade. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Convert snapshot table to RCU arrayGravatar Kent Overstreet 1-2/+3
This switches the generic radix tree for the in-memory table of snapshot nodes to a simple rcu array. This means we have to add new locking to deal with reallocations, but is faster than traversing the radix tree. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Version table now lists required recovery passesGravatar Kent Overstreet 1-11/+6
Now that we've got forward compatibility sorted out, we should be doing more frequent version upgrades in the future. To avoid having to run a full fsck for every version upgrade, this improves the BCH_METADATA_VERSIONS() table to explicitly specify a bitmask of recovery passes to run when upgrading to or past a given version. This means we can also delete PASS_UPGRADE(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bcachefs_metadata_version_major_minorGravatar Kent Overstreet 1-1/+0
This introduces major/minor versioning to the superblock version number. Major version number changes indicate incompatible releases; we can move forward to a new major version number, but not backwards. Minor version numbers indicate compatible changes - these add features, but can still be mounted and used by old versions. With the recent patches that make it possible to roll out new btrees and key types without breaking compatibility, we should be able to roll out most new features without incompatible changes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Enumerate recovery passesGravatar Kent Overstreet 1-5/+44
Recovery and fsck have many different passes/jobs to do, which always run in the same order - but not all of them run all the time. Some are for fsck, some for unclean shutdown, some for version upgrades. This adds some new structure: a defined list of recovery passes that we can run in a loop, as well as consolidating the log messages. The main benefit is consolidating the "should run this recovery pass" logic, as well as cleaning up the "this recovery pass has finished" state; instead of having a bunch of ad-hoc state bits in c->flags, we've now got c->curr_recovery_pass. By consolidating the "should run this recovery pass" logic, in the future on disk format upgrades will be able to say "upgrading to this version requires x passes to run", instead of forcing all of fsck to run. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Stash journal replay params in bch_fsGravatar Kent Overstreet 1-0/+3
For the upcoming enumeration of recovery passes, we need all recovery passes to be called the same way - including journal replay. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: version_upgrade is now an enumGravatar Kent Overstreet 1-0/+1
The version_upgrade parameter is now an enum, not a bool, and it's persistent in the superblock: - compatible (default): upgrade to the latest compatible version - incompatible: upgrade to latest incompatible version - none Currently all upgrades are incompatible upgrades, but the next release will introduce major:minor versions. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>