aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2023-12-15btrfs: migrate extent_buffer::pages[] to folioGravatar Qu Wenruo 7-77/+104
For now extent_buffer::pages[] are still only accepting single page pointer, thus we can migrate to folios pretty easily. As for single page, page and folio are 1:1 mapped, including their page flags. This patch would just do the conversion from struct page to struct folio, providing the first step to higher order folio in the future. This conversion is pretty simple: - extent_buffer::pages[] -> extent_buffer::folios[] - page_address(eb->pages[i]) -> folio_address(eb->pages[i]) - eb->pages[i] -> folio_page(eb->folios[i], 0) There would be more specific cleanups preparing for the incoming higher order folio support. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: refactor alloc_extent_buffer() to allocate-then-attach methodGravatar Qu Wenruo 6-46/+123
Currently alloc_extent_buffer() utilizes find_or_create_page() to allocate one page a time for an extent buffer. This method has the following disadvantages: - find_or_create_page() is the legacy way of allocating new pages With the new folio infrastructure, find_or_create_page() is just redirected to filemap_get_folio(). - Lacks the way to support higher order (order >= 1) folios As we can not yet let filemap give us a higher order folio. This patch would change the workflow by the following way: Old | new -----------------------------------+------------------------------------- | ret = btrfs_alloc_page_array(); for (i = 0; i < num_pages; i++) { | for (i = 0; i < num_pages; i++) { p = find_or_create_page(); | ret = filemap_add_folio(); /* Attach page private */ | /* Reuse page cache if needed */ /* Reused eb if needed */ | | /* Attach page private and | reuse eb if needed */ | } By this we split the page allocation and private attaching into two parts, allowing future updates to each part more easily, and migrate to folio interfaces (especially for possible higher order folios). Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: sysfs: validate scrub_speed_max valueGravatar David Disseldorp 1-0/+4
The value set as scrub_speed_max accepts size with suffixes (k/m/g/t/p/e) but we should still validate it for trailing characters, similar to what we do with chunk_size_store. CC: stable@vger.kernel.org # 5.15+ Signed-off-by: David Disseldorp <ddiss@suse.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: switch btrfs_root::delayed_nodes_tree to xarray from radix-treeGravatar David Sterba 4-34/+41
The radix-tree has been superseded by the xarray (https://lwn.net/Articles/745073), this patch converts the btrfs_root::delayed_nodes, the APIs are used in a simple way. First idea is to do xa_insert() but this would require GFP_ATOMIC allocation which we want to avoid if possible. The preload mechanism of radix-tree can be emulated within the xarray API. - xa_reserve() with GFP_NOFS outside of the lock, the reserved entry is inserted atomically at most once - xa_store() under a lock, in case something races in we can detect that and xa_load() returns a valid pointer All uses of xa_load() must check for a valid pointer in case they manage to get between the xa_reserve() and xa_store(), this is handled in btrfs_get_delayed_node(). Otherwise the functionality is equivalent, xarray implements the radix-tree and there should be no performance difference. The patch continues the efforts started in 253bf57555e451 ("btrfs: turn delayed_nodes_tree into an XArray") and fixes the problems with locking and GFP flags 088aea3b97e0ae ("Revert "btrfs: turn delayed_nodes_tree into an XArray""). Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: fix typos found by codespellGravatar David Sterba 9-12/+12
Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: fix mismatching parameter names for btrfs_get_extent()Gravatar Qu Wenruo 1-1/+1
The definition for btrfs_get_extent() is using "u64 end" as the last parameter, but in implementation we go "u64 len", and all call sites follows the implementation. This can be very confusing during development, as most developers including me, would just use the snippet returned by LSP (clangd in my case), which would only check the definition. Unfortunately this mismatch is introduced from the very beginning of btrfs. Fix it to prevent further confusion. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: use the flags of an extent map to identify the compression typeGravatar Filipe Manana 13-131/+158
Currently, in struct extent_map, we use an unsigned int (32 bits) to identify the compression type of an extent and an unsigned long (64 bits on a 64 bits platform, 32 bits otherwise) for flags. We are only using 6 different flags, so an unsigned long is excessive and we can use flags to identify the compression type instead of using a dedicated 32 bits field. We can easily have tens or hundreds of thousands (or more) of extent maps on busy and large filesystems, specially with compression enabled or many or large files with tons of small extents. So it's convenient to have the extent_map structure as small as possible in order to use less memory. So remove the compression type field from struct extent_map, use flags to identify the compression type and shorten the flags field from an unsigned long to a u32. This saves 8 bytes (on 64 bits platforms) and reduces the size of the structure from 136 bytes down to 128 bytes, using now only two cache lines, and increases the number of extent maps we can have per 4K page from 30 to 32. By using a u32 for the flags instead of an unsigned long, we no longer use test_bit(), set_bit() and clear_bit(), but that level of atomicity is not needed as most flags are never cleared once set (before adding an extent map to the tree), and the ones that can be cleared or set after an extent map is added to the tree, are always performed while holding the write lock on the extent map tree, while the reader holds a lock on the tree or tests for a flag that never changes once the extent map is in the tree (such as compression flags). Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: refactor mergable_maps() for more readabilityGravatar Filipe Manana 1-14/+14
At mergable_maps() instead of having a single if statement with many ORed and ANDed conditions, refactor it with multiple if statements that check a single condition and return immediately once a requirement fails. This makes it easier to read. Also change the return type from int to bool, make the arguments const and rename the function from mergable_maps() to mergeable_maps(). Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: make extent_map_end() argument constGravatar Filipe Manana 1-1/+1
The extent map pointer argument for extent_map_end() can be const as we are not modifyng anything in the extent map. So make it const, as it will allow further changes to callers that have a const extent map pointer. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: avoid useless rbtree iterations when attempting to merge extent mapGravatar Filipe Manana 1-17/+21
When trying to merge an extent map that was just inserted or unpinned, we will try to merge it with any adjacent extent map that is suitable. However we will only check if our extent map is mergeable after searching for the previous and next extent maps in the rbtree, meaning that we are doing unnecessary calls to rb_prev() and rb_next() in case our extent map is not mergeable (it's compressed, in the list of modifed extents, being logged or pinned), wasting CPU time chasing rbtree pointers and pulling in unnecessary cache lines. So change the logic to check first if an extent map is mergeable before searching for the next and previous extent maps in the rbtree. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: log messages at unpin_extent_range() during unexpected casesGravatar Filipe Manana 3-8/+18
At unpin_extent_range() we trigger a WARN_ON() when we don't find an extent map or we find one with a start offset not matching the start offset of the target range. This however isn't very useful for debugging because: 1) We don't know which condition was triggered, as they are both in the same WARN_ON() call; 2) We don't know which inode was affected, from which root, for which range, what's the start offset of the extent map, and so on. So trigger a separate warning for each case and log a message for each case providing information about the inode, its root, the target range, the generation and the start offset of the extent map we found. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: remove redundant value assignment at btrfs_add_extent_mapping()Gravatar Filipe Manana 1-2/+0
At btrfs_add_extent_mapping(), in case add_extent_mapping() returned -EEXIST, it's pointless to assign 0 to 'ret' since we will assign a value to it shortly after, without 'ret' being used before that. So remove that pointless assignment. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: unexport add_extent_mapping()Gravatar Filipe Manana 3-26/+25
There's no need to export add_extent_mapping(), as it's only used inside extent_map.c and in the self tests. For the tests we can use instead btrfs_add_extent_mapping(), which will accomplish exactly the same as we don't expect collisions in any of them. So unexport it and make the tests use btrfs_add_extent_mapping() instead of add_extent_mapping(). Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: tests: print all values as decimal in messages for extent map testsGravatar Filipe Manana 1-7/+7
Some error messages of the extent map tests print decimal values of start offsets and lengths, while other are oddly printing in hexadecimal, which is far less human friendly, specially taking into consideration that all the values are small and multiples of 4K, so it's a lot easier to read them as decimal values. Change the format specifiers to print as decimal instead. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: tests: do not ignore NULL extent maps for extent maps testsGravatar Filipe Manana 1-10/+30
Several of the extent map tests call btrfs_add_extent_mapping() which is supposed to succeed and return an extent map through the pointer to pointer argument. However the tests are deliberately ignoring a NULL extent map, which is not expected to happen. So change the tests to error out if a NULL extent map is found. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: tests: fix error messages for test case 4 of extent map testsGravatar Filipe Manana 1-2/+2
In test case 4 for extent maps, if we error out we are supposed to print in interval but instead of printing a non-inclusive end offset, we are printing the length of the interval, which makes it confusing. So fix that to print the exclusive end offset instead. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: assert extent map is not in a list when setting it upGravatar Filipe Manana 1-1/+3
When setting up a new extent map, at setup_extent_mapping(), we're doing a list move operation to add the extent map the tree's list of modified extents. This is confusing because at this point the extent map can not be in any list, because it's a new extent map. So replace the list move with a list add and add an assertion that checks that the extent map is not currently in any list. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: allocate btrfs_inode::file_extent_tree only without NO_HOLESGravatar David Sterba 4-10/+29
The file_extent_tree was added in 41a2ee75aab0 ("btrfs: introduce per-inode file extent tree") so we have an explicit mapping of the file extents to know where it is safe to update i_size. When the feature NO_HOLES is enabled, and it's been a mkfs default since 5.15, the tree is not necessary. To save some space in the inode, allocate the tree only when necessary. This reduces size by 16 bytes from 1096 to 1080 on a x86_64 release config. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: cache that we don't have security.capability setGravatar Josef Bacik 3-2/+62
When profiling a workload I noticed we were constantly calling getxattr. These were mostly coming from __remove_privs, which will lookup if security.capability exists to remove it. However instrumenting getxattr showed we get called nearly constantly on an idle machine on a lot of accesses. These are wasteful and not free. Other security LSMs have a way to cache their results, but capability doesn't have this, so it's asking us all the time for the xattr. Fix this by setting a flag in our inode that it doesn't have a security.capability xattr. We set this on new inodes and after a failed lookup of security.capability. If we set this xattr at all we'll clear the flag. I haven't found a test in fsperf that this makes a visible difference on, but I assume fs_mark related tests would show it clearly. This is a perf report output of the smallfiles100k run where it shows 20% of our time spent in __remove_privs because we're looking up the non-existent xattr. --21.86%--btrfs_write_check.constprop.0 --21.62%--__file_remove_privs --21.55%--security_inode_need_killpriv --21.54%--cap_inode_need_killpriv --21.53%--__vfs_getxattr --20.89%--btrfs_getxattr Obviously this is just CPU time in a mostly IO bound test, so the actual effect of removing this callchain is minimal. However in just normal testing of an idle system tracing showed around 100 getxattr calls per minute, and with this patch there are 0. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: remove code for inode_cache and recovery mount optionsGravatar Josef Bacik 1-35/+0
We've deprecated these a while ago in 5.11, go ahead and remove the code for them. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: set clear_cache if we use usebackuprootGravatar Josef Bacik 2-3/+9
We're currently setting this when we try to load the roots and we see that usebackuproot is set. Instead set this at mount option parsing time. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: move one shot mount option clearing to super.cGravatar Josef Bacik 3-16/+15
There's no reason this has to happen in open_ctree, and in fact in the old mount API we had to call this from remount. Move this to super.c, unexport it, and call it from both mount and reconfigure. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: remove old mount API codeGravatar Josef Bacik 3-1081/+13
Now that we've switched to the new mount API, remove the old stuff. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: move the device specific mount options to super.cGravatar Josef Bacik 2-23/+25
We add these mount options based on the fs_devices settings, which can be set once we've opened the fs_devices. Move these into their own helper and call it from get_tree_super. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: switch to the new mount APIGravatar Josef Bacik 3-41/+60
Now that we have all of the parts in place to use the new mount API, switch our fs_type to use the new callbacks. There are a few things that have to be done at the same time because of the order of operations changes that come along with the new mount API. These must be done in the same patch otherwise things will go wrong. 1. Export and use btrfs_check_options in open_ctree(). This is because the options are done ahead of time, and we need to check them once we have the feature flags loaded. 2. Update the free space cache settings. Since we're coming in with the options already set we need to make sure we don't undo what the user has asked for. 3. Set our sb_flags at init_fs_context time, the fs_context stuff is trying to manage the sb_flagss itself, so move that into init_fs_context and out of the fill super part. Additionally I've marked the unused functions with __maybe_unused and will remove them in a future patch. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: handle the ro->rw transition for mounting different subvolumesGravatar Josef Bacik 1-1/+128
This is a special case that we've carried around since 0723a0473fb4 ("btrfs: allow mounting btrfs subvolumes with different ro/rw options") where we'll under the covers flip the file system to RW if you're mixing and matching ro/rw options with different subvol mounts. The first mount is what the super gets setup as, so we'd handle this by remount the super as rw under the covers to facilitate this behavior. With the new mount API we can't really allow this, because user space has the ability to specify the super block settings, and the mount settings. So if the user explicitly sets the super block as read only, and then tried to mount a rw mount with the super block we'll reject this. However the old API was less descriptive and thus we allowed this kind of behavior. This patch preserves this behavior for the old API calls. This is inspired by Christians work [1], and includes his comment in btrfs_get_tree_super() explaining the history and how it all works in the old and new APIs. Link: https://lore.kernel.org/all/20230626-fs-btrfs-mount-api-v1-2-045e9735a00b@kernel.org/ Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: add get_tree callback for new mount APIGravatar Josef Bacik 1-4/+204
This is the actual mounting callback for the new mount API. Implement this using our current fill super as a guideline, making the appropriate adjustments for the new mount API. Our old mount operation had two fs_types, one to handle the actual opening, and the one that we called to handle the actual opening and then did the subvol lookup for returning the actual root dentry. This is mirrored here, but simply with different behaviors for ->get_tree. We use the existence of ->s_fs_info to tell which part we're in. The initial call allocates the fs_info, then call mount_fc() with a duplicated fc to do the actual open_ctree part. Then we take that vfsmount and use it to look up our subvolume that we're mounting and return that as our s_root. This idea was taken from Christians attempt to convert us to the new mount API [1]. In btrfs_get_tree_super() the mount device is scanned and opened in one go under uuid_mutex we expect that all related devices have been already scanned, either by mount or from the outside. A device forget can be called on some of the devices as the whole context is not protected but it's an unlikely event, though it's a minor behaviour change. References: https://lore.kernel.org/all/20230626-fs-btrfs-mount-api-v1-2-045e9735a00b@kernel.org/ Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> [ add note about device scanning ] Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: add reconfigure callback for fs_contextGravatar Josef Bacik 3-29/+197
This is what is used to remount the file system with the new mount API. Because the mount options are parsed separately and one at a time I've added a helper to emit the mount options after the fact once the mount is configured, this matches the dmesg output for what happens with the old mount API. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: add fs context handling functionsGravatar Josef Bacik 1-1/+35
We are going to use the fs context to hold the mount options, so allocate the btrfs_fs_context when we're asked to init the fs context, and free it in the free callback. Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: add parse_param callback for the new mount APIGravatar Josef Bacik 1-0/+380
The parse_param callback handles one parameter at a time, take our existing mount option parsing loop and adjust it to handle one parameter at a time, and tie it into the fs_context_operations. Create a btrfs_fs_context object that will store the various mount properties, we'll house this in fc->fs_private. This is necessary to separate because remounting will use ->reconfigure, and we'll get a new copy of the parsed parameters, so we can no longer directly mess with the fs_info in this stage. In the future we'll add this to the btrfs_fs_info and update the users to use the new context object instead. There's a change how the option device= is processed. Previously all mount options were parsed in one go under uuid_mutex and the devices opened. This prevented a concurrent scan to happen during mount. Now we could see a device scan happen (e.g. by udev) but this should not affect the end result, mount will either see the populated fs_devices or will scan the device by itself. Alternatively we could save all the device paths first and then process them in one go as before but this does not seem to be necessary. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> [ add note about device scanning ] Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: add fs_parameter definitionsGravatar Josef Bacik 1-1/+125
In order to convert to the new mount API we have to change how we do the mount option parsing. For now we're going to duplicate these helpers to make it easier to follow, and then remove the old code once everything is in place. This patch contains the re-definition of all of our mount options into the new fs_parameter_spec format. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: add a NOSPACECACHE mount option flagGravatar Josef Bacik 2-0/+2
With the old mount API we'd pre-populate the mount options with the space cache settings of the file system, and then the user toggled them on or off with the mount options. When we switch to the new mount API the mount options will be set before we get into opening the file system, so we need a flag to indicate that the user explicitly asked for -o nospace_cache so we can make the appropriate changes after the fact. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: split out ro->rw and rw->ro helpers into their own functionsGravatar Josef Bacik 1-113/+116
When we remount ro->rw or rw->ro we have some cleanup tasks that have to be managed. Split these out into their own function to make btrfs_remount smaller. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: do not allow free space tree rebuild on extent tree v2Gravatar Josef Bacik 1-1/+5
We currently don't allow these options to be set if we're extent tree v2 via the mount option parsing. However when we switch to the new mount API we'll no longer have the super block loaded, so won't be able to make this distinction at mount option parsing time. Address this by checking for extent tree v2 at the point where we make the decision to rebuild the free space tree. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: move space cache settings into open_ctreeGravatar Josef Bacik 3-24/+50
Currently we pre-load the space cache settings in btrfs_parse_options, however when we switch to the new mount API the mount option parsing will happen before we have the super block loaded. Add a helper to set the appropriate options based on the fs settings, this will allow us to have consistent free space cache settings. This also folds in the space cache related decisions we make for subpage sectorsize support, so all of this is done in one place. Since this was being called by parse options it looks like we're changing the behavior of remount, but in fact we aren't. The pre-loading of the free space cache settings is done because we want to handle the case of users not using any space_cache options, we'll derive the appropriate mount option based on the on disk state. On remount this wouldn't reset anything as we'll have cleared the v1 cache generation if we mounted -o nospace_cache. Similarly it's impossible to turn off the free space tree without specifically saying -o nospace_cache,clear_cache, which will delete the free space tree and clear the compat_ro option. Again in this case calling this code in remount wouldn't result in any change. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: set default compress type at btrfs_init_fs_info timeGravatar Josef Bacik 1-7/+3
With the new mount API we'll be setting our compression well before we call open_ctree. We don't want to overwrite our settings, so set the default in btrfs_init_fs_info instead of open_ctree. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: split out the mount option validation code into its own helperGravatar Josef Bacik 1-29/+37
We're going to need to validate mount options after they're all parsed with the new mount API, split this code out into its own helper so we can use it when we swap over to the new mount API. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> [ minor adjustments in the messages ] Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15fs: indicate request originates from old mount APIGravatar Christian Brauner 1-0/+11
We already communicate to filesystems when a remount request comes from the old mount API as some filesystems choose to implement different behavior in the new mount API than the old mount API to e.g., take the chance to fix significant API bugs. Allow the same for regular mount requests. Fixes: b330966f79fb ("fuse: reject options on reconfigure via fsconfig(2)") Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: remove no longer used EXTENT_MAP_DELALLOC block start valueGravatar Filipe Manana 4-9/+2
After commit ac3c0d36a2a2 ("btrfs: make fiemap more efficient and accurate reporting extent sharedness") we no longer need to create special extent maps during fiemap that have a block start with the EXTENT_MAP_DELALLOC value. So this block start value for extent maps is no longer used since then, therefore remove it. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: allow extent buffer helpers to skip cross-page handlingGravatar Qu Wenruo 3-3/+75
Currently btrfs extent buffer helpers are doing all the cross-page handling, as there is no guarantee that all those eb pages are contiguous. However on systems with enough memory, there is a very high chance the page cache for btree_inode are allocated with physically contiguous pages. In that case, we can skip all the complex cross-page handling, thus speeding up the code. This patch adds a new member, extent_buffer::addr, which is only set to non-NULL if all the extent buffer pages are physically contiguous. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: reflow btrfs_free_tree_blockGravatar Johannes Thumshirn 1-49/+50
Reflow btrfs_free_tree_block() so that there is one level of indentation needed. This patch has no functional changes. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: use memset_page instead of opencoding itGravatar Johannes Thumshirn 1-1/+1
Use memset_page() in memset_extent_buffer() instead of opencoding it. This does not not change any functionality. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: remove now unneeded btrfs_redirty_list_addGravatar Johannes Thumshirn 4-27/+1
Now that we're not clearing the dirty flag off of extent_buffers in zoned mode, all that is left of btrfs_redirty_list_add() is a memzero() and some ASSERT()ions. As we're also memzero()ing the buffer on write-out btrfs_redirty_list_add() has become obsolete and can be removed. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: zoned: don't clear dirty flag of extent bufferGravatar Johannes Thumshirn 3-4/+22
One a zoned filesystem, never clear the dirty flag of an extent buffer, but instead mark it as zeroout. On writeout, when encountering a marked extent_buffer, zero it out. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: rename EXTENT_BUFFER_NO_CHECK to EXTENT_BUFFER_ZONED_ZEROOUTGravatar Johannes Thumshirn 5-5/+6
EXTENT_BUFFER_ZONED_ZEROOUT better describes the state of the extent buffer, namely it is written as all zeros. This is needed in zoned mode, to preserve I/O ordering. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: always set extent_io_tree::inode and drop fs_infoGravatar David Sterba 5-56/+94
The extent_io_tree is embedded in several structures, notably in struct btrfs_inode. The fs_info is only used for reporting errors and for reference in trace points. We can get to the pointer through the inode, but not all io trees set it. However, we always know the owner and can recognize if inode is valid. For access helpers are provided, const variant for the trace points. This reduces size of extent_io_tree by 8 bytes and following structures in turn: - btrfs_inode 1104 -> 1088 - btrfs_device 520 -> 512 - btrfs_root 1360 -> 1344 - btrfs_transaction 456 -> 440 - btrfs_fs_info 3600 -> 3592 - reloc_control 1520 -> 1512 Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: enhance extent_io_tree error reportsGravatar David Sterba 1-10/+14
Pass the type of the extent io tree operation which failed in the report helper. The message wording and contents is updated, though locking might be the cause of the error it's probably not the only one and we're interested in the state. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: constify fs_info parameter in __btrfs_panic()Gravatar David Sterba 2-2/+2
The printk helpers take const fs_info if it's used just for the identifier in the messages, __btrfs_panic() lacks that. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: drop error message in extent_io_tree insert_state()Gravatar David Sterba 1-3/+0
The helper insert_state errors are handled in all callers and reported by extent_io_tree_panic so we don't need to do it twice. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: move lockdep class setting out of extent_io_tree_initGravatar David Sterba 2-10/+11
The per-inode file extent tree was added in 41a2ee75aab0 ("btrfs: introduce per-inode file extent tree"), it's the only tree type that requires the lockdep class. Move it to the file where it is actually used. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>