aboutsummaryrefslogtreecommitdiff
path: root/fs/btrfs/disk-io.h
AgeCommit message (Collapse)AuthorFilesLines
2024-02-29btrfs: fix double free of anonymous device after snapshot creation failureGravatar Filipe Manana 1-1/+1
When creating a snapshot we may do a double free of an anonymous device in case there's an error committing the transaction. The second free may result in freeing an anonymous device number that was allocated by some other subsystem in the kernel or another btrfs filesystem. The steps that lead to this: 1) At ioctl.c:create_snapshot() we allocate an anonymous device number and assign it to pending_snapshot->anon_dev; 2) Then we call btrfs_commit_transaction() and end up at transaction.c:create_pending_snapshot(); 3) There we call btrfs_get_new_fs_root() and pass it the anonymous device number stored in pending_snapshot->anon_dev; 4) btrfs_get_new_fs_root() frees that anonymous device number because btrfs_lookup_fs_root() returned a root - someone else did a lookup of the new root already, which could some task doing backref walking; 5) After that some error happens in the transaction commit path, and at ioctl.c:create_snapshot() we jump to the 'fail' label, and after that we free again the same anonymous device number, which in the meanwhile may have been reallocated somewhere else, because pending_snapshot->anon_dev still has the same value as in step 1. Recently syzbot ran into this and reported the following trace: ------------[ cut here ]------------ ida_free called for id=51 which is not allocated. WARNING: CPU: 1 PID: 31038 at lib/idr.c:525 ida_free+0x370/0x420 lib/idr.c:525 Modules linked in: CPU: 1 PID: 31038 Comm: syz-executor.2 Not tainted 6.8.0-rc4-syzkaller-00410-gc02197fc9076 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024 RIP: 0010:ida_free+0x370/0x420 lib/idr.c:525 Code: 10 42 80 3c 28 (...) RSP: 0018:ffffc90015a67300 EFLAGS: 00010246 RAX: be5130472f5dd000 RBX: 0000000000000033 RCX: 0000000000040000 RDX: ffffc90009a7a000 RSI: 000000000003ffff RDI: 0000000000040000 RBP: ffffc90015a673f0 R08: ffffffff81577992 R09: 1ffff92002b4cdb4 R10: dffffc0000000000 R11: fffff52002b4cdb5 R12: 0000000000000246 R13: dffffc0000000000 R14: ffffffff8e256b80 R15: 0000000000000246 FS: 00007fca3f4b46c0(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f167a17b978 CR3: 000000001ed26000 CR4: 0000000000350ef0 Call Trace: <TASK> btrfs_get_root_ref+0xa48/0xaf0 fs/btrfs/disk-io.c:1346 create_pending_snapshot+0xff2/0x2bc0 fs/btrfs/transaction.c:1837 create_pending_snapshots+0x195/0x1d0 fs/btrfs/transaction.c:1931 btrfs_commit_transaction+0xf1c/0x3740 fs/btrfs/transaction.c:2404 create_snapshot+0x507/0x880 fs/btrfs/ioctl.c:848 btrfs_mksubvol+0x5d0/0x750 fs/btrfs/ioctl.c:998 btrfs_mksnapshot+0xb5/0xf0 fs/btrfs/ioctl.c:1044 __btrfs_ioctl_snap_create+0x387/0x4b0 fs/btrfs/ioctl.c:1306 btrfs_ioctl_snap_create_v2+0x1ca/0x400 fs/btrfs/ioctl.c:1393 btrfs_ioctl+0xa74/0xd40 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:871 [inline] __se_sys_ioctl+0xfe/0x170 fs/ioctl.c:857 do_syscall_64+0xfb/0x240 entry_SYSCALL_64_after_hwframe+0x6f/0x77 RIP: 0033:0x7fca3e67dda9 Code: 28 00 00 00 (...) RSP: 002b:00007fca3f4b40c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007fca3e7abf80 RCX: 00007fca3e67dda9 RDX: 00000000200005c0 RSI: 0000000050009417 RDI: 0000000000000003 RBP: 00007fca3e6ca47a R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 000000000000000b R14: 00007fca3e7abf80 R15: 00007fff6bf95658 </TASK> Where we get an explicit message where we attempt to free an anonymous device number that is not currently allocated. It happens in a different code path from the example below, at btrfs_get_root_ref(), so this change may not fix the case triggered by syzbot. To fix at least the code path from the example above, change btrfs_get_root_ref() and its callers to receive a dev_t pointer argument for the anonymous device number, so that in case it frees the number, it also resets it to 0, so that up in the call chain we don't attempt to do the double free. CC: stable@vger.kernel.org # 5.10+ Link: https://lore.kernel.org/linux-btrfs/000000000000f673a1061202f630@google.com/ Fixes: e03ee2fe873e ("btrfs: do not ASSERT() if the newly created subvolume already got read") Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: move one shot mount option clearing to super.cGravatar Josef Bacik 1-1/+0
There's no reason this has to happen in open_ctree, and in fact in the old mount API we had to call this from remount. Move this to super.c, unexport it, and call it from both mount and reconfigure. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-12-15btrfs: remove duplicate btrfs_clear_buffer_dirty() prototype from disk-io.hGravatar Filipe Manana 1-2/+0
The prototype for btrfs_clear_buffer_dirty() is declared in both disk-io.h and extent_io.h, but the function is defined at extent_io.c. So remove the prototype declaration from disk-io.h. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-12btrfs: abort transaction on generation mismatch when marking eb as dirtyGravatar Filipe Manana 1-1/+2
When marking an extent buffer as dirty, at btrfs_mark_buffer_dirty(), we check if its generation matches the running transaction and if not we just print a warning. Such mismatch is an indicator that something really went wrong and only printing a warning message (and stack trace) is not enough to prevent a corruption. Allowing a transaction to commit with such an extent buffer will trigger an error if we ever try to read it from disk due to a generation mismatch with its parent generation. So abort the current transaction with -EUCLEAN if we notice a generation mismatch. For this we need to pass a transaction handle to btrfs_mark_buffer_dirty() which is always available except in test code, in which case we can pass NULL since it operates on dummy extent buffers and all test roots have a single node/leaf (root node at level 0). Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-08-21btrfs: make btrfs_cleanup_fs_roots() staticGravatar Filipe Manana 1-1/+0
btrfs_cleanup_fs_roots() is not used outside disk-io.c, so make it static, remove its prototype from disk-io.h and move its definition above the where it's used in disk-io.c Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19btrfs: use a separate end_io handler for read_extent_bufferGravatar Christoph Hellwig 1-3/+2
Now that we always use a single bio to read an extent_buffer, the buffer can be passed to the end_io handler as private data. This allows implementing a much simplified dedicated end I/O handler for metadata reads. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-19btrfs: move btrfs_verify_level_key into tree-checker.cGravatar Josef Bacik 1-2/+0
This is more a buffer validation helper, move it into the tree-checker files where it makes more sense. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15btrfs: rename btrfs_clean_tree_block to btrfs_clear_buffer_dirtyGravatar Josef Bacik 1-2/+2
btrfs_clean_tree_block is a misnomer, it's just clear_extent_buffer_dirty with some extra accounting around it. Rename this to btrfs_clear_buffer_dirty to make it more clear it belongs with it's setter, btrfs_mark_buffer_dirty. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15btrfs: add trans argument to btrfs_clean_tree_blockGravatar Josef Bacik 1-1/+2
We check the header generation in the extent buffer against the current running transaction id to see if it's safe to clear DIRTY on this buffer. Generally speaking if we're clearing the buffer dirty we're holding the transaction open, but in the case of cleaning up an aborted transaction we don't, so we have extra checks in that path to check the transid. To allow for a future cleanup go ahead and pass in the trans handle so we don't have to rely on ->running_transaction being set. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15btrfs: remove now spurious bio submission helpersGravatar Christoph Hellwig 1-1/+0
Call btrfs_submit_bio and btrfs_submit_compressed_read directly from submit_one_bio now that all additional functionality has moved into btrfs_submit_bio. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15btrfs: support cloned bios in btree_csum_one_bioGravatar Christoph Hellwig 1-1/+1
To allow splitting bios in btrfs_submit_bio, btree_csum_one_bio needs to be able to handle cloned bios. As btree_csum_one_bio is always called before handing the bio to the block layer that is trivially done by using bio_for_each_segment instead of bio_for_each_segment_all. Also switch the function to take a btrfs_bio and use that to derive the fs_info. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15btrfs: handle checksum generation in the storage layerGravatar Christoph Hellwig 1-8/+1
Instead of letting the callers of btrfs_submit_bio deal with checksumming the (meta)data in the bio and making decisions on when to offload the checksumming to the bio, leave that to btrfs_submit_bio. Do do so the existing btrfs_submit_bio function is split into an upper and a lower half, so that the lower half can be offloaded to a workqueue. Note that this changes the behavior for direct writes to raid56 volumes so that async checksum offloading is not skipped when more I/O is expected. This runs counter to the argument explaining why it was done, although I can't measure any affects of the change. Commits later in this series will make sure the entire direct writes is offloaded to the workqueue at once and thus make sure it is sent to the raid56 code from a single thread. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-15btrfs: open code the submit_bio_start helpersGravatar Christoph Hellwig 1-1/+0
The submit helpers are now trivial and can be called directly. Note that btree_csum_one_bio has to be moved up in the file a bit to avoid a forward declaration. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-01-03btrfs: fix compat_ro checks against remountGravatar Qu Wenruo 1-1/+1
[BUG] Even with commit 81d5d61454c3 ("btrfs: enhance unsupported compat RO flags handling"), btrfs can still mount a fs with unsupported compat_ro flags read-only, then remount it RW: # btrfs ins dump-super /dev/loop0 | grep compat_ro_flags -A 3 compat_ro_flags 0x403 ( FREE_SPACE_TREE | FREE_SPACE_TREE_VALID | unknown flag: 0x400 ) # mount /dev/loop0 /mnt/btrfs mount: /mnt/btrfs: wrong fs type, bad option, bad superblock on /dev/loop0, missing codepage or helper program, or other error. dmesg(1) may have more information after failed mount system call. ^^^ RW mount failed as expected ^^^ # dmesg -t | tail -n5 loop0: detected capacity change from 0 to 1048576 BTRFS: device fsid cb5b82f5-0fdd-4d81-9b4b-78533c324afa devid 1 transid 7 /dev/loop0 scanned by mount (1146) BTRFS info (device loop0): using crc32c (crc32c-intel) checksum algorithm BTRFS info (device loop0): using free space tree BTRFS error (device loop0): cannot mount read-write because of unknown compat_ro features (0x403) BTRFS error (device loop0): open_ctree failed # mount /dev/loop0 -o ro /mnt/btrfs # mount -o remount,rw /mnt/btrfs ^^^ RW remount succeeded unexpectedly ^^^ [CAUSE] Currently we use btrfs_check_features() to check compat_ro flags against our current mount flags. That function get reused between open_ctree() and btrfs_remount(). But for btrfs_remount(), the super block we passed in still has the old mount flags, thus btrfs_check_features() still believes we're mounting read-only. [FIX] Replace the existing @sb argument with @is_rw_mount. As originally we only use @sb to determine if the mount is RW. Now it's callers' responsibility to determine if the mount is RW, and since there are only two callers, the check is pretty simple: - caller in open_ctree() Just pass !sb_rdonly(). - caller in btrfs_remount() Pass !(*flags & SB_RDONLY), as our check should be against the new flags. Now we can correctly reject the RW remount: # mount /dev/loop0 -o ro /mnt/btrfs # mount -o remount,rw /mnt/btrfs mount: /mnt/btrfs: mount point not mounted or bad option. dmesg(1) may have more information after failed mount system call. # dmesg -t | tail -n 1 BTRFS error (device loop0: state M): cannot mount read-write because of unknown compat_ro features (0x403) Reported-by: Chung-Chiang Cheng <shepjeng@gmail.com> Fixes: 81d5d61454c3 ("btrfs: enhance unsupported compat RO flags handling") CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05btrfs: move struct btrfs_tree_parent_check out of disk-io.hGravatar Christoph Hellwig 1-29/+1
Move struct btrfs_tree_parent_check out of disk-io.h so that volumes.h an various .c files don't have to include disk-io.h just for it. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> [ use tree-checker.h for the structure ] Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05btrfs: concentrate all tree block parentness check parameters into one structureGravatar Qu Wenruo 1-4/+32
There are several different tree block parentness check parameters used across several helpers: - level Mandatory - transid Under most cases it's mandatory, but there are several backref cases which skips this check. - owner_root - first_key Utilized by most top-down tree search routine. Otherwise can be skipped. Those four members are not always mandatory checks, and some of them are the same u64, which means if some arguments got swapped compiler will not catch it. Furthermore if we're going to further expand the parentness check, we need to modify quite some helpers just to add one more parameter. This patch will concentrate all these members into a structure called btrfs_tree_parent_check, and pass that structure for the following helpers: - btrfs_read_extent_buffer() - read_tree_block() Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05btrfs: pass btrfs_inode to btrfs_submit_metadata_bioGravatar David Sterba 1-1/+1
The function is for internal interfaces so we should use the btrfs_inode. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05btrfs: pass btrfs_inode to btrfs_wq_submit_bioGravatar David Sterba 1-1/+1
The function is for internal interfaces so we should use the btrfs_inode. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05btrfs: simplify btree_submit_bio_start and btrfs_submit_bio_start parametersGravatar David Sterba 1-2/+1
After previous patches the unused parameters can be removed from btree_submit_bio_start and btrfs_submit_bio_start as they don't need to conform to the extent_submit_bio_start_t typedef. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05btrfs: change how submit bio callback is passed to btrfs_wq_submit_bioGravatar David Sterba 1-2/+10
There's a callback function parameter for btrfs_wq_submit_bio that can be one of: metadata, buffered data, direct io data. The callback abstraction is unnecessary as we have all functions available. Replace the parameter with a command that leads to a direct call in run_one_async_start. The called functions can be then simplified and we can also remove the extent_submit_bio_start_t typedef. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05btrfs: remove unused function prototypesGravatar Josef Bacik 1-4/+0
I wrote the following coccinelle script to find function declarations that didn't have the corresponding code for them @funcproto@ identifier func; type T; position p0; @@ T func@p0(...); @funccode@ identifier funcproto.func; position p1; @@ func@p1(...) { ... } @script:python depends on !funccode@ p0 << funcproto.p0; @@ print("Proto with no function at %s:%s" % (p0[0].file, p0[0].line)) and ran it against btrfs, which identified the 4 function prototypes I've removed in this patch. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-05btrfs: move btrfs_get_block_group helper out of disk-io.hGravatar Josef Bacik 1-7/+1
This inline helper calls btrfs_fs_compat_ro(), which is defined in another header. To avoid weird header dependency problems move this helper into disk-io.c with the rest of the global root helpers. Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-24btrfs: make thaw time super block check to also verify checksumGravatar Qu Wenruo 1-0/+2
Previous commit a05d3c915314 ("btrfs: check superblock to ensure the fs was not modified at thaw time") only checks the content of the super block, but it doesn't really check if the on-disk super block has a matching checksum. This patch will add the checksum verification to thaw time superblock verification. This involves the following extra changes: - Export btrfs_check_super_csum() As we need to call it in super.c. - Change the argument list of btrfs_check_super_csum() Instead of passing a char *, directly pass struct btrfs_super_block * pointer. - Verify that our checksum type didn't change before checking the checksum value, like it's done at mount time Fixes: a05d3c915314 ("btrfs: check superblock to ensure the fs was not modified at thaw time") Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-09-26btrfs: relax block-group-tree feature dependency checksGravatar Qu Wenruo 1-0/+1
[BUG] When one user did a wrong attempt to clear block group tree, which can not be done through mount option, by using "-o clear_cache,space_cache=v2", it will cause the following error on a fs with block-group-tree feature: BTRFS info (device dm-1): force clearing of disk cache BTRFS info (device dm-1): using free space tree BTRFS info (device dm-1): clearing free space tree BTRFS info (device dm-1): clearing compat-ro feature flag for FREE_SPACE_TREE (0x1) BTRFS info (device dm-1): clearing compat-ro feature flag for FREE_SPACE_TREE_VALID (0x2) BTRFS error (device dm-1): block-group-tree feature requires fres-space-tree and no-holes BTRFS error (device dm-1): super block corruption detected before writing it to disk BTRFS: error (device dm-1) in write_all_supers:4318: errno=-117 Filesystem corrupted (unexpected superblock corruption detected) BTRFS warning (device dm-1: state E): Skipping commit of aborted transaction. [CAUSE] Although the dependency for block-group-tree feature is just an artificial one (to reduce test matrix), we put the dependency check into btrfs_validate_super(). This is too strict, and during space cache clearing, we will have a window where free space tree is cleared, and we need to commit the super block. In that window, we had block group tree without v2 cache, and triggered the artificial dependency check. This is not necessary at all, especially for such a soft dependency. [FIX] Introduce a new helper, btrfs_check_features(), to do all the runtime limitation checks, including: - Unsupported incompat flags check - Unsupported compat RO flags check - Setting missing incompat flags - Artificial feature dependency checks Currently only block group tree will rely on this. - Subpage runtime check for v1 cache With this helper, we can move quite some checks from open_ctree()/btrfs_remount() into it, and just call it after btrfs_parse_options(). Now "-o clear_cache,space_cache=v2" will not trigger the above error anymore. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> [ edit messages ] Signed-off-by: David Sterba <dsterba@suse.com>
2022-09-26btrfs: separate BLOCK_GROUP_TREE compat RO flag from EXTENT_TREE_V2Gravatar Qu Wenruo 1-1/+1
The problem of long mount time caused by block group item search is already known for some time, and the solution of block group tree has been proposed. There is really no need to bound this feature into extent tree v2, just introduce compat RO flag, BLOCK_GROUP_TREE, to correctly solve the problem. All the code handling block group root is already in the upstream kernel, thus this patch really only needs to introduce the new compat RO flag. This patch introduces one extra artificial limitation on block group tree feature, that free space cache v2 and no-holes feature must be enabled to use this new compat RO feature. This artificial requirement is mostly to reduce the test combinations, and can be a guideline for future features, to mostly rely on the latest default features. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-09-26btrfs: check superblock to ensure the fs was not modified at thaw timeGravatar Qu Wenruo 1-1/+3
[BACKGROUND] There is an incident report that, one user hibernated the system, with one btrfs on removable device still mounted. Then by some incident, the btrfs got mounted and modified by another system/OS, then back to the hibernated system. After resuming from the hibernation, new write happened into the victim btrfs. Now the fs is completely broken, since the underlying btrfs is no longer the same one before the hibernation, and the user lost their data due to various transid mismatch. [REPRODUCER] We can emulate the situation using the following small script: truncate -s 1G $dev mkfs.btrfs -f $dev mount $dev $mnt fsstress -w -d $mnt -n 500 sync xfs_freeze -f $mnt cp $dev $dev.backup # There is no way to mount the same cloned fs on the same system, # as the conflicting fsid will be rejected by btrfs. # Thus here we have to wipe the fs using a different btrfs. mkfs.btrfs -f $dev.backup dd if=$dev.backup of=$dev bs=1M xfs_freeze -u $mnt fsstress -w -d $mnt -n 20 umount $mnt btrfs check $dev The final fsck will fail due to some tree blocks has incorrect fsid. This is enough to emulate the problem hit by the unfortunate user. [ENHANCEMENT] Although such case should not be that common, it can still happen from time to time. From the view of btrfs, we can detect any unexpected super block change, and if there is any unexpected change, we just mark the fs read-only, and thaw the fs. By this we can limit the damage to minimal, and I hope no one would lose their data by this anymore. Suggested-by: Goffredo Baroncelli <kreijack@libero.it> Link: https://lore.kernel.org/linux-btrfs/83bf3b4b-7f4c-387a-b286-9251e3991e34@bluemole.com/ Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-08-17btrfs: move lockdep class helpers to locking.cGravatar Josef Bacik 1-10/+0
These definitions exist in disk-io.c, which is not related to the locking. Move this over to locking.h/c where it makes more sense. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-07-25btrfs: handle allocation failure in btrfs_wq_submit_bio gracefullyGravatar Christoph Hellwig 1-3/+3
btrfs_wq_submit_bio is used for writeback under memory pressure. Instead of failing the I/O when we can't allocate the async_submit_bio, just punt back to the synchronous submission path. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Tested-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
2022-07-25btrfs: remove btrfs_end_io_wqGravatar Christoph Hellwig 1-10/+0
All reads bio that go through btrfs_map_bio need to be completed in user context. And read I/Os are the most common and timing critical in almost any file system workloads. Embed a work_struct into struct btrfs_bio and use it to complete all read bios submitted through btrfs_map, using the REQ_META flag to decide which workqueue they are placed on. This removes the need for a separate 128 byte allocation (typically rounded up to 192 bytes by slab) for all reads with a size increase of 24 bytes for struct btrfs_bio. Future patches will reorganize struct btrfs_bio to make use of this extra space for writes as well. (All sizes are based a on typical 64-bit non-debug build) Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
2022-07-25btrfs: defer I/O completion based on the btrfs_raid_bioGravatar Christoph Hellwig 1-1/+0
Instead of attaching an extra allocation an indirect call to each low-level bio issued by the RAID code, add a work_struct to struct btrfs_raid_bio and only defer the per-rbio completion action. The per-bio action for all the I/Os are trivial and can be safely done from interrupt context. As a nice side effect this also allows sharing the boilerplate code for the per-bio completions Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>
2022-05-16btrfs: remove unused parameter bio_flags from btrfs_wq_submit_bioGravatar David Sterba 1-2/+1
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-05-16btrfs: do not return errors from btrfs_submit_metadata_bioGravatar Christoph Hellwig 1-2/+1
btrfs_submit_metadata_bio already calls ->bi_end_io on error and the caller must ignore the return value, so remove it. Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-05-16btrfs: remove unused bio_flags argument to btrfs_submit_metadata_bioGravatar Christoph Hellwig 1-1/+1
This argument is unused since commit 953651eb308f ("btrfs: factor out helper adding a page to bio") and commit 1b36294a6cd5 ("btrfs: call submit_bio_hook directly for metadata pages") reworked the way metadata bio submission is handled. Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-05-16btrfs: remove trivial wrapper btrfs_read_buffer()Gravatar Filipe Manana 1-2/+2
The function btrfs_read_buffer() is useless, it just calls btree_read_extent_buffer_pages() with exactly the same arguments. So remove it and rename btree_read_extent_buffer_pages() to btrfs_read_extent_buffer(), which is a shorter name, has the "btrfs_" prefix (since it's used outside disk-io.c) and the name is clear enough about what it does. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-14btrfs: add code to support the block group rootGravatar Josef Bacik 1-0/+2
This code adds the on disk structures for the block group root, which will hold the block group items for extent tree v2. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-03btrfs: track the csum, extent, and free space trees in a rb treeGravatar Josef Bacik 1-12/+6
In the future we are going to have multiple copies of these trees. To facilitate this we need a way to lookup the different roots we are looking for. Handle this by adding a global root rb tree that is indexed on the root->root_key. Then instead of loading the roots at mount time with individually targeted keys, simply search the tree_root for anything with the specific objectid we want. This will make it straightforward to support both old style and new style file systems. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-03btrfs: stop accessing ->csum_root directlyGravatar Josef Bacik 1-0/+6
We are going to have multiple csum roots in the future, so convert all users of ->csum_root to btrfs_csum_root() and rename ->csum_root to ->_csum_root so we can easily find remaining users in the future. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-03btrfs: stop accessing ->extent_root directlyGravatar Josef Bacik 1-1/+7
When we start having multiple extent roots we'll need to use a helper to get to the correct extent_root. Rename fs_info->extent_root to _extent_root and convert all of the users of the extent root to using the btrfs_extent_root() helper. This will allow us to easily clean up the remaining direct accesses in the future. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-03btrfs: add a btrfs_block_group_root() helperGravatar Josef Bacik 1-0/+5
With extent tree v2 we will have a separate root to hold the block group items. Add a btrfs_block_group_root() that will return the appropriate root given the flags of the fs, and convert all functions that need to modify block group items to use the helper. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-26btrfs: make btrfs_super_block size match BTRFS_SUPER_INFO_SIZEGravatar Qu Wenruo 1-3/+0
It's a common practice to avoid use sizeof(struct btrfs_super_block) (3531), but to use BTRFS_SUPER_INFO_SIZE (4096). The problem is that, sizeof(struct btrfs_super_block) doesn't match BTRFS_SUPER_INFO_SIZE from the very beginning. Furthermore, for all call sites except selftests, we always allocate BTRFS_SUPER_INFO_SIZE space for super block, there isn't any real reason to use the smaller value, and it doesn't really save any space. So let's get rid of such confusing behavior, and unify those two values. This modification also adds a new static_assert() to verify the size, and moves the BTRFS_SUPER_INFO_* macros to the definition of btrfs_super_block for the static_assert(). Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-26btrfs: rename struct btrfs_io_bio to btrfs_bioGravatar Qu Wenruo 1-1/+1
Previously we had "struct btrfs_bio", which records IO context for mirrored IO and RAID56, and "strcut btrfs_io_bio", which records extra btrfs specific info for logical bytenr bio. With "btrfs_bio" renamed to "btrfs_io_context", we are safe to rename "btrfs_io_bio" to "btrfs_bio" which is a more suitable name now. The struct btrfs_bio changes meaning by this commit. There was a suggested name like btrfs_logical_bio but it's a bit long and we'd prefer to use a shorter name. This could be a concern for backports to older kernels where the different meaning could possibly cause confusion or bugs. Comparing the new and old structures, there's no overlap among the struct members so a build would break in case of incorrect backport. We haven't had many backports to bio code anyway so this is more of a theoretical cause of bugs and a matter of precaution but we'll need to keep the semantic change in mind. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-02-09btrfs: split alloc_log_tree()Gravatar Naohiro Aota 1-0/+2
This is a preparation patch for the next patch. Split alloc_log_tree() into two parts. The first one allocating the tree structure, remains in alloc_log_tree() and the second part allocating the tree node, which is moved into btrfs_alloc_log_tree_node(). Also export the latter part is to be used in the next patch. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-02-08btrfs: rename btrfs_find_free_objectid to btrfs_get_free_objectidGravatar Nikolay Borisov 1-1/+1
This better reflects the semantics of the function i.e no search is performed whatsoever. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-02-08btrfs: rename btrfs_find_highest_objectid to btrfs_init_root_free_objectidGravatar Nikolay Borisov 1-1/+1
This function is used to initialize the in-memory btrfs_root::highest_objectid member, which is used to get an available objectid. Rename it to better reflect its semantics. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-09btrfs: rename bio_offset of extent_submit_bio_start_t to dio_file_offsetGravatar Qu Wenruo 1-1/+1
The parameter bio_offset of extent_submit_bio_start_t is very confusing. If it's really bio_offset (offset to bio), then it should be u32. But in fact, it's only utilized by dio read, and that member is used as file offset, which must be u64. Rename it to dio_file_offset since the only user uses it as file offset, and add comment for who is using it. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-09btrfs: clear oneshot options on mount and remountGravatar Boris Burkov 1-0/+1
Some options only apply during mount time and are cleared at the end of mount. For now, the example is USEBACKUPROOT, but CLEAR_CACHE also fits the bill, and this is a preparation patch for also clearing that option. One subtlety is that the current code only resets USEBACKUPROOT on rw mounts, but the option is meaningfully "consumed" by a ro mount, so it feels appropriate to clear in that case as well. A subsequent read-write remount would not go through open_ctree, which is the only place that checks the option, so the change should be benign. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Boris Burkov <boris@bur.io> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-09btrfs: lift read-write mount setup from mount and remountGravatar Boris Burkov 1-0/+1
Mounting rw and remounting from ro to rw naturally share invariants and functionality which result in a correctly setup rw filesystem. Luckily, there is even a strong unity in the code which implements them. In mount's open_ctree, these operations mostly happen after an early return for ro file systems, and in remount, they happen in a section devoted to remounting ro->rw, after some remount specific validation passes. However, there are unfortunately a few differences. There are small deviations in the order of some of the operations, remount does not start orphan cleanup in root_tree or fs_tree, remount does not create the free space tree, and remount does not handle "one-shot" mount options like clear_cache and uuid tree rescan. Since we want to add building the free space tree to remount, and also to start the same orphan cleanup process on a filesystem mounted as ro then remounted rw, we would benefit from unifying the logic between the two code paths. This patch only lifts the existing common functionality, and leaves a natural path for fixing the discrepancies. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Boris Burkov <boris@bur.io> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-09btrfs: move btrfs_find_highest_objectid/btrfs_find_free_objectid to disk-io.cGravatar Nikolay Borisov 1-0/+2
Those functions are going to be used even after inode cache is removed so moved them to a more appropriate place. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-08btrfs: remove unused parameter phy_offset from btrfs_validate_metadata_bufferGravatar Qu Wenruo 1-1/+1
Parameter @phy_offset is the offset against the bio->bi_iter.bi_sector. @phy_offset is mostly for data io to lookup the csum in btrfs_io_bio. But for metadata, it's completely useless as metadata stores their own csum in its header, so we can remove it. Note: parameters @start and @end, they are not utilized at all for current sectorsize == PAGE_SIZE case, as we can grab eb directly from page. But those two parameters are very important for later subpage support, thus @start/@len are not touched here. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-08btrfs: pass the owner_root and level to alloc_extent_bufferGravatar Josef Bacik 1-1/+2
Now that we've plumbed all of the callers to have the owner root and the level, plumb it down into alloc_extent_buffer(). Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>