aboutsummaryrefslogtreecommitdiff
path: root/drivers/md/dm-thin-metadata.c
AgeCommit message (Collapse)AuthorFilesLines
2023-03-28mm: shrinkers: convert shrinker_rwsem to mutexGravatar Qi Zheng 1-1/+1
Now there are no readers of shrinker_rwsem, so we can simply replace it with mutex lock. Link: https://lkml.kernel.org/r/20230313112819.38938-9-zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Kirill Tkhai <tkhai@ya.ru> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Christian König <christian.koenig@amd.com> Cc: David Hildenbrand <david@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Sultan Alsawaf <sultan@kerneltoast.com> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-14dm: avoid useless 'else' after 'break' or return'Gravatar Heinz Mauelshagen 1-2/+2
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2023-02-14dm: prefer '"%s...", __func__'Gravatar Heinz Mauelshagen 1-6/+6
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2023-02-14dm: avoid split of quoted strings where possibleGravatar Heinz Mauelshagen 1-4/+4
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2023-02-14dm: remove unnecessary braces from single statement blocksGravatar Heinz Mauelshagen 1-2/+2
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2023-02-14dm: add missing empty linesGravatar Heinz Mauelshagen 1-0/+1
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2023-02-14dm: correct block comments format.Gravatar Heinz Mauelshagen 1-8/+12
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2023-02-14dm: change "unsigned" to "unsigned int"Gravatar Heinz Mauelshagen 1-12/+12
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2023-02-14dm: add missing SPDX-License-IndentifiersGravatar Heinz Mauelshagen 1-0/+1
'GPL-2.0-only' is used instead of 'GPL-2.0' because SPDX has deprecated its use. Suggested-by: John Wiele <jwiele@redhat.com> Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-12-08dm thin: Use last transaction's pmd->root when commit failedGravatar Zhihao Cheng 1-0/+9
Recently we found a softlock up problem in dm thin pool btree lookup code due to corrupted metadata: Kernel panic - not syncing: softlockup: hung tasks CPU: 7 PID: 2669225 Comm: kworker/u16:3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) Workqueue: dm-thin do_worker [dm_thin_pool] Call Trace: <IRQ> dump_stack+0x9c/0xd3 panic+0x35d/0x6b9 watchdog_timer_fn.cold+0x16/0x25 __run_hrtimer+0xa2/0x2d0 </IRQ> RIP: 0010:__relink_lru+0x102/0x220 [dm_bufio] __bufio_new+0x11f/0x4f0 [dm_bufio] new_read+0xa3/0x1e0 [dm_bufio] dm_bm_read_lock+0x33/0xd0 [dm_persistent_data] ro_step+0x63/0x100 [dm_persistent_data] btree_lookup_raw.constprop.0+0x44/0x220 [dm_persistent_data] dm_btree_lookup+0x16f/0x210 [dm_persistent_data] dm_thin_find_block+0x12c/0x210 [dm_thin_pool] __process_bio_read_only+0xc5/0x400 [dm_thin_pool] process_thin_deferred_bios+0x1a4/0x4a0 [dm_thin_pool] process_one_work+0x3c5/0x730 Following process may generate a broken btree mixed with fresh and stale btree nodes, which could get dm thin trapped in an infinite loop while looking up data block: Transaction 1: pmd->root = A, A->B->C // One path in btree pmd->root = X, X->Y->Z // Copy-up Transaction 2: X,Z is updated on disk, Y write failed. // Commit failed, dm thin becomes read-only. process_bio_read_only dm_thin_find_block __find_block dm_btree_lookup(pmd->root) The pmd->root points to a broken btree, Y may contain stale node pointing to any block, for example X, which gets dm thin trapped into a dead loop while looking up Z. Fix this by setting pmd->root in __open_metadata(), so that dm thin will use the last transaction's pmd->root if commit failed. Fetch a reproducer in [Link]. Linke: https://bugzilla.kernel.org/show_bug.cgi?id=216790 Cc: stable@vger.kernel.org Fixes: 991d9fa02da0 ("dm: add thin provisioning target") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Acked-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-12-01dm thin: Fix ABBA deadlock between shrink_slab and dm_pool_abort_metadataGravatar Zhihao Cheng 1-8/+43
Following concurrent processes: P1(drop cache) P2(kworker) drop_caches_sysctl_handler drop_slab shrink_slab down_read(&shrinker_rwsem) - LOCK A do_shrink_slab super_cache_scan prune_icache_sb dispose_list evict ext4_evict_inode ext4_clear_inode ext4_discard_preallocations ext4_mb_load_buddy_gfp ext4_mb_init_cache ext4_read_block_bitmap_nowait ext4_read_bh_nowait submit_bh dm_submit_bio do_worker process_deferred_bios commit metadata_operation_failed dm_pool_abort_metadata down_write(&pmd->root_lock) - LOCK B __destroy_persistent_data_objects dm_block_manager_destroy dm_bufio_client_destroy unregister_shrinker down_write(&shrinker_rwsem) thin_map | dm_thin_find_block ↓ down_read(&pmd->root_lock) --> ABBA deadlock , which triggers hung task: [ 76.974820] INFO: task kworker/u4:3:63 blocked for more than 15 seconds. [ 76.976019] Not tainted 6.1.0-rc4-00011-g8f17dd350364-dirty #910 [ 76.978521] task:kworker/u4:3 state:D stack:0 pid:63 ppid:2 [ 76.978534] Workqueue: dm-thin do_worker [ 76.978552] Call Trace: [ 76.978564] __schedule+0x6ba/0x10f0 [ 76.978582] schedule+0x9d/0x1e0 [ 76.978588] rwsem_down_write_slowpath+0x587/0xdf0 [ 76.978600] down_write+0xec/0x110 [ 76.978607] unregister_shrinker+0x2c/0xf0 [ 76.978616] dm_bufio_client_destroy+0x116/0x3d0 [ 76.978625] dm_block_manager_destroy+0x19/0x40 [ 76.978629] __destroy_persistent_data_objects+0x5e/0x70 [ 76.978636] dm_pool_abort_metadata+0x8e/0x100 [ 76.978643] metadata_operation_failed+0x86/0x110 [ 76.978649] commit+0x6a/0x230 [ 76.978655] do_worker+0xc6e/0xd90 [ 76.978702] process_one_work+0x269/0x630 [ 76.978714] worker_thread+0x266/0x630 [ 76.978730] kthread+0x151/0x1b0 [ 76.978772] INFO: task test.sh:2646 blocked for more than 15 seconds. [ 76.979756] Not tainted 6.1.0-rc4-00011-g8f17dd350364-dirty #910 [ 76.982111] task:test.sh state:D stack:0 pid:2646 ppid:2459 [ 76.982128] Call Trace: [ 76.982139] __schedule+0x6ba/0x10f0 [ 76.982155] schedule+0x9d/0x1e0 [ 76.982159] rwsem_down_read_slowpath+0x4f4/0x910 [ 76.982173] down_read+0x84/0x170 [ 76.982177] dm_thin_find_block+0x4c/0xd0 [ 76.982183] thin_map+0x201/0x3d0 [ 76.982188] __map_bio+0x5b/0x350 [ 76.982195] dm_submit_bio+0x2b6/0x930 [ 76.982202] __submit_bio+0x123/0x2d0 [ 76.982209] submit_bio_noacct_nocheck+0x101/0x3e0 [ 76.982222] submit_bio_noacct+0x389/0x770 [ 76.982227] submit_bio+0x50/0xc0 [ 76.982232] submit_bh_wbc+0x15e/0x230 [ 76.982238] submit_bh+0x14/0x20 [ 76.982241] ext4_read_bh_nowait+0xc5/0x130 [ 76.982247] ext4_read_block_bitmap_nowait+0x340/0xc60 [ 76.982254] ext4_mb_init_cache+0x1ce/0xdc0 [ 76.982259] ext4_mb_load_buddy_gfp+0x987/0xfa0 [ 76.982263] ext4_discard_preallocations+0x45d/0x830 [ 76.982274] ext4_clear_inode+0x48/0xf0 [ 76.982280] ext4_evict_inode+0xcf/0xc70 [ 76.982285] evict+0x119/0x2b0 [ 76.982290] dispose_list+0x43/0xa0 [ 76.982294] prune_icache_sb+0x64/0x90 [ 76.982298] super_cache_scan+0x155/0x210 [ 76.982303] do_shrink_slab+0x19e/0x4e0 [ 76.982310] shrink_slab+0x2bd/0x450 [ 76.982317] drop_slab+0xcc/0x1a0 [ 76.982323] drop_caches_sysctl_handler+0xb7/0xe0 [ 76.982327] proc_sys_call_handler+0x1bc/0x300 [ 76.982331] proc_sys_write+0x17/0x20 [ 76.982334] vfs_write+0x3d3/0x570 [ 76.982342] ksys_write+0x73/0x160 [ 76.982347] __x64_sys_write+0x1e/0x30 [ 76.982352] do_syscall_64+0x35/0x80 [ 76.982357] entry_SYSCALL_64_after_hwframe+0x63/0xcd Function metadata_operation_failed() is called when operations failed on dm pool metadata, dm pool will destroy and recreate metadata. So, shrinker will be unregistered and registered, which could down write shrinker_rwsem under pmd_write_lock. Fix it by allocating dm_block_manager before locking pmd->root_lock and destroying old dm_block_manager after unlocking pmd->root_lock, then old dm_block_manager is replaced with new dm_block_manager under pmd->root_lock. So, shrinker register/unregister could be done without holding pmd->root_lock. Fetch a reproducer in [Link]. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216676 Cc: stable@vger.kernel.org #v5.2+ Fixes: e49e582965b3 ("dm thin: add read only and fail io modes") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-07-15dm thin: fix use-after-free crash in dm_sm_register_threshold_callbackGravatar Luo Meng 1-2/+5
Fault inject on pool metadata device reports: BUG: KASAN: use-after-free in dm_pool_register_metadata_threshold+0x40/0x80 Read of size 8 at addr ffff8881b9d50068 by task dmsetup/950 CPU: 7 PID: 950 Comm: dmsetup Tainted: G W 5.19.0-rc6 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x34/0x44 print_address_description.constprop.0.cold+0xeb/0x3f4 kasan_report.cold+0xe6/0x147 dm_pool_register_metadata_threshold+0x40/0x80 pool_ctr+0xa0a/0x1150 dm_table_add_target+0x2c8/0x640 table_load+0x1fd/0x430 ctl_ioctl+0x2c4/0x5a0 dm_ctl_ioctl+0xa/0x10 __x64_sys_ioctl+0xb3/0xd0 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 This can be easily reproduced using: echo offline > /sys/block/sda/device/state dd if=/dev/zero of=/dev/mapper/thin bs=4k count=10 dmsetup load pool --table "0 20971520 thin-pool /dev/sda /dev/sdb 128 0 0" If a metadata commit fails, the transaction will be aborted and the metadata space maps will be destroyed. If a DM table reload then happens for this failed thin-pool, a use-after-free will occur in dm_sm_register_threshold_callback (called from dm_pool_register_metadata_threshold). Fix this by in dm_pool_register_metadata_threshold() by returning the -EINVAL error if the thin-pool is in fail mode. Also fail pool_ctr() with a new error message: "Error registering metadata threshold". Fixes: ac8c3f3df65e4 ("dm thin: generate event when metadata threshold passed") Cc: stable@vger.kernel.org Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Luo Meng <luomeng12@huawei.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-02-22dm thin metadata: remove unused dm_thin_remove_block and __removeGravatar Zhiqiang Liu 1-28/+0
Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-10-18dm: use bdev_nr_sectors and bdev_nr_bytes instead of open coding themGravatar Christoph Hellwig 1-1/+1
Use the proper helpers to read the block device size. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20211018101130.1838532-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-04dm space maps: improve performance with inc/dec on ranges of blocksGravatar Joe Thornber 1-38/+53
When we break sharing on btree nodes we typically need to increment the reference counts to every value held in the node. This can cause a lot of repeated calls to the space maps. Fix this by changing the interface to the space map inc/dec methods to take ranges of adjacent blocks to be operated on. For installations that are using a lot of snapshots this will reduce cpu overhead of fundamental operations such as provisioning a new block, or deleting a snapshot, by as much as 10 times. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-01-24dm: use bdev_read_only to check if a device is read-onlyGravatar Christoph Hellwig 1-1/+1
dm-thin and dm-cache also work on partitions, so use the proper interface to check if the device is read-only. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-29dm thin metadata: Remove unused local variable when create thin and snapGravatar Huaisheng Ye 1-4/+2
The local variable disk details is not used during the creating of thin & snap devices. Remove them from dm-thin-metadata, and add pointer validity check for pointer value in btree_lookup_raw. Skip memory copy when the caller doesn't need the value. Signed-off-by: Huaisheng Ye <yehs1@lenovo.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-09-02dm thin metadata: Fix use-after-free in dm_bm_set_read_onlyGravatar Ye Bin 1-1/+1
The following error ocurred when testing disk online/offline: [ 301.798344] device-mapper: thin: 253:5: aborting current metadata transaction [ 301.848441] device-mapper: thin: 253:5: failed to abort metadata transaction [ 301.849206] Aborting journal on device dm-26-8. [ 301.850489] EXT4-fs error (device dm-26) in __ext4_new_inode:943: Journal has aborted [ 301.851095] EXT4-fs (dm-26): Delayed block allocation failed for inode 398742 at logical offset 181 with max blocks 19 with error 30 [ 301.854476] BUG: KASAN: use-after-free in dm_bm_set_read_only+0x3a/0x40 [dm_persistent_data] Reason is: metadata_operation_failed abort_transaction dm_pool_abort_metadata __create_persistent_data_objects r = __open_or_format_metadata if (r) --> If failed will free pmd->bm but pmd->bm not set NULL dm_block_manager_destroy(pmd->bm); set_pool_mode dm_pool_metadata_read_only(pool->pmd); dm_bm_set_read_only(pmd->bm); --> use-after-free Add checks to see if pmd->bm is NULL in dm_bm_set_read_only and dm_bm_set_read_write functions. If bm is NULL it means creating the bm failed and so dm_bm_is_read_only must return true. Signed-off-by: Ye Bin <yebin10@huawei.com> Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-09-02dm thin metadata: Avoid returning cmd->bm wild pointer on errorGravatar Ye Bin 1-2/+6
Maybe __create_persistent_data_objects() caller will use PTR_ERR as a pointer, it will lead to some strange things. Signed-off-by: Ye Bin <yebin10@huawei.com> Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-02-27dm thin metadata: fix lockdep complaintGravatar Theodore Ts'o 1-1/+1
[ 3934.173244] ====================================================== [ 3934.179572] WARNING: possible circular locking dependency detected [ 3934.185884] 5.4.21-xfstests #1 Not tainted [ 3934.190151] ------------------------------------------------------ [ 3934.196673] dmsetup/8897 is trying to acquire lock: [ 3934.201688] ffffffffbce82b18 (shrinker_rwsem){++++}, at: unregister_shrinker+0x22/0x80 [ 3934.210268] but task is already holding lock: [ 3934.216489] ffff92a10cc5e1d0 (&pmd->root_lock){++++}, at: dm_pool_metadata_close+0xba/0x120 [ 3934.225083] which lock already depends on the new lock. [ 3934.564165] Chain exists of: shrinker_rwsem --> &journal->j_checkpoint_mutex --> &pmd->root_lock For a more detailed lockdep report, please see: https://lore.kernel.org/r/20200220234519.GA620489@mit.edu We shouldn't need to hold the lock while are just tearing down and freeing the whole metadata pool structure. Fixes: 44d8ebf436399a4 ("dm thin metadata: use pool locking at end of dm_pool_metadata_close") Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-01-14dm thin metadata: use pool locking at end of dm_pool_metadata_closeGravatar Mike Snitzer 1-4/+6
Ensure that the pool is locked during calls to __commit_transaction and __destroy_persistent_data_objects. Just being consistent with locking, but reality is dm_pool_metadata_close is called once pool is being destroyed so access to pool shouldn't be contended. Also, use pmd_write_lock_in_core rather than __pmd_write_lock in dm_pool_commit_metadata and rename __pmd_write_lock to pmd_write_lock_in_core -- there was no need for the alias. In addition, verify that the pool is locked in __commit_transaction(). Fixes: 873f258becca ("dm thin metadata: do not write metadata if no changes occurred") Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-01-07dm thin metadata: Fix trivial math error in on-disk format documentationGravatar Jeffle Xu 1-1/+1
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2020-01-07dm thin metadata: use true/false for bool variableGravatar zhengbin 1-5/+5
Fixes coccicheck warning: drivers/md/dm-thin-metadata.c:814:3-14: WARNING: Assignment of 0/1 to bool variable drivers/md/dm-thin-metadata.c:1109:1-12: WARNING: Assignment of 0/1 to bool variable drivers/md/dm-thin-metadata.c:1621:1-12: WARNING: Assignment of 0/1 to bool variable drivers/md/dm-thin-metadata.c:1652:1-12: WARNING: Assignment of 0/1 to bool variable drivers/md/dm-thin-metadata.c:1706:1-12: WARNING: Assignment of 0/1 to bool variable Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhengbin <zhengbin13@huawei.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-12-05dm thin metadata: Add support for a pre-commit callbackGravatar Nikos Tsironis 1-0/+29
Add support for one pre-commit callback which is run right before the metadata are committed. This allows the thin provisioning target to run a callback before the metadata are committed and is required by the next commit. Cc: stable@vger.kernel.org Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com> Acked-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-07-02dm thin metadata: check if in fail_io mode when setting needs_checkGravatar Mike Snitzer 1-2/+5
Check if in fail_io mode at start of dm_pool_metadata_set_needs_check(). Otherwise dm_pool_metadata_set_needs_check()'s superblock_lock() can crash in dm_bm_write_lock() while accessing the block manager object that was previously destroyed as part of a failed dm_pool_abort_metadata() that ultimately set fail_io to begin with. Also, update DMERR() message to more accurately describe superblock_lock() failure. Cc: stable@vger.kernel.org Reported-by: Zdenek Kabelac <zkabelac@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-04-18dm thin metadata: do not write metadata if no changes occurredGravatar Mike Snitzer 1-3/+23
Otherwise, just activating a thin-pool and thin device and then deactivating them will cause the thin-pool metadata to be changed (e.g. superblock written) -- even without any metadata being changed. Add 'in_service' flag to struct dm_pool_metadata and set it in pmd_write_lock() because all on-disk metadata changes must take a write lock of pmd->root_lock. Once 'in_service' is set it is never cleared. __commit_transaction() will return 0 if 'in_service' is not set. dm_pool_commit_metadata() is updated to use __pmd_write_lock() so that it isn't the sole reason for putting a thin-pool in service. Also fix dm_pool_commit_metadata() to open the next transaction if the return from __commit_transaction() is 0. Not seeing why the early return ever made since for a return of 0 given that dm-io's async_io(), as used by bufio, always returns 0. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-04-18dm thin metadata: add wrappers for managing write locking of metadataGravatar Mike Snitzer 1-44/+64
No functional change, but this prepares to hook off of pmd_write_lock() with additional functionality (as provided in next commit). Suggested-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-04-18dm thin metadata: check __commit_transaction()'s returnGravatar Mike Snitzer 1-1/+6
Fix __reserve_metadata_snap() to return early if __commit_transaction() fails. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-01-15dm thin: fix passdown_double_checking_shared_status()Gravatar Joe Thornber 1-2/+2
Commit 00a0ea33b495 ("dm thin: do not queue freed thin mapping for next stage processing") changed process_prepared_discard_passdown_pt1() to increment all the blocks being discarded until after the passdown had completed to avoid them being prematurely reused. IO issued to a thin device that breaks sharing with a snapshot, followed by a discard issued to snapshot(s) that previously shared the block(s), results in passdown_double_checking_shared_status() being called to iterate through the blocks double checking their reference count is zero and issuing the passdown if so. So a side effect of commit 00a0ea33b495 is passdown_double_checking_shared_status() was broken. Fix this by checking if the block reference count is greater than 1. Also, rename dm_pool_block_is_used() to dm_pool_block_is_shared(). Fixes: 00a0ea33b495 ("dm thin: do not queue freed thin mapping for next stage processing") Cc: stable@vger.kernel.org # 4.9+ Reported-by: ryan.p.norwood@gmail.com Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-09-17dm thin metadata: fix __udivdi3 undefined on 32-bitGravatar Mike Snitzer 1-4/+2
sector_div() is only viable for use with sector_t. dm_block_t is typedef'd to uint64_t -- so use div_u64() instead. Fixes: 3ab918281 ("dm thin metadata: try to avoid ever aborting transactions") Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-09-10dm thin metadata: try to avoid ever aborting transactionsGravatar Joe Thornber 1-1/+35
Committing a transaction can consume some metadata of it's own, we now reserve a small amount of metadata to cover this. Free metadata reported by the kernel will not include this reserve. If any of the reserve has been used after a commit we enter a new internal state PM_OUT_OF_METADATA_SPACE. This is reported as PM_READ_ONLY, so no userland changes are needed. If the metadata device is resized the pool will move back to PM_WRITE. These changes mean we never need to abort and rollback a transaction due to running out of metadata space. This is particularly important because there have been a handful of reports of data corruption against DM thin-provisioning that can all be attributed to the thin-pool having ran out of metadata space. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-06-22dm thin metadata: remove needless work from __commit_transactionGravatar Mike Snitzer 1-9/+0
Commit 5a32083d03fb5 ("dm: take care to copy the space map roots before locking the superblock") properly removed the calls to dm_sm_root_size() from __write_initial_superblock(). But the dm_sm_root_size() calls were left dangling in __commit_transaction(). Fixes: 5a32083d03fb5 ("dm: take care to copy the space map roots before locking the superblock") Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-01-17dm thin metadata: THIN_MAX_CONCURRENT_LOCKS should be 6Gravatar Dennis Yang 1-1/+5
For btree removal, there is a corner case that a single thread could takes 6 locks which is more than THIN_MAX_CONCURRENT_LOCKS(5) and leads to deadlock. A btree removal might eventually call rebalance_children()->rebalance3() to rebalance entries of three neighbor child nodes when shadow_spine has already acquired two write locks. In rebalance3(), it tries to shadow and acquire the write locks of all three child nodes. However, shadowing a child node requires acquiring a read lock of the original child node and a write lock of the new block. Although the read lock will be released after block shadowing, shadowing the third child node in rebalance3() could still take the sixth lock. (2 write locks for shadow_spine + 2 write locks for the first two child nodes's shadow + 1 write lock for the last child node's shadow + 1 read lock for the last child node) Cc: stable@vger.kernel.org Signed-off-by: Dennis Yang <dennisyang@qnap.com> Acked-by: Joe Thornber <thornber@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-05-15dm thin metadata: call precommit before saving the rootsGravatar Joe Thornber 1-2/+2
These calls were the wrong way round in __write_initial_superblock. Cc: stable@vger.kernel.org Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-04-27dm block manager: remove an unused argument from dm_block_manager_create()Gravatar Bart Van Assche 1-2/+0
The 'cache_size' argument of dm_block_manager_create() has never been used. Remove it along with the definitions of the constants passed as the 'cache_size' argument. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-20dm thin: fix a race condition between discarding and provisioning a blockGravatar Joe Thornber 1-0/+30
The discard passdown was being issued after the block was unmapped, which meant the block could be reprovisioned whilst the passdown discard was still in flight. We can only identify unshared blocks (safe to do a passdown a discard to) once they're unmapped and their ref count hits zero. Block ref counts are now used to guard against concurrent allocation of these blocks that are being discarded. So now we unmap the block, issue passdown discards, and the immediately increment ref counts for regions that have been discarded via passed down (this is safe because allocation occurs within the same thread). We then decrement ref counts once the passdown discard IO is complete -- signaling these blocks may now be allocated. This fixes the potential for corruption that was reported here: https://www.redhat.com/archives/dm-devel/2016-June/msg00311.html Reported-by: Dennis Yang <dennisyang@qnap.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-03-10dm thin metadata: don't issue prefetches if a transaction abort has failedGravatar Joe Thornber 1-1/+4
If a transaction abort has failed then we can no longer use the metadata device. Typically this happens if the superblock is unreadable. This fix addresses a crash seen during metadata device failure testing. Fixes: 8a01a6af75 ("dm thin: prefetch missing metadata pages") Cc: stable@vger.kernel.org # 3.19+ Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-03-10dm thin metadata: remove needless newline from subtree_dec() DMERR messageGravatar Mike Snitzer 1-1/+1
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-12-10dm thin metadata: make dm_thin_find_mapped_range() atomicGravatar Joe Thornber 1-21/+43
Refactor dm_thin_find_mapped_range() so that it takes the read lock on the metadata's lock; rather than relying on finer grained locking that is pushed down inside dm_thin_find_next_mapped_block() and dm_thin_find_block(). Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-12-10dm thin metadata: speed up discard of partially mapped volumesGravatar Joe Thornber 1-25/+41
Use dm_btree_lookup_next() to more quickly discard partially mapped volumes. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-12-09dm thin metadata: fix bug when taking a metadata snapshotGravatar Joe Thornber 1-0/+6
When you take a metadata snapshot the btree roots for the mapping and details tree need to have their reference counts incremented so they persist for the lifetime of the metadata snap. The roots being incremented were those currently written in the superblock, which could possibly be out of date if concurrent IO is triggering new mappings, breaking of sharing, etc. Fix this by performing a commit with the metadata lock held while taking a metadata snapshot. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2015-12-02dm thin metadata: fix bug in dm_thin_remove_range()Gravatar Joe Thornber 1-5/+23
dm_btree_remove_leaves() only unmaps a contiguous region so we need a loop, in __remove_range(), to handle ranges that contain multiple regions. A new btree function, dm_btree_lookup_next(), is introduced which is more efficiently able to skip over regions of the thin device which aren't mapped. __remove_range() uses dm_btree_lookup_next() for each iteration of __remove_range()'s loop. Also, improve description of dm_btree_remove_leaves(). Fixes: 6550f075 ("dm thin metadata: add dm_thin_remove_range()") Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 4.1+
2015-10-31dm persistent data: eliminate unnecessary return valuesGravatar Mikulas Patocka 1-4/+12
dm_bm_unlock and dm_tm_unlock return an integer value but the returned value is always 0. The calling code sometimes checks the return value and sometimes doesn't. Eliminate these unnecessary return values and also the checks for them. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-08-12dm thin metadata: delete btrees when releasing metadata snapshotGravatar Joe Thornber 1-2/+2
The device details and mapping trees were just being decremented before. Now btree_del() is called to do a deep delete. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2015-06-11dm thin metadata: fix a race when entering fail modeGravatar Joe Thornber 1-3/+4
In dm_thin_find_block() the ->fail_io flag was checked outside the metadata device's root_lock, causing dm_thin_find_block() to race with the setting of this flag. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-06-11dm thin metadata: add dm_thin_remove_range()Gravatar Joe Thornber 1-0/+54
Removes a range of blocks from the btree. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-06-11dm thin metadata: add dm_thin_find_mapped_range()Gravatar Joe Thornber 1-0/+57
Retrieve the next run of contiguously mapped blocks. Useful for working out where to break up IO. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-05-29dm thin metadata: remove in-core 'read_only' flagGravatar Mike Snitzer 1-5/+1
Leverage the block manager's read_only flag instead of duplicating it; access with new dm_bm_is_read_only() method. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-02-09dm thin metadata: remove unused dm_pool_get_data_block_size()Gravatar Rickard Strandqvist 1-9/+0
The thin-pool target doesn't display the data block size as part of its table status, unlike the dm-cache target, so there is no need for dm_pool_get_data_block_size(). This was found using cppcheck. Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-11-10dm thin: prefetch missing metadata pagesGravatar Joe Thornber 1-0/+5
Prefetch metadata at the start of the worker thread and then again every 128th bio processed from the deferred list. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>