xfs: only run COW extent recovery when there are no live extents

As part of multiple customer escalations due to file data corruption after copy on write operations, I wrote some fstests that use fsstress to hammer on COW to shake things loose. Regrettably, I caught some filesystem shutdowns due to incorrect rmap operations with the following loop: mount <filesystem> # (0) fsstress <run only readonly ops> & # (1) while true; do fsstress <run all ops> mount -o remount,ro # (2) fsstress <run only readonly ops> mount -o remount,rw # (3) done When (2) happens, notice that (1) is still running. xfs_remount_ro will call xfs_blockgc_stop to walk the inode cache to free all the COW extents, but the blockgc mechanism races with (1)'s reader threads to take IOLOCKs and loses, which means that it doesn't clean them all out. Call such a file (A). When (3) happens, xfs_remount_rw calls xfs_reflink_recover_cow, which walks the ondisk refcount btree and frees any COW extent that it finds. This function does not check the inode cache, which means that incore COW forks of inode (A) is now inconsistent with the ondisk metadata. If one of those former COW extents are allocated and mapped into another file (B) and someone triggers a COW to the stale reservation in (A), A's dirty data will be written into (B) and once that's done, those blocks will be transferred to (A)'s data fork without bumping the refcount. The results are catastrophic -- file (B) and the refcount btree are now corrupt. In the first patch, we fixed the race condition in (2) so that (A) will always flush the COW fork. In this second patch, we move the _recover_cow call to the initial mount call in (0) for safety. As mentioned previously, xfs_reflink_recover_cow walks the refcount btree looking for COW staging extents, and frees them. This was intended to be run at mount time (when we know there are no live inodes) to clean up any leftover staging events that may have been left behind during an unclean shutdown. As a time "optimization" for readonly mounts, we deferred this to the ro->rw transition, not realizing that any failure to clean all COW forks during a rw->ro transition would result in catastrophic corruption. Therefore, remove this optimization and only run the recovery routine when we're guaranteed not to have any COW staging extents anywhere, which means we always run this at mount time. While we're at it, move the callsite to xfs_log_mount_finish because any refcount btree expansion (however unlikely given that we're removing records from the right side of the index) must be fed by a per-AG reservation, which doesn't exist in its current location. Fixes: 174edb0e46e5 ("xfs: store in-progress CoW allocations in the refcount btree") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
author: Darrick J. Wong <djwong@kernel.org> 2021-12-15 11:52:23 -0800
committer: Darrick J. Wong <djwong@kernel.org> 2021-12-21 09:49:41 -0800
commit: 7993f1a431bc5271369d359941485a9340658ac3 (patch)
tree: 7d473433c6d8df7192a65e5a58a5efb42b2f0166 /fs/xfs/xfs_log_recover.c
parent: xfs: don't expose internal symlink metadata buffers to the vfs (diff)
download: linux-7993f1a431bc5271369d359941485a9340658ac3.tar.gz
linux-7993f1a431bc5271369d359941485a9340658ac3.tar.bz2
linux-7993f1a431bc5271369d359941485a9340658ac3.zip
1 files changed, 23 insertions, 1 deletions
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 53366cc0bc9e..8ecb9a8567b7 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -27,7 +27,7 @@
 #include "xfs_buf_item.h"
 #include "xfs_ag.h"
 #include "xfs_quota.h"
-
+#include "xfs_reflink.h"
 
 #define BLK_AVG(blk1, blk2)	((blk1+blk2) >> 1)
 
@@ -3498,6 +3498,28 @@ xlog_recover_finish(
 
 	xlog_recover_process_iunlinks(log);
 	xlog_recover_check_summary(log);
+
+	/*
+	 * Recover any CoW staging blocks that are still referenced by the
+	 * ondisk refcount metadata.  During mount there cannot be any live
+	 * staging extents as we have not permitted any user modifications.
+	 * Therefore, it is safe to free them all right now, even on a
+	 * read-only mount.
+	 */
+	error = xfs_reflink_recover_cow(log->l_mp);
+	if (error) {
+		xfs_alert(log->l_mp,
+	"Failed to recover leftover CoW staging extents, err %d.",
+				error);
+		/*
+		 * If we get an error here, make sure the log is shut down
+		 * but return zero so that any log items committed since the
+		 * end of intents processing can be pushed through the CIL
+		 * and AIL.
+		 */
+		xfs_force_shutdown(log->l_mp, SHUTDOWN_LOG_IO_ERROR);
+	}
+
 	return 0;
 }
author	Darrick J. Wong <djwong@kernel.org>	2021-12-15 11:52:23 -0800
committer	Darrick J. Wong <djwong@kernel.org>	2021-12-21 09:49:41 -0800
commit	7993f1a431bc5271369d359941485a9340658ac3 (patch)
tree	7d473433c6d8df7192a65e5a58a5efb42b2f0166 /fs/xfs/xfs_log_recover.c
parent	xfs: don't expose internal symlink metadata buffers to the vfs (diff)
download	linux-7993f1a431bc5271369d359941485a9340658ac3.tar.gz linux-7993f1a431bc5271369d359941485a9340658ac3.tar.bz2 linux-7993f1a431bc5271369d359941485a9340658ac3.zip