linux-stable/fs
Josef Bacik 5111b14836 btrfs: adjust subpage bit start based on sectorsize
[ Upstream commit e08e49d986 ]

When running machines with 64k page size and a 16k nodesize we started
seeing tree log corruption in production.  This turned out to be because
we were not writing out dirty blocks sometimes, so this in fact affects
all metadata writes.

When writing out a subpage EB we scan the subpage bitmap for a dirty
range.  If the range isn't dirty we do

	bit_start++;

to move onto the next bit.  The problem is the bitmap is based on the
number of sectors that an EB has.  So in this case, we have a 64k
pagesize, 16k nodesize, but a 4k sectorsize.  This means our bitmap is 4
bits for every node.  With a 64k page size we end up with 4 nodes per
page.

To make this easier this is how everything looks

[0         16k       32k       48k     ] logical address
[0         4         8         12      ] radix tree offset
[               64k page               ] folio
[ 16k eb ][ 16k eb ][ 16k eb ][ 16k eb ] extent buffers
[ | | | |  | | | |   | | | |   | | | | ] bitmap

Now we use all of our addressing based on fs_info->sectorsize_bits, so
as you can see the above our 16k eb->start turns into radix entry 4.

When we find a dirty range for our eb, we correctly do bit_start +=
sectors_per_node, because if we start at bit 0, the next bit for the
next eb is 4, to correspond to eb->start 16k.

However if our range is clean, we will do bit_start++, which will now
put us offset from our radix tree entries.

In our case, assume that the first time we check the bitmap the block is
not dirty, we increment bit_start so now it == 1, and then we loop
around and check again.  This time it is dirty, and we go to find that
start using the following equation

	start = folio_start + bit_start * fs_info->sectorsize;

so in the case above, eb->start 0 is now dirty, and we calculate start
as

	0 + 1 * fs_info->sectorsize = 4096
	4096 >> 12 = 1

Now we're looking up the radix tree for 1, and we won't find an eb.
What's worse is now we're using bit_start == 1, so we do bit_start +=
sectors_per_node, which is now 5.  If that eb is dirty we will run into
the same thing, we will look at an offset that is not populated in the
radix tree, and now we're skipping the writeout of dirty extent buffers.

The best fix for this is to not use sectorsize_bits to address nodes,
but that's a larger change.  Since this is a fs corruption problem fix
it simply by always using sectors_per_node to increment the start bit.

Fixes: c4aec299fa ("btrfs: introduce submit_eb_subpage() to submit a subpage metadata page")
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Boris Burkov <boris@bur.io>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[ Adjust context ]
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-09 18:54:19 +02:00
..
9p
adfs
affs affs: don't write overlarge OFS data block size fields 2025-04-10 14:33:38 +02:00
afs afs: Fix the server_list to unuse a displaced server rather than putting it 2025-03-07 16:56:43 +01:00
autofs
befs
bfs
btrfs btrfs: adjust subpage bit start based on sectorsize 2025-09-09 18:54:19 +02:00
cachefiles cachefiles: Fix the incorrect return value in __cachefiles_write() 2025-07-24 08:51:51 +02:00
ceph ceph: fix possible integer overflow in ceph_zero_objects() 2025-07-06 10:57:56 +02:00
coda
configfs configfs: Do not override creating attribute file failure in populate_attrs() 2025-06-27 11:07:25 +01:00
cramfs
crypto fscrypt: Don't use problematic non-inline crypto engines 2025-08-28 16:26:10 +02:00
debugfs
devpts
dlm dlm: make tcp still work in multi-link env 2025-06-04 14:40:05 +02:00
ecryptfs
efivarfs efivarfs: Fix slab-out-of-bounds in efivarfs_d_compare 2025-09-04 15:26:29 +02:00
efs
erofs erofs: address D-cache aliasing 2025-08-15 12:04:51 +02:00
exfat exfat: fix the infinite loop in exfat_find_last_cluster() 2025-04-10 14:33:37 +02:00
exportfs
ext2 ext2: Handle fiemap on empty files to prevent EINVAL 2025-08-28 16:25:51 +02:00
ext4 ext4: preserve SB_I_VERSION on remount 2025-08-28 16:26:15 +02:00
f2fs f2fs: fix to avoid out-of-boundary access in dnode page 2025-08-28 16:26:15 +02:00
fat
freevxfs
fscache
fuse fuse: Return EPERM rather than ENOSYS from link() 2025-06-04 14:40:02 +02:00
gfs2 gfs2: move msleep to sleepable context 2025-06-27 11:07:25 +01:00
hfs hfs: fix not erasing deleted b-tree node issue 2025-08-28 16:25:51 +02:00
hfsplus hfsplus: don't use BUG_ON() in hfsplus_create_attributes_file() 2025-08-28 16:25:51 +02:00
hostfs
hpfs
hugetlbfs mm: update memfd seal write check to include F_SEAL_WRITE 2025-08-28 16:26:12 +02:00
iomap iomap: avoid avoid truncating 64-bit offset to 32 bits 2025-01-23 17:17:12 +01:00
isofs isofs: Verify inode mode when loading from disk 2025-07-24 08:51:49 +02:00
jbd2 jbd2: prevent softlockup in jbd2_log_do_checkpoint() 2025-08-28 16:26:07 +02:00
jffs2 jffs2: check jffs2_prealloc_raw_node_refs() result in few other places 2025-06-27 11:07:36 +01:00
jfs jfs: upper bound check of tree index in dbAllocAG 2025-08-28 16:26:00 +02:00
kernfs kernfs: Relax constraint in draining guard 2025-06-27 11:07:11 +01:00
lockd
minix
netfs
nfs NFS: Fix a race when updating an existing write 2025-09-04 15:26:26 +02:00
nfs_common
nfsd NFSD: detect mismatch of file handle and delegation stateid in OPEN op 2025-08-28 16:25:48 +02:00
nilfs2 nilfs2: reject invalid file types when reading inodes 2025-08-15 12:04:49 +02:00
nls
notify
ntfs
ntfs3 fs/ntfs3: correctly create symlink for relative path 2025-08-28 16:25:51 +02:00
ocfs2 ocfs2: prevent release journal inode after journal shutdown 2025-09-09 18:54:17 +02:00
omfs fs: omfs: Use flexible-array member in struct omfs_extent 2025-07-06 10:58:03 +02:00
openpromfs
orangefs fs/orangefs: use snprintf() instead of sprintf() 2025-08-28 16:25:59 +02:00
overlayfs ovl: Check for NULL d_inode() in ovl_dentry_upper() 2025-07-06 10:57:56 +02:00
proc proc: fix missing pde_set_flags() for net proc files 2025-09-09 18:54:17 +02:00
pstore pstore/blk: trivial typo fixes 2025-02-21 13:48:53 +01:00
qnx4
qnx6
quota
ramfs
reiserfs
romfs
smb cifs: prevent NULL pointer dereference in UTF16 conversion 2025-09-09 18:54:18 +02:00
squashfs squashfs: fix memory leak in squashfs_fill_super 2025-08-28 16:26:13 +02:00
sysfs
sysv
tracefs
ubifs ubifs: skip dumping tnc tree when zroot is null 2025-02-21 13:49:21 +01:00
udf udf: Verify partition map count 2025-08-28 16:25:51 +02:00
ufs
unicode Revert "unicode: Don't special case ignorable code points" 2024-12-14 19:54:50 +01:00
vboxsf vboxsf: fix building with GCC 15 2025-03-28 21:58:51 +01:00
verity
xfs xfs: do not propagate ENODATA disk errors into xattr code 2025-09-04 15:26:31 +02:00
zonefs
aio.c
anon_inodes.c fs: export anon_inode_make_secure_inode() and fix secretmem LSM bypass 2025-07-17 18:32:10 +02:00
attr.c
bad_inode.c
binfmt_elf_fdpic.c binfmt: Fix whitespace issues 2025-05-22 14:09:58 +02:00
binfmt_elf_test.c
binfmt_elf.c binfmt_elf: Move brk for static PIE even if ASLR disabled 2025-05-22 14:09:59 +02:00
binfmt_flat.c binfmt_flat: Fix integer overflow bug on 32 bit systems 2025-02-21 13:49:39 +01:00
binfmt_misc.c
binfmt_script.c
buffer.c fs/buffer: fix use-after-free when call bh_read() helper 2025-08-28 16:26:14 +02:00
char_dev.c
compat_binfmt_elf.c
coredump.c coredump: hand a pidfd to the usermode coredump helper 2025-06-04 14:40:25 +02:00
d_path.c
dax.c
dcache.c
direct-io.c
drop_caches.c
eventfd.c
eventpoll.c eventpoll: Fix semi-unbounded recursion 2025-08-28 16:25:49 +02:00
exec.c binfmt: Fix whitespace issues 2025-05-22 14:09:58 +02:00
fcntl.c
fhandle.c
file_table.c fs: fix proc_handler for sysctl_nr_open 2025-02-21 13:48:53 +01:00
file.c alloc_fdtable(): change calling conventions. 2025-08-28 16:26:19 +02:00
filesystems.c fs/filesystems: Fix potential unsigned integer underflow in fs_name() 2025-06-27 11:07:23 +01:00
fs_context.c
fs_parser.c
fs_pin.c
fs_struct.c
fs_types.c
fs-writeback.c fs: writeback: fix use-after-free in __mark_inode_dirty() 2025-09-09 18:54:12 +02:00
fsopen.c
init.c
inode.c
internal.h
ioctl.c
Kconfig nfs: add missing selections of CONFIG_CRC32 2025-04-25 10:43:52 +02:00
Kconfig.binfmt
kernel_read_file.c
libfs.c better lockdep annotations for simple_recursive_removal() 2025-08-28 16:25:51 +02:00
locks.c
Makefile
mbcache.c
mount.h
mpage.c
namei.c fuse: don't truncate cached, mutated symlink 2025-03-28 21:58:53 +01:00
namespace.c use uniform permission checks for all mount propagation changes 2025-08-28 16:26:14 +02:00
no-block.c
nsfs.c
open.c
pipe.c
pnode.c
pnode.h
posix_acl.c
proc_namespace.c
read_write.c
readdir.c
remap_range.c
select.c hrtimer: Use and report correct timerslack values for realtime tasks 2025-03-28 21:58:48 +01:00
seq_file.c
signalfd.c
splice.c
stack.c
stat.c
statfs.c
super.c
sync.c
sysctls.c
timerfd.c
userfaultfd.c mm/uffd: fix vma operation where start addr cuts part of vma 2025-06-27 11:07:04 +01:00
utimes.c
xattr.c