linux-stable/fs/nfs
NeilBrown 3db63daabe NFSv3: handle out-of-order write replies.
NFSv3 includes pre/post wcc attributes which allow the client to
determine if all changes to the file have been made by the client
itself, or if any might have been made by some other client.

If there are gaps in the pre/post ctime sequence it must be assumed that
some other client changed the file in that gap and the local cache must
be suspect.  The next time the file is opened the cache should be
invalidated.

Since commit 1c341b7775 ("NFS: Add deferred cache invalidation for
close-to-open consistency violations") in Linux 5.3 the Linux client has
been triggering this invalidation.  The chunk in nfs_update_inode() in
particular triggers it.

Unfortunately Linux NFS assumes that all replies will be processed in
the order sent, and will arrive in the order processed.  This is not
true in general.  Consequently Linux NFS might ignore the wcc info in a
WRITE reply because the reply is in response to a WRITE that was sent
before some other request for which a reply has already been seen.  This
is detected by Linux using the gencount tests in nfs_inode_attr_cmp().

Also, when the gencount tests pass it is still possible that the requests
were processed on the server in a different order, and a gap seen in
the ctime sequence might be filled in by a subsequent reply, so gaps
should not immediately trigger delayed invalidation.

The net result is that writing to a server and then reading the file
back can result in going to the server for the read rather than serving
it from cache - all because a couple of replies arrived out-of-order.
This is a performance regression over kernels before 5.3, though the
change in 5.3 is a correctness improvement.

This has been seen with Linux writing to a Netapp server which
occasionally re-orders requests.  In testing the majority of requests
were in-order, but a few (maybe two or three at a time) could be
re-ordered.

This patch addresses the problem by recording any gaps seen in the
pre/post ctime sequence and not triggering invalidation until either
there are too many gaps to fit in the table, or until there are no more
active writes and the remaining gaps cannot be resolved.

We allocate a table of 16 gaps on demand.  If the allocation fails we
revert to the current behaviour, which is of little cost as we are
unlikely to be able to cache the writes anyway.

In the table we store "start->end" pairs when iversion is updated and
"end<-start" pairs for the pre/post pairs reported by the server.  Usually
these exactly cancel out and so nothing is stored.  When there are
out-of-order replies we do store gaps and these will eventually be
cancelled against later replies when this client is the only writer.
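
The cancelling behaviour described above can be sketched in user-space C.
This is only an illustration of the idea, not the code in inode.c: the
names gap_table, gap_merge(), record_local_jump() and record_reply() and
the overflow flag are all hypothetical, and plain uint64_t values stand in
for iversion and the pre/post change attributes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GAP_TABLE_SIZE 16	/* table size taken from the commit message */

struct gap {
	uint64_t start, end;
};

/* Hypothetical stand-in for the per-inode gap table. */
struct gap_table {
	size_t cnt;
	bool overflow;		/* set when the cache must be invalidated */
	struct gap gap[GAP_TABLE_SIZE];
};

/* Add the range start->end, joining it to any stored range it touches. */
static void gap_merge(struct gap_table *t, uint64_t start, uint64_t end)
{
	size_t i = 0;

	if (t->overflow)
		return;		/* already giving up, nothing to track */

	while (i < t->cnt) {
		if (end == t->gap[i].start)
			end = t->gap[i].end;
		else if (start == t->gap[i].end)
			start = t->gap[i].start;
		else {
			i++;
			continue;
		}
		/* Entry i was absorbed: drop it and rescan from the top. */
		t->cnt--;
		t->gap[i] = t->gap[t->cnt];
		i = 0;
	}

	if (start == end)
		return;		/* ranges cancelled exactly */

	if (t->cnt >= GAP_TABLE_SIZE) {
		/* Too many gaps to fit: fall back to invalidation. */
		t->overflow = true;
		t->cnt = 0;
		return;
	}
	t->gap[t->cnt].start = start;
	t->gap[t->cnt].end = end;
	t->cnt++;
}

/* iversion jumped from old_ver to new_ver: store "start->end". */
static void record_local_jump(struct gap_table *t, uint64_t old_ver,
			      uint64_t new_ver)
{
	gap_merge(t, old_ver, new_ver);
}

/* Server reported a pre/post pair: store it reversed, "end<-start". */
static void record_reply(struct gap_table *t, uint64_t pre, uint64_t post)
{
	gap_merge(t, post, pre);
}
```

For example, if iversion jumps from 5 to 8 and the replies for 5->6 and
6->8 arrive later, the first reply shrinks the stored gap to 6->8 and the
second removes it, leaving the table empty.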

If the final write is out-of-order there may be one gap remaining when
the file is closed.  This will be noticed and if there is precisely one
gap and if the iversion can be advanced to match it, then we do so.
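
A minimal sketch of that final check, again in user-space C with
hypothetical names (gap_table, resolve_final_gap()).  It assumes a
remaining server reply is stored reversed, with start holding the post
value and end holding the pre value, so the gap can be resolved when end
equals the current iversion.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct gap {
	uint64_t start, end;
};

/* Hypothetical stand-in for the per-inode gap table. */
struct gap_table {
	size_t cnt;
	struct gap gap[16];
};

/*
 * Called once no writes are active.  Returns true if the cached data
 * can still be trusted, false if it should be invalidated.
 */
static bool resolve_final_gap(struct gap_table *t, uint64_t *iversion)
{
	if (t->cnt == 0)
		return true;	/* every reply was accounted for */

	if (t->cnt == 1 && t->gap[0].end == *iversion) {
		/* The one remaining reply starts where we are: advance. */
		*iversion = t->gap[0].start;
		t->cnt = 0;
		return true;
	}

	return false;		/* unexplained gaps: another writer? */
}
```

With iversion at 5 and a single stored entry for a reply with pre 5 and
post 6, the call advances iversion to 6 and empties the table.  Any other
leftover entry is treated as evidence of another writer.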

This patch makes no attempt to handle directories correctly.  The same
problem potentially exists in that out-of-order replies to create/unlink
requests can cause future lookup requests to be sent to the server
unnecessarily.  A similar scheme using the same primitives could be used
to notice and handle out-of-order replies.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-04-11 16:13:21 -04:00
..
blocklayout
filelayout pNFS/filelayout: treat GETDEVICEINFO errors as layout failure 2023-02-15 11:07:54 -05:00
flexfilelayout
cache_lib.c
cache_lib.h
callback_proc.c
callback_xdr.c SUNRPC: Use per-CPU counters to tally server RPC counts 2023-02-20 09:20:32 -05:00
callback.c
callback.h
client.c
delegation.c
delegation.h
dir.c NFS: Correct timing for assigning access cache timestamp 2023-03-14 15:19:44 -04:00
direct.c NFS: Clean up O_DIRECT request allocation 2023-02-14 14:22:33 -05:00
dns_resolve.c
dns_resolve.h
export.c NFSD 6.3 Release Notes 2023-02-22 14:21:40 -08:00
file.c NFS Client Updates for Linux 6.3 2023-02-22 14:47:20 -08:00
fs_context.c
fscache.c NFS: Convert buffered read paths to use netfs when fscache is enabled 2023-04-11 13:08:26 -04:00
fscache.h NFS: Convert buffered read paths to use netfs when fscache is enabled 2023-04-11 13:08:26 -04:00
getroot.c
inode.c NFSv3: handle out-of-order write replies. 2023-04-11 16:13:21 -04:00
internal.h NFS: Convert buffered read paths to use netfs when fscache is enabled 2023-04-11 13:08:26 -04:00
io.c
iostat.h NFS: Remove all NFSIOS_FSCACHE counters due to conversion to netfs API 2023-04-11 13:08:26 -04:00
Kconfig NFS: Configure support for netfs when NFS fscache is configured 2023-04-11 13:00:02 -04:00
Makefile
mount_clnt.c
namespace.c fs: port ->getattr() to pass mnt_idmap 2023-01-19 09:24:25 +01:00
netns.h
nfs2super.c
nfs2xdr.c
nfs3_fs.h fs: port ->set_acl() to pass mnt_idmap 2023-01-19 09:24:27 +01:00
nfs3acl.c fs: port ->set_acl() to pass mnt_idmap 2023-01-19 09:24:27 +01:00
nfs3client.c
nfs3proc.c
nfs3super.c
nfs3xdr.c
nfs4_fs.h filelock: move file locking definitions to separate header file 2023-01-11 06:52:32 -05:00
nfs4client.c
nfs4file.c
nfs4getroot.c
nfs4idmap.c
nfs4idmap.h
nfs4namespace.c
nfs4proc.c NFSv4: Fix hangs when recovering open state after a server reboot 2023-03-22 16:22:35 -04:00
nfs4renewd.c
nfs4session.c
nfs4session.h
nfs4state.c NFSv4.1: Always send a RECLAIM_COMPLETE after establishing lease 2023-04-10 15:55:17 -04:00
nfs4super.c
nfs4sysctl.c nfs: simplify two-level sysctl registration for nfs4_cb_sysctls 2023-04-11 10:18:18 -04:00
nfs4trace.c
nfs4trace.h nfs4trace: fix state manager flag printing 2023-02-14 15:43:57 -05:00
nfs4xdr.c
nfs42.h
nfs42proc.c nfs42: do not fail with EIO if ssc returns NFS4ERR_OFFLOAD_DENIED 2023-02-15 10:42:51 -05:00
nfs42xattr.c
nfs42xdr.c
nfs.h
nfsroot.c
nfstrace.c
nfstrace.h NFS: Remove fscache specific trace points and NFS_INO_FSCACHE bit 2023-04-11 13:08:27 -04:00
pagelist.c NFS: Convert buffered read paths to use netfs when fscache is enabled 2023-04-11 13:08:26 -04:00
pnfs_dev.c
pnfs_nfs.c NFS: Convert buffered writes to use folios 2023-02-14 14:22:32 -05:00
pnfs.c pNFS/filelayout: treat GETDEVICEINFO errors as layout failure 2023-02-15 11:07:54 -05:00
pnfs.h NFS: Convert buffered writes to use folios 2023-02-14 14:22:32 -05:00
proc.c
read.c NFS: Convert buffered read paths to use netfs when fscache is enabled 2023-04-11 13:08:26 -04:00
super.c NFS: Remove all NFSIOS_FSCACHE counters due to conversion to netfs API 2023-04-11 13:08:26 -04:00
symlink.c
sysctl.c nfs: simplify two-level sysctl registration for nfs_cb_sysctls 2023-04-11 10:18:18 -04:00
sysfs.c
sysfs.h
unlink.c
write.c - Daniel Verkamp has contributed a memfd series ("mm/memfd: add 2023-02-23 17:09:35 -08:00