aboutsummaryrefslogtreecommitdiff
path: root/fs/locks.c
AgeCommit message (Collapse)AuthorFilesLines
2023-03-09filelocks: use mount idmapping for setlease permission checkGravatar Seth Forshee 1-1/+2
A user should be allowed to take out a lease via an idmapped mount if the fsuid matches the mapped uid of the inode. generic_setlease() is checking the unmapped inode uid, causing these operations to be denied. Fix this by comparing against the mapped inode uid instead of the unmapped uid. Fixes: 9caccd41541a ("fs: introduce MOUNT_ATTR_IDMAP") Cc: stable@vger.kernel.org Signed-off-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-03-09fs/locks: Remove redundant assignment to cmdGravatar Jiapeng Chong 1-1/+0
Variable 'cmd' set but not used. fs/locks.c:2428:3: warning: Value stored to 'cmd' is never read. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=4439 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-02-21Merge tag 'rcu.2023.02.10a' of ↵Gravatar Linus Torvalds 1-25/+0
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull RCU updates from Paul McKenney: - Documentation updates - Miscellaneous fixes, perhaps most notably: - Throttling callback invocation based on the number of callbacks that are now ready to invoke instead of on the total number of callbacks - Several patches that suppress false-positive boot-time diagnostics, for example, due to lockdep not yet being initialized - Make expedited RCU CPU stall warnings dump stacks of any tasks that are blocking the stalled grace period. (Normal RCU CPU stall warnings have done this for many years) - Lazy-callback fixes to avoid delays during boot, suspend, and resume. (Note that lazy callbacks must be explicitly enabled, so this should not (yet) affect production use cases) - Make kfree_rcu() and friends take advantage of polled grace periods, thus reducing memory footprint by almost two orders of magnitude, admittedly on a microbenchmark This also begins the transition from kfree_rcu(p) to kfree_rcu_mightsleep(p). This transition was motivated by bugs where kfree_rcu(p), which can block, was typed instead of the intended kfree_rcu(p, rh) - SRCU updates, perhaps most notably fixing a bug that causes SRCU to fail when booted on a system with a non-zero boot CPU. This surprising situation actually happens for kdump kernels on the powerpc architecture This also adds an srcu_down_read() and srcu_up_read(), which act like srcu_read_lock() and srcu_read_unlock(), but allow an SRCU read-side critical section to be handed off from one task to another - Clean up the now-useless SRCU Kconfig option There are a few more commits that are not yet acked or pulled into maintainer trees, and these will be in a pull request for a later merge window - RCU-tasks updates, perhaps most notably these fixes: - A strange interaction between PID-namespace unshare and the RCU-tasks grace period that results in a low-probability but very real hang - A race between an RCU tasks rude grace period on a single-CPU system and CPU-hotplug addition of the second CPU that can result in a too-short grace period - A race between shrinking RCU tasks down to a single callback list and queuing a new callback to some other CPU, but where that queuing is delayed for more than an RCU grace period. This can result in that callback being stranded on the non-boot CPU - Torture-test updates and fixes - Torture-test scripting updates and fixes - Provide additional RCU CPU stall-warning information in kernels built with CONFIG_RCU_CPU_STALL_CPUTIME=y, and restore the full five-minute timeout limit for expedited RCU CPU stall warnings * tag 'rcu.2023.02.10a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (80 commits) rcu/kvfree: Add kvfree_rcu_mightsleep() and kfree_rcu_mightsleep() kernel/notifier: Remove CONFIG_SRCU init: Remove "select SRCU" fs/quota: Remove "select SRCU" fs/notify: Remove "select SRCU" fs/btrfs: Remove "select SRCU" fs: Remove CONFIG_SRCU drivers/pci/controller: Remove "select SRCU" drivers/net: Remove "select SRCU" drivers/md: Remove "select SRCU" drivers/hwtracing/stm: Remove "select SRCU" drivers/dax: Remove "select SRCU" drivers/base: Remove CONFIG_SRCU rcu: Disable laziness if lazy-tracking says so rcu: Track laziness during boot and suspend rcu: Remove redundant call to rcu_boost_kthread_setaffinity() rcu: Allow up to five minutes expedited RCU CPU stall-warning timeouts rcu: Align the output of RCU CPU stall warning messages rcu: Add RCU stall diagnosis information sched: Add helper nr_context_switches_cpu() ...
2023-02-02fs: Remove CONFIG_SRCUGravatar Paul E. McKenney 1-25/+0
Now that the SRCU Kconfig option is unconditionally selected, there is no longer any point in conditional compilation based on CONFIG_SRCU. Therefore, remove the #ifdef and throw away the #else clause. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Jeff Layton <jlayton@kernel.org> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: <linux-fsdevel@vger.kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: John Ogness <john.ogness@linutronix.de>
2023-01-11fs: remove locks_inodeGravatar Jeff Layton 1-14/+14
locks_inode was turned into a wrapper around file_inode in de2a4a501e71 (Partially revert "locks: fix file locking on overlayfs"). Finish replacing locks_inode invocations everywhere with file_inode. Acked-by: Miklos Szeredi <mszeredi@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2023-01-11filelock: move file locking definitions to separate header fileGravatar Jeff Layton 1-0/+1
The file locking definitions have lived in fs.h since the dawn of time, but they are only used by a small subset of the source files that include it. Move the file locking definitions to a new header file, and add the appropriate #include directives to the source files that need them. By doing this we trim down fs.h a bit and limit the amount of rebuilding that has to be done when we make changes to the file locking APIs. Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Howells <dhowells@redhat.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Acked-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Acked-by: Steve French <stfrench@microsoft.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2022-11-30Add process name and pid to locks warningGravatar Andi Kleen 1-1/+1
It's fairly useless to complain about using an obsolete feature without telling the user which process used it. My Fedora desktop randomly drops this message, but I would really need this patch to figure out what triggers is. [ jlayton: print pid as well as process name ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2022-11-30filelock: add a new locks_inode_context accessor functionGravatar Jeff Layton 1-12/+12
There are a number of places in the kernel that are accessing the inode->i_flctx field without smp_load_acquire. This is required to ensure that the caller doesn't see a partially-initialized structure. Add a new accessor function for it to make this clear and convert all of the relevant accesses in locks.c to use it. Also, convert locks_free_lock_context to use the helper as well instead of just doing a "bare" assignment. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2022-11-30filelock: new helper: vfs_inode_has_locksGravatar Jeff Layton 1-0/+23
Ceph has a need to know whether a particular inode has any locks set on it. It's currently tracking that by a num_locks field in its filp->private_data, but that's problematic as it tries to decrement this field when releasing locks and that can race with the file being torn down. Add a new vfs_inode_has_locks helper that just returns whether any locks are currently held on the inode. Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2022-11-17filelock: WARN_ON_ONCE when ->fl_file and filp don't matchGravatar Jeff Layton 1-0/+3
vfs_lock_file, vfs_test_lock and vfs_cancel_lock all take both a struct file argument and a file_lock. The file_lock has a fl_file field in it howevever and it _must_ match the file passed in. While most of the locks.c routines use the separately-passed file argument, some filesystems rely on fl_file being filled out correctly. I'm working on a patch series to remove the redundant argument from these routines, but for now, let's ensure that the callers always set this properly by issuing a WARN_ON_ONCE if they ever don't match. Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Trond Myklebust <trondmy@hammerspace.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2022-08-17locks: Fix dropped call to ->fl_release_private()Gravatar David Howells 1-0/+1
Prior to commit 4149be7bda7e, sys_flock() would allocate the file_lock struct it was going to use to pass parameters, call ->flock() and then call locks_free_lock() to get rid of it - which had the side effect of calling locks_release_private() and thus ->fl_release_private(). With commit 4149be7bda7e, however, this is no longer the case: the struct is now allocated on the stack, and locks_free_lock() is no longer called - and thus any remaining private data doesn't get cleaned up either. This causes afs flock to cause oops. Kasan catches this as a UAF by the list_del_init() in afs_fl_release_private() for the file_lock record produced by afs_fl_copy_lock() as the original record didn't get delisted. It can be reproduced using the generic/504 xfstest. Fix this by reinstating the locks_release_private() call in sys_flock(). I'm not sure if this would affect any other filesystems. If not, then the release could be done in afs_flock() instead. Changes ======= ver #2) - Don't need to call ->fl_release_private() after calling the security hook, only after calling ->flock(). Fixes: 4149be7bda7e ("fs/lock: Don't allocate file_lock in flock_make_lock().") cc: Chuck Lever <chuck.lever@oracle.com> cc: Jeff Layton <jlayton@kernel.org> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/166075758809.3532462.13307935588777587536.stgit@warthog.procyon.org.uk/ # v1 Acked-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2022-07-18fs/lock: Rearrange ops in flock syscall.Gravatar Kuniyuki Iwashima 1-24/+19
The previous patch added flock_translate_cmd() in flock syscall. The test and the other one for LOCK_MAND do not depend on struct fd and are cheaper, so we can put them at the top and defer fdget() after that. Also, we can remove the unlock variable and use type instead. While at it, we fix this checkpatch error. CHECK: spaces preferred around that '|' (ctx:VxV) #45: FILE: fs/locks.c:2099: + if (type != F_UNLCK && !(f.file->f_mode & (FMODE_READ|FMODE_WRITE))) ^ Finally, we can move the can_sleep part just before we use it. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2022-07-18fs/lock: Don't allocate file_lock in flock_make_lock().Gravatar Kuniyuki Iwashima 1-31/+15
Two functions, flock syscall and locks_remove_flock(), call flock_make_lock(). It allocates struct file_lock from slab cache if its argument fl is NULL. When we call flock syscall, we pass NULL to allocate memory for struct file_lock. However, we always free it at the end by locks_free_lock(). We need not allocate it and instead should use a local variable as locks_remove_flock() does. Also, the validation for flock_translate_cmd() is not necessary for locks_remove_flock(). So we move the part to flock syscall and make flock_make_lock() return nothing. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2022-05-19fs/lock: add 2 callbacks to lock_manager_operations to resolve conflictGravatar Dai Ngo 1-3/+30
Add 2 new callbacks, lm_lock_expirable and lm_expire_lock, to lock_manager_operations to allow the lock manager to take appropriate action to resolve the lock conflict if possible. A new field, lm_mod_owner, is also added to lock_manager_operations. The lm_mod_owner is used by the fs/lock code to make sure the lock manager module such as nfsd, is not freed while lock conflict is being resolved. lm_lock_expirable checks and returns true to indicate that the lock conflict can be resolved else return false. This callback must be called with the flc_lock held so it can not block. lm_expire_lock is called to resolve the lock conflict if the returned value from lm_lock_expirable is true. This callback is called without the flc_lock held since it's allowed to block. Upon returning from this callback, the lock conflict should be resolved and the caller is expected to restart the conflict check from the beginnning of the list. Lock manager, such as NFSv4 courteous server, uses this callback to resolve conflict by destroying lock owner, or the NFSv4 courtesy client (client that has expired but allowed to maintains its states) that owns the lock. Reviewed-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
2022-05-19fs/lock: add helper locks_owner_has_blockers to check for blockersGravatar Dai Ngo 1-0/+28
Add helper locks_owner_has_blockers to check if there is any blockers for a given lockowner. Reviewed-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
2022-01-22fs: move locking sysctls where they are usedGravatar Luis Chamberlain 1-2/+32
kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain. To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic. The locking fs sysctls are only used on fs/locks.c, so move them there. Link: https://lkml.kernel.org/r/20211129205548.605569-7-mcgrof@kernel.org Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Antti Palosaari <crope@iki.fi> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Iurii Zaikin <yzaikin@google.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Jeff Layton <jlayton@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Lukas Middendorf <kernel@tuxforce.de> Cc: Stephen Kitt <steve@sk2.org> Cc: Xiaoming Ni <nixiaoming@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-10-19locks: remove changelog commentsGravatar J. Bruce Fields 1-110/+4
This is only of historical interest, and anyone interested in the history can dig out an old version of locks.c from from git. Triggered by the observation that it references the now-removed Documentation/filesystems/mandatory-locking.rst. Reported-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2021-09-10locks: remove LOCK_MAND flock lock supportGravatar Jeff Layton 1-25/+22
As best I can tell, the logic for these has been broken for a long time (at least before the move to git), such that they never conflict with anything. Also, nothing checks for these flags and prevented opens or read/write behavior on the files. They don't seem to do anything. Given that, we can rip these symbols out of the kernel, and just make flock(2) return 0 when LOCK_MAND is set in order to preserve existing behavior. Cc: Matthew Wilcox <willy@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2021-09-07Revert "memcg: enable accounting for file lock caches"Gravatar Linus Torvalds 1-4/+2
This reverts commit 0f12156dff2862ac54235fc72703f18770769042. The kernel test robot reports a sizeable performance regression for this commit, and while it clearly does the rigth thing in theory, we'll need to look at just how to avoid or minimize the performance overhead of the memcg accounting. People already have suggestions on how to do that, but it's "future work". So revert it for now. Link: https://lore.kernel.org/lkml/20210907150757.GE17617@xsang-OptiPlex-9020/ Acked-by: Jens Axboe <axboe@kernel.dk> Acked-by: Shakeel Butt <shakeelb@google.com> Acked-by: Roman Gushchin <guro@fb.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-03Merge branch 'akpm' (patches from Andrew)Gravatar Linus Torvalds 1-2/+4
Merge misc updates from Andrew Morton: "173 patches. Subsystems affected by this series: ia64, ocfs2, block, and mm (debug, pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap, bootmem, sparsemem, vmalloc, kasan, pagealloc, memory-failure, hugetlb, userfaultfd, vmscan, compaction, mempolicy, memblock, oom-kill, migration, ksm, percpu, vmstat, and madvise)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (173 commits) mm/madvise: add MADV_WILLNEED to process_madvise() mm/vmstat: remove unneeded return value mm/vmstat: simplify the array size calculation mm/vmstat: correct some wrong comments mm/percpu,c: remove obsolete comments of pcpu_chunk_populated() selftests: vm: add COW time test for KSM pages selftests: vm: add KSM merging time test mm: KSM: fix data type selftests: vm: add KSM merging across nodes test selftests: vm: add KSM zero page merging test selftests: vm: add KSM unmerge test selftests: vm: add KSM merge test mm/migrate: correct kernel-doc notation mm: wire up syscall process_mrelease mm: introduce process_mrelease system call memblock: make memblock_find_in_range method private mm/mempolicy.c: use in_task() in mempolicy_slab_node() mm/mempolicy: unify the create() func for bind/interleave/prefer-many policies mm/mempolicy: advertise new MPOL_PREFERRED_MANY mm/hugetlb: add support for mempolicy MPOL_PREFERRED_MANY ...
2021-09-03memcg: enable accounting for file lock cachesGravatar Vasily Averin 1-2/+4
User can create file locks for each open file and force kernel to allocate small but long-living objects per each open file. It makes sense to account for these objects to limit the host's memory consumption from inside the memcg-limited container. Link: https://lkml.kernel.org/r/b009f4c7-f0ab-c0ec-8e83-918f47d677da@virtuozzo.com Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrei Vagin <avagin@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Jeff Layton <jlayton@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jiri Slaby <jirislaby@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Roman Gushchin <guro@fb.com> Cc: Serge Hallyn <serge@hallyn.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Yutian Yang <nglaive@gmail.com> Cc: Zefan Li <lizefan.x@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-08-23fs: remove mandatory file locking supportGravatar Jeff Layton 1-116/+1
We added CONFIG_MANDATORY_FILE_LOCKING in 2015, and soon after turned it off in Fedora and RHEL8. Several other distros have followed suit. I've heard of one problem in all that time: Someone migrated from an older distro that supported "-o mand" to one that didn't, and the host had a fstab entry with "mand" in it which broke on reboot. They didn't actually _use_ mandatory locking so they just removed the mount option and moved on. This patch rips out mandatory locking support wholesale from the kernel, along with the Kconfig option and the Documentation file. It also changes the mount code to ignore the "mand" mount option instead of erroring out, and to throw a big, ugly warning. Signed-off-by: Jeff Layton <jlayton@kernel.org>
2021-05-05Merge tag 'nfsd-5.13-1' of ↵Gravatar Linus Torvalds 1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull more nfsd updates from Chuck Lever: "Additional fixes and clean-ups for NFSD since tags/nfsd-5.13, including a fix to grant read delegations for files open for writing" * tag 'nfsd-5.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: SUNRPC: Fix null pointer dereference in svc_rqst_free() SUNRPC: fix ternary sign expansion bug in tracing nfsd: Fix fall-through warnings for Clang nfsd: grant read delegations to clients holding writes nfsd: reshuffle some code nfsd: track filehandle aliasing in nfs4_files nfsd: hash nfs4_files by inode number nfsd: ensure new clients break delegations nfsd: removed unused argument in nfsd_startup_generic() nfsd: remove unused function svcrdma: Pass a useful error code to the send_err tracepoint svcrdma: Rename goto labels in svc_rdma_sendto() svcrdma: Don't leak send_ctxt on Send errors
2021-04-26Merge tag 'locks-v5.13' of ↵Gravatar Linus Torvalds 1-10/+56
git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull file locking updates from Jeff Layton: "When we reworked the blocked locks into a tree structure instead of a flat list a few releases ago, we lost the ability to see all of the file locks in /proc/locks. Luo's patch fixes it to dump out all of the blocked locks instead, which restores the full output. This changes the format of /proc/locks as the blocked locks are shown at multiple levels of indentation now, but lslocks (the only common program I've ID'ed that scrapes this info) seems to be OK with that. Tian also contributed a small patch to remove a useless assignment" * tag 'locks-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: fs/locks: remove useless assignment in fcntl_getlk fs/locks: print full locks information
2021-04-19nfsd: grant read delegations to clients holding writesGravatar J. Bruce Fields 1-0/+3
It's OK to grant a read delegation to a client that holds a write, as long as it's the only client holding the write. We originally tried to do this in commit 94415b06eb8a ("nfsd4: a client's own opens needn't prevent delegations"), which had to be reverted in commit 6ee65a773096 ("Revert "nfsd4: a client's own opens needn't prevent delegations""). Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-04-13fs/locks: remove useless assignment in fcntl_getlkGravatar Tian Tao 1-1/+0
Function parameter 'cmd' is rewritten with unused value at locks.c Signed-off-by: Tian Tao <tiantao6@hisilicon.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2021-03-11fs/locks: print full locks informationGravatar Luo Longjun 1-9/+56
Commit fd7732e033e3 ("fs/locks: create a tree of dependent requests.") has put blocked locks into a tree. So, with a for loop, we can't check all locks information. To solve this problem, we should traverse the tree. Signed-off-by: Luo Longjun <luolongjun@huawei.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2021-03-09Revert "nfsd4: a client's own opens needn't prevent delegations"Gravatar J. Bruce Fields 1-3/+0
This reverts commit 94415b06eb8aed13481646026dc995f04a3a534a. That commit claimed to allow a client to get a read delegation when it was the only writer. Actually it allowed a client to get a read delegation when *any* client has a write open! The main problem is that it's depending on nfs4_clnt_odstate structures that are actually only maintained for pnfs exports. This causes clients to miss writes performed by other clients, even when there have been intervening closes and opens, violating close-to-open cache consistency. We can do this a different way, but first we should just revert this. I've added pynfs 4.1 test DELEG19 to test for this, as I should have done originally! Cc: stable@vger.kernel.org Reported-by: Timo Rothenpieler <timo@rothenpieler.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-12-15Merge branch 'exec-for-v5.11' of ↵Gravatar Linus Torvalds 1-6/+8
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull execve updates from Eric Biederman: "This set of changes ultimately fixes the interaction of posix file lock and exec. Fundamentally most of the change is just moving where unshare_files is called during exec, and tweaking the users of files_struct so that the count of files_struct is not unnecessarily played with. Along the way fcheck and related helpers were renamed to more accurately reflect what they do. There were also many other small changes that fell out, as this is the first time in a long time much of this code has been touched. Benchmarks haven't turned up any practical issues but Al Viro has observed a possibility for a lot of pounding on task_lock. So I have some changes in progress to convert put_files_struct to always rcu free files_struct. That wasn't ready for the merge window so that will have to wait until next time" * 'exec-for-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (27 commits) exec: Move io_uring_task_cancel after the point of no return coredump: Document coredump code exclusively used by cell spufs file: Remove get_files_struct file: Rename __close_fd_get_file close_fd_get_file file: Replace ksys_close with close_fd file: Rename __close_fd to close_fd and remove the files parameter file: Merge __alloc_fd into alloc_fd file: In f_dupfd read RLIMIT_NOFILE once. file: Merge __fd_install into fd_install proc/fd: In fdinfo seq_show don't use get_files_struct bpf/task_iter: In task_file_seq_get_next use task_lookup_next_fd_rcu proc/fd: In proc_readfd_common use task_lookup_next_fd_rcu file: Implement task_lookup_next_fd_rcu kcmp: In get_file_raw_ptr use task_lookup_fd_rcu proc/fd: In tid_fd_mode use task_lookup_fd_rcu file: Implement task_lookup_fd_rcu file: Rename fcheck lookup_fd_rcu file: Replace fcheck_files with files_lookup_fd_rcu file: Factor files_lookup_fd_locked out of fcheck_files file: Rename __fcheck_files to files_lookup_fd_raw ...
2020-12-10file: Factor files_lookup_fd_locked out of fcheck_filesGravatar Eric W. Biederman 1-6/+8
To make it easy to tell where files->file_lock protection is being used when looking up a file create files_lookup_fd_locked. Only allow this function to be called with the file_lock held. Update the callers of fcheck and fcheck_files that are called with the files->file_lock held to call files_lookup_fd_locked instead. Hopefully this makes it easier to quickly understand what is going on. The need for better names became apparent in the last round of discussion of this set of changes[1]. [1] https://lkml.kernel.org/r/CAHk-=wj8BQbgJFLa+J0e=iT-1qpmCRTbPAJ8gd6MJQ=kbRPqyQ@mail.gmail.com Link: https://lkml.kernel.org/r/20201120231441.29911-8-ebiederm@xmission.com Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2020-10-26locks: fix a typo at a kernel-doc markupGravatar Mauro Carvalho Chehab 1-1/+1
locks_delete_lock -> locks_delete_block Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2020-10-26locks: Fix UBSAN undefined behaviour in flock64_to_posix_lockGravatar Luo Meng 1-1/+1
When the sum of fl->fl_start and l->l_len overflows, UBSAN shows the following warning: UBSAN: Undefined behaviour in fs/locks.c:482:29 signed integer overflow: 2 + 9223372036854775806 cannot be represented in type 'long long int' Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xe4/0x14e lib/dump_stack.c:118 ubsan_epilogue+0xe/0x81 lib/ubsan.c:161 handle_overflow+0x193/0x1e2 lib/ubsan.c:192 flock64_to_posix_lock fs/locks.c:482 [inline] flock_to_posix_lock+0x595/0x690 fs/locks.c:515 fcntl_setlk+0xf3/0xa90 fs/locks.c:2262 do_fcntl+0x456/0xf60 fs/fcntl.c:387 __do_sys_fcntl fs/fcntl.c:483 [inline] __se_sys_fcntl fs/fcntl.c:468 [inline] __x64_sys_fcntl+0x12d/0x180 fs/fcntl.c:468 do_syscall_64+0xc8/0x5a0 arch/x86/entry/common.c:293 entry_SYSCALL_64_after_hwframe+0x49/0xbe Fix it by parenthesizing 'l->l_len - 1'. Signed-off-by: Luo Meng <luomeng12@huawei.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2020-08-23treewide: Use fallthrough pseudo-keywordGravatar Gustavo A. R. Silva 1-3/+3
Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-08-09Merge tag 'nfsd-5.9' of git://git.linux-nfs.org/projects/cel/cel-2.6Gravatar Linus Torvalds 1-0/+3
Pull NFS server updates from Chuck Lever: "Highlights: - Support for user extended attributes on NFS (RFC 8276) - Further reduce unnecessary NFSv4 delegation recalls Notable fixes: - Fix recent krb5p regression - Address a few resource leaks and a rare NULL dereference Other: - De-duplicate RPC/RDMA error handling and other utility functions - Replace storage and display of kernel memory addresses by tracepoints" * tag 'nfsd-5.9' of git://git.linux-nfs.org/projects/cel/cel-2.6: (38 commits) svcrdma: CM event handler clean up svcrdma: Remove transport reference counting svcrdma: Fix another Receive buffer leak SUNRPC: Refresh the show_rqstp_flags() macro nfsd: netns.h: delete a duplicated word SUNRPC: Fix ("SUNRPC: Add "@len" parameter to gss_unwrap()") nfsd: avoid a NULL dereference in __cld_pipe_upcall() nfsd4: a client's own opens needn't prevent delegations nfsd: Use seq_putc() in two functions svcrdma: Display chunk completion ID when posting a rw_ctxt svcrdma: Record send_ctxt completion ID in trace_svcrdma_post_send() svcrdma: Introduce Send completion IDs svcrdma: Record Receive completion ID in svc_rdma_decode_rqst svcrdma: Introduce Receive completion IDs svcrdma: Introduce infrastructure to support completion IDs svcrdma: Add common XDR encoders for RDMA and Read segments svcrdma: Add common XDR decoders for RDMA and Read segments SUNRPC: Add helpers for decoding list discriminators symbolically svcrdma: Remove declarations for functions long removed svcrdma: Clean up trace_svcrdma_send_failed() tracepoint ...
2020-08-03Merge tag 'filelock-v5.9-1' of ↵Gravatar Linus Torvalds 1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull file locking fix from Jeff Layton: "Just a single, one-line patch to fix an inefficiency in the posix locking code that can lead to it doing more wakeups than necessary" * tag 'filelock-v5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: locks: add locks_move_blocks in posix_lock_inode
2020-07-13nfsd4: a client's own opens needn't prevent delegationsGravatar J. Bruce Fields 1-0/+3
We recently fixed lease breaking so that a client's actions won't break its own delegations. But we still have an unnecessary self-conflict when granting delegations: a client's own write opens will prevent us from handing out a read delegation even when no other client has the file open for write. Fix that by turning off the checks for conflicting opens under vfs_setlease, and instead performing those checks in the nfsd code. We don't depend much on locks here: instead we acquire the delegation, then check for conflicts, and drop the delegation again if we find any. The check beforehand is an optimization of sorts, just to avoid acquiring the delegation unnecessarily. There's a race where the first check could cause us to deny the delegation when we could have granted it. But, that's OK, delegation grants are optional (and probably not even a good idea in that case). Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-06-11Merge tag 'nfsd-5.8' of git://linux-nfs.org/~bfields/linuxGravatar Linus Torvalds 1-0/+3
Pull nfsd updates from Bruce Fields: "Highlights: - Keep nfsd clients from unnecessarily breaking their own delegations. Note this requires a small kthreadd addition. The result is Tejun Heo's suggestion (see link), and he was OK with this going through my tree. - Patch nfsd/clients/ to display filenames, and to fix byte-order when displaying stateid's. - fix a module loading/unloading bug, from Neil Brown. - A big series from Chuck Lever with RPC/RDMA and tracing improvements, and lay some groundwork for RPC-over-TLS" Link: https://lore.kernel.org/r/1588348912-24781-1-git-send-email-bfields@redhat.com * tag 'nfsd-5.8' of git://linux-nfs.org/~bfields/linux: (49 commits) sunrpc: use kmemdup_nul() in gssp_stringify() nfsd: safer handling of corrupted c_type nfsd4: make drc_slab global, not per-net SUNRPC: Remove unreachable error condition in rpcb_getport_async() nfsd: Fix svc_xprt refcnt leak when setup callback client failed sunrpc: clean up properly in gss_mech_unregister() sunrpc: svcauth_gss_register_pseudoflavor must reject duplicate registrations. sunrpc: check that domain table is empty at module unload. NFSD: Fix improperly-formatted Doxygen comments NFSD: Squash an annoying compiler warning SUNRPC: Clean up request deferral tracepoints NFSD: Add tracepoints for monitoring NFSD callbacks NFSD: Add tracepoints to the NFSD state management code NFSD: Add tracepoints to NFSD's duplicate reply cache SUNRPC: svc_show_status() macro should have enum definitions SUNRPC: Restructure svc_udp_recvfrom() SUNRPC: Refactor svc_recvfrom() SUNRPC: Clean up svc_release_skb() functions SUNRPC: Refactor recvfrom path dealing with incomplete TCP receives SUNRPC: Replace dprintk() call sites in TCP receive path ...
2020-06-04Merge branch 'proc-linus' of ↵Gravatar Linus Torvalds 1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull proc updates from Eric Biederman: "This has four sets of changes: - modernize proc to support multiple private instances - ensure we see the exit of each process tid exactly - remove has_group_leader_pid - use pids not tasks in posix-cpu-timers lookup Alexey updated proc so each mount of proc uses a new superblock. This allows people to actually use mount options with proc with no fear of messing up another mount of proc. Given the kernel's internal mounts of proc for things like uml this was a real problem, and resulted in Android's hidepid mount options being ignored and introducing security issues. The rest of the changes are small cleanups and fixes that came out of my work to allow this change to proc. In essence it is swapping the pids in de_thread during exec which removes a special case the code had to handle. Then updating the code to stop handling that special case" * 'proc-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: proc: proc_pid_ns takes super_block as an argument remove the no longer needed pid_alive() check in __task_pid_nr_ns() posix-cpu-timers: Replace __get_task_for_clock with pid_for_clock posix-cpu-timers: Replace cpu_timer_pid_type with clock_pid_type posix-cpu-timers: Extend rcu_read_lock removing task_struct references signal: Remove has_group_leader_pid exec: Remove BUG_ON(has_group_leader_pid) posix-cpu-timer: Unify the now redundant code in lookup_task posix-cpu-timer: Tidy up group_leader logic in lookup_task proc: Ensure we see the exit of each process tid exactly once rculist: Add hlists_swap_heads_rcu proc: Use PIDTYPE_TGID in next_tgid Use proc_pid_ns() to get pid_namespace from the proc superblock proc: use named enums for better readability proc: use human-readable values for hidepid docs: proc: add documentation for "hidepid=4" and "subset=pid" options and new mount behavior proc: add option to mount only a pids subset proc: instantiate only pids that we can ptrace on 'hidepid=4' mount option proc: allow to mount many instances of proc in one pid namespace proc: rename struct proc_fs_info to proc_fs_opts
2020-06-02locks: add locks_move_blocks in posix_lock_inodeGravatar yangerkun 1-0/+1
We forget to call locks_move_blocks in posix_lock_inode when try to process same owner and different types. Signed-off-by: yangerkun <yangerkun@huawei.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2020-05-19proc: proc_pid_ns takes super_block as an argumentGravatar Alexey Gladkov 1-2/+2
syzbot found that touch /proc/testfile causes NULL pointer dereference at tomoyo_get_local_path() because inode of the dentry is NULL. Before c59f415a7cb6, Tomoyo received pid_ns from proc's s_fs_info directly. Since proc_pid_ns() can only work with inode, using it in the tomoyo_get_local_path() was wrong. To avoid creating more functions for getting proc_ns, change the argument type of the proc_pid_ns() function. Then, Tomoyo can use the existing super_block to get pid_ns. Link: https://lkml.kernel.org/r/0000000000002f0c7505a5b0e04c@google.com Link: https://lkml.kernel.org/r/20200518180738.2939611-1-gladkov.alexey@gmail.com Reported-by: syzbot+c1af344512918c61362c@syzkaller.appspotmail.com Fixes: c59f415a7cb6 ("Use proc_pid_ns() to get pid_namespace from the proc superblock") Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2020-05-08nfsd: clients don't need to break their own delegationsGravatar J. Bruce Fields 1-0/+3
We currently revoke read delegations on any write open or any operation that modifies file data or metadata (including rename, link, and unlink). But if the delegation in question is the only read delegation and is held by the client performing the operation, that's not really necessary. It's not always possible to prevent this in the NFSv4.0 case, because there's not always a way to determine which client an NFSv4.0 delegation came from. (In theory we could try to guess this from the transport layer, e.g., by assuming all traffic on a given TCP connection comes from the same client. But that's not really correct.) In the NFSv4.1 case the session layer always tells us the client. This patch should remove such self-conflicts in all cases where we can reliably determine the client from the compound. To do that we need to track "who" is performing a given (possibly lease-breaking) file operation. We're doing that by storing the information in the svc_rqst and using kthread_data() to map the current task back to a svc_rqst. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2020-05-05docs: filesystems: convert mandatory-locking.txt to ReSTGravatar Mauro Carvalho Chehab 1-1/+1
- Add a SPDX header; - Adjust document title; - Some whitespace fixes and new line breaks; - Use notes markups; - Add it to filesystems/index.rst. Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Link: https://lore.kernel.org/r/aecd6259fe9f99b2c2b3440eab6a2b989125e00d.1588021877.git.mchehab+huawei@kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-04-24Use proc_pid_ns() to get pid_namespace from the proc superblockGravatar Alexey Gladkov 1-2/+2
To get pid_namespace from the procfs superblock should be used a special helper. This will avoid errors when s_fs_info will change the type. Link: https://lore.kernel.org/lkml/20200423200316.164518-3-gladkov.alexey@gmail.com/ Link: https://lore.kernel.org/lkml/20200423112858.95820-1-gladkov.alexey@gmail.com/ Link: https://lore.kernel.org/lkml/06B50A1C-406F-4057-BFA8-3A7729EA7469@lca.pw/ Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2020-03-18locks: reinstate locks_delete_block optimizationGravatar Linus Torvalds 1-6/+48
There is measurable performance impact in some synthetic tests due to commit 6d390e4b5d48 (locks: fix a potential use-after-free problem when wakeup a waiter). Fix the race condition instead by clearing the fl_blocker pointer after the wake_up, using explicit acquire/release semantics. This does mean that we can no longer use the clearing of fl_blocker as the wait condition, so switch the waiters over to checking whether the fl_blocked_member list_head is empty. Reviewed-by: yangerkun <yangerkun@huawei.com> Reviewed-by: NeilBrown <neilb@suse.de> Fixes: 6d390e4b5d48 (locks: fix a potential use-after-free problem when wakeup a waiter) Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-06locks: fix a potential use-after-free problem when wakeup a waiterGravatar yangerkun 1-14/+0
'16306a61d3b7 ("fs/locks: always delete_block after waiting.")' add the logic to check waiter->fl_blocker without blocked_lock_lock. And it will trigger a UAF when we try to wakeup some waiter: Thread 1 has create a write flock a on file, and now thread 2 try to unlock and delete flock a, thread 3 try to add flock b on the same file. Thread2 Thread3 flock syscall(create flock b) ...flock_lock_inode_wait flock_lock_inode(will insert our fl_blocked_member list to flock a's fl_blocked_requests) sleep flock syscall(unlock) ...flock_lock_inode_wait locks_delete_lock_ctx ...__locks_wake_up_blocks __locks_delete_blocks( b->fl_blocker = NULL) ... break by a signal locks_delete_block b->fl_blocker == NULL && list_empty(&b->fl_blocked_requests) success, return directly locks_free_lock b wake_up(&b->fl_waiter) trigger UAF Fix it by remove this logic, and this patch may also fix CVE-2019-19769. Cc: stable@vger.kernel.org Fixes: 16306a61d3b7 ("fs/locks: always delete_block after waiting.") Signed-off-by: yangerkun <yangerkun@huawei.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2019-12-29locks: print unsigned ino in /proc/locksGravatar Amir Goldstein 1-1/+1
An ino is unsigned, so display it as such in /proc/locks. Cc: stable@vger.kernel.org Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2019-09-27Merge tag 'nfsd-5.4' of git://linux-nfs.org/~bfields/linuxGravatar Linus Torvalds 1-0/+62
Pull nfsd updates from Bruce Fields: "Highlights: - Add a new knfsd file cache, so that we don't have to open and close on each (NFSv2/v3) READ or WRITE. This can speed up read and write in some cases. It also replaces our readahead cache. - Prevent silent data loss on write errors, by treating write errors like server reboots for the purposes of write caching, thus forcing clients to resend their writes. - Tweak the code that allocates sessions to be more forgiving, so that NFSv4.1 mounts are less likely to hang when a server already has a lot of clients. - Eliminate an arbitrary limit on NFSv4 ACL sizes; they should now be limited only by the backend filesystem and the maximum RPC size. - Allow the server to enforce use of the correct kerberos credentials when a client reclaims state after a reboot. And some miscellaneous smaller bugfixes and cleanup" * tag 'nfsd-5.4' of git://linux-nfs.org/~bfields/linux: (34 commits) sunrpc: clean up indentation issue nfsd: fix nfs read eof detection nfsd: Make nfsd_reset_boot_verifier_locked static nfsd: degraded slot-count more gracefully as allocation nears exhaustion. nfsd: handle drc over-allocation gracefully. nfsd: add support for upcall version 2 nfsd: add a "GetVersion" upcall for nfsdcld nfsd: Reset the boot verifier on all write I/O errors nfsd: Don't garbage collect files that might contain write errors nfsd: Support the server resetting the boot verifier nfsd: nfsd_file cache entries should be per net namespace nfsd: eliminate an unnecessary acl size limit Deprecate nfsd fault injection nfsd: remove duplicated include from filecache.c nfsd: Fix the documentation for svcxdr_tmpalloc() nfsd: Fix up some unused variable warnings nfsd: close cached files prior to a REMOVE or RENAME that would replace target nfsd: rip out the raparms cache nfsd: have nfsd_test_lock use the nfsd_file cache nfsd: hook up nfs4_preprocess_stateid_op to the nfsd_file cache ...
2019-08-20locks: fix a memory leak bug in __break_lease()Gravatar Wenwen Wang 1-1/+2
In __break_lease(), the file lock 'new_fl' is allocated in lease_alloc(). However, it is not deallocated in the following execution if smp_load_acquire() fails, leading to a memory leak bug. To fix this issue, free 'new_fl' before returning the error. Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2019-08-19nfsd: convert fi_deleg_file and ls_file fields to nfsd_fileGravatar Jeff Layton 1-0/+1
Have them keep an nfsd_file reference instead of a struct file. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-08-19locks: create a new notifier chain for lease attemptsGravatar Jeff Layton 1-0/+61
With the new file caching infrastructure in nfsd, we can end up holding files open for an indefinite period of time, even when they are still idle. This may prevent the kernel from handing out leases on the file, which is something we don't want to block. Fix this by running a SRCU notifier call chain whenever on any lease attempt. nfsd can then purge the cache for that inode before returning. Since SRCU is only conditionally compiled in, we must only define the new chain if it's enabled, and users of the chain must ensure that SRCU is enabled. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>