aboutsummaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2022-08-17stmmac: intel: remove unused 'has_crossts' flagGravatar Wong Vee Khee 1-1/+0
The 'has_crossts' flag was not used anywhere in the stmmac driver, removing it from both header file and dwmac-intel driver. Signed-off-by: Wong Vee Khee <veekhee@apple.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Link: https://lore.kernel.org/r/20220817064324.10025-1-veekhee@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextGravatar Jakub Kicinski 4-10/+32
Andrii Nakryiko says: ==================== bpf-next 2022-08-17 We've added 45 non-merge commits during the last 14 day(s) which contain a total of 61 files changed, 986 insertions(+), 372 deletions(-). The main changes are: 1) New bpf_ktime_get_tai_ns() BPF helper to access CLOCK_TAI, from Kurt Kanzenbach and Jesper Dangaard Brouer. 2) Few clean ups and improvements for libbpf 1.0, from Andrii Nakryiko. 3) Expose crash_kexec() as kfunc for BPF programs, from Artem Savkov. 4) Add ability to define sleepable-only kfuncs, from Benjamin Tissoires. 5) Teach libbpf's bpf_prog_load() and bpf_map_create() to gracefully handle unsupported names on old kernels, from Hangbin Liu. 6) Allow opting out from auto-attaching BPF programs by libbpf's BPF skeleton, from Hao Luo. 7) Relax libbpf's requirement for shared libs to be marked executable, from Henqgi Chen. 8) Improve bpf_iter internals handling of error returns, from Hao Luo. 9) Few accommodations in libbpf to support GCC-BPF quirks, from James Hilliard. 10) Fix BPF verifier logic around tracking dynptr ref_obj_id, from Joanne Koong. 11) bpftool improvements to handle full BPF program names better, from Manu Bretelle. 12) bpftool fixes around libcap use, from Quentin Monnet. 13) BPF map internals clean ups and improvements around memory allocations, from Yafang Shao. 14) Allow to use cgroup_get_from_file() on cgroupv1, allowing BPF cgroup iterator to work on cgroupv1, from Yosry Ahmed. 15) BPF verifier internal clean ups, from Dave Marchevsky and Joanne Koong. 16) Various fixes and clean ups for selftests/bpf and vmtest.sh, from Daniel Xu, Artem Savkov, Joanne Koong, Andrii Nakryiko, Shibin Koikkara Reeny. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (45 commits) selftests/bpf: Few fixes for selftests/bpf built in release mode libbpf: Clean up deprecated and legacy aliases libbpf: Streamline bpf_attr and perf_event_attr initialization libbpf: Fix potential NULL dereference when parsing ELF selftests/bpf: Tests libbpf autoattach APIs libbpf: Allows disabling auto attach selftests/bpf: Fix attach point for non-x86 arches in test_progs/lsm libbpf: Making bpf_prog_load() ignore name if kernel doesn't support selftests/bpf: Update CI kconfig selftests/bpf: Add connmark read test selftests/bpf: Add existing connection bpf_*_ct_lookup() test bpftool: Clear errno after libcap's checks bpf: Clear up confusion in bpf_skb_adjust_room()'s documentation bpftool: Fix a typo in a comment libbpf: Add names for auxiliary maps bpf: Use bpf_map_area_alloc consistently on bpf map creation bpf: Make __GFP_NOWARN consistent in bpf map creation bpf: Use bpf_map_area_free instread of kvfree bpf: Remove unneeded memset in queue_stack_map creation libbpf: preserve errno across pr_warn/pr_info/pr_debug ... ==================== Link: https://lore.kernel.org/r/20220817215656.1180215-1-andrii@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17net: phy: broadcom: Implement suspend/resume for AC131 and BCM5241Gravatar Florian Fainelli 1-0/+1
Implement the suspend/resume procedure for the Broadcom AC131 and BCM5241 type of PHYs (10/100 only) by entering the standard power down followed by the proprietary standby mode in the auxiliary mode 4 shadow register. On resume, the PHY software reset is enough to make it come out of standby mode so we can utilize brcm_fet_config_init() as the resume hook. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-16net: sched: remove the unused return value of unregister_qdiscGravatar Zhengchao Shao 1-1/+1
Return value of unregister_qdisc is unused, remove it. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Link: https://lore.kernel.org/r/20220815030417.271894-1-shaozhengchao@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-15bpf: Clear up confusion in bpf_skb_adjust_room()'s documentationGravatar Quentin Monnet 1-2/+4
Adding or removing room space _below_ layers 2 or 3, as the description mentions, is ambiguous. This was written with a mental image of the packet with layer 2 at the top, layer 3 under it, and so on. But it has led users to believe that it was on lower layers (before the beginning of the L2 and L3 headers respectively). Let's make it more explicit, and specify between which layers the room space is adjusted. Reported-by: Rumen Telbizov <rumen.telbizov@menlosecurity.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220812153727.224500-3-quentin@isovalent.com
2022-08-11Merge tag 'net-6.0-rc1' of ↵Gravatar Linus Torvalds 12-35/+139
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bluetooth, bpf, can and netfilter. A little larger than usual but it's all fixes, no late features. It's large partially because of timing, and partially because of follow ups to stuff that got merged a week or so before the merge window and wasn't as widely tested. Maybe the Bluetooth fixes are a little alarming so we'll address that, but the rest seems okay and not scary. Notably we're including a fix for the netfilter Kconfig [1], your WiFi warning [2] and a bluetooth fix which should unblock syzbot [3]. Current release - regressions: - Bluetooth: - don't try to cancel uninitialized works [3] - L2CAP: fix use-after-free caused by l2cap_chan_put - tls: rx: fix device offload after recent rework - devlink: fix UAF on failed reload and leftover locks in mlxsw Current release - new code bugs: - netfilter: - flowtable: fix incorrect Kconfig dependencies [1] - nf_tables: fix crash when nf_trace is enabled - bpf: - use proper target btf when exporting attach_btf_obj_id - arm64: fixes for bpf trampoline support - Bluetooth: - ISO: unlock on error path in iso_sock_setsockopt() - ISO: fix info leak in iso_sock_getsockopt() - ISO: fix iso_sock_getsockopt for BT_DEFER_SETUP - ISO: fix memory corruption on iso_pinfo.base - ISO: fix not using the correct QoS - hci_conn: fix updating ISO QoS PHY - phy: dp83867: fix get nvmem cell fail Previous releases - regressions: - wifi: cfg80211: fix validating BSS pointers in __cfg80211_connect_result [2] - atm: bring back zatm uAPI after ATM had been removed - properly fix old bug making bonding ARP monitor mode not being able to work with software devices with lockless Tx - tap: fix null-deref on skb->dev in dev_parse_header_protocol - revert "net: usb: ax88179_178a needs FLAG_SEND_ZLP" it helps some devices and breaks others - netfilter: - nf_tables: many fixes rejecting cross-object linking which may lead to UAFs - nf_tables: fix null deref due to zeroed list head - nf_tables: validate variable length element extension - bgmac: fix a BUG triggered by wrong bytes_compl - bcmgenet: indicate MAC is in charge of PHY PM Previous releases - always broken: - bpf: - fix bad pointer deref in bpf_sys_bpf() injected via test infra - disallow non-builtin bpf programs calling the prog_run command - don't reinit map value in prealloc_lru_pop - fix UAFs during the read of map iterator fd - fix invalidity check for values in sk local storage map - reject sleepable program for non-resched map iterator - mptcp: - move subflow cleanup in mptcp_destroy_common() - do not queue data on closed subflows - virtio_net: fix memory leak inside XDP_TX with mergeable - vsock: fix memory leak when multiple threads try to connect() - rework sk_user_data sharing to prevent psock leaks - geneve: fix TOS inheriting for ipv4 - tunnels & drivers: do not use RT_TOS for IPv6 flowlabel - phy: c45 baset1: do not skip aneg configuration if clock role is not specified - rose: avoid overflow when /proc displays timer information - x25: fix call timeouts in blocking connects - can: mcp251x: fix race condition on receive interrupt - can: j1939: - replace user-reachable WARN_ON_ONCE() with netdev_warn_once() - fix memory leak of skbs in j1939_session_destroy() Misc: - docs: bpf: clarify that many things are not uAPI - seg6: initialize induction variable to first valid array index (to silence clang vs objtool warning) - can: ems_usb: fix clang 14's -Wunaligned-access warning" * tag 'net-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (117 commits) net: atm: bring back zatm uAPI dpaa2-eth: trace the allocated address instead of page struct net: add missing kdoc for struct genl_multicast_group::flags nfp: fix use-after-free in area_cache_get() MAINTAINERS: use my korg address for mt7601u mlxsw: minimal: Fix deadlock in ports creation bonding: fix reference count leak in balance-alb mode net: usb: qmi_wwan: Add support for Cinterion MV32 bpf: Shut up kern_sys_bpf warning. net/tls: Use RCU API to access tls_ctx->netdev tls: rx: device: don't try to copy too much on detach tls: rx: device: bound the frag walk net_sched: cls_route: remove from list when handle is 0 selftests: forwarding: Fix failing tests with old libnet net: refactor bpf_sk_reuseport_detach() net: fix refcount bug in sk_psock_get (2) selftests/bpf: Ensure sleepable program is rejected by hash map iter selftests/bpf: Add write tests for sk local storage map iterator selftests/bpf: Add tests for reading a dangling map iter fd bpf: Only allow sleepable program for resched-able iterator ...
2022-08-11Merge tag 'acpi-5.20-rc1-2' of ↵Gravatar Linus Torvalds 2-2/+3
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more ACPI updates from Rafael Wysocki: "These fix up direct references to the fwnode field in struct device and extend ACPI device properties support. Specifics: - Replace direct references to the fwnode field in struct device with dev_fwnode() and device_match_fwnode() (Andy Shevchenko) - Make the ACPI code handling device properties support properties with buffer values (Sakari Ailus)" * tag 'acpi-5.20-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: property: Fix error handling in acpi_init_properties() ACPI: VIOT: Do not dereference fwnode in struct device ACPI: property: Read buffer properties as integers ACPI: property: Add support for parsing buffer property UUID ACPI: property: Unify integer value reading functions ACPI: property: Switch node property referencing from ifs to a switch ACPI: property: Move property ref argument parsing into a new function ACPI: property: Use acpi_object_type consistently in property ref parsing ACPI: property: Tie data nodes to acpi handles ACPI: property: Return type of acpi_add_nondev_subnodes() should be bool
2022-08-11Merge tag 'iomap-6.0-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxGravatar Linus Torvalds 1-3/+0
Pull more iomap updates from Darrick Wong: "In the past 10 days or so I've not heard any ZOMG STOP style complaints about removing ->writepage support from gfs2 or zonefs, so here's the pull request removing them (and the underlying fs iomap support) from the kernel: - Remove iomap_writepage and all callers, since the mm apparently never called the zonefs or gfs2 writepage functions" * tag 'iomap-6.0-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: iomap: remove iomap_writepage zonefs: remove ->writepage gfs2: remove ->writepage gfs2: stop using generic_writepages in gfs2_ail1_start_one
2022-08-11Merge tag 'ceph-for-5.20-rc1' of https://github.com/ceph/ceph-clientGravatar Linus Torvalds 6-7/+24
Pull ceph updates from Ilya Dryomov: "We have a good pile of various fixes and cleanups from Xiubo, Jeff, Luis and others, almost exclusively in the filesystem. Several patches touch files outside of our normal purview to set the stage for bringing in Jeff's long awaited ceph+fscrypt series in the near future. All of them have appropriate acks and sat in linux-next for a while" * tag 'ceph-for-5.20-rc1' of https://github.com/ceph/ceph-client: (27 commits) libceph: clean up ceph_osdc_start_request prototype libceph: fix ceph_pagelist_reserve() comment typo ceph: remove useless check for the folio ceph: don't truncate file in atomic_open ceph: make f_bsize always equal to f_frsize ceph: flush the dirty caps immediatelly when quota is approaching libceph: print fsid and epoch with osd id libceph: check pointer before assigned to "c->rules[]" ceph: don't get the inline data for new creating files ceph: update the auth cap when the async create req is forwarded ceph: make change_auth_cap_ses a global symbol ceph: fix incorrect old_size length in ceph_mds_request_args ceph: switch back to testing for NULL folio->private in ceph_dirty_folio ceph: call netfs_subreq_terminated with was_async == false ceph: convert to generic_file_llseek ceph: fix the incorrect comment for the ceph_mds_caps struct ceph: don't leak snap_rwsem in handle_cap_grant ceph: prevent a client from exceeding the MDS maximum xattr size ceph: choose auth MDS for getxattr with the Xs caps ceph: add session already open notify support ...
2022-08-11Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmGravatar Linus Torvalds 1-1/+1
Pull more kvm updates from Paolo Bonzini: - Xen timer fixes - Documentation formatting fixes - Make rseq selftest compatible with glibc-2.35 - Fix handling of illegal LEA reg, reg - Cleanup creation of debugfs entries - Fix steal time cache handling bug - Fixes for MMIO caching - Optimize computation of number of LBRs - Fix uninitialized field in guest_maxphyaddr < host_maxphyaddr path * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (26 commits) KVM: x86/MMU: properly format KVM_CAP_VM_DISABLE_NX_HUGE_PAGES capability table Documentation: KVM: extend KVM_CAP_VM_DISABLE_NX_HUGE_PAGES heading underline KVM: VMX: Adjust number of LBR records for PERF_CAPABILITIES at refresh KVM: VMX: Use proper type-safe functions for vCPU => LBRs helpers KVM: x86: Refresh PMU after writes to MSR_IA32_PERF_CAPABILITIES KVM: selftests: Test all possible "invalid" PERF_CAPABILITIES.LBR_FMT vals KVM: selftests: Use getcpu() instead of sched_getcpu() in rseq_test KVM: selftests: Make rseq compatible with glibc-2.35 KVM: Actually create debugfs in kvm_create_vm() KVM: Pass the name of the VM fd to kvm_create_vm_debugfs() KVM: Get an fd before creating the VM KVM: Shove vcpu stats_id init into kvm_vcpu_init() KVM: Shove vm stats_id init into kvm_create_vm() KVM: x86/mmu: Add sanity check that MMIO SPTE mask doesn't overlap gen KVM: x86/mmu: rename trace function name for asynchronous page fault KVM: x86/xen: Stop Xen timer before changing IRQ KVM: x86/xen: Initialize Xen timer only once KVM: SVM: Disable SEV-ES support if MMIO caching is disable KVM: x86/mmu: Fully re-evaluate MMIO caching when SPTE masks change KVM: x86: Tag kvm_mmu_x86_module_init() with __init ...
2022-08-11net: atm: bring back zatm uAPIGravatar Jakub Kicinski 1-0/+47
Jiri reports that linux-atm does not build without this header. Bring it back. It's completely dead code but we can't break the build for user space :( Reported-by: Jiri Slaby <jirislaby@kernel.org> Fixes: 052e1f01bfae ("net: atm: remove support for ZeitNet ZN122x ATM devices") Link: https://lore.kernel.org/all/8576aef3-37e4-8bae-bab5-08f82a78efd3@kernel.org/ Link: https://lore.kernel.org/r/20220810164547.484378-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-11Merge branch 'acpi-properties'Gravatar Rafael J. Wysocki 2-2/+3
Merge changes adding support for device properties with buffer values to the ACPI device properties handling code. * acpi-properties: ACPI: property: Fix error handling in acpi_init_properties() ACPI: property: Read buffer properties as integers ACPI: property: Add support for parsing buffer property UUID ACPI: property: Unify integer value reading functions ACPI: property: Switch node property referencing from ifs to a switch ACPI: property: Move property ref argument parsing into a new function ACPI: property: Use acpi_object_type consistently in property ref parsing ACPI: property: Tie data nodes to acpi handles ACPI: property: Return type of acpi_add_nondev_subnodes() should be bool
2022-08-11net: add missing kdoc for struct genl_multicast_group::flagsGravatar Jakub Kicinski 1-2/+3
Multicast group flags were added in commit 4d54cc32112d ("mptcp: avoid lock_fast usage in accept path"), but it missed adding the kdoc. Mention which flags go into that field, and do the same for op structs. Link: https://lore.kernel.org/r/20220809232012.403730-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-10net/tls: Use RCU API to access tls_ctx->netdevGravatar Maxim Mikityanskiy 1-1/+1
Currently, tls_device_down synchronizes with tls_device_resync_rx using RCU, however, the pointer to netdev is stored using WRITE_ONCE and loaded using READ_ONCE. Although such approach is technically correct (rcu_dereference is essentially a READ_ONCE, and rcu_assign_pointer uses WRITE_ONCE to store NULL), using special RCU helpers for pointers is more valid, as it includes additional checks and might change the implementation transparently to the callers. Mark the netdev pointer as __rcu and use the correct RCU helpers to access it. For non-concurrent access pass the right conditions that guarantee safe access (locks taken, refcount value). Also use the correct helper in mlx5e, where even READ_ONCE was missing. The transition to RCU exposes existing issues, fixed by this commit: 1. bond_tls_device_xmit could read netdev twice, and it could become NULL the second time, after the NULL check passed. 2. Drivers shouldn't stop processing the last packet if tls_device_down just set netdev to NULL, before tls_dev_del was called. This prevents a possible packet drop when transitioning to the fallback software mode. Fixes: 89df6a810470 ("net/bonding: Implement TLS TX device offload") Fixes: c55dcdd435aa ("net/tls: Fix use-after-free after the TLS device goes down and up") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Link: https://lore.kernel.org/r/20220810081602.1435800-1-maximmi@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-10Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfGravatar Jakub Kicinski 2-2/+10
Daniel Borkmann says: ==================== bpf 2022-08-10 We've added 23 non-merge commits during the last 7 day(s) which contain a total of 19 files changed, 424 insertions(+), 35 deletions(-). The main changes are: 1) Several fixes for BPF map iterator such as UAFs along with selftests, from Hou Tao. 2) Fix BPF syscall program's {copy,strncpy}_from_bpfptr() to not fault, from Jinghao Jia. 3) Reject BPF syscall programs calling BPF_PROG_RUN, from Alexei Starovoitov and YiFei Zhu. 4) Fix attach_btf_obj_id info to pick proper target BTF, from Stanislav Fomichev. 5) BPF design Q/A doc update to clarify what is not stable ABI, from Paul E. McKenney. 6) Fix BPF map's prealloc_lru_pop to not reinitialize, from Kumar Kartikeya Dwivedi. 7) Fix bpf_trampoline_put to avoid leaking ftrace hash, from Jiri Olsa. 8) Fix arm64 JIT to address sparse errors around BPF trampoline, from Xu Kuohai. 9) Fix arm64 JIT to use kvcalloc instead of kcalloc for internal program address offset buffer, from Aijun Sun. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: (23 commits) selftests/bpf: Ensure sleepable program is rejected by hash map iter selftests/bpf: Add write tests for sk local storage map iterator selftests/bpf: Add tests for reading a dangling map iter fd bpf: Only allow sleepable program for resched-able iterator bpf: Check the validity of max_rdwr_access for sock local storage map iterator bpf: Acquire map uref in .init_seq_private for sock{map,hash} iterator bpf: Acquire map uref in .init_seq_private for sock local storage map iterator bpf: Acquire map uref in .init_seq_private for hash map iterator bpf: Acquire map uref in .init_seq_private for array map iterator bpf: Disallow bpf programs call prog_run command. bpf, arm64: Fix bpf trampoline instruction endianness selftests/bpf: Add test for prealloc_lru_pop bug bpf: Don't reinit map value in prealloc_lru_pop bpf: Allow calling bpf_prog_test kfuncs in tracing programs bpf, arm64: Allocate program buffer using kvcalloc instead of kcalloc selftests/bpf: Excercise bpf_obj_get_info_by_fd for bpf2bpf bpf: Use proper target btf when exporting attach_btf_obj_id mptcp, btf: Add struct mptcp_sock definition when CONFIG_MPTCP is disabled bpf: Cleanup ftrace hash in bpf_trampoline_put BPF: Fix potential bad pointer dereference in bpf_sys_bpf() ... ==================== Link: https://lore.kernel.org/r/20220810190624.10748-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-10net: fix refcount bug in sk_psock_get (2)Gravatar Hawkins Jiawei 2-21/+50
Syzkaller reports refcount bug as follows: ------------[ cut here ]------------ refcount_t: saturated; leaking memory. WARNING: CPU: 1 PID: 3605 at lib/refcount.c:19 refcount_warn_saturate+0xf4/0x1e0 lib/refcount.c:19 Modules linked in: CPU: 1 PID: 3605 Comm: syz-executor208 Not tainted 5.18.0-syzkaller-03023-g7e062cda7d90 #0 <TASK> __refcount_add_not_zero include/linux/refcount.h:163 [inline] __refcount_inc_not_zero include/linux/refcount.h:227 [inline] refcount_inc_not_zero include/linux/refcount.h:245 [inline] sk_psock_get+0x3bc/0x410 include/linux/skmsg.h:439 tls_data_ready+0x6d/0x1b0 net/tls/tls_sw.c:2091 tcp_data_ready+0x106/0x520 net/ipv4/tcp_input.c:4983 tcp_data_queue+0x25f2/0x4c90 net/ipv4/tcp_input.c:5057 tcp_rcv_state_process+0x1774/0x4e80 net/ipv4/tcp_input.c:6659 tcp_v4_do_rcv+0x339/0x980 net/ipv4/tcp_ipv4.c:1682 sk_backlog_rcv include/net/sock.h:1061 [inline] __release_sock+0x134/0x3b0 net/core/sock.c:2849 release_sock+0x54/0x1b0 net/core/sock.c:3404 inet_shutdown+0x1e0/0x430 net/ipv4/af_inet.c:909 __sys_shutdown_sock net/socket.c:2331 [inline] __sys_shutdown_sock net/socket.c:2325 [inline] __sys_shutdown+0xf1/0x1b0 net/socket.c:2343 __do_sys_shutdown net/socket.c:2351 [inline] __se_sys_shutdown net/socket.c:2349 [inline] __x64_sys_shutdown+0x50/0x70 net/socket.c:2349 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 </TASK> During SMC fallback process in connect syscall, kernel will replaces TCP with SMC. In order to forward wakeup smc socket waitqueue after fallback, kernel will sets clcsk->sk_user_data to origin smc socket in smc_fback_replace_callbacks(). Later, in shutdown syscall, kernel will calls sk_psock_get(), which treats the clcsk->sk_user_data as psock type, triggering the refcnt warning. So, the root cause is that smc and psock, both will use sk_user_data field. So they will mismatch this field easily. This patch solves it by using another bit(defined as SK_USER_DATA_PSOCK) in PTRMASK, to mark whether sk_user_data points to a psock object or not. This patch depends on a PTRMASK introduced in commit f1ff5ce2cd5e ("net, sk_msg: Clear sk_user_data pointer on clone if tagged"). For there will possibly be more flags in the sk_user_data field, this patch also refactor sk_user_data flags code to be more generic to improve its maintainability. Reported-and-tested-by: syzbot+5f26f85569bd179c18ce@syzkaller.appspotmail.com Suggested-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-10Merge tag 'nfs-for-5.20-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsGravatar Linus Torvalds 9-8/+54
Pull NFS client updates from Trond Myklebust: "Highlights include: Stable fixes: - pNFS/flexfiles: Fix infinite looping when the RDMA connection errors out Bugfixes: - NFS: fix port value parsing - SUNRPC: Reinitialise the backchannel request buffers before reuse - SUNRPC: fix expiry of auth creds - NFSv4: Fix races in the legacy idmapper upcall - NFS: O_DIRECT fixes from Jeff Layton - NFSv4.1: Fix OP_SEQUENCE error handling - SUNRPC: Fix an RPC/RDMA performance regression - NFS: Fix case insensitive renames - NFSv4/pnfs: Fix a use-after-free bug in open - NFSv4.1: RECLAIM_COMPLETE must handle EACCES Features: - NFSv4.1: session trunking enhancements - NFSv4.2: READ_PLUS performance optimisations - NFS: relax the rules for rsize/wsize mount options - NFS: don't unhash dentry during unlink/rename - SUNRPC: Fail faster on bad verifier - NFS/SUNRPC: Various tracing improvements" * tag 'nfs-for-5.20-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (46 commits) NFS: Improve readpage/writepage tracing NFS: Improve O_DIRECT tracing NFS: Improve write error tracing NFS: don't unhash dentry during unlink/rename NFSv4/pnfs: Fix a use-after-free bug in open NFS: nfs_async_write_reschedule_io must not recurse into the writeback code SUNRPC: Don't reuse bvec on retransmission of the request SUNRPC: Reinitialise the backchannel request buffers before reuse NFSv4.1: RECLAIM_COMPLETE must handle EACCES NFSv4.1 probe offline transports for trunking on session creation SUNRPC create a function that probes only offline transports SUNRPC export xprt_iter_rewind function SUNRPC restructure rpc_clnt_setup_test_and_add_xprt NFSv4.1 remove xprt from xprt_switch if session trunking test fails SUNRPC create an rpc function that allows xprt removal from rpc_clnt SUNRPC enable back offline transports in trunking discovery SUNRPC create an iterator to list only OFFLINE xprts NFSv4.1 offline trunkable transports on DESTROY_SESSION SUNRPC add function to offline remove trunkable transports SUNRPC expose functions for offline remote xprt functionality ...
2022-08-10KVM: x86/mmu: rename trace function name for asynchronous page faultGravatar Mingwei Zhang 1-1/+1
Rename the tracepoint function from trace_kvm_async_pf_doublefault() to trace_kvm_async_pf_repeated_fault() to make it clear, since double fault has nothing to do with this trace function. Asynchronous Page Fault (APF) is an artifact generated by KVM when it cannot find a physical page to satisfy an EPT violation. KVM uses APF to tell the guest OS to do something else such as scheduling other guest processes to make forward progress. However, when another guest process also touches a previously APFed page, KVM halts the vCPU instead of generating a repeated APF to avoid wasting cycles. Double fault (#DF) clearly has a different meaning and a different consequence when triggered. #DF requires two nested contributory exceptions instead of two page faults faulting at the same address. A prevous bug on APF indicates that it may trigger a double fault in the guest [1] and clearly this trace function has nothing to do with it. So rename this function should be a valid choice. No functional change intended. [1] https://www.spinics.net/lists/kvm/msg214957.html Signed-off-by: Mingwei Zhang <mizhang@google.com> Message-Id: <20220807052141.69186-1-mizhang@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-08-10Merge tag 'mm-stable-2022-08-09' of ↵Gravatar Linus Torvalds 7-47/+39
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull remaining MM updates from Andrew Morton: "Three patch series - two that perform cleanups and one feature: - hugetlb_vmemmap cleanups from Muchun Song - hardware poisoning support for 1GB hugepages, from Naoya Horiguchi - highmem documentation fixups from Fabio De Francesco" * tag 'mm-stable-2022-08-09' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (23 commits) Documentation/mm: add details about kmap_local_page() and preemption highmem: delete a sentence from kmap_local_page() kdocs Documentation/mm: rrefer kmap_local_page() and avoid kmap() Documentation/mm: avoid invalid use of addresses from kmap_local_page() Documentation/mm: don't kmap*() pages which can't come from HIGHMEM highmem: specify that kmap_local_page() is callable from interrupts highmem: remove unneeded spaces in kmap_local_page() kdocs mm, hwpoison: enable memory error handling on 1GB hugepage mm, hwpoison: skip raw hwpoison page in freeing 1GB hugepage mm, hwpoison: make __page_handle_poison returns int mm, hwpoison: set PG_hwpoison for busy hugetlb pages mm, hwpoison: make unpoison aware of raw error info in hwpoisoned hugepage mm, hwpoison, hugetlb: support saving mechanism of raw error pages mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry mm/hugetlb: check gigantic_page_runtime_supported() in return_unused_surplus_pages() mm: hugetlb_vmemmap: use PTRS_PER_PTE instead of PMD_SIZE / PAGE_SIZE mm: hugetlb_vmemmap: move code comments to vmemmap_dedup.rst mm: hugetlb_vmemmap: improve hugetlb_vmemmap code readability mm: hugetlb_vmemmap: replace early_param() with core_param() mm: hugetlb_vmemmap: move vmemmap code related to HugeTLB to hugetlb_vmemmap.c ...
2022-08-10Merge tag 'cxl-for-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxlGravatar Linus Torvalds 6-1/+130
Pull cxl updates from Dan Williams: "Compute Express Link (CXL) updates for 6.0: - Introduce a 'struct cxl_region' object with support for provisioning and assembling persistent memory regions. - Introduce alloc_free_mem_region() to accompany the existing request_free_mem_region() as a method to allocate physical memory capacity out of an existing resource. - Export insert_resource_expand_to_fit() for the CXL subsystem to late-publish CXL platform windows in iomem_resource. - Add a polled mode PCI DOE (Data Object Exchange) driver service and use it in cxl_pci to retrieve the CDAT (Coherent Device Attribute Table)" * tag 'cxl-for-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (74 commits) cxl/hdm: Fix skip allocations vs multiple pmem allocations cxl/region: Disallow region granularity != window granularity cxl/region: Fix x1 interleave to greater than x1 interleave routing cxl/region: Move HPA setup to cxl_region_attach() cxl/region: Fix decoder interleave programming Documentation: cxl: remove dangling kernel-doc reference cxl/region: describe targets and nr_targets members of cxl_region_params cxl/regions: add padding for cxl_rr_ep_add nested lists cxl/region: Fix IS_ERR() vs NULL check cxl/region: Fix region reference target accounting cxl/region: Fix region commit uninitialized variable warning cxl/region: Fix port setup uninitialized variable warnings cxl/region: Stop initializing interleave granularity cxl/hdm: Fix DPA reservation vs cxl_endpoint_decoder lifetime cxl/acpi: Minimize granularity for x1 interleaves cxl/region: Delete 'region' attribute from root decoders cxl/acpi: Autoload driver for 'cxl_acpi' test devices cxl/region: decrement ->nr_targets on error in cxl_region_attach() cxl/region: prevent underflow in ways_to_cxl() cxl/region: uninitialized variable in alloc_hpa() ...
2022-08-10Merge tag 'pinctrl-v6.0-1' of ↵Gravatar Linus Torvalds 5-2/+50
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control updates from Linus Walleij: "Outside the pinctrl driver and DT bindings we hit some Arm DT files, patched by the maintainers. Other than that it is business as usual. Core changes: - Add PINCTRL_PINGROUP() helper macro (and use it in the AMD driver). New drivers: - Intel Meteor Lake support. - Reneasas RZ/V2M and r8a779g0 (R-Car V4H). - AXP209 variants AXP221, AXP223 and AXP809. - Qualcomm MSM8909, PM8226, PMP8074 and SM6375. - Allwinner D1. Improvements: - Proper pin multiplexing in the AMD driver. - Mediatek MT8192 can use generic drive strength and pin bias, then fixes on top plus some I2C pin group fixes. - Have the Allwinner Sunplus SP7021 use the generic DT schema and make interrupts optional. - Handle Qualcomm SC7280 ADSP. - Handle Qualcomm MSM8916 CAMSS GP clock muxing. - High impedance bias on ZynqMP. - Serialize StarFive access to MMIO. - Immutable gpiochip for BCM2835, Ingenic, Qualcomm SPMI GPIO" * tag 'pinctrl-v6.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (117 commits) dt-bindings: pinctrl: qcom,pmic-gpio: add PM8226 constraints pinctrl: qcom: Make PINCTRL_SM8450 depend on PINCTRL_MSM pinctrl: qcom: sm8250: Fix PDC map pinctrl: amd: Fix an unused variable dt-bindings: pinctrl: mt8186: Add and use drive-strength-microamp dt-bindings: pinctrl: mt8186: Add gpio-line-names property ARM: dts: imxrt1170-pinfunc: Add pinctrl binding header pinctrl: amd: Use unicode for debugfs output pinctrl: amd: Fix newline declaration in debugfs output pinctrl: at91: Fix typo 'the the' in comment dt-bindings: pinctrl: st,stm32: Correct 'resets' property name pinctrl: mvebu: Missing a blank line after declarations. pinctrl: qcom: Add SM6375 TLMM driver dt-bindings: pinctrl: Add DT schema for SM6375 TLMM dt-bindings: pinctrl: mt8195: Use drive-strength-microamp in examples Revert "pinctrl: qcom: spmi-gpio: make the irqchip immutable" pinctrl: imx93: Add MODULE_DEVICE_TABLE() pinctrl: sunxi: Add driver for Allwinner D1 pinctrl: sunxi: Make some layout parameters dynamic pinctrl: sunxi: Refactor register/offset calculation ...
2022-08-10bpf: add destructive kfunc flagGravatar Artem Savkov 1-1/+2
Add KF_DESTRUCTIVE flag for destructive functions. Functions with this flag set will require CAP_SYS_BOOT capabilities. Signed-off-by: Artem Savkov <asavkov@redhat.com> Link: https://lore.kernel.org/r/20220810065905.475418-2-asavkov@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-10genetlink: correct uAPI definesGravatar Jakub Kicinski 1-2/+3
Commit 50a896cf2d6f ("genetlink: properly support per-op policy dumping") seems to have copy'n'pasted things a little incorrectly. The #define CTRL_ATTR_MCAST_GRP_MAX should have stayed right after the previous enum. The new CTRL_ATTR_POLICY_* needs its own define for MAX and that max should not contain the superfluous _DUMP in the name. We probably can't do anything about the CTRL_ATTR_POLICY_DUMP_MAX any more, there's likely code which uses it. For consistency (*cough* codegen *cough*) let's add the correctly name define nonetheless. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-09ax88796: Fix some typo in a commentGravatar Christophe JAILLET 1-2/+2
s/by caused/be caused/ s/ax88786/ax88796/ Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/7db4b622d2c3e5af58c1d1f32b81836f4af71f18.1659801746.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-09add barriers to buffer_uptodate and set_buffer_uptodateGravatar Mikulas Patocka 1-1/+24
Let's have a look at this piece of code in __bread_slow: get_bh(bh); bh->b_end_io = end_buffer_read_sync; submit_bh(REQ_OP_READ, 0, bh); wait_on_buffer(bh); if (buffer_uptodate(bh)) return bh; Neither wait_on_buffer nor buffer_uptodate contain any memory barrier. Consequently, if someone calls sb_bread and then reads the buffer data, the read of buffer data may be executed before wait_on_buffer(bh) on architectures with weak memory ordering and it may return invalid data. Fix this bug by adding a memory barrier to set_buffer_uptodate and an acquire barrier to buffer_uptodate (in a similar way as folio_test_uptodate and folio_mark_uptodate). Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-08-09Merge tag 'nfsd-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxGravatar Linus Torvalds 5-8/+15
Pull nfsd updates from Chuck Lever: "Work on 'courteous server', which was introduced in 5.19, continues apace. This release introduces a more flexible limit on the number of NFSv4 clients that NFSD allows, now that NFSv4 clients can remain in courtesy state long after the lease expiration timeout. The client limit is adjusted based on the physical memory size of the server. The NFSD filecache is a cache of files held open by NFSv4 clients or recently touched by NFSv2 or NFSv3 clients. This cache had some significant scalability constraints that have been relieved in this release. Thanks to all who contributed to this work. A data corruption bug found during the most recent NFS bake-a-thon that involves NFSv3 and NFSv4 clients writing the same file has been addressed in this release. This release includes several improvements in CPU scalability for NFSv4 operations. In addition, Neil Brown provided patches that simplify locking during file lookup, creation, rename, and removal that enables subsequent work on making these operations more scalable. We expect to see that work materialize in the next release. There are also numerous single-patch fixes, clean-ups, and the usual improvements in observability" * tag 'nfsd-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (78 commits) lockd: detect and reject lock arguments that overflow NFSD: discard fh_locked flag and fh_lock/fh_unlock NFSD: use (un)lock_inode instead of fh_(un)lock for file operations NFSD: use explicit lock/unlock for directory ops NFSD: reduce locking in nfsd_lookup() NFSD: only call fh_unlock() once in nfsd_link() NFSD: always drop directory lock in nfsd_unlink() NFSD: change nfsd_create()/nfsd_symlink() to unlock directory before returning. NFSD: add posix ACLs to struct nfsd_attrs NFSD: add security label to struct nfsd_attrs NFSD: set attributes when creating symlinks NFSD: introduce struct nfsd_attrs NFSD: verify the opened dentry after setting a delegation NFSD: drop fh argument from alloc_init_deleg NFSD: Move copy offload callback arguments into a separate structure NFSD: Add nfsd4_send_cb_offload() NFSD: Remove kmalloc from nfsd4_do_async_copy() NFSD: Refactor nfsd4_do_copy() NFSD: Refactor nfsd4_cleanup_inter_ssc() (2/2) NFSD: Refactor nfsd4_cleanup_inter_ssc() (1/2) ...
2022-08-09netfilter: nf_tables: disallow jump to implicit chain from set elementGravatar Pablo Neira Ayuso 1-0/+5
Extend struct nft_data_desc to add a flag field that specifies nft_data_init() is being called for set element data. Use it to disallow jump to implicit chain from set element, only jump to chain via immediate expression is allowed. Fixes: d0e2c7de92c7 ("netfilter: nf_tables: add NFT_CHAIN_BINDING") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-09netfilter: nf_tables: upfront validation of data via nft_data_init()Gravatar Pablo Neira Ayuso 1-2/+2
Instead of parsing the data and then validate that type and length are correct, pass a description of the expected data so it can be validated upfront before parsing it to bail out earlier. This patch adds a new .size field to specify the maximum size of the data area. The .len field is optional and it is used as an input/output field, it provides the specific length of the expected data in the input path. If then .len field is not specified, then obtained length from the netlink attribute is stored. This is required by cmp, bitwise, range and immediate, which provide no netlink attribute that describes the data length. The immediate expression uses the destination register type to infer the expected data type. Relying on opencoded validation of the expected data might lead to subtle bugs as described in 7e6bc1f6cabc ("netfilter: nf_tables: stricter validation of element data"). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-09NFS: Improve write error tracingGravatar Trond Myklebust 1-2/+1
Don't leak request pointers, but use the "device:inode" labelling that is used by all the other trace points. Furthermore, replace use of page indexes with an offset, again in order to align behaviour with other NFS trace points. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-08-09netfilter: ip6t_LOG: Fix a typo in a commentGravatar Christophe JAILLET 1-1/+1
s/_IPT_LOG_H/_IP6T_LOG_H/ While at it add some surrounding space to ease reading. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-09netfilter: nf_tables: validate variable length element extensionGravatar Pablo Neira Ayuso 1-1/+3
Update template to validate variable length extensions. This patch adds a new .ext_len[id] field to the template to store the expected extension length. This is used to sanity check the initialization of the variable length extension. Use PTR_ERR() in nft_set_elem_init() to report errors since, after this update, there are two reason why this might fail, either because of ENOMEM or insufficient room in the extension field (EINVAL). Kernels up until 7e6bc1f6cabc ("netfilter: nf_tables: stricter validation of element data") allowed to copy more data to the extension than was allocated. This ext_len field allows to validate if the destination has the correct size as additional check. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-09Merge tag 'fscache-fixes-20220809' of ↵Gravatar Linus Torvalds 1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull fscache updates from David Howells: - Fix a cookie access ref leak if a cookie is invalidated a second time before the first invalidation is actually processed. - Add a tracepoint to log cookie lookup failure * tag 'fscache-fixes-20220809' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: fscache: add tracepoint when failing cookie fscache: don't leak cookie access refs if invalidation is in progress or failed
2022-08-09Merge tag 'afs-fixes-20220802' of ↵Gravatar Linus Torvalds 1-18/+18
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull AFS fixes from David Howells: "Fix AFS refcount handling. The first patch converts afs to use refcount_t for its refcounts and the second patch fixes afs_put_call() and afs_put_server() to save the values they're going to log in the tracepoint before decrementing the refcount" * tag 'afs-fixes-20220802' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: afs: Fix access after dec in put functions afs: Use refcount_t rather than atomic_t
2022-08-09Merge tag 'fs.setgid.v6.0' of ↵Gravatar Linus Torvalds 1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull setgid updates from Christian Brauner: "This contains the work to move setgid stripping out of individual filesystems and into the VFS itself. Creating files that have both the S_IXGRP and S_ISGID bit raised in directories that themselves have the S_ISGID bit set requires additional privileges to avoid security issues. When a filesystem creates a new inode it needs to take care that the caller is either in the group of the newly created inode or they have CAP_FSETID in their current user namespace and are privileged over the parent directory of the new inode. If any of these two conditions is true then the S_ISGID bit can be raised for an S_IXGRP file and if not it needs to be stripped. However, there are several key issues with the current implementation: - S_ISGID stripping logic is entangled with umask stripping. For example, if the umask removes the S_IXGRP bit from the file about to be created then the S_ISGID bit will be kept. The inode_init_owner() helper is responsible for S_ISGID stripping and is called before posix_acl_create(). So we can end up with two different orderings: 1. FS without POSIX ACL support First strip umask then strip S_ISGID in inode_init_owner(). In other words, if a filesystem doesn't support or enable POSIX ACLs then umask stripping is done directly in the vfs before calling into the filesystem: 2. FS with POSIX ACL support First strip S_ISGID in inode_init_owner() then strip umask in posix_acl_create(). In other words, if the filesystem does support POSIX ACLs then unmask stripping may be done in the filesystem itself when calling posix_acl_create(). Note that technically filesystems are free to impose their own ordering between posix_acl_create() and inode_init_owner() meaning that there's additional ordering issues that influence S_ISGID inheritance. (Note that the commit message of commit 1639a49ccdce ("fs: move S_ISGID stripping into the vfs_*() helpers") gets the ordering between inode_init_owner() and posix_acl_create() the wrong way around. I realized this too late.) - Filesystems that don't rely on inode_init_owner() don't get S_ISGID stripping logic. While that may be intentional (e.g. network filesystems might just defer setgid stripping to a server) it is often just a security issue. Note that mandating the use of inode_init_owner() was proposed as an alternative solution but that wouldn't fix the ordering issues and there are examples such as afs where the use of inode_init_owner() isn't possible. In any case, we should also try the cleaner and generalized solution first before resorting to this approach. - We still have S_ISGID inheritance bugs years after the initial round of S_ISGID inheritance fixes: e014f37db1a2 ("xfs: use setattr_copy to set vfs inode attributes") 01ea173e103e ("xfs: fix up non-directory creation in SGID directories") fd84bfdddd16 ("ceph: fix up non-directory creation in SGID directories") All of this led us to conclude that the current state is too messy. While we won't be able to make it completely clean as posix_acl_create() is still a filesystem specific call we can improve the S_SIGD stripping situation quite a bit by hoisting it out of inode_init_owner() and into the respective vfs creation operations. The obvious advantage is that we don't need to rely on individual filesystems getting S_ISGID stripping right and instead can standardize the ordering between S_ISGID and umask stripping directly in the VFS. A few short implementation notes: - The stripping logic needs to happen in vfs_*() helpers for the sake of stacking filesystems such as overlayfs that rely on these helpers taking care of S_ISGID stripping. - Security hooks have never seen the mode as it is ultimately seen by the filesystem because of the ordering issue we mentioned. Nothing is changed for them. We simply continue to strip the umask before passing the mode down to the security hooks. - The following filesystems use inode_init_owner() and thus relied on S_ISGID stripping: spufs, 9p, bfs, btrfs, ext2, ext4, f2fs, hfsplus, hugetlbfs, jfs, minix, nilfs2, ntfs3, ocfs2, omfs, overlayfs, ramfs, reiserfs, sysv, ubifs, udf, ufs, xfs, zonefs, bpf, tmpfs. We've audited all callchains as best as we could. More details can be found in the commit message to 1639a49ccdce ("fs: move S_ISGID stripping into the vfs_*() helpers")" * tag 'fs.setgid.v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: ceph: rely on vfs for setgid stripping fs: move S_ISGID stripping into the vfs_*() helpers fs: Add missing umask strip in vfs_tmpfile fs: add mode_strip_sgid() helper
2022-08-09bpf: Add BPF-helper for accessing CLOCK_TAIGravatar Jesper Dangaard Brouer 2-0/+14
Commit 3dc6ffae2da2 ("timekeeping: Introduce fast accessor to clock tai") introduced a fast and NMI-safe accessor for CLOCK_TAI. Especially in time sensitive networks (TSN), where all nodes are synchronized by Precision Time Protocol (PTP), it's helpful to have the possibility to generate timestamps based on CLOCK_TAI instead of CLOCK_MONOTONIC. With a BPF helper for TAI in place, it becomes very convenient to correlate activity across different machines in the network. Use cases for such a BPF helper include functionalities such as Tx launch time (e.g. ETF and TAPRIO Qdiscs) and timestamping. Note: CLOCK_TAI is nothing new per se, only the NMI-safe variant of it is. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> [Kurt: Wrote changelog and renamed helper] Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Link: https://lore.kernel.org/r/20220809060803.5773-2-kurt@linutronix.de Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-09net: netfilter: Remove ifdefs for code shared by BPF and ctnetlinkGravatar Kumar Kartikeya Dwivedi 1-6/+0
The current ifdefry for code shared by the BPF and ctnetlink side looks ugly. As per Pablo's request, simplify this by unconditionally compiling in the code. This can be revisited when the shared code between the two grows further. Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220725085130.11553-1-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-09btf: Add a new kfunc flag which allows to mark a function to be sleepableGravatar Benjamin Tissoires 1-0/+1
This allows to declare a kfunc as sleepable and prevents its use in a non sleepable program. Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Co-developed-by: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Hao Luo <haoluo@google.com> Link: https://lore.kernel.org/r/20220805214821.1058337-2-haoluo@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-09fscache: add tracepoint when failing cookieGravatar Jeff Layton 1-0/+2
Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: David Howells <dhowells@redhat.com>
2022-08-08Merge tag 'pull-work.iov_iter-rebased' of ↵Gravatar Linus Torvalds 2-23/+32
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more iov_iter updates from Al Viro: - more new_sync_{read,write}() speedups - ITER_UBUF introduction - ITER_PIPE cleanups - unification of iov_iter_get_pages/iov_iter_get_pages_alloc and switching them to advancing semantics - making ITER_PIPE take high-order pages without splitting them - handling copy_page_from_iter() for high-order pages properly * tag 'pull-work.iov_iter-rebased' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (32 commits) fix copy_page_from_iter() for compound destinations hugetlbfs: copy_page_to_iter() can deal with compound pages copy_page_to_iter(): don't split high-order page in case of ITER_PIPE expand those iov_iter_advance()... pipe_get_pages(): switch to append_pipe() get rid of non-advancing variants ceph: switch the last caller of iov_iter_get_pages_alloc() 9p: convert to advancing variant of iov_iter_get_pages_alloc() af_alg_make_sg(): switch to advancing variant of iov_iter_get_pages() iter_to_pipe(): switch to advancing variant of iov_iter_get_pages() block: convert to advancing variants of iov_iter_get_pages{,_alloc}() iov_iter: advancing variants of iov_iter_get_pages{,_alloc}() iov_iter: saner helper for page array allocation fold __pipe_get_pages() into pipe_get_pages() ITER_XARRAY: don't open-code DIV_ROUND_UP() unify the rest of iov_iter_get_pages()/iov_iter_get_pages_alloc() guts unify xarray_get_pages() and xarray_get_pages_alloc() unify pipe_get_pages() and pipe_get_pages_alloc() iov_iter_get_pages(): sanity-check arguments iov_iter_get_pages_alloc(): lift freeing pages array on failure exits into wrapper ...
2022-08-08get rid of non-advancing variantsGravatar Al Viro 1-22/+2
mechanical change; will be further massaged in subsequent commits Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08iov_iter: advancing variants of iov_iter_get_pages{,_alloc}()Gravatar Al Viro 1-0/+20
Most of the users immediately follow successful iov_iter_get_pages() with advancing by the amount it had returned. Provide inline wrappers doing that, convert trivial open-coded uses of those. BTW, iov_iter_get_pages() never returns more than it had been asked to; such checks in cifs ought to be removed someday... Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08ITER_PIPE: fold data_start() and pipe_space_for_user() togetherGravatar Al Viro 1-20/+0
All their callers are next to each other; all of them want the total amount of pages and, possibly, the offset in the partial final buffer. Combine into a new helper (pipe_npages()), fix the bogosity in pipe_space_for_user(), while we are at it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08ITER_PIPE: cache the type of last bufferGravatar Al Viro 1-1/+4
We often need to find whether the last buffer is anon or not, and currently it's rather clumsy: check if ->iov_offset is non-zero (i.e. that pipe is not empty) if so, get the corresponding pipe_buffer and check its ->ops if it's &default_pipe_buf_ops, we have an anon buffer. Let's replace the use of ->iov_offset (which is nowhere near similar to its role for other flavours) with signed field (->last_offset), with the following rules: empty, no buffers occupied: 0 anon, with bytes up to N-1 filled: N zero-copy, with bytes up to N-1 filled: -N That way abs(i->last_offset) is equal to what used to be in i->iov_offset and empty vs. anon vs. zero-copy can be distinguished by the sign of i->last_offset. Checks for "should we extend the last buffer or should we start a new one?" become easier to follow that way. Note that most of the operations can only be done in a sane state - i.e. when the pipe has nothing past the current position of iterator. About the only thing that could be done outside of that state is iov_iter_advance(), which transitions to the sane state by truncating the pipe. There are only two cases where we leave the sane state: 1) iov_iter_get_pages()/iov_iter_get_pages_alloc(). Will be dealt with later, when we make get_pages advancing - the callers are actually happier that way. 2) iov_iter copied, then something is put into the copy. Since they share the underlying pipe, the original gets behind. When we decide that we are done with the copy (original is not usable until then) we advance the original. direct_io used to be done that way; nowadays it operates on the original and we do iov_iter_revert() to discard the excessive data. At the moment there's nothing in the kernel that could do that to ITER_PIPE iterators, so this reason for insane state is theoretical right now. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08new iov_iter flavour - ITER_UBUFGravatar Al Viro 1-0/+26
Equivalent of single-segment iovec. Initialized by iov_iter_ubuf(), checked for by iter_is_ubuf(), otherwise behaves like ITER_IOVEC ones. We are going to expose the things like ->write_iter() et.al. to those in subsequent commits. New predicate (user_backed_iter()) that is true for ITER_IOVEC and ITER_UBUF; places like direct-IO handling should use that for checking that pages we modify after getting them from iov_iter_get_pages() would need to be dirtied. DO NOT assume that replacing iter_is_iovec() with user_backed_iter() will solve all problems - there's code that uses iter_is_iovec() to decide how to poke around in iov_iter guts and for that the predicate replacement obviously won't suffice. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-08highmem: delete a sentence from kmap_local_page() kdocsGravatar Fabio M. De Francesco 1-2/+1
kmap_local_page() should always be preferred in place of kmap() and kmap_atomic(). "Only use when really necessary." is not consistent with the Documentation/mm/highmem.rst and these kdocs it embeds. Therefore, delete the above-mentioned sentence from kdocs. Link: https://lkml.kernel.org/r/20220728154844.10874-7-fmdefrancesco@gmail.com Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Suggested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Peter Collingbourne <pcc@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-08highmem: specify that kmap_local_page() is callable from interruptsGravatar Fabio M. De Francesco 1-1/+1
In a recent thread about converting kmap() to kmap_local_page(), the safety of calling kmap_local_page() was questioned.[1] "any context" should probably be enough detail for users who want to know whether or not kmap_local_page() can be called from interrupts. However, Linux still has kmap_atomic() which might make users think they must use the latter in interrupts. Add "including interrupts" for better clarity. [1] https://lore.kernel.org/lkml/3187836.aeNJFYEL58@opensuse/ Link: https://lkml.kernel.org/r/20220728154844.10874-3-fmdefrancesco@gmail.com Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Suggested-by: Ira Weiny <ira.weiny@intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Peter Collingbourne <pcc@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-08highmem: remove unneeded spaces in kmap_local_page() kdocsGravatar Fabio M. De Francesco 1-1/+1
Patch series "highmem: Extend kmap_local_page() documentation", v2. The Highmem interface is evolving and the current documentation does not reflect the intended uses of each of the calls. Furthermore, after a recent series of reworks, the differences of the calls can still be confusing and may lead to the expanded use of calls which are deprecated. This series is the second round of changes towards an enhanced documentation of the Highmem's interface; at this stage the patches are only focused to kmap_local_page(). In addition it also contains some minor clean ups. This patch (of 7): In the kdocs of kmap_local_page(), the description of @page starts after several unnecessary spaces. Therefore, remove those spaces. Link: https://lkml.kernel.org/r/20220728154844.10874-1-fmdefrancesco@gmail.com Link: https://lkml.kernel.org/r/20220728154844.10874-2-fmdefrancesco@gmail.com Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Suggested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Peter Collingbourne <pcc@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-08mm, hwpoison: enable memory error handling on 1GB hugepageGravatar Naoya Horiguchi 2-2/+0
Now error handling code is prepared, so remove the blocking code and enable memory error handling on 1GB hugepage. Link: https://lkml.kernel.org/r/20220714042420.1847125-9-naoya.horiguchi@linux.dev Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Cc: David Hildenbrand <david@redhat.com> Cc: kernel test robot <lkp@intel.com> Cc: Liu Shixin <liushixin2@huawei.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-08mm, hwpoison: set PG_hwpoison for busy hugetlb pagesGravatar Naoya Horiguchi 1-0/+1
If memory_failure() fails to grab page refcount on a hugetlb page because it's busy, it returns without setting PG_hwpoison on it. This not only loses a chance of error containment, but breaks the rule that action_result() should be called only when memory_failure() do any of handling work (even if that's just setting PG_hwpoison). This inconsistency could harm code maintainability. So set PG_hwpoison and call hugetlb_set_page_hwpoison() for such a case. Link: https://lkml.kernel.org/r/20220714042420.1847125-6-naoya.horiguchi@linux.dev Fixes: 405ce051236c ("mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()") Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Cc: David Hildenbrand <david@redhat.com> Cc: kernel test robot <lkp@intel.com> Cc: Liu Shixin <liushixin2@huawei.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-08-08mm, hwpoison: make unpoison aware of raw error info in hwpoisoned hugepageGravatar Naoya Horiguchi 1-0/+9
Raw error info list needs to be removed when hwpoisoned hugetlb is unpoisoned. And unpoison handler needs to know how many errors there are in the target hugepage. So add them. HPageVmemmapOptimized(hpage) and HPageRawHwpUnreliable(hpage)) sometimes can't be unpoisoned, so skip them. Link: https://lkml.kernel.org/r/20220714042420.1847125-5-naoya.horiguchi@linux.dev Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Cc: David Hildenbrand <david@redhat.com> Cc: Liu Shixin <liushixin2@huawei.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>