aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-07-08selftests: forwarding: Install no_forwarding.shGravatar Martin Blumenstingl 1-0/+1
When using the Makefile from tools/testing/selftests/net/forwarding/ all tests should be installed. Add no_forwarding.sh to the list of "to be installed tests" where it has been missing so far. Fixes: 476a4f05d9b83f ("selftests: forwarding: add a no_forwarding.sh test") Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-08selftests: forwarding: Install local_termination.shGravatar Martin Blumenstingl 1-0/+1
When using the Makefile from tools/testing/selftests/net/forwarding/ all tests should be installed. Add local_termination.sh to the list of "to be installed tests" where it has been missing so far. Fixes: 90b9566aa5cd3f ("selftests: forwarding: add a test for local_termination.sh") Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-08Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfGravatar Jakub Kicinski 7-24/+40
Daniel Borkmann says: ==================== bpf 2022-07-08 We've added 3 non-merge commits during the last 2 day(s) which contain a total of 7 files changed, 40 insertions(+), 24 deletions(-). The main changes are: 1) Fix cBPF splat triggered by skb not having a mac header, from Eric Dumazet. 2) Fix spurious packet loss in generic XDP when pushing packets out (note that native XDP is not affected by the issue), from Johan Almbladh. 3) Fix bpf_dynptr_{read,write}() helper signatures with flag argument before its set in stone as UAPI, from Joanne Koong. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Add flags arg to bpf_dynptr_read and bpf_dynptr_write APIs bpf: Make sure mac_header was set before using it xdp: Fix spurious packet loss in generic XDP TX path ==================== Link: https://lore.kernel.org/r/20220708213418.19626-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-08Merge branch 'sysctl-data-races'Gravatar David S. Miller 8-26/+37
Kuniyuki Iwashima says: ==================== sysctl: Fix data-races around ipv4_table. A sysctl variable is accessed concurrently, and there is always a chance of data-race. So, all readers and writers need some basic protection to avoid load/store-tearing. The first half of this series changes some proc handlers used in ipv4_table to use READ_ONCE() and WRITE_ONCE() internally to fix data-races on the sysctl side. Then, the second half adds READ_ONCE() to the other readers of ipv4_table. Changes: v2: * Drop some changes that makes backporting difficult * First cleanup patch * Lockless helpers and .proc_handler changes * Drop the tracing part for .sysctl_mem * Steve already posted a fix * Drop int-to-bool change for cipso * Should be posted to net-next later * Drop proc_dobool() change * Can be included in another series v1: https://lore.kernel.org/netdev/20220706052130.16368-1-kuniyu@amazon.com/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-08ipv4: Fix a data-race around sysctl_fib_sync_mem.Gravatar Kuniyuki Iwashima 1-1/+1
While reading sysctl_fib_sync_mem, it can be changed concurrently. So, we need to add READ_ONCE() to avoid a data-race. Fixes: 9ab948a91b2c ("ipv4: Allow amount of dirty memory from fib resizing to be controllable") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-08icmp: Fix data-races around sysctl.Gravatar Kuniyuki Iwashima 1-2/+3
While reading icmp sysctl variables, they can be changed concurrently. So, we need to add READ_ONCE() to avoid data-races. Fixes: 4cdf507d5452 ("icmp: add a global rate limitation") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-08cipso: Fix data-races around sysctl.Gravatar Kuniyuki Iwashima 2-6/+8
While reading cipso sysctl variables, they can be changed concurrently. So, we need to add READ_ONCE() to avoid data-races. Fixes: 446fda4f2682 ("[NetLabel]: CIPSOv4 engine") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-08net: Fix data-races around sysctl_mem.Gravatar Kuniyuki Iwashima 1-1/+1
While reading .sysctl_mem, it can be changed concurrently. So, we need to add READ_ONCE() to avoid data-races. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-08inetpeer: Fix data-races around sysctl.Gravatar Kuniyuki Iwashima 1-4/+8
While reading inetpeer sysctl variables, they can be changed concurrently. So, we need to add READ_ONCE() to avoid data-races. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-08tcp: Fix a data-race around sysctl_tcp_max_orphans.Gravatar Kuniyuki Iwashima 1-1/+2
While reading sysctl_tcp_max_orphans, it can be changed concurrently. So, we need to add READ_ONCE() to avoid a data-race. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-08sysctl: Fix data races in proc_dointvec_jiffies().Gravatar Kuniyuki Iwashima 1-2/+5
A sysctl variable is accessed concurrently, and there is always a chance of data-race. So, all readers and writers need some basic protection to avoid load/store-tearing. This patch changes proc_dointvec_jiffies() to use READ_ONCE() and WRITE_ONCE() internally to fix data-races on the sysctl side. For now, proc_dointvec_jiffies() itself is tolerant to a data-race, but we still need to add annotations on the other subsystem's side. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-08sysctl: Fix data races in proc_doulongvec_minmax().Gravatar Kuniyuki Iwashima 1-2/+2
A sysctl variable is accessed concurrently, and there is always a chance of data-race. So, all readers and writers need some basic protection to avoid load/store-tearing. This patch changes proc_doulongvec_minmax() to use READ_ONCE() and WRITE_ONCE() internally to fix data-races on the sysctl side. For now, proc_doulongvec_minmax() itself is tolerant to a data-race, but we still need to add annotations on the other subsystem's side. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-08sysctl: Fix data races in proc_douintvec_minmax().Gravatar Kuniyuki Iwashima 1-1/+1
A sysctl variable is accessed concurrently, and there is always a chance of data-race. So, all readers and writers need some basic protection to avoid load/store-tearing. This patch changes proc_douintvec_minmax() to use READ_ONCE() and WRITE_ONCE() internally to fix data-races on the sysctl side. For now, proc_douintvec_minmax() itself is tolerant to a data-race, but we still need to add annotations on the other subsystem's side. Fixes: 61d9b56a8920 ("sysctl: add unsigned int range support") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-08sysctl: Fix data races in proc_dointvec_minmax().Gravatar Kuniyuki Iwashima 1-1/+1
A sysctl variable is accessed concurrently, and there is always a chance of data-race. So, all readers and writers need some basic protection to avoid load/store-tearing. This patch changes proc_dointvec_minmax() to use READ_ONCE() and WRITE_ONCE() internally to fix data-races on the sysctl side. For now, proc_dointvec_minmax() itself is tolerant to a data-race, but we still need to add annotations on the other subsystem's side. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-08sysctl: Fix data races in proc_douintvec().Gravatar Kuniyuki Iwashima 1-2/+2
A sysctl variable is accessed concurrently, and there is always a chance of data-race. So, all readers and writers need some basic protection to avoid load/store-tearing. This patch changes proc_douintvec() to use READ_ONCE() and WRITE_ONCE() internally to fix data-races on the sysctl side. For now, proc_douintvec() itself is tolerant to a data-race, but we still need to add annotations on the other subsystem's side. Fixes: e7d316a02f68 ("sysctl: handle error writing UINT_MAX to u32 fields") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-08sysctl: Fix data races in proc_dointvec().Gravatar Kuniyuki Iwashima 1-3/+3
A sysctl variable is accessed concurrently, and there is always a chance of data-race. So, all readers and writers need some basic protection to avoid load/store-tearing. This patch changes proc_dointvec() to use READ_ONCE() and WRITE_ONCE() internally to fix data-races on the sysctl side. For now, proc_dointvec() itself is tolerant to a data-race, but we still need to add annotations on the other subsystem's side. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-08net: sock: tracing: Fix sock_exceed_buf_limit not to dereference stale pointerGravatar Steven Rostedt (Google) 1-2/+4
The trace event sock_exceed_buf_limit saves the prot->sysctl_mem pointer and then dereferences it in the TP_printk() portion. This is unsafe as the TP_printk() portion is executed at the time the buffer is read. That is, it can be seconds, minutes, days, months, even years later. If the proto is freed, then this dereference will can also lead to a kernel crash. Instead, save the sysctl_mem array into the ring buffer and have the TP_printk() reference that instead. This is the proper and safe way to read pointers in trace events. Link: https://lore.kernel.org/all/20220706052130.16368-12-kuniyu@amazon.com/ Cc: stable@vger.kernel.org Fixes: 3847ce32aea9f ("core: add tracepoints for queueing skb to rcvbuf") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-08bpf: Add flags arg to bpf_dynptr_read and bpf_dynptr_write APIsGravatar Joanne Koong 5-19/+29
Commit 13bbbfbea759 ("bpf: Add bpf_dynptr_read and bpf_dynptr_write") added the bpf_dynptr_write() and bpf_dynptr_read() APIs. However, it will be needed for some dynptr types to pass in flags as well (e.g. when writing to a skb, the user may like to invalidate the hash or recompute the checksum). This patch adds a "u64 flags" arg to the bpf_dynptr_read() and bpf_dynptr_write() APIs before their UAPI signature freezes where we then cannot change them anymore with a 5.19.x released kernel. Fixes: 13bbbfbea759 ("bpf: Add bpf_dynptr_read and bpf_dynptr_write") Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20220706232547.4016651-1-joannelkoong@gmail.com
2022-07-07net: ocelot: fix wrong time_after usageGravatar Pavel Skripkin 1-9/+8
Accidentally noticed, that this driver is the only user of while (time_after(jiffies...)). It looks like typo, because likely this while loop will finish after 1st iteration, because time_after() returns true when 1st argument _is after_ 2nd one. There is one possible problem with this poll loop: the scheduler could put the thread to sleep, and it does not get woken up for OCELOT_FDMA_CH_SAFE_TIMEOUT_US. During that time, the hardware has done its thing, but you exit the while loop and return -ETIMEDOUT. Fix it by using sane poll API that avoids all problems described above Fixes: 753a026cfec1 ("net: ocelot: add FDMA support") Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220706132845.27968-1-paskripkin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-07Merge tag 'mlx5-fixes-2022-07-06' of ↵Gravatar Jakub Kicinski 11-38/+76
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2022-07-06 This series provides bug fixes to mlx5 driver. * tag 'mlx5-fixes-2022-07-06' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5e: Ring the TX doorbell on DMA errors net/mlx5e: Fix capability check for updating vnic env counters net/mlx5e: CT: Use own workqueue instead of mlx5e priv net/mlx5: Lag, correct get the port select mode str net/mlx5e: Fix enabling sriov while tc nic rules are offloaded net/mlx5e: kTLS, Fix build time constant test in RX net/mlx5e: kTLS, Fix build time constant test in TX net/mlx5: Lag, decouple FDB selection and shared FDB net/mlx5: TC, allow offload from uplink to other PF's VF ==================== Link: https://lore.kernel.org/r/20220706231309.38579-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-07net: ethernet: ti: am65-cpsw: Fix devlink port register sequenceGravatar Siddharth Vadapalli 1-7/+10
Renaming interfaces using udevd depends on the interface being registered before its netdev is registered. Otherwise, udevd reads an empty phys_port_name value, resulting in the interface not being renamed. Fix this by registering the interface before registering its netdev by invoking am65_cpsw_nuss_register_devlink() before invoking register_netdev() for the interface. Move the function call to devlink_port_type_eth_set(), invoking it after register_netdev() is invoked, to ensure that netlink notification for the port state change is generated after the netdev is completely initialized. Fixes: 58356eb31d60 ("net: ti: am65-cpsw-nuss: Add devlink support") Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com> Link: https://lore.kernel.org/r/20220706070208.12207-1-s-vadapalli@ti.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-07net: stmmac: dwc-qos: Disable split header for Tegra194Gravatar Jon Hunter 1-0/+1
There is a long-standing issue with the Synopsys DWC Ethernet driver for Tegra194 where random system crashes have been observed [0]. The problem occurs when the split header feature is enabled in the stmmac driver. In the bad case, a larger than expected buffer length is received and causes the calculation of the total buffer length to overflow. This results in a very large buffer length that causes the kernel to crash. Why this larger buffer length is received is not clear, however, the feedback from the NVIDIA design team is that the split header feature is not supported for Tegra194. Therefore, disable split header support for Tegra194 to prevent these random crashes from occurring. [0] https://lore.kernel.org/linux-tegra/b0b17697-f23e-8fa5-3757-604a86f3a095@nvidia.com/ Fixes: 67afd6d1cfdf ("net: stmmac: Add Split Header support and enable it in XGMAC cores") Signed-off-by: Jon Hunter <jonathanh@nvidia.com> Link: https://lore.kernel.org/r/20220706083913.13750-1-jonathanh@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-07netfilter: conntrack: fix crash due to confirmed bit load reorderingGravatar Florian Westphal 3-0/+26
Kajetan Puchalski reports crash on ARM, with backtrace of: __nf_ct_delete_from_lists nf_ct_delete early_drop __nf_conntrack_alloc Unlike atomic_inc_not_zero, refcount_inc_not_zero is not a full barrier. conntrack uses SLAB_TYPESAFE_BY_RCU, i.e. it is possible that a 'newly' allocated object is still in use on another CPU: CPU1 CPU2 encounter 'ct' during hlist walk delete_from_lists refcount drops to 0 kmem_cache_free(ct); __nf_conntrack_alloc() // returns same object refcount_inc_not_zero(ct); /* might fail */ /* If set, ct is public/in the hash table */ test_bit(IPS_CONFIRMED_BIT, &ct->status); In case CPU1 already set refcount back to 1, refcount_inc_not_zero() will succeed. The expected possibilities for a CPU that obtained the object 'ct' (but no reference so far) are: 1. refcount_inc_not_zero() fails. CPU2 ignores the object and moves to the next entry in the list. This happens for objects that are about to be free'd, that have been free'd, or that have been reallocated by __nf_conntrack_alloc(), but where the refcount has not been increased back to 1 yet. 2. refcount_inc_not_zero() succeeds. CPU2 checks the CONFIRMED bit in ct->status. If set, the object is public/in the table. If not, the object must be skipped; CPU2 calls nf_ct_put() to un-do the refcount increment and moves to the next object. Parallel deletion from the hlists is prevented by a 'test_and_set_bit(IPS_DYING_BIT, &ct->status);' check, i.e. only one cpu will do the unlink, the other one will only drop its reference count. Because refcount_inc_not_zero is not a full barrier, CPU2 may try to delete an object that is not on any list: 1. refcount_inc_not_zero() successful (refcount inited to 1 on other CPU) 2. CONFIRMED test also successful (load was reordered or zeroing of ct->status not yet visible) 3. delete_from_lists unlinks entry not on the hlist, because IPS_DYING_BIT is 0 (already cleared). 2) is already wrong: CPU2 will handle a partially initited object that is supposed to be private to CPU1. Add needed barriers when refcount_inc_not_zero() is successful. It also inserts a smp_wmb() before the refcount is set to 1 during allocation. Because other CPU might still see the object, refcount_set(1) "resurrects" it, so we need to make sure that other CPUs will also observe the right content. In particular, the CONFIRMED bit test must only pass once the object is fully initialised and either in the hash or about to be inserted (with locks held to delay possible unlink from early_drop or gc worker). I did not change flow_offload_alloc(), as far as I can see it should call refcount_inc(), not refcount_inc_not_zero(): the ct object is attached to the skb so its refcount should be >= 1 in all cases. v2: prefer smp_acquire__after_ctrl_dep to smp_rmb (Will Deacon). v3: keep smp_acquire__after_ctrl_dep close to refcount_inc_not_zero call add comment in nf_conntrack_netlink, no control dependency there due to locks. Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/all/Yr7WTfd6AVTQkLjI@e126311.manchester.arm.com/ Reported-by: Kajetan Puchalski <kajetan.puchalski@arm.com> Diagnosed-by: Will Deacon <will@kernel.org> Fixes: 719774377622 ("netfilter: conntrack: convert to refcount_t api") Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Will Deacon <will@kernel.org>
2022-07-07bpf: Make sure mac_header was set before using itGravatar Eric Dumazet 1-3/+5
Classic BPF has a way to load bytes starting from the mac header. Some skbs do not have a mac header, and skb_mac_header() in this case is returning a pointer that 65535 bytes after skb->head. Existing range check in bpf_internal_load_pointer_neg_helper() was properly kicking and no illegal access was happening. New sanity check in skb_mac_header() is firing, so we need to avoid it. WARNING: CPU: 1 PID: 28990 at include/linux/skbuff.h:2785 skb_mac_header include/linux/skbuff.h:2785 [inline] WARNING: CPU: 1 PID: 28990 at include/linux/skbuff.h:2785 bpf_internal_load_pointer_neg_helper+0x1b1/0x1c0 kernel/bpf/core.c:74 Modules linked in: CPU: 1 PID: 28990 Comm: syz-executor.0 Not tainted 5.19.0-rc4-syzkaller-00865-g4874fb9484be #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/29/2022 RIP: 0010:skb_mac_header include/linux/skbuff.h:2785 [inline] RIP: 0010:bpf_internal_load_pointer_neg_helper+0x1b1/0x1c0 kernel/bpf/core.c:74 Code: ff ff 45 31 f6 e9 5a ff ff ff e8 aa 27 40 00 e9 3b ff ff ff e8 90 27 40 00 e9 df fe ff ff e8 86 27 40 00 eb 9e e8 2f 2c f3 ff <0f> 0b eb b1 e8 96 27 40 00 e9 79 fe ff ff 90 41 57 41 56 41 55 41 RSP: 0018:ffffc9000309f668 EFLAGS: 00010216 RAX: 0000000000000118 RBX: ffffffffffeff00c RCX: ffffc9000e417000 RDX: 0000000000040000 RSI: ffffffff81873f21 RDI: 0000000000000003 RBP: ffff8880842878c0 R08: 0000000000000003 R09: 000000000000ffff R10: 000000000000ffff R11: 0000000000000001 R12: 0000000000000004 R13: ffff88803ac56c00 R14: 000000000000ffff R15: dffffc0000000000 FS: 00007f5c88a16700(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fdaa9f6c058 CR3: 000000003a82c000 CR4: 00000000003506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ____bpf_skb_load_helper_32 net/core/filter.c:276 [inline] bpf_skb_load_helper_32+0x191/0x220 net/core/filter.c:264 Fixes: f9aefd6b2aa3 ("net: warn if mac header was not set") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220707123900.945305-1-edumazet@google.com
2022-07-07Merge tag 'net-5.19-rc6' of ↵Gravatar Linus Torvalds 65-524/+1089
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bpf, netfilter, can, and bluetooth. Current release - regressions: - bluetooth: fix deadlock on hci_power_on_sync Previous releases - regressions: - sched: act_police: allow 'continue' action offload - eth: usbnet: fix memory leak in error case - eth: ibmvnic: properly dispose of all skbs during a failover Previous releases - always broken: - bpf: - fix insufficient bounds propagation from adjust_scalar_min_max_vals - clear page contiguity bit when unmapping pool - netfilter: nft_set_pipapo: release elements in clone from abort path - mptcp: netlink: issue MP_PRIO signals from userspace PMs - can: - rcar_canfd: fix data transmission failed on R-Car V3U - gs_usb: gs_usb_open/close(): fix memory leak Misc: - add Wenjia as SMC maintainer" * tag 'net-5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (57 commits) wireguard: Kconfig: select CRYPTO_CHACHA_S390 crypto: s390 - do not depend on CRYPTO_HW for SIMD implementations wireguard: selftests: use microvm on x86 wireguard: selftests: always call kernel makefile wireguard: selftests: use virt machine on m68k wireguard: selftests: set fake real time in init r8169: fix accessing unset transport header net: rose: fix UAF bug caused by rose_t0timer_expiry usbnet: fix memory leak in error case Revert "tls: rx: move counting TlsDecryptErrors for sync" mptcp: update MIB_RMSUBFLOW in cmd_sf_destroy mptcp: fix local endpoint accounting selftests: mptcp: userspace PM support for MP_PRIO signals mptcp: netlink: issue MP_PRIO signals from userspace PMs mptcp: Acquire the subflow socket lock before modifying MP_PRIO flags mptcp: Avoid acquiring PM lock for subflow priority changes mptcp: fix locking in mptcp_nl_cmd_sf_destroy() net/mlx5e: Fix matchall police parameters validation net/sched: act_police: allow 'continue' action offload net: lan966x: hardcode the number of external ports ...
2022-07-07Merge tag 'pinctrl-v5.19-2' of ↵Gravatar Linus Torvalds 6-16/+23
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control fixes from Linus Walleij: - Tag Intel pin control as supported in MAINTAINERS - Fix a NULL pointer exception in the Aspeed driver - Correct some NAND functions in the Sunxi A83T driver - Use the right offset for some Sunxi pins - Fix a zero base offset in the Freescale (NXP) i.MX93 - Fix the IRQ support in the STM32 driver * tag 'pinctrl-v5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: stm32: fix optional IRQ support to gpios pinctrl: imx: Add the zero base flag for imx93 pinctrl: sunxi: sunxi_pconf_set: use correct offset pinctrl: sunxi: a83t: Fix NAND function name for some pins pinctrl: aspeed: Fix potential NULL dereference in aspeed_pinmux_set_mux() MAINTAINERS: Update Intel pin control to Supported
2022-07-07signal handling: don't use BUG_ON() for debuggingGravatar Linus Torvalds 1-4/+4
These are indeed "should not happen" situations, but it turns out recent changes made the 'task_is_stopped_or_trace()' case trigger (fix for that exists, is pending more testing), and the BUG_ON() makes it unnecessarily hard to actually debug for no good reason. It's been that way for a long time, but let's make it clear: BUG_ON() is not good for debugging, and should never be used in situations where you could just say "this shouldn't happen, but we can continue". Use WARN_ON_ONCE() instead to make sure it gets logged, and then just continue running. Instead of making the system basically unusuable because you crashed the machine while potentially holding some very core locks (eg this function is commonly called while holding 'tasklist_lock' for writing). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-07-06Merge branch 'wireguard-patches-for-5-19-rc6'Gravatar Jakub Kicinski 13-134/+157
Jason A. Donenfeld says: ==================== wireguard patches for 5.19-rc6 1) A few small fixups to the selftests, per usual. Of particular note is a fix for a test flake that occurred on especially fast systems that boot in less than a second. 2) An addition during this cycle of some s390 crypto interacted with the way wireguard selects dependencies, resulting in linker errors reported by the kernel test robot. So Vladis sent in a patch for that, which also required a small preparatory fix moving some Kconfig symbols around. ==================== Link: https://lore.kernel.org/r/20220707003157.526645-1-Jason@zx2c4.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-06wireguard: Kconfig: select CRYPTO_CHACHA_S390Gravatar Vladis Dronov 1-0/+1
Select the new implementation of CHACHA20 for S390 when available. It is faster than the generic software implementation, but also prevents some linker errors in certain situations. Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/linux-kernel/202207030630.6SZVkrWf-lkp@intel.com/ Signed-off-by: Vladis Dronov <vdronov@redhat.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-06crypto: s390 - do not depend on CRYPTO_HW for SIMD implementationsGravatar Jason A. Donenfeld 2-115/+114
Various accelerated software implementation Kconfig values for S390 were mistakenly placed into drivers/crypto/Kconfig, even though they're mainly just SIMD code and live in arch/s390/crypto/ like usual. This gives them the very unusual dependency on CRYPTO_HW, which leads to problems elsewhere. This patch fixes the issue by moving the Kconfig values for non-hardware drivers into the usual place in crypto/Kconfig. Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-06wireguard: selftests: use microvm on x86Gravatar Jason A. Donenfeld 3-8/+16
This makes for faster tests, faster compile time, and allows us to ditch ACPI finally. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-06wireguard: selftests: always call kernel makefileGravatar Jason A. Donenfeld 1-3/+2
These selftests are used for much more extensive changes than just the wireguard source files. So always call the kernel's build file, which will do something or nothing after checking the whole tree, per usual. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-06wireguard: selftests: use virt machine on m68kGravatar Jason A. Donenfeld 2-8/+6
This should be a bit more stable hopefully. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-06wireguard: selftests: set fake real time in initGravatar Jason A. Donenfeld 8-0/+18
Not all platforms have an RTC, and rather than trying to force one into each, it's much easier to just set a fixed time. This is necessary because WireGuard's latest handshakes parameter is returned in wallclock time, and if the system time isn't set, and the system is really fast, then this returns 0, which trips the test. Turning this on requires setting CONFIG_COMPAT_32BIT_TIME=y, as musl doesn't support settimeofday without it. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-06r8169: fix accessing unset transport headerGravatar Heiner Kallweit 1-6/+4
66e4c8d95008 ("net: warn if transport header was not set") added a check that triggers a warning in r8169, see [0]. The commit referenced in the Fixes tag refers to the change from which the patch applies cleanly, there's nothing wrong with this commit. It seems the actual issue (not bug, because the warning is harmless here) was introduced with bdfa4ed68187 ("r8169: use Giant Send"). [0] https://bugzilla.kernel.org/show_bug.cgi?id=216157 Fixes: 8d520b4de3ed ("r8169: work around RTL8125 UDP hw bug") Reported-by: Erhard F. <erhard_f@mailbox.org> Tested-by: Erhard F. <erhard_f@mailbox.org> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/1b2c2b29-3dc0-f7b6-5694-97ec526d51a0@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-06net: rose: fix UAF bug caused by rose_t0timer_expiryGravatar Duoming Zhou 1-2/+2
There are UAF bugs caused by rose_t0timer_expiry(). The root cause is that del_timer() could not stop the timer handler that is running and there is no synchronization. One of the race conditions is shown below: (thread 1) | (thread 2) | rose_device_event | rose_rt_device_down | rose_remove_neigh rose_t0timer_expiry | rose_stop_t0timer(rose_neigh) ... | del_timer(&neigh->t0timer) | kfree(rose_neigh) //[1]FREE neigh->dce_mode //[2]USE | The rose_neigh is deallocated in position [1] and use in position [2]. The crash trace triggered by POC is like below: BUG: KASAN: use-after-free in expire_timers+0x144/0x320 Write of size 8 at addr ffff888009b19658 by task swapper/0/0 ... Call Trace: <IRQ> dump_stack_lvl+0xbf/0xee print_address_description+0x7b/0x440 print_report+0x101/0x230 ? expire_timers+0x144/0x320 kasan_report+0xed/0x120 ? expire_timers+0x144/0x320 expire_timers+0x144/0x320 __run_timers+0x3ff/0x4d0 run_timer_softirq+0x41/0x80 __do_softirq+0x233/0x544 ... This patch changes rose_stop_ftimer() and rose_stop_t0timer() in rose_remove_neigh() to del_timer_sync() in order that the timer handler could be finished before the resources such as rose_neigh and so on are deallocated. As a result, the UAF bugs could be mitigated. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Link: https://lore.kernel.org/r/20220705125610.77971-1-duoming@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-06usbnet: fix memory leak in error caseGravatar Oliver Neukum 1-5/+12
usbnet_write_cmd_async() mixed up which buffers need to be freed in which error case. v2: add Fixes tag v3: fix uninitialized buf pointer Fixes: 877bd862f32b8 ("usbnet: introduce usbnet 3 command helpers") Signed-off-by: Oliver Neukum <oneukum@suse.com> Link: https://lore.kernel.org/r/20220705125351.17309-1-oneukum@suse.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-06net/mlx5e: Ring the TX doorbell on DMA errorsGravatar Maxim Mikityanskiy 1-9/+30
TX doorbells may be postponed, because sometimes the driver knows that another packet follows (for example, when xmit_more is true, or when a MPWQE session is closed before transmitting a packet). However, the DMA mapping may fail for the next packet, in which case a new WQE is not posted, the doorbell isn't updated either, and the transmission of the previous packet will be delayed indefinitely. This commit fixes the described rare error flow by posting a NOP and ringing the doorbell on errors to flush all the previous packets. The MPWQE session is closed before that. DMA mapping in the MPWQE flow is moved to the beginning of mlx5e_sq_xmit_mpwqe, because empty sessions are not allowed. Stop room always has enough space for a NOP, because the actual TX WQE is not posted. Fixes: e586b3b0baee ("net/mlx5: Ethernet Datapath files") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-07-06net/mlx5e: Fix capability check for updating vnic env countersGravatar Gal Pressman 1-1/+1
The existing capability check for vnic env counters only checks for receive steering discards, although we need the counters update for the exposed internal queue oob counter as well. This could result in the latter counter not being updated correctly when the receive steering discards counter is not supported. Fix that by checking whether any counter is supported instead of only the steering counter capability. Fixes: 0cfafd4b4ddf ("net/mlx5e: Add device out of buffer counter") Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-07-06net/mlx5e: CT: Use own workqueue instead of mlx5e privGravatar Roi Dayan 1-8/+12
Allocate a ct priv workqueue instead of using mlx5e priv one so flushing will only be of related CT entries. Also move flushing of the workqueue before rhashtable destroy otherwise entries won't be valid. Fixes: b069e14fff46 ("net/mlx5e: CT: Fix queued up restore put() executing after relevant ft release") Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-07-06net/mlx5: Lag, correct get the port select mode strGravatar Liu, Changcheng 3-5/+5
mode & mode_flags is updated at the end of mlx5_activate_lag which may not reflect the actual mode as shown in below logic: mlx5_activate_lag(struct mlx5_lag *ldev, |-- unsigned long flags = 0; |-- err = mlx5_lag_set_flags(ldev, mode, tracker, shared_fdb, &flags); |-- err = mlx5_create_lag(ldev, tracker, mode, flags); |-- mlx5_get_str_port_sel_mode(ldev); |-- ldev->mode = mode; |-- ldev->mode_flags = flags; Use mode & flag as parameters to get port select mode info. Fixes: 94db33177819 ("net/mlx5: Support multiport eswitch mode") Signed-off-by: Liu, Changcheng <jerrliu@nvidia.com> Reviewed-by: Eli Cohen <elic@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-07-06net/mlx5e: Fix enabling sriov while tc nic rules are offloadedGravatar Paul Blakey 1-2/+3
There is a total of four 4M entries flow tables. In sriov disabled mode, ct, ct_nat and post_act take three of them. When adding the first tc nic rule in this mode, it will take another 4M table for the tc <chain,prio> table. If user then enables sriov, the legacy flow table tries to take another 4M and fails, and so enablement fails. To fix that, have legacy fdb take the next available maximum size from the fs ft pool. Fixes: 4a98544d1827 ("net/mlx5: Move chains ft pool to be used by all firmware steering") Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-07-06net/mlx5e: kTLS, Fix build time constant test in RXGravatar Tariq Toukan 1-2/+1
Use the correct constant (TLS_DRIVER_STATE_SIZE_RX) in the comparison against the size of the private RX TLS driver context. Fixes: 1182f3659357 ("net/mlx5e: kTLS, Add kTLS RX HW offload support") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-07-06net/mlx5e: kTLS, Fix build time constant test in TXGravatar Tariq Toukan 1-2/+1
Use the correct constant (TLS_DRIVER_STATE_SIZE_TX) in the comparison against the size of the private TX TLS driver context. Fixes: df8d866770f9 ("net/mlx5e: kTLS, Use kernel API to extract private offload context") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-07-06net/mlx5: Lag, decouple FDB selection and shared FDBGravatar Mark Bloch 4-8/+22
Multiport eswitch is required to use native FDB selection instead of affinity, This was achieved by passing the shared_fdb flag down the HW lag creation path. While it did accomplish the goal of setting FDB selection mode to native, it had the side effect of also creating a shared FDB configuration. This created a few issues: - TC rules are inserted into a non active FDB, which means traffic isn't offloaded as all traffic will reach only a single FDB. - All wire traffic is treated as if a single physical port received it; while this is true for a bond configuration, this shouldn't be the case for multiport eswitch. Create a new flag MLX5_LAG_MODE_FLAG_FDB_SEL_MODE_NATIVE to indicate what FDB selection mode should be used. Fixes: 94db33177819 ("net/mlx5: Support multiport eswitch mode") Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Eli Cohen <elic@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-07-06net/mlx5: TC, allow offload from uplink to other PF's VFGravatar Eli Cohen 1-1/+1
Redirecting traffic from uplink to a VF is a legal operation of mulitport eswitch mode. Remove the limitation. Fixes: 94db33177819 ("net/mlx5: Support multiport eswitch mode") Signed-off-by: Eli Cohen <elic@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-07-06Merge tag 'for-linus' of https://github.com/openrisc/linuxGravatar Linus Torvalds 2-2/+1
Pull OpenRISC fixes from Stafford Horne: "Fixups for OpenRISC found during recent testing: - An OpenRISC irqchip fix to stop acking level interrupts which was causing issues on SMP platforms - A comment typo fix in our unwinder code" * tag 'for-linus' of https://github.com/openrisc/linux: openrisc: unwinder: Fix grammar issue in comment irqchip: or1k-pic: Undefine mask_ack for level triggered hardware
2022-07-06Merge tag 'sound-5.19-rc6' of ↵Gravatar Linus Torvalds 43-249/+812
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "This became largish as it includes the pending ASoC fixes. Almost all changes are device-specific small fixes, while many of them are coverage for mixer issues that were detected by selftest. In addition, usual suspects for HD/USB-audio are there" * tag 'sound-5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (43 commits) ALSA: cs46xx: Fix missing snd_card_free() call at probe error ALSA: usb-audio: Add quirk for Fiero SC-01 (fw v1.0.0) ALSA: usb-audio: Add quirk for Fiero SC-01 ALSA: hda/realtek: Add quirk for Clevo L140PU ALSA: usb-audio: Add quirks for MacroSilicon MS2100/MS2106 devices ASoC: madera: Fix event generation for rate controls ASoC: madera: Fix event generation for OUT1 demux ASoC: cs47l15: Fix event generation for low power mux control ASoC: cs35l41: Add ASP TX3/4 source to register patch ASoC: dapm: Initialise kcontrol data for mux/demux controls ASoC: rt711-sdca: fix kernel NULL pointer dereference when IO error ASoC: cs35l41: Correct some control names ASoC: wm5110: Fix DRE control ASoC: wm_adsp: Fix event for preloader MAINTAINERS: update ASoC Qualcomm maintainer email-id ASoC: rockchip: i2s: switch BCLK to GPIO ASoC: SOF: Intel: disable IMR boot when resuming from ACPI S4 and S5 states ASoC: SOF: pm: add definitions for S4 and S5 states ASoC: SOF: pm: add explicit behavior for ACPI S1 and S2 ASoC: SOF: Intel: hda: Fix compressed stream position tracking ...
2022-07-06xdp: Fix spurious packet loss in generic XDP TX pathGravatar Johan Almbladh 1-2/+6
The byte queue limits (BQL) mechanism is intended to move queuing from the driver to the network stack in order to reduce latency caused by excessive queuing in hardware. However, when transmitting or redirecting a packet using generic XDP, the qdisc layer is bypassed and there are no additional queues. Since netif_xmit_stopped() also takes BQL limits into account, but without having any alternative queuing, packets are silently dropped. This patch modifies the drop condition to only consider cases when the driver itself cannot accept any more packets. This is analogous to the condition in __dev_direct_xmit(). Dropped packets are also counted on the device. Bypassing the qdisc layer in the generic XDP TX path means that XDP packets are able to starve other packets going through a qdisc, and DDOS attacks will be more effective. In-driver-XDP use dedicated TX queues, so they do not have this starvation issue. Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220705082345.2494312-1-johan.almbladh@anyfinetworks.com
2022-07-06Revert "tls: rx: move counting TlsDecryptErrors for sync"Gravatar Gal Pressman 1-4/+4
This reverts commit 284b4d93daee56dff3e10029ddf2e03227f50dbf. When using TLS device offload and coming from tls_device_reencrypt() flow, -EBADMSG error in tls_do_decryption() should not be counted towards the TLSTlsDecryptError counter. Move the counter increase back to the decrypt_internal() call site in decrypt_skb_update(). This also fixes an issue where: if (n_sgin < 1) return -EBADMSG; Errors in decrypt_internal() were not counted after the cited patch. Fixes: 284b4d93daee ("tls: rx: move counting TlsDecryptErrors for sync") Cc: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>