aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2020-06-30xsk: Remove a double pool->dev assignment in xp_dma_mapGravatar Christoph Hellwig 1-1/+0
->dev is already assigned at the top of the function, remove the duplicate one at the end. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200629130359.2690853-4-hch@lst.de
2020-06-30xsk: Replace the cheap_dma flag with a dma_need_sync flagGravatar Christoph Hellwig 1-3/+2
Invert the polarity and better name the flag so that the use case is properly documented. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200629130359.2690853-3-hch@lst.de
2020-06-29genetlink: get rid of family->attrbufGravatar Cong Wang 1-35/+13
genl_family_rcv_msg_attrs_parse() reuses the global family->attrbuf when family->parallel_ops is false. However, family->attrbuf is not protected by any lock on the genl_family_rcv_msg_doit() code path. This leads to several different consequences, one of them is UAF, like the following: genl_family_rcv_msg_doit(): genl_start(): genl_family_rcv_msg_attrs_parse() attrbuf = family->attrbuf __nlmsg_parse(attrbuf); genl_family_rcv_msg_attrs_parse() attrbuf = family->attrbuf __nlmsg_parse(attrbuf); info->attrs = attrs; cb->data = info; netlink_unicast_kernel(): consume_skb() genl_lock_dumpit(): genl_dumpit_info(cb)->attrs Note family->attrbuf is an array of pointers to the skb data, once the skb is freed, any dereference of family->attrbuf will be a UAF. Maybe we could serialize the family->attrbuf with genl_mutex too, but that would make the locking more complicated. Instead, we can just get rid of family->attrbuf and always allocate attrbuf from heap like the family->parallel_ops==true code path. This may add some performance overhead but comparing with taking the global genl_mutex, it still looks better. Fixes: 75cdbdd08900 ("net: ieee802154: have genetlink code to parse the attrs during dumpit") Fixes: 057af7071344 ("net: tipc: have genetlink code to parse the attrs during dumpit") Reported-and-tested-by: syzbot+3039ddf6d7b13daf3787@syzkaller.appspotmail.com Reported-and-tested-by: syzbot+80cad1e3cb4c41cde6ff@syzkaller.appspotmail.com Reported-and-tested-by: syzbot+736bcbcb11b60d0c0792@syzkaller.appspotmail.com Reported-and-tested-by: syzbot+520f8704db2b68091d44@syzkaller.appspotmail.com Reported-and-tested-by: syzbot+c96e4dfb32f8987fdeed@syzkaller.appspotmail.com Cc: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-29Merge tag 'mac80211-for-net-2020-06-29' of ↵Gravatar David S. Miller 5-16/+52
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Couple of fixes/small things: * TX control port status check fixed to not assume frame format * mesh control port fixes * error handling/leak fixes when starting AP, with HE attributes * fix broadcast packet handling with encapsulation offload * add new AKM suites * and a small code cleanup ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-29SUNRPC: Add missing definition of ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGEGravatar Christophe Leroy 1-0/+1
Even if that's only a warning, not including asm/cacheflush.h leads to svc_flush_bvec() being empty allthough powerpc defines ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE. CC net/sunrpc/svcsock.o net/sunrpc/svcsock.c:227:5: warning: "ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE" is not defined [-Wundef] #if ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE ^ Include linux/highmem.h so that asm/cacheflush.h will be included. Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reported-by: kernel test robot <lkp@intel.com> Fixes: ca07eda33e01 ("SUNRPC: Refactor svc_recvfrom()") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2020-06-28llc: make sure applications use ARPHRD_ETHERGravatar Eric Dumazet 1-3/+7
syzbot was to trigger a bug by tricking AF_LLC with non sensible addr->sllc_arphrd It seems clear LLC requires an Ethernet device. Back in commit abf9d537fea2 ("llc: add support for SO_BINDTODEVICE") Octavian Purdila added possibility for application to use a zero value for sllc_arphrd, convert it to ARPHRD_ETHER to not cause regressions on existing applications. BUG: KASAN: use-after-free in __read_once_size include/linux/compiler.h:199 [inline] BUG: KASAN: use-after-free in list_empty include/linux/list.h:268 [inline] BUG: KASAN: use-after-free in waitqueue_active include/linux/wait.h:126 [inline] BUG: KASAN: use-after-free in wq_has_sleeper include/linux/wait.h:160 [inline] BUG: KASAN: use-after-free in skwq_has_sleeper include/net/sock.h:2092 [inline] BUG: KASAN: use-after-free in sock_def_write_space+0x642/0x670 net/core/sock.c:2813 Read of size 8 at addr ffff88801e0b4078 by task ksoftirqd/3/27 CPU: 3 PID: 27 Comm: ksoftirqd/3 Not tainted 5.5.0-rc1-syzkaller #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x197/0x210 lib/dump_stack.c:118 print_address_description.constprop.0.cold+0xd4/0x30b mm/kasan/report.c:374 __kasan_report.cold+0x1b/0x41 mm/kasan/report.c:506 kasan_report+0x12/0x20 mm/kasan/common.c:639 __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:135 __read_once_size include/linux/compiler.h:199 [inline] list_empty include/linux/list.h:268 [inline] waitqueue_active include/linux/wait.h:126 [inline] wq_has_sleeper include/linux/wait.h:160 [inline] skwq_has_sleeper include/net/sock.h:2092 [inline] sock_def_write_space+0x642/0x670 net/core/sock.c:2813 sock_wfree+0x1e1/0x260 net/core/sock.c:1958 skb_release_head_state+0xeb/0x260 net/core/skbuff.c:652 skb_release_all+0x16/0x60 net/core/skbuff.c:663 __kfree_skb net/core/skbuff.c:679 [inline] consume_skb net/core/skbuff.c:838 [inline] consume_skb+0xfb/0x410 net/core/skbuff.c:832 __dev_kfree_skb_any+0xa4/0xd0 net/core/dev.c:2967 dev_kfree_skb_any include/linux/netdevice.h:3650 [inline] e1000_unmap_and_free_tx_resource.isra.0+0x21b/0x3a0 drivers/net/ethernet/intel/e1000/e1000_main.c:1963 e1000_clean_tx_irq drivers/net/ethernet/intel/e1000/e1000_main.c:3854 [inline] e1000_clean+0x4cc/0x1d10 drivers/net/ethernet/intel/e1000/e1000_main.c:3796 napi_poll net/core/dev.c:6532 [inline] net_rx_action+0x508/0x1120 net/core/dev.c:6600 __do_softirq+0x262/0x98c kernel/softirq.c:292 run_ksoftirqd kernel/softirq.c:603 [inline] run_ksoftirqd+0x8e/0x110 kernel/softirq.c:595 smpboot_thread_fn+0x6a3/0xa40 kernel/smpboot.c:165 kthread+0x361/0x430 kernel/kthread.c:255 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352 Allocated by task 8247: save_stack+0x23/0x90 mm/kasan/common.c:72 set_track mm/kasan/common.c:80 [inline] __kasan_kmalloc mm/kasan/common.c:513 [inline] __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:486 kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:521 slab_post_alloc_hook mm/slab.h:584 [inline] slab_alloc mm/slab.c:3320 [inline] kmem_cache_alloc+0x121/0x710 mm/slab.c:3484 sock_alloc_inode+0x1c/0x1d0 net/socket.c:240 alloc_inode+0x68/0x1e0 fs/inode.c:230 new_inode_pseudo+0x19/0xf0 fs/inode.c:919 sock_alloc+0x41/0x270 net/socket.c:560 __sock_create+0xc2/0x730 net/socket.c:1384 sock_create net/socket.c:1471 [inline] __sys_socket+0x103/0x220 net/socket.c:1513 __do_sys_socket net/socket.c:1522 [inline] __se_sys_socket net/socket.c:1520 [inline] __ia32_sys_socket+0x73/0xb0 net/socket.c:1520 do_syscall_32_irqs_on arch/x86/entry/common.c:337 [inline] do_fast_syscall_32+0x27b/0xe16 arch/x86/entry/common.c:408 entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139 Freed by task 17: save_stack+0x23/0x90 mm/kasan/common.c:72 set_track mm/kasan/common.c:80 [inline] kasan_set_free_info mm/kasan/common.c:335 [inline] __kasan_slab_free+0x102/0x150 mm/kasan/common.c:474 kasan_slab_free+0xe/0x10 mm/kasan/common.c:483 __cache_free mm/slab.c:3426 [inline] kmem_cache_free+0x86/0x320 mm/slab.c:3694 sock_free_inode+0x20/0x30 net/socket.c:261 i_callback+0x44/0x80 fs/inode.c:219 __rcu_reclaim kernel/rcu/rcu.h:222 [inline] rcu_do_batch kernel/rcu/tree.c:2183 [inline] rcu_core+0x570/0x1540 kernel/rcu/tree.c:2408 rcu_core_si+0x9/0x10 kernel/rcu/tree.c:2417 __do_softirq+0x262/0x98c kernel/softirq.c:292 The buggy address belongs to the object at ffff88801e0b4000 which belongs to the cache sock_inode_cache of size 1152 The buggy address is located 120 bytes inside of 1152-byte region [ffff88801e0b4000, ffff88801e0b4480) The buggy address belongs to the page: page:ffffea0000782d00 refcount:1 mapcount:0 mapping:ffff88807aa59c40 index:0xffff88801e0b4ffd raw: 00fffe0000000200 ffffea00008e6c88 ffffea0000782d48 ffff88807aa59c40 raw: ffff88801e0b4ffd ffff88801e0b4000 0000000100000003 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88801e0b3f00: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc ffff88801e0b3f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff88801e0b4000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff88801e0b4080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88801e0b4100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Fixes: abf9d537fea2 ("llc: add support for SO_BINDTODEVICE") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-28net: explain the lockdep annotations for dev_uc_unsync()Gravatar Cong Wang 1-0/+10
The lockdep annotations for dev_uc_unsync() and dev_mc_unsync() are not easy to understand, so add some comments to explain why they are correct. Similar for the rest netif_addr_lock_bh() cases, they don't need nested version. Cc: Taehee Yoo <ap420073@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-28net: get rid of lockdep_set_class_and_subclass()Gravatar Cong Wang 1-5/+4
lockdep_set_class_and_subclass() is meant to reduce the _nested() annotations by assigning a default subclass. For addr_list_lock, we have to compute the subclass at run-time as the netdevice topology changes after creation. So, we should just get rid of these lockdep_set_class_and_subclass() and stick with our _nested() annotations. Fixes: 845e0ebb4408 ("net: change addr_list_lock back to static key") Suggested-by: Taehee Yoo <ap420073@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-28bridge: mrp: Fix endian conversion and some other warningsGravatar Horatiu Vultur 3-3/+3
The following sparse warnings are fixed: net/bridge/br_mrp.c:106:18: warning: incorrect type in assignment (different base types) net/bridge/br_mrp.c:106:18: expected unsigned short [usertype] net/bridge/br_mrp.c:106:18: got restricted __be16 [usertype] net/bridge/br_mrp.c:281:23: warning: incorrect type in argument 1 (different modifiers) net/bridge/br_mrp.c:281:23: expected struct list_head *entry net/bridge/br_mrp.c:281:23: got struct list_head [noderef] * net/bridge/br_mrp.c:332:28: warning: incorrect type in argument 1 (different modifiers) net/bridge/br_mrp.c:332:28: expected struct list_head *new net/bridge/br_mrp.c:332:28: got struct list_head [noderef] * net/bridge/br_mrp.c:332:40: warning: incorrect type in argument 2 (different modifiers) net/bridge/br_mrp.c:332:40: expected struct list_head *head net/bridge/br_mrp.c:332:40: got struct list_head [noderef] * net/bridge/br_mrp.c:682:29: warning: incorrect type in argument 1 (different modifiers) net/bridge/br_mrp.c:682:29: expected struct list_head const *head net/bridge/br_mrp.c:682:29: got struct list_head [noderef] * Reported-by: kernel test robot <lkp@intel.com> Fixes: 2f1a11ae11d222 ("bridge: mrp: Add MRP interface.") Fixes: 4b8d7d4c599182 ("bridge: mrp: Extend bridge interface") Fixes: 9a9f26e8f7ea30 ("bridge: mrp: Connect MRP API with the switchdev API") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-28bpf, sockmap: RCU dereferenced psock may be used outside RCU blockGravatar John Fastabend 1-1/+9
If an ingress verdict program specifies message sizes greater than skb->len and there is an ENOMEM error due to memory pressure we may call the rcv_msg handler outside the strp_data_ready() caller context. This is because on an ENOMEM error the strparser will retry from a workqueue. The caller currently protects the use of psock by calling the strp_data_ready() inside a rcu_read_lock/unlock block. But, in above workqueue error case the psock is accessed outside the read_lock/unlock block of the caller. So instead of using psock directly we must do a look up against the sk again to ensure the psock is available. There is an an ugly piece here where we must handle the case where we paused the strp and removed the psock. On psock removal we first pause the strparser and then remove the psock. If the strparser is paused while an skb is scheduled on the workqueue the skb will be dropped on the flow and kfree_skb() is called. If the workqueue manages to get called before we pause the strparser but runs the rcvmsg callback after the psock is removed we will hit the unlikely case where we run the sockmap rcvmsg handler but do not have a psock. For now we will follow strparser logic and drop the skb on the floor with skb_kfree(). This is ugly because the data is dropped. To date this has not caused problems in practice because either the application controlling the sockmap is coordinating with the datapath so that skbs are "flushed" before removal or we simply wait for the sock to be closed before removing it. This patch fixes the describe RCU bug and dropping the skb doesn't make things worse. Future patches will improve this by allowing the normal case where skbs are not merged to skip the strparser altogether. In practice many (most?) use cases have no need to merge skbs so its both a code complexity hit as seen above and a performance issue. For example, in the Cilium case we always set the strparser up to return sbks 1:1 without any merging and have avoided above issues. Fixes: e91de6afa81c1 ("bpf: Fix running sk_skb program types with ktls") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/159312679888.18340.15248924071966273998.stgit@john-XPS-13-9370
2020-06-28bpf, sockmap: RCU splat with redirect and strparser error or TLSGravatar John Fastabend 1-7/+6
There are two paths to generate the below RCU splat the first and most obvious is the result of the BPF verdict program issuing a redirect on a TLS socket (This is the splat shown below). Unlike the non-TLS case the caller of the *strp_read() hooks does not wrap the call in a rcu_read_lock/unlock. Then if the BPF program issues a redirect action we hit the RCU splat. However, in the non-TLS socket case the splat appears to be relatively rare, because the skmsg caller into the strp_data_ready() is wrapped in a rcu_read_lock/unlock. Shown here, static void sk_psock_strp_data_ready(struct sock *sk) { struct sk_psock *psock; rcu_read_lock(); psock = sk_psock(sk); if (likely(psock)) { if (tls_sw_has_ctx_rx(sk)) { psock->parser.saved_data_ready(sk); } else { write_lock_bh(&sk->sk_callback_lock); strp_data_ready(&psock->parser.strp); write_unlock_bh(&sk->sk_callback_lock); } } rcu_read_unlock(); } If the above was the only way to run the verdict program we would be safe. But, there is a case where the strparser may throw an ENOMEM error while parsing the skb. This is a result of a failed skb_clone, or alloc_skb_for_msg while building a new merged skb when the msg length needed spans multiple skbs. This will in turn put the skb on the strp_wrk workqueue in the strparser code. The skb will later be dequeued and verdict programs run, but now from a different context without the rcu_read_lock()/unlock() critical section in sk_psock_strp_data_ready() shown above. In practice I have not seen this yet, because as far as I know most users of the verdict programs are also only working on single skbs. In this case no merge happens which could trigger the above ENOMEM errors. In addition the system would need to be under memory pressure. For example, we can't hit the above case in selftests because we missed having tests to merge skbs. (Added in later patch) To fix the below splat extend the rcu_read_lock/unnlock block to include the call to sk_psock_tls_verdict_apply(). This will fix both TLS redirect case and non-TLS redirect+error case. Also remove psock from the sk_psock_tls_verdict_apply() function signature its not used there. [ 1095.937597] WARNING: suspicious RCU usage [ 1095.940964] 5.7.0-rc7-02911-g463bac5f1ca79 #1 Tainted: G W [ 1095.944363] ----------------------------- [ 1095.947384] include/linux/skmsg.h:284 suspicious rcu_dereference_check() usage! [ 1095.950866] [ 1095.950866] other info that might help us debug this: [ 1095.950866] [ 1095.957146] [ 1095.957146] rcu_scheduler_active = 2, debug_locks = 1 [ 1095.961482] 1 lock held by test_sockmap/15970: [ 1095.964501] #0: ffff9ea6b25de660 (sk_lock-AF_INET){+.+.}-{0:0}, at: tls_sw_recvmsg+0x13a/0x840 [tls] [ 1095.968568] [ 1095.968568] stack backtrace: [ 1095.975001] CPU: 1 PID: 15970 Comm: test_sockmap Tainted: G W 5.7.0-rc7-02911-g463bac5f1ca79 #1 [ 1095.977883] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 [ 1095.980519] Call Trace: [ 1095.982191] dump_stack+0x8f/0xd0 [ 1095.984040] sk_psock_skb_redirect+0xa6/0xf0 [ 1095.986073] sk_psock_tls_strp_read+0x1d8/0x250 [ 1095.988095] tls_sw_recvmsg+0x714/0x840 [tls] v2: Improve commit message to identify non-TLS redirect plus error case condition as well as more common TLS case. In the process I decided doing the rcu_read_unlock followed by the lock/unlock inside branches was unnecessarily complex. We can just extend the current rcu block and get the same effeective without the shuffling and branching. Thanks Martin! Fixes: e91de6afa81c1 ("bpf: Fix running sk_skb program types with ktls") Reported-by: Jakub Sitnicki <jakub@cloudflare.com> Reported-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/159312677907.18340.11064813152758406626.stgit@john-XPS-13-9370
2020-06-27net: ipv4: Fix wrong type conversion from hint to rt in ip_route_use_hint()Gravatar Miaohe Lin 1-1/+1
We can't cast sk_buff to rtable by (struct rtable *)hint. Use skb_rtable(). Fixes: 02b24941619f ("ipv4: use dst hint for ipv4 list receive") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-27Merge tag 'nfs-for-5.8-2' of git://git.linux-nfs.org/projects/anna/linux-nfsGravatar Linus Torvalds 7-41/+57
Pull NFS client bugfixes from Anna Schumaker: "Stable Fixes: - xprtrdma: Fix handling of RDMA_ERROR replies - sunrpc: Fix rollback in rpc_gssd_dummy_populate() - pNFS/flexfiles: Fix list corruption if the mirror count changes - NFSv4: Fix CLOSE not waiting for direct IO completion - SUNRPC: Properly set the @subbuf parameter of xdr_buf_subsegment() Other Fixes: - xprtrdma: Fix a use-after-free with r_xprt->rx_ep - Fix other xprtrdma races during disconnect - NFS: Fix memory leak of export_path" * tag 'nfs-for-5.8-2' of git://git.linux-nfs.org/projects/anna/linux-nfs: SUNRPC: Properly set the @subbuf parameter of xdr_buf_subsegment() NFSv4 fix CLOSE not waiting for direct IO compeletion pNFS/flexfiles: Fix list corruption if the mirror count changes nfs: Fix memory leak of export_path sunrpc: fixed rollback in rpc_gssd_dummy_populate() xprtrdma: Fix handling of RDMA_ERROR replies xprtrdma: Clean up disconnect xprtrdma: Clean up synopsis of rpcrdma_flush_disconnect() xprtrdma: Use re_connect_status safely in rpcrdma_xprt_connect() xprtrdma: Prevent dereferencing r_xprt->rx_ep after it is freed
2020-06-26SUNRPC: Properly set the @subbuf parameter of xdr_buf_subsegment()Gravatar Chuck Lever 1-0/+4
@subbuf is an output parameter of xdr_buf_subsegment(). A survey of call sites shows that @subbuf is always uninitialized before xdr_buf_segment() is invoked by callers. There are some execution paths through xdr_buf_subsegment() that do not set all of the fields in @subbuf, leaving some pointer fields containing garbage addresses. Subsequent processing of that buffer then results in a page fault. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-06-26sunrpc: fixed rollback in rpc_gssd_dummy_populate()Gravatar Vasily Averin 1-0/+1
__rpc_depopulate(gssd_dentry) was lost on error path cc: stable@vger.kernel.org Fixes: commit 4b9a445e3eeb ("sunrpc: create a new dummy pipe for gssd to hold open") Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-06-26nl80211: fix memory leak when parsing NL80211_ATTR_HE_BSS_COLORGravatar Luca Coelho 1-1/+1
If there is an error when parsing the NL80211_ATTR_HE_BSS_COLOR attribute, we return immediately without freeing param.acl. Fit it by using goto out instead of returning immediately. Fixes: 5c5e52d1bb96 ("nl80211: add handling for BSS color") Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20200626124931.7ad2a3eb894f.I60905fb70bd20389a3b170db515a07275e31845e@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-06-26nl80211: don't return err unconditionally in nl80211_start_ap()Gravatar Luca Coelho 1-1/+2
When a memory leak was fixed, a return err was changed to goto err, but, accidentally, the if (err) was removed, so now we always exit at this point. Fix it by adding if (err) back. Fixes: 9951ebfcdf2b ("nl80211: fix potential leak in AP start") Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20200626124931.871ba5b31eee.I97340172d92164ee92f3c803fe20a8a6e97714e1@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-06-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netGravatar Linus Torvalds 105-270/+580
Pull networking fixes from David Miller: 1) Don't insert ESP trailer twice in IPSEC code, from Huy Nguyen. 2) The default crypto algorithm selection in Kconfig for IPSEC is out of touch with modern reality, fix this up. From Eric Biggers. 3) bpftool is missing an entry for BPF_MAP_TYPE_RINGBUF, from Andrii Nakryiko. 4) Missing init of ->frame_sz in xdp_convert_zc_to_xdp_frame(), from Hangbin Liu. 5) Adjust packet alignment handling in ax88179_178a driver to match what the hardware actually does. From Jeremy Kerr. 6) register_netdevice can leak in the case one of the notifiers fail, from Yang Yingliang. 7) Use after free in ip_tunnel_lookup(), from Taehee Yoo. 8) VLAN checks in sja1105 DSA driver need adjustments, from Vladimir Oltean. 9) tg3 driver can sleep forever when we get enough EEH errors, fix from David Christensen. 10) Missing {READ,WRITE}_ONCE() annotations in various Intel ethernet drivers, from Ciara Loftus. 11) Fix scanning loop break condition in of_mdiobus_register(), from Florian Fainelli. 12) MTU limit is incorrect in ibmveth driver, from Thomas Falcon. 13) Endianness fix in mlxsw, from Ido Schimmel. 14) Use after free in smsc95xx usbnet driver, from Tuomas Tynkkynen. 15) Missing bridge mrp configuration validation, from Horatiu Vultur. 16) Fix circular netns references in wireguard, from Jason A. Donenfeld. 17) PTP initialization on recovery is not done properly in qed driver, from Alexander Lobakin. 18) Endian conversion of L4 ports in filters of cxgb4 driver is wrong, from Rahul Lakkireddy. 19) Don't clear bound device TX queue of socket prematurely otherwise we get problems with ktls hw offloading, from Tariq Toukan. 20) ipset can do atomics on unaligned memory, fix from Russell King. 21) Align ethernet addresses properly in bridging code, from Thomas Martitz. 22) Don't advertise ipv4 addresses on SCTP sockets having ipv6only set, from Marcelo Ricardo Leitner. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (149 commits) rds: transport module should be auto loaded when transport is set sch_cake: fix a few style nits sch_cake: don't call diffserv parsing code when it is not needed sch_cake: don't try to reallocate or unshare skb unconditionally ethtool: fix error handling in linkstate_prepare_data() wil6210: account for napi_gro_receive never returning GRO_DROP hns: do not cast return value of napi_gro_receive to null socionext: account for napi_gro_receive never returning GRO_DROP wireguard: receive: account for napi_gro_receive never returning GRO_DROP vxlan: fix last fdb index during dump of fdb with nhid sctp: Don't advertise IPv4 addresses if ipv6only is set on the socket tc-testing: avoid action cookies with odd length. bpf: tcp: bpf_cubic: fix spurious HYSTART_DELAY exit upon drop in min RTT tcp_cubic: fix spurious HYSTART_DELAY exit upon drop in min RTT net: dsa: sja1105: fix tc-gate schedule with single element net: dsa: sja1105: recalculate gating subschedule after deleting tc-gate rules net: dsa: sja1105: unconditionally free old gating config net: dsa: sja1105: move sja1105_compose_gating_subschedule at the top net: macb: free resources on failure path of at91ether_open() net: macb: call pm_runtime_put_sync on failure path ...
2020-06-25rds: transport module should be auto loaded when transport is setGravatar Rao Shoaib 1-9/+17
This enhancement auto loads transport module when the transport is set via SO_RDS_TRANSPORT socket option. Reviewed-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: Rao Shoaib <rao.shoaib@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25sch_cake: fix a few style nitsGravatar Toke Høiland-Jørgensen 1-2/+2
I spotted a few nits when comparing the in-tree version of sch_cake with the out-of-tree one: A redundant error variable declaration shadowing an outer declaration, and an indentation alignment issue. Fix both of these. Fixes: 046f6fd5daef ("sched: Add Common Applications Kept Enhanced (cake) qdisc") Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25sch_cake: don't call diffserv parsing code when it is not neededGravatar Toke Høiland-Jørgensen 1-4/+9
As a further optimisation of the diffserv parsing codepath, we can skip it entirely if CAKE is configured to neither use diffserv-based classification, nor to zero out the diffserv bits. Fixes: c87b4ecdbe8d ("sch_cake: Make sure we can write the IP header before changing DSCP bits") Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25sch_cake: don't try to reallocate or unshare skb unconditionallyGravatar Ilya Ponetayev 1-11/+30
cake_handle_diffserv() tries to linearize mac and network header parts of skb and to make it writable unconditionally. In some cases it leads to full skb reallocation, which reduces throughput and increases CPU load. Some measurements of IPv4 forward + NAPT on MIPS router with 580 MHz single-core CPU was conducted. It appears that on kernel 4.9 skb_try_make_writable() reallocates skb, if skb was allocated in ethernet driver via so-called 'build skb' method from page cache (it was discovered by strange increase of kmalloc-2048 slab at first). Obtain DSCP value via read-only skb_header_pointer() call, and leave linearization only for DSCP bleaching or ECN CE setting. And, as an additional optimisation, skip diffserv parsing entirely if it is not needed by the current configuration. Fixes: c87b4ecdbe8d ("sch_cake: Make sure we can write the IP header before changing DSCP bits") Signed-off-by: Ilya Ponetayev <i.ponetaev@ndmsystems.com> [ fix a few style issues, reflow commit message ] Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25ethtool: fix error handling in linkstate_prepare_data()Gravatar Michal Kubecek 1-6/+5
When getting SQI or maximum SQI value fails in linkstate_prepare_data(), we must not return without calling ethnl_ops_complete(dev) as that could result in imbalance between ethtool_ops ->begin() and ->complete() calls. Fixes: 806602191592 ("ethtool: provide UAPI for PHY Signal Quality Index (SQI)") Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25sctp: Don't advertise IPv4 addresses if ipv6only is set on the socketGravatar Marcelo Ricardo Leitner 3-2/+7
If a socket is set ipv6only, it will still send IPv4 addresses in the INIT and INIT_ACK packets. This potentially misleads the peer into using them, which then would cause association termination. The fix is to not add IPv4 addresses to ipv6only sockets. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Corey Minyard <cminyard@mvista.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Tested-by: Corey Minyard <cminyard@mvista.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25tcp_cubic: fix spurious HYSTART_DELAY exit upon drop in min RTTGravatar Neal Cardwell 1-3/+2
Mirja Kuehlewind reported a bug in Linux TCP CUBIC Hystart, where Hystart HYSTART_DELAY mechanism can exit Slow Start spuriously on an ACK when the minimum rtt of a connection goes down. From inspection it is clear from the existing code that this could happen in an example like the following: o The first 8 RTT samples in a round trip are 150ms, resulting in a curr_rtt of 150ms and a delay_min of 150ms. o The 9th RTT sample is 100ms. The curr_rtt does not change after the first 8 samples, so curr_rtt remains 150ms. But delay_min can be lowered at any time, so delay_min falls to 100ms. The code executes the HYSTART_DELAY comparison between curr_rtt of 150ms and delay_min of 100ms, and the curr_rtt is declared far enough above delay_min to force a (spurious) exit of Slow start. The fix here is simple: allow every RTT sample in a round trip to lower the curr_rtt. Fixes: ae27e98a5152 ("[TCP] CUBIC v2.3") Reported-by: Mirja Kuehlewind <mirja.kuehlewind@ericsson.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfGravatar David S. Miller 54-15/+159
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net, they are: 1) Unaligned atomic access in ipset, from Russell King. 2) Missing module description, from Rob Gill. 3) Patches to fix a module unload causing NULL pointer dereference in xtables, from David Wilder. For the record, I posting here his cover letter explaining the problem: A crash happened on ppc64le when running ltp network tests triggered by "rmmod iptable_mangle". See previous discussion in this thread: https://lists.openwall.net/netdev/2020/06/03/161 . In the crash I found in iptable_mangle_hook() that state->net->ipv4.iptable_mangle=NULL causing a NULL pointer dereference. net->ipv4.iptable_mangle is set to NULL in +iptable_mangle_net_exit() and called when ip_mangle modules is unloaded. A rmmod task was found running in the crash dump. A 2nd crash showed the same problem when running "rmmod iptable_filter" (net->ipv4.iptable_filter=NULL). To fix this I added .pre_exit hook in all iptable_foo.c. The pre_exit will un-register the underlying hook and exit would do the table freeing. The netns core does an unconditional +synchronize_rcu after the pre_exit hooks insuring no packets are in flight that have picked up the pointer before completing the un-register. These patches include changes for both iptables and ip6tables. We tested this fix with ltp running iptables01.sh and iptables01.sh -6 a loop for 72 hours. 4) Add a selftest for conntrack helper assignment, from Florian Westphal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25net: bridge: enfore alignment for ethernet addressGravatar Thomas Martitz 1-1/+1
The eth_addr member is passed to ether_addr functions that require 2-byte alignment, therefore the member must be properly aligned to avoid unaligned accesses. The problem is in place since the initial merge of multicast to unicast: commit 6db6f0eae6052b70885562e1733896647ec1d807 bridge: multicast to unicast Fixes: 6db6f0eae605 ("bridge: multicast to unicast") Cc: Roopa Prabhu <roopa@cumulusnetworks.com> Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Cc: David S. Miller <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Felix Fietkau <nbd@nbd.name> Signed-off-by: Thomas Martitz <t.martitz@avm.de> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25tcp: don't ignore ECN CWR on pure ACKGravatar Denis Kirjanov 1-3/+11
there is a problem with the CWR flag set in an incoming ACK segment and it leads to the situation when the ECE flag is latched forever the following packetdrill script shows what happens: // Stack receives incoming segments with CE set +0.1 <[ect0] . 11001:12001(1000) ack 1001 win 65535 +0.0 <[ce] . 12001:13001(1000) ack 1001 win 65535 +0.0 <[ect0] P. 13001:14001(1000) ack 1001 win 65535 // Stack repsonds with ECN ECHO +0.0 >[noecn] . 1001:1001(0) ack 12001 +0.0 >[noecn] E. 1001:1001(0) ack 13001 +0.0 >[noecn] E. 1001:1001(0) ack 14001 // Write a packet +0.1 write(3, ..., 1000) = 1000 +0.0 >[ect0] PE. 1001:2001(1000) ack 14001 // Pure ACK received +0.01 <[noecn] W. 14001:14001(0) ack 2001 win 65535 // Since CWR was sent, this packet should NOT have ECE set +0.1 write(3, ..., 1000) = 1000 +0.0 >[ect0] P. 2001:3001(1000) ack 14001 // but Linux will still keep ECE latched here, with packetdrill // flagging a missing ECE flag, expecting // >[ect0] PE. 2001:3001(1000) ack 14001 // in the script In the situation above we will continue to send ECN ECHO packets and trigger the peer to reduce the congestion window. To avoid that we can check CWR on pure ACKs received. v3: - Add a sequence check to avoid sending an ACK to an ACK v2: - Adjusted the comment - move CWR check before checking for unacknowledged packets Signed-off-by: Denis Kirjanov <denis.kirjanov@suse.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25mac80211: allow rx of mesh eapol frames with default rx keyGravatar Markus Theil 1-0/+26
Without this patch, eapol frames cannot be received in mesh mode, when 802.1X should be used. Initially only a MGTK is defined, which is found and set as rx->key, when there are no other keys set. ieee80211_drop_unencrypted would then drop these eapol frames, as they are data frames without encryption and there exists some rx->key. Fix this by differentiating between mesh eapol frames and other data frames with existing rx->key. Allow mesh mesh eapol frames only if they are for our vif address. With this patch in-place, ieee80211_rx_h_mesh_fwding continues after the ieee80211_drop_unencrypted check and notices, that these eapol frames have to be delivered locally, as they should. Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de> Link: https://lore.kernel.org/r/20200625104214.50319-1-markus.theil@tu-ilmenau.de [small code cleanups] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-06-25mac80211: skip mpath lookup also for control port txGravatar Markus Theil 1-1/+5
When using 802.1X over mesh networks, at first an ordinary mesh peering is established, then the 802.1X EAPOL dialog happens, afterwards an authenticated mesh peering exchange (AMPE) happens, finally the peering is complete and we can set the STA authorized flag. As 802.1X is an intermediate step here and key material is not yet exchanged for stations we have to skip mesh path lookup for these EAPOL frames. Otherwise the already configure mesh group encryption key would be used to send a mesh path request which no one can decipher, because we didn't already establish key material on both peers, like with SAE and directly using AMPE. Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de> Link: https://lore.kernel.org/r/20200617082637.22670-2-markus.theil@tu-ilmenau.de [remove pointless braces, remove unnecessary local variable, the list can only process one such frame (or its fragments)] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-06-25mac80211: Fix dropping broadcast packets in 802.11 encapGravatar Seevalamuthu Mariappan 1-1/+1
Broadcast pkts like arp are getting dropped in 'ieee80211_8023_xmit'. Fix this by replacing is_valid_ether_addr api with is_zero_ether_addr. Fixes: 50ff477a8639 ("mac80211: add 802.11 encapsulation offloading support") Signed-off-by: Seevalamuthu Mariappan <seevalam@codeaurora.org> Link: https://lore.kernel.org/r/1591697754-4975-1-git-send-email-seevalam@codeaurora.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-06-25mac80211: simplify mesh codeGravatar Pavel Machek 1-5/+2
Doing mod_timer() conditionaly is easier than conditionally unlocking and jumping around... Signed-off-by: Pavel Machek (CIP) <pavel@denx.de> Acked-by: Linus Lüssing <ll@simonwunderlich.de> Link: https://lore.kernel.org/r/20200604214157.GA9737@amd Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-06-25mac80211: fix control port tx status checkGravatar Markus Theil 1-7/+15
The initial control port tx status patch assumed, that we have IEEE 802.11 frames, but actually ethernet frames are stored in the ack skb. Fix this by checking for the correct ethertype and skb protocol 802.3. Also allow tx status reports for ETH_P_PREAUTH, as preauth frames can also be send over the nl80211 control port. Fixes: a7528198add8 ("mac80211: support control port TX status reporting") Reported-by: Jouni Malinen <j@w1.fi> Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de> Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/r/20200622123542.173695-1-markus.theil@tu-ilmenau.de Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-06-25netfilter: ip6tables: Add a .pre_exit hook in all ip6table_foo.c.Gravatar David Wilder 5-6/+44
Using new helpers ip6t_unregister_table_pre_exit() and ip6t_unregister_table_exit(). Fixes: b9e69e127397 ("netfilter: xtables: don't hook tables by default") Signed-off-by: David Wilder <dwilder@us.ibm.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-06-25netfilter: ip6tables: Split ip6t_unregister_table() into pre_exit and exit ↵Gravatar David Wilder 1-1/+14
helpers. The pre_exit will un-register the underlying hook and .exit will do the table freeing. The netns core does an unconditional synchronize_rcu after the pre_exit hooks insuring no packets are in flight that have picked up the pointer before completing the un-register. Fixes: b9e69e127397 ("netfilter: xtables: don't hook tables by default") Signed-off-by: David Wilder <dwilder@us.ibm.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-06-25netfilter: iptables: Add a .pre_exit hook in all iptable_foo.c.Gravatar David Wilder 5-7/+44
Using new helpers ipt_unregister_table_pre_exit() and ipt_unregister_table_exit(). Fixes: b9e69e127397 ("netfilter: xtables: don't hook tables by default") Signed-off-by: David Wilder <dwilder@us.ibm.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-06-25netfilter: iptables: Split ipt_unregister_table() into pre_exit and exit ↵Gravatar David Wilder 1-1/+14
helpers. The pre_exit will un-register the underlying hook and .exit will do the table freeing. The netns core does an unconditional synchronize_rcu after the pre_exit hooks insuring no packets are in flight that have picked up the pointer before completing the un-register. Fixes: b9e69e127397 ("netfilter: xtables: don't hook tables by default") Signed-off-by: David Wilder <dwilder@us.ibm.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-06-25netfilter: Add MODULE_DESCRIPTION entries to kernel modulesGravatar Rob Gill 41-0/+41
The user tool modinfo is used to get information on kernel modules, including a description where it is available. This patch adds a brief MODULE_DESCRIPTION to netfilter kernel modules (descriptions taken from Kconfig file or code comments) Signed-off-by: Rob Gill <rrobgill@protonmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-06-25netfilter: ipset: fix unaligned atomic accessGravatar Russell King 1-0/+2
When using ip_set with counters and comment, traffic causes the kernel to panic on 32-bit ARM: Alignment trap: not handling instruction e1b82f9f at [<bf01b0dc>] Unhandled fault: alignment exception (0x221) at 0xea08133c PC is at ip_set_match_extensions+0xe0/0x224 [ip_set] The problem occurs when we try to update the 64-bit counters - the faulting address above is not 64-bit aligned. The problem occurs due to the way elements are allocated, for example: set->dsize = ip_set_elem_len(set, tb, 0, 0); map = ip_set_alloc(sizeof(*map) + elements * set->dsize); If the element has a requirement for a member to be 64-bit aligned, and set->dsize is not a multiple of 8, but is a multiple of four, then every odd numbered elements will be misaligned - and hitting an atomic64_add() on that element will cause the kernel to panic. ip_set_elem_len() must return a size that is rounded to the maximum alignment of any extension field stored in the element. This change ensures that is the case. Fixes: 95ad1f4a9358 ("netfilter: ipset: Fix extension alignment") Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-06-24dsa: Allow forwarding of redirected IGMP trafficGravatar Daniel Mack 1-3/+34
The driver for Marvell switches puts all ports in IGMP snooping mode which results in all IGMP/MLD frames that ingress on the ports to be forwarded to the CPU only. The bridge code in the kernel can then interpret these frames and act upon them, for instance by updating the mdb in the switch to reflect multicast memberships of stations connected to the ports. However, the IGMP/MLD frames must then also be forwarded to other ports of the bridge so external IGMP queriers can track membership reports, and external multicast clients can receive query reports from foreign IGMP queriers. Currently, this is impossible as the EDSA tagger sets offload_fwd_mark on the skb when it unwraps the tagged frames, and that will make the switchdev layer prevent the skb from egressing on any other port of the same switch. To fix that, look at the To_CPU code in the DSA header and make forwarding of the frame possible for trapped IGMP packets. Introduce some #defines for the frame types to make the code a bit more comprehensive. This was tested on a Marvell 88E6352 variant. Signed-off-by: Daniel Mack <daniel@zonque.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-24openvswitch: take into account de-fragmentation/gso_size in ↵Gravatar Lorenzo Bianconi 1-2/+7
execute_check_pkt_len ovs connection tracking module performs de-fragmentation on incoming fragmented traffic. Take info account if traffic has been de-fragmented in execute_check_pkt_len action otherwise we will perform the wrong nested action considering the original packet size. This issue typically occurs if ovs-vswitchd adds a rule in the pipeline that requires connection tracking (e.g. OVN stateful ACLs) before execute_check_pkt_len action. Moreover take into account GSO fragment size for GSO packet in execute_check_pkt_len routine Fixes: 4d5ec89fc8d14 ("net: openvswitch: Add a new action check_pkt_len") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-23net: Do not clear the sock TX queue in sk_set_socket()Gravatar Tariq Toukan 1-0/+2
Clearing the sock TX queue in sk_set_socket() might cause unexpected out-of-order transmit when called from sock_orphan(), as outstanding packets can pick a different TX queue and bypass the ones already queued. This is undesired in general. More specifically, it breaks the in-order scheduling property guarantee for device-offloaded TLS sockets. Remove the call to sk_tx_queue_clear() in sk_set_socket(), and add it explicitly only where needed. Fixes: e022f0b4a03f ("net: Introduce sk_tx_queue_mapping") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-23net: ethtool: Handle missing cable test TDR parametersGravatar Andrew Lunn 1-8/+9
A last minute change put the TDR cable test parameters into a nest. The validation is not sufficient, resulting in an oops if the nest is missing. Set default values first, then update them if the nest is provided. Fixes: f2bc8ad31a7f ("net: ethtool: Allow PHY cable test TDR data to configured") Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-23net: ethtool: add missing string for NETIF_F_GSO_TUNNEL_REMCSUMGravatar Alexander Lobakin 1-0/+1
Commit e585f2363637 ("udp: Changes to udp_offload to support remote checksum offload") added new GSO type and a corresponding netdev feature, but missed Ethtool's 'netdev_features_strings' table. Give it a name so it will be exposed to userspace and become available for manual configuration. v3: - decouple from "netdev_features_strings[] cleanup" series; - no functional changes. v2: - don't split the "Fixes:" tag across lines; - no functional changes. Fixes: e585f2363637 ("udp: Changes to udp_offload to support remote checksum offload") Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-23bridge: mrp: Validate when setting the port roleGravatar Horatiu Vultur 1-2/+8
This patch adds specific checks for primary(0x0) and secondary(0x1) when setting the port role. For any other value the function 'br_mrp_set_port_role' will return -EINVAL. Fixes: 20f6a05ef63594 ("bridge: mrp: Rework the MRP netlink interface") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22mptcp: drop sndr_key in mptcp_syn_optionsGravatar Geliang Tang 1-2/+0
In RFC 8684, we don't need to send sndr_key in SYN package anymore, so drop it. Fixes: cc7972ea1932 ("mptcp: parse and emit MP_CAPABLE option according to v1 spec") Signed-off-by: Geliang Tang <geliangtang@gmail.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22ethtool: Fix check in ethtool_rx_flow_rule_createGravatar Gaurav Singh 1-1/+1
Fix check in ethtool_rx_flow_rule_create Fixes: eca4205f9ec3 ("ethtool: add ethtool_rx_flow_spec to flow_rule structure translator") Signed-off-by: Gaurav Singh <gaurav1086@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22hsr: avoid to create proc file after unregisterGravatar Taehee Yoo 4-24/+25
When an interface is being deleted, "/proc/net/dev_snmp6/<interface name>" is deleted. The function for this is addrconf_ifdown() in the addrconf_notify() and it is called by notification, which is NETDEV_UNREGISTER. But, if NETDEV_CHANGEMTU is triggered after NETDEV_UNREGISTER, this proc file will be created again. This recreated proc file will be deleted by netdev_wati_allrefs(). Before netdev_wait_allrefs() is called, creating a new HSR interface routine can be executed and It tries to create a proc file but it will find an un-deleted proc file. At this point, it warns about it. To avoid this situation, it can use ->dellink() instead of ->ndo_uninit() to release resources because ->dellink() is called before NETDEV_UNREGISTER. So, a proc file will not be recreated. Test commands ip link add dummy0 type dummy ip link add dummy1 type dummy ip link set dummy0 mtu 1300 #SHELL1 while : do ip link add hsr0 type hsr slave1 dummy0 slave2 dummy1 done #SHELL2 while : do ip link del hsr0 done Splat looks like: [ 9888.980852][ T2752] proc_dir_entry 'dev_snmp6/hsr0' already registered [ 9888.981797][ C2] WARNING: CPU: 2 PID: 2752 at fs/proc/generic.c:372 proc_register+0x2d5/0x430 [ 9888.981798][ C2] Modules linked in: hsr dummy veth openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6x [ 9888.981814][ C2] CPU: 2 PID: 2752 Comm: ip Tainted: G W 5.8.0-rc1+ #616 [ 9888.981815][ C2] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 9888.981816][ C2] RIP: 0010:proc_register+0x2d5/0x430 [ 9888.981818][ C2] Code: fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 65 01 00 00 49 8b b5 e0 00 00 00 48 89 ea 40 [ 9888.981819][ C2] RSP: 0018:ffff8880628dedf0 EFLAGS: 00010286 [ 9888.981821][ C2] RAX: dffffc0000000008 RBX: ffff888028c69170 RCX: ffffffffaae09a62 [ 9888.981822][ C2] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff88806c9f75ac [ 9888.981823][ C2] RBP: ffff888028c693f4 R08: ffffed100d9401bd R09: ffffed100d9401bd [ 9888.981824][ C2] R10: ffffffffaddf406f R11: 0000000000000001 R12: ffff888028c69308 [ 9888.981825][ C2] R13: ffff8880663584c8 R14: dffffc0000000000 R15: ffffed100518d27e [ 9888.981827][ C2] FS: 00007f3876b3b0c0(0000) GS:ffff88806c800000(0000) knlGS:0000000000000000 [ 9888.981828][ C2] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 9888.981829][ C2] CR2: 00007f387601a8c0 CR3: 000000004101a002 CR4: 00000000000606e0 [ 9888.981830][ C2] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 9888.981831][ C2] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 9888.981832][ C2] Call Trace: [ 9888.981833][ C2] ? snmp6_seq_show+0x180/0x180 [ 9888.981834][ C2] proc_create_single_data+0x7c/0xa0 [ 9888.981835][ C2] snmp6_register_dev+0xb0/0x130 [ 9888.981836][ C2] ipv6_add_dev+0x4b7/0xf60 [ 9888.981837][ C2] addrconf_notify+0x684/0x1ca0 [ 9888.981838][ C2] ? __mutex_unlock_slowpath+0xd0/0x670 [ 9888.981839][ C2] ? kasan_unpoison_shadow+0x30/0x40 [ 9888.981840][ C2] ? wait_for_completion+0x250/0x250 [ 9888.981841][ C2] ? inet6_ifinfo_notify+0x100/0x100 [ 9888.981842][ C2] ? dropmon_net_event+0x227/0x410 [ 9888.981843][ C2] ? notifier_call_chain+0x90/0x160 [ 9888.981844][ C2] ? inet6_ifinfo_notify+0x100/0x100 [ 9888.981845][ C2] notifier_call_chain+0x90/0x160 [ 9888.981846][ C2] register_netdevice+0xbe5/0x1070 [ ... ] Reported-by: syzbot+1d51c8b74efa4c44adeb@syzkaller.appspotmail.com Fixes: e0a4b99773d3 ("hsr: use upper/lower device infrastructure") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22xprtrdma: Fix handling of RDMA_ERROR repliesGravatar Chuck Lever 1-6/+3
The RPC client currently doesn't handle ERR_CHUNK replies correctly. rpcrdma_complete_rqst() incorrectly passes a negative number to xprt_complete_rqst() as the number of bytes copied. Instead, set task->tk_status to the error value, and return zero bytes copied. In these cases, return -EIO rather than -EREMOTEIO. The RPC client's finite state machine doesn't know what to do with -EREMOTEIO. Additional clean ups: - Don't double-count RDMA_ERROR replies - Remove a stale comment Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: <stable@kernel.vger.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-06-22xprtrdma: Clean up disconnectGravatar Chuck Lever 2-10/+14
1. Ensure that only rpcrdma_cm_event_handler() modifies ep->re_connect_status to avoid racy changes to that field. 2. Ensure that xprt_force_disconnect() is invoked only once as a transport is closed or destroyed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>