aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2022-03-17net: dsa: Handle MST state changesGravatar Tobias Waldekranz 3-8/+86
Add the usual trampoline functionality from the generic DSA layer down to the drivers for MST state changes. When a state changes to disabled/blocking/listening, make sure to fast age any dynamic entries in the affected VLANs (those controlled by the MSTI in question). Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: dsa: Pass VLAN MSTI migration notifications to driverGravatar Tobias Waldekranz 3-1/+23
Add the usual trampoline functionality from the generic DSA layer down to the drivers for VLAN MSTI migrations. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: dsa: Validate hardware support for MSTGravatar Tobias Waldekranz 3-0/+30
When joining a bridge where MST is enabled, we validate that the proper offloading support is in place, otherwise we fallback to software bridging. When then mode is changed on a bridge in which we are members, we refuse the change if offloading is not supported. At the moment we only check for configurable learning, but this will be further restricted as we support more MST related switchdev events. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: bridge: mst: Add helper to query a port's MST stateGravatar Tobias Waldekranz 1-0/+25
This is useful for switchdev drivers who are offloading MST states into hardware. As an example, a driver may wish to flush the FDB for a port when it transitions from forwarding to blocking - which means that the previous state must be discoverable. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: bridge: mst: Add helper to check if MST is enabledGravatar Tobias Waldekranz 1-0/+9
This is useful for switchdev drivers that might want to refuse to join a bridge where MST is enabled, if the hardware can't support it. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: bridge: mst: Add helper to map an MSTI to a VID setGravatar Tobias Waldekranz 1-0/+26
br_mst_get_info answers the question: "On this bridge, which VIDs are mapped to the given MSTI?" This is useful in switchdev drivers, which might have to fan-out operations, relating to an MSTI, per VLAN. An example: When a port's MST state changes from forwarding to blocking, a driver may choose to flush the dynamic FDB entries on that port to get faster reconvergence of the network, but this should only be done in the VLANs that are managed by the MSTI in question. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: bridge: mst: Notify switchdev drivers of MST state changesGravatar Tobias Waldekranz 1-0/+18
Generate a switchdev notification whenever an MST state changes. This notification is keyed by the VLANs MSTI rather than the VID, since multiple VLANs may share the same MST instance. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: bridge: mst: Notify switchdev drivers of VLAN MSTI migrationsGravatar Tobias Waldekranz 2-0/+59
Whenever a VLAN moves to a new MSTI, send a switchdev notification so that switchdevs can track a bridge's VID to MSTI mappings. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: bridge: mst: Notify switchdev drivers of MST mode changesGravatar Tobias Waldekranz 1-0/+11
Trigger a switchdev event whenever the bridge's MST mode is enabled/disabled. This allows constituent ports to either perform any required hardware config, or refuse the change if it not supported. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: bridge: mst: Support setting and reporting MST port statesGravatar Tobias Waldekranz 3-1/+192
Make it possible to change the port state in a given MSTI by extending the bridge port netlink interface (RTM_SETLINK on PF_BRIDGE).The proposed iproute2 interface would be: bridge mst set dev <PORT> msti <MSTI> state <STATE> Current states in all applicable MSTIs can also be dumped via a corresponding RTM_GETLINK. The proposed iproute interface looks like this: $ bridge mst port msti vb1 0 state forwarding 100 state disabled vb2 0 state forwarding 100 state forwarding The preexisting per-VLAN states are still valid in the MST mode (although they are read-only), and can be queried as usual if one is interested in knowing a particular VLAN's state without having to care about the VID to MSTI mapping (in this example VLAN 20 and 30 are bound to MSTI 100): $ bridge -d vlan port vlan-id vb1 10 state forwarding mcast_router 1 20 state disabled mcast_router 1 30 state disabled mcast_router 1 40 state forwarding mcast_router 1 vb2 10 state forwarding mcast_router 1 20 state forwarding mcast_router 1 30 state forwarding mcast_router 1 40 state forwarding mcast_router 1 Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: bridge: mst: Allow changing a VLAN's MSTIGravatar Tobias Waldekranz 3-0/+58
Allow a VLAN to move out of the CST (MSTI 0), to an independent tree. The user manages the VID to MSTI mappings via a global VLAN setting. The proposed iproute2 interface would be: bridge vlan global set dev br0 vid <VID> msti <MSTI> Changing the state in non-zero MSTIs is still not supported, but will be addressed in upcoming changes. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: bridge: mst: Multiple Spanning Tree (MST) modeGravatar Tobias Waldekranz 8-4/+175
Allow the user to switch from the current per-VLAN STP mode to an MST mode. Up to this point, per-VLAN STP states where always isolated from each other. This is in contrast to the MSTP standard (802.1Q-2018, Clause 13.5), where VLANs are grouped into MST instances (MSTIs), and the state is managed on a per-MSTI level, rather that at the per-VLAN level. Perhaps due to the prevalence of the standard, many switching ASICs are built after the same model. Therefore, add a corresponding MST mode to the bridge, which we can later add offloading support for in a straight-forward way. For now, all VLANs are fixed to MSTI 0, also called the Common Spanning Tree (CST). That is, all VLANs will follow the port-global state. Upcoming changes will make this actually useful by allowing VLANs to be mapped to arbitrary MSTIs and allow individual MSTI states to be changed. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17vlan: use correct format charactersGravatar Bill Wendling 1-1/+1
When compiling with -Wformat, clang emits the following warning: net/8021q/vlanproc.c:284:22: warning: format specifies type 'unsigned short' but the argument has type 'int' [-Wformat] mp->priority, ((mp->vlan_qos >> 13) & 0x7)); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ The types of these arguments are unconditionally defined, so this patch updates the format character to the correct ones for ints and unsigned ints. Link: https://github.com/ClangBuiltLinux/linux/issues/378 Signed-off-by: Bill Wendling <morbo@google.com> Link: https://lore.kernel.org/r/20220316213125.2353370-1-morbo@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netGravatar Jakub Kicinski 11-56/+41
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: dsa: Add missing of_node_put() in dsa_port_parse_ofGravatar Miaoqian Lin 1-0/+1
The device_node pointer is returned by of_parse_phandle() with refcount incremented. We should use of_node_put() on it when done. Fixes: 6d4e5c570c2d ("net: dsa: get port type at parse time") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Link: https://lore.kernel.org/r/20220316082602.10785-1-linmq006@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-03-16net/sched: add vlan push_eth and pop_eth action to the hardware IRGravatar Maor Dickman 1-0/+13
Add vlan push_eth and pop_eth action to the hardware intermediate representation model which would subsequently allow it to be used by drivers for offload. Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-16net: dsa: Never offload FDB entries on standalone portsGravatar Tobias Waldekranz 1-0/+3
If a port joins a bridge that it can't offload, it will fallback to standalone mode and software bridging. In this case, we never want to offload any FDB entries to hardware either. Previously, for host addresses, we would eventually end up in dsa_port_bridge_host_fdb_add, which would unconditionally dereference dp->bridge and cause a segfault. Fixes: c26933639b54 ("net: dsa: request drivers to perform FDB isolation") Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20220315233033.1468071-1-tobias@waldekranz.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-16Merge tag 'linux-can-next-for-5.18-20220316' of ↵Gravatar Jakub Kicinski 1-32/+40
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2022-03-16 the first 3 patches are by Oliver Hartkopp target the CAN ISOTP protocol and fix a problem found by syzbot in isotp_bind(), return -EADDRNOTAVAIL in unbound sockets in isotp_recvmsg() and add support for MSG_TRUNC to isotp_recvmsg(). Amit Kumar Mahapatra converts the xilinx,can device tree bindings to yaml. The last patch is by Julia Lawall and fixes typos in the ucan driver. * tag 'linux-can-next-for-5.18-20220316' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: can: ucan: fix typos in comments dt-bindings: can: xilinx_can: Convert Xilinx CAN binding to YAML can: isotp: support MSG_TRUNC flag when reading from socket can: isotp: return -EADDRNOTAVAIL when reading from unbound socket can: isotp: sanitize CAN ID checks in isotp_bind() ==================== Link: https://lore.kernel.org/r/20220316204710.716341-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-16can: isotp: support MSG_TRUNC flag when reading from socketGravatar Oliver Hartkopp 1-12/+15
When providing the MSG_TRUNC flag via recvmsg() syscall the return value provides the real length of the packet or datagram, even when it was longer than the passed buffer. Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol") Link: https://github.com/linux-can/can-utils/issues/347#issuecomment-1065932671 Link: https://lore.kernel.org/all/20220316164258.54155-3-socketcan@hartkopp.net Suggested-by: Derek Will <derekrobertwill@gmail.com> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-03-16can: isotp: return -EADDRNOTAVAIL when reading from unbound socketGravatar Oliver Hartkopp 1-0/+4
When reading from an unbound can-isotp socket the syscall blocked indefinitely. As unbound sockets (without given CAN address information) do not make sense anyway we directly return -EADDRNOTAVAIL on read() analogue to the known behavior from sendmsg(). Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol") Link: https://github.com/linux-can/can-utils/issues/349 Link: https://lore.kernel.org/all/20220316164258.54155-2-socketcan@hartkopp.net Suggested-by: Derek Will <derekrobertwill@gmail.com> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-03-16can: isotp: sanitize CAN ID checks in isotp_bind()Gravatar Oliver Hartkopp 1-20/+21
Syzbot created an environment that lead to a state machine status that can not be reached with a compliant CAN ID address configuration. The provided address information consisted of CAN ID 0x6000001 and 0xC28001 which both boil down to 11 bit CAN IDs 0x001 in sending and receiving. Sanitize the SFF/EFF CAN ID values before performing the address checks. Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol") Link: https://lore.kernel.org/all/20220316164258.54155-1-socketcan@hartkopp.net Reported-by: syzbot+2339c27f5c66c652843e@syzkaller.appspotmail.com Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-03-16devlink: pass devlink_port to port_split / port_unsplit callbacksGravatar Jakub Kicinski 1-35/+12
Now that devlink ports are protected by the instance lock it seems natural to pass devlink_port as an argument to the port_split / port_unsplit callbacks. This should save the drivers from doing a lookup. In theory drivers may have supported unsplitting ports which were not registered prior to this change. Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-16devlink: hold the instance lock in port_split / port_unsplit callbacksGravatar Jakub Kicinski 1-2/+0
Let the core take the devlink instance lock around port splitting and remove the now redundant locking in the drivers. Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-16devlink: expose instance locking and add locked port registeringGravatar Jakub Kicinski 1-24/+71
It should be familiar and beneficial to expose devlink instance lock to the drivers. This way drivers can block devlink from calling them during critical sections without breakneck locking. Add port helpers, port splitting callbacks will be the first target. Use 'devl_' prefix for "explicitly locked" API. Initial RFC used '__devlink' but that's too much typing. devl_lock_is_held() is not defined without lockdep, which is the same behavior as lockdep_is_held() itself. Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-16Merge branch 'master' of ↵Gravatar Jakub Kicinski 2-3/+2
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2022-03-16 1) Fix a kernel-info-leak in pfkey. From Haimin Zhang. 2) Fix an incorrect check of the return value of ipv6_skip_exthdr. From Sabrina Dubroca. * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: esp6: fix check on ipv6_skip_exthdr's return value af_key: add __GFP_ZERO flag for compose_sadb_supported in function pfkey_register ==================== Link: https://lore.kernel.org/r/20220316121142.3142336-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-15net: Add l3mdev index to flow struct and avoid oif reset for port devicesGravatar David Ahern 9-57/+28
The fundamental premise of VRF and l3mdev core code is binding a socket to a device (l3mdev or netdev with an L3 domain) to indicate L3 scope. Legacy code resets flowi_oif to the l3mdev losing any original port device binding. Ben (among others) has demonstrated use cases where the original port device binding is important and needs to be retained. This patch handles that by adding a new entry to the common flow struct that can indicate the l3mdev index for later rule and table matching avoiding the need to reset flowi_oif. In addition to allowing more use cases that require port device binds, this patch brings a few datapath simplications: 1. l3mdev_fib_rule_match is only called when walking fib rules and always after l3mdev_update_flow. That allows an optimization to bail early for non-VRF type uses cases when flowi_l3mdev is not set. Also, only that index needs to be checked for the FIB table id. 2. l3mdev_update_flow can be called with flowi_oif set to a l3mdev (e.g., VRF) device. By resetting flowi_oif only for this case the FLOWI_FLAG_SKIP_NH_OIF flag is not longer needed and can be removed, removing several checks in the datapath. The flowi_iif path can be simplified to only be called if the it is not loopback (loopback can not be assigned to an L3 domain) and the l3mdev index is not already set. 3. Avoid another device lookup in the output path when the fib lookup returns a reject failure. Note: 2 functional tests for local traffic with reject fib rules are updated to reflect the new direct failure at FIB lookup time for ping rather than the failure on packet path. The current code fails like this: HINT: Fails since address on vrf device is out of device scope COMMAND: ip netns exec ns-A ping -c1 -w1 -I eth1 172.16.3.1 ping: Warning: source address might be selected on device other than: eth1 PING 172.16.3.1 (172.16.3.1) from 172.16.3.1 eth1: 56(84) bytes of data. --- 172.16.3.1 ping statistics --- 1 packets transmitted, 0 received, 100% packet loss, time 0ms where the test now directly fails: HINT: Fails since address on vrf device is out of device scope COMMAND: ip netns exec ns-A ping -c1 -w1 -I eth1 172.16.3.1 ping: connect: No route to host Signed-off-by: David Ahern <dsahern@kernel.org> Tested-by: Ben Greear <greearb@candelatech.com> Link: https://lore.kernel.org/r/20220314204551.16369-1-dsahern@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-nextGravatar Jakub Kicinski 8-47/+208
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next 1) Revert CHECKSUM_UNNECESSARY for UDP packet from conntrack. 2) Reject unsupported families when creating tables, from Phil Sutter. 3) GRE support for the flowtable, from Toshiaki Makita. 4) Add GRE offload support for act_ct, also from Toshiaki. 5) Update mlx5 driver to support for GRE flowtable offload, from Toshiaki Makita. 6) Oneliner to clean up incorrect indentation in nf_conntrack_bridge, from Jiapeng Chong. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: bridge: clean up some inconsistent indenting net/mlx5: Support GRE conntrack offload act_ct: Support GRE offload netfilter: flowtable: Support GRE netfilter: nf_tables: Reject tables of unsupported family Revert "netfilter: conntrack: mark UDP zero checksum as CHECKSUM_UNNECESSARY" ==================== Link: https://lore.kernel.org/r/20220315091513.66544-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-14net/packet: fix slab-out-of-bounds access in packet_recvmsg()Gravatar Eric Dumazet 1-1/+10
syzbot found that when an AF_PACKET socket is using PACKET_COPY_THRESH and mmap operations, tpacket_rcv() is queueing skbs with garbage in skb->cb[], triggering a too big copy [1] Presumably, users of af_packet using mmap() already gets correct metadata from the mapped buffer, we can simply make sure to clear 12 bytes that might be copied to user space later. BUG: KASAN: stack-out-of-bounds in memcpy include/linux/fortify-string.h:225 [inline] BUG: KASAN: stack-out-of-bounds in packet_recvmsg+0x56c/0x1150 net/packet/af_packet.c:3489 Write of size 165 at addr ffffc9000385fb78 by task syz-executor233/3631 CPU: 0 PID: 3631 Comm: syz-executor233 Not tainted 5.17.0-rc7-syzkaller-02396-g0b3660695e80 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_address_description.constprop.0.cold+0xf/0x336 mm/kasan/report.c:255 __kasan_report mm/kasan/report.c:442 [inline] kasan_report.cold+0x83/0xdf mm/kasan/report.c:459 check_region_inline mm/kasan/generic.c:183 [inline] kasan_check_range+0x13d/0x180 mm/kasan/generic.c:189 memcpy+0x39/0x60 mm/kasan/shadow.c:66 memcpy include/linux/fortify-string.h:225 [inline] packet_recvmsg+0x56c/0x1150 net/packet/af_packet.c:3489 sock_recvmsg_nosec net/socket.c:948 [inline] sock_recvmsg net/socket.c:966 [inline] sock_recvmsg net/socket.c:962 [inline] ____sys_recvmsg+0x2c4/0x600 net/socket.c:2632 ___sys_recvmsg+0x127/0x200 net/socket.c:2674 __sys_recvmsg+0xe2/0x1a0 net/socket.c:2704 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7fdfd5954c29 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 41 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffcf8e71e48 EFLAGS: 00000246 ORIG_RAX: 000000000000002f RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fdfd5954c29 RDX: 0000000000000000 RSI: 0000000020000500 RDI: 0000000000000005 RBP: 0000000000000000 R08: 000000000000000d R09: 000000000000000d R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffcf8e71e60 R13: 00000000000f4240 R14: 000000000000c1ff R15: 00007ffcf8e71e54 </TASK> addr ffffc9000385fb78 is located in stack of task syz-executor233/3631 at offset 32 in frame: ____sys_recvmsg+0x0/0x600 include/linux/uio.h:246 this frame has 1 object: [32, 160) 'addr' Memory state around the buggy address: ffffc9000385fa80: 00 04 f3 f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 ffffc9000385fb00: 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 >ffffc9000385fb80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f3 ^ ffffc9000385fc00: f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 f1 ffffc9000385fc80: f1 f1 f1 00 f2 f2 f2 00 f2 f2 f2 00 00 00 00 00 ================================================================== Fixes: 0fb375fb9b93 ("[AF_PACKET]: Allow for > 8 byte hardware addresses.") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Link: https://lore.kernel.org/r/20220312232958.3535620-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfGravatar Jakub Kicinski 3-45/+10
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net coming late in the 5.17-rc process: 1) Revert port remap to mitigate shadowing service ports, this is causing problems in existing setups and this mitigation can be achieved with explicit ruleset, eg. ... tcp sport < 16386 tcp dport >= 32768 masquerade random This patches provided a built-in policy similar to the one described above. 2) Disable register tracking infrastructure in nf_tables. Florian reported two issues: - Existing expressions with no implemented .reduce interface that causes data-store on register should cancel the tracking. - Register clobbering might be possible storing data on registers that are larger than 32-bits. This might lead to generating incorrect ruleset bytecode. These two issues are scheduled to be addressed in the next release cycle. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: disable register tracking Revert "netfilter: conntrack: tag conntracks picked up in local out hook" Revert "netfilter: nat: force port remap to prevent shadowing well-known ports" ==================== Link: https://lore.kernel.org/r/20220312220315.64531-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-14esp6: fix check on ipv6_skip_exthdr's return valueGravatar Sabrina Dubroca 1-2/+1
Commit 5f9c55c8066b ("ipv6: check return value of ipv6_skip_exthdr") introduced an incorrect check, which leads to all ESP packets over either TCPv6 or UDPv6 encapsulation being dropped. In this particular case, offset is negative, since skb->data points to the ESP header in the following chain of headers, while skb->network_header points to the IPv6 header: IPv6 | ext | ... | ext | UDP | ESP | ... That doesn't seem to be a problem, especially considering that if we reach esp6_input_done2, we're guaranteed to have a full set of headers available (otherwise the packet would have been dropped earlier in the stack). However, it means that the return value will (intentionally) be negative. We can make the test more specific, as the expected return value of ipv6_skip_exthdr will be the (negated) size of either a UDP header, or a TCP header with possible options. In the future, we should probably either make ipv6_skip_exthdr explicitly accept negative offsets (and adjust its return value for error cases), or make ipv6_skip_exthdr only take non-negative offsets (and audit all callers). Fixes: 5f9c55c8066b ("ipv6: check return value of ipv6_skip_exthdr") Reported-by: Xiumei Mu <xmu@redhat.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-03-14net: dsa: report and change port dscp priority using dcbnlGravatar Vladimir Oltean 1-0/+86
Similar to the port-based default priority, IEEE 802.1Q-2018 allows the Application Priority Table to define QoS classes (0 to 7) per IP DSCP value (0 to 63). In the absence of an app table entry for a packet with DSCP value X, QoS classification for that packet falls back to other methods (VLAN PCP or port-based default). The presence of an app table for DSCP value X with priority Y makes the hardware classify the packet to QoS class Y. As opposed to the default-prio where DSA exposes only a "set" in dsa_switch_ops (because the port-based default is the fallback, it always exists, either implicitly or explicitly), for DSCP priorities we expose an "add" and a "del". The addition of a DSCP entry means trusting that DSCP priority, the deletion means ignoring it. Drivers that already trust (at least some) DSCP values can describe their configuration in dsa_switch_ops :: port_get_dscp_prio(), which is called for each DSCP value from 0 to 63. Again, there can be more than one dcbnl app table entry for the same DSCP value, DSA chooses the one with the largest configured priority. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-14net: dsa: report and change port default priority using dcbnlGravatar Vladimir Oltean 1-0/+137
The port-based default QoS class is assigned to packets that lack a VLAN PCP (or the port is configured to not trust the VLAN PCP), an IP DSCP (or the port is configured to not trust IP DSCP), and packets on which no tc-skbedit action has matched. Similar to other drivers, this can be exposed to user space using the DCB Application Priority Table. IEEE 802.1Q-2018 specifies in Table D-8 - Sel field values that when the Selector is 1, the Protocol ID value of 0 denotes the "Default application priority. For use when application priority is not otherwise specified." The way in which the dcbnl integration in DSA has been designed has to do with its requirements. Andrew Lunn explains that SOHO switches are expected to come with some sort of pre-configured QoS profile, and that it is desirable for this to come pre-loaded into the DSA slave interfaces' DCB application priority table. In the dcbnl design, this is possible because calls to dcb_ieee_setapp() can be initiated by anyone including being self-initiated by this device driver. However, what makes this challenging to implement in DSA is that the DSA core manages the net_devices (effectively hiding them from drivers), while drivers manage the hardware. The DSA core has no knowledge of what individual drivers' QoS policies are. DSA could export to drivers a wrapper over dcb_ieee_setapp() and these could call that function to pre-populate the app priority table, however drivers don't have a good moment in time to do this. The dsa_switch_ops :: setup() method gets called before the net_devices are created (dsa_slave_create), and so is dsa_switch_ops :: port_setup(). What remains is dsa_switch_ops :: port_enable(), but this gets called upon each ndo_open. If we add app table entries on every open, we'd need to remove them on close, to avoid duplicate entry errors. But if we delete app priority entries on close, what we delete may not be the initial, driver pre-populated entries, but rather user-added entries. So it is clear that letting drivers choose the timing of the dcb_ieee_setapp() call is inappropriate. The alternative which was chosen is to introduce hardware-specific ops in dsa_switch_ops, and effectively hide dcbnl details from drivers as well. For pre-populating the application table, dsa_slave_dcbnl_init() will call ds->ops->port_get_default_prio() which is supposed to read from hardware. If the operation succeeds, DSA creates a default-prio app table entry. The method is called as soon as the slave_dev is registered, but before we release the rtnl_mutex. This is done such that user space sees the app table entries as soon as it sees the interface being registered. The fact that we populate slave_dev->dcbnl_ops with a non-NULL pointer changes behavior in dcb_doit() from net/dcb/dcbnl.c, which used to return -EOPNOTSUPP for any dcbnl operation where netdev->dcbnl_ops is NULL. Because there are still dcbnl-unaware DSA drivers even if they have dcbnl_ops populated, the way to restore the behavior is to make all dcbnl_ops return -EOPNOTSUPP on absence of the hardware-specific dsa_switch_ops method. The dcbnl framework absurdly allows there to be more than one app table entry for the same selector and protocol (in other words, more than one port-based default priority). In the iproute2 dcb program, there is a "replace" syntactical sugar command which performs an "add" and a "del" to hide this away. But we choose the largest configured priority when we call ds->ops->port_set_default_prio(), using __fls(). When there is no default-prio app table entry left, the port-default priority is restored to 0. Link: https://patchwork.kernel.org/project/netdevbpf/patch/20210113154139.1803705-2-olteanv@gmail.com/ Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-14net: Add lockdep asserts to ____napi_schedule().Gravatar Sebastian Andrzej Siewior 1-1/+4
____napi_schedule() needs to be invoked with disabled interrupts due to __raise_softirq_irqoff (in order not to corrupt the per-CPU list). ____napi_schedule() needs also to be invoked from an interrupt context so that the raised-softirq is processed while the interrupt context is left. Add lockdep asserts for both conditions. While this is the second time the irq/softirq check is needed, provide a generic lockdep_assert_softirq_will_run() which is used by both caller. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-12netfilter: nf_tables: disable register trackingGravatar Pablo Neira Ayuso 1-2/+7
The register tracking infrastructure is incomplete, it might lead to generating incorrect ruleset bytecode, disable it by now given we are late in the release process. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-03-12Merge branch '100GbE' of ↵Gravatar David S. Miller 1-0/+116
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== ice: GTP support in switchdev Marcin Szycik says: Add support for adding GTP-C and GTP-U filters in switchdev mode. To create a filter for GTP, create a GTP-type netdev with ip tool, enable hardware offload, add qdisc and add a filter in tc: ip link add $GTP0 type gtp role <sgsn/ggsn> hsize <hsize> ethtool -K $PF0 hw-tc-offload on tc qdisc add dev $GTP0 ingress tc filter add dev $GTP0 ingress prio 1 flower enc_key_id 1337 \ action mirred egress redirect dev $VF1_PR By default, a filter for GTP-U will be added. To add a filter for GTP-C, specify enc_dst_port = 2123, e.g.: tc filter add dev $GTP0 ingress prio 1 flower enc_key_id 1337 \ enc_dst_port 2123 action mirred egress redirect dev $VF1_PR Note: outer IPv6 offload is not supported yet. Note: GTP-U with no payload offload is not supported yet. ICE COMMS package is required to create a filter as it contains GTP profiles. Changes in iproute2 [1] are required to be able to add GTP netdev and use GTP-specific options (QFI and PDU type). [1] https://lore.kernel.org/netdev/20220211182902.11542-1-wojciech.drewek@intel.com/T --- v2: Add more CC v3: Fix mail thread, sorry for spam v4: Add GTP echo response in gtp module v5: Change patch order v6: Add GTP echo request in gtp module v7: Fix kernel-docs in ice v8: Remove handling of GTP Echo Response v9: Add sending of multicast message on GTP Echo Response, fix GTP-C dummy packet selection v10: Rebase, fixed most 80 char line limits v11: Rebase, collect Harald's Reviewed-by on patch 3 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-11net: add per-cpu storage and net->core_statsGravatar Eric Dumazet 4-12/+45
Before adding yet another possibly contended atomic_long_t, it is time to add per-cpu storage for existing ones: dev->tx_dropped, dev->rx_dropped, and dev->rx_nohandler Because many devices do not have to increment such counters, allocate the per-cpu storage on demand, so that dev_get_stats() does not have to spend considerable time folding zero counters. Note that some drivers have abused these counters which were supposed to be only used by core networking stack. v4: should use per_cpu_ptr() in dev_get_stats() (Jakub) v3: added a READ_ONCE() in netdev_core_stats_alloc() (Paolo) v2: add a missing include (reported by kernel test robot <lkp@intel.com>) Change in netdev_core_stats_alloc() (Jakub) Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: jeffreyji <jeffreyji@google.com> Reviewed-by: Brian Vazquez <brianvv@google.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20220311051420.2608812-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-11vsock: each transport cycles only on its own socketsGravatar Jiyong Park 3-5/+16
When iterating over sockets using vsock_for_each_connected_socket, make sure that a transport filters out sockets that don't belong to the transport. There actually was an issue caused by this; in a nested VM configuration, destroying the nested VM (which often involves the closing of /dev/vhost-vsock if there was h2g connections to the nested VM) kills not only the h2g connections, but also all existing g2h connections to the (outmost) host which are totally unrelated. Tested: Executed the following steps on Cuttlefish (Android running on a VM) [1]: (1) Enter into an `adb shell` session - to have a g2h connection inside the VM, (2) open and then close /dev/vhost-vsock by `exec 3< /dev/vhost-vsock && exec 3<&-`, (3) observe that the adb session is not reset. [1] https://android.googlesource.com/device/google/cuttlefish/ Fixes: c0cfa2d8a788 ("vsock: add multi-transports support") Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jiyong Park <jiyong@google.com> Link: https://lore.kernel.org/r/20220311020017.1509316-1-jiyong@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-11net: remove exports for netdev_name_node_alt_create() and destroyGravatar Jakub Kicinski 1-2/+0
netdev_name_node_alt_create() and netdev_name_node_alt_destroy() are only called by rtnetlink, so no need for exports. Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20220310223952.558779-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-11tcp: unexport tcp_ca_get_key_by_name and tcp_ca_get_name_by_keyGravatar Christoph Hellwig 1-2/+0
Both functions are only used by core networking code. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220310143229.895319-1-hch@lst.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-11net: ipv6: fix skb_over_panic in __ip6_append_dataGravatar Tadeusz Struk 1-2/+2
Syzbot found a kernel bug in the ipv6 stack: LINK: https://syzkaller.appspot.com/bug?id=205d6f11d72329ab8d62a610c44c5e7e25415580 The reproducer triggers it by sending a crafted message via sendmmsg() call, which triggers skb_over_panic, and crashes the kernel: skbuff: skb_over_panic: text:ffffffff84647fb4 len:65575 put:65575 head:ffff888109ff0000 data:ffff888109ff0088 tail:0x100af end:0xfec0 dev:<NULL> Update the check that prevents an invalid packet with MTU equal to the fregment header size to eat up all the space for payload. The reproducer can be found here: LINK: https://syzkaller.appspot.com/text?tag=ReproC&x=1648c83fb00000 Reported-by: syzbot+e223cf47ec8ae183f2a0@syzkaller.appspotmail.com Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org> Acked-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20220310232538.1044947-1-tadeusz.struk@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-11Merge tag 'wireless-next-2022-03-11' of ↵Gravatar Jakub Kicinski 16-108/+907
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Johannes Berg says: ==================== brcmfmac * add BCM43454/6 support rtw89 * add support for 160 MHz channels and 6 GHz band * hardware scan support iwlwifi * support UHB TAS enablement via BIOS * remove a bunch of W=1 warnings * add support for channel switch offload * support 32 Rx AMPDU sessions in newer devices * add support for a couple of new devices * add support for band disablement via BIOS mt76 * mt7915 thermal management improvements * SAR support for more mt76 drivers * mt7986 wmac support on mt7915 ath11k * debugfs interface to configure firmware debug log level * debugfs interface to test Target Wake Time (TWT) * provide 802.11ax High Efficiency (HE) data via radiotap ath9k * use hw_random API instead of directly dumping into random.c wcn36xx * fix wcn3660 to work on 5 GHz band ath6kl * add device ID for WLU5150-D81 cfg80211/mac80211 * initial EHT (from 802.11be) support (EHT rates, 320 MHz, larger block-ack) * support disconnect on HW restart * tag 'wireless-next-2022-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (247 commits) mac80211: Add support to trigger sta disconnect on hardware restart mac80211: fix potential double free on mesh join mac80211: correct legacy rates check in ieee80211_calc_rx_airtime nl80211: fix typo of NL80211_IF_TYPE_OCB in documentation mac80211: Use GFP_KERNEL instead of GFP_ATOMIC when possible mac80211: replace DEFINE_SIMPLE_ATTRIBUTE with DEFINE_DEBUGFS_ATTRIBUTE rtw89: 8852c: process logic efuse map rtw89: 8852c: process efuse of phycap rtw89: support DAV efuse reading operation rtw89: 8852c: add chip::dle_mem rtw89: add page_regs to handle v1 chips rtw89: add chip_info::{h2c,c2h}_reg to support more chips rtw89: add hci_func_en_addr to support variant generation rtw89: add power_{on/off}_func rtw89: read chip version depends on chip ID rtw89: pci: use a struct to describe all registers address related to DMA channel rtw89: pci: add V1 of PCI channel address rtw89: pci: add struct rtw89_pci_info rtw89: 8852c: add 8852c empty files MAINTAINERS: add devicetree bindings entry for mt76 ... ==================== Link: https://lore.kernel.org/r/20220311124029.213470-1-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-11net/sched: Allow flower to match on GTP optionsGravatar Wojciech Drewek 1-0/+116
Options are as follows: PDU_TYPE:QFI and they refernce to the fields from the PDU Session Protocol. PDU Session data is conveyed in GTP-U Extension Header. GTP-U Extension Header is described in 3GPP TS 29.281. PDU Session Protocol is described in 3GPP TS 38.415. PDU_TYPE - indicates the type of the PDU Session Information (4 bits) QFI - QoS Flow Identifier (6 bits) # ip link add gtp_dev type gtp role sgsn # tc qdisc add dev gtp_dev ingress # tc filter add dev gtp_dev protocol ip parent ffff: \ flower \ enc_key_id 11 \ gtp_opts 1:8/ff:ff \ action mirred egress redirect dev eth0 Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-11flow_dissector: Add support for HSRv0Gravatar Kurt Kanzenbach 1-0/+1
Commit bf08824a0f47 ("flow_dissector: Add support for HSR") added support for HSR within the flow dissector. However, it only works for HSR in version 1. Version 0 uses a different Ether Type. Add support for it. Reported-by: Anthony Harivel <anthony.harivel@linutronix.de> Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-11mac80211: Add support to trigger sta disconnect on hardware restartGravatar Youghandhar Chintala 3-3/+45
Currently in case of target hardware restart, we just reconfig and re-enable the security keys and enable the network queues to start data traffic back from where it was interrupted. Many ath10k wifi chipsets have sequence numbers for the data packets assigned by firmware and the mac sequence number will restart from zero after target hardware restart leading to mismatch in the sequence number expected by the remote peer vs the sequence number of the frame sent by the target firmware. This mismatch in sequence number will cause out-of-order packets on the remote peer and all the frames sent by the device are dropped until we reach the sequence number which was sent before we restarted the target hardware In order to fix this, we trigger a sta disconnect, in case of target hw restart. After this there will be a fresh connection and thereby avoiding the dropping of frames by remote peer. The right fix would be to pull the entire data path into the host which is not feasible or would need lots of complex changes and will still be inefficient. Tested on ath10k using WCN3990, QCA6174 Signed-off-by: Youghandhar Chintala <youghand@codeaurora.org> Link: https://lore.kernel.org/r/20220308115325.5246-2-youghand@codeaurora.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-03-11mac80211: fix potential double free on mesh joinGravatar Linus Lüssing 1-3/+0
While commit 6a01afcf8468 ("mac80211: mesh: Free ie data when leaving mesh") fixed a memory leak on mesh leave / teardown it introduced a potential memory corruption caused by a double free when rejoining the mesh: ieee80211_leave_mesh() -> kfree(sdata->u.mesh.ie); ... ieee80211_join_mesh() -> copy_mesh_setup() -> old_ie = ifmsh->ie; -> kfree(old_ie); This double free / kernel panics can be reproduced by using wpa_supplicant with an encrypted mesh (if set up without encryption via "iw" then ifmsh->ie is always NULL, which avoids this issue). And then calling: $ iw dev mesh0 mesh leave $ iw dev mesh0 mesh join my-mesh Note that typically these commands are not used / working when using wpa_supplicant. And it seems that wpa_supplicant or wpa_cli are going through a NETDEV_DOWN/NETDEV_UP cycle between a mesh leave and mesh join where the NETDEV_UP resets the mesh.ie to NULL via a memcpy of default_mesh_setup in cfg80211_netdev_notifier_call, which then avoids the memory corruption, too. The issue was first observed in an application which was not using wpa_supplicant but "Senf" instead, which implements its own calls to nl80211. Fixing the issue by removing the kfree()'ing of the mesh IE in the mesh join function and leaving it solely up to the mesh leave to free the mesh IE. Cc: stable@vger.kernel.org Fixes: 6a01afcf8468 ("mac80211: mesh: Free ie data when leaving mesh") Reported-by: Matthias Kretschmer <mathias.kretschmer@fit.fraunhofer.de> Signed-off-by: Linus Lüssing <ll@simonwunderlich.de> Tested-by: Mathias Kretschmer <mathias.kretschmer@fit.fraunhofer.de> Link: https://lore.kernel.org/r/20220310183513.28589-1-linus.luessing@c0d3.blue Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-03-11mac80211: correct legacy rates check in ieee80211_calc_rx_airtimeGravatar MeiChia Chiu 1-1/+3
There are no legacy rates on 60GHz or sub-1GHz band, so modify the check. Signed-off-by: Ryder Lee <ryder.lee@mediatek.com> Signed-off-by: MeiChia Chiu <MeiChia.Chiu@mediatek.com> Link: https://lore.kernel.org/r/20220308021645.16272-1-MeiChia.Chiu@mediatek.com [Ghz -> GHz] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-03-11mac80211: Use GFP_KERNEL instead of GFP_ATOMIC when possibleGravatar Christophe JAILLET 1-1/+1
Previous memory allocations in this function already use GFP_KERNEL, so use __dev_alloc_skb() and an explicit GFP_KERNEL instead of an implicit GFP_ATOMIC. This gives more opportunities of successful allocation. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/194a0e2ff00c3fae88cc9fba47431747360c8242.1645345378.git.christophe.jaillet@wanadoo.fr Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-03-10net: limit altnames to 64k totalGravatar Jakub Kicinski 1-0/+11
Property list (altname is a link "property") is wrapped in a nlattr. nlattrs length is 16bit so practically speaking the list of properties can't be longer than that, otherwise user space would have to interpret broken netlink messages. Prevent the problem from occurring by checking the length of the property list before adding new entries. Reported-by: George Shuklin <george.shuklin@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-10net: account alternate interface name memoryGravatar Jakub Kicinski 1-1/+1
George reports that altnames can eat up kernel memory. We should charge that memory appropriately. Reported-by: George Shuklin <george.shuklin@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-10net: openvswitch: fix uAPI incompatibility with existing user spaceGravatar Ilya Maximets 1-3/+10
Few years ago OVS user space made a strange choice in the commit [1] to define types only valid for the user space inside the copy of a kernel uAPI header. '#ifndef __KERNEL__' and another attribute was added later. This leads to the inevitable clash between user space and kernel types when the kernel uAPI is extended. The issue was unveiled with the addition of a new type for IPv6 extension header in kernel uAPI. When kernel provides the OVS_KEY_ATTR_IPV6_EXTHDRS attribute to the older user space application, application tries to parse it as OVS_KEY_ATTR_PACKET_TYPE and discards the whole netlink message as malformed. Since OVS_KEY_ATTR_IPV6_EXTHDRS is supplied along with every IPv6 packet that goes to the user space, IPv6 support is fully broken. Fixing that by bringing these user space attributes to the kernel uAPI to avoid the clash. Strictly speaking this is not the problem of the kernel uAPI, but changing it is the only way to avoid breakage of the older user space applications at this point. These 2 types are explicitly rejected now since they should not be passed to the kernel. Additionally, OVS_KEY_ATTR_TUNNEL_INFO moved out from the '#ifdef __KERNEL__' as there is no good reason to hide it from the userspace. And it's also explicitly rejected now, because it's for in-kernel use only. Comments with warnings were added to avoid the problem coming back. (1 << type) converted to (1ULL << type) to avoid integer overflow on OVS_KEY_ATTR_IPV6_EXTHDRS, since it equals 32 now. [1] beb75a40fdc2 ("userspace: Switching of L3 packets in L2 pipeline") Fixes: 28a3f0601727 ("net: openvswitch: IPv6: Add IPv6 extension header support") Link: https://lore.kernel.org/netdev/3adf00c7-fe65-3ef4-b6d7-6d8a0cad8a5f@nvidia.com Link: https://github.com/openvswitch/ovs/commit/beb75a40fdc295bfd6521b0068b4cd12f6de507c Reported-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Aaron Conole <aconole@redhat.com> Link: https://lore.kernel.org/r/20220309222033.3018976-1-i.maximets@ovn.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>