aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2018-11-13Merge tag 'linux-can-fixes-for-4.20-20181109' of ↵Gravatar David S. Miller 1-7/+8
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2018-11-09 this is a pull request of 20 patches for net/master. First we have a patch by Oliver Hartkopp which changes the raw socket's raw_sendmsg() to return an error value if the user tries to send a CANFD frame to a CAN-2.0 device. The next two patches are by Jimmy Assarsson and fix potential problems in the kvaser_usb driver. YueHaibing's patches for the ucan driver fix a compile time warning and remove a duplicate include. Eugeniu Rosca patch adds more binding documentation to the rcar_can driver bindings. The next two patches are by Fabrizio Castro for the rcar_can driver and fixes a problem in the driver's probe function and document the r8a774a1 binding. Lukas Wunner's patch fixes a recpetion problem in hi311x driver by switching from edge to level triggered interruts. The next three patches all target the flexcan driver. Pankaj Bansal's patch unconditionally unlocks the last mailbox used for RX. Alexander Stein provides a better workaround for a hardware limitation when sending RTR frames, by using the last mailbox for TX, resulting in fewer lost frames. The patch by me simplyfies the driver, by making a runtime value a compile time constant. The following 4 patches are by me and provide the groundwork for the next patches by Oleksij Rempel. To avoid code duplication common code in the common CAN driver infrastructure is factured out and error handling is cleaned up. The next 4 patches are by Oleksij Rempel and fix the problem in the flexcan driver that other processes see TX frames arrive out of order with ragards to a RX'ed frame (which are send by a different system on the CAN bus as the result of our TX frame). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-13netfilter: nf_tables: fix use-after-free when deleting compat expressionsGravatar Florian Westphal 2-3/+5
nft_compat ops do not have static storage duration, unlike all other expressions. When nf_tables_expr_destroy() returns, expr->ops might have been free'd already, so we need to store next address before calling expression destructor. For same reason, we can't deref match pointer after nft_xt_put(). This can be easily reproduced by adding msleep() before nft_match_destroy() returns. Fixes: 0ca743a55991 ("netfilter: nf_tables: add compatibility layer for x_tables") Reported-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-11-13netfilter: xt_RATEEST: remove netns exit routineGravatar Taehee Yoo 1-10/+0
xt_rateest_net_exit() was added to check whether rules are flushed successfully. but ->net_exit() callback is called earlier than ->destroy() callback. So that ->net_exit() callback can't check that. test commands: %ip netns add vm1 %ip netns exec vm1 iptables -t mangle -I PREROUTING -p udp \ --dport 1111 -j RATEEST --rateest-name ap \ --rateest-interval 250ms --rateest-ewma 0.5s %ip netns del vm1 splat looks like: [ 668.813518] WARNING: CPU: 0 PID: 87 at net/netfilter/xt_RATEEST.c:210 xt_rateest_net_exit+0x210/0x340 [xt_RATEEST] [ 668.813518] Modules linked in: xt_RATEEST xt_tcpudp iptable_mangle bpfilter ip_tables x_tables [ 668.813518] CPU: 0 PID: 87 Comm: kworker/u4:2 Not tainted 4.19.0-rc7+ #21 [ 668.813518] Workqueue: netns cleanup_net [ 668.813518] RIP: 0010:xt_rateest_net_exit+0x210/0x340 [xt_RATEEST] [ 668.813518] Code: 00 48 8b 85 30 ff ff ff 4c 8b 23 80 38 00 0f 85 24 01 00 00 48 8b 85 30 ff ff ff 4d 85 e4 4c 89 a5 58 ff ff ff c6 00 f8 74 b2 <0f> 0b 48 83 c3 08 4c 39 f3 75 b0 48 b8 00 00 00 00 00 fc ff df 49 [ 668.813518] RSP: 0018:ffff8801156c73f8 EFLAGS: 00010282 [ 668.813518] RAX: ffffed0022ad8e85 RBX: ffff880118928e98 RCX: 5db8012a00000000 [ 668.813518] RDX: ffff8801156c7428 RSI: 00000000cb1d185f RDI: ffff880115663b74 [ 668.813518] RBP: ffff8801156c74d0 R08: ffff8801156633c0 R09: 1ffff100236440be [ 668.813518] R10: 0000000000000001 R11: ffffed002367d852 R12: ffff880115142b08 [ 668.813518] R13: 1ffff10022ad8e81 R14: ffff880118928ea8 R15: dffffc0000000000 [ 668.813518] FS: 0000000000000000(0000) GS:ffff88011b200000(0000) knlGS:0000000000000000 [ 668.813518] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 668.813518] CR2: 0000563aa69f4f28 CR3: 0000000105a16000 CR4: 00000000001006f0 [ 668.813518] Call Trace: [ 668.813518] ? unregister_netdevice_many+0xe0/0xe0 [ 668.813518] ? xt_rateest_net_init+0x2c0/0x2c0 [xt_RATEEST] [ 668.813518] ? default_device_exit+0x1ca/0x270 [ 668.813518] ? remove_proc_entry+0x1cd/0x390 [ 668.813518] ? dev_change_net_namespace+0xd00/0xd00 [ 668.813518] ? __init_waitqueue_head+0x130/0x130 [ 668.813518] ops_exit_list.isra.10+0x94/0x140 [ 668.813518] cleanup_net+0x45b/0x900 [ 668.813518] ? net_drop_ns+0x110/0x110 [ 668.813518] ? swapgs_restore_regs_and_return_to_usermode+0x3c/0x80 [ 668.813518] ? save_trace+0x300/0x300 [ 668.813518] ? lock_acquire+0x196/0x470 [ 668.813518] ? lock_acquire+0x196/0x470 [ 668.813518] ? process_one_work+0xb60/0x1de0 [ 668.813518] ? _raw_spin_unlock_irq+0x29/0x40 [ 668.813518] ? _raw_spin_unlock_irq+0x29/0x40 [ 668.813518] ? __lock_acquire+0x4500/0x4500 [ 668.813518] ? __lock_is_held+0xb4/0x140 [ 668.813518] process_one_work+0xc13/0x1de0 [ 668.813518] ? pwq_dec_nr_in_flight+0x3c0/0x3c0 [ 668.813518] ? set_load_weight+0x270/0x270 [ ... ] Fixes: 3427b2ab63fa ("netfilter: make xt_rateest hash table per net") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-11-12SUNRPC: Fix a bogus get/put in generic_key_to_expire()Gravatar Trond Myklebust 1-7/+1
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-11-12SUNRPC: Fix a Oops when destroying the RPCSEC_GSS credential cacheGravatar Trond Myklebust 1-19/+42
Commit 07d02a67b7fa causes a use-after free in the RPCSEC_GSS credential destroy code, because the call to get_rpccred() in gss_destroying_context() will now always fail to increment the refcount. While we could just replace the get_rpccred() with a refcount_set(), that would have the unfortunate consequence of resurrecting a credential in the credential cache for which we are in the process of destroying the RPCSEC_GSS context. Rather than do this, we choose to make a copy that is never added to the cache and use that to destroy the context. Fixes: 07d02a67b7fa ("SUNRPC: Simplify lookup code") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-11-12netfilter: nf_tables: don't use position attribute on rule replacementGravatar Florian Westphal 1-10/+7
Its possible to set both HANDLE and POSITION when replacing a rule. In this case, the rule at POSITION gets replaced using the userspace-provided handle. Rule handles are supposed to be generated by the kernel only. Duplicate handles should be harmless, however better disable this "feature" by only checking for the POSITION attribute on insert operations. Fixes: 5e94846686d0 ("netfilter: nf_tables: add insert operation") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-11-12netfilter: nf_tables: don't skip inactive chains during updateGravatar Florian Westphal 1-6/+3
There is no synchronization between packet path and the configuration plane. The packet path uses two arrays with rules, one contains the current (active) generation. The other either contains the last (obsolete) generation or the future one. Consider: cpu1 cpu2 nft_do_chain(c); delete c net->gen++; genbit = !!net->gen; rules = c->rg[genbit]; cpu1 ignores c when updating if c is not active anymore in the new generation. On cpu2, we now use rules from wrong generation, as c->rg[old] contains the rules matching 'c' whereas c->rg[new] was not updated and can even point to rules that have been free'd already, causing a crash. To fix this, make sure that 'current' to the 'next' generation are identical for chains that are going away so that c->rg[new] will just use the matching rules even if genbit was incremented already. Fixes: 0cbc06b3faba7 ("netfilter: nf_tables: remove synchronize_rcu in commit phase") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-11-12netfilter: nf_conncount: fix unexpected permanent node of list.Gravatar Taehee Yoo 1-3/+15
When list->count is 0, the list is deleted by GC. But list->count is never reached 0 because initial count value is 1 and it is increased when node is inserted. So that initial value of list->count should be 0. Originally GC always finds zero count list through deleting node and decreasing count. However, list may be left empty since node insertion may fail eg. allocaton problem. In order to solve this problem, GC routine also finds zero count list without deleting node. Fixes: cb2b36f5a97d ("netfilter: nf_conncount: Switch to plain list") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-11-12netfilter: nf_conncount: fix list_del corruption in conn_freeGravatar Taehee Yoo 1-2/+5
nf_conncount_tuple is an element of nft_connlimit and that is deleted by conn_free(). Elements can be deleted by both GC routine and data path functions (nf_conncount_lookup, nf_conncount_add) and they call conn_free() to free elements. But conn_free() only protects lists, not each element. So that list_del corruption could occurred. The conn_free() doesn't check whether element is already deleted. In order to protect elements, dead flag is added. If an element is deleted, dead flag is set. The only conn_free() can delete elements so that both list lock and dead flag are enough to protect it. test commands: %nft add table ip filter %nft add chain ip filter input { type filter hook input priority 0\; } %nft add rule filter input meter test { ip id ct count over 2 } counter splat looks like: [ 1779.495778] list_del corruption, ffff8800b6e12008->prev is LIST_POISON2 (dead000000000200) [ 1779.505453] ------------[ cut here ]------------ [ 1779.506260] kernel BUG at lib/list_debug.c:50! [ 1779.515831] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI [ 1779.516772] CPU: 0 PID: 33 Comm: kworker/0:2 Not tainted 4.19.0-rc6+ #22 [ 1779.516772] Workqueue: events_power_efficient nft_rhash_gc [nf_tables_set] [ 1779.516772] RIP: 0010:__list_del_entry_valid+0xd8/0x150 [ 1779.516772] Code: 39 48 83 c4 08 b8 01 00 00 00 5b 5d c3 48 89 ea 48 c7 c7 00 c3 5b 98 e8 0f dc 40 ff 0f 0b 48 c7 c7 60 c3 5b 98 e8 01 dc 40 ff <0f> 0b 48 c7 c7 c0 c3 5b 98 e8 f3 db 40 ff 0f 0b 48 c7 c7 20 c4 5b [ 1779.516772] RSP: 0018:ffff880119127420 EFLAGS: 00010286 [ 1779.516772] RAX: 000000000000004e RBX: dead000000000200 RCX: 0000000000000000 [ 1779.516772] RDX: 000000000000004e RSI: 0000000000000008 RDI: ffffed0023224e7a [ 1779.516772] RBP: ffff88011934bc10 R08: ffffed002367cea9 R09: ffffed002367cea9 [ 1779.516772] R10: 0000000000000001 R11: ffffed002367cea8 R12: ffff8800b6e12008 [ 1779.516772] R13: ffff8800b6e12010 R14: ffff88011934bc20 R15: ffff8800b6e12008 [ 1779.516772] FS: 0000000000000000(0000) GS:ffff88011b200000(0000) knlGS:0000000000000000 [ 1779.516772] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1779.516772] CR2: 00007fc876534010 CR3: 000000010da16000 CR4: 00000000001006f0 [ 1779.516772] Call Trace: [ 1779.516772] conn_free+0x9f/0x2b0 [nf_conncount] [ 1779.516772] ? nf_ct_tmpl_alloc+0x2a0/0x2a0 [nf_conntrack] [ 1779.516772] ? nf_conncount_add+0x520/0x520 [nf_conncount] [ 1779.516772] ? do_raw_spin_trylock+0x1a0/0x1a0 [ 1779.516772] ? do_raw_spin_trylock+0x10/0x1a0 [ 1779.516772] find_or_evict+0xe5/0x150 [nf_conncount] [ 1779.516772] nf_conncount_gc_list+0x162/0x360 [nf_conncount] [ 1779.516772] ? nf_conncount_lookup+0xee0/0xee0 [nf_conncount] [ 1779.516772] ? _raw_spin_unlock_irqrestore+0x45/0x50 [ 1779.516772] ? trace_hardirqs_off+0x6b/0x220 [ 1779.516772] ? trace_hardirqs_on_caller+0x220/0x220 [ 1779.516772] nft_rhash_gc+0x16b/0x540 [nf_tables_set] [ ... ] Fixes: 5c789e131cbb ("netfilter: nf_conncount: Add list lock and gc worker, and RCU for init tree search") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-11-12netfilter: nf_conncount: use spin_lock_bh instead of spin_lockGravatar Taehee Yoo 1-6/+6
conn_free() holds lock with spin_lock() and it is called by both nf_conncount_lookup() and nf_conncount_gc_list(). nf_conncount_lookup() is called from bottom-half context and nf_conncount_gc_list() from process context. So that spin_lock() call is not safe. Hence conn_free() should use spin_lock_bh() instead of spin_lock(). test commands: %nft add table ip filter %nft add chain ip filter input { type filter hook input priority 0\; } %nft add rule filter input meter test { ip saddr ct count over 2 } \ counter splat looks like: [ 461.996507] ================================ [ 461.998999] WARNING: inconsistent lock state [ 461.998999] 4.19.0-rc6+ #22 Not tainted [ 461.998999] -------------------------------- [ 461.998999] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. [ 461.998999] kworker/0:2/134 [HC0[0]:SC0[0]:HE1:SE1] takes: [ 461.998999] 00000000a71a559a (&(&list->list_lock)->rlock){+.?.}, at: conn_free+0x69/0x2b0 [nf_conncount] [ 461.998999] {IN-SOFTIRQ-W} state was registered at: [ 461.998999] _raw_spin_lock+0x30/0x70 [ 461.998999] nf_conncount_add+0x28a/0x520 [nf_conncount] [ 461.998999] nft_connlimit_eval+0x401/0x580 [nft_connlimit] [ 461.998999] nft_dynset_eval+0x32b/0x590 [nf_tables] [ 461.998999] nft_do_chain+0x497/0x1430 [nf_tables] [ 461.998999] nft_do_chain_ipv4+0x255/0x330 [nf_tables] [ 461.998999] nf_hook_slow+0xb1/0x160 [ ... ] [ 461.998999] other info that might help us debug this: [ 461.998999] Possible unsafe locking scenario: [ 461.998999] [ 461.998999] CPU0 [ 461.998999] ---- [ 461.998999] lock(&(&list->list_lock)->rlock); [ 461.998999] <Interrupt> [ 461.998999] lock(&(&list->list_lock)->rlock); [ 461.998999] [ 461.998999] *** DEADLOCK *** [ 461.998999] [ ... ] Fixes: 5c789e131cbb ("netfilter: nf_conncount: Add list lock and gc worker, and RCU for init tree search") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-11-12batman-adv: Expand merged fragment buffer for full packetGravatar Sven Eckelmann 1-1/+1
The complete size ("total_size") of the fragmented packet is stored in the fragment header and in the size of the fragment chain. When the fragments are ready for merge, the skbuff's tail of the first fragment is expanded to have enough room after the data pointer for at least total_size. This means that it gets expanded by total_size - first_skb->len. But this is ignoring the fact that after expanding the buffer, the fragment header is pulled by from this buffer. Assuming that the tailroom of the buffer was already 0, the buffer after the data pointer of the skbuff is now only total_size - len(fragment_header) large. When the merge function is then processing the remaining fragments, the code to copy the data over to the merged skbuff will cause an skb_over_panic when it tries to actually put enough data to fill the total_size bytes of the packet. The size of the skb_pull must therefore also be taken into account when the buffer's tailroom is expanded. Fixes: 610bfc6bc99b ("batman-adv: Receive fragmented packets and merge") Reported-by: Martin Weinelt <martin@darmstadt.freifunk.net> Co-authored-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-11-12batman-adv: Use explicit tvlv padding for ELP packetsGravatar Sven Eckelmann 1-2/+4
The announcement messages of batman-adv COMPAT_VERSION 15 have the possibility to announce additional information via a dynamic TVLV part. This part is optional for the ELP packets and currently not parsed by the Linux implementation. Still out-of-tree versions are using it to transport things like neighbor hashes to optimize the rebroadcast behavior. Since the ELP broadcast packets are smaller than the minimal ethernet packet, it often has to be padded. This is often done (as specified in RFC894) with octets of zero and thus work perfectly fine with the TVLV part (making it a zero length and thus empty). But not all ethernet compatible hardware seems to follow this advice. To avoid ambiguous situations when parsing the TVLV header, just force the 4 bytes (TVLV length + padding) after the required ELP header to zero. Fixes: d6f94d91f766 ("batman-adv: ELP - adding basic infrastructure") Reported-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2018-11-11act_mirred: clear skb->tstamp on redirectGravatar Eric Dumazet 2-10/+2
If sch_fq is used at ingress, skbs that might have been timestamped by net_timestamp_set() if a packet capture is requesting timestamps could be delayed by arbitrary amount of time, since sch_fq time base is MONOTONIC. Fix this problem by moving code from sch_netem.c to act_mirred.c. Fixes: fb420d5d91c1 ("tcp/fq: move back to CLOCK_MONOTONIC") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-11tipc: fix link re-establish failureGravatar Jon Maloy 1-4/+7
When a link failure is detected locally, the link is reset, the flag link->in_session is set to false, and a RESET_MSG with the 'stopping' bit set is sent to the peer. The purpose of this bit is to inform the peer that this endpoint just is going down, and that the peer should handle the reception of this particular RESET message as a local failure. This forces the peer to accept another RESET or ACTIVATE message from this endpoint before it can re-establish the link. This again is necessary to ensure that link session numbers are properly exchanged before the link comes up again. If a failure is detected locally at the same time at the peer endpoint this will do the same, which is also a correct behavior. However, when receiving such messages, the endpoints will not distinguish between 'stopping' RESETs and ordinary ones when it comes to updating session numbers. Both endpoints will copy the received session number and set their 'in_session' flags to true at the reception, while they are still expecting another RESET from the peer before they can go ahead and re-establish. This is contradictory, since, after applying the validation check referred to below, the 'in_session' flag will cause rejection of all such messages, and the link will never come up again. We now fix this by not only handling received RESET/STOPPING messages as a local failure, but also by omitting to set a new session number and the 'in_session' flag in such cases. Fixes: 7ea817f4e832 ("tipc: check session number before accepting link protocol messages") Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-10net: sched: cls_flower: validate nested enc_opts_policy to avoid warningGravatar Jakub Kicinski 1-1/+13
TCA_FLOWER_KEY_ENC_OPTS and TCA_FLOWER_KEY_ENC_OPTS_MASK can only currently contain further nested attributes, which are parsed by hand, so the policy is never actually used resulting in a W=1 build warning: net/sched/cls_flower.c:492:1: warning: ‘enc_opts_policy’ defined but not used [-Wunused-const-variable=] enc_opts_policy[TCA_FLOWER_KEY_ENC_OPTS_MAX + 1] = { Add the validation anyway to avoid potential bugs when other attributes are added and to make the attribute structure slightly more clear. Validation will also set extact to point to bad attribute on error. Fixes: 0a6e77784f49 ("net/sched: allow flower to match tunnel options") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Simon Horman <simon.horman@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-09flow_dissector: do not dissect l4 ports for fragmentsGravatar 배석진 1-2/+2
Only first fragment has the sport/dport information, not the following ones. If we want consistent hash for all fragments, we need to ignore ports even for first fragment. This bug is visible for IPv6 traffic, if incoming fragments do not have a flow label, since skb_get_hash() will give different results for first fragment and following ones. It is also visible if any routing rule wants dissection and sport or dport. See commit 5e5d6fed3741 ("ipv6: route: dissect flow in input path if fib rules need it") for details. [edumazet] rewrote the changelog completely. Fixes: 06635a35d13d ("flow_dissect: use programable dissector in skb_flow_dissect and friends") Signed-off-by: 배석진 <soukjin.bae@samsung.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-09can: raw: check for CAN FD capable netdev in raw_sendmsg()Gravatar Oliver Hartkopp 1-7/+8
When the socket is CAN FD enabled it can handle CAN FD frame transmissions. Add an additional check in raw_sendmsg() as a CAN2.0 CAN driver (non CAN FD) should never see a CAN FD frame. Due to the commonly used can_dropped_invalid_skb() function the CAN 2.0 driver would drop that CAN FD frame anyway - but with this patch the user gets a proper -EINVAL return code. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2018-11-09bpf: Fix IPv6 dport byte order in bpf_sk_lookup_udpGravatar Andrey Ignatov 1-3/+2
Lookup functions in sk_lookup have different expectations about byte order of provided arguments. Specifically __inet_lookup, __udp4_lib_lookup and __udp6_lib_lookup expect dport to be in network byte order and do ntohs(dport) internally. At the same time __inet6_lookup expects dport to be in host byte order and correspondingly name the argument hnum. sk_lookup works correctly with __inet_lookup, __udp4_lib_lookup and __inet6_lookup with regard to dport. But in __udp6_lib_lookup case it uses host instead of expected network byte order. It makes result returned by bpf_sk_lookup_udp for IPv6 incorrect. The patch fixes byte order of dport passed to __udp6_lib_lookup. Originally sk_lookup properly handled UDPv6, but not TCPv6. 5ef0ae84f02a fixes TCPv6 but breaks UDPv6. Fixes: 5ef0ae84f02a ("bpf: Fix IPv6 dport byte-order in bpf_sk_lookup") Signed-off-by: Andrey Ignatov <rdna@fb.com> Acked-by: Joe Stringer <joe@wand.net.nz> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-08inet: frags: better deal with smp racesGravatar Eric Dumazet 1-14/+15
Multiple cpus might attempt to insert a new fragment in rhashtable, if for example RPS is buggy, as reported by 배석진 in https://patchwork.ozlabs.org/patch/994601/ We use rhashtable_lookup_get_insert_key() instead of rhashtable_insert_fast() to let cpus losing the race free their own inet_frag_queue and use the one that was inserted by another cpu. Fixes: 648700f76b03 ("inet: frags: use rhashtables for reassembly units") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: 배석진 <soukjin.bae@samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08SUNRPC: drop pointless static qualifier in xdr_get_next_encode_buffer()Gravatar YueHaibing 1-1/+1
There is no need to have the '__be32 *p' variable static since new value always be assigned before use it. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-11-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netGravatar Linus Torvalds 32-268/+302
Pull networking fixes from David Miller: 1) Handle errors mid-stream of an all dump, from Alexey Kodanev. 2) Fix build of openvswitch with certain combinations of netfilter options, from Arnd Bergmann. 3) Fix interactions between GSO and BQL, from Eric Dumazet. 4) Don't put a '/' in RTL8201F's sysfs file name, from Holger Hoffstätte. 5) S390 qeth driver fixes from Julian Wiedmann. 6) Allow ipv6 link local addresses for netconsole when both source and destination are link local, from Matwey V. Kornilov. 7) Fix the BPF program address seen in /proc/kallsyms, from Song Liu. 8) Initialize mutex before use in dsa microchip driver, from Tristram Ha. 9) Out-of-bounds access in hns3, from Yunsheng Lin. 10) Various netfilter fixes from Stefano Brivio, Jozsef Kadlecsik, Jiri Slaby, Florian Westphal, Eric Westbrook, Andrey Ryabinin, and Pablo Neira Ayuso. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (50 commits) net: alx: make alx_drv_name static net: bpfilter: fix iptables failure if bpfilter_umh is disabled sock_diag: fix autoloading of the raw_diag module net: core: netpoll: Enable netconsole IPv6 link local address ipv6: properly check return value in inet6_dump_all() rtnetlink: restore handling of dumpit return value in rtnl_dump_all() net/ipv6: Move anycast init/cleanup functions out of CONFIG_PROC_FS bonding/802.3ad: fix link_failure_count tracking net: phy: realtek: fix RTL8201F sysfs name sctp: define SCTP_SS_DEFAULT for Stream schedulers sctp: fix strchange_flags name for Stream Change Event mlxsw: spectrum: Fix IP2ME CPU policer configuration openvswitch: fix linking without CONFIG_NF_CONNTRACK_LABELS qed: fix link config error handling net: hns3: Fix for out-of-bounds access when setting pfc back pressure net/mlx4_en: use __netdev_tx_sent_queue() net: do not abort bulk send on BQL status net: bql: add __netdev_tx_sent_queue() s390/qeth: report 25Gbit link speed s390/qeth: sanitize ARP requests ...
2018-11-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfGravatar David S. Miller 17-242/+152
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains the first batch of Netfilter fixes for your net tree: 1) Fix splat with IPv6 defragmenting locally generated fragments, from Florian Westphal. 2) Fix Incorrect check for missing attribute in nft_osf. 3) Missing INT_MIN & INT_MAX definition for netfilter bridge uapi header, from Jiri Slaby. 4) Revert map lookup in nft_numgen, this is already possible with the existing infrastructure without this extension. 5) Fix wrong listing of set reference counter, make counter synchronous again, from Stefano Brivio. 6) Fix CIDR 0 in hash:net,port,net, from Eric Westbrook. 7) Fix allocation failure with large set, use kvcalloc(). From Andrey Ryabinin. 8) No need to disable BH when fetch ip set comment, patch from Jozsef Kadlecsik. 9) Sanity check for valid sysfs entry in xt_IDLETIMER, from Taehee Yoo. 10) Fix suspicious rcu usage via ip_set() macro at netlink dump, from Jozsef Kadlecsik. 11) Fix setting default timeout via nfnetlink_cttimeout, this comes with preparation patch to add nf_{tcp,udp,...}_pernet() helper. 12) Allow ebtables table nat to be of filter type via nft_compat. From Florian Westphal. 13) Incorrect calculation of next bucket in early_drop, do no bump hash value, update bucket counter instead. From Vasily Khoruzhick. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-05net: bpfilter: fix iptables failure if bpfilter_umh is disabledGravatar Taehee Yoo 1-3/+3
When iptables command is executed, ip_{set/get}sockopt() try to upload bpfilter.ko if bpfilter is enabled. if it couldn't find bpfilter.ko, command is failed. bpfilter.ko is generated if CONFIG_BPFILTER_UMH is enabled. ip_{set/get}sockopt() only checks CONFIG_BPFILTER. So that if CONFIG_BPFILTER is enabled and CONFIG_BPFILTER_UMH is disabled, iptables command is always failed. test config: CONFIG_BPFILTER=y # CONFIG_BPFILTER_UMH is not set test command: %iptables -L iptables: No chain/target/match by that name. Fixes: d2ba09c17a06 ("net: add skeleton of bpfilter kernel module") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-05sock_diag: fix autoloading of the raw_diag moduleGravatar Andrei Vagin 1-0/+1
IPPROTO_RAW isn't registred as an inet protocol, so inet_protos[protocol] is always NULL for it. Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Xin Long <lucien.xin@gmail.com> Fixes: bf2ae2e4bf93 ("sock_diag: request _diag module only when the family or proto has been registered") Signed-off-by: Andrei Vagin <avagin@gmail.com> Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-05net: core: netpoll: Enable netconsole IPv6 link local addressGravatar Matwey V. Kornilov 1-1/+2
There is no reason to discard using source link local address when remote netconsole IPv6 address is set to be link local one. The patch allows administrators to use IPv6 netconsole without explicitly configuring source address: netconsole=@/,@fe80::5054:ff:fe2f:6012/ Signed-off-by: Matwey V. Kornilov <matwey@sai.msu.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-05ipv6: properly check return value in inet6_dump_all()Gravatar Alexey Kodanev 1-2/+2
Make sure we call fib6_dump_end() if it happens that skb->len is zero. rtnl_dump_all() can reset cb->args on the next loop iteration there. Fixes: 08e814c9e8eb ("net/ipv6: Bail early if user only wants cloned entries") Fixes: ae677bbb4441 ("net: Don't return invalid table id error when dumping all families") Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-05rtnetlink: restore handling of dumpit return value in rtnl_dump_all()Gravatar Alexey Kodanev 1-1/+1
For non-zero return from dumpit() we should break the loop in rtnl_dump_all() and return the result. Otherwise, e.g., we could get the memory leak in inet6_dump_fib() [1]. The pointer to the allocated struct fib6_walker there (saved in cb->args) can be lost, reset on the next iteration. Fix it by partially restoring the previous behavior before commit c63586dc9b3e ("net: rtnl_dump_all needs to propagate error from dumpit function"). The returned error from dumpit() is still passed further. [1]: unreferenced object 0xffff88001322a200 (size 96): comm "sshd", pid 1484, jiffies 4296032768 (age 1432.542s) hex dump (first 32 bytes): 00 01 00 00 00 00 ad de 00 02 00 00 00 00 ad de ................ 18 09 41 36 00 88 ff ff 18 09 41 36 00 88 ff ff ..A6......A6.... backtrace: [<0000000095846b39>] kmem_cache_alloc_trace+0x151/0x220 [<000000007d12709f>] inet6_dump_fib+0x68d/0x940 [<000000002775a316>] rtnl_dump_all+0x1d9/0x2d0 [<00000000d7cd302b>] netlink_dump+0x945/0x11a0 [<000000002f43485f>] __netlink_dump_start+0x55d/0x800 [<00000000f76bbeec>] rtnetlink_rcv_msg+0x4fa/0xa00 [<000000009b5761f3>] netlink_rcv_skb+0x29c/0x420 [<0000000087a1dae1>] rtnetlink_rcv+0x15/0x20 [<00000000691b703b>] netlink_unicast+0x4e3/0x6c0 [<00000000b5be0204>] netlink_sendmsg+0x7f2/0xba0 [<0000000096d2aa60>] sock_sendmsg+0xba/0xf0 [<000000008c1b786f>] __sys_sendto+0x1e4/0x330 [<0000000019587b3f>] __x64_sys_sendto+0xe1/0x1a0 [<00000000071f4d56>] do_syscall_64+0x9f/0x300 [<000000002737577f>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [<0000000057587684>] 0xffffffffffffffff Fixes: c63586dc9b3e ("net: rtnl_dump_all needs to propagate error from dumpit function") Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-05net/ipv6: Move anycast init/cleanup functions out of CONFIG_PROC_FSGravatar Jeff Barnhill 1-1/+1
Move the anycast.c init and cleanup functions which were inadvertently added inside the CONFIG_PROC_FS definition. Fixes: 2384d02520ff ("net/ipv6: Add anycast addresses to a global hashtable") Signed-off-by: Jeff Barnhill <0xeffeff@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-05sunrpc: correct the computation for page_ptr when truncatingGravatar Frank Sorenson 1-3/+2
When truncating the encode buffer, the page_ptr is getting advanced, causing the next page to be skipped while encoding. The page is still included in the response, so the response contains a page of bogus data. We need to adjust the page_ptr backwards to ensure we encode the next page into the correct place. We saw this triggered when concurrent directory modifications caused nfsd4_encode_direct_fattr() to return nfserr_noent, and the resulting call to xdr_truncate_encode() corrupted the READDIR reply. Signed-off-by: Frank Sorenson <sorenson@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-11-04Merge tag 'nfs-for-4.20-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsGravatar Linus Torvalds 3-34/+14
Pull NFS client bugfixes from Trond Myklebust: "Highlights include: Bugfix: - Fix build issues on architectures that don't provide 64-bit cmpxchg Cleanups: - Fix a spelling mistake" * tag 'nfs-for-4.20-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFS: fix spelling mistake, EACCESS -> EACCES SUNRPC: Use atomic(64)_t for seq_send(64)
2018-11-03sctp: define SCTP_SS_DEFAULT for Stream schedulersGravatar Xin Long 1-1/+1
According to rfc8260#section-4.3.2, SCTP_SS_DEFAULT is required to defined as SCTP_SS_FCFS or SCTP_SS_RR. SCTP_SS_FCFS is used for SCTP_SS_DEFAULT's value in this patch. Fixes: 5bbbbe32a431 ("sctp: introduce stream scheduler foundations") Reported-by: Jianwen Ji <jiji@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-03openvswitch: fix linking without CONFIG_NF_CONNTRACK_LABELSGravatar Arnd Bergmann 1-1/+2
When CONFIG_CC_OPTIMIZE_FOR_DEBUGGING is enabled, the compiler fails to optimize out a dead code path, which leads to a link failure: net/openvswitch/conntrack.o: In function `ovs_ct_set_labels': conntrack.c:(.text+0x2e60): undefined reference to `nf_connlabels_replace' In this configuration, we can take a shortcut, and completely remove the contrack label code. This may also help the regular optimization. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-03Merge branch 'x86-urgent-for-linus' of ↵Gravatar Linus Torvalds 2-4/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "A number of fixes and some late updates: - make in_compat_syscall() behavior on x86-32 similar to other platforms, this touches a number of generic files but is not intended to impact non-x86 platforms. - objtool fixes - PAT preemption fix - paravirt fixes/cleanups - cpufeatures updates for new instructions - earlyprintk quirk - make microcode version in sysfs world-readable (it is already world-readable in procfs) - minor cleanups and fixes" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: compat: Cleanup in_compat_syscall() callers x86/compat: Adjust in_compat_syscall() to generic code under !COMPAT objtool: Support GCC 9 cold subfunction naming scheme x86/numa_emulation: Fix uniform-split numa emulation x86/paravirt: Remove unused _paravirt_ident_32 x86/mm/pat: Disable preemption around __flush_tlb_all() x86/paravirt: Remove GPL from pv_ops export x86/traps: Use format string with panic() call x86: Clean up 'sizeof x' => 'sizeof(x)' x86/cpufeatures: Enumerate MOVDIR64B instruction x86/cpufeatures: Enumerate MOVDIRI instruction x86/earlyprintk: Add a force option for pciserial device objtool: Support per-function rodata sections x86/microcode: Make revision and processor flags world-readable
2018-11-03Merge branch 'core/urgent' into x86/urgent, to pick up objtool fixGravatar Ingo Molnar 32-823/+1006
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-03net: do not abort bulk send on BQL statusGravatar Eric Dumazet 1-1/+1
Before calling dev_hard_start_xmit(), upper layers tried to cook optimal skb list based on BQL budget. Problem is that GSO packets can end up comsuming more than the BQL budget. Breaking the loop is not useful, since requeued packets are ahead of any packets still in the qdisc. It is also more expensive, since next TX completion will push these packets later, while skbs are not in cpu caches. It is also a behavior difference with TSO packets, that can break the BQL limit by a large amount. Note that drivers should use __netdev_tx_sent_queue() in order to have optimal xmit_more support, and avoid useless atomic operations as shown in the following patch. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-03Merge branch 'work.afs' of ↵Gravatar Linus Torvalds 1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull 9p fix from Al Viro: "Regression fix for net/9p handling of iov_iter; broken by braino when switching to iov_iter_is_kvec() et.al., spotted and fixed by Marc" * 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: iov_iter: Fix 9p virtio breakage
2018-11-03netfilter: conntrack: fix calculation of next bucket number in early_dropGravatar Vasily Khoruzhick 1-5/+8
If there's no entry to drop in bucket that corresponds to the hash, early_drop() should look for it in other buckets. But since it increments hash instead of bucket number, it actually looks in the same bucket 8 times: hsize is 16k by default (14 bits) and hash is 32-bit value, so reciprocal_scale(hash, hsize) returns the same value for hash..hash+7 in most cases. Fix it by increasing bucket number instead of hash and rename _hash to bucket to avoid future confusion. Fixes: 3e86638e9a0b ("netfilter: conntrack: consider ct netns in early_drop logic") Cc: <stable@vger.kernel.org> # v4.7+ Signed-off-by: Vasily Khoruzhick <vasilykh@arista.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-11-03netfilter: nft_compat: ebtables 'nat' table is normal chain typeGravatar Florian Westphal 1-9/+12
Unlike ip(6)tables, the ebtables nat table has no special properties. This bug causes 'ebtables -A' to fail when using a target such as 'snat' (ebt_snat target sets ".table = "nat"'). Targets that have no table restrictions work fine. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-11-03netfilter: nfnetlink_cttimeout: pass default timeout policy to obj_to_nlattrGravatar Pablo Neira Ayuso 1-7/+40
Otherwise, we hit a NULL pointer deference since handlers always assume default timeout policy is passed. netlink: 24 bytes leftover after parsing attributes in process `syz-executor2'. kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 9575 Comm: syz-executor1 Not tainted 4.19.0+ #312 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:icmp_timeout_obj_to_nlattr+0x77/0x170 net/netfilter/nf_conntrack_proto_icmp.c:297 Fixes: c779e849608a ("netfilter: conntrack: remove get_timeout() indirection") Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-11-03netfilter: conntrack: add nf_{tcp,udp,sctp,icmp,dccp,icmpv6,generic}_pernet()Gravatar Pablo Neira Ayuso 7-59/+24
Expose these functions to access conntrack protocol tracker netns area, nfnetlink_cttimeout needs this. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-11-03netfilter: ipset: Fix calling ip_set() macro at dumpingGravatar Jozsef Kadlecsik 1-4/+8
The ip_set() macro is called when either ip_set_ref_lock held only or no lock/nfnl mutex is held at dumping. Take this into account properly. Also, use Pablo's suggestion to use rcu_dereference_raw(), the ref_netlink protects the set. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-11-03netfilter: xt_IDLETIMER: add sysfs filename checking routineGravatar Taehee Yoo 1-0/+20
When IDLETIMER rule is added, sysfs file is created under /sys/class/xt_idletimer/timers/ But some label name shouldn't be used. ".", "..", "power", "uevent", "subsystem", etc... So that sysfs filename checking routine is needed. test commands: %iptables -I INPUT -j IDLETIMER --timeout 1 --label "power" splat looks like: [95765.423132] sysfs: cannot create duplicate filename '/devices/virtual/xt_idletimer/timers/power' [95765.433418] CPU: 0 PID: 8446 Comm: iptables Not tainted 4.19.0-rc6+ #20 [95765.449755] Call Trace: [95765.449755] dump_stack+0xc9/0x16b [95765.449755] ? show_regs_print_info+0x5/0x5 [95765.449755] sysfs_warn_dup+0x74/0x90 [95765.449755] sysfs_add_file_mode_ns+0x352/0x500 [95765.449755] sysfs_create_file_ns+0x179/0x270 [95765.449755] ? sysfs_add_file_mode_ns+0x500/0x500 [95765.449755] ? idletimer_tg_checkentry+0x3e5/0xb1b [xt_IDLETIMER] [95765.449755] ? rcu_read_lock_sched_held+0x114/0x130 [95765.449755] ? __kmalloc_track_caller+0x211/0x2b0 [95765.449755] ? memcpy+0x34/0x50 [95765.449755] idletimer_tg_checkentry+0x4e2/0xb1b [xt_IDLETIMER] [ ... ] Fixes: 0902b469bd25 ("netfilter: xtables: idletimer target implementation") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-11-02rxrpc: Fix lockup due to no error backoff after ack transmit errorGravatar David Howells 3-8/+46
If the network becomes (partially) unavailable, say by disabling IPv6, the background ACK transmission routine can get itself into a tizzy by proposing immediate ACK retransmission. Since we're in the call event processor, that happens immediately without returning to the workqueue manager. The condition should clear after a while when either the network comes back or the call times out. Fix this by: (1) When re-proposing an ACK on failed Tx, don't schedule it immediately. This will allow a certain amount of time to elapse before we try again. (2) Enforce a return to the workqueue manager after a certain number of iterations of the call processing loop. (3) Add a backoff delay that increases the delay on deferred ACKs by a jiffy per failed transmission to a limit of HZ. The backoff delay is cleared on a successful return from kernel_sendmsg(). (4) Cancel calls immediately if the opening sendmsg fails. The layer above can arrange retransmission or rotate to another server. Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-02net/ipv6: Add anycast addresses to a global hashtableGravatar Jeff Barnhill 2-4/+81
icmp6_send() function is expensive on systems with a large number of interfaces. Every time it’s called, it has to verify that the source address does not correspond to an existing anycast address by looping through every device and every anycast address on the device. This can result in significant delays for a CPU when there are a large number of neighbors and ND timers are frequently timing out and calling neigh_invalidate(). Add anycast addresses to a global hashtable to allow quick searching for matching anycast addresses. This is based on inet6_addr_lst in addrconf.c. Signed-off-by: Jeff Barnhill <0xeffeff@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-02net: document skb parameter in function 'skb_gso_size_check'Gravatar Mathieu Malaterre 1-0/+2
Remove kernel-doc warning: net/core/skbuff.c:4953: warning: Function parameter or member 'skb' not described in 'skb_gso_size_check' Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-02iov_iter: Fix 9p virtio breakageGravatar Marc Zyngier 1-1/+1
When switching to the new iovec accessors, a negation got subtly dropped, leading to 9p being remarkably broken (here with kvmtool): [ 7.430941] VFS: Mounted root (9p filesystem) on device 0:15. [ 7.432080] devtmpfs: mounted [ 7.432717] Freeing unused kernel memory: 1344K [ 7.433658] Run /virt/init as init process Warning: unable to translate guest address 0x7e00902ff000 to host Warning: unable to translate guest address 0x7e00902fefc0 to host Warning: unable to translate guest address 0x7e00902ff000 to host Warning: unable to translate guest address 0x7e008febef80 to host Warning: unable to translate guest address 0x7e008febf000 to host Warning: unable to translate guest address 0x7e008febef00 to host Warning: unable to translate guest address 0x7e008febf000 to host [ 7.436376] Kernel panic - not syncing: Requested init /virt/init failed (error -8). [ 7.437554] CPU: 29 PID: 1 Comm: swapper/0 Not tainted 4.19.0-rc8-02267-g00e23707442a #291 [ 7.439006] Hardware name: linux,dummy-virt (DT) [ 7.439902] Call trace: [ 7.440387] dump_backtrace+0x0/0x148 [ 7.441104] show_stack+0x14/0x20 [ 7.441768] dump_stack+0x90/0xb4 [ 7.442425] panic+0x120/0x27c [ 7.443036] kernel_init+0xa4/0x100 [ 7.443725] ret_from_fork+0x10/0x18 [ 7.444444] SMP: stopping secondary CPUs [ 7.445391] Kernel Offset: disabled [ 7.446169] CPU features: 0x0,23000438 [ 7.446974] Memory Limit: none [ 7.447645] ---[ end Kernel panic - not syncing: Requested init /virt/init failed (error -8). ]--- Restoring the missing "!" brings the guest back to life. Fixes: 00e23707442a ("iov_iter: Use accessor function") Reported-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-11-01Merge branch 'work.afs' of ↵Gravatar Linus Torvalds 14-22/+22
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull AFS updates from Al Viro: "AFS series, with some iov_iter bits included" * 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits) missing bits of "iov_iter: Separate type from direction and use accessor functions" afs: Probe multiple fileservers simultaneously afs: Fix callback handling afs: Eliminate the address pointer from the address list cursor afs: Allow dumping of server cursor on operation failure afs: Implement YFS support in the fs client afs: Expand data structure fields to support YFS afs: Get the target vnode in afs_rmdir() and get a callback on it afs: Calc callback expiry in op reply delivery afs: Fix FS.FetchStatus delivery from updating wrong vnode afs: Implement the YFS cache manager service afs: Remove callback details from afs_callback_break struct afs: Commit the status on a new file/dir/symlink afs: Increase to 64-bit volume ID and 96-bit vnode ID for YFS afs: Don't invoke the server to read data beyond EOF afs: Add a couple of tracepoints to log I/O errors afs: Handle EIO from delivery function afs: Fix TTL on VL server and address lists afs: Implement VL server rotation afs: Improve FS server rotation error handling ...
2018-11-01missing bits of "iov_iter: Separate type from direction and use accessor ↵Gravatar Al Viro 1-2/+2
functions" sunrpc patches from nfs tree conflict with calling conventions change done in iov_iter work. Trivial fixup... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-11-01Merge tag 'nfs-for-4.20-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsGravatar Al Viro 26-1590/+1921
backmerge to do fixup of iov_iter_kvec() conflict
2018-11-01net: drop skb on failure in ip_check_defrag()Gravatar Cong Wang 1-4/+8
Most callers of pskb_trim_rcsum() simply drop the skb when it fails, however, ip_check_defrag() still continues to pass the skb up to stack. This is suspicious. In ip_check_defrag(), after we learn the skb is an IP fragment, passing the skb to callers makes no sense, because callers expect fragments are defrag'ed on success. So, dropping the skb when we can't defrag it is reasonable. Note, prior to commit 88078d98d1bb, this is not a big problem as checksum will be fixed up anyway. After it, the checksum is not correct on failure. Found this during code review. Fixes: 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends") Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>