aboutsummaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2022-06-29bpf: expose bpf_{g,s}etsockopt to lsm cgroupGravatar Stanislav Fomichev 1-0/+38
I don't see how to make it nice without introducing btf id lists for the hooks where these helpers are allowed. Some LSM hooks work on the locked sockets, some are triggering early and don't grab any locks, so have two lists for now: 1. LSM hooks which trigger under socket lock - minority of the hooks, but ideal case for us, we can expose existing BTF-based helpers 2. LSM hooks which trigger without socket lock, but they trigger early in the socket creation path where it should be safe to do setsockopt without any locks 3. The rest are prohibited. I'm thinking that this use-case might be a good gateway to sleeping lsm cgroup hooks in the future. We can either expose lock/unlock operations (and add tracking to the verifier) or have another set of bpf_setsockopt wrapper that grab the locks and might sleep. Reviewed-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20220628174314.1216643-7-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-29bpf: implement BPF_PROG_QUERY for BPF_LSM_CGROUPGravatar Stanislav Fomichev 2-32/+71
We have two options: 1. Treat all BPF_LSM_CGROUP the same, regardless of attach_btf_id 2. Treat BPF_LSM_CGROUP+attach_btf_id as a separate hook point I was doing (2) in the original patch, but switching to (1) here: * bpf_prog_query returns all attached BPF_LSM_CGROUP programs regardless of attach_btf_id * attach_btf_id is exported via bpf_prog_info Reviewed-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20220628174314.1216643-6-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-29bpf: minimize number of allocated lsm slots per programGravatar Stanislav Fomichev 5-16/+54
Previous patch adds 1:1 mapping between all 211 LSM hooks and bpf_cgroup program array. Instead of reserving a slot per possible hook, reserve 10 slots per cgroup for lsm programs. Those slots are dynamically allocated on demand and reclaimed. struct cgroup_bpf { struct bpf_prog_array * effective[33]; /* 0 264 */ /* --- cacheline 4 boundary (256 bytes) was 8 bytes ago --- */ struct hlist_head progs[33]; /* 264 264 */ /* --- cacheline 8 boundary (512 bytes) was 16 bytes ago --- */ u8 flags[33]; /* 528 33 */ /* XXX 7 bytes hole, try to pack */ struct list_head storages; /* 568 16 */ /* --- cacheline 9 boundary (576 bytes) was 8 bytes ago --- */ struct bpf_prog_array * inactive; /* 584 8 */ struct percpu_ref refcnt; /* 592 16 */ struct work_struct release_work; /* 608 72 */ /* size: 680, cachelines: 11, members: 7 */ /* sum members: 673, holes: 1, sum holes: 7 */ /* last cacheline: 40 bytes */ }; Reviewed-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20220628174314.1216643-5-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-29bpf: per-cgroup lsm flavorGravatar Stanislav Fomichev 7-11/+426
Allow attaching to lsm hooks in the cgroup context. Attaching to per-cgroup LSM works exactly like attaching to other per-cgroup hooks. New BPF_LSM_CGROUP is added to trigger new mode; the actual lsm hook we attach to is signaled via existing attach_btf_id. For the hooks that have 'struct socket' or 'struct sock' as its first argument, we use the cgroup associated with that socket. For the rest, we use 'current' cgroup (this is all on default hierarchy == v2 only). Note that for some hooks that work on 'struct sock' we still take the cgroup from 'current' because some of them work on the socket that hasn't been properly initialized yet. Behind the scenes, we allocate a shim program that is attached to the trampoline and runs cgroup effective BPF programs array. This shim has some rudimentary ref counting and can be shared between several programs attaching to the same lsm hook from different cgroups. Note that this patch bloats cgroup size because we add 211 cgroup_bpf_attach_type(s) for simplicity sake. This will be addressed in the subsequent patch. Also note that we only add non-sleepable flavor for now. To enable sleepable use-cases, bpf_prog_run_array_cg has to grab trace rcu, shim programs have to be freed via trace rcu, cgroup_bpf.effective should be also trace-rcu-managed + maybe some other changes that I'm not aware of. Reviewed-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20220628174314.1216643-4-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-29bpf: convert cgroup_bpf.progs to hlistGravatar Stanislav Fomichev 1-32/+44
This lets us reclaim some space to be used by new cgroup lsm slots. Before: struct cgroup_bpf { struct bpf_prog_array * effective[23]; /* 0 184 */ /* --- cacheline 2 boundary (128 bytes) was 56 bytes ago --- */ struct list_head progs[23]; /* 184 368 */ /* --- cacheline 8 boundary (512 bytes) was 40 bytes ago --- */ u32 flags[23]; /* 552 92 */ /* XXX 4 bytes hole, try to pack */ /* --- cacheline 10 boundary (640 bytes) was 8 bytes ago --- */ struct list_head storages; /* 648 16 */ struct bpf_prog_array * inactive; /* 664 8 */ struct percpu_ref refcnt; /* 672 16 */ struct work_struct release_work; /* 688 32 */ /* size: 720, cachelines: 12, members: 7 */ /* sum members: 716, holes: 1, sum holes: 4 */ /* last cacheline: 16 bytes */ }; After: struct cgroup_bpf { struct bpf_prog_array * effective[23]; /* 0 184 */ /* --- cacheline 2 boundary (128 bytes) was 56 bytes ago --- */ struct hlist_head progs[23]; /* 184 184 */ /* --- cacheline 5 boundary (320 bytes) was 48 bytes ago --- */ u8 flags[23]; /* 368 23 */ /* XXX 1 byte hole, try to pack */ /* --- cacheline 6 boundary (384 bytes) was 8 bytes ago --- */ struct list_head storages; /* 392 16 */ struct bpf_prog_array * inactive; /* 408 8 */ struct percpu_ref refcnt; /* 416 16 */ struct work_struct release_work; /* 432 72 */ /* size: 504, cachelines: 8, members: 7 */ /* sum members: 503, holes: 1, sum holes: 1 */ /* last cacheline: 56 bytes */ }; Suggested-by: Jakub Sitnicki <jakub@cloudflare.com> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Reviewed-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20220628174314.1216643-3-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-29bpf: add bpf_func_t and trampoline helpersGravatar Stanislav Fomichev 1-30/+33
I'll be adding lsm cgroup specific helpers that grab trampoline mutex. No functional changes. Reviewed-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20220628174314.1216643-2-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-24bpf: Merge "types_are_compat" logic into relo_core.cGravatar Daniel Müller 1-83/+1
BPF type compatibility checks (bpf_core_types_are_compat()) are currently duplicated between kernel and user space. That's a historical artifact more than intentional doing and can lead to subtle bugs where one implementation is adjusted but another is forgotten. That happened with the enum64 work, for example, where the libbpf side was changed (commit 23b2a3a8f63a ("libbpf: Add enum64 relocation support")) to use the btf_kind_core_compat() helper function but the kernel side was not (commit 6089fb325cf7 ("bpf: Add btf enum64 support")). This patch addresses both the duplication issue, by merging both implementations and moving them into relo_core.c, and fixes the alluded to kind check (by giving preference to libbpf's already adjusted logic). For discussion of the topic, please refer to: https://lore.kernel.org/bpf/CAADnVQKbWR7oarBdewgOBZUPzryhRYvEbkhyPJQHHuxq=0K1gw@mail.gmail.com/T/#mcc99f4a33ad9a322afaf1b9276fb1f0b7add9665 Changelog: v1 -> v2: - limited libbpf recursion limit to 32 - changed name to __bpf_core_types_are_compat - included warning previously present in libbpf version - merged kernel and user space changes into a single patch Signed-off-by: Daniel Müller <deso@posteo.net> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220623182934.2582827-1-deso@posteo.net
2022-06-24bpf: Fix for use-after-free bug in inline_bpf_loopGravatar Eduard Zingerman 1-1/+1
As reported by Dan Carpenter, the following statements in inline_bpf_loop() might cause a use-after-free bug: struct bpf_prog *new_prog; // ... new_prog = bpf_patch_insn_data(env, position, insn_buf, *cnt); // ... env->prog->insnsi[call_insn_offset].imm = callback_offset; The bpf_patch_insn_data() might free the memory used by env->prog. Fixes: 1ade23711971 ("bpf: Inline calls to bpf_loop when callback is known") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220624020613.548108-2-eddyz87@gmail.com
2022-06-24bpf: Replace hard-coded 0 with BPF_K in check_alu_opGravatar Simon Wang 1-1/+1
Enhance readability a bit. Signed-off-by: Simon Wang <wangchuanguo@inspur.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20220622031923.65692-1-wangchuanguo@inspur.com
2022-06-23bpf: Require only one of cong_avoid() and cong_control() from a TCP CCGravatar Jörn-Thorben Hinz 1-4/+3
Remove the check for required and optional functions in a struct tcp_congestion_ops from bpf_tcp_ca.c. Rely on tcp_register_congestion_control() to reject a BPF CC that does not implement all required functions, as it will do for a non-BPF CC. When a CC implements tcp_congestion_ops.cong_control(), the alternate cong_avoid() is not in use in the TCP stack. Previously, a BPF CC was still forced to implement cong_avoid() as a no-op since it was non-optional in bpf_tcp_ca.c. Signed-off-by: Jörn-Thorben Hinz <jthinz@mailbox.tu-berlin.de> Reviewed-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/r/20220622191227.898118-3-jthinz@mailbox.tu-berlin.de Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-21bpf, x64: Add predicate for bpf2bpf with tailcalls support in JITGravatar Tony Ambardar 2-1/+8
The BPF core/verifier is hard-coded to permit mixing bpf2bpf and tail calls for only x86-64. Change the logic to instead rely on a new weak function 'bool bpf_jit_supports_subprog_tailcalls(void)', which a capable JIT backend can override. Update the x86-64 eBPF JIT to reflect this. Signed-off-by: Tony Ambardar <Tony.Ambardar@gmail.com> [jakub: drop MIPS bits and tweak patch subject] Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220617105735.733938-2-jakub@cloudflare.com
2022-06-20bpf: Inline calls to bpf_loop when callback is knownGravatar Eduard Zingerman 2-9/+180
Calls to `bpf_loop` are replaced with direct loops to avoid indirection. E.g. the following: bpf_loop(10, foo, NULL, 0); Is replaced by equivalent of the following: for (int i = 0; i < 10; ++i) foo(i, NULL); This transformation could be applied when: - callback is known and does not change during program execution; - flags passed to `bpf_loop` are always zero. Inlining logic works as follows: - During execution simulation function `update_loop_inline_state` tracks the following information for each `bpf_loop` call instruction: - is callback known and constant? - are flags constant and zero? - Function `optimize_bpf_loop` increases stack depth for functions where `bpf_loop` calls can be inlined and invokes `inline_bpf_loop` to apply the inlining. The additional stack space is used to spill registers R6, R7 and R8. These registers are used as loop counter, loop maximal bound and callback context parameter; Measurements using `benchs/run_bench_bpf_loop.sh` inside QEMU / KVM on i7-4710HQ CPU show a drop in latency from 14 ns/op to 2 ns/op. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/r/20220620235344.569325-4-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-20uprobe: gate bpf call behind BPF_EVENTSGravatar Delyan Kratunov 1-0/+2
The call into bpf from uprobes needs to be gated now that it doesn't use the trace_events.h helpers. Randy found this as a randconfig build failure on linux-next [1]. [1]: https://lore.kernel.org/linux-next/2de99180-7d55-2fdf-134d-33198c27cc58@infradead.org/ Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Delyan Kratunov <delyank@fb.com> Tested-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/cb8bfbbcde87ed5d811227a393ef4925f2aadb7b.camel@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-17Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextGravatar Jakub Kicinski 10-65/+277
Daniel Borkmann says: ==================== pull-request: bpf-next 2022-06-17 We've added 72 non-merge commits during the last 15 day(s) which contain a total of 92 files changed, 4582 insertions(+), 834 deletions(-). The main changes are: 1) Add 64 bit enum value support to BTF, from Yonghong Song. 2) Implement support for sleepable BPF uprobe programs, from Delyan Kratunov. 3) Add new BPF helpers to issue and check TCP SYN cookies without binding to a socket especially useful in synproxy scenarios, from Maxim Mikityanskiy. 4) Fix libbpf's internal USDT address translation logic for shared libraries as well as uprobe's symbol file offset calculation, from Andrii Nakryiko. 5) Extend libbpf to provide an API for textual representation of the various map/prog/attach/link types and use it in bpftool, from Daniel Müller. 6) Provide BTF line info for RV64 and RV32 JITs, and fix a put_user bug in the core seen in 32 bit when storing BPF function addresses, from Pu Lehui. 7) Fix libbpf's BTF pointer size guessing by adding a list of various aliases for 'long' types, from Douglas Raillard. 8) Fix bpftool to readd setting rlimit since probing for memcg-based accounting has been unreliable and caused a regression on COS, from Quentin Monnet. 9) Fix UAF in BPF cgroup's effective program computation triggered upon BPF link detachment, from Tadeusz Struk. 10) Fix bpftool build bootstrapping during cross compilation which was pointing to the wrong AR process, from Shahab Vahedi. 11) Fix logic bug in libbpf's is_pow_of_2 implementation, from Yuze Chi. 12) BPF hash map optimization to avoid grabbing spinlocks of all CPUs when there is no free element. Also add a benchmark as reproducer, from Feng Zhou. 13) Fix bpftool's codegen to bail out when there's no BTF, from Michael Mullin. 14) Various minor cleanup and improvements all over the place. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (72 commits) bpf: Fix bpf_skc_lookup comment wrt. return type bpf: Fix non-static bpf_func_proto struct definitions selftests/bpf: Don't force lld on non-x86 architectures selftests/bpf: Add selftests for raw syncookie helpers in TC mode bpf: Allow the new syncookie helpers to work with SKBs selftests/bpf: Add selftests for raw syncookie helpers bpf: Add helpers to issue and check SYN cookies in XDP bpf: Allow helpers to accept pointers with a fixed size bpf: Fix documentation of th_len in bpf_tcp_{gen,check}_syncookie selftests/bpf: add tests for sleepable (uk)probes libbpf: add support for sleepable uprobe programs bpf: allow sleepable uprobe programs to attach bpf: implement sleepable uprobes by chaining gps bpf: move bpf_prog to bpf.h libbpf: Fix internal USDT address translation logic for shared libraries samples/bpf: Check detach prog exist or not in xdp_fwd selftests/bpf: Avoid skipping certain subtests selftests/bpf: Fix test_varlen verification failure with latest llvm bpftool: Do not check return value from libbpf_set_strict_mode() Revert "bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK" ... ==================== Link: https://lore.kernel.org/r/20220617220836.7373-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-17bpf: Fix non-static bpf_func_proto struct definitionsGravatar Joanne Koong 2-7/+7
This patch does two things: 1) Marks the dynptr bpf_func_proto structs that were added in [1] as static, as pointed out by the kernel test robot in [2]. 2) There are some bpf_func_proto structs marked as extern which can instead be statically defined. [1] https://lore.kernel.org/bpf/20220523210712.3641569-1-joannelkoong@gmail.com/ [2] https://lore.kernel.org/bpf/62ab89f2.Pko7sI08RAKdF8R6%25lkp@intel.com/ Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220616225407.1878436-1-joannelkoong@gmail.com
2022-06-16bpf: Allow helpers to accept pointers with a fixed sizeGravatar Maxim Mikityanskiy 1-11/+32
Before this commit, the BPF verifier required ARG_PTR_TO_MEM arguments to be followed by ARG_CONST_SIZE holding the size of the memory region. The helpers had to check that size in runtime. There are cases where the size expected by a helper is a compile-time constant. Checking it in runtime is an unnecessary overhead and waste of BPF registers. This commit allows helpers to accept pointers to memory without the corresponding ARG_CONST_SIZE, given that they define the memory region size in struct bpf_func_proto and use ARG_PTR_TO_FIXED_SIZE_MEM type. arg_size is unionized with arg_btf_id to reduce the kernel image size, and it's valid because they are used by different argument types. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20220615134847.3753567-3-maximmi@nvidia.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-16bpf: allow sleepable uprobe programs to attachGravatar Delyan Kratunov 2-8/+12
uprobe and kprobe programs have the same program type, KPROBE, which is currently not allowed to load sleepable programs. To avoid adding a new UPROBE type, instead allow sleepable KPROBE programs to load and defer the is-it-actually-a-uprobe-program check to attachment time, where there's already validation of the corresponding perf_event. A corollary of this patch is that you can now load a sleepable kprobe program but cannot attach it. Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Delyan Kratunov <delyank@fb.com> Link: https://lore.kernel.org/r/fcd44a7cd204f372f6bb03ef794e829adeaef299.1655248076.git.delyank@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-16bpf: implement sleepable uprobes by chaining gpsGravatar Delyan Kratunov 3-5/+19
uprobes work by raising a trap, setting a task flag from within the interrupt handler, and processing the actual work for the uprobe on the way back to userspace. As a result, uprobe handlers already execute in a might_fault/_sleep context. The primary obstacle to sleepable bpf uprobe programs is therefore on the bpf side. Namely, the bpf_prog_array attached to the uprobe is protected by normal rcu. In order for uprobe bpf programs to become sleepable, it has to be protected by the tasks_trace rcu flavor instead (and kfree() called after a corresponding grace period). Therefore, the free path for bpf_prog_array now chains a tasks_trace and normal grace periods one after the other. Users who iterate under tasks_trace read section would be safe, as would users who iterate under normal read sections (from non-sleepable locations). The downside is that the tasks_trace latency affects all perf_event-attached bpf programs (and not just uprobe ones). This is deemed safe given the possible attach rates for kprobe/uprobe/tp programs. Separately, non-sleepable programs need access to dynamically sized rcu-protected maps, so bpf_run_prog_array_sleepables now conditionally takes an rcu read section, in addition to the overarching tasks_trace section. Signed-off-by: Delyan Kratunov <delyank@fb.com> Link: https://lore.kernel.org/r/ce844d62a2fd0443b08c5ab02e95bc7149f9aeb1.1655248076.git.delyank@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-13cfi: Fix __cfi_slowpath_diag RCU usage with cpuidleGravatar Sami Tolvanen 1-6/+16
RCU_NONIDLE usage during __cfi_slowpath_diag can result in an invalid RCU state in the cpuidle code path: WARNING: CPU: 1 PID: 0 at kernel/rcu/tree.c:613 rcu_eqs_enter+0xe4/0x138 ... Call trace: rcu_eqs_enter+0xe4/0x138 rcu_idle_enter+0xa8/0x100 cpuidle_enter_state+0x154/0x3a8 cpuidle_enter+0x3c/0x58 do_idle.llvm.6590768638138871020+0x1f4/0x2ec cpu_startup_entry+0x28/0x2c secondary_start_kernel+0x1b8/0x220 __secondary_switched+0x94/0x98 Instead, call rcu_irq_enter/exit to wake up RCU only when needed and disable interrupts for the entire CFI shadow/module check when we do. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Link: https://lore.kernel.org/r/20220531175910.890307-1-samitolvanen@google.com Fixes: cf68fffb66d6 ("add support for Clang CFI") Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2022-06-12Merge tag 'wq-for-5.19-rc1-fixes' of ↵Gravatar Linus Torvalds 1-4/+12
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue fixes from Tejun Heo: "Tetsuo's patch to trigger build warnings if system-wide wq's are flushed along with a TP type update and trivial comment update" * tag 'wq-for-5.19-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: Switch to new kerneldoc syntax for named variable macro argument workqueue: Fix type of cpu in trace event workqueue: Wrap flush_workqueue() using a macro
2022-06-11bpf: avoid grabbing spin_locks of all cpus when no free elemsGravatar Feng Zhou 1-6/+14
This patch use head->first in pcpu_freelist_head to check freelist having free or not. If having, grab spin_lock, or check next cpu's freelist. Before patch: hash_map performance ./map_perf_test 1 0:hash_map_perf pre-alloc 1043397 events per sec ... The average of the test results is around 1050000 events per sec. hash_map the worst: no free ./run_bench_bpf_hashmap_full_update.sh Setting up benchmark 'bpf-hashmap-ful-update'... Benchmark 'bpf-hashmap-ful-update' started. 1:hash_map_full_perf 15687 events per sec ... The average of the test results is around 16000 events per sec. ftrace trace: 0) | htab_map_update_elem() { 0) | __pcpu_freelist_pop() { 0) | _raw_spin_lock() 0) | _raw_spin_unlock() 0) | ... 0) + 25.188 us | } 0) + 28.439 us | } The test machine is 16C, trying to get spin_lock 17 times, in addition to 16c, there is an extralist. after patch: hash_map performance ./map_perf_test 1 0:hash_map_perf pre-alloc 1053298 events per sec ... The average of the test results is around 1050000 events per sec. hash_map worst: no free ./run_bench_bpf_hashmap_full_update.sh Setting up benchmark 'bpf-hashmap-ful-update'... Benchmark 'bpf-hashmap-ful-update' started. 1:hash_map_full_perf 555830 events per sec ... The average of the test results is around 550000 events per sec. ftrace trace: 0) | htab_map_update_elem() { 0) | alloc_htab_elem() { 0) 0.586 us | __pcpu_freelist_pop(); 0) 0.945 us | } 0) 8.669 us | } It can be seen that after adding this patch, the map performance is almost not degraded, and when free=0, first check head->first instead of directly acquiring spin_lock. Co-developed-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com> Link: https://lore.kernel.org/r/20220610023308.93798-2-zhoufeng.zf@bytedance.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-10Merge tag 'pm-5.19-rc2' of ↵Gravatar Linus Torvalds 1-25/+62
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix an intel_idle issue introduced during the 5.16 development cycle and two recent regressions in the system reboot/poweroff code. Specifics: - Fix CPUIDLE_FLAG_IRQ_ENABLE handling in intel_idle (Peter Zijlstra) - Allow all platforms to use the global poweroff handler and make non-syscall poweroff code paths work again (Dmitry Osipenko)" * tag 'pm-5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpuidle,intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE kernel/reboot: Fix powering off using a non-syscall code paths kernel/reboot: Use static handler for register_platform_power_off()
2022-06-10Merge branch 'pm-sysoff'Gravatar Rafael J. Wysocki 1-25/+62
Merge fixes for regressions introduced by the recent rework of the system reboot/poweroff code. * pm-sysoff: kernel/reboot: Fix powering off using a non-syscall code paths kernel/reboot: Use static handler for register_platform_power_off()
2022-06-10Merge tag 'for-linus-5.19a-rc2-tag' of ↵Gravatar Linus Torvalds 2-1/+28
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: - a small cleanup removing "export" of an __init function - a small series adding a new infrastructure for platform flags - a series adding generic virtio support for Xen guests (frontend side) * tag 'for-linus-5.19a-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: unexport __init-annotated xen_xlate_map_ballooned_pages() arm/xen: Assign xen-grant DMA ops for xen-grant DMA devices xen/grant-dma-ops: Retrieve the ID of backend's domain for DT devices xen/grant-dma-iommu: Introduce stub IOMMU driver dt-bindings: Add xen,grant-dma IOMMU description for xen-grant DMA ops xen/virtio: Enable restricted memory access using Xen grant mappings xen/grant-dma-ops: Add option to restrict memory access under Xen xen/grants: support allocating consecutive grants arm/xen: Introduce xen_setup_dma_ops() virtio: replace arch_has_restricted_virtio_memory_access() kernel: add platform_has() infrastructure
2022-06-09Merge tag 'net-5.19-rc2' of ↵Gravatar Linus Torvalds 2-5/+6
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bpf and netfilter. Current release - regressions: - eth: amt: fix possible null-ptr-deref in amt_rcv() Previous releases - regressions: - tcp: use alloc_large_system_hash() to allocate table_perturb - af_unix: fix a data-race in unix_dgram_peer_wake_me() - nfc: st21nfca: fix memory leaks in EVT_TRANSACTION handling - eth: ixgbe: fix unexpected VLAN rx in promisc mode on VF Previous releases - always broken: - ipv6: fix signed integer overflow in __ip6_append_data - netfilter: - nat: really support inet nat without l3 address - nf_tables: memleak flow rule from commit path - bpf: fix calling global functions from BPF_PROG_TYPE_EXT programs - openvswitch: fix misuse of the cached connection on tuple changes - nfc: nfcmrvl: fix memory leak in nfcmrvl_play_deferred - eth: altera: fix refcount leak in altera_tse_mdio_create Misc: - add Quentin Monnet to bpftool maintainers" * tag 'net-5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (45 commits) net: amd-xgbe: fix clang -Wformat warning tcp: use alloc_large_system_hash() to allocate table_perturb net: dsa: realtek: rtl8365mb: fix GMII caps for ports with internal PHY net: dsa: mv88e6xxx: correctly report serdes link failure net: dsa: mv88e6xxx: fix BMSR error to be consistent with others net: dsa: mv88e6xxx: use BMSR_ANEGCOMPLETE bit for filling an_complete net: altera: Fix refcount leak in altera_tse_mdio_create net: openvswitch: fix misuse of the cached connection on tuple changes net: ethernet: mtk_eth_soc: fix misuse of mem alloc interface netdev[napi]_alloc_frag ip_gre: test csum_start instead of transport header au1000_eth: stop using virt_to_bus() ipv6: Fix signed integer overflow in l2tp_ip6_sendmsg ipv6: Fix signed integer overflow in __ip6_append_data nfc: nfcmrvl: Fix memory leak in nfcmrvl_play_deferred nfc: st21nfca: fix incorrect sizing calculations in EVT_TRANSACTION nfc: st21nfca: fix memory leaks in EVT_TRANSACTION handling nfc: st21nfca: fix incorrect validating logic in EVT_TRANSACTION net: ipv6: unexport __init-annotated seg6_hmac_init() net: xfrm: unexport __init-annotated xfrm4_protocol_init() net: mdio: unexport __init-annotated mdio_bus_init() ...
2022-06-08Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmGravatar Linus Torvalds 1-6/+0
Pull KVM fixes from Paolo Bonzini: - syzkaller NULL pointer dereference - TDP MMU performance issue with disabling dirty logging - 5.14 regression with SVM TSC scaling - indefinite stall on applying live patches - unstable selftest - memory leak from wrong copy-and-paste - missed PV TLB flush when racing with emulation * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: do not report a vCPU as preempted outside instruction boundaries KVM: x86: do not set st->preempted when going back to user space KVM: SVM: fix tsc scaling cache logic KVM: selftests: Make hyperv_clock selftest more stable KVM: x86/MMU: Zap non-leaf SPTEs when disabling dirty logging x86: drop bogus "cc" clobber from __try_cmpxchg_user_asm() KVM: x86/mmu: Check every prev_roots in __kvm_mmu_free_obsolete_roots() entry/kvm: Exit to user mode when TIF_NOTIFY_SIGNAL is set KVM: Don't null dereference ops->destroy
2022-06-07kernel/reboot: Fix powering off using a non-syscall code pathsGravatar Dmitry Osipenko 1-20/+26
There are other methods of powering off machine than the reboot syscall. Previously we missed to cover those methods and it created power-off regression for some machines, like the PowerPC e500. Fix this problem by moving the legacy sys-off handler registration to the latest phase of power-off process and making the kernel_can_power_off() check the legacy pm_power_off presence. Tested-by: Michael Ellerman <mpe@ellerman.id.au> # ppce500 Reported-by: Michael Ellerman <mpe@ellerman.id.au> # ppce500 Fixes: da007f171fc9 ("kernel/reboot: Change registration order of legacy power-off handler") Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-06-07bpf: Fix calling global functions from BPF_PROG_TYPE_EXT programsGravatar Toke Høiland-Jørgensen 1-1/+2
The verifier allows programs to call global functions as long as their argument types match, using BTF to check the function arguments. One of the allowed argument types to such global functions is PTR_TO_CTX; however the check for this fails on BPF_PROG_TYPE_EXT functions because the verifier uses the wrong type to fetch the vmlinux BTF ID for the program context type. This failure is seen when an XDP program is loaded using libxdp (which loads it as BPF_PROG_TYPE_EXT and attaches it to a global XDP type program). Fix the issue by passing in the target program type instead of the BPF_PROG_TYPE_EXT type to bpf_prog_get_ctx() when checking function argument compatibility. The first Fixes tag refers to the latest commit that touched the code in question, while the second one points to the code that first introduced the global function call verification. v2: - Use resolve_prog_type() Fixes: 3363bd0cfbb8 ("bpf: Extend kfunc with PTR_TO_CTX, PTR_TO_MEM argument support") Fixes: 51c39bb1d5d1 ("bpf: Introduce function-by-function verification") Reported-by: Simon Sundberg <simon.sundberg@kau.se> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20220606075253.28422-1-toke@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-07bpf: Use safer kvmalloc_array() where possibleGravatar Dan Carpenter 1-4/+4
The kvmalloc_array() function is safer because it has a check for integer overflows. These sizes come from the user and I was not able to see any bounds checking so an integer overflow seems like a realistic concern. Fixes: 0dcac2725406 ("bpf: Add multi kprobe link") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/Yo9VRVMeHbALyjUH@kili Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-07bpf: Add btf enum64 supportGravatar Yonghong Song 2-15/+129
Currently, BTF only supports upto 32bit enum value with BTF_KIND_ENUM. But in kernel, some enum indeed has 64bit values, e.g., in uapi bpf.h, we have enum { BPF_F_INDEX_MASK = 0xffffffffULL, BPF_F_CURRENT_CPU = BPF_F_INDEX_MASK, BPF_F_CTXLEN_MASK = (0xfffffULL << 32), }; In this case, BTF_KIND_ENUM will encode the value of BPF_F_CTXLEN_MASK as 0, which certainly is incorrect. This patch added a new btf kind, BTF_KIND_ENUM64, which permits 64bit value to cover the above use case. The BTF_KIND_ENUM64 has the following three fields followed by the common type: struct bpf_enum64 { __u32 nume_off; __u32 val_lo32; __u32 val_hi32; }; Currently, btf type section has an alignment of 4 as all element types are u32. Representing the value with __u64 will introduce a pad for bpf_enum64 and may also introduce misalignment for the 64bit value. Hence, two members of val_hi32 and val_lo32 are chosen to avoid these issues. The kflag is also introduced for BTF_KIND_ENUM and BTF_KIND_ENUM64 to indicate whether the value is signed or unsigned. The kflag intends to provide consistent output of BTF C fortmat with the original source code. For example, the original BTF_KIND_ENUM bit value is 0xffffffff. The format C has two choices, printing out 0xffffffff or -1 and current libbpf prints out as unsigned value. But if the signedness is preserved in btf, the value can be printed the same as the original source code. The kflag value 0 means unsigned values, which is consistent to the default by libbpf and should also cover most cases as well. The new BTF_KIND_ENUM64 is intended to support the enum value represented as 64bit value. But it can represent all BTF_KIND_ENUM values as well. The compiler ([1]) and pahole will generate BTF_KIND_ENUM64 only if the value has to be represented with 64 bits. In addition, a static inline function btf_kind_core_compat() is introduced which will be used later when libbpf relo_core.c changed. Here the kernel shares the same relo_core.c with libbpf. [1] https://reviews.llvm.org/D124641 Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20220607062600.3716578-1-yhs@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-06-07workqueue: Wrap flush_workqueue() using a macroGravatar Tetsuo Handa 1-4/+12
Since flush operation synchronously waits for completion, flushing system-wide WQs (e.g. system_wq) might introduce possibility of deadlock due to unexpected locking dependency. Tejun Heo commented at [1] that it makes no sense at all to call flush_workqueue() on the shared WQs as the caller has no idea what it's gonna end up waiting for. Although there is flush_scheduled_work() which flushes system_wq WQ with "Think twice before calling this function! It's very easy to get into trouble if you don't take great care." warning message, syzbot found a circular locking dependency caused by flushing system_wq WQ [2]. Therefore, let's change the direction to that developers had better use their local WQs if flush_scheduled_work()/flush_workqueue(system_*_wq) is inevitable. Steps for converting system-wide WQs into local WQs are explained at [3], and a conversion to stop flushing system-wide WQs is in progress. Now we want some mechanism for preventing developers who are not aware of this conversion from again start flushing system-wide WQs. Since I found that WARN_ON() is complete but awkward approach for teaching developers about this problem, let's use __compiletime_warning() for incomplete but handy approach. For completeness, we will also insert WARN_ON() into __flush_workqueue() after all in-tree users stopped calling flush_scheduled_work(). Link: https://lore.kernel.org/all/YgnQGZWT%2Fn3VAITX@slm.duckdns.org/ [1] Link: https://syzkaller.appspot.com/bug?extid=bde0f89deacca7c765b8 [2] Link: https://lkml.kernel.org/r/49925af7-78a8-a3dd-bce6-cfc02e1a9236@I-love.SAKURA.ne.jp [3] Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Tejun Heo <tj@kernel.org>
2022-06-07entry/kvm: Exit to user mode when TIF_NOTIFY_SIGNAL is setGravatar Seth Forshee 1-6/+0
A livepatch transition may stall indefinitely when a kvm vCPU is heavily loaded. To the host, the vCPU task is a user thread which is spending a very long time in the ioctl(KVM_RUN) syscall. During livepatch transition, set_notify_signal() will be called on such tasks to interrupt the syscall so that the task can be transitioned. This interrupts guest execution, but when xfer_to_guest_mode_work() sees that TIF_NOTIFY_SIGNAL is set but not TIF_SIGPENDING it concludes that an exit to user mode is unnecessary, and guest execution is resumed without transitioning the task for the livepatch. This handling of TIF_NOTIFY_SIGNAL is incorrect, as set_notify_signal() is expected to break tasks out of interruptible kernel loops and cause them to return to userspace. Change xfer_to_guest_mode_work() to handle TIF_NOTIFY_SIGNAL the same as TIF_SIGPENDING, signaling to the vCPU run loop that an exit to userpsace is needed. Any pending task_work will be run when get_signal() is called from exit_to_user_mode_loop(), so there is no longer any need to run task work from xfer_to_guest_mode_work(). Suggested-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Petr Mladek <pmladek@suse.com> Signed-off-by: Seth Forshee <sforshee@digitalocean.com> Message-Id: <20220504180840.2907296-1-sforshee@digitalocean.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-06Merge tag 'dma-mapping-5.19-2022-06-06' of ↵Gravatar Linus Torvalds 2-9/+7
git://git.infradead.org/users/hch/dma-mapping Pull dma-mapping fixes from Christoph Hellwig: - fix a regressin in setting swiotlb ->force_bounce (me) - make dma-debug less chatty (Rob Clark) * tag 'dma-mapping-5.19-2022-06-06' of git://git.infradead.org/users/hch/dma-mapping: swiotlb: fix setting ->force_bounce dma-debug: make things less spammy under memory pressure
2022-06-06kernel: add platform_has() infrastructureGravatar Juergen Gross 2-1/+28
Add a simple infrastructure for setting, resetting and querying platform feature flags. Flags can be either global or architecture specific. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Tested-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> # Arm64 only Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Juergen Gross <jgross@suse.com>
2022-06-05Merge tag 'mm-nonmm-stable-2022-06-05' of ↵Gravatar Linus Torvalds 1-0/+16
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull delay-accounting update from Andrew Morton: "A single featurette for delay accounting. Delayed a bit because, unusually, it had dependencies on both the mm-stable and mm-nonmm-stable queues" * tag 'mm-nonmm-stable-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: delayacct: track delays from write-protect copy
2022-06-05Merge tag 'sched-urgent-2022-06-05' of ↵Gravatar Linus Torvalds 1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Thomas Gleixner: "Fix the fallout of sysctl code move which placed the init function wrong" * tag 'sched-urgent-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/autogroup: Fix sysctl move
2022-06-05Merge tag 'perf-urgent-2022-06-05' of ↵Gravatar Linus Torvalds 1-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Thomas Gleixner: - Make the ICL event constraints match reality - Remove a unused local variable * tag 'perf-urgent-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Remove unused local variable perf/x86/intel: Fix event constraints for ICL
2022-06-04Merge tag 'pull-18-rc1-work.mount' of ↵Gravatar Linus Torvalds 1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull mount handling updates from Al Viro: "Cleanups (and one fix) around struct mount handling. The fix is usermode_driver.c one - once you've done kern_mount(), you must kern_unmount(); simple mntput() will end up with a leak. Several failure exits in there messed up that way... In practice you won't hit those particular failure exits without fault injection, though" * tag 'pull-18-rc1-work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: move mount-related externs from fs.h to mount.h blob_to_mnt(): kern_unmount() is needed to undo kern_mount() m->mnt_root->d_inode->i_sb is a weird way to spell m->mnt_sb... linux/mount.h: trim includes uninline may_mount() and don't opencode it in fspick(2)/fsopen(2)
2022-06-03Merge tag 'ptrace_stop-cleanup-for-v5.19' of ↵Gravatar Linus Torvalds 4-145/+93
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull ptrace_stop cleanups from Eric Biederman: "While looking at the ptrace problems with PREEMPT_RT and the problems Peter Zijlstra was encountering with ptrace in his freezer rewrite I identified some cleanups to ptrace_stop that make sense on their own and move make resolving the other problems much simpler. The biggest issue is the habit of the ptrace code to change task->__state from the tracer to suppress TASK_WAKEKILL from waking up the tracee. No other code in the kernel does that and it is straight forward to update signal_wake_up and friends to make that unnecessary. Peter's task freezer sets frozen tasks to a new state TASK_FROZEN and then it stores them by calling "wake_up_state(t, TASK_FROZEN)" relying on the fact that all stopped states except the special stop states can tolerate spurious wake up and recover their state. The state of stopped and traced tasked is changed to be stored in task->jobctl as well as in task->__state. This makes it possible for the freezer to recover tasks in these special states, as well as serving as a general cleanup. With a little more work in that direction I believe TASK_STOPPED can learn to tolerate spurious wake ups and become an ordinary stop state. The TASK_TRACED state has to remain a special state as the registers for a process are only reliably available when the process is stopped in the scheduler. Fundamentally ptrace needs acess to the saved register values of a task. There are bunch of semi-random ptrace related cleanups that were found while looking at these issues. One cleanup that deserves to be called out is from commit 57b6de08b5f6 ("ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs"). This makes a change that is technically user space visible, in the handling of what happens to a tracee when a tracer dies unexpectedly. According to our testing and our understanding of userspace nothing cares that spurious SIGTRAPs can be generated in that case" * tag 'ptrace_stop-cleanup-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state ptrace: Always take siglock in ptrace_resume ptrace: Don't change __state ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs ptrace: Document that wait_task_inactive can't fail ptrace: Reimplement PTRACE_KILL by always sending SIGKILL signal: Use lockdep_assert_held instead of assert_spin_locked ptrace: Remove arch_ptrace_attach ptrace/xtensa: Replace PT_SINGLESTEP with TIF_SINGLESTEP ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEP signal: Replace __group_send_sig_info with send_signal_locked signal: Rename send_signal send_signal_locked
2022-06-03Merge tag 'kthread-cleanups-for-v5.19' of ↵Gravatar Linus Torvalds 3-12/+42
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull kthread updates from Eric Biederman: "This updates init and user mode helper tasks to be ordinary user mode tasks. Commit 40966e316f86 ("kthread: Ensure struct kthread is present for all kthreads") caused init and the user mode helper threads that call kernel_execve to have struct kthread allocated for them. This struct kthread going away during execve in turned made a use after free of struct kthread possible. Here, commit 343f4c49f243 ("kthread: Don't allocate kthread_struct for init and umh") is enough to fix the use after free and is simple enough to be backportable. The rest of the changes pass struct kernel_clone_args to clean things up and cause the code to make sense. In making init and the user mode helpers tasks purely user mode tasks I ran into two complications. The function task_tick_numa was detecting tasks without an mm by testing for the presence of PF_KTHREAD. The initramfs code in populate_initrd_image was using flush_delayed_fput to ensuere the closing of all it's file descriptors was complete, and flush_delayed_fput does not work in a userspace thread. I have looked and looked and more complications and in my code review I have not found any, and neither has anyone else with the code sitting in linux-next" * tag 'kthread-cleanups-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: sched: Update task_tick_numa to ignore tasks without an mm fork: Stop allowing kthreads to call execve fork: Explicitly set PF_KTHREAD init: Deal with the init process being a user mode process fork: Generalize PF_IO_WORKER handling fork: Explicity test for idle tasks in copy_thread fork: Pass struct kernel_clone_args into copy_thread kthread: Don't allocate kthread_struct for init and umh
2022-06-03Merge tag 'arm64-fixes' of ↵Gravatar Linus Torvalds 1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Catalin Marinas: "Most of issues addressed were introduced during this merging window. - Initialise jump labels before setup_machine_fdt(), needed by commit f5bda35fba61 ("random: use static branch for crng_ready()"). - Sparse warnings: missing prototype, incorrect __user annotation. - Skip SVE kselftest if not sufficient vector lengths supported" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: kselftest/arm64: signal: Skip SVE signal test if not enough VLs supported arm64: Initialize jump labels before setup_machine_fdt() arm64: hibernate: Fix syntax errors in comments arm64: Remove the __user annotation for the restore_za_context() argument ftrace/fgraph: fix increased missing-prototypes warnings
2022-06-02bpf: Fix KASAN use-after-free Read in compute_effective_progsGravatar Tadeusz Struk 1-10/+60
Syzbot found a Use After Free bug in compute_effective_progs(). The reproducer creates a number of BPF links, and causes a fault injected alloc to fail, while calling bpf_link_detach on them. Link detach triggers the link to be freed by bpf_link_free(), which calls __cgroup_bpf_detach() and update_effective_progs(). If the memory allocation in this function fails, the function restores the pointer to the bpf_cgroup_link on the cgroup list, but the memory gets freed just after it returns. After this, every subsequent call to update_effective_progs() causes this already deallocated pointer to be dereferenced in prog_list_length(), and triggers KASAN UAF error. To fix this issue don't preserve the pointer to the prog or link in the list, but remove it and replace it with a dummy prog without shrinking the table. The subsequent call to __cgroup_bpf_detach() or __cgroup_bpf_detach() will correct it. Fixes: af6eea57437a ("bpf: Implement bpf_link-based cgroup BPF program attachment") Reported-by: <syzbot+f264bffdfbd5614f3bb2@syzkaller.appspotmail.com> Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Cc: <stable@vger.kernel.org> Link: https://syzkaller.appspot.com/bug?id=8ebf179a95c2a2670f7cf1ba62429ec044369db4 Link: https://lore.kernel.org/bpf/20220517180420.87954-1-tadeusz.struk@linaro.org
2022-06-02bpf: Correct the comment about insn_to_jit_offGravatar Pu Lehui 1-1/+1
The insn_to_jit_off passed to bpf_prog_fill_jited_linfo should be the first byte of the next instruction, or the byte off to the end of the current instruction. Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220530092815.1112406-4-pulehui@huawei.com
2022-06-02bpf: Unify data extension operation of jited_ksyms and jited_linfoGravatar Pu Lehui 1-2/+3
We found that 32-bit environment can not print BPF line info due to a data inconsistency between jited_ksyms[0] and jited_linfo[0]. For example: jited_kyms[0] = 0xb800067c, jited_linfo[0] = 0xffffffffb800067c We know that both of them store BPF func address, but due to the different data extension operations when extended to u64, they may not be the same. We need to unify the data extension operations of them. Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/CAEf4BzZ-eDcdJZgJ+Np7Y=V-TVjDDvOMqPwzKjyWrh=i5juv4w@mail.gmail.com Link: https://lore.kernel.org/bpf/20220530092815.1112406-2-pulehui@huawei.com
2022-06-02Merge tag 'net-5.19-rc1' of ↵Gravatar Linus Torvalds 1-9/+5
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bpf and netfilter. Current release - new code bugs: - af_packet: make sure to pull the MAC header, avoid skb panic in GSO - ptp_clockmatrix: fix inverted logic in is_single_shot() - netfilter: flowtable: fix missing FLOWI_FLAG_ANYSRC flag - dt-bindings: net: adin: fix adi,phy-output-clock description syntax - wifi: iwlwifi: pcie: rename CAUSE macro, avoid MIPS build warning Previous releases - regressions: - Revert "net: af_key: add check for pfkey_broadcast in function pfkey_process" - tcp: fix tcp_mtup_probe_success vs wrong snd_cwnd - nf_tables: disallow non-stateful expression in sets earlier - nft_limit: clone packet limits' cost value - nf_tables: double hook unregistration in netns path - ping6: fix ping -6 with interface name Previous releases - always broken: - sched: fix memory barriers to prevent skbs from getting stuck in lockless qdiscs - neigh: set lower cap for neigh_managed_work rearming, avoid constantly scheduling the probe work - bpf: fix probe read error on big endian in ___bpf_prog_run() - amt: memory leak and error handling fixes Misc: - ipv6: expand & rename accept_unsolicited_na to accept_untracked_na" * tag 'net-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (80 commits) net/af_packet: make sure to pull mac header net: add debug info to __skb_pull() net: CONFIG_DEBUG_NET depends on CONFIG_NET stmmac: intel: Add RPL-P PCI ID net: stmmac: use dev_err_probe() for reporting mdio bus registration failure tipc: check attribute length for bearer name ice: fix access-beyond-end in the switch code nfp: remove padding in nfp_nfdk_tx_desc ax25: Fix ax25 session cleanup problems net: usb: qmi_wwan: Add support for Cinterion MV31 with new baseline sfc/siena: fix wrong tx channel offset with efx_separate_tx_channels sfc/siena: fix considering that all channels have TX queues socket: Don't use u8 type in uapi socket.h net/sched: act_api: fix error code in tcf_ct_flow_table_fill_tuple_ipv6() net: ping6: Fix ping -6 with interface name macsec: fix UAF bug for real_dev octeontx2-af: fix error code in is_valid_offset() wifi: mac80211: fix use-after-free in chanctx code bonding: guard ns_targets by CONFIG_IPV6 tcp: tcp_rtx_synack() can be called from process context ...
2022-06-02module: Fix prefix for module.sig_enforce module paramGravatar Saravana Kannan 1-0/+3
Commit cfc1d277891e ("module: Move all into module/") changed the prefix of the module param by moving/renaming files. A later commit also moves the module_param() into a different file, thereby changing the prefix yet again. This would break kernel cmdline compatibility and also userspace compatibility at /sys/module/module/parameters/sig_enforce. So, set the prefix back to "module.". Fixes: cfc1d277891e ("module: Move all into module/") Link: https://lore.kernel.org/lkml/20220602034111.4163292-1-saravanak@google.com/ Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Aaron Tomlin <atomlin@redhat.com> Acked-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Saravana Kannan <saravanak@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-06-02kernel/reboot: Use static handler for register_platform_power_off()Gravatar Dmitry Osipenko 1-6/+37
The register_platform_power_off() fails on m68k platform due to the memory allocation error that happens at a very early boot time when memory allocator isn't available yet. Fix it by using a static sys-off handler for the platform-level power-off handlers. Fixes: f0f7e5265b3b ("m68k: Switch to new sys-off handler API") Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-06-02Merge tag 'livepatching-for-5.19' of ↵Gravatar Linus Torvalds 1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching Pull livepatching cleanup from Petr Mladek: - Remove duplicated livepatch code [Christophe] * tag 'livepatching-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching: livepatch: Remove klp_arch_set_pc() and asm/livepatch.h
2022-06-02Merge tag 'printk-for-5.19-fixup' of ↵Gravatar Linus Torvalds 1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk fixup from Petr Mladek: - Revert inappropriate use of wake_up_interruptible_all() in printk() * tag 'printk-for-5.19-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: Revert "printk: wake up all waiters"
2022-06-02swiotlb: fix setting ->force_bounceGravatar Christoph Hellwig 1-8/+6
The swiotlb_init refactor messed up assigning ->force_bounce by doing it in different places based on what caused the setting of the flag. Fix this by passing the SWIOTLB_* flags to swiotlb_init_io_tlb_mem and just setting it there. Fixes: c6af2aa9ffc9 ("swiotlb: make the swiotlb_init interface more useful") Reported-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Nathan Chancellor <nathan@kernel.org>