aboutsummaryrefslogtreecommitdiff
path: root/kernel/bpf
AgeCommit message (Collapse)AuthorFilesLines
2023-07-27bpf: Support new signed div/mod instructions.Gravatar Yonghong Song 2-18/+98
Add interpreter/jit support for new signed div/mod insns. The new signed div/mod instructions are encoded with unsigned div/mod instructions plus insn->off == 1. Also add basic verifier support to ensure new insns get accepted. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230728011219.3714605-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-27bpf: Support new unconditional bswap instructionGravatar Yonghong Song 2-2/+19
The existing 'be' and 'le' insns will do conditional bswap depends on host endianness. This patch implements unconditional bswap insns. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230728011213.3712808-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-27bpf: Handle sign-extenstin ctx member accessesGravatar Yonghong Song 1-0/+6
Currently, if user accesses a ctx member with signed types, the compiler will generate an unsigned load followed by necessary left and right shifts. With the introduction of sign-extension load, compiler may just emit a ldsx insn instead. Let us do a final movsx sign extension to the final unsigned ctx load result to satisfy original sign extension requirement. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230728011207.3712528-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-27bpf: Support new sign-extension mov insnsGravatar Yonghong Song 2-28/+157
Add interpreter/jit support for new sign-extension mov insns. The original 'MOV' insn is extended to support reg-to-reg signed version for both ALU and ALU64 operations. For ALU mode, the insn->off value of 8 or 16 indicates sign-extension from 8- or 16-bit value to 32-bit value. For ALU64 mode, the insn->off value of 8/16/32 indicates sign-extension from 8-, 16- or 32-bit value to 64-bit value. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230728011202.3712300-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-27bpf: Support new sign-extension load insnsGravatar Yonghong Song 2-24/+137
Add interpreter/jit support for new sign-extension load insns which adds a new mode (BPF_MEMSX). Also add verifier support to recognize these insns and to do proper verification with new insns. In verifier, besides to deduce proper bounds for the dst_reg, probed memory access is also properly handled. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230728011156.3711870-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-25bpf: work around -Wuninitialized warningGravatar Arnd Bergmann 1-6/+6
Splitting these out into separate helper functions means that we actually pass an uninitialized variable into another function call if dec_active() happens to not be inlined, and CONFIG_PREEMPT_RT is disabled: kernel/bpf/memalloc.c: In function 'add_obj_to_free_list': kernel/bpf/memalloc.c:200:9: error: 'flags' is used uninitialized [-Werror=uninitialized] 200 | dec_active(c, flags); Avoid this by passing the flags by reference, so they either get initialized and dereferenced through a pointer, or the pointer never gets accessed at all. Fixes: 18e027b1c7c6d ("bpf: Factor out inc/dec of active flag into helpers.") Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20230725202653.2905259-1-arnd@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-24bpf: convert to ctime accessor functionsGravatar Jeff Layton 1-4/+2
In later patches, we're going to change how the inode's ctime field is used. Switch to using accessor functions instead of raw accesses of inode->i_ctime. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Message-Id: <20230705190309.579783-84-jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-07-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netGravatar Jakub Kicinski 1-7/+25
Cross-merge networking fixes after downstream PR. No conflicts or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-19bpf, net: Introduce skb_pointer_if_linear().Gravatar Alexei Starovoitov 1-1/+4
Network drivers always call skb_header_pointer() with non-null buffer. Remove !buffer check to prevent accidental misuse of skb_header_pointer(). Introduce skb_pointer_if_linear() instead. Reported-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20230718234021.43640-1-alexei.starovoitov@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-19bpf: Add fd-based tcx multi-prog infra with link supportGravatar Daniel Borkmann 4-13/+419
This work refactors and adds a lightweight extension ("tcx") to the tc BPF ingress and egress data path side for allowing BPF program management based on fds via bpf() syscall through the newly added generic multi-prog API. The main goal behind this work which we also presented at LPC [0] last year and a recent update at LSF/MM/BPF this year [3] is to support long-awaited BPF link functionality for tc BPF programs, which allows for a model of safe ownership and program detachment. Given the rise in tc BPF users in cloud native environments, this becomes necessary to avoid hard to debug incidents either through stale leftover programs or 3rd party applications accidentally stepping on each others toes. As a recap, a BPF link represents the attachment of a BPF program to a BPF hook point. The BPF link holds a single reference to keep BPF program alive. Moreover, hook points do not reference a BPF link, only the application's fd or pinning does. A BPF link holds meta-data specific to attachment and implements operations for link creation, (atomic) BPF program update, detachment and introspection. The motivation for BPF links for tc BPF programs is multi-fold, for example: - From Meta: "It's especially important for applications that are deployed fleet-wide and that don't "control" hosts they are deployed to. If such application crashes and no one notices and does anything about that, BPF program will keep running draining resources or even just, say, dropping packets. We at FB had outages due to such permanent BPF attachment semantics. With fd-based BPF link we are getting a framework, which allows safe, auto-detachable behavior by default, unless application explicitly opts in by pinning the BPF link." [1] - From Cilium-side the tc BPF programs we attach to host-facing veth devices and phys devices build the core datapath for Kubernetes Pods, and they implement forwarding, load-balancing, policy, EDT-management, etc, within BPF. Currently there is no concept of 'safe' ownership, e.g. we've recently experienced hard-to-debug issues in a user's staging environment where another Kubernetes application using tc BPF attached to the same prio/handle of cls_bpf, accidentally wiping all Cilium-based BPF programs from underneath it. The goal is to establish a clear/safe ownership model via links which cannot accidentally be overridden. [0,2] BPF links for tc can co-exist with non-link attachments, and the semantics are in line also with XDP links: BPF links cannot replace other BPF links, BPF links cannot replace non-BPF links, non-BPF links cannot replace BPF links and lastly only non-BPF links can replace non-BPF links. In case of Cilium, this would solve mentioned issue of safe ownership model as 3rd party applications would not be able to accidentally wipe Cilium programs, even if they are not BPF link aware. Earlier attempts [4] have tried to integrate BPF links into core tc machinery to solve cls_bpf, which has been intrusive to the generic tc kernel API with extensions only specific to cls_bpf and suboptimal/complex since cls_bpf could be wiped from the qdisc also. Locking a tc BPF program in place this way, is getting into layering hacks given the two object models are vastly different. We instead implemented the tcx (tc 'express') layer which is an fd-based tc BPF attach API, so that the BPF link implementation blends in naturally similar to other link types which are fd-based and without the need for changing core tc internal APIs. BPF programs for tc can then be successively migrated from classic cls_bpf to the new tc BPF link without needing to change the program's source code, just the BPF loader mechanics for attaching is sufficient. For the current tc framework, there is no change in behavior with this change and neither does this change touch on tc core kernel APIs. The gist of this patch is that the ingress and egress hook have a lightweight, qdisc-less extension for BPF to attach its tc BPF programs, in other words, a minimal entry point for tc BPF. The name tcx has been suggested from discussion of earlier revisions of this work as a good fit, and to more easily differ between the classic cls_bpf attachment and the fd-based one. For the ingress and egress tcx points, the device holds a cache-friendly array with program pointers which is separated from control plane (slow-path) data. Earlier versions of this work used priority to determine ordering and expression of dependencies similar as with classic tc, but it was challenged that for something more future-proof a better user experience is required. Hence this resulted in the design and development of the generic attach/detach/query API for multi-progs. See prior patch with its discussion on the API design. tcx is the first user and later we plan to integrate also others, for example, one candidate is multi-prog support for XDP which would benefit and have the same 'look and feel' from API perspective. The goal with tcx is to have maximum compatibility to existing tc BPF programs, so they don't need to be rewritten specifically. Compatibility to call into classic tcf_classify() is also provided in order to allow successive migration or both to cleanly co-exist where needed given its all one logical tc layer and the tcx plus classic tc cls/act build one logical overall processing pipeline. tcx supports the simplified return codes TCX_NEXT which is non-terminating (go to next program) and terminating ones with TCX_PASS, TCX_DROP, TCX_REDIRECT. The fd-based API is behind a static key, so that when unused the code is also not entered. The struct tcx_entry's program array is currently static, but could be made dynamic if necessary at a point in future. The a/b pair swap design has been chosen so that for detachment there are no allocations which otherwise could fail. The work has been tested with tc-testing selftest suite which all passes, as well as the tc BPF tests from the BPF CI, and also with Cilium's L4LB. Thanks also to Nikolay Aleksandrov and Martin Lau for in-depth early reviews of this work. [0] https://lpc.events/event/16/contributions/1353/ [1] https://lore.kernel.org/bpf/CAEf4BzbokCJN33Nw_kg82sO=xppXnKWEncGTWCTB9vGCmLB6pw@mail.gmail.com [2] https://colocatedeventseu2023.sched.com/event/1Jo6O/tales-from-an-ebpf-programs-murder-mystery-hemanth-malla-guillaume-fournier-datadog [3] http://vger.kernel.org/bpfconf2023_material/tcx_meta_netdev_borkmann.pdf [4] https://lore.kernel.org/bpf/20210604063116.234316-1-memxor@gmail.com Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20230719140858.13224-3-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-19bpf: Add generic attach/detach/query API for multi-progsGravatar Daniel Borkmann 2-1/+446
This adds a generic layer called bpf_mprog which can be reused by different attachment layers to enable multi-program attachment and dependency resolution. In-kernel users of the bpf_mprog don't need to care about the dependency resolution internals, they can just consume it with few API calls. The initial idea of having a generic API sparked out of discussion [0] from an earlier revision of this work where tc's priority was reused and exposed via BPF uapi as a way to coordinate dependencies among tc BPF programs, similar as-is for classic tc BPF. The feedback was that priority provides a bad user experience and is hard to use [1], e.g.: I cannot help but feel that priority logic copy-paste from old tc, netfilter and friends is done because "that's how things were done in the past". [...] Priority gets exposed everywhere in uapi all the way to bpftool when it's right there for users to understand. And that's the main problem with it. The user don't want to and don't need to be aware of it, but uapi forces them to pick the priority. [...] Your cover letter [0] example proves that in real life different service pick the same priority. They simply don't know any better. Priority is an unnecessary magic that apps _have_ to pick, so they just copy-paste and everyone ends up using the same. The course of the discussion showed more and more the need for a generic, reusable API where the "same look and feel" can be applied for various other program types beyond just tc BPF, for example XDP today does not have multi- program support in kernel, but also there was interest around this API for improving management of cgroup program types. Such common multi-program management concept is useful for BPF management daemons or user space BPF applications coordinating internally about their attachments. Both from Cilium and Meta side [2], we've collected the following requirements for a generic attach/detach/query API for multi-progs which has been implemented as part of this work: - Support prog-based attach/detach and link API - Dependency directives (can also be combined): - BPF_F_{BEFORE,AFTER} with relative_{fd,id} which can be {prog,link,none} - BPF_F_ID flag as {fd,id} toggle; the rationale for id is so that user space application does not need CAP_SYS_ADMIN to retrieve foreign fds via bpf_*_get_fd_by_id() - BPF_F_LINK flag as {prog,link} toggle - If relative_{fd,id} is none, then BPF_F_BEFORE will just prepend, and BPF_F_AFTER will just append for attaching - Enforced only at attach time - BPF_F_REPLACE with replace_bpf_fd which can be prog, links have their own infra for replacing their internal prog - If no flags are set, then it's default append behavior for attaching - Internal revision counter and optionally being able to pass expected_revision - User space application can query current state with revision, and pass it along for attachment to assert current state before doing updates - Query also gets extension for link_ids array and link_attach_flags: - prog_ids are always filled with program IDs - link_ids are filled with link IDs when link was used, otherwise 0 - {prog,link}_attach_flags for holding {prog,link}-specific flags - Must be easy to integrate/reuse for in-kernel users The uapi-side changes needed for supporting bpf_mprog are rather minimal, consisting of the additions of the attachment flags, revision counter, and expanding existing union with relative_{fd,id} member. The bpf_mprog framework consists of an bpf_mprog_entry object which holds an array of bpf_mprog_fp (fast-path structure). The bpf_mprog_cp (control-path structure) is part of bpf_mprog_bundle. Both have been separated, so that fast-path gets efficient packing of bpf_prog pointers for maximum cache efficiency. Also, array has been chosen instead of linked list or other structures to remove unnecessary indirections for a fast point-to-entry in tc for BPF. The bpf_mprog_entry comes as a pair via bpf_mprog_bundle so that in case of updates the peer bpf_mprog_entry is populated and then just swapped which avoids additional allocations that could otherwise fail, for example, in detach case. bpf_mprog_{fp,cp} arrays are currently static, but they could be converted to dynamic allocation if necessary at a point in future. Locking is deferred to the in-kernel user of bpf_mprog, for example, in case of tcx which uses this API in the next patch, it piggybacks on rtnl. An extensive test suite for checking all aspects of this API for prog-based attach/detach and link API comes as BPF selftests in this series. Thanks also to Andrii Nakryiko for early API discussions wrt Meta's BPF prog management. [0] https://lore.kernel.org/bpf/20221004231143.19190-1-daniel@iogearbox.net [1] https://lore.kernel.org/bpf/CAADnVQ+gEY3FjCR=+DmjDR4gp5bOYZUFJQXj4agKFHT9CQPZBw@mail.gmail.com [2] http://vger.kernel.org/bpfconf2023_material/tcx_meta_netdev_borkmann.pdf Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20230719140858.13224-2-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-19bpf: allow any program to use the bpf_map_sum_elem_count kfuncGravatar Anton Protopopov 1-1/+1
Register the bpf_map_sum_elem_count func for all programs, and update the map_ptr subtest of the test_progs test to test the new functionality. The usage is allowed as long as the pointer to the map is trusted (when using tracing programs) or is a const pointer to map, as in the following example: struct { __uint(type, BPF_MAP_TYPE_HASH); ... } hash SEC(".maps"); ... static inline int some_bpf_prog(void) { struct bpf_map *map = (struct bpf_map *)&hash; __s64 count; count = bpf_map_sum_elem_count(map); ... } Signed-off-by: Anton Protopopov <aspsk@isovalent.com> Link: https://lore.kernel.org/r/20230719092952.41202-5-aspsk@isovalent.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-19bpf: make an argument const in the bpf_map_sum_elem_count kfuncGravatar Anton Protopopov 1-1/+1
We use the map pointer only to read the counter values, no locking involved, so mark the argument as const. Signed-off-by: Anton Protopopov <aspsk@isovalent.com> Link: https://lore.kernel.org/r/20230719092952.41202-4-aspsk@isovalent.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-19bpf: consider CONST_PTR_TO_MAP as trusted pointer to struct bpf_mapGravatar Anton Protopopov 2-2/+2
Add the BTF id of struct bpf_map to the reg2btf_ids array. This makes the values of the CONST_PTR_TO_MAP type to be considered as trusted by kfuncs. This, in turn, allows users to execute trusted kfuncs which accept `struct bpf_map *` arguments from non-tracing programs. While exporting the btf_bpf_map_id variable, save some bytes by defining it as BTF_ID_LIST_GLOBAL_SINGLE (which is u32[1]) and not as BTF_ID_LIST (which is u32[64]). Signed-off-by: Anton Protopopov <aspsk@isovalent.com> Link: https://lore.kernel.org/r/20230719092952.41202-3-aspsk@isovalent.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-19bpf: consider types listed in reg2btf_ids as trustedGravatar Anton Protopopov 1-9/+12
The reg2btf_ids array contains a list of types for which we can (and need) to find a corresponding static BTF id. All the types in the list can be considered as trusted for purposes of kfuncs. Signed-off-by: Anton Protopopov <aspsk@isovalent.com> Link: https://lore.kernel.org/r/20230719092952.41202-2-aspsk@isovalent.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-18bpf: Add 'owner' field to bpf_{list,rb}_nodeGravatar Dave Marchevsky 1-4/+25
As described by Kumar in [0], in shared ownership scenarios it is necessary to do runtime tracking of {rb,list} node ownership - and synchronize updates using this ownership information - in order to prevent races. This patch adds an 'owner' field to struct bpf_list_node and bpf_rb_node to implement such runtime tracking. The owner field is a void * that describes the ownership state of a node. It can have the following values: NULL - the node is not owned by any data structure BPF_PTR_POISON - the node is in the process of being added to a data structure ptr_to_root - the pointee is a data structure 'root' (bpf_rb_root / bpf_list_head) which owns this node The field is initially NULL (set by bpf_obj_init_field default behavior) and transitions states in the following sequence: Insertion: NULL -> BPF_PTR_POISON -> ptr_to_root Removal: ptr_to_root -> NULL Before a node has been successfully inserted, it is not protected by any root's lock, and therefore two programs can attempt to add the same node to different roots simultaneously. For this reason the intermediate BPF_PTR_POISON state is necessary. For removal, the node is protected by some root's lock so this intermediate hop isn't necessary. Note that bpf_list_pop_{front,back} helpers don't need to check owner before removing as the node-to-be-removed is not passed in as input and is instead taken directly from the list. Do the check anyways and WARN_ON_ONCE in this unexpected scenario. Selftest changes in this patch are entirely mechanical: some BTF tests have hardcoded struct sizes for structs that contain bpf_{list,rb}_node fields, those were adjusted to account for the new sizes. Selftest additions to validate the owner field are added in a further patch in the series. [0]: https://lore.kernel.org/bpf/d7hyspcow5wtjcmw4fugdgyp3fwhljwuscp3xyut5qnwivyeru@ysdq543otzv2 Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Suggested-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20230718083813.3416104-4-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-18bpf: Introduce internal definitions for UAPI-opaque bpf_{rb,list}_nodeGravatar Dave Marchevsky 1-10/+13
Structs bpf_rb_node and bpf_list_node are opaquely defined in uapi/linux/bpf.h, as BPF program writers are not expected to touch their fields - nor does the verifier allow them to do so. Currently these structs are simple wrappers around structs rb_node and list_head and linked_list / rbtree implementation just casts and passes to library functions for those data structures. Later patches in this series, though, will add an "owner" field to bpf_{rb,list}_node, such that they're not just wrapping an underlying node type. Moreover, the bpf linked_list and rbtree implementations will deal with these owner pointers directly in a few different places. To avoid having to do void *owner = (void*)bpf_list_node + sizeof(struct list_head) with opaque UAPI node types, add bpf_{list,rb}_node_kern struct definitions to internal headers and modify linked_list and rbtree to use the internal types where appropriate. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/r/20230718083813.3416104-3-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-18bpf: Repeat check_max_stack_depth for async callbacksGravatar Kumar Kartikeya Dwivedi 1-2/+19
While the check_max_stack_depth function explores call chains emanating from the main prog, which is typically enough to cover all possible call chains, it doesn't explore those rooted at async callbacks unless the async callback will have been directly called, since unlike non-async callbacks it skips their instruction exploration as they don't contribute to stack depth. It could be the case that the async callback leads to a callchain which exceeds the stack depth, but this is never reachable while only exploring the entry point from main subprog. Hence, repeat the check for the main subprog *and* all async callbacks marked by the symbolic execution pass of the verifier, as execution of the program may begin at any of them. Consider functions with following stack depths: main: 256 async: 256 foo: 256 main: rX = async bpf_timer_set_callback(...) async: foo() Here, async is not descended as it does not contribute to stack depth of main (since it is referenced using bpf_pseudo_func and not bpf_pseudo_call). However, when async is invoked asynchronously, it will end up breaching the MAX_BPF_STACK limit by calling foo. Hence, in addition to main, we also need to explore call chains beginning at all async callback subprogs in a program. Fixes: 7ddc80a476c2 ("bpf: Teach stack depth check about async callbacks.") Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20230717161530.1238-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-18bpf: Fix subprog idx logic in check_max_stack_depthGravatar Kumar Kartikeya Dwivedi 1-5/+6
The assignment to idx in check_max_stack_depth happens once we see a bpf_pseudo_call or bpf_pseudo_func. This is not an issue as the rest of the code performs a few checks and then pushes the frame to the frame stack, except the case of async callbacks. If the async callback case causes the loop iteration to be skipped, the idx assignment will be incorrect on the next iteration of the loop. The value stored in the frame stack (as the subprogno of the current subprog) will be incorrect. This leads to incorrect checks and incorrect tail_call_reachable marking. Save the target subprog in a new variable and only assign to idx once we are done with the is_async_cb check which may skip pushing of frame to the frame stack and subsequent stack depth checks and tail call markings. Fixes: 7ddc80a476c2 ("bpf: Teach stack depth check about async callbacks.") Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20230717161530.1238-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-13Merge tag 'for-netdev' of ↵Gravatar Jakub Kicinski 11-422/+846
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Alexei Starovoitov says: ==================== pull-request: bpf-next 2023-07-13 We've added 67 non-merge commits during the last 15 day(s) which contain a total of 106 files changed, 4444 insertions(+), 619 deletions(-). The main changes are: 1) Fix bpftool build in presence of stale vmlinux.h, from Alexander Lobakin. 2) Introduce bpf_me_mcache_free_rcu() and fix OOM under stress, from Alexei Starovoitov. 3) Teach verifier actual bounds of bpf_get_smp_processor_id() and fix perf+libbpf issue related to custom section handling, from Andrii Nakryiko. 4) Introduce bpf map element count, from Anton Protopopov. 5) Check skb ownership against full socket, from Kui-Feng Lee. 6) Support for up to 12 arguments in BPF trampoline, from Menglong Dong. 7) Export rcu_request_urgent_qs_task, from Paul E. McKenney. 8) Fix BTF walking of unions, from Yafang Shao. 9) Extend link_info for kprobe_multi and perf_event links, from Yafang Shao. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (67 commits) selftests/bpf: Add selftest for PTR_UNTRUSTED bpf: Fix an error in verifying a field in a union selftests/bpf: Add selftests for nested_trust bpf: Fix an error around PTR_UNTRUSTED selftests/bpf: add testcase for TRACING with 6+ arguments bpf, x86: allow function arguments up to 12 for TRACING bpf, x86: save/restore regs with BPF_DW size bpftool: Use "fallthrough;" keyword instead of comments bpf: Add object leak check. bpf: Convert bpf_cpumask to bpf_mem_cache_free_rcu. bpf: Introduce bpf_mem_free_rcu() similar to kfree_rcu(). selftests/bpf: Improve test coverage of bpf_mem_alloc. rcu: Export rcu_request_urgent_qs_task() bpf: Allow reuse from waiting_for_gp_ttrace list. bpf: Add a hint to allocated objects. bpf: Change bpf_mem_cache draining process. bpf: Further refactor alloc_bulk(). bpf: Factor out inc/dec of active flag into helpers. bpf: Refactor alloc_bulk(). bpf: Let free_all() return the number of freed elements. ... ==================== Link: https://lore.kernel.org/r/20230714020910.80794-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-13bpf: Fix an error in verifying a field in a unionGravatar Yafang Shao 1-1/+1
We are utilizing BPF LSM to monitor BPF operations within our container environment. When we add support for raw_tracepoint, it hits below error. ; (const void *)attr->raw_tracepoint.name); 27: (79) r3 = *(u64 *)(r2 +0) access beyond the end of member map_type (mend:4) in struct (anon) with off 0 size 8 It can be reproduced with below BPF prog. SEC("lsm/bpf") int BPF_PROG(bpf_audit, int cmd, union bpf_attr *attr, unsigned int size) { switch (cmd) { case BPF_RAW_TRACEPOINT_OPEN: bpf_printk("raw_tracepoint is %s", attr->raw_tracepoint.name); break; default: break; } return 0; } The reason is that when accessing a field in a union, such as bpf_attr, if the field is located within a nested struct that is not the first member of the union, it can result in incorrect field verification. union bpf_attr { struct { __u32 map_type; <<<< Actually it will find that field. __u32 key_size; __u32 value_size; ... }; ... struct { __u64 name; <<<< We want to verify this field. __u32 prog_fd; } raw_tracepoint; }; Considering the potential deep nesting levels, finding a perfect solution to address this issue has proven challenging. Therefore, I propose a solution where we simply skip the verification process if the field in question is located within a union. Fixes: 7e3617a72df3 ("bpf: Add array support to btf_struct_access") Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Link: https://lore.kernel.org/r/20230713025642.27477-4-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-13bpf: Fix an error around PTR_UNTRUSTEDGravatar Yafang Shao 2-11/+14
Per discussion with Alexei, the PTR_UNTRUSTED flag should not been cleared when we start to walk a new struct, because the struct in question may be a struct nested in a union. We should also check and set this flag before we walk its each member, in case itself is a union. We will clear this flag if the field is BTF_TYPE_SAFE_RCU_OR_NULL. Fixes: 6fcd486b3a0a ("bpf: Refactor RCU enforcement in the verifier.") Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Link: https://lore.kernel.org/r/20230713025642.27477-2-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-12bpf: Add object leak check.Gravatar Hou Tao 1-0/+35
The object leak check is cheap. Do it unconditionally to spot difficult races in bpf_mem_alloc. Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20230706033447.54696-15-alexei.starovoitov@gmail.com
2023-07-12bpf: Convert bpf_cpumask to bpf_mem_cache_free_rcu.Gravatar Alexei Starovoitov 1-14/+6
Convert bpf_cpumask to bpf_mem_cache_free_rcu. Note that migrate_disable() in bpf_cpumask_release() is still necessary, since bpf_cpumask_release() is a dtor. bpf_obj_free_fields() can be converted to do migrate_disable() there in a follow up. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/bpf/20230706033447.54696-14-alexei.starovoitov@gmail.com
2023-07-12bpf: Introduce bpf_mem_free_rcu() similar to kfree_rcu().Gravatar Alexei Starovoitov 1-3/+126
Introduce bpf_mem_[cache_]free_rcu() similar to kfree_rcu(). Unlike bpf_mem_[cache_]free() that links objects for immediate reuse into per-cpu free list the _rcu() flavor waits for RCU grace period and then moves objects into free_by_rcu_ttrace list where they are waiting for RCU task trace grace period to be freed into slab. The life cycle of objects: alloc: dequeue free_llist free: enqeueu free_llist free_rcu: enqueue free_by_rcu -> waiting_for_gp free_llist above high watermark -> free_by_rcu_ttrace after RCU GP waiting_for_gp -> free_by_rcu_ttrace free_by_rcu_ttrace -> waiting_for_gp_ttrace -> slab Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/bpf/20230706033447.54696-13-alexei.starovoitov@gmail.com
2023-07-12bpf: Allow reuse from waiting_for_gp_ttrace list.Gravatar Alexei Starovoitov 1-6/+10
alloc_bulk() can reuse elements from free_by_rcu_ttrace. Let it reuse from waiting_for_gp_ttrace as well to avoid unnecessary kmalloc(). Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20230706033447.54696-10-alexei.starovoitov@gmail.com
2023-07-12bpf: Add a hint to allocated objects.Gravatar Alexei Starovoitov 1-19/+31
To address OOM issue when one cpu is allocating and another cpu is freeing add a target bpf_mem_cache hint to allocated objects and when local cpu free_llist overflows free to that bpf_mem_cache. The hint addresses the OOM while maintaining the same performance for common case when alloc/free are done on the same cpu. Note that do_call_rcu_ttrace() now has to check 'draining' flag in one more case, since do_call_rcu_ttrace() is called not only for current cpu. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/bpf/20230706033447.54696-9-alexei.starovoitov@gmail.com
2023-07-12bpf: Change bpf_mem_cache draining process.Gravatar Alexei Starovoitov 1-9/+9
The next patch will introduce cross-cpu llist access and existing irq_work_sync() + drain_mem_cache() + rcu_barrier_tasks_trace() mechanism will not be enough, since irq_work_sync() + drain_mem_cache() on cpu A won't guarantee that llist on cpu A are empty. The free_bulk() on cpu B might add objects back to llist of cpu A. Add 'bool draining' flag. The modified sequence looks like: for_each_cpu: WRITE_ONCE(c->draining, true); // do_call_rcu_ttrace() won't be doing call_rcu() any more irq_work_sync(); // wait for irq_work callback (free_bulk) to finish drain_mem_cache(); // free all objects rcu_barrier_tasks_trace(); // wait for RCU callbacks to execute Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/bpf/20230706033447.54696-8-alexei.starovoitov@gmail.com
2023-07-12bpf: Further refactor alloc_bulk().Gravatar Alexei Starovoitov 1-12/+18
In certain scenarios alloc_bulk() might be taking free objects mainly from free_by_rcu_ttrace list. In such case get_memcg() and set_active_memcg() are redundant, but they show up in perf profile. Split the loop and only set memcg when allocating from slab. No performance difference in this patch alone, but it helps in combination with further patches. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/bpf/20230706033447.54696-7-alexei.starovoitov@gmail.com
2023-07-12bpf: Factor out inc/dec of active flag into helpers.Gravatar Alexei Starovoitov 1-12/+18
Factor out local_inc/dec_return(&c->active) into helpers. No functional changes. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/bpf/20230706033447.54696-6-alexei.starovoitov@gmail.com
2023-07-12bpf: Refactor alloc_bulk().Gravatar Alexei Starovoitov 1-20/+26
Factor out inner body of alloc_bulk into separate helper. No functional changes. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/bpf/20230706033447.54696-5-alexei.starovoitov@gmail.com
2023-07-12bpf: Let free_all() return the number of freed elements.Gravatar Alexei Starovoitov 1-2/+6
Let free_all() helper return the number of freed elements. It's not used in this patch, but helps in debug/development of bpf_mem_alloc. For example this diff for __free_rcu(): - free_all(llist_del_all(&c->waiting_for_gp_ttrace), !!c->percpu_size); + printk("cpu %d freed %d objs after tasks trace\n", raw_smp_processor_id(), + free_all(llist_del_all(&c->waiting_for_gp_ttrace), !!c->percpu_size)); would show how busy RCU tasks trace is. In artificial benchmark where one cpu is allocating and different cpu is freeing the RCU tasks trace won't be able to keep up and the list of objects would keep growing from thousands to millions and eventually OOMing. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/bpf/20230706033447.54696-4-alexei.starovoitov@gmail.com
2023-07-12bpf: Simplify code of destroy_mem_alloc() with kmemdup().Gravatar Alexei Starovoitov 1-5/+2
Use kmemdup() to simplify the code. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/bpf/20230706033447.54696-3-alexei.starovoitov@gmail.com
2023-07-12bpf: Rename few bpf_mem_alloc fields.Gravatar Alexei Starovoitov 1-28/+29
Rename: - struct rcu_head rcu; - struct llist_head free_by_rcu; - struct llist_head waiting_for_gp; - atomic_t call_rcu_in_progress; + struct llist_head free_by_rcu_ttrace; + struct llist_head waiting_for_gp_ttrace; + struct rcu_head rcu_ttrace; + atomic_t call_rcu_ttrace_in_progress; ... - static void do_call_rcu(struct bpf_mem_cache *c) + static void do_call_rcu_ttrace(struct bpf_mem_cache *c) to better indicate intended use. The 'tasks trace' is shortened to 'ttrace' to reduce verbosity. No functional changes. Later patches will add free_by_rcu/waiting_for_gp fields to be used with normal RCU. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/bpf/20230706033447.54696-2-alexei.starovoitov@gmail.com
2023-07-12bpf: teach verifier actual bounds of bpf_get_smp_processor_id() resultGravatar Andrii Nakryiko 1-11/+26
bpf_get_smp_processor_id() helper returns current CPU on which BPF program runs. It can't return value that is bigger than maximum allowed number of CPUs (minus one, due to zero indexing). Teach BPF verifier to recognize that. This makes it possible to use bpf_get_smp_processor_id() result to index into arrays without extra checks, as demonstrated in subsequent selftests/bpf patch. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230711232400.1658562-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-11bpf: Support ->fill_link_info for perf_eventGravatar Yafang Shao 1-0/+146
By introducing support for ->fill_link_info to the perf_event link, users gain the ability to inspect it using `bpftool link show`. While the current approach involves accessing this information via `bpftool perf show`, consolidating link information for all link types in one place offers greater convenience. Additionally, this patch extends support to the generic perf event, which is not currently accommodated by `bpftool perf show`. While only the perf type and config are exposed to userspace, other attributes such as sample_period and sample_freq are ignored. It's important to note that if kptr_restrict is not permitted, the probed address will not be exposed, maintaining security measures. A new enum bpf_perf_event_type is introduced to help the user understand which struct is relevant. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20230709025630.3735-9-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-11bpf: Add a common helper bpf_copy_to_user()Gravatar Yafang Shao 1-14/+20
Add a common helper bpf_copy_to_user(), which will be used at multiple places. No functional change. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230709025630.3735-8-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-11bpf: cpumap: Fix memory leak in cpu_map_update_elemGravatar Pu Lehui 1-16/+24
Syzkaller reported a memory leak as follows: BUG: memory leak unreferenced object 0xff110001198ef748 (size 192): comm "syz-executor.3", pid 17672, jiffies 4298118891 (age 9.906s) hex dump (first 32 bytes): 00 00 00 00 4a 19 00 00 80 ad e3 e4 fe ff c0 00 ....J........... 00 b2 d3 0c 01 00 11 ff 28 f5 8e 19 01 00 11 ff ........(....... backtrace: [<ffffffffadd28087>] __cpu_map_entry_alloc+0xf7/0xb00 [<ffffffffadd28d8e>] cpu_map_update_elem+0x2fe/0x3d0 [<ffffffffadc6d0fd>] bpf_map_update_value.isra.0+0x2bd/0x520 [<ffffffffadc7349b>] map_update_elem+0x4cb/0x720 [<ffffffffadc7d983>] __se_sys_bpf+0x8c3/0xb90 [<ffffffffb029cc80>] do_syscall_64+0x30/0x40 [<ffffffffb0400099>] entry_SYSCALL_64_after_hwframe+0x61/0xc6 BUG: memory leak unreferenced object 0xff110001198ef528 (size 192): comm "syz-executor.3", pid 17672, jiffies 4298118891 (age 9.906s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffffadd281f0>] __cpu_map_entry_alloc+0x260/0xb00 [<ffffffffadd28d8e>] cpu_map_update_elem+0x2fe/0x3d0 [<ffffffffadc6d0fd>] bpf_map_update_value.isra.0+0x2bd/0x520 [<ffffffffadc7349b>] map_update_elem+0x4cb/0x720 [<ffffffffadc7d983>] __se_sys_bpf+0x8c3/0xb90 [<ffffffffb029cc80>] do_syscall_64+0x30/0x40 [<ffffffffb0400099>] entry_SYSCALL_64_after_hwframe+0x61/0xc6 BUG: memory leak unreferenced object 0xff1100010fd93d68 (size 8): comm "syz-executor.3", pid 17672, jiffies 4298118891 (age 9.906s) hex dump (first 8 bytes): 00 00 00 00 00 00 00 00 ........ backtrace: [<ffffffffade5db3e>] kvmalloc_node+0x11e/0x170 [<ffffffffadd28280>] __cpu_map_entry_alloc+0x2f0/0xb00 [<ffffffffadd28d8e>] cpu_map_update_elem+0x2fe/0x3d0 [<ffffffffadc6d0fd>] bpf_map_update_value.isra.0+0x2bd/0x520 [<ffffffffadc7349b>] map_update_elem+0x4cb/0x720 [<ffffffffadc7d983>] __se_sys_bpf+0x8c3/0xb90 [<ffffffffb029cc80>] do_syscall_64+0x30/0x40 [<ffffffffb0400099>] entry_SYSCALL_64_after_hwframe+0x61/0xc6 In the cpu_map_update_elem flow, when kthread_stop is called before calling the threadfn of rcpu->kthread, since the KTHREAD_SHOULD_STOP bit of kthread has been set by kthread_stop, the threadfn of rcpu->kthread will never be executed, and rcpu->refcnt will never be 0, which will lead to the allocated rcpu, rcpu->queue and rcpu->queue->queue cannot be released. Calling kthread_stop before executing kthread's threadfn will return -EINTR. We can complete the release of memory resources in this state. Fixes: 6710e1126934 ("bpf: introduce new bpf cpu map type BPF_MAP_TYPE_CPUMAP") Signed-off-by: Pu Lehui <pulehui@huawei.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Acked-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20230711115848.2701559-1-pulehui@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-06bpf: make preloaded map iterators to display map elements countGravatar Anton Protopopov 2-260/+275
Add another column to the /sys/fs/bpf/maps.debug iterator to display cur_entries, the current number of entries in the map as is returned by the bpf_map_sum_elem_count kfunc. Also fix formatting. Example: # cat /sys/fs/bpf/maps.debug id name max_entries cur_entries 2 iterator.rodata 1 0 125 cilium_auth_map 524288 666 126 cilium_runtime_ 256 0 127 cilium_signals 32 0 128 cilium_node_map 16384 1344 129 cilium_events 32 0 ... Signed-off-by: Anton Protopopov <aspsk@isovalent.com> Link: https://lore.kernel.org/r/20230706133932.45883-5-aspsk@isovalent.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-06bpf: populate the per-cpu insertions/deletions counters for hashmapsGravatar Anton Protopopov 1-2/+20
Initialize and utilize the per-cpu insertions/deletions counters for hash-based maps. Non-trivial changes only apply to the preallocated maps for which the {inc,dec}_elem_count functions are not called, as there's no need in counting elements to sustain proper map operations. To increase/decrease percpu counters for preallocated maps we add raw calls to the bpf_map_{inc,dec}_elem_count functions so that the impact is minimal. For dynamically allocated maps we add corresponding calls to the existing {inc,dec}_elem_count functions. Signed-off-by: Anton Protopopov <aspsk@isovalent.com> Link: https://lore.kernel.org/r/20230706133932.45883-4-aspsk@isovalent.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-06bpf: add a new kfunc to return current bpf_map elements countGravatar Anton Protopopov 1-1/+38
A bpf_map_sum_elem_count kfunc was added to simplify getting the sum of the map per-cpu element counters. If a map doesn't implement the counter, then the function will always return 0. Signed-off-by: Anton Protopopov <aspsk@isovalent.com> Link: https://lore.kernel.org/r/20230706133932.45883-3-aspsk@isovalent.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-05bpf: Fix max stack depth check for async callbacksGravatar Kumar Kartikeya Dwivedi 1-2/+3
The check_max_stack_depth pass happens after the verifier's symbolic execution, and attempts to walk the call graph of the BPF program, ensuring that the stack usage stays within bounds for all possible call chains. There are two cases to consider: bpf_pseudo_func and bpf_pseudo_call. In the former case, the callback pointer is loaded into a register, and is assumed that it is passed to some helper later which calls it (however there is no way to be sure), but the check remains conservative and accounts the stack usage anyway. For this particular case, asynchronous callbacks are skipped as they execute asynchronously when their corresponding event fires. The case of bpf_pseudo_call is simpler and we know that the call is definitely made, hence the stack depth of the subprog is accounted for. However, the current check still skips an asynchronous callback even if a bpf_pseudo_call was made for it. This is erroneous, as it will miss accounting for the stack usage of the asynchronous callback, which can be used to breach the maximum stack depth limit. Fix this by only skipping asynchronous callbacks when the instruction is not a pseudo call to the subprog. Fixes: 7ddc80a476c2 ("bpf: Teach stack depth check about async callbacks.") Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20230705144730.235802-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-05bpf: Remove unnecessary ring buffer size checkGravatar Hou Tao 1-15/+11
The theoretical maximum size of ring buffer is about 64GB, but now the size of ring buffer is specified by max_entries in bpf_attr and its maximum value is (4GB - 1), and it won't be possible for overflow. So just remove the unnecessary size check in ringbuf_map_alloc() but keep the comments for possible extension in future. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Closes: https://lore.kernel.org/bpf/9c636a63-1f3d-442d-9223-96c2dccb9469@moroto.mountain Link: https://lore.kernel.org/bpf/20230704074014.216616-1-houtao@huaweicloud.com
2023-07-03bpf, btf: Warn but return no error for NULL btf from ↵Gravatar SeongJae Park 1-4/+2
__register_btf_kfunc_id_set() __register_btf_kfunc_id_set() assumes .BTF to be part of the module's .ko file if CONFIG_DEBUG_INFO_BTF is enabled. If that's not the case, the function prints an error message and return an error. As a result, such modules cannot be loaded. However, the section could be stripped out during a build process. It would be better to let the modules loaded, because their basic functionalities have no problem [0], though the BTF functionalities will not be supported. Make the function to lower the level of the message from error to warn, and return no error. [0] https://lore.kernel.org/bpf/20220219082037.ow2kbq5brktf4f2u@apollo.legion Fixes: c446fdacb10d ("bpf: fix register_btf_kfunc_id_set for !CONFIG_DEBUG_INFO_BTF") Reported-by: Alexander Egorenkov <Alexander.Egorenkov@ibm.com> Suggested-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/87y228q66f.fsf@oc8242746057.ibm.com Link: https://lore.kernel.org/bpf/20220219082037.ow2kbq5brktf4f2u@apollo.legion Link: https://lore.kernel.org/bpf/20230701171447.56464-1-sj@kernel.org
2023-06-30bpf: Resolve modifiers when walking structsGravatar Stanislav Fomichev 1-0/+2
It is impossible to use skb_frag_t in the tracing program. Resolve typedefs when walking structs. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20230626212522.2414485-1-sdf@google.com
2023-06-29bpf: Replace deprecated -target with --target= for ClangGravatar Fangrui Song 1-1/+1
The -target option has been deprecated since clang 3.4 in 2013. Therefore, use the preferred --target=bpf form instead. This also matches how we use --target= in scripts/Makefile.clang. Signed-off-by: Fangrui Song <maskray@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Quentin Monnet <quentin@isovalent.com> Link: https://github.com/llvm/llvm-project/commit/274b6f0c87a6a1798de0a68135afc7f95def6277 Link: https://lore.kernel.org/bpf/20230624001856.1903733-1-maskray@google.com
2023-06-24Merge tag 'for-netdev' of ↵Gravatar Jakub Kicinski 18-184/+395
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2023-06-23 We've added 49 non-merge commits during the last 24 day(s) which contain a total of 70 files changed, 1935 insertions(+), 442 deletions(-). The main changes are: 1) Extend bpf_fib_lookup helper to allow passing the route table ID, from Louis DeLosSantos. 2) Fix regsafe() in verifier to call check_ids() for scalar registers, from Eduard Zingerman. 3) Extend the set of cpumask kfuncs with bpf_cpumask_first_and() and a rework of bpf_cpumask_any*() kfuncs. Additionally, add selftests, from David Vernet. 4) Fix socket lookup BPF helpers for tc/XDP to respect VRF bindings, from Gilad Sever. 5) Change bpf_link_put() to use workqueue unconditionally to fix it under PREEMPT_RT, from Sebastian Andrzej Siewior. 6) Follow-ups to address issues in the bpf_refcount shared ownership implementation, from Dave Marchevsky. 7) A few general refactorings to BPF map and program creation permissions checks which were part of the BPF token series, from Andrii Nakryiko. 8) Various fixes for benchmark framework and add a new benchmark for BPF memory allocator to BPF selftests, from Hou Tao. 9) Documentation improvements around iterators and trusted pointers, from Anton Protopopov. 10) Small cleanup in verifier to improve allocated object check, from Daniel T. Lee. 11) Improve performance of bpf_xdp_pointer() by avoiding access to shared_info when XDP packet does not have frags, from Jesper Dangaard Brouer. 12) Silence a harmless syzbot-reported warning in btf_type_id_size(), from Yonghong Song. 13) Remove duplicate bpfilter_umh_cleanup in favor of umd_cleanup_helper, from Jarkko Sakkinen. 14) Fix BPF selftests build for resolve_btfids under custom HOSTCFLAGS, from Viktor Malik. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (49 commits) bpf, docs: Document existing macros instead of deprecated bpf, docs: BPF Iterator Document selftests/bpf: Fix compilation failure for prog vrf_socket_lookup selftests/bpf: Add vrf_socket_lookup tests bpf: Fix bpf socket lookup from tc/xdp to respect socket VRF bindings bpf: Call __bpf_sk_lookup()/__bpf_skc_lookup() directly via TC hookpoint bpf: Factor out socket lookup functions for the TC hookpoint. selftests/bpf: Set the default value of consumer_cnt as 0 selftests/bpf: Ensure that next_cpu() returns a valid CPU number selftests/bpf: Output the correct error code for pthread APIs selftests/bpf: Use producer_cnt to allocate local counter array xsk: Remove unused inline function xsk_buff_discard() bpf: Keep BPF_PROG_LOAD permission checks clear of validations bpf: Centralize permissions checks for all BPF map types bpf: Inline map creation logic in map_create() function bpf: Move unprivileged checks into map_create() and bpf_prog_load() bpf: Remove in_atomic() from bpf_link_put(). selftests/bpf: Verify that check_ids() is used for scalars in regsafe() bpf: Verify scalar ids mapping in regsafe() using check_ids() selftests/bpf: Check if mark_chain_precision() follows scalar ids ... ==================== Link: https://lore.kernel.org/r/20230623211256.8409-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netGravatar Jakub Kicinski 3-14/+21
Cross-merge networking fixes after downstream PR. Conflicts: tools/testing/selftests/net/fcnal-test.sh d7a2fc1437f7 ("selftests: net: fcnal-test: check if FIPS mode is enabled") dd017c72dde6 ("selftests: fcnal: Test SO_DONTROUTE on TCP sockets.") https://lore.kernel.org/all/5007b52c-dd16-dbf6-8d64-b9701bfa498b@tessares.net/ https://lore.kernel.org/all/20230619105427.4a0df9b3@canb.auug.org.au/ No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-21bpf: Force kprobe multi expected_attach_type for kprobe_multi linkGravatar Jiri Olsa 1-0/+5
We currently allow to create perf link for program with expected_attach_type == BPF_TRACE_KPROBE_MULTI. This will cause crash when we call helpers like get_attach_cookie or get_func_ip in such program, because it will call the kprobe_multi's version (current->bpf_ctx context setup) of those helpers while it expects perf_link's current->bpf_ctx context setup. Making sure that we use BPF_TRACE_KPROBE_MULTI expected_attach_type only for programs attaching through kprobe_multi link. Fixes: ca74823c6e16 ("bpf: Add cookie support to programs attached with kprobe multi link") Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20230618131414.75649-1-jolsa@kernel.org
2023-06-21bpf/btf: Accept function names that contain dotsGravatar Florent Revest 1-12/+8
When building a kernel with LLVM=1, LLVM_IAS=0 and CONFIG_KASAN=y, LLVM leaves DWARF tags for the "asan.module_ctor" & co symbols. In turn, pahole creates BTF_KIND_FUNC entries for these and this makes the BTF metadata validation fail because they contain a dot. In a dramatic turn of event, this BTF verification failure can cause the netfilter_bpf initialization to fail, causing netfilter_core to free the netfilter_helper hashmap and netfilter_ftp to trigger a use-after-free. The risk of u-a-f in netfilter will be addressed separately but the existence of "asan.module_ctor" debug info under some build conditions sounds like a good enough reason to accept functions that contain dots in BTF. Although using only LLVM=1 is the recommended way to compile clang-based kernels, users can certainly do LLVM=1, LLVM_IAS=0 as well and we still try to support that combination according to Nick. To clarify: - > v5.10 kernel, LLVM=1 (LLVM_IAS=0 is not the default) is recommended, but user can still have LLVM=1, LLVM_IAS=0 to trigger the issue - <= 5.10 kernel, LLVM=1 (LLVM_IAS=0 is the default) is recommended in which case GNU as will be used Fixes: 1dc92851849c ("bpf: kernel side support for BTF Var and DataSec") Signed-off-by: Florent Revest <revest@chromium.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Cc: Yonghong Song <yhs@meta.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Link: https://lore.kernel.org/bpf/20230615145607.3469985-1-revest@chromium.org