  1. Feb 21, 2024
  2. Feb 16, 2024
  3. Feb 15, 2024
    • bpf: Fix test verif_scale_strobemeta_subprogs failure due to llvm19 · 682158ab
      Yonghong Song authored
      With the latest llvm19, I hit the following selftest failure with
      
        $ ./test_progs -j
        libbpf: prog 'on_event': BPF program load failed: Permission denied
        libbpf: prog 'on_event': -- BEGIN PROG LOAD LOG --
        combined stack size of 4 calls is 544. Too large
        verification time 1344153 usec
        stack depth 24+440+0+32
        processed 51008 insns (limit 1000000) max_states_per_insn 19 total_states 1467 peak_states 303 mark_read 146
        -- END PROG LOAD LOG --
        libbpf: prog 'on_event': failed to load: -13
        libbpf: failed to load object 'strobemeta_subprogs.bpf.o'
        scale_test:FAIL:expect_success unexpected error: -13 (errno 13)
        #498     verif_scale_strobemeta_subprogs:FAIL
      
      The verifier complains that the combined stack size (544 bytes) exceeds the
      maximum stack limit of 512. This is a regression from llvm19 ([1]).

      In the above error log, the original stack depth is 24+440+0+32.
      To satisfy the interpreter's needs, the verifier adjusts the stack depth to
      32+448+32+32=544, which exceeds 512, hence the error. The same adjusted
      stack size is also used for the jit case.
      
      But the jitted code could use a smaller stack size.
      
        $ egrep -r stack_depth | grep round_up
        arm64/net/bpf_jit_comp.c:       ctx->stack_size = round_up(prog->aux->stack_depth, 16);
        loongarch/net/bpf_jit.c:        bpf_stack_adjust = round_up(ctx->prog->aux->stack_depth, 16);
        powerpc/net/bpf_jit_comp.c:     cgctx.stack_size = round_up(fp->aux->stack_depth, 16);
        riscv/net/bpf_jit_comp32.c:             round_up(ctx->prog->aux->stack_depth, STACK_ALIGN);
        riscv/net/bpf_jit_comp64.c:     bpf_stack_adjust = round_up(ctx->prog->aux->stack_depth, 16);
        s390/net/bpf_jit_comp.c:        u32 stack_depth = round_up(fp->aux->stack_depth, 8);
        sparc/net/bpf_jit_comp_64.c:            stack_needed += round_up(stack_depth, 16);
        x86/net/bpf_jit_comp.c:         EMIT3_off32(0x48, 0x81, 0xEC, round_up(stack_depth, 8));
        x86/net/bpf_jit_comp.c: int tcc_off = -4 - round_up(stack_depth, 8);
        x86/net/bpf_jit_comp.c:                     round_up(stack_depth, 8));
        x86/net/bpf_jit_comp.c: int tcc_off = -4 - round_up(stack_depth, 8);
        x86/net/bpf_jit_comp.c:         EMIT3_off32(0x48, 0x81, 0xC4, round_up(stack_depth, 8));
      
      In the above, STACK_ALIGN in riscv/net/bpf_jit_comp32.c is defined as 16.
      So the stack is aligned to either 8 or 16 bytes: x86/s390 use 8-byte stack
      alignment and the rest use 16-byte alignment.
      
      This patch calculates the total stack depth based on 16-byte alignment if jit is requested.
      For the above failing case, the new stack size will be 32+448+0+32=512 and verification
      succeeds. The llvm19 regression will be discussed separately with llvm upstream.
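
      A minimal sketch of the idea (illustrative only, not the exact patch; the
      helper name is hypothetical):

        /* mirror the JITs' own rounding when the program will be JITed,
         * instead of always using the interpreter's 32-byte granularity
         */
        static u32 round_up_stack_depth(struct bpf_verifier_env *env, u32 stack_depth)
        {
                if (env->prog->jit_requested)
                        return round_up(stack_depth, 16);

                /* interpreter stack is managed in 32-byte chunks */
                return round_up(max_t(u32, stack_depth, 1), 32);
        }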
      
      The verifier change caused three test failures, as these tests compare error
      messages that include stack sizes. More specifically,
        - test_global_funcs/global_func1: fails in interpreter mode and succeeds in jit mode.
          Adjusted stack sizes so both jit and interpreter modes will fail.
        - async_stack_depth/{pseudo_call_check, async_call_root_check}: since jit and interpreter
          will calculate different stack sizes, the failure msg is adjusted to omit those
          specific stack size numbers.
      
        [1] https://lore.kernel.org/bpf/32bde0f0-1881-46c9-931a-673be566c61d@linux.dev/
      
      
      
      Suggested-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
      Link: https://lore.kernel.org/r/20240214232951.4113094-1-yonghong.song@linux.dev
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      682158ab
    • bpf: improve duplicate source code line detection · 57354f5f
      Andrii Nakryiko authored
      The verifier log avoids printing the same source code line multiple times
      when a consecutive block of BPF assembly instructions is covered by the
      same original (C) source code line. This greatly improves verifier log
      legibility.
      
      Unfortunately, this check is imperfect, and in production applications it
      quite often happens that the verifier log will have multiple duplicated
      source lines emitted for no apparent good reason. E.g., this is an
      excerpt from a real-world BPF application (with register states omitted
      for clarity):
      
      BEFORE
      ======
      ; for (int i = 0; i < STROBE_MAX_MAP_ENTRIES; ++i) { @ strobemeta_probe.bpf.c:394
      5369: (07) r8 += 2                    ;
      5370: (07) r7 += 16                   ;
      ; for (int i = 0; i < STROBE_MAX_MAP_ENTRIES; ++i) { @ strobemeta_probe.bpf.c:394
      5371: (07) r9 += 1                    ;
      5372: (79) r4 = *(u64 *)(r10 -32)     ;
      ; for (int i = 0; i < STROBE_MAX_MAP_ENTRIES; ++i) { @ strobemeta_probe.bpf.c:394
      5373: (55) if r9 != 0xf goto pc+2
      ; if (i >= map->cnt) @ strobemeta_probe.bpf.c:396
      5376: (79) r1 = *(u64 *)(r10 -40)     ;
      5377: (79) r1 = *(u64 *)(r1 +8)       ;
      ; if (i >= map->cnt) @ strobemeta_probe.bpf.c:396
      5378: (dd) if r1 s<= r9 goto pc-5     ;
      ; descr->key_lens[i] = 0; @ strobemeta_probe.bpf.c:398
      5379: (b4) w1 = 0                     ;
      5380: (6b) *(u16 *)(r8 -30) = r1      ;
      ; task, data, off, STROBE_MAX_STR_LEN, map->entries[i].key); @ strobemeta_probe.bpf.c:400
      5381: (79) r3 = *(u64 *)(r7 -8)       ;
      5382: (7b) *(u64 *)(r10 -24) = r6     ;
      ; task, data, off, STROBE_MAX_STR_LEN, map->entries[i].key); @ strobemeta_probe.bpf.c:400
      5383: (bc) w6 = w6                    ;
      ; barrier_var(payload_off); @ strobemeta_probe.bpf.c:280
      5384: (bf) r2 = r6                    ;
      5385: (bf) r1 = r4                    ;
      
      As can be seen, line 394 is emitted three times and lines 396 and 400 are
      each emitted twice. Note that there are no other source code lines
      intermingled between these duplicates, so the issue is not the compiler
      reordering assembly instructions such that multiple original source code
      lines are in effect.
      
      It becomes more obvious what's going on if we look at *full* original line info
      information (using btfdump for this, [0]):
      
        #2764: line: insn #5363 --> 394:3 @ ./././strobemeta_probe.bpf.c
                  for (int i = 0; i < STROBE_MAX_MAP_ENTRIES; ++i) {
        #2765: line: insn #5373 --> 394:21 @ ./././strobemeta_probe.bpf.c
                  for (int i = 0; i < STROBE_MAX_MAP_ENTRIES; ++i) {
        #2766: line: insn #5375 --> 394:47 @ ./././strobemeta_probe.bpf.c
                  for (int i = 0; i < STROBE_MAX_MAP_ENTRIES; ++i) {
        #2767: line: insn #5377 --> 394:3 @ ./././strobemeta_probe.bpf.c
                  for (int i = 0; i < STROBE_MAX_MAP_ENTRIES; ++i) {
        #2768: line: insn #5378 --> 414:10 @ ./././strobemeta_probe.bpf.c
                  return off;
      
      We can see that there are four line info records covering
      instructions #5363 through #5377 (instruction indices are shifted due to
      subprog instructions being appended to the main program), all of them
      pointing to the same C source code line #394. But each of them points to
      a different part of that line, which is denoted by the differing column
      numbers (3, 21, 47, 3).
      
      But the verifier log doesn't distinguish between parts of the same source code
      line and doesn't emit this column information, so for the end user it's just
      repetitive visual noise. So let's improve the detection of repeated source code
      lines and avoid this.
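
      A sketch of the tightened check (illustrative, not the exact patch): treat
      two records as the same source line only if both the file and the line
      number match, ignoring the column encoded in line_col.

        /* BPF_LINE_INFO_LINE_NUM() extracts the line number from line_col */
        static bool same_src_line(const struct bpf_line_info *a,
                                  const struct bpf_line_info *b)
        {
                return a->file_name_off == b->file_name_off &&
                       BPF_LINE_INFO_LINE_NUM(a->line_col) ==
                       BPF_LINE_INFO_LINE_NUM(b->line_col);
        }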
      
      With the changes in this patch, we get this output for the same piece of BPF
      program log:
      
      AFTER
      =====
      ; for (int i = 0; i < STROBE_MAX_MAP_ENTRIES; ++i) { @ strobemeta_probe.bpf.c:394
      5369: (07) r8 += 2                    ;
      5370: (07) r7 += 16                   ;
      5371: (07) r9 += 1                    ;
      5372: (79) r4 = *(u64 *)(r10 -32)     ;
      5373: (55) if r9 != 0xf goto pc+2
      ; if (i >= map->cnt) @ strobemeta_probe.bpf.c:396
      5376: (79) r1 = *(u64 *)(r10 -40)     ;
      5377: (79) r1 = *(u64 *)(r1 +8)       ;
      5378: (dd) if r1 s<= r9 goto pc-5     ;
      ; descr->key_lens[i] = 0; @ strobemeta_probe.bpf.c:398
      5379: (b4) w1 = 0                     ;
      5380: (6b) *(u16 *)(r8 -30) = r1      ;
      ; task, data, off, STROBE_MAX_STR_LEN, map->entries[i].key); @ strobemeta_probe.bpf.c:400
      5381: (79) r3 = *(u64 *)(r7 -8)       ;
      5382: (7b) *(u64 *)(r10 -24) = r6     ;
      5383: (bc) w6 = w6                    ;
      ; barrier_var(payload_off); @ strobemeta_probe.bpf.c:280
      5384: (bf) r2 = r6                    ;
      5385: (bf) r1 = r4                    ;
      
      All the duplication is gone and the log is cleaner and less distracting.
      
        [0] https://github.com/anakryiko/btfdump
      
      
      
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20240214174100.2847419-1-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      57354f5f
  4. Feb 14, 2024
    • bpf: Use O(log(N)) binary search to find line info record · a4561f5a
      Andrii Nakryiko authored
      
      Real-world BPF applications keep growing in size. A medium-sized production
      application can easily have 50K+ verified instructions, and its line
      info section in .BTF.ext can have more than 3K entries.

      When the verifier emits a log with log_level>=1, it annotates the assembly
      code with the matching original C source code. Currently it uses a linear
      search over line info records to find a match. As the complexity of BPF
      applications grows, this O(K * N) approach scales poorly.
      
      So, instead of a linear O(N) search for the line info record, let's use an
      equivalent but faster O(log(N)) binary search. It's not a plain binary
      search, as we don't look for an exact match. It's an upper-bound search
      variant, looking for the rightmost line info record that starts at or before
      the given insn_off.
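
      A minimal sketch of such an upper-bound binary search (illustrative, not
      the exact kernel code):

        /* return the rightmost record with insn_off <= insn, or -1 if none */
        static int find_linfo_idx(const struct bpf_line_info *linfo,
                                  u32 nr_linfo, u32 insn)
        {
                int lo = 0, hi = nr_linfo - 1, best = -1;

                while (lo <= hi) {
                        int mid = lo + (hi - lo) / 2;

                        if (linfo[mid].insn_off <= insn) {
                                best = mid;
                                lo = mid + 1;
                        } else {
                                hi = mid - 1;
                        }
                }
                return best;
        }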
      
      Some unscientific measurements were done before and after this change.
      They were done in a VM and fluctuate a bit, but overall the speedup is
      undeniable.
      
      BASELINE
      ========
      File                              Program           Duration (us)   Insns
      --------------------------------  ----------------  -------------  ------
      katran.bpf.o                      balancer_ingress        2497130  343552
      pyperf600.bpf.linked3.o           on_event               12389611  627288
      strobelight_pyperf_libbpf.o       on_py_event              387399   52445
      --------------------------------  ----------------  -------------  ------
      
      BINARY SEARCH
      =============
      
      File                              Program           Duration (us)   Insns
      --------------------------------  ----------------  -------------  ------
      katran.bpf.o                      balancer_ingress        2339312  343552
      pyperf600.bpf.linked3.o           on_event                5602203  627288
      strobelight_pyperf_libbpf.o       on_py_event              294761   52445
      --------------------------------  ----------------  -------------  ------
      
      While Katran's speedup is pretty modest (about 158 ms, or 6%), for the
      production pyperf BPF program (on_py_event) it's much greater already,
      going from 387 ms down to 295 ms (a 23% improvement).

      Looking at BPF selftests' biggest pyperf example, we can see an even more
      dramatic improvement, shaving more than 50% of the time, going from 12.4 s
      down to 5.6 s.
      
      The different amounts of improvement are a function of the overall number of
      BPF assembly instructions in the .bpf.o files (which determines how many
      line info records there will be and thus, on average, how long the linear
      search takes), among other things:
      
      $ llvm-objdump -d katran.bpf.o | wc -l
      3863
      $ llvm-objdump -d strobelight_pyperf_libbpf.o | wc -l
      6997
      $ llvm-objdump -d pyperf600.bpf.linked3.o | wc -l
      87854
      
      Granted, this only applies to debugging cases (e.g., using veristat, or
      failing verification in production), but it seems worth doing to improve the
      overall developer experience anyway.
      
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: https://lore.kernel.org/bpf/20240214002311.2197116-1-andrii@kernel.org
      a4561f5a
    • libbpf: Make remark about zero-initializing bpf_*_info structs · 1159d278
      Matt Bobrowski authored
      
      In some situations, if you fail to zero-initialize the
      bpf_{prog,map,btf,link}_info structs supplied to the set of LIBBPF
      helpers bpf_{prog,map,btf,link}_get_info_by_fd(), you can expect the
      helper to return an error. This can possibly leave people in a
      situation where they're scratching their heads for an unnecessary
      amount of time. Make an explicit remark about the requirement of
      zero-initializing the supplied bpf_{prog,map,btf,link}_info structs
      for the respective LIBBPF helpers.
      
      Internally, the LIBBPF helpers bpf_{prog,map,btf,link}_get_info_by_fd()
      call into bpf_obj_get_info_by_fd(), where the bpf(2)
      BPF_OBJ_GET_INFO_BY_FD command is used. This specific command is
      effectively backed by restrictions enforced by the
      bpf_check_uarg_tail_zero() helper. This function ensures that if the
      supplied bpf_{prog,map,btf,link}_info struct is larger than what the
      kernel can handle, its trailing bytes are zero. This can be
      a problem when compiling against UAPI headers that don't necessarily
      match the sizes of the same underlying types known to the kernel.
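
      An illustrative user-side pattern (prog_fd and handle_error() are
      placeholders):

        struct bpf_prog_info info = {};   /* zero-initialize the whole struct */
        __u32 info_len = sizeof(info);
        int err;

        err = bpf_prog_get_info_by_fd(prog_fd, &info, &info_len);
        if (err)
                /* without the zero-initialization above, garbage trailing
                 * bytes can make the kernel reject the call
                 */
                handle_error(err);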
      
      Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: https://lore.kernel.org/bpf/ZcyEb8x4VbhieWsL@google.com
      1159d278
    • bpf: emit source code file name and line number in verifier log · 7cc13adb
      Andrii Nakryiko authored
      
      As BPF applications grow in size and complexity and are separated into
      multiple .bpf.c files that are statically linked together, it becomes
      harder and harder to match the verifier's BPF assembly level output to the
      original C code. While the annotated C source code is often unique enough
      to identify the file it belongs to, quite often this is actually
      problematic, as parts of the source code can be quite generic.
      
      Long story short, it is very useful to see the source code file name and
      line number information along with the original C code. The verifier already
      knows this information; we just need to output it.
      
      This patch extends verifier log with file name and line number
      information, emitted next to original (presumably C) source code,
      annotating BPF assembly output, like so:
      
        ; <original C code> @ <filename>.bpf.c:<line>
      
      If the file name has directory components, they are stripped away. This
      should be fine in practice, as file names tend to be pretty unique in
      C code anyway, and keeping the log size smaller is always good.
      
      In practice this might look something like below, where some code comes
      from application files, while other code is from libbpf's usdt.bpf.h
      header file:
      
        ; if (STROBEMETA_READ( @ strobemeta_probe.bpf.c:534
        5592: (79) r1 = *(u64 *)(r10 -56)     ; R1_w=mem_or_null(id=1589,sz=7680) R10=fp0
        5593: (7b) *(u64 *)(r10 -56) = r1     ; R1_w=mem_or_null(id=1589,sz=7680) R10=fp0
        5594: (79) r3 = *(u64 *)(r10 -8)      ; R3_w=scalar() R10=fp0 fp-8=mmmmmmmm
      
        ...
      
        170: (71) r1 = *(u8 *)(r8 +15)        ; frame1: R1_w=scalar(...) R8_w=map_value(map=__bpf_usdt_spec,ks=4,vs=208)
        171: (67) r1 <<= 56                   ; frame1: R1_w=scalar(...)
        172: (c7) r1 s>>= 56                  ; frame1: R1_w=scalar(smin=smin32=-128,smax=smax32=127)
        ; val <<= arg_spec->arg_bitshift; @ usdt.bpf.h:183
        173: (67) r1 <<= 32                   ; frame1: R1_w=scalar(...)
        174: (77) r1 >>= 32                   ; frame1: R1_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
        175: (79) r2 = *(u64 *)(r10 -8)       ; frame1: R2_w=scalar() R10=fp0 fp-8=mmmmmmmm
        176: (6f) r2 <<= r1                   ; frame1: R1_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) R2_w=scalar()
        177: (7b) *(u64 *)(r10 -8) = r2       ; frame1: R2_w=scalar(id=61) R10=fp0 fp-8_w=scalar(id=61)
        ; if (arg_spec->arg_signed) @ usdt.bpf.h:184
        178: (bf) r3 = r2                     ; frame1: R2_w=scalar(id=61) R3_w=scalar(id=61)
        179: (7f) r3 >>= r1                   ; frame1: R1_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) R3_w=scalar()
        ; if (arg_spec->arg_signed) @ usdt.bpf.h:184
        180: (71) r4 = *(u8 *)(r8 +14)
        181: safe
      
      log_fixup tests needed a minor adjustment as verifier log output
      increased a bit and that test is quite sensitive to such changes.
      
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20240212235944.2816107-1-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      7cc13adb
    • Merge branch 'fix-global-subprog-ptr_to_ctx-arg-handling' · 96adbf71
      Alexei Starovoitov authored
      Andrii Nakryiko says:
      
      ====================
      Fix global subprog PTR_TO_CTX arg handling
      
      Fix confusing and incorrect inference of the PTR_TO_CTX argument type in BPF
      global subprogs. For some program types (iterators, tracepoints, any program
      type that doesn't have a fixed, named "canonical" context type), when a user
      uses (in a correct and valid way) a pointer argument to a user-defined
      anonymous struct type, the verifier will incorrectly assume that it has to be
      a PTR_TO_CTX argument, while it should be just a PTR_TO_MEM argument with the
      allowed size calculated from the user-provided (even if anonymous) struct.
      
      This did come up in practice and was very confusing to users, so let's prevent
      this going forward. We had to do a slight refactoring of
      btf_get_prog_ctx_type() to make it easy to support a special s390x KPROBE use
      case. See details in the respective patches.
      
      v1->v2:
        - special-case typedef bpf_user_pt_regs_t handling for KPROBE programs,
          fixing s390x after changes in patch #2.
      ====================
      
      Link: https://lore.kernel.org/r/20240212233221.2575350-1-andrii@kernel.org
      
      
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      96adbf71
    • selftests/bpf: add anonymous user struct as global subprog arg test · 63d5a33f
      Andrii Nakryiko authored
      
      Add tests validating that kernel handles pointer to anonymous struct
      argument as PTR_TO_MEM case, not as PTR_TO_CTX case.
      
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20240212233221.2575350-5-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      63d5a33f
    • bpf: don't infer PTR_TO_CTX for programs with unnamed context type · 879bbe7a
      Andrii Nakryiko authored
      
      For program types that don't have a named context type (e.g., BPF
      iterator programs or tracepoint programs), ctx_tname will be a non-NULL
      empty string. For such programs it shouldn't be possible to derive a
      PTR_TO_CTX argument for global subprogs based on the type name alone. The
      arg:ctx tag is the only way to have PTR_TO_CTX passed into a global
      subprog for such program types.
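
      For cases where PTR_TO_CTX really is intended, a hedged illustration of
      the decl-tag route (assuming the __arg_ctx convenience macro from
      bpf_helpers.h is available; the subprog name is hypothetical):

        /* explicitly request PTR_TO_CTX via the "arg:ctx" decl tag */
        __noinline int handle_task_ctx(struct bpf_iter__task *ctx __arg_ctx)
        {
                return ctx->task ? 1 : 0;
        }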
      
      Fix this loophole, which currently assumes PTR_TO_CTX whenever a
      user passes a pointer to an anonymous struct as an argument to their global
      subprogs. This happens in practice with the following (quite common)
      approach:
      
      typedef struct { /* anonymous */
          int x;
      } my_type_t;
      
      int my_subprog(my_type_t *arg) { ... }
      
      The user's intent is to have a PTR_TO_MEM argument for `arg`, but the
      verifier will complain that it expects PTR_TO_CTX.
      
      This fix also closes the unintended s390x-specific KPROBE handling of the
      PTR_TO_CTX case. A selftest change is necessary to accommodate this.
      
      Fixes: 91cc1a99 ("bpf: Annotate context types")
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20240212233221.2575350-4-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      879bbe7a
    • bpf: handle bpf_user_pt_regs_t typedef explicitly for PTR_TO_CTX global arg · 824c58fb
      Andrii Nakryiko authored
      
      The expected canonical argument type for global function arguments
      representing PTR_TO_CTX is `bpf_user_pt_regs_t *ctx`. This currently
      works on s390x by accident, because the kernel resolves such a typedef to
      the underlying struct (which is anonymous on s390x) and erroneously
      accepts it as the expected context type. We are fixing this problem next,
      which would break the s390x arch, so we need to handle the `bpf_user_pt_regs_t`
      case explicitly for KPROBE programs.
      
      Fixes: 91cc1a99 ("bpf: Annotate context types")
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20240212233221.2575350-3-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      824c58fb
    • bpf: simplify btf_get_prog_ctx_type() into btf_is_prog_ctx_type() · fb5b86cf
      Andrii Nakryiko authored
      
      The return value of btf_get_prog_ctx_type() is never used directly; callers
      only check the NULL vs non-NULL case to determine whether a given type matches
      the expected PTR_TO_CTX type. So rename the function to `btf_is_prog_ctx_type()`
      and return a simple true/false. We'll use this simpler interface to handle the
      kprobe program type's special typedef case in the next patch.
      
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20240212233221.2575350-2-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      fb5b86cf
  5. Feb 13, 2024
    • bpf: remove check in __cgroup_bpf_run_filter_skb · 32e18e76
      Oliver Crumrine authored
      
      Originally, this patch removed a redundant check in
      BPF_CGROUP_RUN_PROG_INET_EGRESS, as the check was already being done in
      the function it called, __cgroup_bpf_run_filter_skb. For v2, it was
      recommended that I remove the check from __cgroup_bpf_run_filter_skb
      and add the checks to the other macro that calls that function,
      BPF_CGROUP_RUN_PROG_INET_INGRESS.
      
      To sum it up, checking that the socket exists and that it is a full
      socket is now part of both macros BPF_CGROUP_RUN_PROG_INET_EGRESS and
      BPF_CGROUP_RUN_PROG_INET_INGRESS, and it is no longer part of the
      function they call, __cgroup_bpf_run_filter_skb.
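
      An illustrative shape of the resulting ingress macro (simplified, not the
      exact kernel macro body):

        #define BPF_CGROUP_RUN_PROG_INET_INGRESS(sk, skb)                        \
        ({                                                                       \
                int __ret = 0;                                                   \
                if (cgroup_bpf_enabled(CGROUP_INET_INGRESS) && (sk) &&           \
                    sk_fullsock(sk))                                             \
                        __ret = __cgroup_bpf_run_filter_skb((sk), (skb),         \
                                                            CGROUP_INET_INGRESS);\
                __ret;                                                           \
        })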
      
      v3->v4: Fixed weird merge conflict.
      v2->v3: Sent to bpf-next instead of generic patch
      v1->v2: Addressed feedback about where check should be removed.
      
      Signed-off-by: Oliver Crumrine <ozlinuxc@gmail.com>
      Acked-by: Stanislav Fomichev <sdf@google.com>
      Link: https://lore.kernel.org/r/7lv62yiyvmj5a7eozv2iznglpkydkdfancgmbhiptrgvgan5sy@3fl3onchgdz3
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
      32e18e76
    • Merge branch 'Support PTR_MAYBE_NULL for struct_ops arguments.' · 2c21a0f6
      Martin KaFai Lau authored
      Kui-Feng Lee says:
      
      ====================
      Allow passing null pointers to the operators provided by a struct_ops
      object. This is an RFC to collect feedback/opinions.

      Until now, the pointers that are passed to struct_ops operators (the function
      pointers) have always been considered reliable: they cannot be
      null. However, in certain scenarios, it should be possible to pass null
      pointers to these operators. For instance, sched_ext may pass a null
      pointer of the struct task type to an operator that is provided by its
      struct_ops objects.
      
      The proposed solution here is to add PTR_MAYBE_NULL annotations to
      arguments and create instances of struct bpf_ctx_arg_aux (arg_info) for
      these arguments. These arg_infos will be installed at
      prog->aux->ctx_arg_info and will be checked by the BPF verifier when
      loading the programs. When a struct_ops program accesses arguments in the
      ctx, the verifier will call btf_ctx_access() (through
      bpf_verifier_ops->is_valid_access) to verify the access. btf_ctx_access()
      will check arg_info and use the information of the matched arg_info to
      properly set reg_type.
      
      For nullable arguments, this patch sets an arg_info to label them with
      PTR_TO_BTF_ID | PTR_TRUSTED | PTR_MAYBE_NULL. This forces the verifier to
      check programs and ensure that they properly check the pointer. The
      programs should check if the pointer is null before reading/writing the
      pointed-to memory.
      
      The implementer of a struct_ops should annotate the arguments that can
      be null. The implementer should define a stub function (empty) as a
      placeholder for each defined operator. The name of a stub function
      should be in the pattern "<st_op_type>__<operator name>". For example,
      for test_maybe_null of struct bpf_testmod_ops, its stub function name
      should be "bpf_testmod_ops__test_maybe_null". You mark an argument
      nullable by suffixing the argument name with "__nullable" in the stub
      function.  Here is the example in bpf_testmod.c.
      
        static int bpf_testmod_ops__test_maybe_null(int dummy,
                                                    struct task_struct *task__nullable)
        {
                return 0;
        }
      
      This means that argument 1 (the 2nd one) of bpf_testmod_ops->test_maybe_null,
      which is a function pointer, can be null. With this annotation, the
      verifier will understand how to check programs using this argument.  A BPF
      program that implements test_maybe_null should check the pointer to make
      sure it is not null before using it. For example,
      
        if (task__nullable)
            save_tgid = task__nullable->tgid
      
      Without the check, the verifier will reject the program.
      
      Since we already have stub functions for kCFI, we just reuse these stub
      functions with the naming convention mentioned earlier. Stub functions
      following the naming convention are only required if there are nullable
      arguments to annotate. For functions without nullable arguments, stub
      functions are not necessary for the purpose of this patch.
      ---
      Major changes from v7:
      
       - Update a comment that is out of date.
      
      Major changes from v6:
      
       - Remove "len" from bpf_struct_ops_desc_release().
      
       - Rename arg_info(s) to info, and rename all_arg_info to arg_info in
         prepare_arg_info().
      
       - Rename arg_info to info in struct bpf_struct_ops_arg_info.
      
      Major changes from v5:
      
       - Rename all member_arg_info variables.
      
       - Refactor to bpf_struct_ops_desc_release() to share code
         between btf_free_struct_ops_tab() and bpf_struct_ops_desc_init().
      
       - Refactor to btf_param_match_suffix(). (Add a new patch as the part 2.)
      
       - Clean up the commit log and remaining code in the patch of test cases.
      
       - Update a comment in struct_ops_maybe_null.c.
      
      Major changes from v4:
      
       - Remove the support of pointers to types other than struct
         types. That would be a separate patchset.
      
         - Remove the patch about extending PTR_TO_BTF_ID.
      
         - Remove the test against various pointer types from selftests.
      
       - Remove the patch "bpf: Remove an unnecessary check" and send that
         patch separately.
      
       - Remove member_arg_info_cnt from struct bpf_struct_ops_desc.
      
       - Use btf_id from FUNC_PROTO of a function pointer instead of a stub
         function.
      
      Major changes from v3:
      
       - Move the code collecting argument information to prepare_arg_info()
         called in the loop in bpf_struct_ops_desc_init().
      
       - Simplify the memory allocation by having separated arg_info for
         each member of a struct_ops type.
      
       - Extend PTR_TO_BTF_ID to pointers to scalar types and array types,
         not only to struct types.
      
      Major changes from v2:
      
       - Remove dead code.
      
       - Add comments to explain the code itself.
      
      Major changes from v1:
      
       - Annotate arguments by suffixing argument names with "__nullable" at
         stub functions.
      
      v7: https://lore.kernel.org/all/20240209020053.1132710-1-thinker.li@gmail.com/
      v6: https://lore.kernel.org/all/20240208065103.2154768-1-thinker.li@gmail.com/
      v5: https://lore.kernel.org/all/20240206063833.2520479-1-thinker.li@gmail.com/
      v4: https://lore.kernel.org/all/20240202220516.1165466-1-thinker.li@gmail.com/
      v3: https://lore.kernel.org/all/20240122212217.1391878-1-thinker.li@gmail.com/
      v2: https://lore.kernel.org/all/20240118224922.336006-1-thinker.li@gmail.com/
      
      
      ====================
      
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
      2c21a0f6
    • selftests/bpf: Test PTR_MAYBE_NULL arguments of struct_ops operators. · 00f239ec
      Kui-Feng Lee authored
      
      Test if the verifier verifies nullable pointer arguments correctly for BPF
      struct_ops programs.
      
      "test_maybe_null" in struct bpf_testmod_ops is the operator defined for the
      test cases here.
      
      A BPF program should check a nullable pointer argument for NULL before
      accessing the value it points to, or the verifier should reject the
      program. The test here includes two parts: programs checking pointers
      properly and programs not checking pointers beforehand. The test checks
      that the verifier accepts the programs that check properly and rejects
      the programs that don't check at all.
      
      Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
      Link: https://lore.kernel.org/r/20240209023750.1153905-5-thinker.li@gmail.com
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
      00f239ec
    • bpf: Create argument information for nullable arguments. · 16116035
      Kui-Feng Lee authored
      
      Collect argument information from the type information of stub functions to
      mark arguments of BPF struct_ops programs with PTR_MAYBE_NULL if they are
      nullable.  A nullable argument is annotated by suffixing "__nullable" to
      the argument name of the stub function.

      For nullable arguments, this patch sets a struct bpf_ctx_arg_aux to label
      their reg_type with PTR_TO_BTF_ID | PTR_TRUSTED | PTR_MAYBE_NULL. This
      makes the verifier check programs and ensure that they properly check
      the pointer. The programs should check if the pointer is null before
      accessing the pointed-to memory.
      
      The implementer of a struct_ops type should annotate the arguments that can
      be null. The implementer should define a stub function (empty) as a
      placeholder for each defined operator. The name of a stub function should
      be in the pattern "<st_op_type>__<operator name>". For example, for
      test_maybe_null of struct bpf_testmod_ops, its stub function name should
      be "bpf_testmod_ops__test_maybe_null". You mark an argument nullable by
      suffixing the argument name with "__nullable" in the stub function.

      Since we already have stub functions for kCFI, we just reuse these stub
      functions with the naming convention mentioned earlier. Stub functions
      following the naming convention are only required if there are nullable
      arguments to annotate. For functions without nullable arguments, stub
      functions are not necessary for the purpose of this patch.
      
      This patch will prepare a list of struct bpf_ctx_arg_aux, aka arg_info, for
      each member field of a struct_ops type.  "arg_info" will be assigned to
      "prog->aux->ctx_arg_info" of BPF struct_ops programs in
      check_struct_ops_btf_id() so that it can be used by btf_ctx_access() later
      to set reg_type properly for the verifier.
      
      Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
      Link: https://lore.kernel.org/r/20240209023750.1153905-4-thinker.li@gmail.com
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
      16116035
    • bpf: Move __kfunc_param_match_suffix() to btf.c. · 6115a0ae
      Kui-Feng Lee authored
      
      Move __kfunc_param_match_suffix() to btf.c and rename it to
      btf_param_match_suffix(). It can be reused by bpf_struct_ops later.
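
      A sketch of what such a suffix matcher looks like (illustrative):

        bool btf_param_match_suffix(const struct btf *btf,
                                    const struct btf_param *arg,
                                    const char *suffix)
        {
                int suffix_len = strlen(suffix), len;
                const char *param_name;

                param_name = btf_name_by_offset(btf, arg->name_off);
                if (!param_name)
                        return false;
                len = strlen(param_name);
                if (len <= suffix_len)
                        return false;
                /* compare the tail of the parameter name against the suffix */
                return !strncmp(param_name + len - suffix_len, suffix, suffix_len);
        }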
      
      Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
      Link: https://lore.kernel.org/r/20240209023750.1153905-3-thinker.li@gmail.com
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
      6115a0ae
    • bpf: add btf pointer to struct bpf_ctx_arg_aux. · 77c0208e
      Kui-Feng Lee authored
      
      Enable the providers to use types defined in a module instead of in the
      kernel (btf_vmlinux).
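
      Approximately, the struct now carries a pointer to the BTF that btf_id
      belongs to (field set abbreviated and illustrative):

        struct bpf_ctx_arg_aux {
                u32 offset;
                enum bpf_reg_type reg_type;
                struct btf *btf;   /* BTF (vmlinux or module) that btf_id is in */
                u32 btf_id;
        };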
      
      Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
      Link: https://lore.kernel.org/r/20240209023750.1153905-2-thinker.li@gmail.com
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
      77c0208e
    • bpf, docs: Update ISA document title · dc8543b5
      Dave Thaler authored
      * Use "Instruction Set Architecture (ISA)" instead of "Instruction Set
        Specification"
      * Remove version number
      
      As previously discussed on the mailing list at
      https://mailarchive.ietf.org/arch/msg/bpf/SEpn3OL9TabNRn-4rDX9A6XVbjM/
      
      
      
      Signed-off-by: Dave Thaler <dthaler1968@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: David Vernet <void@manifault.com>
      Link: https://lore.kernel.org/bpf/20240208221449.12274-1-dthaler1968@gmail.com
      dc8543b5
    • libbpf: Add support to GCC in CORE macro definitions · 12bbcf8e
      Cupertino Miranda authored
      
      Due to internal differences between LLVM and GCC, the current
      implementation of the CO-RE macros does not fit the GCC parser, as it will
      optimize those expressions even before they would be accessible to the
      BPF backend.
      
      As examples, the following would be optimized out with the original
      definitions:
        - As enums are converted to their integer representation during
        parsing, the IR would not know how to distinguish an integer
        constant from an actual enum value.
        - Types need to be kept as temporary variables, as the existing type
        casts of the 0 address (as expanded for LLVM) are optimized away by
        the GCC C parser, never really reaching GCC's IR.
      
      Although the macros appear to add extra complexity, the expanded code
      is removed from the compilation flow very early in the compilation
      process, not really affecting the quality of the generated assembly.
      
      Signed-off-by: Cupertino Miranda <cupertino.miranda@oracle.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20240213173543.1397708-1-cupertino.miranda@oracle.com
      12bbcf8e
    • bpf: Abstract loop unrolling pragmas in BPF selftests · 52dbd67d
      Jose E. Marchesi authored
      
      [Changes from V1:
      - Avoid conflict by rebasing with latest master.]
      
      Some BPF tests use loop unrolling compiler pragmas that are clang-specific
      and not supported by GCC.  These pragmas, along with their
      GCC equivalents, are:
      
        #pragma clang loop unroll_count(N)
        #pragma GCC unroll N
      
        #pragma clang loop unroll(full)
        #pragma GCC unroll 65534
      
        #pragma clang loop unroll(disable)
        #pragma GCC unroll 1
      
        #pragma unroll [aka #pragma clang loop unroll(enable)]
        There is no GCC equivalent to this pragma.  It enables unrolling on
        loops that the compiler would not ordinarily unroll even with
        -O2|-funroll-loops, but it is not equivalent to full unrolling
        either.
      
      This patch adds a new header progs/bpf_compiler.h that defines the
      following macros, which correspond to each pair of compiler-specific
      pragmas above:
      
        __pragma_loop_unroll_count(N)
        __pragma_loop_unroll_full
        __pragma_loop_no_unroll
        __pragma_loop_unroll
      
      The selftests using loop unrolling pragmas are then changed to include
      the header and use these macros in place of the explicit pragmas.
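
      A sketch of how such macros could be defined (illustrative; the actual
      progs/bpf_compiler.h may differ):

        #define __PRAGMA_(X) _Pragma(#X)

        #ifdef __clang__
        #define __pragma_loop_unroll_count(N) __PRAGMA_(clang loop unroll_count(N))
        #define __pragma_loop_unroll_full     __PRAGMA_(clang loop unroll(full))
        #define __pragma_loop_no_unroll       __PRAGMA_(clang loop unroll(disable))
        #define __pragma_loop_unroll          __PRAGMA_(clang loop unroll(enable))
        #else
        #define __pragma_loop_unroll_count(N) __PRAGMA_(GCC unroll N)
        #define __pragma_loop_unroll_full     __PRAGMA_(GCC unroll 65534)
        #define __pragma_loop_no_unroll       __PRAGMA_(GCC unroll 1)
        #define __pragma_loop_unroll          /* no direct GCC equivalent */
        #endif

      A test then writes, e.g., __pragma_loop_unroll_count(8) immediately before
      the loop it wants unrolled.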
      
      Tested in bpf-next master.
      No regressions.
      
      Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Yonghong Song <yonghong.song@linux.dev>
      Link: https://lore.kernel.org/bpf/20240208203612.29611-1-jose.marchesi@oracle.com
      52dbd67d
    • selftests/bpf: Ensure fentry prog cannot attach to bpf_spin_{lock,unlock}() · fc1c9e40
      Yonghong Song authored
      
      Add two tests to ensure fentry programs cannot attach to
      bpf_spin_{lock,unlock}() helpers. The tracing_failure.c files
      can be used in the future for other tracing failure cases.
      
      Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20240207070107.335341-1-yonghong.song@linux.dev
      fc1c9e40
    • bpf: Mark bpf_spin_{lock,unlock}() helpers with notrace correctly · 178c5466
      Yonghong Song authored
      Currently, tracing is not supposed to be allowed on the bpf_spin_{lock,unlock}()
      helpers. This is to prevent deadlock in the following case:
        - there is a prog (prog-A) calling bpf_spin_{lock,unlock}().
        - there is a tracing program (prog-B), e.g., fentry, attached
          to bpf_spin_lock() and/or bpf_spin_unlock().
        - prog-B calls bpf_spin_{lock,unlock}().
      In such a case, when prog-A calls bpf_spin_{lock,unlock}(),
      a deadlock will happen.
      
      The related source code is below, in kernel/bpf/helpers.c:
        notrace BPF_CALL_1(bpf_spin_lock, struct bpf_spin_lock *, lock)
        notrace BPF_CALL_1(bpf_spin_unlock, struct bpf_spin_lock *, lock)
      notrace is supposed to prevent fentry progs from attaching to
      bpf_spin_{lock,unlock}().

      But actually this is not the case, and an fentry prog can successfully
      attach to bpf_spin_lock(). Siddharth Chintamaneni reported
      the issue in [1]. The following is the macro definition for the
      above BPF_CALL_1:
        #define BPF_CALL_x(x, name, ...)                                               \
              static __always_inline                                                 \
              u64 ____##name(__BPF_MAP(x, __BPF_DECL_ARGS, __BPF_V, __VA_ARGS__));   \
              typedef u64 (*btf_##name)(__BPF_MAP(x, __BPF_DECL_ARGS, __BPF_V, __VA_ARGS__)); \
              u64 name(__BPF_REG(x, __BPF_DECL_REGS, __BPF_N, __VA_ARGS__));         \
              u64 name(__BPF_REG(x, __BPF_DECL_REGS, __BPF_N, __VA_ARGS__))          \
              {                                                                      \
                      return ((btf_##name)____##name)(__BPF_MAP(x,__BPF_CAST,__BPF_N,__VA_ARGS__));\
              }                                                                      \
              static __always_inline                                                 \
              u64 ____##name(__BPF_MAP(x, __BPF_DECL_ARGS, __BPF_V, __VA_ARGS__))
      
        #define BPF_CALL_1(name, ...)   BPF_CALL_x(1, name, __VA_ARGS__)
      
      The notrace attribute is actually applied to the static always_inline function
      ____bpf_spin_{lock,unlock}(). The actual callable function
      bpf_spin_{lock,unlock}() is not marked with notrace, hence
      allowing fentry progs to attach to the two helpers, which
      may cause the above-mentioned deadlock. Siddharth Chintamaneni
      actually has a reproducer in [2].

      To fix the issue, a new macro NOTRACE_BPF_CALL_1 is introduced, which
      adds the notrace attribute to the exported function instead of
      the hidden always_inline function, and this fixes the problem.
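
      The intended effect, illustrated with plain declarations rather than the
      actual macro expansion (simplified):

        /* before: notrace lands only on the inlined ____bpf_spin_lock() body,
         * so the exported symbol remains attachable by fentry
         */
        u64 bpf_spin_lock(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);

        /* after NOTRACE_BPF_CALL_1(bpf_spin_lock, ...): the exported symbol
         * itself is notrace, so fentry can no longer attach to it
         */
        notrace u64 bpf_spin_lock(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);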
      
        [1] https://lore.kernel.org/bpf/CAE5sdEigPnoGrzN8WU7Tx-h-iFuMZgW06qp0KHWtpvoXxf1OAQ@mail.gmail.com/
        [2] https://lore.kernel.org/bpf/CAE5sdEg6yUc_Jz50AnUXEEUh6O73yQ1Z6NV2srJnef0ZrQkZew@mail.gmail.com/
      
      
      
      Fixes: d83525ca ("bpf: introduce bpf_spin_lock")
      Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: https://lore.kernel.org/bpf/20240207070102.335167-1-yonghong.song@linux.dev
      178c5466
    • bpf: Have bpf_rdonly_cast() take a const pointer · 5b268d1e
      Daniel Xu authored
      
      Since 20d59ee5 ("libbpf: add bpf_core_cast() macro"), libbpf is now
      exporting a const arg version of bpf_rdonly_cast(). This causes the
      following conflicting type error when generating kfunc prototypes from
      BTF:
      
      In file included from skeleton/pid_iter.bpf.c:5:
      /home/dxu/dev/linux/tools/bpf/bpftool/bootstrap/libbpf/include/bpf/bpf_core_read.h:297:14: error: conflicting types for 'bpf_rdonly_cast'
      extern void *bpf_rdonly_cast(const void *obj__ign, __u32 btf_id__k) __ksym __weak;
                   ^
      ./vmlinux.h:135625:14: note: previous declaration is here
      extern void *bpf_rdonly_cast(void *obj__ign, u32 btf_id__k) __weak __ksym;
      
      This is because the kernel defines bpf_rdonly_cast() with a non-const arg.
      Since a const arg is more permissive and thus backwards compatible, change
      the kernel definition as well to avoid conflicting type errors.
      
      Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Yonghong Song <yonghong.song@linux.dev>
      Link: https://lore.kernel.org/bpf/dfd3823f11ffd2d4c838e961d61ec9ae8a646773.1707080349.git.dxu@dxuuu.xyz
      5b268d1e
  6. Feb 11, 2024
    • bpf: Allow compiler to inline most of bpf_local_storage_lookup() · 68bc61c2
      Marco Elver authored
      In various performance profiles of kernels with BPF programs attached,
      bpf_local_storage_lookup() appears as a significant portion of CPU
      cycles spent. To enable the compiler to generate more optimal code, turn
      bpf_local_storage_lookup() into a static inline function, where only the
      cache insertion code path is outlined.
      
      Notably, outlining the cache insertion helps avoid bloating callers by
      duplicating the setup of calls to raw_spin_lock_irqsave() and
      raw_spin_unlock_irqrestore() (on architectures which do not inline
      spin_lock/unlock, such as x86), which would cause the compiler to produce
      worse code by deciding to outline otherwise inlinable functions. The call
      overhead is neutral, because we make 2 calls either way: either calling
      raw_spin_lock_irqsave() and raw_spin_unlock_irqrestore(); or calling
      __bpf_local_storage_insert_cache(), which calls raw_spin_lock_irqsave(),
      followed by a tail call to raw_spin_unlock_irqrestore() where the compiler
      can perform TCO and (in optimized uninstrumented builds) turn it into a
      plain jump. The call to __bpf_local_storage_insert_cache() can be elided
      entirely if cacheit_lockit is a false constant expression.
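
      A simplified sketch of the resulting split (details such as RCU/lockdep
      annotations omitted; the outlined helper's exact signature may differ):

        static inline struct bpf_local_storage_data *
        bpf_local_storage_lookup(struct bpf_local_storage *local_storage,
                                 struct bpf_local_storage_map *smap,
                                 bool cacheit_lockit)
        {
                struct bpf_local_storage_data *sdata;
                struct bpf_local_storage_elem *selem;

                /* fast path: per-map cache slot */
                sdata = rcu_dereference(local_storage->cache[smap->cache_idx]);
                if (sdata && rcu_access_pointer(sdata->smap) == smap)
                        return sdata;

                /* slow path: walk the storage list */
                hlist_for_each_entry_rcu(selem, &local_storage->list, snode)
                        if (rcu_access_pointer(selem->sdata.smap) == smap)
                                break;
                if (!selem)
                        return NULL;

                /* outlined: takes/releases the storage lock internally */
                if (cacheit_lockit)
                        __bpf_local_storage_insert_cache(local_storage, smap, selem);

                return &selem->sdata;
        }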
      
      Based on results from './benchs/run_bench_local_storage.sh' (21 trials,
      reboot between each trial; x86 defconfig + BPF, clang 16) this produces
      improvements in throughput and latency in the majority of cases, with an
      average (geomean) improvement of 8%:
      
      +---- Hashmap Control --------------------
      |
      | + num keys: 10
      | :                                         <before>             | <after>
      | +-+ hashmap (control) sequential get    +----------------------+----------------------
      |   +- hits throughput                    | 14.789 M ops/s       | 14.745 M ops/s (  ~  )
      |   +- hits latency                       | 67.679 ns/op         | 67.879 ns/op   (  ~  )
      |   +- important_hits throughput          | 14.789 M ops/s       | 14.745 M ops/s (  ~  )
      |
      | + num keys: 1000
      | :                                         <before>             | <after>
      | +-+ hashmap (control) sequential get    +----------------------+----------------------
      |   +- hits throughput                    | 12.233 M ops/s       | 12.170 M ops/s (  ~  )
      |   +- hits latency                       | 81.754 ns/op         | 82.185 ns/op   (  ~  )
      |   +- important_hits throughput          | 12.233 M ops/s       | 12.170 M ops/s (  ~  )
      |
      | + num keys: 10000
      | :                                         <before>             | <after>
      | +-+ hashmap (control) sequential get    +----------------------+----------------------
      |   +- hits throughput                    | 7.220 M ops/s        | 7.204 M ops/s  (  ~  )
      |   +- hits latency                       | 138.522 ns/op        | 138.842 ns/op  (  ~  )
      |   +- important_hits throughput          | 7.220 M ops/s        | 7.204 M ops/s  (  ~  )
      |
      | + num keys: 100000
      | :                                         <before>             | <after>
      | +-+ hashmap (control) sequential get    +----------------------+----------------------
      |   +- hits throughput                    | 5.061 M ops/s        | 5.165 M ops/s  (+2.1%)
      |   +- hits latency                       | 198.483 ns/op        | 194.270 ns/op  (-2.1%)
      |   +- important_hits throughput          | 5.061 M ops/s        | 5.165 M ops/s  (+2.1%)
      |
      | + num keys: 4194304
      | :                                         <before>             | <after>
      | +-+ hashmap (control) sequential get    +----------------------+----------------------
      |   +- hits throughput                    | 2.864 M ops/s        | 2.882 M ops/s  (  ~  )
      |   +- hits latency                       | 365.220 ns/op        | 361.418 ns/op  (-1.0%)
      |   +- important_hits throughput          | 2.864 M ops/s        | 2.882 M ops/s  (  ~  )
      |
      +---- Local Storage ----------------------
      |
      | + num_maps: 1
      | :                                         <before>             | <after>
      | +-+ local_storage cache sequential get  +----------------------+----------------------
      |   +- hits throughput                    | 33.005 M ops/s       | 39.068 M ops/s (+18.4%)
      |   +- hits latency                       | 30.300 ns/op         | 25.598 ns/op   (-15.5%)
      |   +- important_hits throughput          | 33.005 M ops/s       | 39.068 M ops/s (+18.4%)
      | :
      | :                                         <before>             | <after>
      | +-+ local_storage cache interleaved get +----------------------+----------------------
      |   +- hits throughput                    | 37.151 M ops/s       | 44.926 M ops/s (+20.9%)
      |   +- hits latency                       | 26.919 ns/op         | 22.259 ns/op   (-17.3%)
      |   +- important_hits throughput          | 37.151 M ops/s       | 44.926 M ops/s (+20.9%)
      |
      | + num_maps: 10
      | :                                         <before>             | <after>
      | +-+ local_storage cache sequential get  +----------------------+----------------------
      |   +- hits throughput                    | 32.288 M ops/s       | 38.099 M ops/s (+18.0%)
      |   +- hits latency                       | 30.972 ns/op         | 26.248 ns/op   (-15.3%)
      |   +- important_hits throughput          | 3.229 M ops/s        | 3.810 M ops/s  (+18.0%)
      | :
      | :                                         <before>             | <after>
      | +-+ local_storage cache interleaved get +----------------------+----------------------
      |   +- hits throughput                    | 34.473 M ops/s       | 41.145 M ops/s (+19.4%)
      |   +- hits latency                       | 29.010 ns/op         | 24.307 ns/op   (-16.2%)
      |   +- important_hits throughput          | 12.312 M ops/s       | 14.695 M ops/s (+19.4%)
      |
      | + num_maps: 16
      | :                                         <before>             | <after>
      | +-+ local_storage cache sequential get  +----------------------+----------------------
      |   +- hits throughput                    | 32.524 M ops/s       | 38.341 M ops/s (+17.9%)
      |   +- hits latency                       | 30.748 ns/op         | 26.083 ns/op   (-15.2%)
      |   +- important_hits throughput          | 2.033 M ops/s        | 2.396 M ops/s  (+17.9%)
      | :
      | :                                         <before>             | <after>
      | +-+ local_storage cache interleaved get +----------------------+----------------------
      |   +- hits throughput                    | 34.575 M ops/s       | 41.338 M ops/s (+19.6%)
      |   +- hits latency                       | 28.925 ns/op         | 24.193 ns/op   (-16.4%)
      |   +- important_hits throughput          | 11.001 M ops/s       | 13.153 M ops/s (+19.6%)
      |
      | + num_maps: 17
      | :                                         <before>             | <after>
      | +-+ local_storage cache sequential get  +----------------------+----------------------
      |   +- hits throughput                    | 28.861 M ops/s       | 32.756 M ops/s (+13.5%)
      |   +- hits latency                       | 34.649 ns/op         | 30.530 ns/op   (-11.9%)
      |   +- important_hits throughput          | 1.700 M ops/s        | 1.929 M ops/s  (+13.5%)
      | :
      | :                                         <before>             | <after>
      | +-+ local_storage cache interleaved get +----------------------+----------------------
      |   +- hits throughput                    | 31.529 M ops/s       | 36.110 M ops/s (+14.5%)
      |   +- hits latency                       | 31.719 ns/op         | 27.697 ns/op   (-12.7%)
      |   +- important_hits throughput          | 9.598 M ops/s        | 10.993 M ops/s (+14.5%)
      |
      | + num_maps: 24
      | :                                         <before>             | <after>
      | +-+ local_storage cache sequential get  +----------------------+----------------------
      |   +- hits throughput                    | 18.602 M ops/s       | 19.937 M ops/s (+7.2%)
      |   +- hits latency                       | 53.767 ns/op         | 50.166 ns/op   (-6.7%)
      |   +- important_hits throughput          | 0.776 M ops/s        | 0.831 M ops/s  (+7.2%)
      | :
      | :                                         <before>             | <after>
      | +-+ local_storage cache interleaved get +----------------------+----------------------
      |   +- hits throughput                    | 21.718 M ops/s       | 23.332 M ops/s (+7.4%)
      |   +- hits latency                       | 46.047 ns/op         | 42.865 ns/op   (-6.9%)
      |   +- important_hits throughput          | 6.110 M ops/s        | 6.564 M ops/s  (+7.4%)
      |
      | + num_maps: 32
      | :                                         <before>             | <after>
      | +-+ local_storage cache sequential get  +----------------------+----------------------
      |   +- hits throughput                    | 14.118 M ops/s       | 14.626 M ops/s (+3.6%)
      |   +- hits latency                       | 70.856 ns/op         | 68.381 ns/op   (-3.5%)
      |   +- important_hits throughput          | 0.442 M ops/s        | 0.458 M ops/s  (+3.6%)
      | :
      | :                                         <before>             | <after>
      | +-+ local_storage cache interleaved get +----------------------+----------------------
      |   +- hits throughput                    | 17.111 M ops/s       | 17.906 M ops/s (+4.6%)
      |   +- hits latency                       | 58.451 ns/op         | 55.865 ns/op   (-4.4%)
      |   +- important_hits throughput          | 4.776 M ops/s        | 4.998 M ops/s  (+4.6%)
      |
      | + num_maps: 100
      | :                                         <before>             | <after>
      | +-+ local_storage cache sequential get  +----------------------+----------------------
      |   +- hits throughput                    | 5.281 M ops/s        | 5.528 M ops/s  (+4.7%)
      |   +- hits latency                       | 192.398 ns/op        | 183.059 ns/op  (-4.9%)
      |   +- important_hits throughput          | 0.053 M ops/s        | 0.055 M ops/s  (+4.9%)
      | :
      | :                                         <before>             | <after>
      | +-+ local_storage cache interleaved get +----------------------+----------------------
      |   +- hits throughput                    | 6.265 M ops/s        | 6.498 M ops/s  (+3.7%)
      |   +- hits latency                       | 161.436 ns/op        | 152.877 ns/op  (-5.3%)
      |   +- important_hits throughput          | 1.636 M ops/s        | 1.697 M ops/s  (+3.7%)
      |
      | + num_maps: 1000
      | :                                         <before>             | <after>
      | +-+ local_storage cache sequential get  +----------------------+----------------------
      |   +- hits throughput                    | 0.355 M ops/s        | 0.354 M ops/s  (  ~  )
      |   +- hits latency                       | 2826.538 ns/op       | 2827.139 ns/op (  ~  )
      |   +- important_hits throughput          | 0.000 M ops/s        | 0.000 M ops/s  (  ~  )
      | :
      | :                                         <before>             | <after>
      | +-+ local_storage cache interleaved get +----------------------+----------------------
      |   +- hits throughput                    | 0.404 M ops/s        | 0.403 M ops/s  (  ~  )
      |   +- hits latency                       | 2481.190 ns/op       | 2487.555 ns/op (  ~  )
      |   +- important_hits throughput          | 0.102 M ops/s        | 0.101 M ops/s  (  ~  )
      
      The on_lookup test in {cgrp,task}_ls_recursion.c is removed
      because bpf_local_storage_lookup() is no longer traceable,
      and adding a tracepoint would make the compiler generate worse
      code: https://lore.kernel.org/bpf/ZcJmok64Xqv6l4ZS@elver.google.com/
      
      
      
      Signed-off-by: Marco Elver <elver@google.com>
      Cc: Martin KaFai Lau <martin.lau@linux.dev>
      Acked-by: Yonghong Song <yonghong.song@linux.dev>
      Link: https://lore.kernel.org/r/20240207122626.3508658-1-elver@google.com
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
      68bc61c2
  7. Feb 08, 2024
  8. Feb 07, 2024
    • Merge branch 'tools-resolve_btfids-fix-cross-compilation-to-non-host-endianness' · abae1ac5
      Andrii Nakryiko authored
      
      Viktor Malik says:
      
      ====================
      tools/resolve_btfids: fix cross-compilation to non-host endianness
      
      The .BTF_ids section is pre-filled with zeroed BTF ID entries during the
      build and afterwards patched by resolve_btfids with correct values.
      Since resolve_btfids always writes in host-native endianness, it relies
      on libelf to do the translation when the target ELF is cross-compiled to
      a different endianness (this was introduced in commit 61e8aeda
      ("bpf: Fix libelf endian handling in resolv_btfids")).
      
      Unfortunately, the translation will corrupt the flags fields of SET8
      entries because these were written during vmlinux compilation and are in
      the correct endianness already. This will lead to numerous selftest
      failures such as:
      
          $ sudo ./test_verifier 502 502
          #502/p sleepable fentry accept FAIL
          Failed to load prog 'Invalid argument'!
          bpf_fentry_test1 is not sleepable
          verification time 34 usec
          stack depth 0
          processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
          Summary: 0 PASSED, 0 SKIPPED, 1 FAILED
      
      Since it's not possible to instruct libelf to translate just certain
      values, let's manually bswap the flags (both global and entry flags) in
      resolve_btfids when needed, so that libelf then translates everything
      correctly.
      
      The first patch of the series refactors resolve_btfids by using types
      from btf_ids.h instead of accessing the BTF ID data using magic offsets.
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      ---
      Changes in v4:
      - remove unnecessary vars and pointer casts (suggested by Daniel Xu)
      
      Changes in v3:
      - add byte swap of global 'flags' field in btf_id_set8 (suggested by
        Jiri Olsa)
      - cleaner refactoring of sets_patch (suggested by Jiri Olsa)
      - add compile-time assertion that IDs are at the beginning of pairs
        struct in btf_id_set8 (suggested by Daniel Borkmann)
      
      Changes in v2:
      - use type defs from btf_ids.h (suggested by Andrii Nakryiko)
      ====================
      
      Link: https://lore.kernel.org/r/cover.1707223196.git.vmalik@redhat.com
      
      
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      abae1ac5
    • tools/resolve_btfids: Fix cross-compilation to non-host endianness · 903fad43
      Viktor Malik authored
      
      The .BTF_ids section is pre-filled with zeroed BTF ID entries during the
      build and afterwards patched by resolve_btfids with correct values.
      Since resolve_btfids always writes in host-native endianness, it relies
      on libelf to do the translation when the target ELF is cross-compiled to
      a different endianness (this was introduced in commit 61e8aeda
      ("bpf: Fix libelf endian handling in resolv_btfids")).
      
      Unfortunately, the translation will corrupt the flags fields of SET8
      entries because these were written during vmlinux compilation and are in
      the correct endianness already. This will lead to numerous selftest
      failures such as:
      
          $ sudo ./test_verifier 502 502
          #502/p sleepable fentry accept FAIL
          Failed to load prog 'Invalid argument'!
          bpf_fentry_test1 is not sleepable
          verification time 34 usec
          stack depth 0
          processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
          Summary: 0 PASSED, 0 SKIPPED, 1 FAILED
      
      Since it's not possible to instruct libelf to translate just certain
      values, let's manually bswap the flags (both global and entry flags) in
      resolve_btfids when needed, so that libelf then translates everything
      correctly.
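
      A sketch of the idea (variable names illustrative):

        /* The SET8 flags were already written in target endianness at vmlinux
         * build time; pre-swap them so libelf's later whole-section translation
         * leaves them correct. bswap_32() comes from <byteswap.h>.
         */
        unsigned int i;

        if (target_endianness != host_endianness) {
                set8->flags = bswap_32(set8->flags);
                for (i = 0; i < set8->cnt; i++)
                        set8->pairs[i].flags = bswap_32(set8->pairs[i].flags);
        }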
      
      Fixes: ef2c6f37 ("tools/resolve_btfids: Add support for 8-byte BTF sets")
      Signed-off-by: Viktor Malik <vmalik@redhat.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/7b6bff690919555574ce0f13d2a5996cacf7bf69.1707223196.git.vmalik@redhat.com
      903fad43
    • tools/resolve_btfids: Refactor set sorting with types from btf_ids.h · 9707ac4f
      Viktor Malik authored
      
      Instead of using magic offsets to access BTF ID set data, leverage types
      from btf_ids.h (btf_id_set and btf_id_set8) which define the actual
      layout of the data. Thanks to this change, set sorting should also
      continue working if the layout changes.
      
      This requires syncing the definition of 'struct btf_id_set8' from
      include/linux/btf_ids.h to tools/include/linux/btf_ids.h. We don't sync
      the rest of the file at the moment, because that would also require syncing
      multiple dependent headers, and we don't need any other defs from
      btf_ids.h.
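
      For reference, the synced layout looks like this (as in btf_ids.h):

        struct btf_id_set8 {
                u32 cnt;
                u32 flags;
                struct {
                        u32 id;
                        u32 flags;
                } pairs[];
        };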
      
      Signed-off-by: Viktor Malik <vmalik@redhat.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Daniel Xu <dxu@dxuuu.xyz>
      Link: https://lore.kernel.org/bpf/ff7f062ddf6a00815fda3087957c4ce667f50532.1707223196.git.vmalik@redhat.com
      9707ac4f
  9. Feb 06, 2024