examples/c: add hashing and naive substring search algo #331

anakryiko · 2025-03-11T21:51:12Z

Also benchmark them a little. Performance will obviously depend on the haystack and needle strings, but the hashing implementation seems to be on par with the naive implementation for short strings, and it gets relatively faster as strings become longer and/or the match happens further into the string.

E.g., for searching "ra" in "abracadabra" (end of short string):

substr-2084331 [012] ..... 2514091.887184: bpf_trace_printk: BENCH HASHED 156 ns/iter
substr-2084331 [012] ..... 2514091.891784: bpf_trace_printk: BENCH NAIVE 183 ns/iter

For searching "eaba" in "abacabadabacabaeabacabadabacaba" (middle of longer string):

substr-2082624 [015] ..... 2514066.577106: bpf_trace_printk: BENCH HASHED 289 ns/iter
substr-2082624 [015] ..... 2514066.588243: bpf_trace_printk: BENCH NAIVE 445 ns/iter

But when searching for all occurrences of "a" inside "abracadabra" (almost immediate match in a rather short string):

substr-2111313 [078] ..... 2514466.822019: bpf_trace_printk: BENCH HASHED 259 ns/iter
substr-2111313 [078] ..... 2514466.827745: bpf_trace_printk: BENCH NAIVE 228 ns/iter

Overall, the hashed variant seems best from a practical point of view.
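For readers who want to see the two approaches being compared, here is a minimal userspace sketch in plain C. It is not the code from this PR: the function names `substr_naive` and `substr_hashed`, the multiplier 31, and the Rabin-Karp-style rolling hash are all assumptions made for illustration, and both functions return only the first match rather than all occurrences.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Naive search: compare the needle at every haystack offset.
 * Returns the offset of the first match, or -1 if there is none. */
static long substr_naive(const char *hay, size_t hay_len,
                         const char *needle, size_t needle_len)
{
    if (needle_len > hay_len)
        return -1;
    for (size_t i = 0; i + needle_len <= hay_len; i++) {
        if (memcmp(hay + i, needle, needle_len) == 0)
            return (long)i;
    }
    return -1;
}

/* Hashed search (Rabin-Karp style): keep a rolling hash of the current
 * window and compare bytes only when the hashes agree.
 * Arithmetic is unsigned and wraps modulo 2^32. */
static long substr_hashed(const char *hay, size_t hay_len,
                          const char *needle, size_t needle_len)
{
    const uint32_t base = 31; /* arbitrary multiplier (assumption) */
    uint32_t needle_hash = 0, win_hash = 0, top = 1;

    if (needle_len > hay_len)
        return -1;
    if (needle_len == 0)
        return 0;

    /* Hash the needle and the first window; top = base^(needle_len - 1). */
    for (size_t i = 0; i < needle_len; i++) {
        needle_hash = needle_hash * base + (unsigned char)needle[i];
        win_hash = win_hash * base + (unsigned char)hay[i];
        if (i > 0)
            top *= base;
    }

    for (size_t i = 0; ; i++) {
        /* Confirm with memcmp to rule out hash collisions. */
        if (win_hash == needle_hash &&
            memcmp(hay + i, needle, needle_len) == 0)
            return (long)i;
        if (i + needle_len >= hay_len)
            return -1;
        /* Slide the window: drop hay[i], append hay[i + needle_len]. */
        win_hash -= (unsigned char)hay[i] * top;
        win_hash = win_hash * base + (unsigned char)hay[i + needle_len];
    }
}
```

The hashed variant does constant work per haystack position once the first window is hashed, while the naive variant may compare up to the full needle length at each position. That is consistent with the gap widening for longer inputs in the numbers above, though a BPF implementation has additional constraints (bounded loops, verifier limits) that this sketch ignores.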

chenhengqi · 2025-03-12T02:11:07Z

examples/c/substr.bpf.c

Hi, Andrii. I see some annotations in the prog like __arg_nonnull. Do these help the compiler or the verifier optimize their work?

__arg_nonnull is an annotation that can be applied to arguments of a global subprog. A global subprog is verified by the BPF verifier in isolation from the main program, based only on the function's type signature. That is a more restrictive way to verify, but it also lets BPF verification scale much better, because it creates smaller isolated pieces of logic that the verifier doesn't have to re-validate every single time. The annotation tells the BPF verifier that this argument can't be NULL. The verifier assumes this when validating the body of the subprogram, and it also enforces it whenever other code calls into the subprogram.

Hope this helps.
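As a rough illustration of the annotation described above, here is a sketch of how it is typically placed on a global subprog argument. The function `first_byte` is hypothetical and not taken from substr.bpf.c. In a real BPF object the macros come from libbpf's bpf_helpers.h; the fallback definitions here are an assumption added only so the snippet also compiles as ordinary C.

```c
#include <stddef.h>

/* In a real BPF object these come from <bpf/bpf_helpers.h>. The fallbacks
 * below exist only so this sketch also builds as ordinary userspace C. */
#ifndef __arg_nonnull
#define __arg_nonnull
#endif
#ifndef __noinline
#define __noinline __attribute__((noinline))
#endif

/* Hypothetical global (non-static) subprog. Because `s` is tagged
 * __arg_nonnull, the verifier would treat it as non-NULL inside the body
 * and reject callers that might pass NULL, so no NULL check is written. */
__noinline int first_byte(const char *s __arg_nonnull, size_t len)
{
    if (len == 0)
        return -1;
    return (unsigned char)s[0];
}
```

Without the annotation, a pointer argument of a global subprog is generally treated as possibly NULL, so the body would need an explicit check before dereferencing it.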

Signed-off-by: Andrii Nakryiko <[email protected]>

anakryiko force-pushed the example-substr branch from f849345 to 3075498 on March 11, 2025 21:54

chenhengqi reviewed Mar 12, 2025

anakryiko force-pushed the example-substr branch from 3075498 to e59471e on March 12, 2025 20:40

anakryiko force-pushed the example-substr branch from e59471e to a4d665a on March 12, 2025 21:36
