examples/c: add hashing and naive substring search algo #331

Open
wants to merge 1 commit into
base: master

anakryiko
Member

I also benchmarked it a little. Performance obviously depends on the haystack and needle strings and so on, but the hashing implementation seems to be on par with the naive implementation for short strings, and gets relatively faster as strings become longer and/or the pattern match happens further into the string.

E.g., for searching "ra" in "abracadabra" (end of short string):

substr-2084331 [012] ..... 2514091.887184: bpf_trace_printk: BENCH HASHED 156 ns/iter
substr-2084331 [012] ..... 2514091.891784: bpf_trace_printk: BENCH NAIVE 183 ns/iter

For searching "eaba" in "abacabadabacabaeabacabadabacaba" (middle of longer string):

substr-2082624 [015] ..... 2514066.577106: bpf_trace_printk: BENCH HASHED 289 ns/iter
substr-2082624 [015] ..... 2514066.588243: bpf_trace_printk: BENCH NAIVE 445 ns/iter

But when searching for all occurrences of "a" inside "abracadabra" (almost immediate match in a rather short string), the naive variant comes out slightly ahead:

substr-2111313 [078] ..... 2514466.822019: bpf_trace_printk: BENCH HASHED 259 ns/iter
substr-2111313 [078] ..... 2514466.827745: bpf_trace_printk: BENCH NAIVE 228 ns/iter

Overall, the hashed variant seems best from a practical point of view.
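To make the comparison concrete, here is a minimal userspace sketch of the two approaches. It is not the code from this PR: the function names `find_naive` and `find_hashed`, the `HASH_BASE` constant, and the Rabin-Karp-style rolling hash are all assumptions chosen for illustration, and a BPF version would additionally need verifier-friendly bounded loops.

```c
#include <stddef.h>
#include <string.h>

/* Naive search: try every start position and compare byte by byte.
 * Returns the index of the first match, or -1 if there is none. */
long find_naive(const char *hay, const char *needle)
{
	size_t n = strlen(hay), m = strlen(needle);

	if (m > n)
		return -1;
	for (size_t i = 0; i + m <= n; i++) {
		size_t j = 0;

		while (j < m && hay[i + j] == needle[j])
			j++;
		if (j == m)
			return (long)i;
	}
	return -1;
}

/* Arbitrary illustrative multiplier; hash arithmetic wraps as unsigned. */
#define HASH_BASE 31u

/* Hashed search (Rabin-Karp style, an assumption about the approach):
 * keep a rolling hash of the current window and compare bytes only when
 * the window hash equals the needle hash. */
long find_hashed(const char *hay, const char *needle)
{
	size_t n = strlen(hay), m = strlen(needle);
	unsigned int need_hash = 0, win_hash = 0, top = 1;

	if (m > n)
		return -1;
	if (m == 0)
		return 0;

	/* top = HASH_BASE^(m-1), the weight of the window's leading byte */
	for (size_t i = 1; i < m; i++)
		top *= HASH_BASE;
	for (size_t i = 0; i < m; i++) {
		need_hash = need_hash * HASH_BASE + (unsigned char)needle[i];
		win_hash = win_hash * HASH_BASE + (unsigned char)hay[i];
	}
	for (size_t i = 0; ; i++) {
		/* Hash equality is only a hint; confirm with memcmp. */
		if (win_hash == need_hash && memcmp(hay + i, needle, m) == 0)
			return (long)i;
		if (i + m >= n)
			return -1;
		/* Slide the window: drop hay[i], append hay[i + m]. */
		win_hash = (win_hash - (unsigned char)hay[i] * top) * HASH_BASE
			 + (unsigned char)hay[i + m];
	}
}
```

Both functions return the offset of the first match. The hashed variant does constant work per window position regardless of needle length, which is consistent with it pulling ahead on longer inputs, while its setup cost can dominate when a match is found almost immediately.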

Contributor

Hi, Andrii. I see some annotations in the prog like __arg_nonnull. Does this help the compiler or the verifier optimize their work?

Member Author

__arg_nonnull is an annotation that can be applied to arguments of a global subprog. A global subprog is verified by the BPF verifier in isolation from the main program, based only on the function's type signature. That is a more restricted way to verify, but it also lets BPF verification scale much better, because we create smaller isolated pieces of logic that the verifier doesn't have to re-validate every single time. The annotation tells the BPF verifier that this argument can't be NULL. The verifier assumes this when validating the body of the subprogram, and also enforces it whenever other code calls into the subprogram.
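As a rough illustration (not code from this PR), a global subprog using the annotation might look like the sketch below. The `struct str_buf` type, the `count_char` function, and the 64-byte size are hypothetical. In a real BPF program `__arg_nonnull` comes from libbpf's `bpf_helpers.h`; it is stubbed out as a no-op here only so the snippet also compiles as ordinary userspace C.

```c
#include <stddef.h>

/* Stub for userspace builds only; BPF programs get the real macro
 * from <bpf/bpf_helpers.h>. */
#ifndef __arg_nonnull
#define __arg_nonnull
#endif

#define STR_BUF_SZ 64	/* arbitrary size chosen for this sketch */

/* Hypothetical fixed-size buffer, so the pointee has a known size. */
struct str_buf {
	char data[STR_BUF_SZ];
};

/* Global (non-static) subprog. With __arg_nonnull the verifier would
 * treat `buf` as non-NULL while checking this body, so no NULL check is
 * written here, and it would reject callers that may pass NULL. */
int count_char(const struct str_buf *buf __arg_nonnull, char c)
{
	int cnt = 0;

	/* Bounded loop: stop at the terminating NUL or at the buffer end. */
	for (int i = 0; i < STR_BUF_SZ && buf->data[i]; i++)
		if (buf->data[i] == c)
			cnt++;
	return cnt;
}
```

Without the annotation, the body would typically need an explicit `if (!buf) return 0;` style check before the first dereference, since the subprog is validated on its own from its signature alone.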

Hope this helps.

Signed-off-by: Andrii Nakryiko <[email protected]>