Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

String search improvements #691

Open
dg-pb opened this issue Jul 26, 2024 · 3 comments
Open

String search improvements #691

dg-pb opened this issue Jul 26, 2024 · 3 comments

Comments

@dg-pb
Copy link

dg-pb commented Jul 26, 2024

I have done some work on fastsearch.h module.

python/cpython#120025

PR was reviewed by few non-core-dev members and was subject to sufficient amount of iterations so that it is currently in stable state.

I have posted in coredev discourse, but without much luck.

So I thought maybe this is a better place to get some attention given that this is largely related to performance.

Apart from performance improvements, it does introduce consistency and simplifications that are helpful to move forward with further improvements (see String Search Overview, Coverage and API).

@markshannon
Copy link
Member

It looks a worthwhile improvement, but I don't think anyone in our team is an expert enough in string searching algorithms to review it.

Trying pinging discourse again?

@iritkatriel
Copy link
Collaborator

CC @gsbrodal (an expert on string algorithms).

@dg-pb
Copy link
Author

dg-pb commented Mar 27, 2025

This is still hanging. And it is a blocker to further progress in this direction.

Just want to note, that to review this, knowledge of string search algorithms is not necessary.

The logic of complex linear time "two-way" algorithm has not been changed, so review is on:

  1. making sure that changes did not alter previous logic of two-way find (no need to understand search algorithms)
  2. making sure that direction agonistic logic has been implemented correctly (no need to understand search algorithms)
  3. reviewing new main search function:
    a) fairly simple string search with couple of basic additional rules
    b) logic of falling back to more complex two-way search

Only 3a needs a bit of of understanding of string search logic (but not the complex linear time "two-way"), the rest can be done without it.

So although there is definitely a bit of work to review, but for someone with good C knowledge, it might not be as much as it might seem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants