parallel scan #22

Open
acelyc111 opened this issue Jul 25, 2022 · 0 comments


acelyc111 commented Jul 25, 2022

Currently, scans on a replica are serialized: there is only one scan iterator per replica. This is too slow in some cases, for example when copying data with tools.
We can implement parallel scans on a replica:

  1. Count the key-value pairs in the replica, say N.
  2. Manually specify how many scans to run on the replica, say M.
  3. Get the split keys by scanning the replica a second time:
    a. Start scanning the replica again.
    b. Record the i-th end key each time N/M key-value pairs have been counted.
    c. Repeat a~b until we reach the end of the replica.
    d. In the end, we may get fewer or more than M key pairs (start key, end key), since the data in the replica may grow or shrink during the 2nd scan, or some data may have expired.
  4. Assign each key pair to a different scanner (each corresponding to a client-side thread); the scanners can run in parallel to speed up the whole scan.
    NOTE: the sub-scans may cover the same hashkey, and atomicity is not guaranteed across different scans, so the overall result is not atomic.
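
The steps above could be sketched roughly as follows. This is only an illustration, not the actual implementation: the replica is modeled as an in-memory sorted list, and `scan`, `split_ranges` and `parallel_scan` are hypothetical names, not existing APIs. The sketch also assumes half-open ranges, where the end key of one range is the start key of the next.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List, Optional, Tuple

KV = Tuple[bytes, bytes]
# Half-open range [start, end); None means unbounded on that side.
KeyRange = Tuple[Optional[bytes], Optional[bytes]]


def scan(replica: List[KV], start: Optional[bytes] = None,
         end: Optional[bytes] = None) -> Iterable[KV]:
    """Hypothetical stand-in for a serialized scan over [start, end)."""
    for key, value in replica:
        if start is not None and key < start:
            continue
        if end is not None and key >= end:
            break
        yield key, value


def split_ranges(replica: List[KV], m: int) -> List[KeyRange]:
    """Steps 1-3: count N, then rescan and cut a range every ceil(N/M) pairs."""
    n = sum(1 for _ in scan(replica))      # step 1: count key-value pairs
    per_range = max(1, -(-n // m))         # ceil(N / M), at least 1
    ranges: List[KeyRange] = []
    start: Optional[bytes] = None
    count = 0
    for key, _ in scan(replica):           # step 3: second scan
        if count == per_range:
            ranges.append((start, key))    # close the current range at this key
            start, count = key, 0
        count += 1
    ranges.append((start, None))           # last range is open-ended
    return ranges


def parallel_scan(replica: List[KV], m: int) -> List[KV]:
    """Step 4: one scanner per key range, each on its own client-side thread."""
    ranges = split_ranges(replica, m)
    with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
        parts = pool.map(lambda r: list(scan(replica, r[0], r[1])), ranges)
        return [kv for part in parts for kv in part]
```

For example, with `replica = [(b"k%02d" % i, b"v%d" % i) for i in range(10)]`, calling `parallel_scan(replica, 3)` returns the same pairs as a single serialized scan. In this sketch the data never changes between the two passes, so the number of ranges never exceeds M; on a live replica the count can drift as described in 3.d, and the combined result is still not atomic.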