Why use kd-tree rather than HNSW? #23
I explored HNSW a couple of months ago. There are currently two HNSW wasm packages:
Thanks for opening the issue, @zxch3n! Could you tell me more about HNSW and why it is a better option than a kd-tree? I don't know much about HNSW yet.
@jlarmstrongiv That's awesome! What are your thoughts on HNSW?
@DawChihLiou I thought it was really neat! Both packages are open source on GitHub and up to date with hnswlib v0.7.0. Importantly:
The only thing I couldn’t figure out with the wasm version yet was importing/exporting the index to/from a file in the browser. Notes to self:
You can see the details about HNSW in its paper: https://arxiv.org/abs/1603.09320
@zxch3n thanks for the suggestion. I'll explore HNSW once Voy is stabilized so we'll be able to compare the benchmarks. A few more resources:
Ahh, that answers the question I had when reading the
The example from #21 has a higher dimensionality of 384, not just 4. Unfortunately, most of my data has a much, much, much higher dimensionality than that. It seems like What are your thoughts on that @DawChihLiou, since most of the embeddings you have planned have high dimensionality?
I think HNSW is a better fit and worth exploring too. Currently kiddo is capable of handling higher dimensionality and producing quality results. The vectors in the example are 768-dimensional. I'll work on HNSW once Voy's API is more stable.
I think HNSW may change some of Voy’s APIs, like the type of
@jlarmstrongiv thanks for your support! I really appreciate it. HNSW will be an internal implementation, so it will most likely have no effect on the API. SerializedIndex is there to communicate between JS and wasm.
Hey, I have experimented a little with HNSW-based vector stores in the browser. Check out this link: https://github.com/xyntopia/vexvault You can find the relevant part here: https://github.com/Xyntopia/vexvault/blob/vexvault/src/modules/localVectorStore.ts
For high-dimensional data, a kd-tree search degrades to O(n) time, because knowing something about the (Euclidean) distance in one dimension says very little about the distance in the full space.
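To make the point above concrete, here is a rough sketch (not Voy's or kiddo's actual code) of a toy kd-tree that counts how many nodes a nearest-neighbor query touches. Everything in it is an assumption made for illustration: the function names `build`, `nearest`, and `visited_fraction`, the uniform random data, and the choice of 1000 points. The idea is that a branch can only be skipped when the distance to the splitting plane along a single axis already exceeds the best full-space distance found so far, which rarely happens once the dimensionality is high.

```python
import random


def build(points, depth=0):
    """Build a toy kd-tree as nested tuples: (point, axis, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0])  # cycle through the axes level by level
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (
        points[mid],
        axis,
        build(points[:mid], depth + 1),
        build(points[mid + 1:], depth + 1),
    )


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(node, query, best, stats):
    """Return (best_sq_dist, best_point); stats['visited'] counts nodes examined."""
    if node is None:
        return best
    point, axis, left, right = node
    stats["visited"] += 1
    d = sq_dist(point, query)
    if best is None or d < best[0]:
        best = (d, point)
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, best, stats)
    # The far side can hold a closer point only if the splitting plane,
    # measured along this ONE axis, is nearer than the best match so far.
    if diff * diff < best[0]:
        best = nearest(far, query, best, stats)
    return best


def visited_fraction(dim, n=1000, seed=0):
    """Fraction of tree nodes examined for one random query (hypothetical helper)."""
    rng = random.Random(seed)
    points = [tuple(rng.random() for _ in range(dim)) for _ in range(n)]
    query = tuple(rng.random() for _ in range(dim))
    stats = {"visited": 0}
    best = nearest(build(points), query, None, stats)
    # Sanity check: the tree search must agree with a brute-force scan.
    assert best[0] == min(sq_dist(p, query) for p in points)
    return stats["visited"] / n


if __name__ == "__main__":
    for dim in (2, 8, 64):
        print(dim, visited_fraction(dim))
```

Running it for a few dimensionalities should show the visited fraction climbing toward 1 as `dim` grows, i.e. the tree ends up doing roughly the same work as a linear scan. Real libraries use bucketed leaves and other tricks, so treat the numbers as qualitative only.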