Add: Clustering and multiple vectors per key #200

Merged 42 commits on Aug 22, 2023
Commits
9163d79
Fix: Concurrent interruptions & error handling
ashvardanian Aug 10, 2023
531d2bc
Fix: Counting nodes per level
ashvardanian Aug 10, 2023
7723ce5
Fix: MetricKind name collision
ashvardanian Aug 10, 2023
a823d06
Fix: Pretty-printing metadata
ashvardanian Aug 10, 2023
ddf1afa
Add: Clustering functionality
ashvardanian Aug 10, 2023
6695954
Merge branch 'main-dev' of https://github.com/ashvardanian/usearch in…
ashvardanian Aug 10, 2023
d9bc92b
Refactor: Args to top-level interface
ashvardanian Aug 11, 2023
886e29f
Refactor: Bindings settings and docs
ashvardanian Aug 11, 2023
a1b9b38
Refactor: Prepare to move GoLang builds
ashvardanian Aug 11, 2023
1b3ae10
Docs: Mention integrations
ashvardanian Aug 11, 2023
1227ff0
Add: Rename, remove, count and check in batches
ashvardanian Aug 13, 2023
f2f6b46
Refactor: Split Index tests for key collisions
ashvardanian Aug 14, 2023
27b214f
Make: Update version in `README.md`
ashvardanian Aug 14, 2023
79f33d7
Merge pull request #1 from unum-cloud/main-dev
ashvardanian Aug 14, 2023
7e5f6a7
Add: Support multiple vectors per key
ashvardanian Aug 14, 2023
0307e6e
Merge branch 'main-dev' of https://github.com/ashvardanian/usearch in…
ashvardanian Aug 14, 2023
89a0b75
Make: Remove `robin-map` dependency
ashvardanian Aug 14, 2023
9b724da
Refactor: Consistent `multi` in bindings
ashvardanian Aug 14, 2023
e6ed1a7
Fix: Type-casting in Python retrieval
ashvardanian Aug 15, 2023
5e50f6b
Add: Clustering limited to number of clusters
ashvardanian Aug 15, 2023
9fd2767
Fix: Persisting the flag for multi-indexes
ashvardanian Aug 19, 2023
7ec4699
Fix: `match_t` constructors and type names
ashvardanian Aug 20, 2023
492b181
Add: `distance_between` and `cluster` APIs
ashvardanian Aug 20, 2023
c428f54
Make: Freeze Sphinx version
ashvardanian Aug 20, 2023
9ede10f
Refactor: `multi` support in C 99 bindings
ashvardanian Aug 20, 2023
d1fd90a
Refactor: Placeholder for #206
ashvardanian Aug 20, 2023
4d2fccd
Refactor: Black formatting
ashvardanian Aug 20, 2023
f6f12fe
Add: `pairwise_distance` and clustering fixes
ashvardanian Aug 20, 2023
77bb4eb
Merge branch 'main-dev' into main-dev
ashvardanian Aug 20, 2023
a415971
Improve: Parallel cluster refinement
ashvardanian Aug 21, 2023
c4634d0
Merge branch 'main-dev' of https://github.com/ashvardanian/usearch in…
ashvardanian Aug 21, 2023
f3d56fa
Fix: Support platforms without 16-byte atomic store
ashvardanian Aug 21, 2023
d17a8e0
Docs: Add Arxiv dataset for benchmarks
ashvardanian Aug 21, 2023
da9f3a9
Fix: Multi-vector keys
ashvardanian Aug 21, 2023
7741f56
Fix: Default initialization
ashvardanian Aug 21, 2023
d678810
Improve: `clustering` API
ashvardanian Aug 22, 2023
40e803c
Add: `unfair_shared_mutex_t` for C++ 11 compat.
ashvardanian Aug 22, 2023
a01fc6d
Fix: Printing top layer of graph
ashvardanian Aug 22, 2023
593f688
Improve: Lower asymptotics for clustering
ashvardanian Aug 22, 2023
4570ee3
Add: `Clustering` class for recursive exploration
ashvardanian Aug 22, 2023
f988fc3
Fix: `shared_lock_gt` for C++11
ashvardanian Aug 22, 2023
b7a59ad
Fix: Clustering tests
ashvardanian Aug 22, 2023
20 changes: 2 additions & 18 deletions .github/workflows/prerelease.yml
@@ -66,7 +66,7 @@ jobs:
       - name: Build locally
         run: python -m pip install .
       - name: Test with PyTest
-        run: pytest python/scripts/test.py
+        run: pytest python/scripts/


   test_python_37:
@@ -95,7 +95,7 @@ jobs:
         run: python -m pip install .

       - name: Test with PyTest
-        run: pytest python/scripts/test.py
+        run: pytest python/scripts/


   test_javascript:
@@ -122,22 +122,6 @@ jobs:
           toolchain: stable
           override: true

-  test_golang:
-    name: Test GoLang
-    runs-on: ubuntu-latest
-    steps:
-      - uses: actions/checkout@v3
-      - run: git submodule update --init --recursive
-      - name: Set up Go
-        uses: actions/setup-go@v4
-        with:
-          go-version: '1.15'
-      - name: Build C library for cGo
-        run: |
-          make -C ./c libusearch_c.so
-          mv ./c/libusearch_c.so ./golang/libusearch_c.so
-          cd golang && ls && go test -v
-
   test_java:
     name: Test Java
     runs-on: ubuntu-latest
2 changes: 1 addition & 1 deletion .github/workflows/release.yml
@@ -319,7 +319,7 @@ jobs:
         run: |
           sudo apt update &&
           sudo apt install -y doxygen graphviz dia git &&
-          pip install sphinx sphinx-js breathe furo m2r2 sphinxcontrib-googleanalytics==0.2.dev20220708 sphinxcontrib-jquery &&
+          pip install sphinx==7.1.2 sphinx-js breathe furo m2r2 sphinxcontrib-googleanalytics==0.2.dev20220708 sphinxcontrib-jquery &&
           npm install -g jsdoc
       - name: Install USearch from PyPi
         run: pip install usearch
2 changes: 1 addition & 1 deletion .github/workflows/update_version.sh
@@ -9,5 +9,5 @@ echo $1 > VERSION &&
 sed -i "s/^\(#define USEARCH_VERSION_MINOR \).*/\1$(echo "$1" | cut -d. -f2)/" ./include/usearch/index.hpp &&
 sed -i "s/^\(#define USEARCH_VERSION_PATCH \).*/\1$(echo "$1" | cut -d. -f3)/" ./include/usearch/index.hpp &&
 sed -i "s/<version>[0-9]\+\.[0-9]\+\.[0-9]\+/<version>$1/" README.md &&
-sed -i "s/version = {0\.[0-9]\+\.[0-9]\+}/version = {$1}/" README.md &&
+sed -i "s/version = {[0-9]\+\.[0-9]\+\.[0-9]\+}/version = {$1}/" README.md &&
 sed -i "s/version=\".*\"/version=\"$1\"/" wasmer.toml
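A note on the pattern change above: the old expression hard-codes a leading `0`, so it stops matching the citation block once the major version reaches 1 (the README in this PR moves to `1.0.0`). The sketch below reproduces the same two patterns with `std::regex` to show the difference. The helper names (`bump_version`, `old_pattern`, `new_pattern`) are hypothetical and not part of the repository, and `std::regex` uses ECMAScript syntax, so `+` is written unescaped, unlike in `sed`'s basic regular expressions.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper, not part of USearch: rewrites a BibTeX-style
// `version = {X.Y.Z}` field, leaving the text untouched if nothing matches.
inline std::string bump_version(std::string const& text, std::regex const& pattern, std::string const& version) {
    assert(!version.empty());
    return std::regex_replace(text, pattern, "version = {" + version + "}");
}

// Equivalent of the removed `sed` pattern: only matches versions with major 0.
inline std::regex old_pattern() { return std::regex(R"(version = \{0\.[0-9]+\.[0-9]+\})"); }

// Equivalent of the added `sed` pattern: matches any numeric major version.
inline std::regex new_pattern() { return std::regex(R"(version = \{[0-9]+\.[0-9]+\.[0-9]+\})"); }
```

With the old pattern, a line such as `version = {1.0.0},` passes through unchanged, so the citation would silently go stale on the next release.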
5 changes: 1 addition & 4 deletions .gitmodules
@@ -3,7 +3,4 @@
     url = https://github.com/ashvardanian/simsimd
 [submodule "fp16"]
     path = fp16
-    url = https://github.com/maratyszcza/fp16
-[submodule "robin-map"]
-    path = robin-map
-    url = https://github.com/tessil/robin-map
+    url = https://github.com/maratyszcza/fp16
3 changes: 2 additions & 1 deletion Package.swift
@@ -19,20 +19,21 @@ let package = Package(
             cxxSettings: [
                 .headerSearchPath("../include/"),
                 .headerSearchPath("../fp16/include/"),
-                .headerSearchPath("../robin-map/include/"),
                 .headerSearchPath("../simismd/include/")
             ]
         ),
         .target(
             name: "USearch",
             dependencies: ["USearchObjective"],
             path: "swift",
+            exclude: ["README.md", "Test.swift"],
             sources: ["USearch.swift", "Index+Sugar.swift"]
         ),
         .testTarget(
             name: "USearchTests",
             dependencies: ["USearch"],
             path: "swift",
+            exclude: ["USearch.swift", "Index+Sugar.swift", "README.md"],
             sources: ["Test.swift"]
         )
     ],
12 changes: 6 additions & 6 deletions README.md
@@ -319,10 +319,10 @@ matches = index.search(fingerprints, 10)

 ## Integrations

-- [x] GPT-Cache.
-- [x] LangChain.
-- [ ] ClickHouse.
-- [ ] Microsoft Semantic Kernel.
+- [x] GPTCache: [Python](https://github.com/zilliztech/GPTCache/releases/tag/0.1.29).
+- [x] LangChain: [Python](https://github.com/langchain-ai/langchain/releases/tag/v0.0.257) and [JavaScipt](https://github.com/hwchase17/langchainjs/releases/tag/0.0.125).
+- [ ] Microsoft Semantic Kernel: [Python](https://github.com/microsoft/semantic-kernel/pull/2358) and C#.
+- [ ] ClickHouse: C++.

 ## Citations

@@ -332,8 +332,8 @@ doi = {10.5281/zenodo.7949416},
 author = {Vardanian, Ash},
 title = {{USearch by Unum Cloud}},
 url = {https://github.com/unum-cloud/usearch},
-version = {0.13.0},
-year = {2022}
+version = {1.0.0},
+year = {2022},
 month = jun,
 }
 ```
28 changes: 16 additions & 12 deletions binding.gyp
@@ -2,32 +2,36 @@
     "targets": [
         {
             "target_name": "usearch",
-            "sources": [
-                "javascript/lib.cpp"
-            ],
+            "sources": ["javascript/lib.cpp"],
             "include_dirs": [
                 "<!@(node -p \"require('node-addon-api').include\")",
                 "include",
                 "fp16/include",
-                "robin-map/include",
-                "simsimd/include"
+                "simsimd/include",
+            ],
+            "dependencies": ["<!(node -p \"require('node-addon-api').gyp\")"],
+            "cflags": [
+                "-fexceptions",
+                "-Wno-unknown-pragmas",
+                "-Wno-maybe-uninitialized",
             ],
-            "dependencies": [
-                "<!(node -p \"require('node-addon-api').gyp\")"
+            "cflags_cc": [
+                "-fexceptions",
+                "-Wno-unknown-pragmas",
+                "-Wno-maybe-uninitialized",
+                "-std=c++11",
             ],
-            "cflags": ["-fexceptions", "-Wno-unknown-pragmas", "-Wno-maybe-uninitialized"],
-            "cflags_cc": ["-fexceptions", "-Wno-unknown-pragmas", "-Wno-maybe-uninitialized", "-std=c++11"],
             "xcode_settings": {
                 "GCC_ENABLE_CPP_EXCEPTIONS": "YES",
                 "CLANG_CXX_LIBRARY": "libc++",
-                "MACOSX_DEPLOYMENT_TARGET": "10.15"
+                "MACOSX_DEPLOYMENT_TARGET": "10.15",
             },
             "msvs_settings": {
                 "VCCLCompilerTool": {
                     "ExceptionHandling": 1,
-                    "AdditionalOptions": ["-std:c++11"]
+                    "AdditionalOptions": ["-std:c++11"],
                 }
-            }
+            },
         }
     ]
 }
2 changes: 1 addition & 1 deletion build.gradle
@@ -45,7 +45,7 @@ model {
                 include "**/*.cpp"
             }
             exportedHeaders {
-                srcDirs "include", "fp16/include", "robin-map/include", "simsimd/include", "${Jvm.current().javaHome}/include"
+                srcDirs "include", "fp16/include", "simsimd/include", "${Jvm.current().javaHome}/include"
             }
         }
     }
1 change: 0 additions & 1 deletion build.rs
@@ -6,7 +6,6 @@ fn main() {
         .include("include")
         .include("rust")
         .include("fp16/include")
-        .include("robin-map/include")
         .include("simsimd/include")
         .compile("usearch");

1 change: 0 additions & 1 deletion c/CMakeLists.txt
@@ -3,7 +3,6 @@
 set(USEARCH_PUNNED_INCLUDE_DIRS
     "${CMAKE_CURRENT_SOURCE_DIR}/../include"
     "${CMAKE_CURRENT_SOURCE_DIR}/../fp16/include"
-    "${CMAKE_CURRENT_SOURCE_DIR}/../robin-map/include"
     "${CMAKE_CURRENT_SOURCE_DIR}/../simsimd/include"
     "${CMAKE_CURRENT_SOURCE_DIR}/"
 )
44 changes: 30 additions & 14 deletions c/lib.cpp
@@ -58,13 +58,13 @@ add_result_t add_(index_dense_t* index, usearch_key_t key, void const* vector, s
     }
 }

-bool get_(index_dense_t* index, usearch_key_t key, void* vector, scalar_kind_t kind) {
+bool get_(index_dense_t* index, usearch_key_t key, size_t count, void* vector, scalar_kind_t kind) {
     switch (kind) {
-    case scalar_kind_t::f32_k: return index->get(key, (f32_t*)vector);
-    case scalar_kind_t::f64_k: return index->get(key, (f64_t*)vector);
-    case scalar_kind_t::f16_k: return index->get(key, (f16_t*)vector);
-    case scalar_kind_t::i8_k: return index->get(key, (i8_bits_t*)vector);
-    case scalar_kind_t::b1x8_k: return index->get(key, (b1x8_t*)vector);
+    case scalar_kind_t::f32_k: return index->get(key, (f32_t*)vector, count);
+    case scalar_kind_t::f64_k: return index->get(key, (f64_t*)vector, count);
+    case scalar_kind_t::f16_k: return index->get(key, (f16_t*)vector, count);
+    case scalar_kind_t::i8_k: return index->get(key, (i8_bits_t*)vector, count);
+    case scalar_kind_t::b1x8_k: return index->get(key, (b1x8_t*)vector, count);
     default: return search_result_t().failed("Unknown scalar kind!");
     }
 }
@@ -87,6 +87,7 @@ USEARCH_EXPORT usearch_index_t usearch_init(usearch_init_options_t* options, use
     assert(options && error);

     index_dense_config_t config(options->connectivity, options->expansion_add, options->expansion_search);
+    config.multi = options->multi;
     metric_kind_t metric_kind = to_native_metric(options->metric_kind);
     scalar_kind_t scalar_kind = to_native_scalar(options->quantization);

@@ -167,9 +168,14 @@ USEARCH_EXPORT bool usearch_contains(usearch_index_t index, usearch_key_t key, u
     return reinterpret_cast<index_dense_t*>(index)->contains(key);
 }

+USEARCH_EXPORT size_t usearch_count(usearch_index_t index, usearch_key_t key, usearch_error_t*) {
+    assert(index);
+    return reinterpret_cast<index_dense_t*>(index)->count(key);
+}
+
 USEARCH_EXPORT size_t usearch_search( //
     usearch_index_t index, void const* vector, usearch_scalar_kind_t kind, size_t results_limit, //
-    usearch_key_t* found_labels, usearch_distance_t* found_distances, usearch_error_t* error) {
+    usearch_key_t* found_keys, usearch_distance_t* found_distances, usearch_error_t* error) {

     assert(index && vector && error);
     search_result_t result =
@@ -179,23 +185,33 @@ USEARCH_EXPORT size_t usearch_search(
         return 0;
     }

-    return result.dump_to(found_labels, found_distances);
+    return result.dump_to(found_keys, found_distances);
 }

-USEARCH_EXPORT bool usearch_get( //
-    usearch_index_t index, usearch_key_t key, //
-    void* vector, usearch_scalar_kind_t kind, usearch_error_t*) {
+USEARCH_EXPORT size_t usearch_get( //
+    usearch_index_t index, usearch_key_t key, size_t count, //
+    void* vectors, usearch_scalar_kind_t kind, usearch_error_t*) {

-    assert(index && vector);
-    return get_(reinterpret_cast<index_dense_t*>(index), key, vector, to_native_scalar(kind));
+    assert(index && vectors);
+    return get_(reinterpret_cast<index_dense_t*>(index), key, count, vectors, to_native_scalar(kind));
 }

-USEARCH_EXPORT bool usearch_remove(usearch_index_t index, usearch_key_t key, usearch_error_t* error) {
+USEARCH_EXPORT size_t usearch_remove(usearch_index_t index, usearch_key_t key, usearch_error_t* error) {

     assert(index && error);
     labeling_result_t result = reinterpret_cast<index_dense_t*>(index)->remove(key);
     if (!result)
         *error = result.error.release();
     return result.completed;
 }
+
+USEARCH_EXPORT size_t usearch_rename(usearch_index_t index, usearch_key_t from, usearch_key_t to,
+                                     usearch_error_t* error) {
+
+    assert(index && error);
+    labeling_result_t result = reinterpret_cast<index_dense_t*>(index)->rename(from, to);
+    if (!result)
+        *error = result.error.release();
+    return result.completed;
+}
 }
45 changes: 34 additions & 11 deletions c/usearch.h
@@ -83,6 +83,10 @@ USEARCH_EXPORT typedef struct usearch_init_options_t {
      * @brief The @b optional expansion factor used for index construction during search operations.
      */
     size_t expansion_search;
+    /**
+     * @brief When set allows multiple vectors to map to the same key.
+     */
+    bool multi;
 } usearch_init_options_t;

 /**
@@ -151,39 +155,58 @@ USEARCH_EXPORT void usearch_add( //
  */
 USEARCH_EXPORT bool usearch_contains(usearch_index_t, usearch_key_t, usearch_error_t* error);

+/**
+ * @brief Counts the number of entries in the index under a specific key.
+ * @param[in] key The key to be checked.
+ * @param[out] error Pointer to a string where the error message will be stored, if an error occurs.
+ * @return Number of vectors found under that key.
+ */
+USEARCH_EXPORT size_t usearch_count(usearch_index_t, usearch_key_t, usearch_error_t* error);
+
 /**
  * @brief Performs k-Approximate Nearest Neighbors (kANN) Search for closest vectors to query.
  * @param[in] query_vector Pointer to the query vector data.
  * @param[in] query_kind The scalar type used in the query vector data.
- * @param[in] results_limit Upper bound on the number of neighbors to search, the "k" in "kANN".
- * @param[out] found_keys Output buffer for up to `results_limit` nearest neighbors keys.
- * @param[out] found_distances Output buffer for up to `results_limit` distances to nearest neighbors.
+ * @param[in] count Upper bound on the number of neighbors to search, the "k" in "kANN".
+ * @param[out] keys Output buffer for up to `count` nearest neighbors keys.
+ * @param[out] distances Output buffer for up to `count` distances to nearest neighbors.
  * @param[out] error Pointer to a string where the error message will be stored, if an error occurs.
  * @return Number of found matches.
  */
-USEARCH_EXPORT size_t usearch_search( //
-    usearch_index_t, void const* query_vector, usearch_scalar_kind_t query_kind, size_t results_limit, //
-    usearch_key_t* found_keys, usearch_distance_t* found_distances, usearch_error_t* error);
+USEARCH_EXPORT size_t usearch_search( //
+    usearch_index_t, //
+    void const* query_vector, usearch_scalar_kind_t query_kind, //
+    size_t count, usearch_key_t* keys, usearch_distance_t* distances, usearch_error_t* error);

 /**
  * @brief Retrieves the vector associated with the given key from the index.
  * @param[in] key The key of the vector to retrieve.
  * @param[out] vector Pointer to the memory where the vector data will be copied.
+ * @param[in] count Number of vectors that can be fitted into `vector` for multi-vector entries.
  * @param[in] vector_kind The scalar type used in the vector data.
  * @param[out] error Pointer to a string where the error message will be stored, if an error occurs.
- * @return `true` if the vector is successfully retrieved, `false` if the vector is not found.
+ * @return Number of vectors found under that name and exported to `vector`.
  */
-USEARCH_EXPORT bool usearch_get( //
-    usearch_index_t, usearch_key_t key, //
+USEARCH_EXPORT size_t usearch_get( //
+    usearch_index_t, usearch_key_t key, size_t count, //
     void* vector, usearch_scalar_kind_t vector_kind, usearch_error_t* error);

 /**
  * @brief Removes the vector associated with the given key from the index.
  * @param[in] key The key of the vector to be removed.
  * @param[out] error Pointer to a string where the error message will be stored, if an error occurs.
- * @return `true` if the vector is successfully removed, `false` if the vector is not found.
+ * @return Number of vectors found under that name and dropped from the index.
+ */
+USEARCH_EXPORT size_t usearch_remove(usearch_index_t, usearch_key_t key, usearch_error_t* error);
+
+/**
+ * @brief Renames the vector to map to a different key.
+ * @param[in] from The key of the vector to be renamed.
+ * @param[in] to New key for found entry.
+ * @param[out] error Pointer to a string where the error message will be stored, if an error occurs.
+ * @return Number of vectors found under that name and renamed.
  */
-USEARCH_EXPORT bool usearch_remove(usearch_index_t, usearch_key_t key, usearch_error_t* error);
+USEARCH_EXPORT size_t usearch_rename(usearch_index_t, usearch_key_t from, usearch_key_t to, usearch_error_t* error);

 #ifdef __cplusplus
 }
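To illustrate the calling convention the header comments describe (with `multi` set, `count`, `get`, `remove` and `rename` report how many vectors were affected instead of a boolean), here is a minimal self-contained sketch. It is a toy stand-in built on `std::multimap`, not USearch's index and not its implementation. The class `toy_multi_index_t`, its method names and the choice to refuse a `rename` onto an occupied key when `multi` is off are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical toy container: mimics only the count-returning semantics
// described in `c/usearch.h`; it performs no approximate search at all.
class toy_multi_index_t {
    using toy_key_t = std::uint64_t;
    using toy_vector_t = std::vector<float>;
    std::multimap<toy_key_t, toy_vector_t> entries_;
    std::size_t dimensions_;
    bool multi_;

  public:
    toy_multi_index_t(std::size_t dimensions, bool multi) : dimensions_(dimensions), multi_(multi) {}

    // Rejects a duplicate key unless the index was created with `multi`.
    bool add(toy_key_t key, float const* vector) {
        assert(vector);
        if (!multi_ && entries_.count(key))
            return false;
        entries_.emplace(key, toy_vector_t(vector, vector + dimensions_));
        return true;
    }

    // Number of vectors stored under `key`.
    std::size_t count(toy_key_t key) const { return entries_.count(key); }

    // Copies at most `capacity` vectors into `out`; returns how many were exported.
    std::size_t get(toy_key_t key, std::size_t capacity, float* out) const {
        assert(out || capacity == 0);
        auto range = entries_.equal_range(key);
        std::size_t exported = 0;
        for (auto it = range.first; it != range.second && exported != capacity; ++it, ++exported)
            std::copy(it->second.begin(), it->second.end(), out + exported * dimensions_);
        return exported;
    }

    // Drops every vector under `key`; returns how many were removed.
    std::size_t remove(toy_key_t key) { return entries_.erase(key); }

    // Moves every vector from `from` to `to`; returns how many were renamed.
    std::size_t rename(toy_key_t from, toy_key_t to) {
        if (!multi_ && entries_.count(to))
            return 0; // Assumption: a single-vector index refuses to merge keys.
        auto range = entries_.equal_range(from);
        std::vector<toy_vector_t> moved;
        for (auto it = range.first; it != range.second; ++it)
            moved.push_back(std::move(it->second));
        entries_.erase(range.first, range.second);
        for (auto& vector : moved)
            entries_.emplace(to, std::move(vector));
        return moved.size();
    }
};
```

A caller would size the output buffer as `capacity * dimensions` scalars and compare the returned count with `count(key)` to detect truncation, which is presumably why the C API now takes a `count` argument in `usearch_get`.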