improve python documentation
Bluemi committed Oct 29, 2024
1 parent a6035f7 commit d73992e
Showing 3 changed files with 22 additions and 13 deletions.
python/README.md: 30 changes (17 additions, 13 deletions)

@@ -24,21 +24,20 @@ python -m venv /path/to/deglib_env && . /path/to/deglib_env/bin/activate
**Get the Source**
```shell
# clone git repository
# TODO: "-b feat/python_bindings" not necessary after merge
git clone -b feat/python_bindings git@github.com:Visual-Computing/DynamicExplorationGraph.git
cd DynamicExplorationGraph
git clone git@github.com:Visual-Computing/DynamicExplorationGraph.git
cd DynamicExplorationGraph/python
```

**Install the Package from Source**
```shell
cd python
pip install setuptools pybind11 build
python setup.py copy_build_files # copy c++ library to ./lib/
pip install .
```
This will compile the C++ code and install deglib into your virtual environment, so it may take a while.

**Testing**

To execute all tests:
```shell
pytest
@@ -75,29 +74,32 @@ vectors and D is the number of dimensions of each feature vector.
### Building a Graph

```python
import deglib

graph = deglib.builder.EvenRegularGraphBuilder.build_from_data(dataset, edges_per_vertex=32)
graph = deglib.builder.build_from_data(dataset, edges_per_vertex=32, callback="progress")
graph.save_graph("/path/to/graph.deg")
rd_graph = deglib.graph.load_readonly_graph("/path/to/graph.deg")
```

### Searching the Graph
```python
# query can have shape (D,) or (Q, D), where
# D is the dimensionality of the dataset and
# Q is the number of queries.
query = np.random.random((dims,)).astype(np.float32)
result = graph.search(query, eps=0.1, k=10) # get 10 nearest features to query
for r in result:
    print(r.get_internal_index(), r.get_distance())
result, dists = graph.search(query, eps=0.1, k=10) # get 10 nearest features to query
print('best dataset index:', result[0])
best_match = dataset[result[0]]
```

For more examples see [tests](tests).
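For checking approximate results such as the `result, dists` pair returned by `graph.search`, a brute-force exact search is a handy reference. The sketch below uses only NumPy and is not part of deglib: the helpers `exact_knn` and `recall_at_k` are hypothetical, squared Euclidean distance is an assumption (substitute whatever metric the graph was built with), and the output layout (shape `(k,)` for a `(D,)` query, `(Q, k)` for a `(Q, D)` batch) is chosen to resemble the search call above rather than taken from deglib's API.

```python
import numpy as np


def exact_knn(dataset: np.ndarray, query: np.ndarray, k: int):
    """Brute-force k nearest neighbours by squared Euclidean distance.

    Hypothetical helper (not part of deglib). `query` may have shape (D,)
    or (Q, D); the returned arrays have shape (k,) or (Q, k) to match.
    """
    single = query.ndim == 1
    queries = np.atleast_2d(query).astype(np.float32)
    # Pairwise squared distances, shape (Q, N): |q|^2 - 2 q.x + |x|^2
    d2 = (
        np.sum(queries**2, axis=1, keepdims=True)
        - 2.0 * queries @ dataset.T
        + np.sum(dataset**2, axis=1)
    )
    np.maximum(d2, 0.0, out=d2)  # clamp tiny negative rounding errors
    # Select the k smallest per row without a full sort, then sort only those k
    idx = np.argpartition(d2, k - 1, axis=1)[:, :k]
    part = np.take_along_axis(d2, idx, axis=1)
    order = np.argsort(part, axis=1)
    idx = np.take_along_axis(idx, order, axis=1)
    dists = np.take_along_axis(part, order, axis=1)
    return (idx[0], dists[0]) if single else (idx, dists)


def recall_at_k(approx: np.ndarray, exact: np.ndarray) -> float:
    """Fraction of exact neighbours that also appear in the approximate result."""
    approx = np.atleast_2d(approx)
    exact = np.atleast_2d(exact)
    hits = sum(len(np.intersect1d(a, e)) for a, e in zip(approx, exact))
    return hits / exact.size


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dataset = rng.random((1000, 16), dtype=np.float32)
    query = rng.random(16, dtype=np.float32)
    idx, dists = exact_knn(dataset, query, k=10)
    print("exact best dataset index:", idx[0])
    # With a deglib graph built on `dataset` (assumed to exist):
    # result, _ = graph.search(query, eps=0.1, k=10)
    # print("recall@10:", recall_at_k(result, idx))
```

Computing `recall_at_k` for a few eps values gives a quick, measurable view of the speed/accuracy trade-off that eps controls.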

### Referencing C++ memory
Consider the following example:
```python
feature_vector = graph.get_feature_vector(42)
del graph
print(feature_vector)
```
This will crash as `feature_vector` is holding a reference to memory that is owned by `graph`. This can lead to segmentation faults.
This will crash as `feature_vector` is holding a reference to memory that is owned by `graph`. This can lead to undefined behaviour (most likely segmentation fault).
Be careful to keep referenced objects alive. If you need the data to outlive the object that owns it, use the `copy=True` option:

```python
@@ -131,11 +133,13 @@ elements from the graph, external labels and internal indices are equal.
# as long as no elements are removed
# external labels and internal indices are equal
for i, vec in enumerate(data):
    builder.add_entry(i, vec)
    builder.add_entry(i, vec)
```

### Eps
TODO
The eps search parameter controls how many nodes are checked during search.
Lower eps values, like 0.001, are faster but less accurate.
Higher eps values, like 0.1, are slower but more accurate. The value should always be greater than 0.

### Relative Neighborhood Graph / RNG-conform
TODO
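The Eps section above describes eps as a speed/accuracy knob. One common way such a parameter works in graph-based ANN search is as a relaxed stopping radius: a best-first search keeps expanding vertices while the closest unexplored candidate lies within `(1 + eps)` times the distance of the current k-th best result. The sketch below is a generic, simplified illustration of that idea and is not deglib's implementation; the adjacency-list graph, the fixed entry vertex, applying `(1 + eps)` to squared Euclidean distances, and the name `eps_search` are all assumptions.

```python
import heapq

import numpy as np


def eps_search(vectors, neighbors, query, k, eps, entry=0):
    """Best-first graph search with an eps-relaxed stopping radius.

    vectors:   (N, D) array of feature vectors
    neighbors: adjacency list, neighbors[v] = list of vertex indices
    Returns (indices, squared distances, number of visited vertices) for the
    k best vertices found, sorted by distance.
    """
    def dist(i):
        diff = vectors[i] - query
        return float(diff @ diff)

    d0 = dist(entry)
    visited = {entry}
    candidates = [(d0, entry)]  # min-heap: closest unexplored vertex first
    results = [(-d0, entry)]    # max-heap via negated distances: current k best
    while candidates:
        d, v = heapq.heappop(candidates)
        radius = -results[0][0] if len(results) == k else float("inf")
        # Stop once the closest remaining candidate lies outside (1 + eps) * radius
        if d > radius * (1.0 + eps):
            break
        for n in neighbors[v]:
            if n in visited:
                continue
            visited.add(n)
            dn = dist(n)
            if len(results) < k or dn < radius * (1.0 + eps):
                heapq.heappush(candidates, (dn, n))
                if len(results) < k:
                    heapq.heappush(results, (-dn, n))
                elif dn < -results[0][0]:
                    heapq.heapreplace(results, (-dn, n))  # evict current worst
                radius = -results[0][0] if len(results) == k else float("inf")
    best = sorted((-negd, i) for negd, i in results)
    return [i for _, i in best], [d for d, _ in best], len(visited)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.random((500, 8))
    # Hypothetical sparse graph: link each vertex to its 8 nearest neighbours
    d2 = ((data[:, None, :] - data[None, :, :]) ** 2).sum(-1)
    graph = [list(np.argsort(row)[1:9]) for row in d2]
    q = rng.random(8)
    for eps in (0.0, 0.1):
        idx, dists, visited = eps_search(data, graph, q, k=10, eps=eps)
        print(f"eps={eps}: visited {visited} vertices, best index {idx[0]}")
```

Larger eps values let more candidates pass the radius test, so the visit count reported by this sketch typically grows with eps, mirroring the slower-but-more-accurate behaviour described for deglib.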

python/src/deglib/repository.py: 2 changes (2 additions, 0 deletions)

@@ -11,6 +11,7 @@ def ivecs_read(filename: str | Path) -> np.ndarray:
d = a[0]
return a.reshape(-1, d + 1)[:, 1:].copy()


def u8vecs_read(filename: str | Path) -> np.ndarray:
"""
The loaded dataset should be in the format described here: http://corpus-texmex.irisa.fr/
@@ -19,6 +20,7 @@ def u8vecs_read(filename: str | Path) -> np.ndarray:
b = a.view(np.uint8).reshape(-1, a[0] + 4)
return b[:, 4:].copy()


def fvecs_read(filename: str | Path) -> np.ndarray:
"""
Taken from https://github.com/facebookresearch/faiss/blob/main/benchs/datasets.py#L12
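The readers in `repository.py` parse the TEXMEX vector formats (http://corpus-texmex.irisa.fr/), where each vector is stored as an int32 dimension `d` followed by `d` components: int32 for `.ivecs`, float32 for `.fvecs`, and uint8 for `.bvecs` (read here by `u8vecs_read`). The sketch below is a self-contained NumPy version written for illustration; it follows the faiss-style approach visible in the diff but is not a copy of deglib's code. It assumes little-endian files and that every vector in a file has the same dimension, and `fvecs_write` is a hypothetical helper added only to produce test data.

```python
from __future__ import annotations

from pathlib import Path

import numpy as np


def ivecs_read(filename: str | Path) -> np.ndarray:
    """Read an .ivecs file: each record is int32 d followed by d int32 values."""
    a = np.fromfile(filename, dtype="<i4")
    d = a[0]
    # Each row is [d, v_1, ..., v_d]; drop the leading dimension column
    return a.reshape(-1, d + 1)[:, 1:].copy()


def fvecs_read(filename: str | Path) -> np.ndarray:
    """Read an .fvecs file: same layout as .ivecs, but values are float32."""
    # Reinterpret the int32 payload bits as float32 (same item size, no copy)
    return ivecs_read(filename).view("<f4")


def u8vecs_read(filename: str | Path) -> np.ndarray:
    """Read a .bvecs/u8vecs file: int32 d followed by d uint8 values."""
    raw = np.fromfile(filename, dtype=np.uint8)
    d = int(raw[:4].view("<i4")[0])
    # Each record occupies 4 header bytes plus d payload bytes
    return raw.reshape(-1, d + 4)[:, 4:].copy()


def fvecs_write(filename: str | Path, vectors: np.ndarray) -> None:
    """Hypothetical helper: write float32 vectors in .fvecs layout."""
    vectors = np.ascontiguousarray(vectors, dtype="<f4")
    n, d = vectors.shape
    out = np.empty((n, d + 1), dtype="<i4")
    out[:, 0] = d                        # dimension header per record
    out[:, 1:] = vectors.view("<i4")     # float32 bits stored unchanged
    out.tofile(filename)
```

Reading whole files with `np.fromfile` and reshaping keeps the parsers vectorised; the trailing `.copy()` makes the result a compact array that does not keep the full raw buffer alive.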

readme.md: 3 changes (3 additions, 0 deletions)

@@ -2,6 +2,9 @@

The Dynamic Exploration Graph (DEG) is a graph-based algorithm for approximate nearest neighbor search (ANNS). It indexes both static and dynamic datasets using three algorithms: incremental extension, continuous edge optimization, and vertex deletion. The resulting graph demonstrates high efficiency in terms of queries per second relative to the achieved recall rate. DEG provides state-of-the-art performance for both indexed and unindexed queries (where the query is not part of the index).

## Usage
For a short introduction on how to use deglib for vector search, see our [Python Examples](python/README.md#examples).

## Release

- [2024/05/01] Our paper [An Exploration Graph with Continuous Refinement for Efficient Multimedia Retrieval](https://doi.org/10.1145/3652583.3658117) is accepted by ICMR2024 as **oral presentation**