update before release
brj0 committed May 29, 2023
1 parent dd2c8e8 commit ae4aad6
Showing 34 changed files with 2,208 additions and 2,683 deletions.
169 changes: 7 additions & 162 deletions .gitignore
@@ -1,175 +1,20 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# vim swap files
*.swp
# C++ files
*.o
*.d

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

# ctags
tags

data
examples
umap
nnd
# Debug files
Makefile
trash.*
callgrind.out.*
gmon.out

# cpp files
*.o
*.d

# ignore this file
.gitignore

#
trash.*
nnd
20 changes: 7 additions & 13 deletions CMakeLists.txt
@@ -13,26 +13,20 @@ set(CMAKE_CXX_FLAGS
file(GLOB_RECURSE SRC_FILES src/*.cpp)

# Library
add_library(nndlib STATIC ${SRC_FILES})
add_library(nndescent STATIC ${SRC_FILES})

# Source code
target_include_directories(nndlib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/src)
target_include_directories(nndescent PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/src)

# Tests
add_executable(test0 tests/test0.cpp)
target_link_libraries(test0 PRIVATE nndlib)

add_executable(test_distances tests/test_distances.cpp)
target_link_libraries(test_distances PRIVATE nndlib)
target_link_libraries(test_distances PRIVATE nndescent)

add_executable(simple tests/simple.cpp)
target_link_libraries(simple PRIVATE nndlib)

add_executable(coil20 tests/coil20.cpp)
target_link_libraries(coil20 PRIVATE nndlib)
target_link_libraries(simple PRIVATE nndescent)

add_executable(mnist tests/mnist.cpp)
target_link_libraries(mnist PRIVATE nndlib)
add_executable(faces tests/faces.cpp)
target_link_libraries(faces PRIVATE nndescent)

add_executable(fmnist tests/fmnist.cpp)
target_link_libraries(fmnist PRIVATE nndlib)
target_link_libraries(fmnist PRIVATE nndescent)
2 changes: 1 addition & 1 deletion Makefile
@@ -1,6 +1,6 @@
appname := nnd

sources := src/dtypes.cpp src/rp_trees.cpp src/nnd.cpp tests/test0.cpp src/utils.cpp
sources := src/dtypes.cpp src/rp_trees.cpp src/nnd.cpp tests/trash.test0.cpp src/utils.cpp
objects := $(patsubst %.cpp,%.o,$(sources))
depends := $(patsubst %.cpp,%.d,$(sources))

66 changes: 66 additions & 0 deletions README.md
@@ -0,0 +1,66 @@
# Nearest Neighbor Descent (nndescent)

Nearest Neighbor Descent (nndescent) is a C++ implementation of the pynndescent library, originally written by Leland McInnes, which performs approximate nearest neighbor search. The goal of this algorithm is to construct a k-nearest neighbor graph quickly and accurately.
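
To make the target concrete, here is a minimal sketch of what a k-nearest neighbor graph is, computed exactly by brute force in plain Python. It is not part of nndescent and does not use its API; the function name `knn_graph_brute_force` and the sample points are made up for illustration. Approximate methods aim to get close to this result without computing all pairwise distances.

```python
import math


def knn_graph_brute_force(points, k):
    """Return, for each point, the indices of its k nearest other points.

    Hypothetical helper for illustration only; cost is quadratic in the
    number of points, which is what approximate methods try to avoid.
    """
    graph = []
    for i, p in enumerate(points):
        # Euclidean distance from point i to every other point.
        dists = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        dists.sort()
        graph.append([j for _, j in dists[:k]])
    return graph


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
graph = knn_graph_brute_force(points, k=2)
print(graph)  # [[1, 2], [0, 2], [0, 1], [2, 1]]
```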

## Background

The theoretical background of NND is based on the following paper:
- Dong, Wei, Moses Charikar, and Kai Li. "Efficient k-nearest neighbor graph construction for generic similarity measures." Proceedings of the 20th International Conference on World Wide Web. 2011.

In addition, the algorithm utilizes random projection trees for initializing the nearest neighbor graph, based on the following paper:
- Dasgupta, Sanjoy, and Yoav Freund. "Random projection trees and low dimensional manifolds." Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing. 2008.
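
The core idea of nearest neighbor descent ("a neighbor of a neighbor is likely to be a neighbor") can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the code in this repository: it uses a random initial graph instead of random projection trees, Euclidean distance only, and omits the sampling and new/old flags of the paper. All names (`nn_descent`, `try_insert`, `max_iters`) are hypothetical.

```python
import math
import random


def nn_descent(points, k, max_iters=10, seed=0):
    """Simplified NN-descent; requires k <= len(points) - 1."""
    rng = random.Random(seed)
    n = len(points)

    def dist(a, b):
        return math.dist(points[a], points[b])

    # Assumption: random initialization stands in for RP-tree initialization.
    graph = []
    for i in range(n):
        others = rng.sample([j for j in range(n) if j != i], k)
        graph.append(sorted((dist(i, j), j) for j in others))

    def try_insert(i, j):
        """Replace i's worst neighbor with j if j is strictly closer."""
        if i == j or any(idx == j for _, idx in graph[i]):
            return False
        d = dist(i, j)
        if d >= graph[i][-1][0]:
            return False
        graph[i][-1] = (d, j)
        graph[i].sort()
        return True

    for _ in range(max_iters):
        # Collect forward and reverse neighbors of every point.
        forward = [[j for _, j in nbrs] for nbrs in graph]
        general = [set(f) for f in forward]
        for i, f in enumerate(forward):
            for j in f:
                general[j].add(i)

        # Local join: two points that share a neighbor are candidate neighbors.
        updates = 0
        for cand in general:
            cand = sorted(cand)
            for pos, a in enumerate(cand):
                for b in cand[pos + 1:]:
                    updates += try_insert(a, b)
                    updates += try_insert(b, a)
        if updates == 0:  # converged: no neighbor list changed
            break

    return [[j for _, j in nbrs] for nbrs in graph]


data_rng = random.Random(42)
points = [(data_rng.random(), data_rng.random()) for _ in range(60)]
graph = nn_descent(points, k=5)
```

Each successful update strictly shortens a neighbor list's worst distance, so the loop stops once a full pass changes nothing or `max_iters` is reached.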

## Features

- C++ implementation utilizing OpenMP for efficient computation
- Support for dense matrices
- Implementation of a subset of distance functions

## Installation

1. Clone the repository:

```sh
git clone https://github.com/brj0/nndescent.git
cd nndescent
```

2. Build and install the project:

```sh
pip install .
```

3. Run the examples in `tests`. To build the datasets, first run `make_test_data.py`.

## Performance

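On my computer, the training phase of nndescent takes roughly 1-10% less time than pynndescent, and the search query phase takes about 75% less time on fmnist and mnist (about 99% less on faces). The ratio columns give the nndescent time divided by the pynndescent time, so values below 1 favor nndescent. Below is the output obtained from running `tests/benchmark.py`: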

### Benchmark test pynndescent vs nndescent
Data set | py train [ms] | c train [ms] | ratio | py vs c match | py test [ms] | c test [ms] | ratio | py accuracy | c accuracy
----------|---------------|--------------|-------|---------------|--------------|-------------|-------|-------------|-----------
faces | 191.8 | 190.0 | 0.991 | 1.000 | 1631.6 | 20.5 | 0.013 | 1.000 | 0.999
fmnist | 13587.5 | 12935.1 | 0.952 | 0.997 | 6751.2 | 1757.2 | 0.260 | 0.978 | 0.978
mnist | 14187.2 | 12712.9 | 0.896 | 0.997 | 6664.2 | 1665.1 | 0.250 | 0.969 | 0.968

These timings exclude the compilation time and the long numba loading time that pynndescent incurs during import in Python; nndescent requires neither.
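
As a reading aid, the ratio columns can be reproduced from the timings in the table as the nndescent ("c") time divided by the pynndescent ("py") time. The snippet below is only a sketch of that arithmetic with the numbers copied from the table; the dictionary layout and key names are made up for illustration and are not the output format of `tests/benchmark.py`.

```python
# Timings in milliseconds, copied from the benchmark table.
timings = {
    "faces": {"py_train": 191.8, "c_train": 190.0, "py_test": 1631.6, "c_test": 20.5},
    "fmnist": {"py_train": 13587.5, "c_train": 12935.1, "py_test": 6751.2, "c_test": 1757.2},
    "mnist": {"py_train": 14187.2, "c_train": 12712.9, "py_test": 6664.2, "c_test": 1665.1},
}

ratios = {}
for name, t in timings.items():
    # ratio < 1 means nndescent needed less time than pynndescent.
    train = t["c_train"] / t["py_train"]
    test = t["c_test"] / t["py_test"]
    ratios[name] = {"train": train, "test": test}
    print(
        f"{name}: train ratio {train:.3f} ({1 - train:.1%} less time), "
        f"test ratio {test:.3f} ({1 - test:.1%} less time)"
    )
```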

## Usage

Please refer to the examples provided in the repository for instructions on how to use the NND library in your projects.

## Contributing

Contributions are welcome! If you have any bug reports, feature requests, or suggestions, please open an issue or submit a pull request.

## License

This project is licensed under the [BSD-2-Clause license](LICENSE).

## Acknowledgements

This implementation is based on the original pynndescent library by Leland McInnes. I would like to express my gratitude for his work.

For more information, visit the [pynndescent GitHub repository](https://github.com/lmcinnes/pynndescent).
