From 80af3cfc8ad3d403150a55b7bd1e97add7f0cfa7 Mon Sep 17 00:00:00 2001
From: Michael Feil <63565275+michaelfeil@users.noreply.github.com>
Date: Mon, 23 Oct 2023 20:28:26 +0200
Subject: [PATCH 1/2] Update README.md

---
 README.md | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 7f5ae14d..b40cf7db 100644
--- a/README.md
+++ b/README.md
@@ -19,10 +19,7 @@
 ![codecov](https://codecov.io/gh/michaelfeil/infinity/branch/main/graph/badge.svg?token=NMVQY5QOFQ)
 ![CI](https://github.com/michaelfeil/infinity/actions/workflows/ci.yaml/badge.svg)
 
-Embedding Inference Server - finding TGI for embeddings. Infinity is developed under MIT Licence - https://github.com/michaelfeil/infinity
-
-
-
+Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting a wide range of sentence-transformer models and frameworks. Infinity is developed under the MIT License: https://github.com/michaelfeil/infinity
 
 ## Why Infinity:
 Infinity provides the following features:

From 00269fcc7f43aeda51a3d2d61e07abaab3eb1dfd Mon Sep 17 00:00:00 2001
From: Michael Feil <63565275+michaelfeil@users.noreply.github.com>
Date: Mon, 23 Oct 2023 20:30:09 +0200
Subject: [PATCH 2/2] Update README.md

---
 libs/infinity_emb/README.md | 62 ++++++++++++++++++++++++++++++++-----
 1 file changed, 54 insertions(+), 8 deletions(-)

diff --git a/libs/infinity_emb/README.md b/libs/infinity_emb/README.md
index 8bb36ef6..b40cf7db 100644
--- a/libs/infinity_emb/README.md
+++ b/libs/infinity_emb/README.md
@@ -19,10 +19,7 @@
 ![codecov](https://codecov.io/gh/michaelfeil/infinity/branch/main/graph/badge.svg?token=NMVQY5QOFQ)
 ![CI](https://github.com/michaelfeil/infinity/actions/workflows/ci.yaml/badge.svg)
 
-Embedding Inference Server - finding TGI for embeddings. Infinity is developed under MIT Licence - https://github.com/michaelfeil/infinity
-
-
-
+Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting a wide range of sentence-transformer models and frameworks. Infinity is developed under the MIT License: https://github.com/michaelfeil/infinity
 
 ## Why Infinity:
 Infinity provides the following features:
@@ -30,13 +27,14 @@ Infinity provides the following features:
 - **Fast inference**: The inference server is built on top of [torch](https://github.com/pytorch/pytorch) and [ctranslate2](https://github.com/OpenNMT/CTranslate2) under the hood, getting the most out of your **CUDA** or **CPU** hardware.
 - **Dynamic batching**: New embedding requests are queued while the GPU is busy with the previous ones. New requests are squeezed into your GPU/CPU as soon as it is ready.
 - **Correct and tested implementation**: Unit and end-to-end tested. Embeddings via infinity are identical to [SentenceTransformers](https://github.com/UKPLab/sentence-transformers/) (up to numerical precision). Lets API users create embeddings till infinity and beyond.
-- **Easy to use**: The API is built on top of [FastAPI](https://fastapi.tiangolo.com/), [Swagger](https://swagger.io/) makes it fully documented. API specs are aligned to OpenAI. See below on how to get started.
+- **Easy to use**: The API is built on top of [FastAPI](https://fastapi.tiangolo.com/) and fully documented via [Swagger](https://swagger.io/). The API is aligned to [OpenAI's Embedding specs](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings); see the sample request after this list and the getting-started section below.
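+
+Since the API mirrors OpenAI's embedding endpoint, a first request against a running server can be sketched as follows (a minimal sketch: the `/embeddings` route and payload shape are assumed from the OpenAI spec; verify the exact schema in the Swagger docs of your deployment):
+```bash
+curl http://localhost:8080/embeddings \
+  -X POST \
+  -H "Content-Type: application/json" \
+  -d '{"input": ["A sentence to embed till infinity and beyond."]}'
+```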
 
 # Infinity demo:
 In this gif below, we use [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2), deployed at batch-size=2. After initialization, 3 requests (payloads of 1, 1, and 5 sentences) are sent via cURL from a second terminal.
 ![](docs/demo_v0_0_1.gif)
 
 # Getting started
+
 Install via pip
 ```bash
 pip install infinity-emb[all]
 ```
@@ -68,14 +66,62 @@ infinity_emb --help
 ```
 
 ### or launch the CLI using a pre-built docker container
-Get the Python
+
 ```bash
 model=sentence-transformers/all-MiniLM-L6-v2
 port=8080
-docker run -it --gpus all -p $port:$port michaelf34/infinity:latest --model-name-or-path $model --port $port --engine ctranslate2
+docker run -it --gpus all -p $port:$port michaelf34/infinity:latest --model-name-or-path $model --port $port
 ```
 
 The download path at runtime can be controlled via the environment variable `SENTENCE_TRANSFORMERS_HOME`.
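+
+For example, to persist downloaded models across container restarts, you could mount a host directory and point `SENTENCE_TRANSFORMERS_HOME` at it (a minimal sketch; the in-container path `/cache` is an arbitrary choice, not a convention of the image):
+```bash
+# reuse the model and port variables from the docker example above
+mkdir -p ./models
+docker run -it --gpus all \
+  -v $PWD/models:/cache \
+  -e SENTENCE_TRANSFORMERS_HOME=/cache \
+  -p $port:$port michaelf34/infinity:latest \
+  --model-name-or-path $model --port $port
+```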
+### Launch FAQ:
+<details>
+  <summary>What are embedding models?</summary>
+  Embedding models map any text to a low-dimensional dense vector, which can be used for tasks like retrieval, classification, clustering, or semantic search.
+  They can also be used in vector databases for LLMs.
+
+  The best-known architectures are encoder-only transformers such as BERT; the most popular implementation is [SentenceTransformers](https://github.com/UKPLab/sentence-transformers/).
+</details>
+ +
+<details>
+  <summary>What models are supported?</summary>
+
+  All models of the [sentence-transformers organization](https://huggingface.co/sentence-transformers) (sbert.net) are supported.
+  LLMs like LLaMA2-7B are not intended for deployment with infinity.
+
+  With `--engine torch`:
+  - the model must be compatible with [SentenceTransformers](https://github.com/UKPLab/sentence-transformers/)
+  - only models from Huggingface are supported
+
+  With `--engine ctranslate2`:
+  - only `BERT` models are supported
+  - only models from Huggingface are supported
+
+  For the latest trends, you might want to check out one of the models on the [MTEB leaderboard](https://huggingface.co/spaces/mteb/leaderboard). A launch with an explicitly selected engine is sketched below.
+</details>
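+
+For example, selecting the engine explicitly when launching the container (the flags below are the ones shown above; whether a given model converts cleanly to ctranslate2 depends on its architecture):
+```bash
+model=sentence-transformers/all-MiniLM-L6-v2
+port=8080
+# all-MiniLM-L6-v2 is a BERT-family model, so the ctranslate2 engine applies
+docker run -it --gpus all -p $port:$port michaelf34/infinity:latest \
+  --model-name-or-path $model --port $port --engine ctranslate2
+```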
+ +
+<details>
+  <summary>Launching multiple models in one dockerfile</summary>
+
+  Serving multiple models on one GPU is experimental. You can use the following temporary solution:
+  ```Dockerfile
+  # Dockerfile for multiple models via multiple ports
+  FROM michaelf34/infinity:latest
+  ENTRYPOINT ["/bin/sh", "-c", \
+   "(/opt/poetry/bin/poetry run infinity_emb --port 8080 --model-name-or-path sentence-transformers/all-MiniLM-L6-v2 &);\
+   (/opt/poetry/bin/poetry run infinity_emb --port 8081 --model-name-or-path intfloat/e5-large-v2 )"]
+  ```
+
+  You can build and run it via:
+  ```bash
+  docker build -t custominfinity . && docker run -it --gpus all -p 8080:8080 -p 8081:8081 custominfinity
+  ```
+
+  Both models now run as two server instances inside one container; a quick check is sketched below.
+</details>
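+
+Once the container is up, each model answers on its own port (a minimal smoke test; the `/docs` route is the Swagger UI mentioned in the Documentation section):
+```bash
+# expect HTTP 200 from each instance
+curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080/docs
+curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8081/docs
+```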
 
 # Documentation
 After startup, the Swagger UI will be available under `{url}:{port}/docs`, in this case `http://localhost:8080/docs`.
@@ -110,4 +156,4 @@ poetry run pytest ./tests
 [license-shield]: https://img.shields.io/github/license/michaelfeil/infinity.svg?style=for-the-badge
 [license-url]: https://github.com/michaelfeil/infinity/blob/master/LICENSE.txt
 [linkedin-shield]: https://img.shields.io/badge/-LinkedIn-black.svg?style=for-the-badge&logo=linkedin&colorB=555
-[linkedin-url]: https://linkedin.com/in/michael-feil
\ No newline at end of file
+[linkedin-url]: https://linkedin.com/in/michael-feil