Boundless User-Focused Framework for Advanced LLM Optimization
tl;dr - Cisco Research proudly presents BUFFALO, a flexible, open-source enterprise gateway for LLMs.
Check out our TechBlog, Slide Deck, Demo Input, and Demo Output for more information and exciting applications!
Make sure to use the Table of Contents (the three horizontal lines next to `README.md`) for easier viewing!
The guiding question in our creation of BUFFALO was simple:
"Why can't an enterprise deploy GPT-4, LLAMA-2, or any other LLM for both internal and external usage?"
Our team identified several limitations of the LLM landscape which prevent Enterprises from readily deploying LLM-based solutions.
| Data Privacy | Expenses | Data Security | Hallucinations | Explainability | Competence |
| --- | --- | --- | --- | --- | --- |
| Sharing PII, divulging customer/enterprise data | Cost/power, usage limitations, redundant queries | Leaking confidential data, unauthorized access | Misleading information, improper tool usage | Robustness, removal of bias, truthfulness | Lack of reasoning, attribution, false confidence |
To start, we can clone this repository using:

```
git clone https://github.com/cisco-open/BUFFALO.git
```
Next, we must install the necessary packages for BUFFALO. For this, we have two options:

- A conda environment/`requirements.txt` is coming soon!
- In the meantime, pip install the following:
- streamlit
- annotated_text
- faiss
- langchain
- rannet
- nltk
- openie
- elasticsearch
- spacy
Elasticsearch needs to be running in the background for the Output Verification component. Please download Elasticsearch 8.9.0 from the following link: Elasticsearch Download
Next, ensure that security settings are disabled by opening `elasticsearch.yml` (in the `config` folder of the Elasticsearch installation) and setting the following:

```yaml
xpack.security.enabled: false
xpack.security.enrollment.enabled: false
```
More Info - Disabling Elasticsearch Security
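As a quick sanity check, you can confirm the security-disabled cluster is reachable using the `elasticsearch` Python client from the install list above (a minimal sketch, assuming the default port 9200):

```python
from elasticsearch import Elasticsearch

# Connect to the local, security-disabled cluster (default port 9200 assumed).
es = Elasticsearch("http://localhost:9200")

# Prints cluster name and version info if Elasticsearch is up and reachable.
print(es.info())
```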
Java needs to be installed to run OpenIE!
Stanford CoreNLP also needs to be installed to run the Output Verification component. The ZIP archive can be downloaded from the Stanford OpenIE Repo.
Both the RANNET base English model and the model-store need to be downloaded. These can be found on the RANNET GitHub page.
Having completed installation, launching BUFFALO is as easy as 1, 2, 3.
You can launch Elasticsearch by navigating to the downloaded, unzipped directory and running the corresponding command for your platform (the batch script below on Windows; `./bin/elasticsearch` on Linux/macOS).

More Info - Launching ElasticSearch

```
.\bin\elasticsearch.bat
```
Next, navigate to the gateway folder and start the backend:

```
cd gateway
python app.py
```
Wait for a few seconds to ensure the server is up and running.
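If you prefer to verify programmatically rather than just waiting, a quick request to the backend works (a sketch; the port is an assumption, since Flask-style servers commonly default to 5000, so check `app.py` for the actual host/port):

```python
import requests

# Hypothetical health check; adjust the URL to match the host/port in app.py.
try:
    resp = requests.get("http://localhost:5000/", timeout=5)
    print("Backend is up, status:", resp.status_code)
except requests.exceptions.ConnectionError:
    print("Backend not reachable yet; give it a few more seconds.")
```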
Next, launch Streamlit by going back to the top-level directory:

```
cd ..
streamlit run main.py
```
And voilà! A link should pop up taking us to our dashboard gateway.
Let's take a quick look at how our BUFFALO gateway works under the hood.
The backend (gateway) is a REST API server, which we started by running `python app.py`.
This server contains ALL of the actual gateway components/computations, allowing the Streamlit frontend to be as light as possible. All Streamlit does is call the REST API server with the corresponding GET/POST requests and display the output. All REST API endpoints can be seen in `app.py`, and they are summarized here:
| Category | Endpoints |
| --- | --- |
| General | |
| Input Layer | |
| Output Layer | |
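To make that pattern concrete, here is the shape of one such frontend-to-backend call (a sketch only; the route, port, and payload are assumptions rather than BUFFALO's documented API, so see `app.py` for the real endpoints):

```python
import requests

# Hypothetical route and payload, shown only to illustrate the pattern:
# the Streamlit frontend POSTs user input and displays what the gateway returns.
resp = requests.post(
    "http://localhost:5000/process_prompt",  # replace with a real route from app.py
    json={"prompt": "What is our refund policy?"},
)
print(resp.json())
```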
New endpoints can be added in a similar format in `app.py`; see the sketch below. These endpoints simply call the corresponding functions from each of the classes. Four files house all of the classes/methods for the four components: `prompt.py` (QueryProcessor), `cache.py` (QACache), `leaks.py` (ExfiltrationModel), and `verify.py` (VerificationModel).
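As a rough illustration of that wiring (a sketch assuming a Flask-style `app.py`; the route and method names here are hypothetical, not BUFFALO's actual API):

```python
from flask import Flask, jsonify, request

from prompt import QueryProcessor  # one of the four component classes

app = Flask(__name__)
processor = QueryProcessor()

# A new endpoint is a thin wrapper that forwards the request to the
# corresponding component class and returns its result as JSON.
@app.route("/process_prompt", methods=["POST"])
def process_prompt():
    prompt = request.get_json().get("prompt", "")
    result = processor.process(prompt)  # hypothetical method name
    return jsonify({"result": result})
```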
And in Streamlit, our frontend is split into three parts: Col1, Col2A, and Col2B.
- Col1 is on the left-hand side; this is where the user enters a prompt and can see which documents are in the knowledge base.
- Col2A is on the right-hand side and is displayed first; it shows the Prompt and Cache components.
- Col2B is shown after "Submit to LLM" is selected in Col2A; it shows the Exfiltration and Verification components.
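This layout maps naturally onto Streamlit's column API (an illustrative sketch, not the actual `main.py`):

```python
import streamlit as st

# Two top-level columns; Col2A and Col2B render sequentially on the right.
col1, col2 = st.columns(2)

with col1:  # "Col1": prompt entry and knowledge-base contents
    prompt = st.text_area("Enter your prompt")

with col2:  # "Col2A": Prompt and Cache views, shown first
    st.subheader("Prompt & Cache")
    if st.button("Submit to LLM"):
        # "Col2B": Exfiltration and Verification, shown after submission
        st.subheader("Exfiltration & Verification")
```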
We have several TODOs spread throughout our files! We will continue working in these directions to improve BUFFALO, adjusting as needed based on our users' feedback.
Currently, we are working on implementing the following features, split by component of focus:
| Component | Planned Features |
| --- | --- |
| Overall | |
| Model-Specific | |
| Prompt Layer | |
| LLM Cache | |
| Exfiltration | |
| Verification | |
| Miscellaneous | |
If you wish to contribute or suggest any additional functionalities, please check out our Contributing Guidelines.
BUFFALO would not have been possible without the contributions of the following individuals, to whom we express our gratitude:
- Advit Deepak, Will Healy, Raunak Sinha, Tarun Raheja, Jayanth Srinivasa, Charles Fleming, Goli Vamsi Krishna Mohan, Ramana Kompella, Vijoy Pandey
- The entirety of the Cisco Research team for their feedback, ideas, and support 🥳
Thank you so much for checking us out! 🐃