Boundless User-Focused Framework for Advanced LLM Optimization
tl;dr - Cisco Research proudly presents BUFFALO, a flexible, open-source enterprise gateway for LLMs.
Check out our TechBlog, Slide Deck, Demo Input, and Demo Output for more information and exciting applications!
Make sure to use the Table of Contents (the three horizontal lines next to `README.md`) for easier viewing!
The guiding question in our creation of BUFFALO was simple:
"Why can't an enterprise deploy GPT-4, LLAMA-2, or any other LLM for both internal and external usage?"
Our team identified several limitations of the LLM landscape which prevent Enterprises from readily deploying LLM-based solutions.
| Data Privacy | Expenses | Data Security | Hallucinations | Explainability | Competence |
| --- | --- | --- | --- | --- | --- |
| Sharing PII, divulging customer/enterprise data | Cost/power, usage limitations, redundant queries | Leaking confidential data, unauthorized access | Misleading information, improper tool usage | Robustness, removal of bias, truthfulness | Lack of reasoning, attribution, false confidence |
To start, we can clone this repository using:

```
git clone https://github.com/cisco-open/BUFFALO.git
```
Next, we must install the necessary packages for BUFFALO. For this, we have two options:

- A conda environment/`requirements.txt` is coming soon!
- In the meantime, pip install the following:
- streamlit
- annotated_text
- faiss
- langchain
- rannet
- nltk
- openie
- elasticsearch
- spacy
Elasticsearch needs to be running in the background for the Output Verification component. Please download Elasticsearch 8.9.0 from the following link: Elasticsearch Download
Next, ensure that security settings are disabled by opening `elasticsearch.yml` (in the `config` folder of the Elasticsearch installation) and setting the following:

```yaml
xpack.security.enabled: false
xpack.security.enrollment.enabled: false
```
More Info - Disabling Elasticsearch Security
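As a quick sanity check, you can confirm the security-disabled cluster is reachable using the `elasticsearch` Python client from the install list above (a minimal sketch, assuming the default port 9200):

```python
from elasticsearch import Elasticsearch

# Connect to the local, security-disabled cluster (default port 9200 assumed).
es = Elasticsearch("http://localhost:9200")

# Prints cluster name and version info if Elasticsearch is up and reachable.
print(es.info())
```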
Java needs to be installed to run OpenIE!
Stanford CoreNLP also needs to be installed to run the Output Verification component. The ZIP archive can be downloaded from the Stanford OpenIE Repo.
Both the RANNET base English model and the model-store need to be downloaded. These can be found on the RANNET GitHub page.
Having completed installation, launching BUFFALO is as easy as 1, 2, 3.
You can launch Elasticsearch by navigating to the downloaded, unzipped directory and running the corresponding command for your platform (the batch script below on Windows; `./bin/elasticsearch` on Linux/macOS).

More Info - Launching ElasticSearch

```
.\bin\elasticsearch.bat
```
Next, navigate to the gateway folder and start the backend:

```
cd gateway
python app.py
```
Wait for a few seconds to ensure the server is up and running.
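If you prefer to verify programmatically rather than just waiting, a quick request to the backend works (a sketch; the port is an assumption, since Flask-style servers commonly default to 5000, so check `app.py` for the actual host/port):

```python
import requests

# Hypothetical health check; adjust the URL to match the host/port in app.py.
try:
    resp = requests.get("http://localhost:5000/", timeout=5)
    print("Backend is up, status:", resp.status_code)
except requests.exceptions.ConnectionError:
    print("Backend not reachable yet; give it a few more seconds.")
```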
Next, launch Streamlit by going back to the top-level directory:

```
cd ..
streamlit run main.py
```
And voilà! A link should pop up taking us to our dashboard gateway.
Let's take a quick look at how our BUFFALO gateway works under the hood.
The backend (gateway) is a REST API server, which we started by running `python app.py`.
This server contains ALL of the actual gateway components/computations, allowing the Streamlit frontend to be as light as possible. All Streamlit does is call the REST API server with the corresponding GET/POST requests and display the output. All REST API endpoints can be seen in `app.py`, and they are summarized here:
| Category | Endpoints |
| --- | --- |
| General | |
| Input Layer | |
| Output Layer | |
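To make that pattern concrete, here is the shape of one such frontend-to-backend call (a sketch only; the route, port, and payload are assumptions rather than BUFFALO's documented API, so see `app.py` for the real endpoints):

```python
import requests

# Hypothetical route and payload, shown only to illustrate the pattern:
# the Streamlit frontend POSTs user input and displays what the gateway returns.
resp = requests.post(
    "http://localhost:5000/process_prompt",  # replace with a real route from app.py
    json={"prompt": "What is our refund policy?"},
)
print(resp.json())
```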
New endpoints can be added in a similar format in `app.py`; see the sketch below. These endpoints simply call the corresponding functions from each of the classes. Four files house all of the classes/methods for the four components: `prompt.py` (QueryProcessor), `cache.py` (QACache), `leaks.py` (ExfiltrationModel), and `verify.py` (VerificationModel).
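As a rough illustration of that wiring (a sketch assuming a Flask-style `app.py`; the route and method names here are hypothetical, not BUFFALO's actual API):

```python
from flask import Flask, jsonify, request

from prompt import QueryProcessor  # one of the four component classes

app = Flask(__name__)
processor = QueryProcessor()

# A new endpoint is a thin wrapper that forwards the request to the
# corresponding component class and returns its result as JSON.
@app.route("/process_prompt", methods=["POST"])
def process_prompt():
    prompt = request.get_json().get("prompt", "")
    result = processor.process(prompt)  # hypothetical method name
    return jsonify({"result": result})
```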
And in Streamlit, our frontend is split into three parts: Col1, Col2A, and Col2B.
- Col1 is on the left-hand side; this is where the user enters a prompt and can see which documents are in the knowledge base.
- Col2A is on the right-hand side and is displayed first; it shows the Prompt and Cache components.
- Col2B is shown after "Submit to LLM" is selected in Col2A; it shows the Exfiltration and Verification components.
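This layout maps naturally onto Streamlit's column API (an illustrative sketch, not the actual `main.py`):

```python
import streamlit as st

# Two top-level columns; Col2A and Col2B render sequentially on the right.
col1, col2 = st.columns(2)

with col1:  # "Col1": prompt entry and knowledge-base contents
    prompt = st.text_area("Enter your prompt")

with col2:  # "Col2A": Prompt and Cache views, shown first
    st.subheader("Prompt & Cache")
    if st.button("Submit to LLM"):
        # "Col2B": Exfiltration and Verification, shown after submission
        st.subheader("Exfiltration & Verification")
```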
We have several TODOs spread throughout our files! We will continue working in these directions to improve BUFFALO, adjusting as needed based on our users' feedback.
Currently, we are working on implementing the following features, split by component of focus:
| Component | Planned Features |
| --- | --- |
| Overall | |
| Model-Specific | |
| Prompt Layer | |
| LLM Cache | |
| Exfiltration | |
| Verification | |
| Miscellaneous | |
If you wish to contribute or suggest any additional functionalities, please check out our Contributing Guidelines.
BUFFALO would not have been possible without the contributions of the following individuals, to whom we express our gratitude:
- Advit Deepak, Will Healy, Raunak Sinha, Tarun Raheja, Jayanth Srinivasa, Charles Fleming, Goli Vamsi Krishna Mohan, Ramana Kompella, Vijoy Pandey
- The entirety of the Cisco Research team for their feedback, ideas, and support 🥳
Thank you so much for checking us out! 🐃