A proof-of-concept for helping developers use MSAL through a natural language interface.
The aim of this demo is to show how Retrieval Augmented Generation (RAG) can be used to enhance the onboarding experience by:
- Providing answers (including code samples) tailored to specific scenarios
- Linking back to relevant documentation / source code for better library discovery

This demo requires a GPU and enough RAM to load the Mixtral 8x7b Mixture-of-Experts Large Language Model (LLM). The original demo was built on an M1 Max MacBook Pro with 64GB of unified RAM, which is enough to load the 47B parameter model into memory. You may need to use a quantized version of the model or replace the local LLM with API calls to a cloud-based model if your system doesn't meet the minimum GPU / RAM requirements.
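If you take the cloud route, the swap usually comes down to changing how the LLM client is constructed. The sketch below is illustrative only and assumes a LangChain-style stack (the class names and the `gpt-4o-mini` model are assumptions, not part of this repo):

```python
# Sketch only: swap the local Ollama model for a hosted one if your hardware
# can't fit Mixtral. Class names assume a LangChain-style stack; adapt to
# whatever this repo actually uses.
from langchain_community.llms import Ollama
from langchain_openai import ChatOpenAI  # requires an OPENAI_API_KEY env var

USE_LOCAL = False

if USE_LOCAL:
    llm = Ollama(model="mixtral")           # local, needs ~26GB+ of memory
else:
    llm = ChatOpenAI(model="gpt-4o-mini")   # cloud-hosted, no local GPU needed

print(llm.invoke("What is MSAL?"))
```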
Assuming you've met the system requirements, first we'll need to create a virtual environment to isolate the Python dependencies for this project.
Install conda using this guide, then run the following command to create a new virtual environment:
conda create -n msal-chat python=3.10
Then activate the environment:
conda activate msal-chat
With our Python environment set up, install the required dependencies:
pip install -r requirements.txt
We will also need Ollama in order to run language models locally. Download Ollama, move the mounted app to the Applications folder, and start it.
Once Ollama is installed and running, run the following command in your terminal to download and start running the mixtral model.
ollama run mixtral
This will take a while depending on your internet speed as the model is about 26GB in size. Note that while we use the mixtral model in this demo, you can use any of the models that Ollama offers.
In order to reference code and samples from the MSAL.js library, the documents must be ingested as embeddings into a vector database. To ingest the MSAL JavaScript repository, run the following command:
python ingest.py
Note that the duration of this ingestion process is dependent on your system's GPU configuration. On an M1 Max chip this takes several hours to complete.
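ingest.py is the source of truth for how ingestion works in this repo; as a rough sketch, a LangChain-based pipeline for a step like this typically looks something like the following (the repository path, chunk sizes, and the Chroma persistence directory are illustrative assumptions):

```python
# Illustrative sketch of a document-ingestion pipeline, not the actual ingest.py.
# Assumes the MSAL.js repo has been cloned locally and LangChain + Chroma are installed.
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma

# Load markdown documentation from the cloned repository.
loader = DirectoryLoader("microsoft-authentication-library-for-js", glob="**/*.md",
                         loader_cls=TextLoader, silent_errors=True)
docs = loader.load()

# Split into overlapping chunks so each embedding covers a focused span of text.
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)

# Embed locally via Ollama and persist the vectors to a Chroma database on disk.
Chroma.from_documents(chunks, OllamaEmbeddings(model="mixtral"), persist_directory="db")
```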
With the MSAL.js data ingested, you can now run the chatbot with the following command:
streamlit run main.py
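main.py defines the actual UI; conceptually, a minimal Streamlit front end over the ingested vector store looks something like this (again a sketch that assumes a LangChain + Chroma stack and the "db" persistence directory from the ingestion sketch above, not a copy of main.py):

```python
# Minimal sketch of a RAG chatbot UI with Streamlit; main.py is the real implementation.
import streamlit as st
from langchain_community.llms import Ollama
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA

st.title("MSAL.js chat")

# Reuse the vector store produced by the ingestion step (assumed persist directory "db").
store = Chroma(persist_directory="db", embedding_function=OllamaEmbeddings(model="mixtral"))
chain = RetrievalQA.from_chain_type(
    llm=Ollama(model="mixtral"),
    retriever=store.as_retriever(),
    return_source_documents=True,  # lets the UI link back to docs / source files
)

if question := st.chat_input("Ask about MSAL.js"):
    with st.chat_message("user"):
        st.write(question)
    result = chain.invoke({"query": question})
    with st.chat_message("assistant"):
        st.write(result["result"])
        for doc in result["source_documents"]:
            st.caption(doc.metadata.get("source", ""))
```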