Research Assistant Mini App with a Streamlit user interface.
- Passage Search (Done)
- Long Form QA (Done)
- Document Network (Not yet implemented)
- Document Search (Not yet implemented)
Local setup:
- Install the NVIDIA GPU driver and CUDA toolkit, then ensure the system can use GPU CUDA cores, e.g. via Anaconda (see the verification sketch after this list).
- Install Python 3.9.x, JDK 8, wkhtmltopdf, and Apache Tika (via Chocolatey or apt).
- Run `git clone https://github.com/muazhari/research-assistant-mini.git`.
- Go to the `research-assistant-mini` directory.
- Run `pip install -r requirements.txt && pip install farm-haystack[only-faiss,only-faiss-gpu,crawler,preprocessing,ocr] txtai[pipeline]`.
- Get your OpenAI API key.
- Run `python run.py --server.address 0.0.0.0 --server.port 8501` (run as Administrator if on Windows).
- Open `http://localhost:8501` in a browser.
- Use the app.
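A minimal sketch to verify that GPU CUDA cores are usable from Python (assuming PyTorch, which the Haystack and txtai dependencies pull in, is installed):

```python
# Minimal sketch: confirm that CUDA is visible from Python before running the app.
import torch

print(torch.cuda.is_available())           # should print True on a working setup
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))   # e.g. the installed NVIDIA GPU's name
```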
Cloud notebook setup (e.g., Kaggle):
- Get your OpenAI API key.
- Get your ngrok authentication token.
- Create a cell from the Jupyter Notebook script below in Kaggle or another notebook platform:
```
#@title Research Assistant Mini App
NGROK_TOKEN = "" #@param {type:"string"}
# Python version upgrade script. Use this if the python version is not equal to 3.9.x.
!conda create -n newCondaEnvironment python=3.9 -c cctbx202208 -y
!source /opt/conda/bin/activate newCondaEnvironment && conda install -c cctbx202208 -y
!/opt/conda/envs/newCondaEnvironment/bin/python3 --version
!echo 'print("Hello, World!")' > test.py
!/opt/conda/envs/newCondaEnvironment/bin/python3 test.py
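# Replace the default python, ngrok, and streamlit binaries with symlinks into the new 3.9 environment.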
!sudo rm /opt/conda/bin/python3
!sudo ln -sf /opt/conda/envs/newCondaEnvironment/bin/python3 /opt/conda/bin/python3
!sudo rm /opt/conda/bin/python3.7
!sudo ln -sf /opt/conda/envs/newCondaEnvironment/bin/python3 /opt/conda/bin/python3.7
!sudo rm /opt/conda/bin/python
!sudo ln -sf /opt/conda/envs/newCondaEnvironment/bin/python3 /opt/conda/bin/python
!sudo rm /opt/conda/bin/ngrok
!sudo ln -sf /opt/conda/envs/newCondaEnvironment/bin/ngrok /opt/conda/bin/ngrok
!sudo rm /opt/conda/bin/streamlit
!sudo ln -sf /opt/conda/envs/newCondaEnvironment/bin/streamlit /opt/conda/bin/streamlit
!python --version
# Installation script.
%cd ~
!git clone https://github.com/muazhari/research-assistant-mini.git
%cd ~/research-assistant-mini/
!git fetch --all
!git reset --hard origin
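# Install system packages for document conversion and OCR (wkhtmltopdf, xvfb, poppler-utils),
# numerics (OpenBLAS, OpenMP), Apache Tika (OpenJDK 8), and jq for parsing the ngrok API response.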
!apt-get update -y
!yes | DEBIAN_FRONTEND=noninteractive apt-get install -yqq wkhtmltopdf xvfb libopenblas-dev libomp-dev poppler-utils openjdk-8-jdk jq
!pip install -r requirements.txt
!pip install pyngrok farm-haystack[only-faiss,only-faiss-gpu,crawler,preprocessing,ocr] txtai[pipeline] sqlalchemy==1.4.47
!nvidia-smi
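# Authenticate ngrok and open an HTTP tunnel to Streamlit's port (8501) in the background.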
get_ipython().system_raw(f'ngrok authtoken {NGROK_TOKEN}')
get_ipython().system_raw('ngrok http 8501 &')
print("Open public URL:")
!curl -s http://localhost:4040/api/tunnels | jq ".tunnels[0].public_url"
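# Launch the Streamlit app; the trailing sleep keeps the notebook cell running.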
!streamlit run ~/research-assistant-mini/app.py
!sleep 10000000
```
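As a rough alternative to the raw `ngrok` binary calls in the cell above, the tunnel can also be opened with `pyngrok` (installed by the cell); a minimal sketch, assuming the default Streamlit port 8501 and the `NGROK_TOKEN` value from the cell form:

```python
# Minimal sketch: open the ngrok tunnel with pyngrok instead of the ngrok binary.
from pyngrok import ngrok

ngrok.set_auth_token(NGROK_TOKEN)      # same token as the NGROK_TOKEN cell parameter
tunnel = ngrok.connect(8501, "http")   # expose the Streamlit port
print("Open public URL:", tunnel.public_url)
```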
- Submit your ngrok authentication token to the `NGROK_TOKEN` field in the cell form.
- Enable GPU in the notebook.
- Run the cell.
- Wait until the setup is done.
- Open the ngrok public URL printed by the cell.
- Use the app.
- If you want to process text in another language, you can use a language-specific model or a multilingual model. For example, use the settings below to process Indonesian text (a configuration sketch follows this list):
    - Retriever DPR Query: `voidful/dpr-question_encoder-bert-base-multilingual`
    - Retriever DPR Passage: `voidful/dpr-ctx_encoder-bert-base-multilingual`
    - Retriever Embedding Dimension: `768`
    - Reranker: `cross-encoder/mmarco-mMiniLMv2-L12-H384-v1`
    - Prompt (an Indonesian instruction that tells the model to synthesize a comprehensive, long-form answer from the most relevant paragraphs and the given question, or to say they are not relevant and explain why):

          Sintesiskan jawaban komprehensif dari paragraf-paragraf berikut yang paling relevan dan pertanyaan yang diberikan. Berikan jawaban panjang yang diuraikan dari poin-poin utama dan informasi dalam paragraf-paragraf. Katakan tidak relevan jika paragraf-paragraf tidak relevan dengan pertanyaan, lalu jelaskan mengapa itu tidak relevan.
          Paragraf-paragraf: {join(documents)}
          Pertanyaan: {query}
          Jawaban:
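For illustration, here is a minimal sketch (an assumption, not the app's actual wiring) of how the settings above map onto farm-haystack 1.x components; class and parameter names may differ across versions.

```python
# Minimal sketch: the multilingual settings above expressed as Haystack 1.x components.
# Illustrative only; the app itself takes these values through its Streamlit settings.
from haystack.document_stores import FAISSDocumentStore
from haystack.nodes import DensePassageRetriever, SentenceTransformersRanker

document_store = FAISSDocumentStore(embedding_dim=768)  # Retriever Embedding Dimension
retriever = DensePassageRetriever(
    document_store=document_store,
    query_embedding_model="voidful/dpr-question_encoder-bert-base-multilingual",
    passage_embedding_model="voidful/dpr-ctx_encoder-bert-base-multilingual",
)
ranker = SentenceTransformersRanker(
    model_name_or_path="cross-encoder/mmarco-mMiniLMv2-L12-H384-v1"
)
# The Prompt value is used as a template; {join(documents)} and {query} are filled in at query time.
```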
- Delete the entire contents of the `document_store` folder if there is an error. Delete only the contents, not the folder itself (see the sketch after this list).
- This repository has not been peer reviewed yet, so be careful when using it.
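A minimal sketch for clearing the folder, assuming `document_store` sits in the repository root as referenced in the tip above:

```python
# Minimal sketch: remove everything inside document_store/ but keep the folder itself.
import pathlib
import shutil

store = pathlib.Path("document_store")  # assumed relative to the repository root
for item in store.iterdir():
    if item.is_dir():
        shutil.rmtree(item)
    else:
        item.unlink()
```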