Synchronize Pathway-labs/realtime-indexer-qa-chat (#5904)

Pathway-Dev · pathway-release-manul · berkecanrizai · Manul from Pathway · commit cc9cb357327e · 2024-03-12T10:07:27.000Z
* Initial commit ORIGINAL_AUTHOR=pathway-release-manul <157591932+pathway-release-manul@users.noreply.github.com> GitOrigin-RevId: 6eb879d
* add: readme ORIGINAL_AUTHOR=Berke <berkecanrizai1@gmail.com> GitOrigin-RevId: b8f6565
* add: app code ORIGINAL_AUTHOR=Berke <berkecanrizai1@gmail.com> GitOrigin-RevId: d0e5f99
* add: requirements txt ORIGINAL_AUTHOR=Berke <berkecanrizai1@gmail.com> GitOrigin-RevId: 4198d09
* add python-dotenv to requirements ORIGINAL_AUTHOR=pathway-release-manul <157591932+pathway-release-manul@users.noreply.github.com> GitOrigin-RevId: ec994e8
* correct rag backend host and port ORIGINAL_AUTHOR=pathway-release-manul <157591932+pathway-release-manul@users.noreply.github.com> GitOrigin-RevId: 6add2a0
* specify extra-index-url for pathway package ORIGINAL_AUTHOR=pathway-release-manul <157591932+pathway-release-manul@users.noreply.github.com> GitOrigin-RevId: 276f18d
* rely on stable version of pathway ORIGINAL_AUTHOR=pathway-release-manul <157591932+pathway-release-manul@users.noreply.github.com> GitOrigin-RevId: 5babcd1
* fix: restrict responses to context ORIGINAL_AUTHOR=Berke <berkecanrizai1@gmail.com> GitOrigin-RevId: 08e7f3f
* fix: lint ORIGINAL_AUTHOR=Berke <berkecanrizai1@gmail.com> GitOrigin-RevId: f147cb0
* add: config ORIGINAL_AUTHOR=Berke <berkecanrizai1@gmail.com> GitOrigin-RevId: 08296c6
* add: image files ORIGINAL_AUTHOR=Berke <berkecanrizai1@gmail.com> GitOrigin-RevId: f6a7444
* fix: ui, init message, fixes, refactor ORIGINAL_AUTHOR=Berke <berkecanrizai1@gmail.com> GitOrigin-RevId: 48e056d
* fix: rm comments ORIGINAL_AUTHOR=Berke <berkecanrizai1@gmail.com> GitOrigin-RevId: 094dc41
* fix: engine ORIGINAL_AUTHOR=Berke <berkecanrizai1@gmail.com> GitOrigin-RevId: ee4acbf
* fix: change folder of config ORIGINAL_AUTHOR=Berke <berkecanrizai1@gmail.com> GitOrigin-RevId: 09f31b4
* fix: clear chat memory in state ORIGINAL_AUTHOR=Berke <berkecanrizai1@gmail.com> GitOrigin-RevId: b26abd3
* fix: init ORIGINAL_AUTHOR=Berke <berkecanrizai1@gmail.com> GitOrigin-RevId: c97d741
* fix: theme ORIGINAL_AUTHOR=Berke <berkecanrizai1@gmail.com> GitOrigin-RevId: 4b80045
* minor visual improvements ORIGINAL_AUTHOR=Sergey <sergey@pathway.com> GitOrigin-RevId: 9039ecb
* feat: ui revisions, metadata ORIGINAL_AUTHOR=Berke <berkecanrizai1@gmail.com> GitOrigin-RevId: 316bfbd
* fix: lint ORIGINAL_AUTHOR=Berke <berkecanrizai1@gmail.com> GitOrigin-RevId: dff6598
* provide more information in status string ORIGINAL_AUTHOR=Sergey <sergey@pathway.com> GitOrigin-RevId: 4ebca47
* correct link urls in the sidebar ORIGINAL_AUTHOR=Sergey <sergey@pathway.com> GitOrigin-RevId: c46adc3
* Added Dev Container Folder ORIGINAL_AUTHOR=Sergey Kulik <104143901+zxqfd555-pw@users.noreply.github.com> GitOrigin-RevId: b213d5c
* pin pathway version until the api is updated ORIGINAL_AUTHOR=Sergey <sergey@pathway.com> GitOrigin-RevId: ee07ce6
* add public folders remark ORIGINAL_AUTHOR=Sergey <sergey@pathway.com> GitOrigin-RevId: 95d08b1
* unpin pathway version ORIGINAL_AUTHOR=Sergey <sergey@pathway.com> GitOrigin-RevId: e87e987
* add a section for last indexed file ORIGINAL_AUTHOR=Sergey <sergey@pathway.com> GitOrigin-RevId: 79804b2
* feat: refresh button ORIGINAL_AUTHOR=Berke <berkecanrizai1@gmail.com> GitOrigin-RevId: 4fe4e45
* feat: gather endpoint calls in async ORIGINAL_AUTHOR=Berke <berkecanrizai1@gmail.com> GitOrigin-RevId: e29508a
* fix: small clean ORIGINAL_AUTHOR=Berke <berkecanrizai1@gmail.com> GitOrigin-RevId: e60fb6e
* fix: lint ORIGINAL_AUTHOR=Berke <berkecanrizai1@gmail.com> GitOrigin-RevId: 5a0a4fa
* feat: add sources ORIGINAL_AUTHOR=Berke <berkecanrizai1@gmail.com> GitOrigin-RevId: 6797645
* fix: single call to endpoint instead of 2 ORIGINAL_AUTHOR=Berke <berkecanrizai1@gmail.com> GitOrigin-RevId: 1a44303
* feat: get_inputs modified ORIGINAL_AUTHOR=Berke <berkecanrizai1@gmail.com> GitOrigin-RevId: 4883491
* fix: err handle in case `path` is missing ORIGINAL_AUTHOR=Berke <berkecanrizai1@gmail.com> GitOrigin-RevId: 4af1e0e
* fix: issue with retrievals from gdrive ORIGINAL_AUTHOR=Berke <berkecanrizai1@gmail.com> GitOrigin-RevId: 1ef1eea
* feat: change default prompts ORIGINAL_AUTHOR=Berke <berkecanrizai1@gmail.com> GitOrigin-RevId: 95d40b3
* fix: list unique files as sources ORIGINAL_AUTHOR=Berke <berkecanrizai1@gmail.com> GitOrigin-RevId: 7c0c041
* fix: title font and texts ORIGINAL_AUTHOR=Berke <berkecanrizai1@gmail.com> GitOrigin-RevId: 305febb
* feat: change title and icon ORIGINAL_AUTHOR=Berke <berkecanrizai1@gmail.com> GitOrigin-RevId: a32548d
* fix: update reqs ORIGINAL_AUTHOR=Berke <berkecanrizai1@gmail.com> GitOrigin-RevId: 09ace67
* fix: . ORIGINAL_AUTHOR=Berke <berkecanrizai1@gmail.com> GitOrigin-RevId: f9e7e11
* deployment: chat-realtime-sharepoint-gdrive (#5717) GitOrigin-RevId: 7747b8a ORIGINAL_AUTHOR=Pawel Podhajski <106311100+pw-ppodhajski@users.noreply.github.com>
* fix: system prompt to be more helpful (#5756)
  * fix: system prompt to be more helpful
  * fix: pin the llm
  GitOrigin-RevId: 4df8496 ORIGINAL_AUTHOR=berkecanrizai <63911408+berkecanrizai@users.noreply.github.com>
* add statuses to streamlit UI (#5757) GitOrigin-RevId: a4047b7 ORIGINAL_AUTHOR=Sergey Kulik <104143901+zxqfd555-pw@users.noreply.github.com>
* add: internal versions of apps, app env vars Berke/hosted docindex run mode GitOrigin-RevId: 277d32a ORIGINAL_AUTHOR=berkecanrizai <63911408+berkecanrizai@users.noreply.github.com>
* feat: add logging to streamlit apps feat: add logs for streamlit apps GitOrigin-RevId: c876465 ORIGINAL_AUTHOR=berkecanrizai <63911408+berkecanrizai@users.noreply.github.com>
* Merge pull request pathway-labs#1 from pathway-labs/staging feat: add logging to grafana ORIGINAL_AUTHOR=Jan Chorowski <janchorowski@users.noreply.github.com> GitOrigin-RevId: 99d80f0
* Merge pull request pathway-labs#2 from pathway-labs/staging fix: log used files ORIGINAL_AUTHOR=Jan Chorowski <janchorowski@users.noreply.github.com> GitOrigin-RevId: 759107e
* Merge pull request pathway-labs#3 from pathway-labs/staging Add Dockerfile ORIGINAL_AUTHOR=Jan Chorowski <janchorowski@users.noreply.github.com> GitOrigin-RevId: cafbf3e
* Update Dockerfile ORIGINAL_AUTHOR=Jan Chorowski <janchorowski@users.noreply.github.com> GitOrigin-RevId: 986aa60
* feat: docker explaination in readme (pathway-labs#4) Clarify readme instructions
  ---------
  Co-authored-by: Jan Chorowski <janchorowski@users.noreply.github.com>
  ORIGINAL_AUTHOR=berkecanrizai <63911408+berkecanrizai@users.noreply.github.com> GitOrigin-RevId: 9cca4e8
* Update README.md ORIGINAL_AUTHOR=Adrian Kosowski <adrian@pathway.com> GitOrigin-RevId: da304ed

---------

Co-authored-by: pathway-release-manul <157591932+pathway-release-manul@users.noreply.github.com>
Co-authored-by: Berke <berkecanrizai1@gmail.com>
Co-authored-by: Sergey <sergey@pathway.com>
Co-authored-by: Sergey Kulik <104143901+zxqfd555-pw@users.noreply.github.com>
Co-authored-by: Pawel Podhajski <106311100+pw-ppodhajski@users.noreply.github.com>
Co-authored-by: berkecanrizai <63911408+berkecanrizai@users.noreply.github.com>
Co-authored-by: Jan Chorowski <janchorowski@users.noreply.github.com>
Co-authored-by: Adrian Kosowski <adrian@pathway.com>
GitOrigin-RevId: aee4d1e58a33f88a1a3133892b40e66ac82e1b90
diff --git a/Dockerfile b/Dockerfile
@@ -0,0 +1,13 @@
+FROM python:3.11
+
+WORKDIR /app
+
+COPY demo/requirements.txt demo/
+
+RUN pip install --pre -U --no-cache-dir -r demo/requirements.txt
+
+COPY . .
+
+EXPOSE 8501
+
+CMD ["streamlit", "run", "demo/app.py", "--server.port", "8501", "--server.fileWatcherType", "none"]
diff --git a/README.md b/README.md
@@ -1,25 +1,21 @@
-
 # Build a chatbot with always updated data sources using Pathway + LlamaIndex + Streamlit
 
-## Subtitle: Create a RAG application without a Vector DB, ETL pipelines or separate backend!
+## Create a RAG App without a Vector DB or fragmented ETL pipelines!
 
+This repository will show you how to build a RAG App that always has up-to-date information from your documents and sources stored in Google Drive, Dropbox, Sharepoint and more. 
 
-In this post, we explore how to build a RAG application that always has up-to-date information from your documents and sources stored in Google Drive, Dropbox, Sharepoint and more. 
+The setup guide below describes how to build your **App**. You then connect your App to a public **Pathway Vector Store**  sandbox, which is in sync with some public Google Drive and Sharepoint folders. Here, you can upload your own non-confidential files, and try out the App with the sandbox. Finally, we will show you how to quickly spin up your very own Pathway Vector Store which is kept in sync with your own private folders. 
 
+> ℹ To run the full solution (your very own Pathway Vector Store + App) in a single go in production, with your own private folders, we recommend using this complete [🐋 Dockerized setup 🐋](https://github.com/pathwaycom/llm-app/blob/main/examples/pipelines/demo-document-indexing/README.md) directly.
 
 ## What is Pathway
 Pathway is an open data processing framework. It allows you to easily develop data transformation pipelines and Machine Learning applications that work with live data sources and changing data. Pathway listens to our documents for changes, additions or removals. It handles loading and indexing without the need for an ETL. Specifically, we will use Pathway hosted offering that makes it particularly easy to launch advanced RAG applications with very little overhead.
 
-(Meta note) select one:
-- In this demo, you will use Pathway with LlamaIndex with Pathway's LlamaIndex integration which makes it particularly easy to create chatbots that have memory and can access our documents.
-
-- In this demo, you will use LlamaIndex with the Pathway's LlamaIndex integration, and Pathway hosted index solution. Using Pathway and LlamaIndex is a quick way to create powerful chatbots that have memory and can access our documents.
-
-- In this blog, we showcase the integration of LlamaIndex with Pathway's hosted index solution. You can effortlessly develop advanced chatbots with memory capabilities, providing easy real-time access to your documents.
+In this repository, we showcase the integration of LlamaIndex with Pathway's Vector Store solution. You can effortlessly develop advanced chatbots with memory capabilities, providing easy real-time access to your documents. The instructions below are intended as a step-by-step tutorial for learning. 
 
 ## Why Pathway?
 
-Pathway offers an indexing solution that is always up to date without the need for traditional ETL pipelines, which are needed in regular VectorDBs. It can monitor several data sources (files, S3 folders, cloud storage) and provide the latest information to your LLM application. 
+Pathway is a data processing framework that makes it easy to build advanced data processing pipelines. Among other things, it offers [Pathway Vector Store](https://pathway.com/developers/user-guide/llm-xpack/vectorstore_pipeline/), a document indexing solution that is always up to date without the traditional ETL pipelines that regular vector databases require. It can monitor several data sources (files, S3 folders, cloud storage) and provide the latest information to your LLM application. 
 
 This means you do not need to worry about:
 - Checking files to see if there are any changes
@@ -30,22 +26,19 @@ These are all handled by Pathway.
 
 ## App Overview
 
-This demo consists of three parts. For always up-to-date knowledge and information retrieval from the documents in our folders, Pathway vector store is used.
-LlamaIndex provides search capability to OpenAI LLM and combines functionalities such as chat memory, and OpenAI API calls for the app. Finally, Streamlit powers the easy-to-navigate user interface for easy access to the app.
+This demo combines three technologies.
+* For always up-to-date knowledge and information retrieval from the documents in our folders, **Pathway Vector Store** is used.
+* **LlamaIndex** gives the OpenAI LLM search capability and combines functionality such as chat memory and OpenAI API calls for the app.
+* Finally, **Streamlit** powers the easy-to-navigate user interface for easy access to the app.
 
-
-## Tutorial: Creating always up-to-date RAG app with Pathway + LlamaIndex
-
-```
-Want to jump right in? Check out the app and the [code](https://github.com/pathway-labs/realtime-indexer-qa-chat).
-```
+## Tutorial: Creating always up-to-date RAG App with Pathway Vector Store + LlamaIndex
 
 ## Prerequisites
 - An OpenAI API Key (Only needed for OpenAI models)
-- Pathway instance (Hosted version is provided free for the demo)
+- A running Pathway Vector Store process (a hosted version is provided for the demo; instructions to self-host one are provided below)
 
-## Adding data to source
-First, add example documents to your pipeline by uploading files to Google Drive that is registered to Pathway as a source. Pathway can listen to many sources simultaneously, such as local files, S3 folders, cloud storage and any data stream for data changes. For this demo, a Google Drive folder is provided for you to upload files. There is Pathway Github repository's readme that is provided in the folder. In this demo, we will ask our questions about Pathway our assistant and it will respond based on the available files in the Drive folder.
+## Adding new documents
+First, add example documents to the vector store by uploading files to a Google Drive folder that is registered with Pathway Vector Store as a source. Pathway can listen to many sources simultaneously, such as local files, S3 folders, cloud storage and any data stream, for data changes. For this demo, a public Google Drive folder is provided for you to upload files. It is pre-populated with the Pathway GitHub repository's README. In this demo, we will ask our assistant questions about Pathway and it will respond based on the files available in the Drive folder.
 
 See [pathway-io](https://pathway.com/developers/api-docs/pathway-io) for more information on available connectors and how to implement custom connectors.
 
@@ -60,7 +53,7 @@ from llama_index.query_engine import RetrieverQueryEngine
 from llama_index.chat_engine.condense_question import CondenseQuestionChatEngine
 ```
 
-Then, initialize the retriever with the hosted Pathway instance and create query engine:
+Then, initialize the retriever with the chosen Pathway Vector Store instance (for an easy start we point to the managed instance) and create the query engine:
 
 ```python
 PATHWAY_HOST = "https://api-pathway-indexer.staging.deploys.pathway.com"
@@ -107,7 +100,7 @@ if "messages" not in st.session_state.keys():
 
 When the app is first run, `messages` will not be in the `st.session_state` and it will be initialized.
 
-Then,  print the messages both from the user and the assistant. Streamlit works in a way that resembles running a script, the whole file will be running each time there is a change in components, and the session state is the only component that has states. Making it powerful for saving and keeping elements that do not need to be re-initialized. That is why, all messages are printed iteratively.
+Then, print messages both from the user and the assistant. Streamlit works in a way that resembles running a script: the whole file is re-run each time a component changes, and the session state is the only component that keeps state. This makes it useful for keeping elements that do not need to be re-initialized. That is why all messages are printed iteratively on each run.
 
 ```python
 if prompt := st.chat_input("Your question"):
@@ -132,22 +125,43 @@ if st.session_state.messages[-1]["role"] != "assistant":
 ```
 
 
-## Running the App
+## 1️⃣ Running the App
 
 ### On Streamlit Community Cloud
 
+The demo is hosted on Streamlit Community Cloud [here](https://chat-realtime-sharepoint-gdrive.streamlit.app/). This version of the app uses Pathway's [hosted document pipelines](https://cloud.pathway.com/docindex).
 
 ### On your local machine
 
-Clone [this repository](change this to tutorial repo or folder) to your machine.
+Clone this repository to your machine.
 Create a `.env` file under the root folder, this will store your OpenAI API key, demo uses the OpenAI GPT model to answer questions.
 
-You need a Pathway instance for vector search, for local deployment see the [vector store guide](https://pathway.com/developers/showcases/vectorstore_pipeline) and also [Pathway Deployment](https://pathway.com/developers/user-guide/deployment/docker-deployment). For this demo, a free instance is provided that reads documents in [Google Drive](https://drive.google.com/drive/u/2/folders/1cULDv2OaViJBmOfG5WB0oWcgayNrGtVs) and [Sharepoint](https://navalgo.sharepoint.com/:f:/s/ConnectorSandbox/EgBe-VQr9h1IuR7VBeXsRfIBuOYhv-8z02_6zf4uTH8WbQ?e=YmlA05).
+You need access to a running Pathway Vector Store pipeline. For this demo, a public instance is provided that reads documents in [Google Drive](https://drive.google.com/drive/u/2/folders/1cULDv2OaViJBmOfG5WB0oWcgayNrGtVs) and [Sharepoint](https://navalgo.sharepoint.com/:f:/s/ConnectorSandbox/EgBe-VQr9h1IuR7VBeXsRfIBuOYhv-8z02_6zf4uTH8WbQ?e=YmlA05). However, it is easy to run your own locally. Please see the [vector store guide](https://pathway.com/developers/showcases/vectorstore_pipeline) and also [Pathway Deployment](https://pathway.com/developers/user-guide/deployment/docker-deployment). 
 
 Open a terminal and run `streamlit run ui.py`. This will prompt you a URL, simply click and open the demo.
 
 Congrats! Now you are ready to chat with your documents with updated knowledge provided by Pathway.
 
+### Running with Docker
+
+We provide a Dockerfile to run the application. From the root folder of the repository, run:
+
+```
+docker build -t realtime_chat .
+docker run -p 8501:8501 realtime_chat
+```
+
+We recommend running in Docker when working on a Windows machine.
+
+## 2️⃣ Running a local Pathway Vector Store
+
+OK, so far you have managed to get the RAG App up and running - but it still connects to the public demo folders! Let's fix that - we will now show you how to connect your very own folders, in a private deployment. This means you will need to spin up a light web server which provides the "Pathway Vector Store" service, responsible for the whole document ingestion and indexing pipeline.
+
+The code for the Pathway Vector Store pipeline, along with a Dockerfile, is provided in the [Pathway LLM examples repository](https://github.com/pathwaycom/llm-app/tree/main/examples/pipelines/demo-document-indexing). Please follow the instructions there to run only the vector store pipeline, or to run the pipeline and the Streamlit UI as a joint deployment using `docker compose`.
+
+Note that if you want to create a RAG application connected to your Google Drive, you need to set up a Google Service account, [refer to the instructions here](https://github.com/pathwaycom/llm-app/blob/main/examples/pipelines/demo-question-answering/README.md#create-a-new-project-in-the-google-api-console).
+Also, if you are not planning to use local files in your app, you can skip the `binding local volume` part explained in the llm-app instructions linked above. 
+
 ## Summing Up
 
-In this tutorial, you learned how to create and deploy a simple yet powerful RAG application with always up-to-date knowledge of your documents, without ETL jobs and buffers to check and read documents for any changes. You also learned how to get started with LlamaIndex using Pathway vector store, and how easy it is to get going with hosted Pathway that handles the majority of hurdles for you.
+In this tutorial, you learned how to create and deploy a simple yet powerful RAG application with always up-to-date knowledge of your documents, without ETL jobs and buffers to check and read documents for any changes. You also learned how to get started with LlamaIndex using Pathway vector store, and how easy it is to get going with hosted Pathway that handles the majority of hurdles for you.
diff --git a/demo/app.py b/demo/app.py
@@ -1,13 +1,17 @@
 import json
 import logging
 import os
+import uuid
 
 import pandas as pd
 import streamlit as st
 from dotenv import load_dotenv
 from endpoint_utils import get_inputs
+from llama_index.llms.types import ChatMessage, MessageRole
 from log_utils import init_pw_log_config
+from rag import DEFAULT_PATHWAY_HOST, PATHWAY_HOST, chat_engine, vector_client
 from streamlit.web.server.websocket_headers import _get_websocket_headers
+from traceloop.sdk import Traceloop
 
 logging.basicConfig(
     level=logging.INFO,
@@ -50,18 +54,24 @@
 )
 
 with st.sidebar:
-    st.markdown("**Add Your Files**")
-    st.markdown(htm, unsafe_allow_html=True)
+    if PATHWAY_HOST == DEFAULT_PATHWAY_HOST:
+        st.markdown("**Add Your Files**")
 
-    st.markdown("\n\n\n\n\n\n\n")
-    st.markdown("\n\n\n\n\n\n\n")
-    st.markdown(
-        "[View code on GitHub.](https://github.com/pathway-labs/chat-realtime-sharepoint-gdrive)"
-    )
+        st.markdown(htm, unsafe_allow_html=True)
 
-    st.markdown(
-        """Pathway pipelines ingest documents from [Google Drive](https://drive.google.com/drive/u/0/folders/1cULDv2OaViJBmOfG5WB0oWcgayNrGtVs) and [Sharepoint](https://navalgo.sharepoint.com/:f:/s/ConnectorSandbox/EgBe-VQr9h1IuR7VBeXsRfIBuOYhv-8z02_6zf4uTH8WbQ?e=YmlA05) simultaneously. It automatically manages and syncs indexes enabling RAG applications."""
-    )
+        st.markdown("\n\n\n\n\n\n\n")
+        st.markdown("\n\n\n\n\n\n\n")
+        st.markdown(
+            "[View code on GitHub.](https://github.com/pathway-labs/chat-realtime-sharepoint-gdrive)"
+        )
+        st.markdown(
+            """Pathway pipelines ingest documents from [Google Drive](https://drive.google.com/drive/u/0/folders/1cULDv2OaViJBmOfG5WB0oWcgayNrGtVs) and [Sharepoint](https://navalgo.sharepoint.com/:f:/s/ConnectorSandbox/EgBe-VQr9h1IuR7VBeXsRfIBuOYhv-8z02_6zf4uTH8WbQ?e=YmlA05) simultaneously. It automatically manages and syncs indexes enabling RAG applications."""
+        )
+    else:
+        st.markdown(f"**Connected to:** {PATHWAY_HOST}")
+        st.markdown(
+            "[View code on GitHub.](https://github.com/pathway-labs/chat-realtime-sharepoint-gdrive)"
+        )
 
     st.markdown(
         """**Ready to build your own?**
@@ -98,12 +108,6 @@
 
 
 if "messages" not in st.session_state.keys():
-    import uuid
-
-    from llama_index.llms.types import ChatMessage, MessageRole
-    from rag import chat_engine, vector_client
-    from traceloop.sdk import Traceloop
-
     if "session_id" not in st.session_state.keys():
         session_id = "uuid-" + str(uuid.uuid4())
 
@@ -145,6 +149,8 @@
 
 
 df = pd.DataFrame(last_indexed_files, columns=[last_modified_time, "status"])
+if df.status.isna().any():
+    del df["status"]
 
 df.set_index(df.columns[0])
 st.dataframe(df, hide_index=True, height=150, use_container_width=True)
@@ -176,15 +182,7 @@
     with st.chat_message("assistant"):
         with st.spinner("Thinking..."):
             response = st.session_state.chat_engine.chat(prompt)
-            logging.info(
-                json.dumps(
-                    {
-                        "_type": "llm_response",
-                        "response": str(response),
-                        "session_id": st.session_state.get("session_id", "NULL_SESS"),
-                    }
-                )
-            )
+
             sources = []
 
             try:
@@ -213,6 +211,17 @@
 
             sources_text = ", ".join(sources)
 
+            logging.info(
+                json.dumps(
+                    {
+                        "_type": "llm_response",
+                        "response": str(response),
+                        "session_id": st.session_state.get("session_id", "NULL_SESS"),
+                        "sources": sources,
+                    }
+                )
+            )
+
             response_text = (
                 response.response
                 + f"\n\nDocuments looked up to obtain this answer: {sources_text}"
diff --git a/demo/rag.py b/demo/rag.py
@@ -15,9 +15,11 @@
 
 Traceloop.init(app_name=os.environ.get("APP_NAME", "PW - LlamaIndex (Streamlit)"))
 
-PATHWAY_HOST = os.environ.get("PATHWAY_HOST", "demo-document-indexing.pathway.stream")
+DEFAULT_PATHWAY_HOST = "demo-document-indexing.pathway.stream"
 
-PATHWAY_PORT = 80
+PATHWAY_HOST = os.environ.get("PATHWAY_HOST", DEFAULT_PATHWAY_HOST)
+
+PATHWAY_PORT = int(os.environ.get("PATHWAY_PORT", "80"))
 
 vector_client = VectorStoreClient(PATHWAY_HOST, PATHWAY_PORT)
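
The `demo/rag.py` change above makes both host and port overridable through the environment, and `demo/app.py` compares `PATHWAY_HOST` with `DEFAULT_PATHWAY_HOST` to decide which sidebar to render. The following is a minimal, standalone sketch of that pattern for reviewers who want to try it outside Streamlit. The helper `resolve_endpoint` and the constant `DEFAULT_PATHWAY_PORT` are hypothetical names introduced only for this illustration; they do not appear in the commit.

```python
import os

# Same default host as in demo/rag.py.
DEFAULT_PATHWAY_HOST = "demo-document-indexing.pathway.stream"
# Hypothetical constant; the commit inlines the string "80" instead.
DEFAULT_PATHWAY_PORT = 80


def resolve_endpoint(env=None):
    """Return (host, port, is_public_demo) from an environment mapping.

    `env` defaults to os.environ; passing a plain dict makes the
    function easy to exercise without touching the real environment.
    """
    env = os.environ if env is None else env
    host = env.get("PATHWAY_HOST", DEFAULT_PATHWAY_HOST)
    # Environment values are strings, so the port is cast explicitly,
    # mirroring int(os.environ.get("PATHWAY_PORT", "80")) in the diff.
    port = int(env.get("PATHWAY_PORT", str(DEFAULT_PATHWAY_PORT)))
    # The app shows the "Add Your Files" sidebar only for the public demo.
    is_public_demo = host == DEFAULT_PATHWAY_HOST
    return host, port, is_public_demo


if __name__ == "__main__":
    print(resolve_endpoint({}))
    print(resolve_endpoint({"PATHWAY_HOST": "localhost", "PATHWAY_PORT": "8000"}))
```

With this shape, pointing the app at a self-hosted vector store would only need `PATHWAY_HOST` and `PATHWAY_PORT` set in the `.env` file or passed to `docker run` with `-e`, assuming the store is reachable from the container. Note that a non-numeric `PATHWAY_PORT` raises `ValueError` at import time, both in this sketch and in the diff.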