Build a RAG-based question answering solution using Amazon Bedrock Knowledge Base, the vector engine for Amazon OpenSearch Service Serverless, and LangChain
Amit Arora
One of the most common applications of generative AI and foundation models (FMs) in an enterprise environment is answering questions based on the enterprise's knowledge corpus. Amazon Lex provides the framework for building AI-based chatbots. Pre-trained FMs perform well at natural language understanding (NLU) tasks such as summarization, text generation, and question answering on a broad variety of topics, but they either struggle to provide accurate (hallucination-free) answers or fail entirely at answering questions about content they haven't seen in their training data. Furthermore, FMs are trained on a point-in-time snapshot of data and have no inherent ability to access fresh data at inference time; without this ability, they might provide responses that are incorrect or inadequate.
A commonly used approach to address this problem is a technique called Retrieval Augmented Generation (RAG). In the RAG-based approach, we convert the user question into vector embeddings using an FM and then perform a similarity search for these embeddings in a pre-populated vector database that holds the embeddings for the enterprise knowledge corpus. A small number of similar documents (typically three) is added as context, along with the user question, to the prompt provided to another FM, which then generates an answer to the user question using the information provided as context. RAG models were introduced by Lewis et al. in 2020 as models in which the parametric memory is a pre-trained seq2seq model and the non-parametric memory is a dense vector index of Wikipedia, accessed with a pre-trained neural retriever. To understand the overall structure of a RAG-based approach, refer to Build a powerful question answering bot with Amazon SageMaker, Amazon OpenSearch Service, Streamlit, and LangChain.
In this post, we provide a step-by-step guide with all the building blocks for creating a low-code/no-code (LCNC), enterprise-ready RAG application such as a question answering solution. We use FMs available through Amazon Bedrock for the embeddings model (Amazon Titan Text Embeddings v2) and the text generation model (Anthropic Claude v2), along with the Amazon Bedrock Knowledge Base and Amazon Bedrock Agents features. The text corpus representing an enterprise knowledge base is stored as HTML files in Amazon S3 and is ingested in the form of text embeddings into an index in an Amazon OpenSearch Service Serverless collection by the Bedrock knowledge base agent in a fully managed, serverless fashion.
We provide an AWS CloudFormation template to stand up all the resources required for building this solution. We then demonstrate how to use LangChain to interface with Bedrock and opensearch-py to interface with OpenSearch Service Serverless and build a RAG-based question answering workflow.
We use a subset of the SageMaker documentation as the knowledge corpus for this post. The data is available as HTML files in an S3 bucket; the Bedrock knowledge base agent reads these files, splits them into smaller chunks, encodes the chunks into vectors (embeddings), and ingests these embeddings into an OpenSearch Service Serverless collection index. We implement the RAG functionality in a notebook: a set of SageMaker-related questions is first asked of the Claude model without providing any additional context, and then the same questions are asked again with context based on similar documents retrieved from OpenSearch Service Serverless, that is, using the RAG approach. We demonstrate that the responses generated without RAG can be factually inaccurate, whereas the RAG-based responses are accurate and more useful.
All the code for this post is available in the GitHub repo.
The following figure represents the high-level architecture of the proposed solution.
Figure 1: Architecture

Step-by-step explanation:
- The user provides a question via the Jupyter notebook.
- The question is converted into an embedding using Bedrock via the Titan embeddings v2 model.
- The embedding is used to find similar documents from an OpenSearch Service Serverless index.
- The similar documents, along with the user question, are used to create a prompt.
- The prompt is provided to Bedrock to generate a response using the Claude v2 model.
- The response along with the context is printed out in a notebook cell.
As illustrated in the architecture diagram, we use the following AWS services:
- Bedrock for access to the FMs for embedding and text generation as well as for the knowledge base agent.
- OpenSearch Service Serverless with vector search for storing the embeddings of the enterprise knowledge corpus and doing similarity search with user questions.
- S3 for storing the raw knowledge corpus data (HTML files).
- AWS Identity and Access Management roles and policies for access management.
- AWS CloudFormation for creating the entire solution stack through infrastructure as code.
In terms of open-source packages used in this solution, we use LangChain for interfacing with Bedrock and opensearch-py to interface with OpenSearch Service Serverless.
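For orientation, the following sketch shows the imports these packages provide as they are used later in this post. This is a minimal, illustrative snippet; the exact import paths are an assumption and can vary with the LangChain version installed in your environment.

```python
# Core clients and classes used throughout this post.
# Assumed to be installed in the notebook environment, e.g. via:
#   pip install boto3 langchain opensearch-py
# (import paths are an assumption and may vary by LangChain version)
import boto3
from langchain.llms.bedrock import Bedrock
from langchain.embeddings import BedrockEmbeddings
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth
```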
The workflow for instantiating the solution presented in this post in your own AWS account is as follows:
- Run the CloudFormation template provided with this post in your account. This creates all the infrastructure resources needed for this solution:
  - OpenSearch Service Serverless collection
  - SageMaker notebook
  - IAM roles
- Create a vector index in the OpenSearch Service Serverless collection. This is done through the OpenSearch Service Serverless console.
- Create a knowledge base in Bedrock and sync data from the S3 bucket to the OpenSearch Service Serverless collection index. This is done through the Bedrock console.
- Create a Bedrock agent, connect it to the knowledge base, and use the agent console for question answering without having to write any code.
- Run the `rag_w_bedrock_and_aoss.ipynb` notebook in the SageMaker notebook to ask questions based on the data ingested into the OpenSearch Service Serverless collection index.
These steps are discussed in detail in the following sections.
To implement the solution provided in this post, you should have an AWS account and familiarity with FMs, OpenSearch Service, and Bedrock.
Choose Launch Stack for the Region you want to deploy resources to. All parameters needed by the CloudFormation template have default values already filled in, except for the ARN of the IAM role with which you are currently logged in to your AWS account, which you must provide. Make a note of the OpenSearch Service collection ARN; we use this in subsequent steps. The template takes about 10 minutes to complete.
AWS Region | Link |
---|---|
us-east-1 (N. Virginia) | Launch Stack |
us-west-2 (Oregon) | Launch Stack |
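If you are unsure which IAM role ARN to supply for the template parameter mentioned above, one way to check the identity you are currently signed in with is through AWS STS. The following is a small illustrative sketch using boto3.

```python
import boto3

# Print the ARN of the identity making this call. If it is an assumed role
# (arn:aws:sts::<account>:assumed-role/<RoleName>/<session>), the corresponding
# IAM role ARN has the form arn:aws:iam::<account>:role/<RoleName>.
identity = boto3.client("sts").get_caller_identity()
print(identity["Arn"])
```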
After the stack is created successfully, navigate to the stack's `Outputs` tab on the AWS CloudFormation console and note the values for `CollectionARN` and `AOSSVectorIndexName`. We use these in subsequent steps.
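You can also read these outputs programmatically. The following is a minimal sketch with boto3; the stack name shown is a placeholder, so use the name you chose when launching the template.

```python
import boto3

cfn = boto3.client("cloudformation")

# Placeholder: replace with the name you gave the stack when launching the template
stack = cfn.describe_stacks(StackName="<your-stack-name>")["Stacks"][0]

# Turn the list of outputs into a simple dictionary keyed by output name
outputs = {o["OutputKey"]: o["OutputValue"] for o in stack["Outputs"]}

collection_arn = outputs["CollectionARN"]
vector_index_name = outputs["AOSSVectorIndexName"]
print(collection_arn, vector_index_name)
```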
The CloudFormation stack creates an OpenSearch Service Serverless collection; the next step is to create a vector index. This is done through the OpenSearch Service Serverless console as described below.
- Navigate to the OpenSearch Service console and click `Collections`. The `sagemaker-kb` collection created by the CloudFormation stack will be listed there.

Figure 3: SageMaker Knowledge Base Collection

- Click the `sagemaker-kb` link to create a vector index for storing the embeddings from the documents in S3.

Figure 4: SageMaker Knowledge Base Vector Index

- Set the vector index name as `sagemaker-readthedocs-io`, the vector field name as `vector`, the dimensions as `1536`, the engine type as `FAISS`, and the distance metric as `Euclidean`. It is required that you set these parameters exactly as mentioned here because the Bedrock knowledge base agent is going to use these same values.

Figure 5: SageMaker Knowledge Base Vector Index Parameters

- Once created, the vector index is listed as part of the collection.

Figure 6: SageMaker Knowledge Base Vector Index Created
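If you prefer not to use the console, the same vector index can be created with opensearch-py. The following is a sketch, assuming your IAM identity has been added to the collection's data access policy; the collection endpoint and Region are placeholders you need to fill in.

```python
import boto3
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth

region = "us-east-1"  # placeholder: use the Region where the collection was created
host = "<collection-id>.us-east-1.aoss.amazonaws.com"  # placeholder: collection endpoint without https://

# Sign requests with SigV4 for the OpenSearch Serverless ("aoss") service
auth = AWSV4SignerAuth(boto3.Session().get_credentials(), region, "aoss")
client = OpenSearch(
    hosts=[{"host": host, "port": 443}],
    http_auth=auth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
)

# Same parameters as in the console: FAISS engine, Euclidean (l2) distance, 1536 dimensions
index_body = {
    "settings": {"index.knn": True},
    "mappings": {
        "properties": {
            "vector": {
                "type": "knn_vector",
                "dimension": 1536,
                "method": {"name": "hnsw", "engine": "faiss", "space_type": "l2"},
            }
        }
    },
}
client.indices.create(index="sagemaker-readthedocs-io", body=index_body)
```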
Once the OpenSearch Service Serverless collection and vector index have been created, it is time to set up the Bedrock knowledge base.
- Navigate to the Bedrock console, click `Knowledge base`, and then click the `Create knowledge base` button.

Figure 7: Bedrock Knowledge Base

- Fill out the details for creating the knowledge base as shown in the following screenshots.

Figure 8: Bedrock Knowledge Base

- Select the S3 bucket.

Figure 9: Bedrock Knowledge Base S3 bucket

- The Titan embeddings model is automatically selected.

Figure 10: Bedrock Knowledge Base embeddings model

- Select Amazon OpenSearch Service Serverless from the vector database options available.

Figure 11: Bedrock Knowledge Base OpenSearch Service Serverless

- Review and create the knowledge base by clicking the `Create knowledge base` button.

Figure 12: Bedrock Knowledge Base Review & Create

- The knowledge base should now be created.

Figure 13: Bedrock Knowledge Base create complete
Once the Bedrock knowledge base is created, we are ready to sync the data (raw documents) in S3 to embeddings in the OpenSearch Service Serverless collection vector index.
- Start the sync by pressing the `Sync` button; the button label changes to `Syncing`.

Figure 14: Bedrock Knowledge Base sync

- Once the sync completes, the status changes to `Ready`.

Figure 15: Bedrock Knowledge Base sync completed
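Sync jobs can also be started programmatically through the bedrock-agent API. The following is a sketch; the knowledge base ID and data source ID are placeholders you can copy from the knowledge base details page in the Bedrock console.

```python
import boto3

bedrock_agent = boto3.client("bedrock-agent")

# Placeholders: copy these IDs from the knowledge base details page in the Bedrock console
response = bedrock_agent.start_ingestion_job(
    knowledgeBaseId="<knowledge-base-id>",
    dataSourceId="<data-source-id>",
)
print(response["ingestionJob"]["status"])
```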
Now we are all set to ask some questions of our newly created knowledge base. In this step, we do this in a no-code way by creating a Bedrock agent.
- Create a new Bedrock agent, call it `sagemaker-qa`, and use the `AmazonBedrockExecutionRoleForAgent_SageMakerQA` IAM role; this role is created automatically via CloudFormation.

Figure 16: Provide agent details - agent name

Figure 17: Provide agent details - IAM role

- Provide the following as the instructions for the agent: `You are a Q&A agent that politely answers questions from a knowledge base named sagemaker-docs.` The `Anthropic Claude V2` model is selected as the model for the agent.

Figure 18: Select model

- Click `Next` on the `Add Action groups - optional` page; no action groups are needed for this agent.

- Select the `sagemaker-docs` knowledge base, and in the knowledge base instructions for agent field, enter `Answer questions about Amazon SageMaker based only on the information contained in the knowledge base.`

Figure 19: Add knowledge base

- Click the `Create Agent` button on the `Review and create` screen.

Figure 20: Review and create

- Once the agent is ready, we can ask it questions using the agent console.

Figure 21: Agent console

- We ask the agent questions such as `What are the XGBoost versions supported in Amazon SageMaker`. Notice that we not only get the correct answer but also a link to its source, the original document stored in S3 that was used as context to provide this answer.

Figure 22: Q&A with Bedrock Agent

- The agent also provides a trace feature that shows the steps the agent takes to arrive at the final answer. The steps include the prompt used and the text from the documents retrieved from the knowledge base.

Figure 23: Bedrock Agent Trace Step 1

Figure 24: Bedrock Agent Trace Step 2
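The same agent can also be invoked programmatically through the bedrock-agent-runtime API. The following is a sketch; the agent ID, alias ID, and session ID are placeholders, and the response arrives as an event stream of chunks.

```python
import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")

# Placeholders: use your agent ID and a prepared alias ID from the Bedrock console
response = bedrock_agent_runtime.invoke_agent(
    agentId="<agent-id>",
    agentAliasId="<agent-alias-id>",
    sessionId="my-session-1",
    inputText="What are the XGBoost versions supported in Amazon SageMaker",
)

# The completion is streamed back as events containing byte chunks
answer = ""
for event in response["completion"]:
    if "chunk" in event:
        answer += event["chunk"]["bytes"].decode("utf-8")
print(answer)
```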
Now we will interact with our knowledge base through code. The CloudFormation template creates a SageMaker Notebook that contains the code to demonstrate this.
- Navigate to SageMaker notebooks, find the notebook named `bedrock-kb-rag-workshop`, and click `Open JupyterLab`.

Figure 25: RAG with Bedrock KB notebook

- Open a new `Terminal` from `File -> New -> Terminal` and run the following commands to install the Bedrock SDK in a new conda kernel called `bedrock_py39`:

```
chmod +x /home/ec2-user/SageMaker/bedrock-kb-rag-workshop/setup_bedrock_conda.sh
/home/ec2-user/SageMaker/bedrock-kb-rag-workshop/setup_bedrock_conda.sh
```

- Wait for one minute after completing the previous step and then click `rag_w_bedrock_and_aoss.ipynb` to open the notebook. Confirm that the notebook is using the newly created `bedrock_py39` kernel, otherwise the code will not work. If the kernel is not set to `bedrock_py39`, refresh the page; the `bedrock_py39` kernel should then be selected.

- The notebook code demonstrates the use of the Bedrock, LangChain, and opensearch-py packages for implementing the RAG technique for question answering.
- We access the models available via Bedrock using the `Bedrock` and `BedrockEmbeddings` classes from the LangChain package.

```python
# we will use Anthropic Claude for text generation
claude_llm = Bedrock(model_id="anthropic.claude-v2")
claude_llm.model_kwargs = dict(temperature=0.5,
                               max_tokens_to_sample=300,
                               top_k=250,
                               top_p=1,
                               stop_sequences=[])

# we will be using the Titan Embeddings Model to generate our Embeddings.
embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-g1-text-02")
```
- The interface to OpenSearch Service Serverless is through the opensearch-py package.

```python
# Functions to talk to OpenSearch
# Define queries for OpenSearch
def query_docs(query: str, embeddings: BedrockEmbeddings, aoss_client: OpenSearch, index: str, k: int = 3) -> Dict:
    """
    Convert the query into embedding and then find similar documents from OpenSearch Service Serverless
    """
    # embedding
    query_embedding = embeddings.embed_query(query)

    # query to lookup OpenSearch kNN vector. Can add any metadata fields based filtering
    # here as part of this query.
    query_qna = {
        "size": k,
        "query": {
            "knn": {
                "vector": {
                    "vector": query_embedding,
                    "k": k
                }
            }
        }
    }

    # OpenSearch API call
    relevant_documents = aoss_client.search(
        body = query_qna,
        index = index
    )
    return relevant_documents
```
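For illustration, here is how `query_docs` might be called once the OpenSearch Service Serverless client (`aoss_client`, constructed with opensearch-py as sketched earlier) is in place; the index name shown is the one created earlier in this post.

```python
# Illustrative call; aoss_client is the opensearch-py client for the Serverless collection
results = query_docs(
    "What versions of XGBoost are supported by Amazon SageMaker?",
    embeddings,
    aoss_client,
    "sagemaker-readthedocs-io",
)

# Each hit carries the chunk text and its metadata (e.g. the source document in S3)
for hit in results["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["metadata"])
```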
- We combine the prompt and the documents retrieved from OpenSearch Service Serverless as follows.

```python
def create_context_for_query(q: str, embeddings: BedrockEmbeddings, aoss_client: OpenSearch, vector_index: str) -> str:
    """
    Create a context out of the similar docs retrieved from the vector database
    by concatenating the text from the similar documents.
    """
    print(f"query -> {q}")
    aoss_response = query_docs(q, embeddings, aoss_client, vector_index)
    context = ""
    for r in aoss_response['hits']['hits']:
        s = r['_source']
        print(f"{s['metadata']}\n{s['text']}")
        context += f"{s['text']}\n"
        print("----------------")
    return context
```
- Combining everything, the RAG workflow works as shown below.

```python
# 1. Start with the query
q = "What versions of XGBoost are supported by Amazon SageMaker?"

# 2. Create the context by finding similar documents from the knowledge base
context = create_context_for_query(q, embeddings, client, aoss_vector_index)

# 3. Now create a prompt by combining the query and the context
prompt = PROMPT_TEMPLATE.format(context, q)

# 4. Provide the prompt to the FM to generate an answer to the query based on context provided
response = claude_llm(prompt)
```
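The `PROMPT_TEMPLATE` referenced above is defined in the notebook. As a rough illustration only (this is an assumption, not the notebook's exact text), a template for Claude might look like the following, with the two `{}` placeholders filled by the retrieved context and the user question.

```python
# Illustrative prompt template (an assumption, not the notebook's verbatim text).
# The first {} receives the retrieved context, the second {} receives the user question.
PROMPT_TEMPLATE = """Human: Use the following pieces of context to answer the question at the end.
If the answer is not contained in the context, say that you don't know; do not make up an answer.

{}

Question: {}

Assistant:"""
```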
- Here is a sample question answered first with just the question in the prompt, i.e., without providing any additional context. The answer without context is inaccurate.
Figure 26: Answer with prompt alone
- We then ask the same question, but this time with the additional context retrieved from the knowledge base included in the prompt. Now the inaccuracy in the earlier response is addressed, and we also have attribution for the source of this answer (notice the underlined text for the filename and the actual answer).
Figure 27: Answer with prompt and context
To avoid incurring future charges, delete the resources. You can do this by first deleting all the files from the S3 bucket created by the CloudFormation template and then deleting the CloudFormation stack.
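A sketch of performing this cleanup with boto3 is shown below; the bucket and stack names are placeholders for the values from your deployment.

```python
import boto3

# Placeholders: use the bucket and stack names from your deployment
bucket_name = "<content-bucket-name>"
stack_name = "<stack-name>"

# Empty the S3 bucket created by the stack, then delete the stack itself
boto3.resource("s3").Bucket(bucket_name).objects.all().delete()
boto3.client("cloudformation").delete_stack(StackName=stack_name)
```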
In this post, we showed how to create an enterprise-ready RAG solution using a combination of AWS services and open-source Python packages.
We encourage you to learn more by exploring Amazon Titan models, Amazon Bedrock, and OpenSearch Service and building a solution using the sample implementation provided in this post and a dataset relevant to your business. If you have questions or suggestions, leave a comment.
Amit Arora is an AI and ML Specialist Architect at Amazon Web Services, helping enterprise customers use cloud-based machine learning services to rapidly scale their innovations. He is also an adjunct lecturer in the MS data science and analytics program at Georgetown University in Washington D.C.