This agent writes vector data to a vector database. LangStream currently supports Astra DB and Pinecone.
Both databases use the same agent type, "vector-db-sink", in a LangStream pipeline, but each requires different configuration values to map the vector data from the sink into the database.
The Astra DB vector database connection is defined in configuration.yaml:
```yaml
configuration:
  resources:
    - type: "vector-database"
      name: "AstraDatasource"
      configuration:
        service: "astra"
        username: "${ secrets.astra.username }"
        password: "${ secrets.astra.password }"
        secureBundle: "${ secrets.astra.secureBundle }"
```
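The `${ secrets.astra.* }` references are resolved from the application's secrets file. Below is a minimal sketch of that file, assuming the standard LangStream secrets.yaml layout; all values shown are placeholders:

```yaml
# secrets.yaml (sketch) -- key names mirror the ${ secrets.astra.* } references above
secrets:
  - id: astra
    data:
      username: "<astra-client-id>"            # placeholder
      password: "<astra-client-secret>"        # placeholder
      secureBundle: "<secure-connect-bundle>"  # placeholder for the database's secure connect bundle
```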
The "Write to Astra DB" pipeline step takes embeddings as input from "input-topic" and writes them to the configured datasource "AstraDatasource":
name: "Write to Astra DB"
topics:
- name: "input-topic"
creation-mode: create-if-not-exists
pipeline:
- name: "Write to Cassandra"
type: "vector-db-sink"
input: "input-topic"
configuration:
datasource: "AstraDatasource"
table: "vsearch.products"
mapping: "id=value.id,description=value.description,name=value.name"
Input
- Embeddings (vector data) read from the input topic
Output
- None, it’s a sink.
Label | Type | Description |
---|---|---|
datasource | String | The datasource is defined in the Resources section of configuration.yaml. |
table | String | The `keyspace.table-name` the vector data will be written to.
mapping | String | How the data from the input records is mapped to the corresponding columns in the database table. For example, "id=value.id" maps the "id" field of the input record to the "id" column of the table (see the example record below).
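To make the mapping concrete, here is a hypothetical input record (shown as YAML for readability; the field names follow the mapping above) and where each of its fields lands in the `vsearch.products` table:

```yaml
# Hypothetical record on "input-topic", shown as YAML for readability.
# With mapping "id=value.id,description=value.description,name=value.name":
#   value.id          -> id column of vsearch.products
#   value.name        -> name column
#   value.description -> description column
value:
  id: 42
  name: "Trail running shoe"
  description: "Lightweight shoe with a breathable mesh upper"
```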
The "Write to Pinecone" pipeline step takes embeddings as input from "vectors-topic" and writes them to a Pinecone datasource.
The Pinecone vector database connection is defined in configuration.yaml:
```yaml
configuration:
  resources:
    - type: "vector-database"
      name: "PineconeDatasource"
      configuration:
        service: "pinecone"
        api-key: "${secrets.pinecone.api-key}"
        environment: "${secrets.pinecone.environment}"
        index-name: "${secrets.pinecone.index-name}"
        project-name: "${secrets.pinecone.project-name}"
        server-side-timeout-sec: 10
```
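As with Astra DB, the `${secrets.pinecone.*}` references resolve against the secrets file. A minimal sketch, again assuming the standard LangStream secrets.yaml layout (all values are placeholders):

```yaml
# secrets.yaml (sketch) -- key names mirror the ${secrets.pinecone.*} references above
secrets:
  - id: pinecone
    data:
      api-key: "<pinecone-api-key>"          # placeholder
      environment: "<pinecone-environment>"  # placeholder
      index-name: "<index-name>"             # placeholder
      project-name: "<project-name>"         # placeholder
```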
The "Write to Pinecone" pipeline step takes embeddings as input from "input-topic" and writes them to the configured datasource "PineconeDatasource":
name: "Write to Pinecone DB"
topics:
- name: "vectors-topic"
creation-mode: create-if-not-exists
pipeline:
- name: "Write to Pinecone"
type: "vector-db-sink"
configuration:
datasource: "PineconeDatasource"
vector.id: "value.id"
vector.vector: "value.embeddings"
vector.namespace: "value.namespace"
vector.metadata.genre: "value.genre"
Input
- Embeddings (vector data) read from the "vectors-topic" topic
Output
- None, it’s a sink.
Label | Type | Description |
---|---|---|
datasource | String | The datasource is defined in the Resources section of configuration.yaml. |
vector.id | String | Maps the input value "id" to "vector.id" in the database.
vector.vector | String | Maps the configured input field (here "value.embeddings") to "vector.vector" in the database.
vector.namespace | String | Maps the input value "namespace" to "vector.namespace" in the database. |
vector.metadata.{metadataField} | String | Maps the input value "metadata.{metadataField}" to "vector.metadata.{metadataField}" |
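To illustrate, here is a hypothetical record on "vectors-topic" (shown as YAML for readability) and where each field ends up in Pinecone, given the configuration above:

```yaml
# Hypothetical record on "vectors-topic", shown as YAML for readability.
# With the configuration above:
#   value.id         -> vector.id
#   value.embeddings -> vector.vector
#   value.namespace  -> vector.namespace
#   value.genre      -> vector.metadata.genre
value:
  id: "doc-001"
  embeddings: [0.12, -0.54, 0.33, 0.91]
  namespace: "products"
  genre: "fiction"
```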