
PGVector

LangChain4j integrates with PGVector, letting you store and query vector embeddings directly in PostgreSQL. This integration suits applications such as semantic search and retrieval-augmented generation (RAG).

Maven Dependency

<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-pgvector</artifactId>
    <version>1.0.0-alpha1</version>
</dependency>

Gradle Dependency

implementation 'dev.langchain4j:langchain4j-pgvector:1.0.0-alpha1'

APIs

  • PgVectorEmbeddingStore

Parameter Summary

The PgVectorEmbeddingStore builder accepts the following properties. Each entry gives the property name, its default value, and whether it is required.

  • datasource (default: none; required unless host, port, user, password, and database are all provided): The DataSource used for database connections. If it is not provided, host, port, user, password, and database must each be provided individually.
  • host (default: none; required if no DataSource is provided): Hostname of the PostgreSQL server.
  • port (default: none; required if no DataSource is provided): Port number of the PostgreSQL server.
  • user (default: none; required if no DataSource is provided): Username for database authentication.
  • password (default: none; required if no DataSource is provided): Password for database authentication.
  • database (default: none; required if no DataSource is provided): Name of the database to connect to.
  • table (default: none; required): Name of the database table used to store embeddings.
  • dimension (default: none; required): Dimensionality of the embedding vectors. It must match the embedding model being used; use embeddingModel.dimension() to set it from the model.
  • useIndex (default: false; optional): Whether to create an IVFFlat index. An IVFFlat index divides vectors into lists and then searches only the subset of lists closest to the query vector. It builds faster and uses less memory than an HNSW index, but has lower query performance in terms of the speed-recall tradeoff.
  • indexListSize (default: none; required if useIndex is true): Number of lists for the IVFFlat index. If useIndex is true, it must be set to a value greater than zero; otherwise an exception is thrown during table initialization. If useIndex is false, it is ignored and does not need to be set.
  • createTable (default: true; optional): Whether to create the embeddings table automatically.
  • dropTableFirst (default: false; optional): Whether to drop the table before recreating it (useful for tests).
  • metadataStorageConfig (default: COMBINED_JSON; optional): Configuration for storing the metadata associated with embeddings. If not set, COMBINED_JSON is used. Three storage modes are supported:
      ◦ COLUMN_PER_KEY: for static metadata, when the metadata keys are known in advance.
      ◦ COMBINED_JSON: for dynamic metadata, when the keys are not known in advance. Stores metadata as JSON. This is the default.
      ◦ COMBINED_JSONB: like COMBINED_JSON, but stored in PostgreSQL's binary JSONB format for more efficient querying on large datasets.
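The useIndex and indexListSize entries describe IVFFlat only in words. The toy sketch below illustrates the idea; it is not pgvector's implementation, and the class name ToyIvfFlat and the probes parameter are hypothetical. Each vector is stored in the list of its nearest centroid, and a query scans only the probes lists whose centroids are closest to it. For brevity the centroids are passed in directly; pgvector computes them with k-means when the index is built, the number of lists corresponds to indexListSize, and pgvector's ivfflat.probes setting plays the role of probes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy IVFFlat-style index for illustration only (hypothetical class, not pgvector code).
class ToyIvfFlat {
    private final float[][] centroids;               // one centroid per list
    private final List<List<float[]>> lists = new ArrayList<>();

    ToyIvfFlat(float[][] centroids) {
        this.centroids = centroids;
        for (int i = 0; i < centroids.length; i++) {
            lists.add(new ArrayList<>());
        }
    }

    static double squaredDistance(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // List ids ordered from the closest centroid to the farthest.
    private List<Integer> listsByDistance(float[] v) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < centroids.length; i++) {
            ids.add(i);
        }
        ids.sort(Comparator.comparingDouble(i -> squaredDistance(centroids[i], v)));
        return ids;
    }

    // Insert: put the vector into the list of its nearest centroid.
    void add(float[] v) {
        lists.get(listsByDistance(v).get(0)).add(v);
    }

    // Approximate search: exact scan over the `probes` closest lists only.
    // Returns null if no vector is found in the scanned lists.
    float[] search(float[] query, int probes) {
        float[] best = null;
        double bestDistance = Double.POSITIVE_INFINITY;
        List<Integer> scanned = listsByDistance(query).subList(0, Math.min(probes, centroids.length));
        for (int listId : scanned) {
            for (float[] v : lists.get(listId)) {
                double d = squaredDistance(v, query);
                if (d < bestDistance) {
                    bestDistance = d;
                    best = v;
                }
            }
        }
        return best;
    }
}
```

For example, with centroids (0, 0) and (10, 0), the vector (4.9, 0) lands in the first list and (9, 0) in the second. A query at (5.2, 0) is closer to the second centroid, so search(query, 1) returns (9, 0) and misses the true nearest neighbor (4.9, 0), while search(query, 2) finds it. Scanning more lists raises recall at the cost of speed, which is the tradeoff mentioned for useIndex.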

Examples

To try PGVector, you need a PostgreSQL instance with the PGVector extension. You can run one in Docker, either started manually as shown below or managed from code with Testcontainers.

Quick Start with Docker

To quickly set up a PostgreSQL instance with the PGVector extension, you can use the following Docker command:

docker run --rm --name langchain4j-postgres-test-container -p 5432:5432 -e POSTGRES_USER=my_user -e POSTGRES_PASSWORD=my_password pgvector/pgvector

Explanation of the Command:

  • docker run: Runs a new container.
  • --rm: Automatically removes the container after it stops, ensuring no residual data.
  • --name langchain4j-postgres-test-container: Names the container langchain4j-postgres-test-container for easy identification.
  • -p 5432:5432: Maps port 5432 on your local machine to port 5432 in the container.
  • -e POSTGRES_USER=my_user: Sets the PostgreSQL username to my_user.
  • -e POSTGRES_PASSWORD=my_password: Sets the PostgreSQL password to my_password.
  • pgvector/pgvector: Specifies the Docker image to use, pre-configured with the PGVector extension.

Here are two code examples showing how to create a PgVectorEmbeddingStore. The first uses only the required parameters; the second also sets the optional ones.

  1. Only Required Parameters

import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.pgvector.PgVectorEmbeddingStore;

// embeddingModel is the EmbeddingModel instance your application already uses
EmbeddingStore<TextSegment> embeddingStore = PgVectorEmbeddingStore.builder()
        .host("localhost")                     // Required: host of the PostgreSQL instance
        .port(5432)                            // Required: port of the PostgreSQL instance
        .database("postgres")                  // Required: database name
        .user("my_user")                       // Required: database user
        .password("my_password")               // Required: database password
        .table("my_embeddings")                // Required: table that stores the embeddings
        .dimension(embeddingModel.dimension()) // Required: dimension of the embeddings
        .build();
  2. All Parameters Set

This variant also sets the optional parameters useIndex, indexListSize, createTable, dropTableFirst, and metadataStorageConfig. Adjust the values as needed:

import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.pgvector.MetadataStorageConfig;
import dev.langchain4j.store.embedding.pgvector.PgVectorEmbeddingStore;

EmbeddingStore<TextSegment> embeddingStore = PgVectorEmbeddingStore.builder()
        // Connection and table parameters
        .host("localhost")
        .port(5432)
        .database("postgres")
        .user("my_user")
        .password("my_password")
        .table("my_embeddings")

        // Embedding dimension
        .dimension(embeddingModel.dimension()) // Required: must match the embedding model's output dimension

        // Indexing and performance options
        .useIndex(true)      // Create an IVFFlat index
        .indexListSize(100)  // Number of IVFFlat lists; must be greater than zero when useIndex is true

        // Table creation options
        .createTable(true)     // Create the table automatically if it doesn't exist
        .dropTableFirst(false) // Keep the existing table (set to true for a fresh start)

        // Metadata storage format
        .metadataStorageConfig(MetadataStorageConfig.combinedJsonb()) // Store metadata in a single JSONB column

        .build();

To connect through a pre-configured DataSource instead, supply it in place of host, port, database, user, and password, as described for the datasource property above. Do not set both.

Use the first example for a minimal configuration that gets you started quickly. Use the second when you need more control over indexing, table management, and metadata storage.
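Once the store is built, it is used through the generic EmbeddingStore interface. The sketch below embeds a few segments that carry metadata, stores them with addAll, and runs a similarity search restricted by a metadata filter. The class and method names (PgVectorUsageSketch, indexAndSearch) and the sample texts are hypothetical; the sketch assumes an EmbeddingModel instance is available and relies on the EmbeddingSearchRequest and MetadataFilterBuilder APIs from the langchain4j core module.

```java
import dev.langchain4j.data.document.Metadata;
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingMatch;
import dev.langchain4j.store.embedding.EmbeddingSearchRequest;
import dev.langchain4j.store.embedding.EmbeddingStore;

import java.util.List;

import static dev.langchain4j.store.embedding.filter.MetadataFilterBuilder.metadataKey;

// Hypothetical helper class: works with any EmbeddingStore, including PgVectorEmbeddingStore.
class PgVectorUsageSketch {

    static List<EmbeddingMatch<TextSegment>> indexAndSearch(EmbeddingModel embeddingModel,
                                                            EmbeddingStore<TextSegment> embeddingStore) {
        // Sample segments; the "source" metadata key is an arbitrary example.
        List<TextSegment> segments = List.of(
                TextSegment.from("PGVector stores embeddings in PostgreSQL.", Metadata.from("source", "docs")),
                TextSegment.from("IVFFlat trades recall for faster queries.", Metadata.from("source", "docs")),
                TextSegment.from("The cafeteria opens at 8 am.", Metadata.from("source", "notes")));

        // embedAll returns one embedding per segment, in the same order.
        List<Embedding> embeddings = embeddingModel.embedAll(segments).content();
        embeddingStore.addAll(embeddings, segments);

        Embedding queryEmbedding = embeddingModel.embed("How do I store vectors in Postgres?").content();
        EmbeddingSearchRequest request = EmbeddingSearchRequest.builder()
                .queryEmbedding(queryEmbedding)
                .maxResults(2)
                .filter(metadataKey("source").isEqualTo("docs")) // only segments whose source is "docs"
                .build();

        return embeddingStore.search(request).matches();
    }
}
```

Because the method depends only on the EmbeddingStore interface, the same code can run against an in-memory store in unit tests before it is pointed at PostgreSQL.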