Elasticsearch
Maven Dependency
<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j-elasticsearch</artifactId>
<version>1.13.0-beta23</version>
</dependency>
Overview
The langchain4j-elasticsearch module provides integration with Elasticsearch as an embedding store and content
retriever.
It comes with two main classes:
ElasticsearchEmbeddingStore: an implementation of the EmbeddingStore interface that uses Elasticsearch to store and retrieve embeddings.
ElasticsearchContentRetriever: an implementation of the ContentRetriever interface that uses Elasticsearch to retrieve relevant documents based on vector similarity search.
Both classes need an Elasticsearch Client to connect to the Elasticsearch server.
String apiKey = "VnVhQ2ZHY0JDZGJrU...";
ElasticsearchClient client = ElasticsearchClient.of(ec -> ec
.host("https://localhost:9200")
.apiKey(apiKey));
Note:
See the Elasticsearch documentation on how to create an ElasticsearchClient instance.
ElasticsearchEmbeddingStore
To create the ElasticsearchEmbeddingStore instance, you need to provide an ElasticsearchClient:
ElasticsearchEmbeddingStore store = ElasticsearchEmbeddingStore.builder()
.client(client)
.build();
It comes with the following options:
indexName: the name of the Elasticsearch index to use. Default is "default".
configuration: the ElasticsearchConfiguration to use. Default is ElasticsearchConfigurationKnn.
The previous code is equivalent to:
ElasticsearchEmbeddingStore store = ElasticsearchEmbeddingStore.builder()
.client(client)
.configuration(ElasticsearchConfigurationKnn.builder().build())
.indexName("default")
.build();
ElasticsearchContentRetriever
A ContentRetriever needs an embedding model:
EmbeddingModel embeddingModel = new AllMiniLmL6V2QuantizedEmbeddingModel();
To create an ElasticsearchContentRetriever instance, you need to provide the ElasticsearchClient and
the EmbeddingModel:
ElasticsearchContentRetriever contentRetriever = ElasticsearchContentRetriever.builder()
.client(client)
.embeddingModel(embeddingModel)
.build();
It comes with the following options:
configuration: the ElasticsearchConfiguration to use (see below). Default is ElasticsearchConfigurationKnn.
indexName: the name of the Elasticsearch index to use. Default is "default". The index is created automatically if it does not exist.
maxResults: the maximum number of results to retrieve. Default is 3.
minScore: the minimum score threshold for retrieved results. Default is 0.0.
filter: a Filter to apply during retrieval, if any. Default is null.
The previous code is equivalent to:
ElasticsearchContentRetriever contentRetriever = ElasticsearchContentRetriever.builder()
.client(client)
.embeddingModel(embeddingModel)
.configuration(ElasticsearchConfigurationKnn.builder().build())
.indexName("default")
.maxResults(3)
.minScore(0.0)
.filter(null)
.build();
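The effect of maxResults and minScore can be pictured with a small, library-free sketch. It illustrates the semantics of the two options only and is not the module's implementation: the ScoredText record and the select method are hypothetical names, and the sketch assumes the threshold is inclusive (score >= minScore) and that results are returned best-first.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only: mimics how maxResults and minScore narrow a result list.
// ScoredText and select(...) are hypothetical names, not part of langchain4j.
class RetrievalLimitsSketch {

    record ScoredText(String text, double score) {}

    // Keeps candidates scoring at least minScore, best first, capped at maxResults.
    static List<ScoredText> select(List<ScoredText> candidates, int maxResults, double minScore) {
        return candidates.stream()
                .filter(c -> c.score() >= minScore) // assumed inclusive threshold
                .sorted(Comparator.comparingDouble(ScoredText::score).reversed())
                .limit(maxResults)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<ScoredText> candidates = List.of(
                new ScoredText("a", 0.92),
                new ScoredText("b", 0.40),
                new ScoredText("c", 0.75),
                new ScoredText("d", 0.81));
        // With maxResults = 2 and minScore = 0.5, "b" is filtered out and "c" is cut by the cap.
        select(candidates, 2, 0.5).forEach(System.out::println);
    }
}
```

With the defaults listed above (maxResults of 3, minScore of 0.0), a threshold of 0.0 would only exclude candidates with a negative score, so in practice the cap is what limits the result.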
ElasticsearchConfiguration
An ElasticsearchConfiguration defines how the embedding store or content retriever will interact with the
Elasticsearch server. You can create your own configuration by implementing the ElasticsearchConfiguration interface,
or use one of the provided implementations:
ElasticsearchConfigurationKnn: uses approximate kNN queries (default).
ElasticsearchConfigurationScript: uses scriptScore queries. Note that this implementation uses cosine similarity.
ElasticsearchConfigurationFullText: uses full text search (for content retriever only).
ElasticsearchConfigurationHybrid: uses hybrid search (for content retriever only, requires a paid license). It combines a kNN vector query with a full text query.
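ElasticsearchConfigurationScript is described above as using cosine similarity. For readers unfamiliar with the measure, here is a minimal plain-Java sketch of it. It is not the script that runs inside Elasticsearch, and the class and method names are hypothetical; it only shows that the score depends on the angle between two vectors, not on their length.

```java
// Illustrative only: cosine similarity between two dense vectors in plain Java.
// CosineSimilaritySketch and cosine(...) are hypothetical names, not part of the module.
class CosineSimilaritySketch {

    // Returns dot(a, b) / (|a| * |b|), a value between -1 and 1.
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("Cosine similarity is undefined for a zero vector");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] query = {1.0, 0.0};
        System.out.println(cosine(query, new double[] {2.0, 0.0}));  // same direction
        System.out.println(cosine(query, new double[] {0.0, 3.0}));  // orthogonal
        System.out.println(cosine(query, new double[] {-1.0, 0.0})); // opposite direction
    }
}
```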
To create a configuration instance, you can use the builder provided by each implementation. For example:
ElasticsearchConfiguration configuration = ElasticsearchConfigurationKnn.builder().build();
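The hybrid configuration listed above combines a kNN vector query with a full text query, but this page does not say how the two result lists are merged. One widely used technique for this kind of merge is reciprocal rank fusion (RRF). The sketch below assumes RRF purely for illustration; whether ElasticsearchConfigurationHybrid uses it is an assumption, as are the class name, the method names, and the rank constant of 60.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative only: reciprocal rank fusion over two ranked lists of document ids.
// This is an assumed merge strategy, not confirmed to be what the module does.
class RrfSketch {

    // Each document gets the sum of 1 / (k + rank) over every list it appears in (rank starts at 1).
    static Map<String, Double> fuse(List<List<String>> rankings, int k) {
        Map<String, Double> scores = new LinkedHashMap<>();
        for (List<String> ranking : rankings) {
            for (int i = 0; i < ranking.size(); i++) {
                int rank = i + 1;
                scores.merge(ranking.get(i), 1.0 / (k + rank), Double::sum);
            }
        }
        return scores;
    }

    // Returns document ids ordered by fused score, highest first.
    static List<String> fusedOrder(List<List<String>> rankings, int k) {
        return fuse(rankings, k).entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> vectorRanking = List.of("a", "b", "c"); // hypothetical kNN result order
        List<String> textRanking = List.of("b", "d", "a");   // hypothetical full text result order
        System.out.println(fusedOrder(List.of(vectorRanking, textRanking), 60));
    }
}
```

Documents that rank well in both lists ("a" and "b" here) end up ahead of documents that appear in only one, which is the behavior a hybrid search is usually after.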
ElasticsearchConfigurationKnn
The ElasticsearchConfigurationKnn uses approximate kNN queries to perform vector similarity search.
It is the default configuration used by both ElasticsearchEmbeddingStore
and ElasticsearchContentRetriever.
To create an instance, you can use the builder:
ElasticsearchConfiguration configuration = ElasticsearchConfigurationKnn.builder().build();
It comes with the following options:
numCandidates: the number of candidate neighbors to consider during the search. Default is null, which means the Elasticsearch default value is used.
includeVectorResponse: whether to include vector fields in the search response. Default is false.
Note: From version 9.2 of the Elasticsearch server, vector fields are excluded from the response by default. To include vector fields in the responses (not recommended), set includeVectorResponse in the builder:
ElasticsearchConfigurationKnn configuration = ElasticsearchConfigurationKnn.builder().includeVectorResponse(true).build();