Elasticsearch
Maven Dependency
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-elasticsearch</artifactId>
    <version>1.10.0-beta18</version>
</dependency>
Overview
The langchain4j-elasticsearch module provides integration with Elasticsearch as an embedding store and content
retriever.
It comes with two main classes:
ElasticsearchEmbeddingStore: an implementation of the EmbeddingStore interface that uses Elasticsearch to store and retrieve embeddings.
ElasticsearchContentRetriever: an implementation of the ContentRetriever interface that uses Elasticsearch to retrieve relevant documents based on vector similarity search.
Both classes need an Elasticsearch RestClient to connect to the Elasticsearch server.
String apiKey = "VnVhQ2ZHY0JDZGJrU...";
RestClient restClient = RestClient
    .builder(HttpHost.create("https://localhost:9200"))
    .setDefaultHeaders(new Header[]{
        new BasicHeader("Authorization", "ApiKey " + apiKey)
    })
    .build();
Note:
See the Elasticsearch documentation on how to create a RestClient instance.
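The API key in the snippet above is elided. As a hedged sketch of where such a value comes from: Elasticsearch's create-API-key response returns an id and an api_key, and the ApiKey authorization scheme expects the Base64 encoding of id:api_key (the response's encoded field already carries that value). The class name ApiKeyHeader and the sample credentials below are hypothetical and only illustrate the encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of langchain4j or the Elasticsearch client.
final class ApiKeyHeader {

    private ApiKeyHeader() {
    }

    // Builds the Authorization header value from an API key id and its secret.
    static String authorizationValue(String id, String apiKey) {
        String raw = id + ":" + apiKey;
        String encoded = Base64.getEncoder()
                .encodeToString(raw.getBytes(StandardCharsets.UTF_8));
        return "ApiKey " + encoded;
    }
}
```

For example, ApiKeyHeader.authorizationValue("id", "secret") yields a string that can be passed as the value of the Authorization header. If you already have the encoded value, use it directly as shown in the snippet above.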
ElasticsearchEmbeddingStore
To create the ElasticsearchEmbeddingStore instance, you need to provide an Elasticsearch RestClient:
ElasticsearchEmbeddingStore store = ElasticsearchEmbeddingStore.builder()
    .restClient(restClient)
    .build();
It comes with the following options:
indexName: the name of the Elasticsearch index to use. Default is default.
configuration: the ElasticsearchConfiguration to use. Default is ElasticsearchConfigurationKnn.
The previous code is equivalent to:
ElasticsearchEmbeddingStore store = ElasticsearchEmbeddingStore.builder()
    .restClient(restClient)
    .configuration(ElasticsearchConfigurationKnn.builder().build())
    .indexName("default")
    .build();
ElasticsearchContentRetriever
A ContentRetriever needs an embedding model:
EmbeddingModel embeddingModel = new AllMiniLmL6V2QuantizedEmbeddingModel();
To create an ElasticsearchContentRetriever instance, you need to provide the Elasticsearch RestClient and
the EmbeddingModel:
ElasticsearchContentRetriever contentRetriever = ElasticsearchContentRetriever.builder()
    .restClient(restClient)
    .embeddingModel(embeddingModel)
    .build();
It comes with the following options:
configuration: the ElasticsearchConfiguration to use (see below). Default is ElasticsearchConfigurationKnn.
indexName: the name of the Elasticsearch index to use. Default is default. The index is created automatically if it does not exist.
maxResults: the maximum number of results to retrieve. Default is 3.
minScore: the minimum score threshold for retrieved results. Default is 0.0.
filter: a Filter to apply during retrieval, if any. Default is null.
The previous code is equivalent to:
ElasticsearchContentRetriever contentRetriever = ElasticsearchContentRetriever.builder()
    .restClient(restClient)
    .embeddingModel(embeddingModel)
    .configuration(ElasticsearchConfigurationKnn.builder().build())
    .indexName("default")
    .maxResults(3)
    .minScore(0.0)
    .filter(null)
    .build();
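To make the meaning of maxResults and minScore concrete, here is a minimal plain-Java sketch of the selection they describe: drop hits scoring below the threshold, then keep the best-scoring ones up to the limit. This illustrates the semantics only. The class ResultFilterSketch and its Hit type are hypothetical, and the module itself may apply these constraints inside the Elasticsearch query rather than in Java.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration of the maxResults / minScore semantics.
final class ResultFilterSketch {

    // Stand-in for a retrieved document and its relevance score.
    static final class Hit {
        final String text;
        final double score;

        Hit(String text, double score) {
            this.text = text;
            this.score = score;
        }
    }

    private ResultFilterSketch() {
    }

    // Keeps hits with score >= minScore, best first, at most maxResults of them.
    static List<Hit> select(List<Hit> hits, int maxResults, double minScore) {
        return hits.stream()
                .filter(h -> h.score >= minScore)
                .sorted(Comparator.comparingDouble((Hit h) -> h.score).reversed())
                .limit(maxResults)
                .collect(Collectors.toList());
    }
}
```

With the defaults (maxResults of 3 and minScore of 0.0), no hit with a non-negative score is filtered out, and only the three highest-scoring hits are returned.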
ElasticsearchConfiguration
An ElasticsearchConfiguration defines how the embedding store or content retriever will interact with the
Elasticsearch server. You can create your own configuration by implementing the ElasticsearchConfiguration interface,
or use one of the provided implementations:
ElasticsearchConfigurationKnn: uses approximate kNN queries (default).
ElasticsearchConfigurationScript: uses scriptScore queries. Note that this implementation uses cosine similarity.
ElasticsearchConfigurationFullText: uses full text search (for content retriever only).
ElasticsearchConfigurationHybrid: uses hybrid search (for content retriever only, requires a paid license). It combines a kNN vector query with a full text query.
To create a configuration instance, you can use the builder provided by each implementation. For example:
ElasticsearchConfiguration configuration = ElasticsearchConfigurationKnn.builder().build();
ElasticsearchConfigurationKnn
The ElasticsearchConfigurationKnn uses approximate kNN queries to perform vector similarity search.
It is the default configuration used by both ElasticsearchEmbeddingStore
and ElasticsearchContentRetriever.
To create an instance, you can use the builder:
ElasticsearchConfiguration configuration = ElasticsearchConfigurationKnn.builder().build();
It comes with the following options:
numCandidates: the number of candidate neighbors to consider during the search. Default is null, which means the Elasticsearch default value is used.
includeVectorResponse: whether to include vector fields in the search response. Default is false.
Note: From version 9.2 of the Elasticsearch server, vector fields are excluded from the response by default. To include vector fields in the responses, set includeVectorResponse in the builder:
ElasticsearchConfigurationKnn configuration = ElasticsearchConfigurationKnn.builder()
    .includeVectorResponse(true)
    .build();
ElasticsearchConfigurationScript
The ElasticsearchConfigurationScript uses scriptScore queries to perform vector similarity search. Note that this
implementation uses cosine similarity.
It is available for both ElasticsearchEmbeddingStore
and ElasticsearchContentRetriever.
To create an instance, you can use the builder:
ElasticsearchConfiguration configuration = ElasticsearchConfigurationScript.builder().build();
It comes with the following options:
includeVectorResponse: whether to include vector fields in the search response. Default is false.
Note: From version 9.2 of the Elasticsearch server, vector fields are excluded from the response by default. To include vector fields in the responses, set includeVectorResponse in the builder:
ElasticsearchConfiguration configuration = ElasticsearchConfigurationScript.builder()
    .includeVectorResponse(true)
    .build();
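Since this configuration scores documents by cosine similarity, a short plain-Java sketch of that measure may help when interpreting scores. The class CosineSketch is hypothetical. The rescaling from [-1, 1] to [0, 1] is an assumption: it is a common convention because Elasticsearch script scores cannot be negative, but this page does not state the exact formula the module uses.

```java
// Hypothetical illustration of cosine similarity between two embeddings.
final class CosineSketch {

    private CosineSketch() {
    }

    // Returns the cosine of the angle between a and b, in [-1, 1].
    // The result is NaN if either vector has zero length.
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Assumed rescaling: maps a cosine in [-1, 1] to a non-negative score in [0, 1].
    static double toRelevanceScore(double cosine) {
        return (cosine + 1.0) / 2.0;
    }
}
```

Under this assumed rescaling, identical directions score 1.0, orthogonal vectors score 0.5, and opposite directions score 0.0, which is worth keeping in mind when choosing a minScore threshold.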
ElasticsearchConfigurationFullText
The ElasticsearchConfigurationFullText uses full text search to retrieve relevant documents.
It is available for ElasticsearchContentRetriever only.
To create an instance, you can use the builder:
ElasticsearchConfiguration configuration = ElasticsearchConfigurationFullText.builder().build();
ElasticsearchConfigurationHybrid
The ElasticsearchConfigurationHybrid uses hybrid search to combine a kNN vector query with a full text query. Note
that hybrid search requires an Elasticsearch Enterprise license or a trial license.
It is available for ElasticsearchContentRetriever only.
To create an instance, you can use the builder:
ElasticsearchConfiguration configuration = ElasticsearchConfigurationHybrid.builder().build();
It comes with the following options:
numCandidates: the number of candidate neighbors to consider during the search. Default is null, which means the Elasticsearch default value is used.
includeVectorResponse: whether to include vector fields in the search response. Default is false.
Note: From version 9.2 of the Elasticsearch server, vector fields are excluded from the response by default. To include vector fields in the responses, set includeVectorResponse in the builder:
ElasticsearchConfiguration configuration = ElasticsearchConfigurationHybrid.builder()
    .includeVectorResponse(true)
    .build();
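This page does not say how the kNN ranking and the full text ranking are merged. One technique Elasticsearch offers for this is reciprocal rank fusion (RRF), where each document scores the sum of 1 / (rankConstant + rank) over every ranking it appears in. The sketch below assumes RRF purely for illustration. The class RrfSketch, its parameters, and the document ids are hypothetical, and the fusion itself would run on the Elasticsearch server, not in client code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration of reciprocal rank fusion over several rankings.
final class RrfSketch {

    private RrfSketch() {
    }

    // Each ranking lists document ids from best to worst.
    // Returns the topK ids ordered by their fused RRF score, best first.
    static List<String> fuse(List<List<String>> rankings, int rankConstant, int topK) {
        Map<String, Double> scores = new LinkedHashMap<>();
        for (List<String> ranking : rankings) {
            for (int i = 0; i < ranking.size(); i++) {
                int rank = i + 1; // ranks are 1-based
                scores.merge(ranking.get(i), 1.0 / (rankConstant + rank), Double::sum);
            }
        }
        return scores.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .limit(topK)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

A document that ranks reasonably well in both lists can outrank one that tops only a single list, which is the main appeal of combining vector and full text results.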
Creating Custom Configurations
You can create your own Elasticsearch configuration by implementing the ElasticsearchConfiguration interface. For example:
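public class MyElasticsearchConfiguration implements ElasticsearchConfiguration {

    // Interface methods are implicitly public, so each override must be declared public.
    @Override
    public SearchResponse<Document> vectorSearch(
            ElasticsearchClient client,
            String indexName,
            EmbeddingSearchRequest embeddingSearchRequest) {
        // Your custom vector search implementation here
        throw new UnsupportedOperationException("vectorSearch is not implemented yet");
    }

    @Override
    public SearchResponse<Document> fullTextSearch(
            ElasticsearchClient client,
            String indexName,
            String textQuery) {
        // Your custom full text search implementation here
        throw new UnsupportedOperationException("fullTextSearch is not implemented yet");
    }

    @Override
    public SearchResponse<Document> hybridSearch(
            ElasticsearchClient client,
            String indexName,
            EmbeddingSearchRequest embeddingSearchRequest,
            String textQuery) {
        // Your custom hybrid search implementation here
        throw new UnsupportedOperationException("hybridSearch is not implemented yet");
    }
}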