Package dev.langchain4j.store.embedding
Class EmbeddingStoreIngestor
java.lang.Object
dev.langchain4j.store.embedding.EmbeddingStoreIngestor
EmbeddingStoreIngestor represents an ingestion pipeline and is responsible for ingesting Documents into an EmbeddingStore.
In the simplest configuration, EmbeddingStoreIngestor embeds provided documents using a provided EmbeddingModel and stores them, along with their Embeddings, in an EmbeddingStore.
Optionally, the EmbeddingStoreIngestor can transform documents using a provided DocumentTransformer. This can be useful if you want to clean, enrich, or format documents before embedding them.
Optionally, the EmbeddingStoreIngestor can split documents into TextSegments using a provided DocumentSplitter. This can be useful if documents are big and you want to split them into smaller segments to improve the quality of similarity searches and reduce the size and cost of a prompt sent to the LLM.
Optionally, the EmbeddingStoreIngestor can transform TextSegments using a TextSegmentTransformer. This can be useful if you want to clean, enrich, or format TextSegments before embedding them. Including a document title or a short summary in each TextSegment is a common technique to improve the quality of similarity searches.
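The order of the stages described above can be sketched with plain Java functions. This is a minimal illustration, not the LangChain4j implementation: the class `IngestionPipelineSketch`, its `Entry` type, and the use of strings and lambdas in place of Document, TextSegment, EmbeddingModel and EmbeddingStore are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Toy stand-in for the ingestion pipeline; none of these types are LangChain4j classes.
final class IngestionPipelineSketch {

    // A stored entry: the segment text plus its (toy) embedding vector.
    static final class Entry {
        final String segment;
        final float[] embedding;

        Entry(String segment, float[] embedding) {
            this.segment = segment;
            this.embedding = embedding;
        }
    }

    private final UnaryOperator<String> documentTransformer;       // optional, may be null
    private final Function<String, List<String>> documentSplitter; // optional, may be null
    private final UnaryOperator<String> segmentTransformer;        // optional, may be null
    private final Function<String, float[]> embeddingModel;        // required
    private final List<Entry> store;                               // required

    IngestionPipelineSketch(UnaryOperator<String> documentTransformer,
                            Function<String, List<String>> documentSplitter,
                            UnaryOperator<String> segmentTransformer,
                            Function<String, float[]> embeddingModel,
                            List<Entry> store) {
        this.documentTransformer = documentTransformer;
        this.documentSplitter = documentSplitter;
        this.segmentTransformer = segmentTransformer;
        this.embeddingModel = embeddingModel;
        this.store = store;
    }

    // Runs every document through the stages in order and returns the number of stored segments.
    int ingest(List<String> documents) {
        int stored = 0;
        for (String document : documents) {
            // 1. Optional document transformation (clean, enrich, format).
            String text = documentTransformer == null ? document : documentTransformer.apply(document);
            // 2. Optional splitting; without a splitter the whole document is one segment.
            List<String> segments = documentSplitter == null ? List.of(text) : documentSplitter.apply(text);
            for (String segment : segments) {
                // 3. Optional segment transformation, e.g. prepending a title.
                String prepared = segmentTransformer == null ? segment : segmentTransformer.apply(segment);
                // 4. Embed the segment and store it together with its embedding.
                store.add(new Entry(prepared, embeddingModel.apply(prepared)));
                stored++;
            }
        }
        return stored;
    }

    public static void main(String[] args) {
        List<Entry> store = new ArrayList<>();
        IngestionPipelineSketch pipeline = new IngestionPipelineSketch(
                String::trim,                               // document transformer
                text -> List.of(text.split("\\. ")),        // naive sentence splitter
                segment -> "Title: Demo. " + segment,       // segment transformer adding a title
                segment -> new float[] {segment.length()},  // toy "embedding"
                store);
        int count = pipeline.ingest(List.of("  First sentence. Second sentence  "));
        System.out.println(count + " segments stored");
    }
}
```

In this sketch, passing `null` for any of the three optional stages skips that stage, which mirrors the "simplest configuration" where documents are only embedded and stored.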
Nested Class Summary
static class: EmbeddingStoreIngestor builder.
Constructor Summary
EmbeddingStoreIngestor(DocumentTransformer documentTransformer, DocumentSplitter documentSplitter, TextSegmentTransformer textSegmentTransformer, EmbeddingModel embeddingModel, EmbeddingStore<TextSegment> embeddingStore)
    Creates an instance of an EmbeddingStoreIngestor.
Method Summary
builder()
    Creates a new EmbeddingStoreIngestor builder.
ingest (parameter: document)
    Ingests a specified document into an EmbeddingStore that was specified during the creation of this EmbeddingStoreIngestor.
ingest (parameter: documents)
    Ingests specified documents into an EmbeddingStore that was specified during the creation of this EmbeddingStoreIngestor.
static IngestionResult ingest(Document document, EmbeddingStore<TextSegment> embeddingStore)
    Ingests a specified Document into a specified EmbeddingStore.
ingest (parameter: documents)
    Ingests specified documents into an EmbeddingStore that was specified during the creation of this EmbeddingStoreIngestor.
static IngestionResult ingest(List<Document> documents, EmbeddingStore<TextSegment> embeddingStore)
    Ingests specified Documents into a specified EmbeddingStore.
Constructor Details
EmbeddingStoreIngestor
public EmbeddingStoreIngestor(DocumentTransformer documentTransformer, DocumentSplitter documentSplitter, TextSegmentTransformer textSegmentTransformer, EmbeddingModel embeddingModel, EmbeddingStore<TextSegment> embeddingStore)
Creates an instance of an EmbeddingStoreIngestor.
Parameters:
documentTransformer - The DocumentTransformer to use. Optional.
documentSplitter - The DocumentSplitter to use. Optional. If none is specified, it tries to load one through SPI (see DocumentSplitterFactory).
textSegmentTransformer - The TextSegmentTransformer to use. Optional.
embeddingModel - The EmbeddingModel to use. Mandatory. If none is specified, it tries to load one through SPI (see EmbeddingModelFactory).
embeddingStore - The EmbeddingStore to use. Mandatory.
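The "load one through SPI" fallback mentioned for documentSplitter and embeddingModel can be pictured with the standard `java.util.ServiceLoader`. The sketch below is an assumption about the general mechanism, not the library's code: `SpiFallbackSketch`, `SplitterFactory` and `resolve` are hypothetical names, and a string stands in for the component that a real factory would create.

```java
import java.util.Optional;
import java.util.ServiceLoader;

// Toy illustration of "use the explicit value, otherwise look one up through SPI".
final class SpiFallbackSketch {

    // Hypothetical factory interface standing in for factories such as DocumentSplitterFactory.
    interface SplitterFactory {
        String create(); // returns a description of the component, for illustration only
    }

    // Returns the explicitly supplied component, else the first SPI-provided one, else empty.
    static Optional<String> resolve(String explicit) {
        if (explicit != null) {
            return Optional.of(explicit);
        }
        // ServiceLoader discovers implementations registered on the classpath or module path.
        for (SplitterFactory factory : ServiceLoader.load(SplitterFactory.class)) {
            return Optional.of(factory.create());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(resolve("explicit splitter").isPresent());
        // With no provider registered, the lookup finds nothing.
        System.out.println(resolve(null).isPresent());
    }
}
```

On the classpath, a provider is registered by a `META-INF/services` file named after the factory interface's binary name; what happens when no provider is found is up to the caller (here, an empty `Optional`).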
Method Details
ingest
static IngestionResult ingest(Document document, EmbeddingStore<TextSegment> embeddingStore)
Ingests a specified Document into a specified EmbeddingStore.
Uses the DocumentSplitter and EmbeddingModel found through SPIs (see DocumentSplitterFactory and EmbeddingModelFactory).
For "Easy RAG", import the langchain4j-easy-rag module, which contains DocumentSplitterFactory and EmbeddingModelFactory implementations.
Returns:
result including information related to the ingestion process.
ingest
public static IngestionResult ingest(List<Document> documents, EmbeddingStore<TextSegment> embeddingStore)
Ingests specified Documents into a specified EmbeddingStore.
Uses the DocumentSplitter and EmbeddingModel found through SPIs (see DocumentSplitterFactory and EmbeddingModelFactory).
For "Easy RAG", import the langchain4j-easy-rag module, which contains DocumentSplitterFactory and EmbeddingModelFactory implementations.
Returns:
result including information related to the ingestion process.
ingest
Ingests a specified document into an EmbeddingStore that was specified during the creation of this EmbeddingStoreIngestor.
Parameters:
document - the document to ingest.
Returns:
result including information related to the ingestion process.
ingest
Ingests specified documents into an EmbeddingStore that was specified during the creation of this EmbeddingStoreIngestor.
Parameters:
documents - the documents to ingest.
Returns:
result including information related to the ingestion process.
ingest
Ingests specified documents into an EmbeddingStore that was specified during the creation of this EmbeddingStoreIngestor.
Parameters:
documents - the documents to ingest.
Returns:
result including information related to the ingestion process.
builder
Creates a new EmbeddingStoreIngestor builder.
Returns:
the builder.
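A builder of this kind usually lets callers set only the components they need and validates the mandatory ones when the object is built. The following is a rough sketch of that pattern under stated assumptions: `IngestorBuilderSketch` is a hypothetical class, strings stand in for the real component types, the setter names are assumed to mirror the constructor parameter names, and the null checks are this example's choice rather than documented behavior.

```java
import java.util.Objects;

// Toy builder; strings stand in for the real component types.
final class IngestorBuilderSketch {

    final String documentSplitter; // optional, may be null
    final String embeddingModel;   // mandatory in this sketch
    final String embeddingStore;   // mandatory in this sketch

    private IngestorBuilderSketch(Builder b) {
        this.documentSplitter = b.documentSplitter;
        this.embeddingModel = Objects.requireNonNull(b.embeddingModel, "embeddingModel is mandatory");
        this.embeddingStore = Objects.requireNonNull(b.embeddingStore, "embeddingStore is mandatory");
    }

    static Builder builder() {
        return new Builder();
    }

    static final class Builder {
        private String documentSplitter;
        private String embeddingModel;
        private String embeddingStore;

        Builder documentSplitter(String value) { this.documentSplitter = value; return this; }
        Builder embeddingModel(String value) { this.embeddingModel = value; return this; }
        Builder embeddingStore(String value) { this.embeddingStore = value; return this; }

        // Validation happens here, so a half-configured object is never returned.
        IngestorBuilderSketch build() {
            return new IngestorBuilderSketch(this);
        }
    }

    public static void main(String[] args) {
        IngestorBuilderSketch ingestor = IngestorBuilderSketch.builder()
                .embeddingModel("toy model")
                .embeddingStore("toy store")
                .build();
        System.out.println("splitter configured: " + (ingestor.documentSplitter != null));
    }
}
```

Compared with the five-argument constructor, the fluent form avoids passing explicit `null` values for optional components.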