Class EmbeddingStoreIngestor

java.lang.Object
dev.langchain4j.store.embedding.EmbeddingStoreIngestor

public class EmbeddingStoreIngestor extends Object
The EmbeddingStoreIngestor represents an ingestion pipeline and is responsible for ingesting Documents into an EmbeddingStore.

In the simplest configuration, EmbeddingStoreIngestor embeds the provided documents using a provided EmbeddingModel and stores them, along with their Embeddings, in an EmbeddingStore.
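
The embed-and-store flow described above can be illustrated with a minimal, library-free sketch. It does not use the langchain4j types: documents are plain strings, the store is a list, and SimpleIngestSketch, Entry, toyEmbed, and ingest are hypothetical names introduced only for illustration. The toy embedding function stands in for a real EmbeddingModel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class SimpleIngestSketch {

    /** Hypothetical stored entry: an embedding kept together with the text it was computed from. */
    public static final class Entry {
        public final float[] embedding;
        public final String text;

        public Entry(float[] embedding, String text) {
            this.embedding = embedding;
            this.text = text;
        }
    }

    /** Toy embedding: [character count, vowel count]. A real model would return a dense semantic vector. */
    public static float[] toyEmbed(String text) {
        int vowels = 0;
        for (char c : text.toCharArray()) {
            if ("aeiouAEIOU".indexOf(c) >= 0) {
                vowels++;
            }
        }
        return new float[] { text.length(), vowels };
    }

    /** Embeds each document and stores the embedding alongside the original text. */
    public static List<Entry> ingest(List<String> documents, Function<String, float[]> embeddingModel) {
        List<Entry> store = new ArrayList<>();
        for (String document : documents) {
            store.add(new Entry(embeddingModel.apply(document), document));
        }
        return store;
    }

    public static void main(String[] args) {
        List<Entry> store = ingest(List.of("hello world", "embedding store"), SimpleIngestSketch::toyEmbed);
        System.out.println(store.size() + " entries stored");
    }
}
```

Keeping the original text next to its embedding is what lets a later similarity search return readable content rather than only vectors.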

Optionally, the EmbeddingStoreIngestor can transform documents using a provided DocumentTransformer. This can be useful if you want to clean, enrich, or format documents before embedding them.

Optionally, the EmbeddingStoreIngestor can split documents into TextSegments using a provided DocumentSplitter. This can be useful if documents are large and you want to split them into smaller segments, which can improve the quality of similarity searches and reduce the size and cost of the prompt sent to the LLM.
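
To make the idea of splitting concrete, here is a minimal sketch of one possible strategy: greedily packing whole words into segments of at most a given number of characters. It is not the behavior of any particular DocumentSplitter implementation; SplitSketch and split are hypothetical names, and segments are modeled as plain strings.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {

    /**
     * Greedily packs whitespace-separated words into segments of at most maxChars characters.
     * A single word longer than maxChars is kept whole in its own segment.
     */
    public static List<String> split(String document, int maxChars) {
        List<String> segments = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : document.trim().split("\\s+")) {
            // Close the current segment if adding a space plus this word would exceed the limit.
            if (current.length() > 0 && current.length() + 1 + word.length() > maxChars) {
                segments.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(word);
        }
        if (current.length() > 0) {
            segments.add(current.toString());
        }
        return segments;
    }

    public static void main(String[] args) {
        System.out.println(split("one two three four", 9));
    }
}
```

Smaller segments mean each embedding covers a narrower topic, and only the relevant segments, not the whole document, need to be placed in the prompt.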

Optionally, the EmbeddingStoreIngestor can transform TextSegments using a TextSegmentTransformer. This can be useful if you want to clean, enrich, or format TextSegments before embedding them.
Including a document title or a short summary in each TextSegment is a common technique to improve the quality of similarity searches.
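
The title-prefixing technique mentioned above can be sketched as follows. This is a simplified illustration, not a TextSegmentTransformer implementation: segments are plain strings, the title is passed in explicitly, and TitlePrefixSketch and withTitle are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class TitlePrefixSketch {

    /** Returns new segments, each prefixed with the document title on its own line. */
    public static List<String> withTitle(String title, List<String> segments) {
        List<String> result = new ArrayList<>();
        for (String segment : segments) {
            result.add(title + "\n" + segment);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> segments = List.of("Install the package.", "Run the setup script.");
        for (String segment : withTitle("Getting Started Guide", segments)) {
            System.out.println(segment);
            System.out.println("---");
        }
    }
}
```

Because the title is embedded together with the segment text, a segment that never mentions its topic by name can still match queries about that topic.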