Uses of Interface
dev.langchain4j.data.document.DocumentSplitter
Packages that use DocumentSplitter
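Package
    dev.langchain4j.data.document.splitter
    dev.langchain4j.data.document.splitter.oracle
    dev.langchain4j.data.document.splitter.recursive
    dev.langchain4j.spi.data.document.splitter
    dev.langchain4j.store.embedding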
Uses of DocumentSplitter in dev.langchain4j.data.document.splitter
Classes in dev.langchain4j.data.document.splitter that implement DocumentSplitter
Modifier and Type    Class    Description

class DocumentByCharacterSplitter
    Splits the provided Document into characters and attempts to fit as many characters as possible into a single TextSegment, adhering to the limit set by maxSegmentSize.
class DocumentByLineSplitter
    Splits the provided Document into lines and attempts to fit as many lines as possible into a single TextSegment, adhering to the limit set by maxSegmentSize.
class DocumentByParagraphSplitter
    Splits the provided Document into paragraphs and attempts to fit as many paragraphs as possible into a single TextSegment, adhering to the limit set by maxSegmentSize.
class DocumentByRegexSplitter
    Splits the provided Document into parts using the provided regex and attempts to fit as many parts as possible into a single TextSegment, adhering to the limit set by maxSegmentSize.
class DocumentBySentenceSplitter
    Splits the provided Document into sentences and attempts to fit as many sentences as possible into a single TextSegment, adhering to the limit set by maxSegmentSize.
class DocumentByWordSplitter
    Splits the provided Document into words and attempts to fit as many words as possible into a single TextSegment, adhering to the limit set by maxSegmentSize.
class HierarchicalDocumentSplitter
    Base class for hierarchical document splitters.

Fields in dev.langchain4j.data.document.splitter declared as DocumentSplitter
Modifier and Type    Field    Description

protected final DocumentSplitter HierarchicalDocumentSplitter.subSplitter

Methods in dev.langchain4j.data.document.splitter that return DocumentSplitter
Modifier and Type    Method    Description

protected DocumentSplitter DocumentByCharacterSplitter.defaultSubSplitter()
protected DocumentSplitter DocumentByLineSplitter.defaultSubSplitter()
protected DocumentSplitter DocumentByParagraphSplitter.defaultSubSplitter()
protected DocumentSplitter DocumentByRegexSplitter.defaultSubSplitter()
protected DocumentSplitter DocumentBySentenceSplitter.defaultSubSplitter()
protected DocumentSplitter DocumentByWordSplitter.defaultSubSplitter()
protected abstract DocumentSplitter HierarchicalDocumentSplitter.defaultSubSplitter()
    The default sub-splitter to use when a single segment is too long.
static DocumentSplitter DocumentSplitters.recursive(int maxSegmentSizeInChars, int maxOverlapSizeInChars)
    This is a recommended DocumentSplitter for generic text.
static DocumentSplitter DocumentSplitters.recursive(int maxSegmentSizeInTokens, int maxOverlapSizeInTokens, TokenCountEstimator tokenCountEstimator)
    This is a recommended DocumentSplitter for generic text.

Constructors in dev.langchain4j.data.document.splitter with parameters of type DocumentSplitter
Modifier    Constructor    Description

DocumentByCharacterSplitter(int maxSegmentSizeInChars, int maxOverlapSizeInChars, DocumentSplitter subSplitter)
DocumentByCharacterSplitter(int maxSegmentSizeInTokens, int maxOverlapSizeInTokens, TokenCountEstimator tokenCountEstimator, DocumentSplitter subSplitter)
DocumentByLineSplitter(int maxSegmentSizeInChars, int maxOverlapSizeInChars, DocumentSplitter subSplitter)
DocumentByLineSplitter(int maxSegmentSizeInTokens, int maxOverlapSizeInTokens, TokenCountEstimator tokenCountEstimator, DocumentSplitter subSplitter)
DocumentByParagraphSplitter(int maxSegmentSizeInChars, int maxOverlapSizeInChars, DocumentSplitter subSplitter)
DocumentByParagraphSplitter(int maxSegmentSizeInTokens, int maxOverlapSizeInTokens, TokenCountEstimator tokenCountEstimator, DocumentSplitter subSplitter)
DocumentByRegexSplitter(String regex, String joinDelimiter, int maxSegmentSizeInChars, int maxOverlapSizeInChars, DocumentSplitter subSplitter)
DocumentByRegexSplitter(String regex, String joinDelimiter, int maxSegmentSizeInTokens, int maxOverlapSizeInTokens, TokenCountEstimator tokenCountEstimator, DocumentSplitter subSplitter)
DocumentBySentenceSplitter(int maxSegmentSizeInChars, int maxOverlapSizeInChars, DocumentSplitter subSplitter)
DocumentBySentenceSplitter(int maxSegmentSizeInTokens, int maxOverlapSizeInTokens, TokenCountEstimator tokenCountEstimator, DocumentSplitter subSplitter)
DocumentBySentenceSplitter(int maxSegmentSizeInTokens, int maxOverlapSizeInTokens, TokenCountEstimator tokenCountEstimator, DocumentSplitter subSplitter, opennlp.tools.sentdetect.SentenceModel sentenceModel)
DocumentByWordSplitter(int maxSegmentSizeInChars, int maxOverlapSizeInChars, DocumentSplitter subSplitter)
DocumentByWordSplitter(int maxSegmentSizeInTokens, int maxOverlapSizeInTokens, TokenCountEstimator tokenCountEstimator, DocumentSplitter subSplitter)
protected HierarchicalDocumentSplitter(int maxSegmentSizeInTokens, int maxOverlapSizeInTokens, TokenCountEstimator tokenCountEstimator, DocumentSplitter subSplitter)
    Creates a new instance of HierarchicalDocumentSplitter.
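The splitters in this package share one described behavior: fit as many parts as possible into a single segment without exceeding maxSegmentSize, and hand a part that is too long on its own to a sub-splitter. The following is a minimal, library-free sketch of that idea, not the langchain4j implementation. The names GreedySplitSketch, Splitter, byWord and byCharacter are hypothetical, segments are plain strings rather than TextSegment objects, sizes are counted in characters only, and overlap (maxOverlapSize) is left out entirely. Which sub-splitter each real class returns from defaultSubSplitter() is not stated on this page, so the word-to-character chain below is an assumption chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class GreedySplitSketch {

    /** Hypothetical stand-in for a splitter: text in, size-limited segments out. */
    interface Splitter {
        List<String> split(String text);
    }

    /** Last-resort splitter: cuts text into runs of at most maxChars characters. */
    static Splitter byCharacter(int maxChars) {
        return text -> {
            List<String> out = new ArrayList<>();
            for (int i = 0; i < text.length(); i += maxChars) {
                out.add(text.substring(i, Math.min(text.length(), i + maxChars)));
            }
            return out;
        };
    }

    /**
     * Word-level splitter: packs as many whitespace-separated words as fit
     * into each segment, and delegates any single word longer than maxChars
     * to the given sub-splitter.
     */
    static Splitter byWord(int maxChars, Splitter subSplitter) {
        return text -> {
            List<String> segments = new ArrayList<>();
            StringBuilder current = new StringBuilder();
            for (String word : text.trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // happens only for blank input
                }
                if (word.length() > maxChars) {
                    // The part alone exceeds the limit: close the open
                    // segment and let the sub-splitter break the part up.
                    if (current.length() > 0) {
                        segments.add(current.toString());
                        current.setLength(0);
                    }
                    segments.addAll(subSplitter.split(word));
                    continue;
                }
                // Length of the open segment if this word were appended.
                int needed = current.length() == 0
                        ? word.length()
                        : current.length() + 1 + word.length();
                if (needed > maxChars) {
                    segments.add(current.toString());
                    current.setLength(0);
                }
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(word);
            }
            if (current.length() > 0) {
                segments.add(current.toString());
            }
            return segments;
        };
    }

    public static void main(String[] args) {
        Splitter splitter = byWord(10, byCharacter(10));
        for (String s : splitter.split("alpha beta gamma supercalifragilistic delta")) {
            System.out.println("[" + s + "]");
        }
    }
}
```

With a limit of 10 characters, "alpha beta" fills one segment exactly, "gamma" starts the next, and the 20-character word is passed to the character-level sub-splitter, which returns two 10-character pieces. Passing the sub-splitter in as an argument mirrors the subSplitter constructor parameter listed above: the fallback is chosen by the caller, not hard-coded.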
Uses of DocumentSplitter in dev.langchain4j.data.document.splitter.oracle
Classes in dev.langchain4j.data.document.splitter.oracle that implement DocumentSplitter
Modifier and Type    Class    Description

class
    Splits documents using dbms_vector_chain.utl_to_chunks.
Uses of DocumentSplitter in dev.langchain4j.data.document.splitter.recursive
Methods in dev.langchain4j.data.document.splitter.recursive that return DocumentSplitter
Uses of DocumentSplitter in dev.langchain4j.spi.data.document.splitter
Methods in dev.langchain4j.spi.data.document.splitter that return DocumentSplitter
Uses of DocumentSplitter in dev.langchain4j.store.embedding
Methods in dev.langchain4j.store.embedding with parameters of type DocumentSplitter
Modifier and Type    Method    Description

EmbeddingStoreIngestor.Builder.documentSplitter(DocumentSplitter documentSplitter)
    Sets the document splitter.

Constructors in dev.langchain4j.store.embedding with parameters of type DocumentSplitter
Modifier    Constructor    Description

EmbeddingStoreIngestor(DocumentTransformer documentTransformer, DocumentSplitter documentSplitter, TextSegmentTransformer textSegmentTransformer, EmbeddingModel embeddingModel, EmbeddingStore<TextSegment> embeddingStore)
    Creates an instance of an EmbeddingStoreIngestor.