Class DocumentSplitters
java.lang.Object
dev.langchain4j.data.document.splitter.DocumentSplitters
Constructor Summary
Constructors
DocumentSplitters()
Method Summary
Modifier and Type | Method | Description
static DocumentSplitter | recursive(int maxSegmentSizeInChars, int maxOverlapSizeInChars) | This is a recommended DocumentSplitter for generic text.
static DocumentSplitter | recursive(int maxSegmentSizeInTokens, int maxOverlapSizeInTokens, TokenCountEstimator tokenCountEstimator) | This is a recommended DocumentSplitter for generic text.
Constructor Details
DocumentSplitters
public DocumentSplitters()
Method Details
recursive
public static DocumentSplitter recursive(int maxSegmentSizeInTokens, int maxOverlapSizeInTokens, TokenCountEstimator tokenCountEstimator)
This is a recommended DocumentSplitter for generic text. It tries to split the document into paragraphs first and fits as many paragraphs into a single TextSegment as possible. If some paragraphs are too long, they are recursively split into lines, then sentences, then words, and then characters until they fit into a segment.
Parameters:
maxSegmentSizeInTokens - The maximum size of the segment, defined in tokens.
maxOverlapSizeInTokens - The maximum size of the overlap, defined in tokens. Only full sentences are considered for the overlap.
tokenCountEstimator - The TokenCountEstimator that is used to count tokens in the text.
Returns:
recursive document splitter
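The token limit and the sentence-level overlap can be illustrated with a small standalone sketch. It is not the LangChain4j implementation: the class TokenOverlapSketch, the method pack, and the word-counting WORD_COUNTER (a stand-in for a TokenCountEstimator) are hypothetical names introduced here, and the sketch assumes the text has already been split into sentences that each fit within the limit.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.function.ToIntFunction;

// Illustrative only: packs sentences into segments under a token limit and
// carries whole trailing sentences over as overlap. Not the library's code.
class TokenOverlapSketch {

    // Hypothetical stand-in for a token estimator: one whitespace-separated word = one token.
    static final ToIntFunction<String> WORD_COUNTER =
            text -> text.trim().isEmpty() ? 0 : text.trim().split("\\s+").length;

    static List<String> pack(List<String> sentences, int maxTokens, int maxOverlapTokens,
                             ToIntFunction<String> estimator) {
        List<String> segments = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String sentence : sentences) {
            // Assumption: every sentence fits alone; a real splitter would recurse to words.
            if (estimator.applyAsInt(sentence) > maxTokens) {
                throw new IllegalArgumentException("Sentence exceeds maxTokens: " + sentence);
            }
            if (!current.isEmpty() && tokens(current, sentence, estimator) > maxTokens) {
                segments.add(String.join(" ", current));
                // Start the next segment with as many full trailing sentences as the overlap allows.
                current = overlapTail(current, maxOverlapTokens, estimator);
                if (tokens(current, sentence, estimator) > maxTokens) {
                    current = new ArrayList<>(); // overlap would not leave room, so drop it
                }
            }
            current.add(sentence);
        }
        if (!current.isEmpty()) {
            segments.add(String.join(" ", current));
        }
        return segments;
    }

    // Token count of the current sentences joined with one more sentence.
    private static int tokens(List<String> current, String next, ToIntFunction<String> estimator) {
        List<String> candidate = new ArrayList<>(current);
        candidate.add(next);
        return estimator.applyAsInt(String.join(" ", candidate));
    }

    // Longest run of whole sentences at the end whose joined size fits the overlap limit.
    private static List<String> overlapTail(List<String> sentences, int maxOverlapTokens,
                                            ToIntFunction<String> estimator) {
        Deque<String> tail = new ArrayDeque<>();
        for (int i = sentences.size() - 1; i >= 0; i--) {
            tail.addFirst(sentences.get(i));
            if (estimator.applyAsInt(String.join(" ", tail)) > maxOverlapTokens) {
                tail.removeFirst();
                break;
            }
        }
        return new ArrayList<>(tail);
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList("A b c.", "D e.", "F g h.", "I j.");
        pack(sentences, 5, 2, WORD_COUNTER).forEach(System.out::println);
    }
}
```

With a limit of 5 tokens and an overlap of 2, the sentence "D e." closes the first segment and opens the second, while "F g h." (3 tokens) is too large to be carried over, so the third segment starts without overlap. With the library itself, the equivalent call would follow the signature above: DocumentSplitters.recursive(maxSegmentSizeInTokens, maxOverlapSizeInTokens, tokenCountEstimator).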
recursive
public static DocumentSplitter recursive(int maxSegmentSizeInChars, int maxOverlapSizeInChars)
This is a recommended DocumentSplitter for generic text. It tries to split the document into paragraphs first and fits as many paragraphs into a single TextSegment as possible. If some paragraphs are too long, they are recursively split into lines, then sentences, then words, and then characters until they fit into a segment.
Parameters:
maxSegmentSizeInChars - The maximum size of the segment, defined in characters.
maxOverlapSizeInChars - The maximum size of the overlap, defined in characters. Only full sentences are considered for the overlap.
Returns:
recursive document splitter
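The paragraph, line, sentence, word, character fallback described above can be sketched in plain Java. This is a simplified illustration under stated assumptions, not the library's implementation: the class RecursiveSplitSketch and its split method are hypothetical, the separator patterns are guesses at reasonable boundaries, and overlap is left out entirely.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: greedy, character-limited splitting that falls back from
// paragraphs to lines, sentences, words, and finally raw characters.
class RecursiveSplitSketch {

    // Assumed boundaries, coarsest first: blank line, newline, sentence end, whitespace.
    private static final String[] SEPARATORS = {"\\n\\s*\\n", "\\n", "(?<=[.!?])\\s+", "\\s+"};
    // Text used to glue pieces back together at each level.
    private static final String[] JOINERS = {"\n\n", "\n", " ", " "};

    static List<String> split(String text, int maxChars) {
        if (maxChars <= 0) {
            throw new IllegalArgumentException("maxChars must be positive");
        }
        List<String> segments = new ArrayList<>();
        split(text.trim(), maxChars, 0, segments);
        return segments;
    }

    private static void split(String text, int maxChars, int level, List<String> out) {
        if (text.isEmpty()) {
            return;
        }
        if (text.length() <= maxChars) {
            out.add(text);
            return;
        }
        if (level == SEPARATORS.length) {
            // Last resort: cut the text into fixed-size character chunks.
            for (int i = 0; i < text.length(); i += maxChars) {
                out.add(text.substring(i, Math.min(text.length(), i + maxChars)));
            }
            return;
        }
        String joiner = JOINERS[level];
        StringBuilder current = new StringBuilder();
        for (String piece : text.split(SEPARATORS[level])) {
            piece = piece.trim();
            if (piece.isEmpty()) {
                continue;
            }
            if (piece.length() > maxChars) {
                // Too long for one segment: emit what we have, then split at a finer level.
                flush(current, out);
                split(piece, maxChars, level + 1, out);
            } else if (current.length() == 0) {
                current.append(piece);
            } else if (current.length() + joiner.length() + piece.length() <= maxChars) {
                current.append(joiner).append(piece); // piece still fits in the open segment
            } else {
                flush(current, out);
                current.append(piece);
            }
        }
        flush(current, out);
    }

    // Emits the open segment, if any, and resets the buffer.
    private static void flush(StringBuilder current, List<String> out) {
        if (current.length() > 0) {
            out.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        String text = "One two. Three four.\n\nFive six seven eight nine ten.";
        split(text, 20).forEach(segment -> System.out.println("[" + segment + "]"));
    }
}
```

In this sketch the first paragraph fits the 20-character limit as a whole, while the second is broken at word boundaries because it contains no finer line or sentence break. With the library itself, the corresponding call would follow the signature above: DocumentSplitters.recursive(maxSegmentSizeInChars, maxOverlapSizeInChars).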