Class DocumentSplitters
java.lang.Object
dev.langchain4j.data.document.splitter.DocumentSplitters
-
Constructor Summary
-
Method Summary
Modifier and Type | Method | Description
static DocumentSplitter | recursive(int maxSegmentSizeInChars, int maxOverlapSizeInChars) | This is a recommended DocumentSplitter for generic text.
static DocumentSplitter | recursive(int maxSegmentSizeInTokens, int maxOverlapSizeInTokens, Tokenizer tokenizer) | This is a recommended DocumentSplitter for generic text.
-
Constructor Details
-
DocumentSplitters
public DocumentSplitters()
-
-
Method Details
-
recursive
public static DocumentSplitter recursive(int maxSegmentSizeInTokens, int maxOverlapSizeInTokens, Tokenizer tokenizer)
This is a recommended DocumentSplitter for generic text. It tries to split the document into paragraphs first and fits as many paragraphs into a single TextSegment as possible. If some paragraphs are too long, they are recursively split into lines, then sentences, then words, and then characters until they fit into a segment.
Parameters:
maxSegmentSizeInTokens - The maximum size of the segment, defined in tokens.
maxOverlapSizeInTokens - The maximum size of the overlap, defined in tokens. Only full sentences are considered for the overlap.
tokenizer - The tokenizer that is used to count tokens in the text.
Returns:
recursive document splitter
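The paragraph-first, recursive fallback described above can be illustrated with a small standalone sketch. This is not the library's implementation: the class `RecursiveSplitSketch`, its regular expressions, and the whitespace-based `countWords` stand-in for a real `Tokenizer` are all assumptions made for illustration, and overlap is left out entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

// Hypothetical sketch of "paragraphs, then lines, then sentences, then words,
// then characters". Not langchain4j code; names and regexes are assumptions.
class RecursiveSplitSketch {

    // Separators tried in order: paragraphs, lines, sentences, words.
    private static final String[] SEPARATORS = {"\\n\\s*\\n", "\\n", "(?<=[.!?])\\s+", "\\s+"};
    // Text used to glue parts back together at each level.
    private static final String[] JOINERS = {"\n\n", "\n", " ", " "};

    // Stand-in for a real tokenizer: counts whitespace-separated words.
    static int countWords(String s) {
        String t = s.trim();
        return t.isEmpty() ? 0 : t.split("\\s+").length;
    }

    static List<String> split(String text, int maxSize, ToIntFunction<String> size) {
        List<String> out = new ArrayList<>();
        split(text.trim(), maxSize, size, 0, out);
        return out;
    }

    private static void split(String text, int maxSize, ToIntFunction<String> size,
                              int level, List<String> out) {
        if (text.isEmpty()) return;
        if (size.applyAsInt(text) <= maxSize) { out.add(text); return; }

        if (level == SEPARATORS.length) {
            // Last resort: cut character by character (surrogate pairs are ignored here).
            StringBuilder chunk = new StringBuilder();
            for (char c : text.toCharArray()) {
                if (chunk.length() > 0 && size.applyAsInt(chunk.toString() + c) > maxSize) {
                    out.add(chunk.toString());
                    chunk.setLength(0);
                }
                chunk.append(c);
            }
            if (chunk.length() > 0) out.add(chunk.toString());
            return;
        }

        // Greedily pack as many parts as fit; recurse into parts that are too long.
        StringBuilder current = new StringBuilder();
        for (String part : text.split(SEPARATORS[level])) {
            part = part.trim();
            if (part.isEmpty()) continue;
            String candidate = current.length() == 0 ? part : current + JOINERS[level] + part;
            if (size.applyAsInt(candidate) <= maxSize) {
                current.setLength(0);
                current.append(candidate);
            } else {
                if (current.length() > 0) { out.add(current.toString()); current.setLength(0); }
                if (size.applyAsInt(part) <= maxSize) current.append(part);
                else split(part, maxSize, size, level + 1, out);
            }
        }
        if (current.length() > 0) out.add(current.toString());
    }

    public static void main(String[] args) {
        String text = "First paragraph here.\n\nSecond one.\n\n"
                + "Third paragraph is quite a bit longer than the others. It has two sentences.";
        // Size measured by the word-count stand-in, at most 8 "tokens" per segment.
        for (String segment : split(text, 8, RecursiveSplitSketch::countWords)) {
            System.out.println("[" + segment + "]");
        }
    }
}
```

Passing the size function as a parameter mirrors the way the two overloads differ only in the unit of measurement: `String::length` gives the character-based behavior, while a token counter gives the token-based one.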
-
recursive
public static DocumentSplitter recursive(int maxSegmentSizeInChars, int maxOverlapSizeInChars)
This is a recommended DocumentSplitter for generic text. It tries to split the document into paragraphs first and fits as many paragraphs into a single TextSegment as possible. If some paragraphs are too long, they are recursively split into lines, then sentences, then words, and then characters until they fit into a segment.
Parameters:
maxSegmentSizeInChars - The maximum size of the segment, defined in characters.
maxOverlapSizeInChars - The maximum size of the overlap, defined in characters. Only full sentences are considered for the overlap.
Returns:
recursive document splitter
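The note that only full sentences are considered for the overlap can be read as follows: the overlap carried into the next segment is made of whole trailing sentences of the previous segment, up to the size limit. The sketch below shows that reading for the character-based variant. It is an interpretation, not the library's code; the class `SentenceOverlapSketch`, the method `overlapFrom`, and the sentence-boundary regex are hypothetical.

```java
// Hypothetical sketch: pick the overlap as the longest run of whole trailing
// sentences that fits in maxOverlapChars. Not langchain4j code.
class SentenceOverlapSketch {

    static String overlapFrom(String previousSegment, int maxOverlapChars) {
        // Naive sentence detection: split after '.', '!' or '?' followed by whitespace.
        String[] sentences = previousSegment.trim().split("(?<=[.!?])\\s+");
        String overlap = "";
        // Walk backwards from the last sentence, prepending while the result still fits.
        for (int i = sentences.length - 1; i >= 0; i--) {
            String candidate = overlap.isEmpty() ? sentences[i] : sentences[i] + " " + overlap;
            if (candidate.length() > maxOverlapChars) break; // never keep a partial sentence
            overlap = candidate;
        }
        return overlap;
    }

    public static void main(String[] args) {
        String previous = "One two. Three four five. Six.";
        System.out.println(overlapFrom(previous, 20));
        System.out.println(overlapFrom(previous, 21));
    }
}
```

Under this reading, an overlap limit smaller than the last sentence yields no overlap at all, so the actual overlap can be well below the configured maximum.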
-