Class DocumentByCharacterSplitter

java.lang.Object
dev.langchain4j.data.document.splitter.HierarchicalDocumentSplitter
dev.langchain4j.data.document.splitter.DocumentByCharacterSplitter
All Implemented Interfaces:
DocumentSplitter

public class DocumentByCharacterSplitter extends HierarchicalDocumentSplitter
Splits the provided Document into characters and attempts to fit as many characters as possible into a single TextSegment, adhering to the limit set by maxSegmentSize.

The maxSegmentSize can be defined in terms of characters (default) or tokens. For a token-based limit, a TokenCountEstimator must be provided.

If multiple characters fit within maxSegmentSize, they are joined together without delimiters.

Each TextSegment inherits all metadata from the Document and includes an "index" metadata key representing its position within the document (starting from 0).
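
The behavior described above can be illustrated with a minimal, dependency-free sketch. It is not the library's implementation: the class CharacterPackingSketch, the Segment record, and the sizeOf parameter (a stand-in for either a character count or a token estimate) are hypothetical names introduced for illustration. The sketch also assumes no overlap between segments, stores metadata as plain strings, and iterates over Java char values (UTF-16 code units).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;

public class CharacterPackingSketch {

    // Hypothetical stand-in for a text segment: its text plus a metadata map.
    record Segment(String text, Map<String, String> metadata) {}

    // Greedily packs characters into segments whose size, as measured by
    // sizeOf, does not exceed maxSegmentSize. Characters are joined without
    // delimiters. Overlap between segments is deliberately not modeled.
    static List<Segment> split(String text,
                               int maxSegmentSize,
                               ToIntFunction<String> sizeOf,
                               Map<String, String> documentMetadata) {
        List<Segment> segments = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // Close the current segment if adding this character would exceed the limit.
            if (current.length() > 0
                    && sizeOf.applyAsInt(current.toString() + c) > maxSegmentSize) {
                segments.add(toSegment(current.toString(), segments.size(), documentMetadata));
                current.setLength(0);
            }
            current.append(c);
        }
        if (current.length() > 0) {
            segments.add(toSegment(current.toString(), segments.size(), documentMetadata));
        }
        return segments;
    }

    // Copies the document metadata and adds the zero-based "index" key.
    private static Segment toSegment(String text, int index, Map<String, String> documentMetadata) {
        Map<String, String> metadata = new HashMap<>(documentMetadata);
        metadata.put("index", String.valueOf(index));
        return new Segment(text, metadata);
    }

    public static void main(String[] args) {
        // String::length measures size in characters; a token estimator could be passed instead.
        List<Segment> segments =
                split("abcdefgh", 3, String::length, Map.of("source", "example.txt"));
        for (Segment segment : segments) {
            System.out.println(segment.metadata().get("index") + ": " + segment.text());
        }
    }
}
```

With a limit of 3 characters, the input "abcdefgh" yields the segments "abc", "def", and "gh", carrying index values 0, 1, and 2 alongside the copied "source" entry.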