Package dev.langchain4j.data.document
Interface DocumentParser
All Known Implementing Classes:
ApachePdfBoxDocumentParser, ApachePoiDocumentParser, ApacheTikaDocumentParser, TextDocumentParser
public interface DocumentParser
Defines the interface for parsing an InputStream into a Document.
Different document types require specialized parsing logic.
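As an illustration of what "specialized parsing logic" might look like, here is a minimal sketch of a parser for UTF-8 plain text. To stay self-contained it does not use the library: SketchDocument, SketchDocumentParser, SketchBlankDocumentException and Utf8TextParser are hypothetical stand-ins that only mirror the shape described on this page (a single parse(InputStream) method that returns a document and rejects blank content). The real types may differ in detail.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the library's Document type; holds only the text.
final class SketchDocument {
    private final String text;

    SketchDocument(String text) {
        this.text = text;
    }

    String text() {
        return text;
    }
}

// Hypothetical stand-in for BlankDocumentException.
class SketchBlankDocumentException extends RuntimeException {
    SketchBlankDocumentException() {
        super("The parsed document is blank/empty");
    }
}

// Hypothetical stand-in mirroring the documented interface: one parse method.
interface SketchDocumentParser {
    SketchDocument parse(InputStream inputStream);
}

// Format-specific logic lives in the implementation; here, plain UTF-8 text.
class Utf8TextParser implements SketchDocumentParser {
    @Override
    public SketchDocument parse(InputStream inputStream) {
        try {
            // readAllBytes() reads to the end of the stream but does not close it,
            // so the caller keeps control of the stream's lifecycle.
            String text = new String(inputStream.readAllBytes(), StandardCharsets.UTF_8);
            if (text.isBlank()) {
                throw new SketchBlankDocumentException();
            }
            return new SketchDocument(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A parser for a binary format would keep the same method shape and replace only the body of parse with format-specific extraction.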
Method Summary
Modifier and Type | Method | Description
Document | parse(InputStream inputStream) | Parses a given InputStream into a Document.
Method Details
parse
Parses a given InputStream into a Document. The specific implementation of this method will depend on the type of the document being parsed.
Note: This method does not close the provided InputStream; it is the caller's responsibility to manage the lifecycle of the stream.
Parameters:
inputStream - The InputStream that contains the content of the Document.
Returns:
The parsed Document.
Throws:
BlankDocumentException - when the parsed Document is blank/empty.
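Because parse does not close the stream, callers would typically wrap the call in try-with-resources. The sketch below illustrates that contract without depending on the library: TextExtractor is a hypothetical stand-in for a parser with the same parse(InputStream) shape, and CloseTrackingStream and StreamLifecycleDemo are illustrative helpers that exist only to make the close behaviour observable.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a parser; returns the extracted text only.
interface TextExtractor {
    String parse(InputStream inputStream);
}

// Illustrative wrapper that records whether close() has been called.
class CloseTrackingStream extends FilterInputStream {
    private boolean closed = false;

    CloseTrackingStream(InputStream in) {
        super(in);
    }

    @Override
    public void close() throws IOException {
        closed = true;
        super.close();
    }

    boolean isClosed() {
        return closed;
    }
}

class StreamLifecycleDemo {
    // Reads the whole stream as UTF-8 without closing it, like the documented contract.
    static String readUtf8(InputStream in) {
        try {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns { closed right after parse, closed after the try block }.
    static boolean[] run() {
        TextExtractor parser = StreamLifecycleDemo::readUtf8;
        CloseTrackingStream stream = new CloseTrackingStream(
                new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8)));
        boolean closedRightAfterParse;
        try (CloseTrackingStream s = stream) {
            parser.parse(s);
            // The parser has returned, but the stream is still open at this point.
            closedRightAfterParse = s.isClosed();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // try-with-resources, not the parser, closed the stream.
        return new boolean[] { closedRightAfterParse, stream.isClosed() };
    }
}
```

With the real interface the same pattern would apply: open the stream in the try header, pass it to parse, and let the try block close it whether parsing succeeds or throws.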