Interface DocumentParser
All Known Implementing Classes:
ApachePdfBoxDocumentParser, ApachePoiDocumentParser, ApacheTikaDocumentParser, MarkdownDocumentParser, TextDocumentParser, YamlDocumentParser
public interface DocumentParser
Defines the interface for parsing an InputStream into a Document.
Different document types require specialized parsing logic.
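A minimal sketch of one possible implementation follows. It assumes a plain UTF-8 text format. To keep it self-contained, `Document`, `DocumentParser` and `BlankDocumentException` are simplified local stand-ins (here a `Document` holds only its text), not the library's actual types. `Utf8TextParser` is a hypothetical class and is not one of the implementing classes listed above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ParserDemo {
    public static void main(String[] args) {
        DocumentParser parser = new Utf8TextParser();
        InputStream in = new ByteArrayInputStream("Hello, parser!".getBytes(StandardCharsets.UTF_8));
        System.out.println(parser.parse(in).text()); // prints "Hello, parser!"
    }
}

// Simplified stand-in for the library's Document: holds only the extracted text.
record Document(String text) {}

// Simplified stand-in for the library's BlankDocumentException.
class BlankDocumentException extends RuntimeException {
    BlankDocumentException() {
        super("The document is blank");
    }
}

// Stand-in mirroring the single method this interface declares.
interface DocumentParser {
    Document parse(InputStream inputStream);
}

// Hypothetical parser for plain UTF-8 text (an assumption for illustration only).
class Utf8TextParser implements DocumentParser {
    @Override
    public Document parse(InputStream inputStream) {
        try {
            // Read the remaining bytes. The stream is deliberately NOT closed here:
            // the caller owns its lifecycle.
            String text = new String(inputStream.readAllBytes(), StandardCharsets.UTF_8);
            if (text.isBlank()) {
                // Blank or empty content is reported as an exception, not an empty Document.
                throw new BlankDocumentException();
            }
            return new Document(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A format-specific parser would replace only the body of `parse`. The contract stays the same: read from the stream, return a `Document`, and signal blank input with `BlankDocumentException`.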
Method Summary
Modifier and Type | Method | Description
Document | parse(InputStream inputStream) | Parses a given InputStream into a Document.
Method Details
parse
Document parse(InputStream inputStream)
Parses a given InputStream into a Document. The specific implementation of this method depends on the type of document being parsed.
Note: This method does not close the provided InputStream. It is the caller's responsibility to manage the lifecycle of the stream.
Parameters:
inputStream - The InputStream that contains the content of the Document.
Returns:
The parsed Document.
Throws:
BlankDocumentException - when the parsed Document is blank/empty.
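The stream-ownership contract can be sketched from the caller's side. This is a minimal sketch under stated assumptions. `Document`, `DocumentParser` and `BlankDocumentException` are simplified local stand-ins rather than the library's types, and `Utf8TextParser` is a hypothetical UTF-8 text parser. `TrackingInputStream` and `parseOrNull` are hypothetical helpers. The caller opens the stream in a try-with-resources block, so the stream is closed once, after parsing. A blank document is handled by catching `BlankDocumentException`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class CallerDemo {
    // Hypothetical helper: the caller opens and closes the stream, the parser only reads it.
    static String parseOrNull(DocumentParser parser, byte[] content) {
        try (TrackingInputStream in = new TrackingInputStream(content)) {
            Document doc = parser.parse(in);
            // The parser must not have closed the stream; try-with-resources closes it afterwards.
            if (in.closed) {
                throw new IllegalStateException("parser closed a stream it does not own");
            }
            return doc.text();
        } catch (BlankDocumentException e) {
            // Blank/empty input arrives as an exception; this sketch maps it to null.
            return null;
        }
    }

    public static void main(String[] args) {
        DocumentParser parser = new Utf8TextParser();
        System.out.println(parseOrNull(parser, "report body".getBytes(StandardCharsets.UTF_8))); // prints "report body"
        System.out.println(parseOrNull(parser, new byte[0])); // prints "null"
    }
}

// Hypothetical helper stream that records whether close() was called.
class TrackingInputStream extends ByteArrayInputStream {
    boolean closed = false;
    TrackingInputStream(byte[] buf) { super(buf); }
    @Override
    public void close() { closed = true; } // narrows away IOException; ByteArrayInputStream.close() is a no-op
}

// Simplified stand-ins (assumptions, not the library's actual types).
record Document(String text) {}

class BlankDocumentException extends RuntimeException {
    BlankDocumentException() { super("The document is blank"); }
}

interface DocumentParser {
    Document parse(InputStream inputStream);
}

// Hypothetical UTF-8 text parser that reads but never closes the stream.
class Utf8TextParser implements DocumentParser {
    @Override
    public Document parse(InputStream inputStream) {
        try {
            String text = new String(inputStream.readAllBytes(), StandardCharsets.UTF_8);
            if (text.isBlank()) throw new BlankDocumentException();
            return new Document(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Mapping a blank document to `null` is only one choice, made for this sketch. A caller could equally skip the document, log it, or rethrow.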