Class ApachePdfBoxDocumentParser
java.lang.Object
dev.langchain4j.data.document.parser.apache.pdfbox.ApachePdfBoxDocumentParser
- All Implemented Interfaces:
DocumentParser
Parses a PDF file into a Document using the Apache PDFBox library.
Constructor Summary
ApachePdfBoxDocumentParser()
ApachePdfBoxDocumentParser(boolean includeMetadata)
Method Summary
Modifier and Type | Method | Description
Document | parse(InputStream inputStream) | Parses a given InputStream into a Document.
Constructor Details
ApachePdfBoxDocumentParser
public ApachePdfBoxDocumentParser()
ApachePdfBoxDocumentParser
public ApachePdfBoxDocumentParser(boolean includeMetadata)
Method Details
parse
public Document parse(InputStream inputStream)
Description copied from interface: DocumentParser
Parses a given InputStream into a Document. The specific implementation of this method will depend on the type of the document being parsed.
Note: This method does not close the provided InputStream; it is the caller's responsibility to manage the lifecycle of the stream.
Specified by:
parse in interface DocumentParser
Parameters:
inputStream - The InputStream that contains the content of the Document.
Returns:
The parsed Document.
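Because parse leaves the InputStream open, the caller has to close it. The following is a minimal sketch of one way to do that, not part of the library: the class ParseWithManagedStream and the helper parseFile are hypothetical names introduced here. The parser is passed as a java.util.function.Function so that the sketch compiles with the JDK alone, and a plain-text stand-in takes the place of the PDF parser in main.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Function;

// Hypothetical helper class, not part of langchain4j.
final class ParseWithManagedStream {

    /**
     * Opens the file, hands the stream to the given parser function,
     * and closes the stream afterwards, since the parser does not.
     */
    public static <T> T parseFile(Path path, Function<InputStream, T> parser) {
        // try-with-resources closes the stream even if parsing throws
        try (InputStream in = Files.newInputStream(path)) {
            return parser.apply(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in parser (assumption for illustration): reads the stream
        // as UTF-8 text and, like parse, never closes it.
        Function<InputStream, String> standIn = in -> {
            try {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        };

        Path tmp = Files.createTempFile("sample", ".txt");
        Files.writeString(tmp, "hello");
        System.out.println(parseFile(tmp, standIn)); // prints "hello"
        Files.delete(tmp);

        // With the langchain4j PDFBox module on the classpath, the call
        // would presumably look like this (report.pdf is a placeholder):
        // Document doc = parseFile(Path.of("report.pdf"),
        //         new ApachePdfBoxDocumentParser()::parse);
    }
}
```

Keeping the open and close in one helper means every call site gets the same stream handling, whichever DocumentParser implementation is plugged in.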