Class ApachePdfBoxDocumentParser

java.lang.Object
dev.langchain4j.data.document.parser.apache.pdfbox.ApachePdfBoxDocumentParser
All Implemented Interfaces:
DocumentParser

public class ApachePdfBoxDocumentParser extends Object implements DocumentParser
Parses a PDF file into a Document using the Apache PDFBox library.
  • Constructor Details

    • ApachePdfBoxDocumentParser

      public ApachePdfBoxDocumentParser()
    • ApachePdfBoxDocumentParser

      public ApachePdfBoxDocumentParser(boolean includeMetadata)
  • Method Details

    • parse

      public Document parse(InputStream inputStream)
      Description copied from interface: DocumentParser
      Parses a given InputStream into a Document. The specific implementation of this method will depend on the type of the document being parsed.

      Note: This method does not close the provided InputStream - it is the caller's responsibility to manage the lifecycle of the stream.

      Specified by:
      parse in interface DocumentParser
      Parameters:
      inputStream - The InputStream that contains the content of the Document.
      Returns:
      The parsed Document.
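
The note on stream lifecycle can be illustrated with a minimal sketch. Because parse does not close the InputStream, the caller can wrap the call in try-with-resources. To keep the sketch runnable without langchain4j on the classpath, it uses hypothetical stand-ins that are not part of the library: StreamParser (a one-method interface with the same shape as DocumentParser.parse), TrackingInputStream, and parseAndClose.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class StreamLifecycleSketch {

    /** Hypothetical stand-in with the same shape as DocumentParser.parse(InputStream). */
    interface StreamParser<T> {
        T parse(InputStream inputStream);
    }

    /** In-memory stream that records whether close() was called (illustration only). */
    static final class TrackingInputStream extends ByteArrayInputStream {
        boolean closed = false;

        TrackingInputStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Parses the stream, then closes it on behalf of the parser, which never does. */
    static <T> T parseAndClose(StreamParser<T> parser, InputStream in) {
        try (InputStream stream = in) {
            return parser.parse(stream);
        } catch (IOException e) {
            // Only close() can raise a checked IOException here.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        TrackingInputStream in =
                new TrackingInputStream("hello".getBytes(StandardCharsets.UTF_8));

        // Toy parser that reads the stream as text; a real parser would build a Document.
        StreamParser<String> toyParser = s -> {
            try {
                return new String(s.readAllBytes(), StandardCharsets.UTF_8);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        };

        String text = parseAndClose(toyParser, in);
        System.out.println(text + " closed=" + in.closed); // prints "hello closed=true"
    }
}
```

With the real library available, the same pattern should apply by passing the parse method of an ApachePdfBoxDocumentParser instance as the parser and a stream opened on a PDF file as the input.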