Class ApachePoiDocumentParser

java.lang.Object
dev.langchain4j.data.document.parser.apache.poi.ApachePoiDocumentParser
All Implemented Interfaces:
DocumentParser

public class ApachePoiDocumentParser extends Object implements DocumentParser
Parses a Microsoft Office file into a Document using the Apache POI library. This parser supports several file formats, including doc, docx, ppt, pptx, xls, and xlsx. For details on supported formats, refer to the official Apache POI website.
  • Constructor Details

    • ApachePoiDocumentParser

      public ApachePoiDocumentParser()
  • Method Details

    • parse

      public Document parse(InputStream inputStream)
      Description copied from interface: DocumentParser
      Parses a given InputStream into a Document. The specific implementation of this method will depend on the type of the document being parsed.

      Note: This method does not close the provided InputStream - it is the caller's responsibility to manage the lifecycle of the stream.

      Specified by:
      parse in interface DocumentParser
      Parameters:
      inputStream - The InputStream that contains the content of the Document.
      Returns:
      The parsed Document.
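
The stream-lifecycle note for parse can be illustrated with a minimal sketch. To stay runnable with only the JDK, it does not use the library itself: StandInParser, CloseTrackingStream, and StreamLifecycleSketch are hypothetical names invented for this illustration, and the stand-in simply decodes the bytes as UTF-8 instead of reading an Office file. The sketch only shows the calling pattern, in which the caller opens the stream in a try-with-resources block and the parser leaves it open.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the parse(InputStream) contract described above.
// This is NOT the langchain4j DocumentParser interface.
interface StandInParser {
    String parse(InputStream inputStream);
}

// Illustration helper (an assumption of this sketch): records whether close() was called.
class CloseTrackingStream extends ByteArrayInputStream {
    boolean closed = false;

    CloseTrackingStream(byte[] data) {
        super(data);
    }

    @Override
    public void close() throws IOException {
        closed = true;
        super.close();
    }
}

class StreamLifecycleSketch {

    // Reads the whole stream but, like the documented contract, never closes it.
    static final StandInParser PARSER = in -> {
        try {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    };

    // Caller-side pattern: try-with-resources closes the stream once parsing is done.
    static String parseAndClose(InputStream stream) {
        try (InputStream in = stream) {
            // With the real library this call would be:
            // new ApachePoiDocumentParser().parse(in)
            return PARSER.parse(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        CloseTrackingStream stream =
                new CloseTrackingStream("hello".getBytes(StandardCharsets.UTF_8));
        String text = parseAndClose(stream);
        System.out.println(text + " (stream closed by caller: " + stream.closed + ")");
    }
}
```

In real use, the stand-in call would be replaced by new ApachePoiDocumentParser().parse(in), with the stream typically opened from a file. The try-with-resources block stays the same, because closing the stream remains the caller's job.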