Class ClassPathDocumentLoader

java.lang.Object
dev.langchain4j.data.document.loader.ClassPathDocumentLoader

public class ClassPathDocumentLoader extends Object
A DocumentLoader implementation that loads documents from the classpath using a ClassPathSource.
Author:
Eric Deandrea
  • Method Details

    • loadDocument

      public static Document loadDocument(String pathOnClasspath)
      Loads a Document from the specified file on the classpath.
      The file is parsed using the default DocumentParser, which is loaded through SPI (see DocumentParserFactory). If no DocumentParserFactory is available on the classpath, a TextDocumentParser is used.
      The returned Document contains all the textual information from the file.
      Parameters:
      pathOnClasspath - The path on the classpath to the file.
      Returns:
      document
      Throws:
      IllegalArgumentException - If the specified path is not a file.
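      In Java, SPI lookup is normally done with java.util.ServiceLoader. Assuming that is the mechanism behind the default-parser lookup described above, the pattern can be sketched with plain JDK types. TextParser, TextParserFactory, PlainTextParser and DefaultParserSketch are hypothetical stand-ins, not the LangChain4j DocumentParser, DocumentParserFactory or TextDocumentParser types, and the library's own lookup code may differ.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ServiceLoader;

class DefaultParserSketch {

    // Hypothetical stand-in for DocumentParser: turns a stream into text.
    interface TextParser {
        String parse(InputStream in);
    }

    // Hypothetical stand-in for DocumentParserFactory, discovered through SPI.
    interface TextParserFactory {
        TextParser create();
    }

    // Fallback in the spirit of TextDocumentParser: read the whole stream as UTF-8 text.
    static final class PlainTextParser implements TextParser {
        @Override
        public String parse(InputStream in) {
            try {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    // Use the first factory registered under META-INF/services; otherwise fall back to plain text.
    static TextParser defaultParser() {
        return ServiceLoader.load(TextParserFactory.class)
                .findFirst()
                .map(TextParserFactory::create)
                .orElseGet(PlainTextParser::new);
    }
}
```

      With no provider registered under META-INF/services, defaultParser() returns the plain-text fallback.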
    • loadDocument

      public static Document loadDocument(String pathOnClasspath, DocumentParser documentParser)
      Loads a Document from the specified file on the classpath.
      The file is parsed using the specified DocumentParser.
      The returned Document contains all the textual information from the file.
      Parameters:
      pathOnClasspath - The path on the classpath to the file.
      documentParser - The parser to be used for parsing text from the file.
      Returns:
      document
      Throws:
      IllegalArgumentException - If the specified path is not a file.
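      The single-file contract (reject anything that is not a file, then let the supplied parser read it) can be sketched with plain JDK I/O. This sketch works on a filesystem Path rather than a classpath resource, and SingleFileSketch.loadText with its Function<InputStream, String> parser is a hypothetical simplification of loadDocument(pathOnClasspath, documentParser), not its implementation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Function;

class SingleFileSketch {

    // Hypothetical helper mirroring the documented contract:
    // reject anything that is not a regular file, then hand the open stream to the caller's parser.
    static String loadText(Path file, Function<InputStream, String> parser) {
        if (!Files.isRegularFile(file)) {
            throw new IllegalArgumentException("'" + file + "' is not a file");
        }
        try (InputStream in = Files.newInputStream(file)) {
            return parser.apply(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

      Passing a directory instead of a file fails fast with IllegalArgumentException, before any parsing happens.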
    • loadDocuments

      public static List<Document> loadDocuments(String directoryOnClasspath)
      Loads Documents from the files directly inside the specified directory. Subdirectories are not searched.
      The files are parsed using the default DocumentParser, which is loaded through SPI (see DocumentParserFactory). If no DocumentParserFactory is available on the classpath, a TextDocumentParser is used.
      Files that fail to load are skipped.
      Parameters:
      directoryOnClasspath - The path on the classpath to the directory containing the files.
      Returns:
      list of documents
      Throws:
      IllegalArgumentException - If the specified path is not a directory.
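      The non-recursive listing combined with skipping files that fail to load can be sketched as follows. It assumes the directory has already been resolved to a filesystem Path (a simplification: resources packed inside a JAR are not covered), and DirectorySketch.loadAll and its loader function are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Stream;

class DirectorySketch {

    // Hypothetical helper: load every regular file directly inside dir (no recursion),
    // skipping any file whose loader throws.
    static <T> List<T> loadAll(Path dir, Function<Path, T> loader) {
        if (!Files.isDirectory(dir)) {
            throw new IllegalArgumentException("'" + dir + "' is not a directory");
        }
        List<T> loaded = new ArrayList<>();
        // Files.list only returns the direct children of dir; subdirectories are not entered.
        try (Stream<Path> entries = Files.list(dir)) {
            entries.filter(Files::isRegularFile)
                    .sorted()
                    .forEach(file -> {
                        try {
                            loaded.add(loader.apply(file));
                        } catch (RuntimeException e) {
                            // Skip this file and continue with the rest.
                        }
                    });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return loaded;
    }
}
```

      Given a.txt, bad.txt (whose loader throws) and sub/c.txt, only a.txt is returned: bad.txt is skipped and sub/ is never entered.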
    • loadDocuments

      public static List<Document> loadDocuments(String directoryOnClasspath, DocumentParser documentParser)
      Loads Documents from the files directly inside the specified directory. Subdirectories are not searched.
      The files are parsed using the specified DocumentParser.
      Files that fail to load are skipped.
      Parameters:
      directoryOnClasspath - The path on the classpath to the directory containing the files.
      documentParser - The parser to be used for parsing text from each file.
      Returns:
      list of documents
      Throws:
      IllegalArgumentException - If the specified path is not a directory.
    • loadDocuments

      public static List<Document> loadDocuments(String directoryOnClasspath, PathMatcher pathMatcher)
      Loads Documents from the files directly inside the specified directory that match pathMatcher. Subdirectories are not searched.
      The files are parsed using the default DocumentParser, which is loaded through SPI (see DocumentParserFactory). If no DocumentParserFactory is available on the classpath, a TextDocumentParser is used.
      Files that fail to load are skipped.
      Parameters:
      directoryOnClasspath - The path on the classpath to the directory containing the files.
      pathMatcher - Only files whose paths match the provided PathMatcher are loaded. For example, FileSystems.getDefault().getPathMatcher("glob:*.txt") loads all files with a .txt extension from directoryOnClasspath. During traversal, each file path is converted from absolute to relative (relative to directoryOnClasspath) before it is matched against pathMatcher, so pathMatcher should use relative patterns.
      Returns:
      list of documents
      Throws:
      IllegalArgumentException - If the specified path is not a directory.
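      The point about relative patterns is easiest to see in code: each file's path is relativized against the directory before the PathMatcher sees it, so glob:*.txt matches a.txt even though the file's absolute path would not match. The sketch below uses only JDK APIs on a filesystem directory; MatcherSketch.matchingFiles is a hypothetical helper, not part of ClassPathDocumentLoader.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class MatcherSketch {

    // Hypothetical helper: list files directly inside dir whose path *relative to dir*
    // matches the given PathMatcher, mirroring how the pathMatcher parameter is described.
    static List<String> matchingFiles(Path dir, PathMatcher matcher) {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.filter(Files::isRegularFile)
                    .map(dir::relativize) // e.g. /tmp/docs/a.txt becomes a.txt
                    .filter(matcher::matches)
                    .map(Path::toString)
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

      Matching the absolute path against glob:*.txt instead would find nothing, because a single * never crosses a directory separator.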
    • loadDocuments

      public static List<Document> loadDocuments(String directoryOnClasspath, PathMatcher pathMatcher, DocumentParser documentParser)
      Loads Documents from the files directly inside the specified directory that match pathMatcher. Subdirectories are not searched.
      The files are parsed using the specified DocumentParser.
      Files that fail to load are skipped.
      Parameters:
      directoryOnClasspath - The path on the classpath to the directory containing the files.
      pathMatcher - Only files whose paths match the provided PathMatcher are loaded. For example, FileSystems.getDefault().getPathMatcher("glob:*.txt") loads all files with a .txt extension from directoryOnClasspath. During traversal, each file path is converted from absolute to relative (relative to directoryOnClasspath) before it is matched against pathMatcher, so pathMatcher should use relative patterns.
      documentParser - The parser to be used for parsing text from each file.
      Returns:
      list of documents
      Throws:
      IllegalArgumentException - If the specified path is not a directory.
    • loadDocumentsRecursively

      public static List<Document> loadDocumentsRecursively(String directoryOnClasspath)
      Recursively loads Documents from the files in the specified directory and all of its subdirectories.
      The files are parsed using the default DocumentParser, which is loaded through SPI (see DocumentParserFactory). If no DocumentParserFactory is available on the classpath, a TextDocumentParser is used.
      Files that fail to load are skipped.
      Parameters:
      directoryOnClasspath - The path on the classpath to the directory containing the files.
      Returns:
      list of documents
      Throws:
      IllegalArgumentException - If the specified path is not a directory.
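      The difference from the non-recursive variants is that the whole directory tree is visited. In plain JDK terms this is roughly Files.walk versus Files.list; the sketch below assumes a filesystem directory and uses a hypothetical RecursiveSketch.relativeFiles helper that only lists the files such a loader would consider, relative to the starting directory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class RecursiveSketch {

    // Hypothetical helper: walk dir and every subdirectory, returning each regular file's
    // path relative to dir. Files.walk visits the full tree, unlike Files.list.
    static List<String> relativeFiles(Path dir) {
        if (!Files.isDirectory(dir)) {
            throw new IllegalArgumentException("'" + dir + "' is not a directory");
        }
        try (Stream<Path> tree = Files.walk(dir)) {
            return tree.filter(Files::isRegularFile) // drops dir itself and subdirectory entries
                    .map(dir::relativize)
                    .map(Path::toString)
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

      For a directory holding a.txt and sub/b.txt, both files are returned, the nested one as sub/b.txt (with the platform's separator).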
    • loadDocumentsRecursively

      public static List<Document> loadDocumentsRecursively(String directoryOnClasspath, DocumentParser documentParser)
      Recursively loads Documents from the files in the specified directory and all of its subdirectories.
      The files are parsed using the specified DocumentParser.
      Files that fail to load are skipped.
      Parameters:
      directoryOnClasspath - The path on the classpath to the directory containing the files.
      documentParser - The parser to be used for parsing text from each file.
      Returns:
      list of documents
      Throws:
      IllegalArgumentException - If the specified path is not a directory.
    • loadDocumentsRecursively

      public static List<Document> loadDocumentsRecursively(String directoryOnClasspath, PathMatcher pathMatcher)
      Recursively loads Documents from the files in the specified directory and all of its subdirectories that match pathMatcher.
      The files are parsed using the default DocumentParser, which is loaded through SPI (see DocumentParserFactory). If no DocumentParserFactory is available on the classpath, a TextDocumentParser is used.
      Files that fail to load are skipped.
      Parameters:
      directoryOnClasspath - The path on the classpath to the directory containing the files.
      pathMatcher - Only files whose paths match the provided PathMatcher are loaded. For example, FileSystems.getDefault().getPathMatcher("glob:**.txt") loads all files with a .txt extension from directoryOnClasspath and its subdirectories. During traversal of the directory tree, each file path is converted from absolute to relative (relative to directoryOnClasspath) before it is matched against pathMatcher, so pathMatcher should use relative patterns. Note that the pattern *.txt (with a single asterisk) matches only files directly in directoryOnClasspath, not files in its subdirectories.
      Returns:
      list of documents
      Throws:
      IllegalArgumentException - If the specified path is not a directory.
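      The single-versus-double asterisk caveat can be checked directly against the JDK's glob PathMatcher, without touching the filesystem, because matching works on the path string. GlobDepthSketch is a hypothetical name; the paths are assumed to be already relative to the loading directory, as the pathMatcher description states.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;

class GlobDepthSketch {

    // Match a path that is already relative to the loading directory.
    static boolean matches(String glob, Path relativePath) {
        return FileSystems.getDefault().getPathMatcher("glob:" + glob).matches(relativePath);
    }

    static void demo() {
        Path topLevel = Path.of("a.txt");
        Path nested = Path.of("sub", "b.txt");
        System.out.println(matches("*.txt", topLevel));  // true
        System.out.println(matches("*.txt", nested));    // false: '*' does not cross directory boundaries
        System.out.println(matches("**.txt", topLevel)); // true
        System.out.println(matches("**.txt", nested));   // true: '**' does cross them
    }
}
```

      This is why glob:**.txt, not glob:*.txt, is the pattern to use when .txt files in subdirectories should be included.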
    • loadDocumentsRecursively

      public static List<Document> loadDocumentsRecursively(String directoryOnClasspath, PathMatcher pathMatcher, DocumentParser documentParser)
      Recursively loads Documents from the files in the specified directory and all of its subdirectories that match pathMatcher.
      The files are parsed using the specified DocumentParser.
      Files that fail to load are skipped.
      Parameters:
      directoryOnClasspath - The path on the classpath to the directory containing the files.
      pathMatcher - Only files whose paths match the provided PathMatcher are loaded. For example, FileSystems.getDefault().getPathMatcher("glob:**.txt") loads all files with a .txt extension from directoryOnClasspath and its subdirectories. During traversal of the directory tree, each file path is converted from absolute to relative (relative to directoryOnClasspath) before it is matched against pathMatcher, so pathMatcher should use relative patterns. Note that the pattern *.txt (with a single asterisk) matches only files directly in directoryOnClasspath, not files in its subdirectories.
      documentParser - The parser to be used for parsing text from each file.
      Returns:
      list of documents
      Throws:
      IllegalArgumentException - If the specified path is not a directory.