Class FileSystemDocumentLoader

java.lang.Object
dev.langchain4j.data.document.loader.FileSystemDocumentLoader

public class FileSystemDocumentLoader extends Object
  • Method Details

    • loadDocument

      public static Document loadDocument(Path filePath, DocumentParser documentParser)
      Loads a Document from the specified file Path.
      The file is parsed using the specified DocumentParser.
      The returned Document contains all the textual information from the file.
      Parameters:
      filePath - The path to the file.
      documentParser - The parser to be used for parsing text from the file.
      Returns:
      document
      Throws:
      IllegalArgumentException - If the specified path is not a file.
    • loadDocument

      public static Document loadDocument(Path filePath)
      Loads a Document from the specified file Path.
      The file is parsed using the default DocumentParser. The default DocumentParser is loaded through SPI (see DocumentParserFactory). If no DocumentParserFactory is available in the classpath, a TextDocumentParser is used.
      The returned Document contains all the textual information from the file.
      Parameters:
      filePath - The path to the file.
      Returns:
      document
      Throws:
      IllegalArgumentException - If the specified path is not a file.
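
The "SPI lookup with a fallback" behaviour described for the default parser can be pictured with plain JDK classes. The sketch below is only an illustration under stated assumptions: Parser, ParserFactory and PlainTextParser are hypothetical stand-ins, not the library's DocumentParser, DocumentParserFactory or TextDocumentParser, and the library's real lookup code may differ.

```java
import java.util.Iterator;
import java.util.ServiceLoader;

public class DefaultParserLookup {

    /** Hypothetical stand-in for a parser; not the library's DocumentParser. */
    public interface Parser {
        String parse(String raw);
    }

    /** Hypothetical SPI contract that a provider jar would implement and register. */
    public interface ParserFactory {
        Parser create();
    }

    /** Hypothetical fallback used when no factory is registered: returns the text unchanged. */
    public static final class PlainTextParser implements Parser {
        @Override
        public String parse(String raw) {
            return raw;
        }
    }

    /** Returns a parser from the first registered factory, or the plain-text fallback. */
    public static Parser defaultParser() {
        Iterator<ParserFactory> factories = ServiceLoader.load(ParserFactory.class).iterator();
        if (factories.hasNext()) {
            return factories.next().create();
        }
        return new PlainTextParser();
    }

    public static void main(String[] args) {
        // Prints the simple name of whichever parser implementation was selected.
        System.out.println(defaultParser().getClass().getSimpleName());
    }
}
```

When no provider is registered under META-INF/services, the ServiceLoader iterator is empty and the fallback is returned, which mirrors the documented "no factory on the classpath" case.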
    • loadDocument

      public static Document loadDocument(String filePath, DocumentParser documentParser)
      Loads a Document from the specified file path.
      The file is parsed using the specified DocumentParser.
      The returned Document contains all the textual information from the file.
      Parameters:
      filePath - The path to the file.
      documentParser - The parser to be used for parsing text from the file.
      Returns:
      document
      Throws:
      IllegalArgumentException - If the specified path is not a file.
    • loadDocument

      public static Document loadDocument(String filePath)
      Loads a Document from the specified file path.
      The file is parsed using the default DocumentParser. The default DocumentParser is loaded through SPI (see DocumentParserFactory). If no DocumentParserFactory is available in the classpath, a TextDocumentParser is used.
      The returned Document contains all the textual information from the file.
      Parameters:
      filePath - The path to the file.
      Returns:
      document
      Throws:
      IllegalArgumentException - If the specified path is not a file.
    • loadDocuments

      public static List<Document> loadDocuments(Path directoryPath, DocumentParser documentParser)
      Loads Documents from the specified directory. Does not use recursion.
      The files are parsed using the specified DocumentParser.
      Skips any Documents that fail to load.
      Parameters:
      directoryPath - The path to the directory with files.
      documentParser - The parser to be used for parsing text from each file.
      Returns:
      list of documents
      Throws:
      IllegalArgumentException - If the specified path is not a directory.
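
The combination of "no recursion", "skip files that fail to load" and "reject a non-directory path" can be sketched with JDK classes alone. This is a minimal illustration, not the library's implementation: FlatDirectoryLoad, loadAll, readNonEmpty and demo are hypothetical names, and a Function<Path, String> stands in for the DocumentParser.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class FlatDirectoryLoad {

    /**
     * Applies reader to every regular file directly inside directory
     * (subdirectories are ignored) and skips files for which reader throws.
     */
    public static List<String> loadAll(Path directory, Function<Path, String> reader) {
        if (!Files.isDirectory(directory)) {
            throw new IllegalArgumentException(directory + " is not a directory");
        }
        List<String> loaded = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(directory)) {
            for (Path entry : entries) {
                if (!Files.isRegularFile(entry)) {
                    continue; // subdirectories are not descended into
                }
                try {
                    loaded.add(reader.apply(entry));
                } catch (RuntimeException e) {
                    // a file that fails to load is skipped rather than aborting the run
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return loaded;
    }

    /** Hypothetical reader: UTF-8 text, rejecting empty files to simulate a load failure. */
    public static String readNonEmpty(Path file) {
        try {
            String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            if (text.isEmpty()) {
                throw new IllegalStateException("empty file: " + file);
            }
            return text;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a small temporary tree and loads it: one good file, one failing file, one nested file. */
    public static List<String> demo() {
        try {
            Path dir = Files.createTempDirectory("flat-load");
            Files.write(dir.resolve("a.txt"), "alpha".getBytes(StandardCharsets.UTF_8));
            Files.write(dir.resolve("empty.txt"), new byte[0]);
            Path sub = Files.createDirectory(dir.resolve("sub"));
            Files.write(sub.resolve("b.txt"), "beta".getBytes(StandardCharsets.UTF_8));
            return loadAll(dir, FlatDirectoryLoad::readNonEmpty);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In the demo tree only a.txt is returned: empty.txt is skipped because the reader throws, and sub/b.txt is never visited because the listing does not recurse.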
    • loadDocuments

      public static List<Document> loadDocuments(Path directoryPath)
      Loads Documents from the specified directory. Does not use recursion.
      The files are parsed using the default DocumentParser. The default DocumentParser is loaded through SPI (see DocumentParserFactory). If no DocumentParserFactory is available in the classpath, a TextDocumentParser is used.
      Skips any Documents that fail to load.
      Parameters:
      directoryPath - The path to the directory with files.
      Returns:
      list of documents
      Throws:
      IllegalArgumentException - If the specified path is not a directory.
    • loadDocuments

      public static List<Document> loadDocuments(String directoryPath, DocumentParser documentParser)
      Loads Documents from the specified directory. Does not use recursion.
      The files are parsed using the specified DocumentParser.
      Skips any Documents that fail to load.
      Parameters:
      directoryPath - The path to the directory with files.
      documentParser - The parser to be used for parsing text from each file.
      Returns:
      list of documents
      Throws:
      IllegalArgumentException - If the specified path is not a directory.
    • loadDocuments

      public static List<Document> loadDocuments(String directoryPath)
      Loads Documents from the specified directory. Does not use recursion.
      The files are parsed using the default DocumentParser. The default DocumentParser is loaded through SPI (see DocumentParserFactory). If no DocumentParserFactory is available in the classpath, a TextDocumentParser is used.
      Skips any Documents that fail to load.
      Parameters:
      directoryPath - The path to the directory with files.
      Returns:
      list of documents
      Throws:
      IllegalArgumentException - If the specified path is not a directory.
    • loadDocuments

      public static List<Document> loadDocuments(Path directoryPath, PathMatcher pathMatcher, DocumentParser documentParser)
      Loads matching Documents from the specified directory. Does not use recursion.
      The files are parsed using the specified DocumentParser.
      Skips any Documents that fail to load.
      Parameters:
      directoryPath - The path to the directory with files.
      pathMatcher - Only files whose paths match the provided PathMatcher will be loaded. For example, FileSystems.getDefault().getPathMatcher("glob:*.txt") loads all files in directoryPath with a txt extension. While traversing the directory, each file path is converted from absolute to relative (relative to directoryPath) before it is matched against the pathMatcher, so the pathMatcher should use relative patterns.
      documentParser - The parser to be used for parsing text from each file.
      Returns:
      list of documents
      Throws:
      IllegalArgumentException - If the specified path is not a directory.
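
To see why the pathMatcher should use relative patterns, the following JDK-only sketch relativizes a file path against its directory before matching, as the pathMatcher description states. RelativeGlobMatch and matchesRelative are hypothetical names, and "docs" is an assumed directory that does not need to exist.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class RelativeGlobMatch {

    /** Relativizes file against directory, then applies the matcher to the relative path. */
    public static boolean matchesRelative(Path directory, Path file, PathMatcher matcher) {
        Path relative = directory.relativize(file);
        return matcher.matches(relative);
    }

    public static void main(String[] args) {
        PathMatcher txt = FileSystems.getDefault().getPathMatcher("glob:*.txt");
        Path dir = Paths.get("docs").toAbsolutePath(); // assumed directory, need not exist
        Path file = dir.resolve("notes.txt");

        // The relative path "notes.txt" matches the pattern.
        System.out.println(matchesRelative(dir, file, txt)); // true
        // The absolute path contains directory separators, which "*" does not cross.
        System.out.println(txt.matches(file));               // false
    }
}
```

The same matcher that accepts the relative path rejects the absolute one, so a pattern written against absolute paths would not select the files one might expect.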
    • loadDocuments

      public static List<Document> loadDocuments(Path directoryPath, PathMatcher pathMatcher)
      Loads matching Documents from the specified directory. Does not use recursion.
      The files are parsed using the default DocumentParser. The default DocumentParser is loaded through SPI (see DocumentParserFactory). If no DocumentParserFactory is available in the classpath, a TextDocumentParser is used.
      Skips any Documents that fail to load.
      Parameters:
      directoryPath - The path to the directory with files.
      pathMatcher - Only files whose paths match the provided PathMatcher will be loaded. For example, FileSystems.getDefault().getPathMatcher("glob:*.txt") loads all files in directoryPath with a txt extension. While traversing the directory, each file path is converted from absolute to relative (relative to directoryPath) before it is matched against the pathMatcher, so the pathMatcher should use relative patterns.
      Returns:
      list of documents
      Throws:
      IllegalArgumentException - If the specified path is not a directory.
    • loadDocuments

      public static List<Document> loadDocuments(String directoryPath, PathMatcher pathMatcher, DocumentParser documentParser)
      Loads matching Documents from the specified directory. Does not use recursion.
      The files are parsed using the specified DocumentParser.
      Skips any Documents that fail to load.
      Parameters:
      directoryPath - The path to the directory with files.
      pathMatcher - Only files whose paths match the provided PathMatcher will be loaded. For example, FileSystems.getDefault().getPathMatcher("glob:*.txt") loads all files in directoryPath with a txt extension. While traversing the directory, each file path is converted from absolute to relative (relative to directoryPath) before it is matched against the pathMatcher, so the pathMatcher should use relative patterns.
      documentParser - The parser to be used for parsing text from each file.
      Returns:
      list of documents
      Throws:
      IllegalArgumentException - If the specified path is not a directory.
    • loadDocuments

      public static List<Document> loadDocuments(String directoryPath, PathMatcher pathMatcher)
      Loads matching Documents from the specified directory. Does not use recursion.
      The files are parsed using the default DocumentParser. The default DocumentParser is loaded through SPI (see DocumentParserFactory). If no DocumentParserFactory is available in the classpath, a TextDocumentParser is used.
      Skips any Documents that fail to load.
      Parameters:
      directoryPath - The path to the directory with files.
      pathMatcher - Only files whose paths match the provided PathMatcher will be loaded. For example, FileSystems.getDefault().getPathMatcher("glob:*.txt") loads all files in directoryPath with a txt extension. While traversing the directory, each file path is converted from absolute to relative (relative to directoryPath) before it is matched against the pathMatcher, so the pathMatcher should use relative patterns.
      Returns:
      list of documents
      Throws:
      IllegalArgumentException - If the specified path is not a directory.
    • loadDocumentsRecursively

      public static List<Document> loadDocumentsRecursively(Path directoryPath, DocumentParser documentParser)
      Recursively loads Documents from the specified directory and its subdirectories.
      The files are parsed using the specified DocumentParser.
      Skips any Documents that fail to load.
      Parameters:
      directoryPath - The path to the directory with files.
      documentParser - The parser to be used for parsing text from each file.
      Returns:
      list of documents
      Throws:
      IllegalArgumentException - If the specified path is not a directory.
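
The difference between a flat listing and a recursive one can be illustrated with the JDK's Files.walk. This is a sketch of the traversal idea only, under the assumption that a recursive loader visits every regular file in the tree; RecursiveListing, listRecursively and demo are hypothetical names and the library may traverse differently.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class RecursiveListing {

    /**
     * Lists all regular files under directory, including those in subdirectories,
     * as sorted paths relative to directory (with '/' as the separator).
     */
    public static List<String> listRecursively(Path directory) {
        if (!Files.isDirectory(directory)) {
            throw new IllegalArgumentException(directory + " is not a directory");
        }
        try (Stream<Path> walk = Files.walk(directory)) {
            return walk.filter(Files::isRegularFile)
                       .map(p -> directory.relativize(p).toString().replace('\\', '/'))
                       .sorted()
                       .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a small temporary tree with files at three depths and lists it. */
    public static List<String> demo() {
        try {
            Path dir = Files.createTempDirectory("recursive-list");
            Path deep = Files.createDirectories(dir.resolve("sub").resolve("deep"));
            Files.write(dir.resolve("a.txt"), new byte[] {'a'});
            Files.write(dir.resolve("sub").resolve("b.txt"), new byte[] {'b'});
            Files.write(deep.resolve("c.txt"), new byte[] {'c'});
            return listRecursively(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [a.txt, sub/b.txt, sub/deep/c.txt]
    }
}
```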
    • loadDocumentsRecursively

      public static List<Document> loadDocumentsRecursively(Path directoryPath)
      Recursively loads Documents from the specified directory and its subdirectories.
      The files are parsed using the default DocumentParser. The default DocumentParser is loaded through SPI (see DocumentParserFactory). If no DocumentParserFactory is available in the classpath, a TextDocumentParser is used.
      Skips any Documents that fail to load.
      Parameters:
      directoryPath - The path to the directory with files.
      Returns:
      list of documents
      Throws:
      IllegalArgumentException - If the specified path is not a directory.
    • loadDocumentsRecursively

      public static List<Document> loadDocumentsRecursively(String directoryPath, DocumentParser documentParser)
      Recursively loads Documents from the specified directory and its subdirectories.
      The files are parsed using the specified DocumentParser.
      Skips any Documents that fail to load.
      Parameters:
      directoryPath - The path to the directory with files.
      documentParser - The parser to be used for parsing text from each file.
      Returns:
      list of documents
      Throws:
      IllegalArgumentException - If the specified path is not a directory.
    • loadDocumentsRecursively

      public static List<Document> loadDocumentsRecursively(String directoryPath)
      Recursively loads Documents from the specified directory and its subdirectories.
      The files are parsed using the default DocumentParser. The default DocumentParser is loaded through SPI (see DocumentParserFactory). If no DocumentParserFactory is available in the classpath, a TextDocumentParser is used.
      Skips any Documents that fail to load.
      Parameters:
      directoryPath - The path to the directory with files.
      Returns:
      list of documents
      Throws:
      IllegalArgumentException - If the specified path is not a directory.
    • loadDocumentsRecursively

      public static List<Document> loadDocumentsRecursively(Path directoryPath, PathMatcher pathMatcher, DocumentParser documentParser)
      Recursively loads matching Documents from the specified directory and its subdirectories.
      The files are parsed using the specified DocumentParser.
      Skips any Documents that fail to load.
      Parameters:
      directoryPath - The path to the directory with files.
      pathMatcher - Only files whose paths match the provided PathMatcher will be loaded. For example, FileSystems.getDefault().getPathMatcher("glob:**.txt") loads all files with a txt extension from directoryPath and its subdirectories. While traversing the directory tree, each file path is converted from absolute to relative (relative to directoryPath) before it is matched against the pathMatcher, so the pathMatcher should use relative patterns. Note that the *.txt pattern (with a single asterisk) matches files only in directoryPath itself, not in its subdirectories.
      documentParser - The parser to be used for parsing text from each file.
      Returns:
      list of documents
      Throws:
      IllegalArgumentException - If the specified path is not a directory.
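
The single-asterisk versus double-asterisk distinction noted for pathMatcher can be checked directly against relative paths using the JDK's glob syntax. GlobDepth and matches are hypothetical helper names used only for this sketch.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class GlobDepth {

    /** Returns whether the glob pattern matches the given relative path. */
    public static boolean matches(String glob, String relativePath) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(relativePath));
    }

    public static void main(String[] args) {
        // "*" does not cross directory boundaries.
        System.out.println(matches("*.txt", "a.txt"));      // true
        System.out.println(matches("*.txt", "sub/a.txt"));  // false

        // "**" crosses directory boundaries, and also matches top-level files.
        System.out.println(matches("**.txt", "a.txt"));     // true
        System.out.println(matches("**.txt", "sub/a.txt")); // true
    }
}
```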
    • loadDocumentsRecursively

      public static List<Document> loadDocumentsRecursively(Path directoryPath, PathMatcher pathMatcher)
      Recursively loads matching Documents from the specified directory and its subdirectories.
      The files are parsed using the default DocumentParser. The default DocumentParser is loaded through SPI (see DocumentParserFactory). If no DocumentParserFactory is available in the classpath, a TextDocumentParser is used.
      Skips any Documents that fail to load.
      Parameters:
      directoryPath - The path to the directory with files.
      pathMatcher - Only files whose paths match the provided PathMatcher will be loaded. For example, FileSystems.getDefault().getPathMatcher("glob:**.txt") loads all files with a txt extension from directoryPath and its subdirectories. While traversing the directory tree, each file path is converted from absolute to relative (relative to directoryPath) before it is matched against the pathMatcher, so the pathMatcher should use relative patterns. Note that the *.txt pattern (with a single asterisk) matches files only in directoryPath itself, not in its subdirectories.
      Returns:
      list of documents
      Throws:
      IllegalArgumentException - If the specified path is not a directory.
    • loadDocumentsRecursively

      public static List<Document> loadDocumentsRecursively(String directoryPath, PathMatcher pathMatcher, DocumentParser documentParser)
      Recursively loads matching Documents from the specified directory and its subdirectories.
      The files are parsed using the specified DocumentParser.
      Skips any Documents that fail to load.
      Parameters:
      directoryPath - The path to the directory with files.
      pathMatcher - Only files whose paths match the provided PathMatcher will be loaded. For example, FileSystems.getDefault().getPathMatcher("glob:**.txt") loads all files with a txt extension from directoryPath and its subdirectories. While traversing the directory tree, each file path is converted from absolute to relative (relative to directoryPath) before it is matched against the pathMatcher, so the pathMatcher should use relative patterns. Note that the *.txt pattern (with a single asterisk) matches files only in directoryPath itself, not in its subdirectories.
      documentParser - The parser to be used for parsing text from each file.
      Returns:
      list of documents
      Throws:
      IllegalArgumentException - If the specified path is not a directory.
    • loadDocumentsRecursively

      public static List<Document> loadDocumentsRecursively(String directoryPath, PathMatcher pathMatcher)
      Recursively loads matching Documents from the specified directory and its subdirectories.
      The files are parsed using the default DocumentParser. The default DocumentParser is loaded through SPI (see DocumentParserFactory). If no DocumentParserFactory is available in the classpath, a TextDocumentParser is used.
      Skips any Documents that fail to load.
      Parameters:
      directoryPath - The path to the directory with files.
      pathMatcher - Only files whose paths match the provided PathMatcher will be loaded. For example, FileSystems.getDefault().getPathMatcher("glob:**.txt") loads all files with a txt extension from directoryPath and its subdirectories. While traversing the directory tree, each file path is converted from absolute to relative (relative to directoryPath) before it is matched against the pathMatcher, so the pathMatcher should use relative patterns. Note that the *.txt pattern (with a single asterisk) matches files only in directoryPath itself, not in its subdirectories.
      Returns:
      list of documents
      Throws:
      IllegalArgumentException - If the specified path is not a directory.