Class GitHubDocumentLoader

java.lang.Object
dev.langchain4j.data.document.loader.github.GitHubDocumentLoader

public class GitHubDocumentLoader extends Object
  • Constructor Details

    • GitHubDocumentLoader

      public GitHubDocumentLoader(String gitHubToken, String gitHubTokenOrganization)
    • GitHubDocumentLoader

      public GitHubDocumentLoader(String apiUrl, String gitHubToken, String gitHubTokenOrganization)
    • GitHubDocumentLoader

      public GitHubDocumentLoader()
    • GitHubDocumentLoader

      public GitHubDocumentLoader(org.kohsuke.github.GitHub gitHub)
  • Method Details

    • loadDocument

      public Document loadDocument(String owner, String repo, String ref, String path, DocumentParser parser)
      Loads a document from a specific file in a GitHub repository using the provided reference (commit ID, branch name, or tag).

      This method retrieves the contents of a file from a GitHub repository at a specific version (ref), parses it using the provided DocumentParser, and returns the resulting Document object.

      Parameters:

      • owner - The GitHub username or organization name that owns the repository. Must not be blank.
      • repo - The name of the GitHub repository. Must not be blank.
      • ref - The Git reference, which can be one of the following:
        • A branch name (e.g., main, develop)
        • A tag name (e.g., v1.0.0)
        • A commit SHA (e.g., a3c6e1b...)
        If null or blank, GitHub will use the repository’s default branch (usually main or master).
      • path - The relative file path within the repository to the content to be loaded (e.g., docs/README.md).
      • parser - An implementation of DocumentParser used to parse the retrieved file content into a Document object.

      Returns:

      A Document parsed from the contents of the file at the specified location and ref in the GitHub repository.

      Usage Example:

      
       GitHubDocumentLoader loader = new GitHubDocumentLoader();
       Document doc = loader.loadDocument("langchain4j", "langchain4j", "main", "pom.xml", new TextDocumentParser());
       
      Parameters:
      owner - the GitHub repository owner (user or organization)
      repo - the name of the GitHub repository
      ref - the commit SHA, branch name, or tag name. If null or blank, the repository’s default branch is used
      path - the relative path to the file in the repository
      parser - the parser used to convert the GitHub content into a Document
      Returns:
      the parsed Document object representing the content of the file
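
      The argument rules described above (owner and repo must not be blank; a null or blank ref falls back to the repository's default branch) can be sketched with plain Java. This is a minimal illustration, not the loader's actual implementation: the class RefSketch, both method names, and the choice of IllegalArgumentException are assumptions made for the example.

```java
import java.util.Optional;

// Hypothetical helper for illustration only; not part of langchain4j.
final class RefSketch {

    // Mirrors the documented "Must not be blank" rule for owner and repo.
    // The exception type is an assumption, not taken from the library.
    static String requireNotBlank(String value, String name) {
        if (value == null || value.isBlank()) {
            throw new IllegalArgumentException(name + " must not be blank");
        }
        return value;
    }

    // Returns the ref to send to GitHub, or empty when the caller passed
    // null or a blank string, meaning "use the repository's default branch".
    static Optional<String> explicitRef(String ref) {
        if (ref == null || ref.isBlank()) {
            return Optional.empty();
        }
        return Optional.of(ref);
    }

    public static void main(String[] args) {
        requireNotBlank("langchain4j", "owner");
        System.out.println(explicitRef("v1.0.0").isPresent()); // an explicit tag is kept
        System.out.println(explicitRef("  ").isPresent());     // blank: default branch applies
    }
}
```

      Modelling the fallback as an empty Optional keeps "no ref given" distinct from any real branch name, so the caller never has to guess whether the default branch is main or master.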
    • loadDocuments

      public List<Document> loadDocuments(String owner, String repo, String branch, String path, DocumentParser parser)
      Loads and parses multiple documents from a directory in a GitHub repository on a specific branch.

      This method recursively scans the specified directory on the given branch, retrieves every file it contains (including files in nested directories), parses each file with the provided DocumentParser, and returns the results as a list of Document objects.

      Parameters:

      • owner - The GitHub username or organization name that owns the repository. Must not be blank.
      • repo - The name of the GitHub repository. Must not be blank.
      • branch - The name of the Git branch from which to read the directory contents (e.g., main, develop).
      • path - The relative path to the directory within the repository to scan (e.g., docs/ or src/resources/).
      • parser - An implementation of DocumentParser used to convert file contents into Document objects.

      Returns:

      A list of Document objects parsed from the files found in the specified directory and its subdirectories.

      Usage Example:

      
       List<Document> docs = loader.loadDocuments(
           "langchain4j",
           "langchain4j",
           "main",
           "docs/",
           new MarkdownParser()
       );
       
      Parameters:
      owner - the GitHub repository owner (user or organization)
      repo - the name of the GitHub repository
      branch - the name of the Git branch to fetch the directory contents from
      path - the relative path to the directory in the repository
      parser - the parser used to convert each file into a Document
      Returns:
      a list of parsed Document objects from the specified directory
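
      The recursive scan described above can be sketched over an in-memory directory tree. This is a simplified illustration under stated assumptions, not the loader's code: the class RecursiveScanSketch and its collectFiles method are hypothetical, and a nested Map stands in for the directory listings that would come from GitHub.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a repository directory; not part of langchain4j.
final class RecursiveScanSketch {

    // A directory maps each entry name to either a String (file content)
    // or a nested Map (subdirectory). Returns the paths of all files found,
    // descending into subdirectories depth-first.
    static List<String> collectFiles(String prefix, Map<String, Object> dir) {
        List<String> paths = new ArrayList<>();
        for (Map.Entry<String, Object> entry : dir.entrySet()) {
            String path = prefix.isEmpty() ? entry.getKey() : prefix + "/" + entry.getKey();
            if (entry.getValue() instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> sub = (Map<String, Object>) entry.getValue();
                paths.addAll(collectFiles(path, sub)); // nested directory: recurse
            } else {
                paths.add(path); // regular file: a real loader would parse it here
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        // TreeMap keeps entries sorted by name, so the traversal order is stable.
        Map<String, Object> guide = new TreeMap<>();
        guide.put("setup.md", "# Setup");
        Map<String, Object> docs = new TreeMap<>();
        docs.put("intro.md", "# Intro");
        docs.put("guide", guide);
        System.out.println(collectFiles("docs", docs));
    }
}
```

      Each file path collected this way corresponds to one Document in the returned list, which is why a single call on docs/ also picks up files in nested directories.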
    • loadDocuments

      public List<Document> loadDocuments(String owner, String repo, String branch, DocumentParser parser)
    • builder

      public static GitHubDocumentLoader.Builder builder()