Class GitHubDocumentLoader
java.lang.Object
dev.langchain4j.data.document.loader.github.GitHubDocumentLoader
-
Nested Class Summary
Nested Classes -
Constructor Summary
Constructors:
- GitHubDocumentLoader(String gitHubToken, String gitHubTokenOrganization)
- GitHubDocumentLoader(String apiUrl, String gitHubToken, String gitHubTokenOrganization)
- GitHubDocumentLoader(org.kohsuke.github.GitHub gitHub)
Method Summary
Modifier and Type, Method, Description:
- static GitHubDocumentLoader.Builder builder()
- loadDocument(String owner, String repo, String ref, String path, DocumentParser parser): Loads a document from a specific file in a GitHub repository using the provided reference (commit ID, branch name, or tag).
- loadDocuments(String owner, String repo, String branch, DocumentParser parser)
- loadDocuments(String owner, String repo, String branch, String path, DocumentParser parser): Loads and parses multiple documents from a directory in a GitHub repository at a specific branch.
-
Constructor Details
-
GitHubDocumentLoader
-
GitHubDocumentLoader
-
GitHubDocumentLoader
public GitHubDocumentLoader() -
GitHubDocumentLoader
public GitHubDocumentLoader(org.kohsuke.github.GitHub gitHub)
-
-
Method Details
-
loadDocument
public Document loadDocument(String owner, String repo, String ref, String path, DocumentParser parser)
Loads a document from a specific file in a GitHub repository using the provided reference (commit ID, branch name, or tag).
This method retrieves the contents of a file from a GitHub repository at a specific version (ref), parses it using the provided DocumentParser, and returns the resulting Document object.
Parameters:
- owner - The GitHub username or organization name that owns the repository. Must not be blank.
- repo - The name of the GitHub repository. Must not be blank.
- ref - The Git reference, which can be one of the following:
  - A branch name (e.g., main, develop)
  - A tag name (e.g., v1.0.0)
  - A commit SHA (e.g., a3c6e1b...)
  If null or blank, GitHub will use the repository’s default branch (usually main or master).
- path - The relative file path within the repository to the content to be loaded (e.g., docs/README.md).
- parser - An implementation of DocumentParser used to parse the retrieved file content into a Document object.
Returns:
A Document parsed from the contents of the file at the specified location and ref in the GitHub repository.
Throws:
- IllegalArgumentException if the owner or repo is blank or null.
- RuntimeException if the GitHub API call fails or the content cannot be retrieved (wraps IOException).
Usage Example:
Document doc = loader.loadDocument("langchain4j", "langchain4j", "main", "pom.xml", new TextDocumentParser());
- Parameters:
  - owner - the GitHub repository owner (user or organization)
  - repo - the name of the GitHub repository
  - ref - the name of the commit SHA, branch, or tag. If null, the repository’s default branch is used
  - path - the relative path to the file in the repository
  - parser - the parser used to convert the GitHub content into a Document
- Returns:
  - the parsed Document object representing the content of the file
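The contract documented for loadDocument (blank owner or repo rejected, a null or blank ref left for the remote side to resolve to the default branch, and IOException rethrown as RuntimeException) can be sketched in plain Java. This is an illustration under stated assumptions, not the library's implementation: LoadContractSketch, ContentFetcher, requireNotBlank, normalizeRef, and load are hypothetical names, and the fetcher stands in for the GitHub API call.

```java
import java.io.IOException;

final class LoadContractSketch {

    /** Hypothetical stand-in for the remote call that fetches file content. */
    interface ContentFetcher {
        String fetch(String owner, String repo, String ref, String path) throws IOException;
    }

    /** Rejects null or blank values, mirroring the "must not be blank" rule. */
    static String requireNotBlank(String value, String name) {
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException(name + " must not be blank");
        }
        return value;
    }

    /** Treats a blank ref like null so the remote side can pick the default branch. */
    static String normalizeRef(String ref) {
        return (ref == null || ref.trim().isEmpty()) ? null : ref;
    }

    /** Validates the arguments, fetches the content, and wraps I/O failures. */
    static String load(ContentFetcher fetcher, String owner, String repo, String ref, String path) {
        requireNotBlank(owner, "owner");
        requireNotBlank(repo, "repo");
        try {
            return fetcher.fetch(owner, repo, normalizeRef(ref), path);
        } catch (IOException e) {
            // The checked exception surfaces as an unchecked one, with the cause preserved.
            throw new RuntimeException(e);
        }
    }
}
```

Callers that need to distinguish a network failure from a validation error can inspect getCause() on the RuntimeException; in this sketch the cause is always the original IOException.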
-
loadDocuments
public List<Document> loadDocuments(String owner, String repo, String branch, String path, DocumentParser parser)
Loads and parses multiple documents from a directory in a GitHub repository at a specific branch.
This method recursively scans the specified directory in a GitHub repository at a given branch, retrieves all files contained within (including nested directories), parses each file using the provided DocumentParser, and returns a list of Document objects.
Parameters:
- owner - The GitHub username or organization name that owns the repository. Must not be blank.
- repo - The name of the GitHub repository. Must not be blank.
- branch - The name of the Git branch from which to read the directory contents (e.g., main, develop).
- path - The relative path to the directory within the repository to scan (e.g., docs/ or src/resources/).
- parser - An implementation of DocumentParser used to convert file contents into Document objects.
Returns:
A list of Document objects parsed from the files found in the specified directory and its subdirectories.
Throws:
- IllegalArgumentException if owner or repo is blank or null.
- RuntimeException if an IOException occurs while accessing the GitHub repository content.
Usage Example:
List<Document> docs = loader.loadDocuments("langchain4j", "langchain4j", "main", "docs/", new MarkdownParser());
- Parameters:
  - owner - the GitHub repository owner (user or organization)
  - repo - the name of the GitHub repository
  - branch - the name of the Git branch to fetch the directory contents from
  - path - the relative path to the directory in the repository
  - parser - the parser used to convert each file into a Document
- Returns:
  - a list of parsed Document objects from the specified directory
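The recursive scan described for loadDocuments can be pictured as a depth-first walk that collects every file and descends into every subdirectory. The sketch below is a minimal illustration over an in-memory tree, not the loader's actual code: RecursiveScanSketch, Node, and collectFiles are hypothetical names, and the tree stands in for the directory listing that GitHub would return.

```java
import java.util.ArrayList;
import java.util.List;

final class RecursiveScanSketch {

    /** Hypothetical tree node: a directory has children, a file has none. */
    static final class Node {
        final String path;
        final boolean directory;
        final List<Node> children = new ArrayList<>();

        Node(String path, boolean directory) {
            this.path = path;
            this.directory = directory;
        }

        /** Appends a child entry and returns this node to allow chaining. */
        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Collects the paths of all files under root, including nested directories. */
    static List<String> collectFiles(Node root) {
        List<String> files = new ArrayList<>();
        collect(root, files);
        return files;
    }

    private static void collect(Node node, List<String> out) {
        if (!node.directory) {
            out.add(node.path); // a file: the point where a parser would be applied
            return;
        }
        for (Node child : node.children) {
            collect(child, out); // a directory: recurse into each entry
        }
    }
}
```

In this sketch an empty directory contributes nothing to the result, so scanning a directory with no files yields an empty list rather than an error.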
-
loadDocuments
-
builder
-