Class GitHubDocumentLoader
java.lang.Object
dev.langchain4j.data.document.loader.github.GitHubDocumentLoader
Nested Class Summary
Nested Classes:
- GitHubDocumentLoader.Builder
Constructor Summary
Constructors:
- GitHubDocumentLoader(String gitHubToken, String gitHubTokenOrganization)
- GitHubDocumentLoader(String apiUrl, String gitHubToken, String gitHubTokenOrganization)
- GitHubDocumentLoader()
- GitHubDocumentLoader(org.kohsuke.github.GitHub gitHub)
Method Summary
- static GitHubDocumentLoader.Builder builder()
- Document loadDocument(String owner, String repo, String ref, String path, DocumentParser parser): Loads a document from a specific file in a GitHub repository using the provided reference (commit ID, branch name, or tag).
- List<Document> loadDocuments(String owner, String repo, String branch, DocumentParser parser)
- List<Document> loadDocuments(String owner, String repo, String branch, String path, DocumentParser parser): Loads and parses multiple documents from a directory in a GitHub repository at a specific branch.
Constructor Details
- public GitHubDocumentLoader(String gitHubToken, String gitHubTokenOrganization)
- public GitHubDocumentLoader(String apiUrl, String gitHubToken, String gitHubTokenOrganization)
- public GitHubDocumentLoader()
- public GitHubDocumentLoader(org.kohsuke.github.GitHub gitHub)
Method Details
loadDocument
public Document loadDocument(String owner, String repo, String ref, String path, DocumentParser parser)
Loads a document from a specific file in a GitHub repository using the provided reference (commit ID, branch name, or tag). This method retrieves the contents of a file from a GitHub repository at a specific version (ref), parses it using the provided DocumentParser, and returns the resulting Document object.
Parameters:
- owner - The GitHub username or organization name that owns the repository. Must not be blank.
- repo - The name of the GitHub repository. Must not be blank.
- ref - The Git reference, which can be one of the following:
  - A branch name (e.g., main, develop)
  - A tag name (e.g., v1.0.0)
  - A commit SHA (e.g., a3c6e1b...)
  If ref is null or blank, GitHub uses the repository's default branch (usually main or master).
- path - The relative path within the repository of the file to load (e.g., docs/README.md).
- parser - An implementation of DocumentParser used to parse the retrieved file content into a Document object.
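The ref rule above can be modeled in a few lines. The following is a minimal sketch that assumes only the documented behavior: a branch name, tag name, or commit SHA is used as given, and null or blank falls back to the default branch. RefResolution and resolveRef are hypothetical names, not part of langchain4j, and the default branch is passed in rather than looked up from GitHub.

```java
/**
 * Hypothetical helper, not part of langchain4j: models the documented
 * handling of the ref argument of loadDocument.
 */
final class RefResolution {

    private RefResolution() {
        // static helper only
    }

    /**
     * Returns the ref to use for a lookup.
     *
     * @param ref           a branch name, tag name, or commit SHA; may be null or blank
     * @param defaultBranch the repository's default branch (assumed to be known already)
     */
    static String resolveRef(String ref, String defaultBranch) {
        // null or whitespace-only input selects the repository's default branch
        if (ref == null || ref.isBlank()) {
            return defaultBranch;
        }
        // branch names, tag names, and commit SHAs are passed through unchanged
        return ref;
    }
}
```

For example, resolveRef(null, "main") and resolveRef("  ", "main") both yield "main", while resolveRef("v1.0.0", "main") yields "v1.0.0".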
Returns:
A Document parsed from the contents of the file at the specified location and ref in the GitHub repository.
Throws:
- IllegalArgumentException if the owner or repo is blank or null.
- RuntimeException if the GitHub API call fails or the content cannot be retrieved (wraps IOException).
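The validation and error-wrapping behavior listed under Throws can be illustrated with a self-contained sketch. It assumes a hypothetical ContentFetcher interface standing in for the GitHub API call. LoadContract and loadText are also illustrative names, not the loader's actual internals.

```java
import java.io.IOException;

/**
 * Hypothetical sketch, not the langchain4j source: models the documented
 * error contract of loadDocument.
 */
final class LoadContract {

    /** Hypothetical stand-in for the GitHub content call; may fail with IOException. */
    @FunctionalInterface
    interface ContentFetcher {
        String fetch(String owner, String repo, String ref, String path) throws IOException;
    }

    private LoadContract() {
        // static helper only
    }

    static String loadText(String owner, String repo, String ref, String path, ContentFetcher fetcher) {
        // blank or null owner/repo is rejected before any remote call is made
        requireNotBlank(owner, "owner");
        requireNotBlank(repo, "repo");
        try {
            return fetcher.fetch(owner, repo, ref, path);
        } catch (IOException e) {
            // the checked I/O failure reaches the caller wrapped in an unchecked RuntimeException
            throw new RuntimeException("Failed to load " + path + " from " + owner + "/" + repo, e);
        }
    }

    private static void requireNotBlank(String value, String name) {
        if (value == null || value.isBlank()) {
            throw new IllegalArgumentException(name + " must not be null or blank");
        }
    }
}
```

Callers therefore see an IllegalArgumentException for bad coordinates and a RuntimeException, whose getCause() is the original IOException, for transport failures.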
Usage Example:
Document doc = loader.loadDocument("langchain4j", "langchain4j", "main", "pom.xml", new TextDocumentParser());
Parameters:
- owner - the GitHub repository owner (user or organization)
- repo - the name of the GitHub repository
- ref - the commit SHA, branch name, or tag name. If null, the repository's default branch is used
- path - the relative path to the file in the repository
- parser - the parser used to convert the GitHub content into a Document
Returns:
the parsed Document object representing the content of the file
loadDocuments
public List<Document> loadDocuments(String owner, String repo, String branch, String path, DocumentParser parser)
Loads and parses multiple documents from a directory in a GitHub repository at a specific branch. This method recursively scans the specified directory in a GitHub repository at the given branch, retrieves every file it contains (including files in nested directories), parses each file using the provided DocumentParser, and returns a list of Document objects.
Parameters:
- owner - The GitHub username or organization name that owns the repository. Must not be blank.
- repo - The name of the GitHub repository. Must not be blank.
- branch - The name of the Git branch from which to read the directory contents (e.g., main, develop).
- path - The relative path to the directory within the repository to scan (e.g., docs/ or src/resources/).
- parser - An implementation of DocumentParser used to convert file contents into Document objects.
Returns:
A list of Document objects parsed from the files found in the specified directory and its subdirectories.
Throws:
- IllegalArgumentException if owner or repo is blank or null.
- RuntimeException if an IOException occurs while accessing the GitHub repository content.
Usage Example:
List<Document> docs = loader.loadDocuments("langchain4j", "langchain4j", "main", "docs/", new MarkdownParser());
Parameters:
- owner - the GitHub repository owner (user or organization)
- repo - the name of the GitHub repository
- branch - the name of the Git branch to fetch the directory contents from
- path - the relative path to the directory in the repository
- parser - the parser used to convert each file into a Document
Returns:
a list of parsed Document objects from the specified directory
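The recursive scan described for loadDocuments can be modeled without any GitHub access. This sketch assumes a hypothetical in-memory Entry tree in place of GitHub directory listings and a plain Function<String, D> in place of DocumentParser. It only illustrates the traversal, with files parsed in listing order and nested directories descended depth-first. It is not how langchain4j actually implements the method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/**
 * Hypothetical, self-contained model (not the langchain4j source) of the
 * recursive directory scan described for loadDocuments.
 */
final class RecursiveScan {

    /** Hypothetical tree node standing in for a GitHub directory entry. */
    record Entry(String path, boolean directory, String content, List<Entry> children) {

        static Entry file(String path, String content) {
            return new Entry(path, false, content, List.of());
        }

        static Entry dir(String path, Entry... children) {
            return new Entry(path, true, null, List.of(children));
        }
    }

    private RecursiveScan() {
        // static helper only
    }

    /** Parses every file below the given directory, including nested directories. */
    static <D> List<D> loadAll(Entry directory, Function<String, D> parser) {
        List<D> documents = new ArrayList<>();
        collect(directory, parser, documents);
        return documents;
    }

    private static <D> void collect(Entry entry, Function<String, D> parser, List<D> out) {
        for (Entry child : entry.children()) {
            if (child.directory()) {
                // descend into the nested directory before moving on to later siblings
                collect(child, parser, out);
            } else {
                // each file's content is handed to the parser exactly once
                out.add(parser.apply(child.content()));
            }
        }
    }
}
```

With a tree of docs/a.md and docs/guide/b.md and a parser that prefixes "parsed:", loadAll returns ["parsed:A", "parsed:B"] when the files contain "A" and "B".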
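loadDocuments
public List<Document> loadDocuments(String owner, String repo, String branch, DocumentParser parser)
builder
public static GitHubDocumentLoader.Builder builder()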