Google Cloud Storage

A Google Cloud Storage (GCS) document loader that allows you to load documents from storage buckets.

Maven Dependency

<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-document-loader-google-cloud-storage</artifactId>
    <version>0.36.2</version>
</dependency>

APIs

  • GoogleCloudStorageDocumentLoader

Authentication

Authentication is handled transparently in either of these cases:

  • Your application is running on Google Cloud Platform (Cloud Run, App Engine, Compute Engine, etc.)
  • You are running locally on your machine and are already authenticated via Google's gcloud SDK

In both cases, create the loader by specifying only your project ID:

GoogleCloudStorageDocumentLoader gcsLoader = GoogleCloudStorageDocumentLoader.builder()
    .project(System.getenv("GCP_PROJECT_ID"))
    .build();

Otherwise, if you have downloaded a service account key and exported an environment variable pointing to it, you can pass the credentials explicitly:

GoogleCloudStorageDocumentLoader gcsLoader = GoogleCloudStorageDocumentLoader.builder()
    .project(System.getenv("GCP_PROJECT_ID"))
    .credentials(GoogleCredentials.fromStream(
        new FileInputStream(System.getenv("GOOGLE_APPLICATION_CREDENTIALS"))))
    .build();

Learn more about credentials.

When accessing a public bucket, you shouldn't need to authenticate.
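
The choice between the two authentication paths above can be made at startup by checking whether a key file is actually available. The following is a minimal sketch using only the JDK; the class GcsCredentialCheck and its method explicitKeyFile are hypothetical names introduced for illustration and are not part of LangChain4j.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of LangChain4j or the Google client libraries.
public class GcsCredentialCheck {

    /**
     * Returns the key file named by GOOGLE_APPLICATION_CREDENTIALS, or empty
     * when the variable is unset, blank, or does not point to a readable file.
     */
    public static Optional<Path> explicitKeyFile(Map<String, String> env) {
        String value = env.get("GOOGLE_APPLICATION_CREDENTIALS");
        if (value == null || value.isBlank()) {
            return Optional.empty();
        }
        Path path = Path.of(value);
        boolean usable = Files.isRegularFile(path) && Files.isReadable(path);
        return usable ? Optional.of(path) : Optional.empty();
    }

    public static void main(String[] args) {
        Optional<Path> keyFile = explicitKeyFile(System.getenv());
        if (keyFile.isPresent()) {
            System.out.println("Key file found, pass explicit credentials from: " + keyFile.get());
        } else {
            System.out.println("No key file found, rely on transparent authentication");
        }
    }
}
```

When the result is present, the path can be opened and handed to the credentials(...) builder method shown earlier; when it is empty, the loader can be built with the project ID alone.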

Examples

Load a single file from a GCS bucket

GoogleCloudStorageDocumentLoader gcsLoader = GoogleCloudStorageDocumentLoader.builder()
    .project(System.getenv("GCP_PROJECT_ID"))
    .build();

Document document = gcsLoader.loadDocument("BUCKET_NAME", "FILE_NAME.txt", new TextDocumentParser());

Load all files from a GCS bucket

GoogleCloudStorageDocumentLoader gcsLoader = GoogleCloudStorageDocumentLoader.builder()
    .project(System.getenv("GCP_PROJECT_ID"))
    .build();

List<Document> documents = gcsLoader.loadDocuments("BUCKET_NAME", new TextDocumentParser());

Load all files from a GCS bucket with a glob pattern

GoogleCloudStorageDocumentLoader gcsLoader = GoogleCloudStorageDocumentLoader.builder()
    .project(System.getenv("GCP_PROJECT_ID"))
    .build();

List<Document> documents = gcsLoader.loadDocuments("BUCKET_NAME", "*.txt", new TextDocumentParser());
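
To get a feel for which object names a pattern such as "*.txt" selects, it can help to preview the pattern locally before pointing it at a bucket. The sketch below uses the JDK's own glob matcher as an approximation. It does not call the loader or Cloud Storage, and the real matching rules are decided there, so treat the result as a rough guide only. The class GlobPreview and the sample object names are hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical local preview; the loader's actual matching may differ.
public class GlobPreview {

    /** Returns the names accepted by a java.nio glob built from the given pattern. */
    public static List<String> matching(String globPattern, List<String> objectNames) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + globPattern);
        return objectNames.stream()
                .filter(name -> matcher.matches(Path.of(name)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = List.of("notes.txt", "readme.md", "reports/2024.txt");
        // In java.nio globs, "*" stays within one path segment.
        System.out.println(matching("*.txt", names));
        // "**" is allowed to cross "/" boundaries.
        System.out.println(matching("**.txt", names));
    }
}
```

In this local approximation, "*.txt" selects only top-level names, while "**.txt" also reaches names that contain a "/" prefix.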

For more code samples, please have a look at the integration test class: