Google AI Gemini Embeddings

https://ai.google.dev/gemini-api/docs/embeddings

Maven Dependency

<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j-google-ai-gemini</artifactId>
<version>1.9.0</version>
</dependency>

API Key

Get a free API key here: https://ai.google.dev/gemini-api/docs/api-key

Models available

Check the list of available models in the documentation.

  • gemini-embedding-001
    • Input token limit: 2,048
    • Output dimension size: Flexible, supports: 128 - 3072, Recommended: 768, 1536, 3072

GoogleAiEmbeddingModel

The GoogleAiEmbeddingModel allows you to generate embeddings from text using Google AI Gemini's embedding models.

Basic Usage

EmbeddingModel embeddingModel = GoogleAiEmbeddingModel.builder()
        .apiKey(System.getenv("GEMINI_AI_KEY"))
        .modelName("gemini-embedding-001")
        .build();

Response<Embedding> response = embeddingModel.embed("Hello, world!");
Embedding embedding = response.content();

Embedding Multiple Texts

List<TextSegment> segments = List.of(
        TextSegment.from("First document"),
        TextSegment.from("Second document"),
        TextSegment.from("Third document")
);

Response<List<Embedding>> response = embeddingModel.embedAll(segments);
List<Embedding> embeddings = response.content();

Configuring the Embedding Model

EmbeddingModel embeddingModel = GoogleAiEmbeddingModel.builder()
        .apiKey(System.getenv("GEMINI_AI_KEY"))
        .modelName("gemini-embedding-001")
        .taskType(GoogleAiEmbeddingModel.TaskType.RETRIEVAL_DOCUMENT)
        .outputDimensionality(768)
        .titleMetadataKey("title")
        .maxRetries(3)
        .timeout(Duration.ofSeconds(30))
        .logRequestsAndResponses(true)
        .build();

Task Types

The taskType parameter optimizes the embedding for specific use cases:

  • RETRIEVAL_QUERY: For search queries
  • RETRIEVAL_DOCUMENT: For documents to be retrieved (default for document indexing)
  • SEMANTIC_SIMILARITY: For measuring text similarity
  • CLASSIFICATION: For text classification tasks
  • CLUSTERING: For grouping similar texts
  • QUESTION_ANSWERING: For Q&A systems
  • FACT_VERIFICATION: For fact-checking applications

Using Metadata for Document Titles

When using TaskType.RETRIEVAL_DOCUMENT, you can provide document titles via metadata:

EmbeddingModel embeddingModel = GoogleAiEmbeddingModel.builder()
        .apiKey(System.getenv("GEMINI_AI_KEY"))
        .modelName("gemini-embedding-001")
        .taskType(GoogleAiEmbeddingModel.TaskType.RETRIEVAL_DOCUMENT)
        .titleMetadataKey("title") // defaults to "title"
        .build();

TextSegment segment = TextSegment.from(
        "This is the document content",
        Metadata.from("title", "My Document Title")
);

Response<Embedding> response = embeddingModel.embed(segment);

Output Dimensionality

You can specify the output dimensionality to reduce the embedding size:

EmbeddingModel embeddingModel = GoogleAiEmbeddingModel.builder()
        .apiKey(System.getenv("GEMINI_AI_KEY"))
        .modelName("gemini-embedding-001")
        .outputDimensionality(256) // Reduce from the default of 3072 for gemini-embedding-001
        .build();
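
Google's embeddings documentation indicates that the full 3072-dimension output is already normalized, while reduced outputs (for example 256 or 768) should be re-normalized before vectors are compared; treat that as something to verify against the current documentation. A minimal sketch of L2 normalization on a raw vector, assuming the values were taken from Embedding.vector(). The helper name l2Normalize is hypothetical.

```java
final class NormalizeSketch {

    // Returns a copy of v scaled to unit L2 length.
    // A zero vector is returned unchanged because it has no direction to preserve.
    static float[] l2Normalize(float[] v) {
        double sumSquares = 0.0;
        for (float x : v) {
            sumSquares += (double) x * x;
        }
        float[] out = v.clone();
        if (sumSquares == 0.0) {
            return out;
        }
        double norm = Math.sqrt(sumSquares);
        for (int i = 0; i < out.length; i++) {
            out[i] = (float) (out[i] / norm);
        }
        return out;
    }
}
```

Normalizing once at indexing time means later comparisons can use a plain dot product instead of full cosine similarity.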

Batch Processing

The model automatically batches requests when embedding multiple segments, with a maximum of 100 segments per batch for optimal performance.

Note: This is not the discounted Batch API; it is only a convenience for embedding multiple segments in one call.
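
To make the 100-segment limit concrete, the sketch below shows the kind of chunking involved when a long list is split into consecutive groups. It is an illustration only, not LangChain4j's actual implementation; ChunkSketch and chunk are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class ChunkSketch {

    // Splits items into consecutive sublists of at most maxSize elements, preserving order.
    static <T> List<List<T>> chunk(List<T> items, int maxSize) {
        if (maxSize <= 0) {
            throw new IllegalArgumentException("maxSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += maxSize) {
            int end = Math.min(start + maxSize, items.size());
            // Copy the view so each chunk is independent of the source list.
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }
}
```

With this scheme, 250 segments and a limit of 100 would yield three groups of 100, 100, and 50 segments, and the results would be concatenated in the same order.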

Batch Embedding Processing

The GoogleAiGeminiBatchEmbeddingModel provides an interface for processing large volumes of embedding requests asynchronously at a reduced cost (50% of standard pricing). It is ideal for non-urgent, large-scale embedding tasks and has a 24-hour turnaround service-level objective (SLO).

Creating Batch Embedding Jobs

Inline batch creation:

GoogleAiGeminiBatchEmbeddingModel batchModel = GoogleAiGeminiBatchEmbeddingModel.builder()
        .apiKey(System.getenv("GEMINI_AI_KEY"))
        .modelName("gemini-embedding-001")
        .taskType(GoogleAiEmbeddingModel.TaskType.RETRIEVAL_DOCUMENT)
        .outputDimensionality(768)
        .build();

// Create batch of text segments
List<TextSegment> segments = List.of(
        TextSegment.from("First document to embed"),
        TextSegment.from("Second document to embed"),
        TextSegment.from("Third document to embed")
);

// Submit the batch
BatchResponse<Embedding> response = batchModel.createBatchInline(
        "Document Embeddings Batch", // display name
        0L, // priority (optional, defaults to 0)
        segments
);

File-based batch creation:

For larger batches, you can create a batch from an uploaded file:

// First, upload a file with batch requests
GeminiFiles filesApi = GeminiFiles.builder()
        .apiKey(System.getenv("GEMINI_AI_KEY"))
        .build();

GeminiFile uploadedFile = filesApi.uploadFile(
        Paths.get("batch_embeddings.jsonl"),
        "Batch Embedding Requests"
);

// Wait for file to be active
while (uploadedFile.isProcessing()) {
    Thread.sleep(1000);
    uploadedFile = filesApi.getMetadata(uploadedFile.name());
}

// Create batch from file
BatchResponse<Embedding> response = batchModel.createBatchFromFile(
        "My Embedding Batch Job",
        uploadedFile
);

Handling Batch Responses

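BatchResponse is a sealed interface with three possible implementations: BatchIncomplete, BatchSuccess, and BatchError. The example below handles them with a pattern-matching switch, which requires Java 21 or later: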

BatchResponse<Embedding> response = batchModel.createBatchInline("My Batch", null, segments);

switch (response) {
    case BatchIncomplete incomplete -> {
        System.out.println("Batch is " + incomplete.state());
        System.out.println("Batch name: " + incomplete.batchName().value());
    }
    case BatchSuccess success -> {
        System.out.println("Batch completed successfully!");
        for (Embedding embedding : success.responses()) {
            System.out.println("Embedding dimensions: " + embedding.dimension());
        }
    }
    case BatchError error -> {
        System.err.println("Batch failed: " + error.message());
        System.err.println("Error code: " + error.code());
        System.err.println("State: " + error.state());
    }
}

Polling for Results

Since batch processing is asynchronous, you need to poll for results:

BatchResponse<Embedding> initialResponse = batchModel.createBatchInline(
        "My Batch",
        null,
        segments
);

// Extract the batch name for polling
BatchName batchName = switch (initialResponse) {
    case BatchIncomplete incomplete -> incomplete.batchName();
    case BatchSuccess success -> success.batchName();
    case BatchError error -> throw new RuntimeException("Batch creation failed");
};

// Poll until completion
BatchResponse<Embedding> result;
do {
    Thread.sleep(5000); // Wait 5 seconds between polls
    result = batchModel.retrieveBatchResults(batchName);
} while (result instanceof BatchIncomplete);

// Process final result
if (result instanceof BatchSuccess success) {
    List<Embedding> embeddings = success.responses();
    System.out.println("Generated " + embeddings.size() + " embeddings");
} else if (result instanceof BatchError error) {
    System.err.println("Batch failed: " + error.message());
}
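
A fixed 5-second loop with no upper bound can spin for a long time on a job with a 24-hour SLO. As one possible refinement, the sketch below polls with a doubling delay and a cap on attempts. It is written against generic functional interfaces so it stays self-contained; PollSketch and pollUntil are hypothetical names, not part of LangChain4j.

```java
import java.time.Duration;
import java.util.function.Predicate;
import java.util.function.Supplier;

final class PollSketch {

    // Calls fetch until isDone accepts the result or maxAttempts is exhausted.
    // The delay doubles after each unsuccessful attempt, capped at maxDelay.
    static <T> T pollUntil(Supplier<T> fetch, Predicate<T> isDone,
                           Duration initialDelay, Duration maxDelay, int maxAttempts) {
        long delayMs = initialDelay.toMillis();
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            T result = fetch.get();
            if (isDone.test(result)) {
                return result;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMs);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag so callers can react to it.
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while polling", e);
                }
                delayMs = Math.min(delayMs * 2, maxDelay.toMillis());
            }
        }
        throw new IllegalStateException("Gave up after " + maxAttempts + " attempts");
    }
}
```

Applied to the example above, fetch would wrap batchModel.retrieveBatchResults(batchName) and isDone would test that the result is not a BatchIncomplete.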

Managing Batch Jobs

Cancel a batch job:

// batchName: the BatchName obtained from createBatchInline or createBatchFromFile

try {
    batchModel.cancelBatchJob(batchName);
    System.out.println("Batch cancelled successfully");
} catch (HttpException e) {
    System.err.println("Failed to cancel batch: " + e.getMessage());
}

Delete a batch job:

batchModel.deleteBatchJob(batchName);
System.out.println("Batch deleted successfully");

List batch jobs:

// List first page of batch jobs
BatchList<Embedding> batchList = batchModel.listBatchJobs(10, null);

for (BatchResponse<Embedding> batch : batchList.batches()) {
    System.out.println("Batch: " + batch);
}

// Get next page if available
if (batchList.nextPageToken() != null) {
    BatchList<Embedding> nextPage = batchModel.listBatchJobs(10, batchList.nextPageToken());
}
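
The snippet above fetches at most two pages. To walk every page, the usual pattern is to keep requesting until no next-page token comes back. The sketch below shows that loop in a self-contained form; the Page record and fetchAll method are hypothetical stand-ins, not LangChain4j types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class PagingSketch {

    // Hypothetical stand-in for one page of results plus the token for the next page
    // (null when there are no more pages).
    record Page<T>(List<T> items, String nextPageToken) {}

    // Requests pages, starting with a null token, until no next token is returned.
    static <T> List<T> fetchAll(Function<String, Page<T>> fetchPage) {
        List<T> all = new ArrayList<>();
        String token = null;
        do {
            Page<T> page = fetchPage.apply(token);
            all.addAll(page.items());
            token = page.nextPageToken();
        } while (token != null);
        return all;
    }
}
```

With the API shown above, fetchPage would call batchModel.listBatchJobs(10, token) and map batches() and nextPageToken() into the page object.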

File-Based Batch Processing

For advanced use cases, you can write batch requests to a JSONL file and upload it:

// Create a JSONL file with batch requests
Path batchFile = Files.createTempFile("batch", ".jsonl");

try (JsonLinesWriter writer = new StreamingJsonLinesWriter(batchFile)) {
    List<BatchFileRequest<TextSegment>> fileRequests = List.of(
            new BatchFileRequest<>("segment-1", TextSegment.from("First document")),
            new BatchFileRequest<>("segment-2", TextSegment.from("Second document")),
            new BatchFileRequest<>("segment-3", TextSegment.from("Third document"))
    );

    batchModel.writeBatchToFile(writer, fileRequests);
}

// Upload the file
GeminiFiles filesApi = GeminiFiles.builder()
        .apiKey(System.getenv("GEMINI_AI_KEY"))
        .build();

GeminiFile uploadedFile = filesApi.uploadFile(batchFile, "Batch Embedding Requests");

// Create batch from file
BatchResponse<Embedding> response = batchModel.createBatchFromFile(
        "File-Based Embedding Batch",
        uploadedFile
);

Using Metadata with Batch Embeddings

When using TaskType.RETRIEVAL_DOCUMENT, you can include document titles via metadata:

GoogleAiGeminiBatchEmbeddingModel batchModel = GoogleAiGeminiBatchEmbeddingModel.builder()
        .apiKey(System.getenv("GEMINI_AI_KEY"))
        .modelName("gemini-embedding-001")
        .taskType(GoogleAiEmbeddingModel.TaskType.RETRIEVAL_DOCUMENT)
        .titleMetadataKey("title")
        .build();

List<TextSegment> segments = List.of(
        TextSegment.from(
                "Content of first document",
                Metadata.from("title", "First Document Title")
        ),
        TextSegment.from(
                "Content of second document",
                Metadata.from("title", "Second Document Title")
        )
);

BatchResponse<Embedding> response = batchModel.createBatchInline(
        "Documents with Titles",
        null,
        segments
);

Configuration

The GoogleAiGeminiBatchEmbeddingModel supports the same configuration options as GoogleAiEmbeddingModel:

GoogleAiGeminiBatchEmbeddingModel batchModel = GoogleAiGeminiBatchEmbeddingModel.builder()
        .apiKey(System.getenv("GEMINI_AI_KEY"))
        .modelName("gemini-embedding-001")
        .taskType(GoogleAiEmbeddingModel.TaskType.RETRIEVAL_DOCUMENT)
        .outputDimensionality(768)
        .titleMetadataKey("title")
        .maxRetries(3)
        .timeout(Duration.ofSeconds(30))
        .logRequestsAndResponses(true)
        .build();

Important Constraints

  • Size Limit: The inline API supports a total request size of 20 MB or under
  • Batch Size: Maximum of 100 segments per batch for optimal performance
  • Cost: Batch processing offers 50% cost reduction compared to real-time requests
  • Turnaround: 24-hour SLO, though completion is often much quicker
  • Use Cases: Best for large-scale embedding generation for document indexing or semantic search
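
One rough way to choose between inline and file-based submission is to estimate the payload size before submitting. The sketch below sums the UTF-8 size of the texts and compares it against the 20 MB limit with some headroom. It assumes 20 MB means 20 * 1024 * 1024 bytes and that the headroom is enough to cover JSON overhead, neither of which is confirmed by the documentation; InlineSizeSketch and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

final class InlineSizeSketch {

    // Assumption: the 20 MB inline limit is interpreted as 20 * 1024 * 1024 bytes.
    static final long INLINE_LIMIT_BYTES = 20L * 1024 * 1024;

    // Sums the UTF-8 byte lengths of the texts. This is a lower bound on the request
    // size, because the real request also carries JSON structure and metadata.
    static long utf8Bytes(List<String> texts) {
        long total = 0;
        for (String text : texts) {
            total += text.getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    // Returns true if the texts fit within the limit after reserving
    // headroomFraction (0.0 to 1.0) of it for request overhead.
    static boolean fitsInline(List<String> texts, double headroomFraction) {
        if (headroomFraction < 0.0 || headroomFraction > 1.0) {
            throw new IllegalArgumentException("headroomFraction must be between 0 and 1");
        }
        long budget = (long) (INLINE_LIMIT_BYTES * (1.0 - headroomFraction));
        return utf8Bytes(texts) <= budget;
    }
}
```

If fitsInline returns false for the segment texts, the file-based path described earlier is the safer choice.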

Example: Complete Workflow

GoogleAiGeminiBatchEmbeddingModel batchModel = GoogleAiGeminiBatchEmbeddingModel.builder()
        .apiKey(System.getenv("GEMINI_AI_KEY"))
        .modelName("gemini-embedding-001")
        .taskType(GoogleAiEmbeddingModel.TaskType.RETRIEVAL_DOCUMENT)
        .outputDimensionality(768)
        .build();

// Prepare batch of text segments
List<TextSegment> segments = new ArrayList<>();
for (int i = 0; i < 500; i++) {
    segments.add(TextSegment.from(
            "Document content #" + i,
            Metadata.from("title", "Document " + i)
    ));
}

// Submit batch
BatchResponse<Embedding> response = batchModel.createBatchInline(
        "Large Document Collection",
        0L,
        segments
);

// Get batch name
BatchName batchName = switch (response) {
    case BatchIncomplete incomplete -> incomplete.batchName();
    case BatchSuccess success -> success.batchName();
    case BatchError error -> throw new RuntimeException("Failed: " + error.message());
};

// Poll for completion
BatchResponse<Embedding> finalResult;
int attempts = 0;
int maxAttempts = 720; // 1 hour with 5-second intervals

do {
    if (attempts++ >= maxAttempts) {
        throw new RuntimeException("Batch processing timeout");
    }
    Thread.sleep(5000);
    finalResult = batchModel.retrieveBatchResults(batchName);

    if (finalResult instanceof BatchIncomplete incomplete) {
        System.out.println("Status: " + incomplete.state());
    }
} while (finalResult instanceof BatchIncomplete);

// Process results
if (finalResult instanceof BatchSuccess success) {
    List<Embedding> embeddings = success.responses();
    System.out.println("Generated " + embeddings.size() + " embeddings");

    // Store embeddings in your vector database
    for (int i = 0; i < embeddings.size(); i++) {
        Embedding embedding = embeddings.get(i);
        System.out.println("Embedding " + i + " has " + embedding.dimension() + " dimensions");
        // vectorStore.add(embedding, segments.get(i));
    }
} else if (finalResult instanceof BatchError error) {
    System.err.println("Batch failed: " + error.message());
}

Learn more

To learn more about the Google AI Gemini embedding models, see the documentation at https://ai.google.dev/gemini-api/docs/embeddings.