Google Gen AI (Experimental)
https://github.com/googleapis/java-genai
[!WARNING]
This integration is currently marked as Experimental. The API and implementation are subject to change in future releases. It uses the new official Google Gen AI SDK for Java (com.google.genai:google-genai).
Table of Contents
- Maven Dependency
- Authentication
- Models Available
- GoogleGenAiChatModel
- GoogleGenAiStreamingChatModel
- GoogleGenAiEmbeddingModel
- GoogleGenAiImageModel
- Request & Response Logging
- Batch API
- Tools
- JSON Schema / Structured Outputs
- Grounding Metadata
- Custom Labels
- File API
- Cached Content Support
- Thinking Models (Gemini 3.0+)
- Multimodality (Audio, Video, PDF)
- Token Count Estimator
- Model Catalog
Maven Dependency
<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j-google-genai</artifactId>
<version>1.16.0-beta26</version>
</dependency>
Authentication
You can authenticate with the Gemini models using either an API key or Google Cloud Vertex AI credentials.
Gemini Developer API (API Key)
Get an API key for free here: https://ai.google.dev/gemini-api/docs/api-key.
You can provide it to the builder using .apiKey(System.getenv("GOOGLE_AI_GEMINI_API_KEY")).
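If the environment variable is missing, System.getenv returns null and the problem tends to surface later as an authentication failure. A minimal fail-fast lookup could look like the sketch below. It uses only the JDK; the ApiKeys class and the requireApiKey method are hypothetical helpers, not part of LangChain4j.

```java
import java.util.Map;

/** Hypothetical helper (not part of LangChain4j): resolves the API key or fails fast. */
final class ApiKeys {

    static final String ENV_VAR = "GOOGLE_AI_GEMINI_API_KEY";

    private ApiKeys() {}

    /** Returns the trimmed key from the given environment map, or throws if it is absent or blank. */
    static String requireApiKey(Map<String, String> env) {
        String key = env.get(ENV_VAR);
        if (key == null || key.isBlank()) {
            throw new IllegalStateException("Environment variable " + ENV_VAR + " is not set");
        }
        return key.trim();
    }

    public static void main(String[] args) {
        try {
            // In an application, the returned value is what you would pass to .apiKey(...).
            String key = requireApiKey(System.getenv());
            System.out.println("API key found (" + key.length() + " characters)");
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

Taking the environment as a Map parameter keeps the lookup easy to unit test without touching real process variables.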
Google Cloud Vertex AI
If you are using Vertex AI, you can authenticate using Google Credentials along with your project ID and location. The integration will automatically use Application Default Credentials (ADC) if available, or you can explicitly provide them:
ChatModel gemini = GoogleGenAiChatModel.builder()
// .googleCredentials(...) // Optional: explicitly provide credentials
.projectId("your-google-cloud-project-id")
.location("us-central1")
.modelName("gemini-2.5-flash")
.build();
Models available
Check the list of available models in the documentation.
- gemini-3.1-pro-preview
- gemini-3.1-flash-lite
- gemini-3-pro-preview
- gemini-3-flash-preview
- gemini-2.5-pro
- gemini-2.5-flash
- gemini-2.5-flash-lite
(See the official documentation for the full list of specialized preview models like -image, -tts, and -live).
GoogleGenAiChatModel
The usual chat(...) methods are available:
ChatModel gemini = GoogleGenAiChatModel.builder()
.apiKey(System.getenv("GOOGLE_AI_GEMINI_API_KEY"))
.modelName("gemini-2.5-flash")
.build();
String response = gemini.chat("Hello Gemini!");
As well as the ChatResponse chat(ChatRequest req) method:
ChatModel gemini = GoogleGenAiChatModel.builder()
.apiKey(System.getenv("GOOGLE_AI_GEMINI_API_KEY"))
.modelName("gemini-2.5-flash")
.build();
ChatResponse chatResponse = gemini.chat(ChatRequest.builder()
.messages(UserMessage.from(
"How many R's are there in the word 'strawberry'?"))
.build());
String response = chatResponse.aiMessage().text();
Configuring
ChatModel gemini = GoogleGenAiChatModel.builder()
.apiKey(System.getenv("GOOGLE_AI_GEMINI_API_KEY"))
// or .googleCredentials(...)
.projectId(...)
.location(...)
.modelName("gemini-2.5-flash")
.temperature(1.0)
.topP(0.95)
.topK(64)
.seed(42)
.maxOutputTokens(8192)
.timeout(Duration.ofSeconds(60))
.maxRetries(2)
.stopSequences(List.of(...))
.safetySettings(List.of(...))
.responseFormat(ResponseFormat.JSON)
.enableGoogleSearch(true)
.enableGoogleMaps(true)
.enableUrlContext(true)
.allowedFunctionNames(List.of("getWeather"))
.thinkingLevel("LOW")
.listeners(...)
.build();
Request & Response Logging
You can enable request and response logging for debugging, troubleshooting, and audit purposes on GoogleGenAiChatModel, GoogleGenAiStreamingChatModel, GoogleGenAiEmbeddingModel, and GoogleGenAiImageModel.
To capture these logs, configure .logRequests(true), .logResponses(true) (or both using .logRequestsAndResponses(true)) in your model builders.
ChatModel gemini = GoogleGenAiChatModel.builder()
.apiKey(System.getenv("GOOGLE_AI_GEMINI_API_KEY"))
.modelName("gemini-2.5-flash")
.logRequests(true)
.logResponses(true)
// Or: .logRequestsAndResponses(true)
.build();
Logging Configuration Setup
All logging in the Google Gen AI integration module is routed through the standard SLF4J facade. To actually view the output, you must ensure that:
- An SLF4J binding (implementation) is present in your dependencies.
- The logging framework is configured to output logs at the INFO level for the package dev.langchain4j.model.google.genai.
Below are common setup patterns for popular logging environments:
1. Setup using Logback
Add the Logback classic implementation to your project:
Maven
<dependency>
<groupId>ch.qos.logback</groupId>
<artifactId>logback-classic</artifactId>
<version>1.5.8</version> <!-- or your preferred version -->
</dependency>
Gradle
implementation 'ch.qos.logback:logback-classic:1.5.8'
Next, configure the logging level in your src/main/resources/logback.xml file. For example:
<configuration>
<appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
<encoder>
<pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
</encoder>
</appender>
<!-- Configure the package specifically for Google Gen AI logging -->
<logger name="dev.langchain4j.model.google.genai" level="INFO" />
<root level="WARN">
<appender-ref ref="STDOUT" />
</root>
</configuration>
2. Setup in Spring Boot Applications
Spring Boot automatically provides an SLF4J provider. Simply configure the logging level in your application.properties (or application.yml equivalent):
# Enable logging for Google Gen AI models
logging.level.dev.langchain4j.model.google.genai=INFO
3. Setup with SLF4J Simple
If you are writing a script or a simple command-line application, you can use the lightweight slf4j-simple backend:
Maven
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-simple</artifactId>
<version>2.0.13</version>
</dependency>
Configure SLF4J Simple via a system property when starting your application:
java -Dorg.slf4j.simpleLogger.log.dev.langchain4j.model.google.genai=INFO -jar app.jar
Alternatively, create a simplelogger.properties file in src/main/resources/ containing:
org.slf4j.simpleLogger.log.dev.langchain4j.model.google.genai=info
GoogleGenAiStreamingChatModel
The GoogleGenAiStreamingChatModel allows streaming the text of a response token by token.
The response must be handled by a StreamingChatResponseHandler.
StreamingChatModel gemini = GoogleGenAiStreamingChatModel.builder()
.apiKey(System.getenv("GOOGLE_AI_GEMINI_API_KEY"))
.modelName("gemini-2.5-flash")
.build();
CompletableFuture<ChatResponse> futureResponse = new CompletableFuture<>();
gemini.chat("Tell me a joke about Java", new StreamingChatResponseHandler() {
@Override
public void onPartialResponse(String partialResponse) {
System.out.print(partialResponse);
}
@Override
public void onCompleteResponse(ChatResponse completeResponse) {
futureResponse.complete(completeResponse);
}
@Override
public void onError(Throwable error) {
futureResponse.completeExceptionally(error);
}
});
futureResponse.join();
Executor
The Google Gen AI SDK exposes streaming as a blocking ResponseStream iterator: each chunk is delivered by a blocking next() call. GoogleGenAiStreamingChatModel therefore needs an ExecutorService to drive that iteration off the caller's thread.
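The bridging pattern described above can be sketched with JDK types only. This illustrates the idea and is not the integration's actual implementation: the StreamBridge class and its drain method are hypothetical, and a plain Iterator<String> stands in for the SDK's ResponseStream.

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

/** Hypothetical sketch: drains a blocking iterator on an executor thread and pushes chunks to a callback. */
final class StreamBridge {

    private StreamBridge() {}

    /**
     * Iterates {@code chunks} on {@code executor}, so the caller's thread never blocks on next().
     * The returned future completes with the concatenated text, or exceptionally on failure.
     */
    static CompletableFuture<String> drain(Iterator<String> chunks,
                                           Consumer<String> onPartial,
                                           ExecutorService executor) {
        CompletableFuture<String> done = new CompletableFuture<>();
        executor.execute(() -> {
            StringBuilder full = new StringBuilder();
            try {
                while (chunks.hasNext()) {
                    String chunk = chunks.next(); // may block until the next chunk arrives
                    full.append(chunk);
                    onPartial.accept(chunk);
                }
                done.complete(full.toString());
            } catch (Throwable t) {
                done.completeExceptionally(t);
            }
        });
        return done;
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Iterator<String> fakeStream = List.of("Hello", ", ", "Gemini").iterator();
            String text = drain(fakeStream, System.out::print, executor).join();
            System.out.println();
            System.out.println("Complete response: " + text);
        } finally {
            executor.shutdown();
        }
    }
}
```

Every in-flight stream occupies one executor thread for its whole duration, which is why the choice of executor matters.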
If you don't pass one, a shared default from DefaultExecutorProvider is used (lazily initialized, uses virtual threads when available). This works out of the box but is not recommended for production: the default executor is unbounded, JVM-wide, and not tied to your application lifecycle — so it offers no back-pressure, no graceful shutdown, and no visibility in your metrics.
You should almost always supply your own executor — for example, your framework's managed task executor (Spring TaskExecutor, Quarkus ManagedExecutor, ...), a virtual-thread executor you own, or a bounded pool tuned to your concurrency budget:
ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor(); // or your framework's executor
StreamingChatModel gemini = GoogleGenAiStreamingChatModel.builder()
.apiKey(System.getenv("GOOGLE_AI_GEMINI_API_KEY"))
.modelName("gemini-2.5-flash")
.executor(executor)
.build();
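For the bounded-pool option, a JDK-only sketch follows. The StreamingExecutors class is a hypothetical helper, and the pool size, queue capacity, keep-alive time, and thread name are illustrative assumptions to tune against your own concurrency budget. The returned ExecutorService is what you would pass to .executor(...).

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical factory for a bounded executor with back-pressure and graceful shutdown. */
final class StreamingExecutors {

    private StreamingExecutors() {}

    /** At most {@code maxConcurrentStreams} threads and up to {@code queueCapacity} waiting tasks. */
    static ThreadPoolExecutor bounded(int maxConcurrentStreams, int queueCapacity) {
        AtomicInteger counter = new AtomicInteger();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxConcurrentStreams, maxConcurrentStreams,
                30, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                runnable -> {
                    Thread thread = new Thread(runnable, "gemini-stream-" + counter.incrementAndGet());
                    thread.setDaemon(true);
                    return thread;
                },
                // Back-pressure: when pool and queue are full, the submitting thread runs the task itself.
                new ThreadPoolExecutor.CallerRunsPolicy());
        pool.allowCoreThreadTimeOut(true); // release idle threads after the keep-alive time
        return pool;
    }

    /** Stops accepting work and waits for in-flight tasks; returns false if it had to force a shutdown. */
    static boolean shutdownGracefully(ExecutorService executor, long timeoutSeconds) {
        executor.shutdown();
        try {
            if (executor.awaitTermination(timeoutSeconds, TimeUnit.SECONDS)) {
                return true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        executor.shutdownNow();
        return false;
    }
}
```

CallerRunsPolicy slows producers down by making them do the work themselves; if you would rather reject excess requests outright, ThreadPoolExecutor.AbortPolicy is the alternative. Call shutdownGracefully from your application's shutdown hook or lifecycle callback.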
Tools
Tools (aka Function Calling) are supported. You can define them using LangChain4j's AiServices:
class WeatherForecastService {
@Tool("Get the weather forecast for a location")
String getForecast(@P("Location to get the forecast for") String location) {
return "The weather in " + location + " is sunny and 25°C.";
}
}
interface WeatherAssistant {
String chat(String userMessage);
}
WeatherForecastService weatherForecastService = new WeatherForecastService();
ChatModel gemini = GoogleGenAiChatModel.builder()
.apiKey(System.getenv("GOOGLE_AI_GEMINI_API_KEY"))
.modelName("gemini-2.5-flash")
.temperature(0.0)
.build();
WeatherAssistant weatherAssistant = AiServices.builder(WeatherAssistant.class)
.chatModel(gemini)
.tools(weatherForecastService)
.build();
String response = weatherAssistant.chat("What is the weather forecast for Tokyo?");
JSON Schema / Structured Outputs
The langchain4j-google-genai integration maps LangChain4j JSON schemas (ResponseFormat.jsonSchema()) directly into the ResponseSchema of the official Google Gen AI SDK, so structured output can be extracted natively into strongly typed Java records.
record WeatherForecast(
@Description("minimum temperature") Integer minTemperature,
@Description("maximum temperature") Integer maxTemperature,
@Description("chances of rain") boolean rain
) { }
interface WeatherForecastAssistant {
WeatherForecast extract(String forecast);
}
ChatModel gemini = GoogleGenAiChatModel.builder()
.apiKey(System.getenv("GOOGLE_AI_GEMINI_API_KEY"))
.modelName("gemini-2.5-flash")
.build();
WeatherForecastAssistant forecastAssistant = AiServices.builder(WeatherForecastAssistant.class)
.chatModel(gemini)
.build();
WeatherForecast forecast = forecastAssistant.extract("""
Morning: The day dawns bright and clear in Osaka...
Temperatures climb to a comfortable 22°C (72°F) and
will drop to 15°C (59°F).
""");
[!NOTE]
The Google Gen AI API has some restrictions on advanced JSON schema features (such as anyOf / polymorphic typing). Simple POJOs, lists, and nested objects are fully supported.
Cached Content Support
When working with very large context windows (like massive system prompts, large documents, or extensive codebases) that are reused across multiple requests, you can significantly reduce costs and latency by caching the content.
Once you have created the cached content using the official Google Gen AI SDK or API, you can easily pass the unique cache identifier to the LangChain4j chat model builders:
// Pass your cached content URI here
String cachedContentUri = "projects/123456/locations/us-central1/cachedContents/my-cached-content-789";
ChatModel gemini = GoogleGenAiChatModel.builder()
.apiKey(System.getenv("GOOGLE_AI_GEMINI_API_KEY"))
.modelName("gemini-2.5-pro")
.cachedContent(cachedContentUri)
.build();
// The model will automatically use the cached context!
String response = gemini.chat("Summarize the cached document in 3 bullet points.");
This feature is available on GoogleGenAiChatModel, GoogleGenAiStreamingChatModel, and GoogleGenAiBatchChatModel.
Thinking Models (Gemini 3.0+)
Gemini 3.0 models (like gemini-3.0-pro and gemini-3.0-flash) support advanced reasoning (thinking) capabilities.
You can enable this by specifying a thinkingLevel during model configuration. The supported values are "MINIMAL", "LOW", "MEDIUM", and "HIGH":
ChatModel gemini = GoogleGenAiChatModel.builder()
.apiKey(System.getenv("GOOGLE_AI_GEMINI_API_KEY"))
.modelName("gemini-3.0-pro")
.thinkingLevel("MEDIUM")
.build();
[!NOTE]
Previously, thinking was configured using a token-based thinkingBudget. The thinkingBudget parameter is now considered legacy, though still supported. You cannot specify both thinkingLevel and thinkingBudget at the same time.
[!TIP]
The LangChain4j google-genai integration manages the state required for multi-turn tool execution with thinking models. It automatically persists the hidden thought_signature tokens and injects them back across conversation turns, so agentic workflows keep working without extra handling on your side.
GoogleGenAiEmbeddingModel
The GoogleGenAiEmbeddingModel allows you to generate embeddings for text segments using models like gemini-embedding-2.
EmbeddingModel embeddingModel = GoogleGenAiEmbeddingModel.builder()
.apiKey(System.getenv("GOOGLE_AI_GEMINI_API_KEY"))
.modelName("gemini-embedding-2")
.outputDimensionality(768)
.taskType(GoogleGenAiEmbeddingModel.TaskTypeEnum.RETRIEVAL_DOCUMENT)
.build();
Response<Embedding> response = embeddingModel.embed("Hello world!");
Batching & Retries
When embedding multiple text segments (via embedAll), GoogleGenAiEmbeddingModel automatically manages batching and API request retries.
EmbeddingModel embeddingModel = GoogleGenAiEmbeddingModel.builder()
.apiKey(System.getenv("GOOGLE_AI_GEMINI_API_KEY"))
.modelName("gemini-embedding-2")
.maxSegmentsPerBatch(100) // Default: 100. Sets maximum segments per batch request.
.maxRetries(3) // Default: 3. Automatically retries failed requests.
.build();
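Conceptually, batching and retrying combine as in the JDK-only sketch below. It is not the integration's implementation: the BatchingSketch class and its methods are hypothetical, and treating maxRetries as the number of additional attempts after the first call is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch of fixed-size batching with a simple retry loop. */
final class BatchingSketch {

    private BatchingSketch() {}

    /** Splits {@code items} into consecutive sublists of at most {@code maxPerBatch} elements. */
    static <T> List<List<T>> partition(List<T> items, int maxPerBatch) {
        if (maxPerBatch <= 0) {
            throw new IllegalArgumentException("maxPerBatch must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += maxPerBatch) {
            int end = Math.min(start + maxPerBatch, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    /** Calls {@code call} up to {@code 1 + maxRetries} times and rethrows the last failure. */
    static <I, O> O withRetries(I input, Function<I, O> call, int maxRetries) {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must not be negative");
        }
        RuntimeException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return call.apply(input);
            } catch (RuntimeException e) {
                last = e; // a production version would back off before the next attempt
            }
        }
        throw last;
    }
}
```

With 250 segments and a batch size of 100, partition yields batches of 100, 100, and 50, and each batch would go through withRetries independently.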
Title-based Grouping Strategy
The official Google Gen AI Java SDK's embedContent API only supports a single common title per batch request. To handle this restriction cleanly and preserve document-level associations, GoogleGenAiEmbeddingModel implements a group-by-title batching strategy:
- When taskType is set to RETRIEVAL_DOCUMENT, the model groups text segments by their document title (extracted from the segment's metadata using the key defined by .titleMetadataKey(...), which defaults to "title").
- Segments sharing the same title are batched and sent together in a single API call.
- Segments with different titles (or no title) are processed in separate, optimized batches.
- The resulting embeddings are seamlessly reassembled and returned in their original order.
This maximizes API throughput without losing document metadata context or individual segment titles.
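The strategy above can be sketched with JDK types only. This is an illustration under stated assumptions, not the integration's code: TitleGrouping, Segment, and embedGroupedByTitle are hypothetical names, and the embedBatch function stands in for one API call that takes a single title plus a list of texts.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

/** Hypothetical sketch of group-by-title batching that returns results in input order. */
final class TitleGrouping {

    /** Minimal stand-in for a text segment; title is null when the segment has none. */
    record Segment(String text, String title) {}

    private TitleGrouping() {}

    static <E> List<E> embedGroupedByTitle(List<Segment> segments,
                                           int maxPerBatch,
                                           BiFunction<String, List<String>, List<E>> embedBatch) {
        // 1. Group original indices by title. LinkedHashMap keeps first-seen order and accepts a null key.
        Map<String, List<Integer>> indicesByTitle = new LinkedHashMap<>();
        for (int i = 0; i < segments.size(); i++) {
            indicesByTitle.computeIfAbsent(segments.get(i).title(), k -> new ArrayList<>()).add(i);
        }
        // 2. Send each group in chunks of at most maxPerBatch and write results back by original index.
        List<E> results = new ArrayList<>(Collections.<E>nCopies(segments.size(), null));
        for (Map.Entry<String, List<Integer>> group : indicesByTitle.entrySet()) {
            List<Integer> indices = group.getValue();
            for (int start = 0; start < indices.size(); start += maxPerBatch) {
                List<Integer> chunk = indices.subList(start, Math.min(start + maxPerBatch, indices.size()));
                List<String> texts = new ArrayList<>();
                for (int index : chunk) {
                    texts.add(segments.get(index).text());
                }
                List<E> embeddings = embedBatch.apply(group.getKey(), texts);
                for (int j = 0; j < chunk.size(); j++) {
                    results.set(chunk.get(j), embeddings.get(j));
                }
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<Segment> segments = List.of(
                new Segment("Intro to Gemini", "Guide"),
                new Segment("Release notes", "Changelog"),
                new Segment("Advanced usage", "Guide"));
        // Fake embedder: returns each text's length instead of a vector.
        List<Integer> lengths = embedGroupedByTitle(segments, 100,
                (title, texts) -> texts.stream().map(String::length).toList());
        System.out.println(lengths);
    }
}
```

In the main method the two "Guide" segments travel in one call and the "Changelog" segment in another, yet the result list lines up with the input order.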
GoogleGenAiImageModel
The GoogleGenAiImageModel allows you to generate images from text prompts. It supports custom configuration like aspect ratios, image sizes, and person generation policies.
ImageModel imageModel = GoogleGenAiImageModel.builder()
.apiKey(System.getenv("GOOGLE_AI_GEMINI_API_KEY"))
.modelName("gemini-3.1-flash-image-preview")
.aspectRatio("16:9")
.build();
Response<Image> response = imageModel.generate("A futuristic city at sunset");
Batch API
The Google Gen AI integration provides support for the Batch API, allowing you to run operations asynchronously in the background. The following batch models are supported:
- GoogleGenAiBatchChatModel
- GoogleGenAiBatchEmbeddingModel
- GoogleGenAiBatchImageModel
You can create batch jobs inline or from an uploaded file on Google Cloud.
GoogleGenAiBatchChatModel batchChatModel = GoogleGenAiBatchChatModel.builder()
.apiKey(System.getenv("GOOGLE_AI_GEMINI_API_KEY"))
.modelName("gemini-2.5-flash")
.build();
BatchResponse<ChatResponse> batchResponse = batchChatModel.submit(
"My Batch Job",
List.of(
ChatRequest.builder().messages(UserMessage.from("What is 2+2?")).build(),
ChatRequest.builder().messages(UserMessage.from("What is the capital of France?")).build()
)
);
System.out.println("Batch Job ID: " + batchResponse.batchId());
You can then retrieve the status and results of the job using batchChatModel.retrieve(batchResponse.batchId()).
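Because batch jobs complete in the background, callers typically poll for completion. The sketch below shows a generic polling loop with capped exponential backoff using only the JDK. BatchPolling and pollUntilPresent are hypothetical names, and the check supplier is a placeholder for your own code that calls retrieve(...) and returns a value once the job has finished. The return type of retrieve is not specified here, so adapt the supplier to it.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical sketch: polls a status check with capped exponential backoff. */
final class BatchPolling {

    private BatchPolling() {}

    /**
     * Invokes {@code check} until it returns a value or {@code maxAttempts} is reached.
     * The delay doubles after each empty result, up to {@code maxDelay}.
     */
    static <T> Optional<T> pollUntilPresent(Supplier<Optional<T>> check,
                                            int maxAttempts,
                                            Duration initialDelay,
                                            Duration maxDelay) {
        Duration delay = initialDelay;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<T> result = check.get();
            if (result.isPresent()) {
                return result;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delay.toMillis());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // stop polling when interrupted
                    return Optional.empty();
                }
                Duration doubled = delay.multipliedBy(2);
                delay = doubled.compareTo(maxDelay) > 0 ? maxDelay : doubled;
            }
        }
        return Optional.empty();
    }
}
```

An empty result means the job was still pending when the attempts ran out, so the caller can decide whether to keep waiting or give up.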
Grounding Metadata
If you enable Google Search grounding or use a Vertex AI Search datastore, the Google Gen AI chat model exposes the native GroundingMetadata directly in the ChatResponse. You can retrieve it through the response metadata via the underlying raw GenerateContentResponse.
ChatModel gemini = GoogleGenAiChatModel.builder()
.apiKey(System.getenv("GOOGLE_AI_GEMINI_API_KEY"))
.modelName("gemini-2.5-flash")
.enableGoogleSearch(true)
.build();
ChatResponse response = gemini.chat(ChatRequest.builder()
.messages(UserMessage.from("Who won the super bowl in 2024?"))
.build());
GoogleGenAiChatResponseMetadata metadata =
(GoogleGenAiChatResponseMetadata) response.metadata();
if (metadata.rawResponse() != null
&& metadata.rawResponse().candidates() != null
&& !metadata.rawResponse().candidates().isEmpty()) {
var groundingMetadata = metadata.rawResponse().candidates().get(0).groundingMetadata();
if (groundingMetadata != null && groundingMetadata.webSearchQueries() != null) {
System.out.println("Search Queries: " + groundingMetadata.webSearchQueries());
}
}
Custom Labels
You can apply custom key-value labels to your Google Gen AI requests, which can be useful for billing, metrics, and tracking purposes. Custom labels are supported by:
- GoogleGenAiChatModel
- GoogleGenAiStreamingChatModel
- GoogleGenAiBatchChatModel
- GoogleGenAiImageModel
- GoogleGenAiBatchImageModel
ChatModel gemini = GoogleGenAiChatModel.builder()
.apiKey(System.getenv("GOOGLE_AI_GEMINI_API_KEY"))
.modelName("gemini-2.5-flash")
.labels(Map.of("environment", "production", "team", "backend"))
.build();
File API
The Google Gen AI integration provides the GoogleGenAiFiles utility to upload and manage files on Google's servers. This is particularly useful for passing large multimodal inputs (like long videos, audio files, or extensive PDFs) that might exceed standard request limits.
GoogleGenAiFiles fileApi = GoogleGenAiFiles.builder()
.apiKey(System.getenv("GOOGLE_AI_GEMINI_API_KEY"))
.build();
String uploadedFileUri = fileApi.uploadFile(
Paths.get("path/to/my-video.mp4"),
"video/mp4",
"My Video Demo"
);
// You can now use this URI in your chat requests
ChatModel gemini = GoogleGenAiChatModel.builder()
.apiKey(System.getenv("GOOGLE_AI_GEMINI_API_KEY"))
.modelName("gemini-2.5-flash")
.build();
ChatResponse response = gemini.chat(ChatRequest.builder()
.messages(UserMessage.from(
VideoContent.from(uploadedFileUri, "video/mp4"),
TextContent.from("What happens in this video?")
))
.build());
Multimodality (Audio, Video, PDF)
The integration fully supports LangChain4j's multimodal content types. The underlying GoogleGenAiContentMapper automatically converts them into the appropriate Gemini Part objects.
ChatModel gemini = GoogleGenAiChatModel.builder()
.apiKey(System.getenv("GOOGLE_AI_GEMINI_API_KEY"))
.modelName("gemini-2.5-flash")
.build();
ChatResponse response = gemini.chat(ChatRequest.builder()
.messages(UserMessage.from(
AudioContent.from("https://example.com/audio.mp3"),
PdfFileContent.from("https://example.com/document.pdf"),
TextContent.from("Summarize the document and the audio recording.")
))
.build());
Token Count Estimator
You can accurately estimate the number of tokens in your prompts and messages using the GoogleGenAiTokenCountEstimator, which uses the official SDK's counting endpoints.
TokenCountEstimator estimator = GoogleGenAiTokenCountEstimator.builder()
.apiKey(System.getenv("GOOGLE_AI_GEMINI_API_KEY"))
.modelName("gemini-2.5-flash")
.build();
int tokenCount = estimator.estimateTokenCount("How many tokens is this sentence?");
System.out.println("Tokens: " + tokenCount);
Model Catalog
You can query the list of available Gemini models programmatically using the GoogleGenAiModelCatalog. This is helpful for discovering model capabilities, context windows, and supported methods dynamically.
GoogleGenAiModelCatalog catalog = GoogleGenAiModelCatalog.builder()
.apiKey(System.getenv("GOOGLE_AI_GEMINI_API_KEY"))
.build();
List<Model> availableModels = catalog.listModels();
availableModels.forEach(model -> {
System.out.println("Model Name: " + model.name());
System.out.println("Supported Generation Methods: " + model.supportedGenerationMethods());
});