Skip to main content

Google AI Gemini

https://ai.google.dev/gemini-api/docs

Maven Dependency


<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j-google-ai-gemini</artifactId>
<version>0.36.0</version>
</dependency>

API Key

Get an API key for free here: https://ai.google.dev/gemini-api/docs/api-key .

Models available

Check the list of available models in the documentation.

  • gemini-1.5-flash
  • gemini-1.5-pro
  • gemini-1.0-pro

GoogleAiGeminiChatModel

The usual generate(...) methods are available:

ChatLanguageModel gemini = GoogleAiGeminiChatModel.builder()
.apiKey(System.getenv("GEMINI_AI_KEY"))
.modelName("gemini-1.5-flash")
...
.build();

String response = gemini.generate("Hello Gemini!");

As well, as the ChatResponse chat(ChatRequest req) method:

ChatLanguageModel gemini = GoogleAiGeminiChatModel.builder()
.apiKey(System.getenv("GEMINI_AI_KEY"))
.modelName("gemini-1.5-flash")
.build();

ChatResponse chatResponse = gemini.chat(ChatRequest.builder()
.messages(UserMessage.from(
"How many R's are there in the word 'strawberry'?"))
.build());

String response = chatResponse.aiMessage().text();

Configuring

ChatLanguageModel gemini = GoogleAiGeminiChatModel.builder()
.apiKey(System.getenv("GEMINI_AI_KEY"))
.modelName("gemini-1.5-flash")
.temperature(1.0)
.topP(0.95)
.topK(64)
.maxOutputTokens(8192)
.timeout(Duration.ofSeconds(60))
.candidateCount(1)
.responseFormat(ResponseFormat.JSON) // or .responseFormat(ResponseFormat.builder()...build())
.stopSequences(List.of(...))
.toolConfig(GeminiFunctionCallingConfig.builder()...build()) // or below
.toolConfig(GeminiMode.ANY, List.of("fnOne", "fnTwo"))
.allowCodeExecution(true)
.includeCodeExecution(output)
.logRequestsAndResponses(true)
.safetySettings(List<GeminiSafetySetting> or Map<GeminiHarmCategory, GeminiHarmBlockThreshold>)
.build();

GoogleAiGeminiStreamingChatModel

The GoogleAiGeminiStreamingChatModel allows streaming the text of a response token by token. The response must be managed by a StreamingResponseHandler.

StreamingChatLanguageModel gemini = GoogleAiGeminiStreamingChatModel.builder()
.apiKey(System.getenv("GEMINI_AI_KEY"))
.modelName("gemini-1.5-flash")
.build();

CompletableFuture<Response<AiMessage>> futureResponse = new CompletableFuture<>();

gemini.generate("Tell me a joke about Java", new StreamingResponseHandler<AiMessage>() {
@Override
public void onNext(String token) {
System.out.print(token);
}

@Override
public void onComplete(Response<AiMessage> response) {
futureResponse.complete(response);
}

@Override
public void onError(Throwable error) {
futureResponse.completeExceptionally(error);
}
});

futureResponse.join();

Tools

Tools (aka Function Calling) is supported, including parallel calls. You can either use the generate(...) methods that take a single or a list of tool specifications to let Gemini know it can request a function to be called. Or you can use LangChain4j's AiServices to define them.

Here is an example of a weather tool, using AiServices:

record WeatherForecast(
String location,
String forecast,
int temperature) {}

class WeatherForecastService {
@Tool("Get the weather forecast for a location")
WeatherForecast getForecast(
@P("Location to get the forecast for") String location) {
if (location.equals("Paris")) {
return new WeatherForecast("Paris", "sunny", 20);
} else if (location.equals("London")) {
return new WeatherForecast("London", "rainy", 15);
} else if (location.equals("Tokyo")) {
return new WeatherForecast("Tokyo", "warm", 32);
} else {
return new WeatherForecast("Unknown", "unknown", 0);
}
}
}

interface WeatherAssistant {
String chat(String userMessage);
}

WeatherForecastService weatherForecastService =
new WeatherForecastService();

ChatLanguageModel gemini = GoogleAiGeminiChatModel.builder()
.apiKey(System.getenv("GEMINI_AI_KEY"))
.modelName("gemini-1.5-flash")
.temperature(0.0)
.build();

WeatherAssistant weatherAssistant =
AiServices.builder(WeatherAssistant.class)
.chatLanguageModel(gemini)
.tools(weatherForecastService)
.build();

String tokyoWeather = weatherAssistant.chat(
"What is the weather forecast for Tokyo?");

System.out.println("Gemini> " + tokyoWeather);
// Gemini> The weather forecast for Tokyo is warm
// with a temperature of 32 degrees.

Structured output

JSON mode

You can force Gemini to reply in JSON:

ChatLanguageModel gemini = GoogleAiGeminiChatModel.builder()
.apiKey(System.getenv("GEMINI_AI_KEY"))
.modelName("gemini-1.5-flash")
.responseFormat(ResponseFormat.JSON)
.build();

String roll = gemini.generate("Roll a 6-sided dice");

System.out.println(roll);
// {"roll": "3"}

A system prompt can further describe what the JSON output should look like. Gemini normally follows the suggested schema, but it is not guaranteed. If you want a guaranteed application of a JSON schema, you should define a response format, as explained in the next section.

Response format / response schema

You can specify: a ResponseFormat via the responseFormat() builder method.

Let's have a look at an example to define a JSON schema for a recipe:

ChatLanguageModel gemini = GoogleAiGeminiChatModel.builder()
.apiKey(System.getenv("GEMINI_AI_KEY"))
.modelName("gemini-1.5-flash")
.responseFormat(ResponseFormat.builder()
.type(JSON)
.jsonSchema(JsonSchema.builder()
.rootElement(JsonObjectSchema.builder()
.addStringProperty("title")
.addIntegerProperty("preparationTimeMinutes")
.addProperty("ingredients", JsonArraySchema.builder()
.items(new JsonStringSchema())
.build())
.addProperty("steps", JsonArraySchema.builder()
.items(new JsonStringSchema())
.build())
.build())
.build())
.build())
.build();

String recipeResponse = gemini.generate(
"Suggest a dessert recipe with strawberries");

System.out.println(recipeResponse);

Instead of building the JSON schema yourself, you can also derive a schema from your own Java classes:

ChatLanguageModel gemini = GoogleAiGeminiChatModel.builder()
.apiKey(System.getenv("GEMINI_AI_KEY"))
.modelName("gemini-1.5-flash")
.temperature(2.0)
.responseFormat(ResponseFormat.builder()
.type(JSON)
.jsonSchema(JsonSchemas.jsonSchemaFrom(TripItinerary.class).get())
.build())
.build();

Type-safe data extraction from free form text

Large Language Models are great at extracting structured information out of unstructured text.

In the following example, we retrieve a type-safe WeatherForecast object from a weather forecast text, thanks to AiServices:

// A type-safe / strongly-typed object 
// representing the weather forecast

record WeatherForecast(
@Description("minimum temperature")
Integer minTemperature,
@Description("maximum temperature")
Integer maxTemperature,
@Description("chances of rain")
boolean rain
) { }

// An interface contract, to interact with Gemini

interface WeatherForecastAssistant {
WeatherForecast extract(String forecast);
}

// Let's extract the data:

ChatLanguageModel gemini = GoogleAiGeminiChatModel.builder()
.apiKey(System.getenv("GEMINI_AI_KEY"))
.modelName("gemini-1.5-flash")
.build();

WeatherForecastAssistant forecastAssistant =
AiServices.builder(WeatherForecastAssistant.class)
.chatLanguageModel(gemini)
.build();

WeatherForecast forecast = forecastAssistant.extract("""
Morning: The day dawns bright and clear in Osaka, with crisp
autumn air and sunny skies. Expect temperatures to hover
around 18°C (64°F) as you head out for your morning stroll
through Namba.
Afternoon: The sun continues to shine as the city buzzes with
activity. Temperatures climb to a comfortable 22°C (72°F).
Enjoy a leisurely lunch at one of Osaka's many outdoor cafes,
or take a boat ride on the Okawa River to soak in the beautiful
scenery.
Evening: As the day fades, expect clear skies and a slight chill
in the air. Temperatures drop to 15°C (59°F). A cozy dinner at a
traditional Izakaya will be the perfect way to end your day in
Osaka.
Overall: A beautiful autumn day in Osaka awaits, perfect for
exploring the city's vibrant streets, enjoying the local cuisine,
and soaking in the sights.
Don't forget: Pack a light jacket for the evening and wear
comfortable shoes for all the walking you'll be doing.
""");

Python code execution

Beyond function calling, Google AI Gemini allows to create and execute Python code in a sandboxed environment. This is particularly interesting for situations where more advanced calculations or logic is needed.

ChatLanguageModel gemini = GoogleAiGeminiChatModel.builder()
.apiKey(System.getenv("GEMINI_AI_KEY"))
.modelName("gemini-1.5-flash")
.allowCodeExecution(true)
.includeCodeExecutionOutput(true)
.build();

There are 2 builder methods:

  • allowCodeExecution(true): to let Gemini know it can do some Python coding
  • includeCodeExecutionOutput(true): if you want to see the actual Python script it came up with, and the output of its execution
Response<AiMessage> mathQuizz = gemini.generate(
SystemMessage.from("""
You are an expert mathematician.
When asked a math problem or logic problem,
you can solve it by creating a Python program,
and execute it to return the result.
"""),
UserMessage.from("""
Implement the Fibonacci and Ackermann functions.
What is the result of `fibonacci(22)` - ackermann(3, 4)?
""")
);

Gemini will craft a Python script, execute it on its server, and return the result. Since we asked to see the code and output of the execution, the answer will look as follows:

Code executed:
```python
def fibonacci(n):
if n <= 1:
return n
else:
return fibonacci(n-1) + fibonacci(n-2)

def ackermann(m, n):
if m == 0:
return n + 1
elif n == 0:
return ackermann(m - 1, 1)
else:
return ackermann(m - 1, ackermann(m, n - 1))

print(fibonacci(22) - ackermann(3, 4))
```
Output:
```
17586
```
The result of `fibonacci(22) - ackermann(3, 4)` is **17586**.

I implemented the Fibonacci and Ackermann functions in Python.
Then I called `fibonacci(22) - ackermann(3, 4)` and printed the result.

If we hadn't asked for the code / output, we would have received only the following text:

The result of `fibonacci(22) - ackermann(3, 4)` is **17586**.

I implemented the Fibonacci and Ackermann functions in Python.
Then I called `fibonacci(22) - ackermann(3, 4)` and printed the result.

Multimodality

Gemini is a multimodal model, which means it outputs text, but in input, it accepts other modalities besides text, like:

  • pictures (ImageContent)
  • videos (VideoContent)
  • audio files (AudioContent)
  • PDF files (PdfFileContent)
  • text documents (TextFileContent)

The example below shows how to mix a text prompt, with an image, and a Markdown document:

// README.md markdown file from LangChain4j's project Github repos
String base64Text = b64encoder.encodeToString(readBytes(
"https://github.com/langchain4j/langchain4j/blob/main/README.md"));

// PNG of the cute colorful parrot mascot of the LangChain4j project
String base64Img = b64encoder.encodeToString(readBytes(
"https://avatars.githubusercontent.com/u/132277850?v=4"));

ChatLanguageModel gemini = GoogleAiGeminiChatModel.builder()
.apiKey(System.getenv("GEMINI_AI_KEY"))
.modelName("gemini-1.5-flash")
.build();

Response<AiMessage> response = gemini.generate(
UserMessage.from(
TextFileContent.from(base64Text, "text/x-markdown"),
ImageContent.from(base64Img, "image/png"),
TextContent.from("""
Do you think this logo fits well
with the project description?
""")
)
);

Learn more

If you're interested in learning more about the Google AI Gemini model, please have a look at its documentation.