
Ollama

What is Ollama?

Ollama is an advanced AI tool that allows users to easily set up and run large language models locally (in CPU and GPU modes). With Ollama, users can leverage powerful language models such as Llama 2 and even customize and create their own models. Ollama bundles model weights, configuration, and data into a single package, defined by a Modelfile. It optimizes setup and configuration details, including GPU usage.
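
For illustration, a minimal Modelfile might look like the following (the base model and parameter values here are just an example, not a recommendation):

FROM llama3.2
PARAMETER temperature 0.7
SYSTEM "You are a concise, helpful assistant."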

For more details about Ollama, check these out:

Talks

Watch the presentation at DockerCon 23.

Watch the intro by Code to the Moon.

Get started

To get started, add the following dependencies to your project's pom.xml (the Testcontainers dependency is only needed for the container-based examples below):


<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-ollama</artifactId>
    <version>1.0.0-beta1</version>
</dependency>

<dependency>
    <groupId>org.testcontainers</groupId>
    <artifactId>ollama</artifactId>
    <version>1.19.1</version>
</dependency>

Try out the following simple chat example, which runs Ollama in a Testcontainers container:

import com.github.dockerjava.api.DockerClient;
import com.github.dockerjava.api.model.Image;
import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.ollama.OllamaChatModel;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.testcontainers.DockerClientFactory;
import org.testcontainers.containers.Container;
import org.testcontainers.ollama.OllamaContainer;
import org.testcontainers.utility.DockerImageName;

import java.io.IOException;
import java.util.List;

public class OllamaChatExample {

    private static final Logger log = LoggerFactory.getLogger(OllamaChatExample.class);

    static final String OLLAMA_IMAGE = "ollama/ollama:latest";
    static final String TINY_DOLPHIN_MODEL = "tinydolphin";
    static final String DOCKER_IMAGE_NAME = "tc-ollama/ollama:latest-tinydolphin";

    public static void main(String[] args) {
        // Create and start the Ollama container.
        // Reuse the committed image (with the model already pulled) if it exists locally.
        DockerImageName dockerImageName = DockerImageName.parse(OLLAMA_IMAGE);
        DockerClient dockerClient = DockerClientFactory.instance().client();
        List<Image> images = dockerClient.listImagesCmd().withReferenceFilter(DOCKER_IMAGE_NAME).exec();
        OllamaContainer ollama;
        if (images.isEmpty()) {
            ollama = new OllamaContainer(dockerImageName);
        } else {
            ollama = new OllamaContainer(DockerImageName.parse(DOCKER_IMAGE_NAME).asCompatibleSubstituteFor(OLLAMA_IMAGE));
        }
        ollama.start();

        // Pull the model and commit the container to an image that already contains it.
        try {
            log.info("Start pulling the '{}' model ... this may take several minutes ...", TINY_DOLPHIN_MODEL);
            Container.ExecResult r = ollama.execInContainer("ollama", "pull", TINY_DOLPHIN_MODEL);
            log.info("Model pulling completed! {}", r);
        } catch (IOException | InterruptedException e) {
            throw new RuntimeException("Error pulling model", e);
        }
        ollama.commitToImage(DOCKER_IMAGE_NAME);

        // Build the ChatLanguageModel
        ChatLanguageModel model = OllamaChatModel.builder()
                .baseUrl(ollama.getEndpoint())
                .temperature(0.0)
                .logRequests(true)
                .logResponses(true)
                .modelName(TINY_DOLPHIN_MODEL)
                .build();

        // Example usage
        String answer = model.generate("Provide 3 short bullet points explaining why Java is awesome");
        System.out.println(answer);

        // Stop the Ollama container
        ollama.stop();
    }
}

If Ollama is running locally, you can also try the following chat example:

import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.chat.request.ResponseFormat;
import dev.langchain4j.model.ollama.OllamaChatModel;

class OllamaChatLocalModelTest {

    static String MODEL_NAME = "llama3.2"; // try other local ollama model names
    static String BASE_URL = "http://localhost:11434"; // local ollama base url

    public static void main(String[] args) {
        ChatLanguageModel model = OllamaChatModel.builder()
                .baseUrl(BASE_URL)
                .modelName(MODEL_NAME)
                .build();
        String answer = model.generate("List top 10 cities in China");
        System.out.println(answer);

        model = OllamaChatModel.builder()
                .baseUrl(BASE_URL)
                .modelName(MODEL_NAME)
                .responseFormat(ResponseFormat.JSON)
                .build();

        String json = model.generate("List top 10 cities in US");
        System.out.println(json);
    }
}

Try out the following simple streaming chat example, which runs Ollama in a Testcontainers container:

import com.github.dockerjava.api.DockerClient;
import com.github.dockerjava.api.model.Image;
import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.model.StreamingResponseHandler;
import dev.langchain4j.model.chat.StreamingChatLanguageModel;
import dev.langchain4j.model.ollama.OllamaStreamingChatModel;
import dev.langchain4j.model.output.Response;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.testcontainers.DockerClientFactory;
import org.testcontainers.containers.Container;
import org.testcontainers.ollama.OllamaContainer;
import org.testcontainers.utility.DockerImageName;

import java.io.IOException;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class OllamaStreamingChatExample {

    private static final Logger log = LoggerFactory.getLogger(OllamaStreamingChatExample.class);

    static final String OLLAMA_IMAGE = "ollama/ollama:latest";
    static final String TINY_DOLPHIN_MODEL = "tinydolphin";
    static final String DOCKER_IMAGE_NAME = "tc-ollama/ollama:latest-tinydolphin";

    public static void main(String[] args) {
        // Create and start the Ollama container, reusing the committed image if it already exists.
        DockerImageName dockerImageName = DockerImageName.parse(OLLAMA_IMAGE);
        DockerClient dockerClient = DockerClientFactory.instance().client();
        List<Image> images = dockerClient.listImagesCmd().withReferenceFilter(DOCKER_IMAGE_NAME).exec();
        OllamaContainer ollama;
        if (images.isEmpty()) {
            ollama = new OllamaContainer(dockerImageName);
        } else {
            ollama = new OllamaContainer(DockerImageName.parse(DOCKER_IMAGE_NAME).asCompatibleSubstituteFor(OLLAMA_IMAGE));
        }
        ollama.start();

        // Pull the model and commit the container to an image that already contains it.
        try {
            log.info("Start pulling the '{}' model ... this may take several minutes ...", TINY_DOLPHIN_MODEL);
            Container.ExecResult r = ollama.execInContainer("ollama", "pull", TINY_DOLPHIN_MODEL);
            log.info("Model pulling completed! {}", r);
        } catch (IOException | InterruptedException e) {
            throw new RuntimeException("Error pulling model", e);
        }
        ollama.commitToImage(DOCKER_IMAGE_NAME);

        // Build the StreamingChatLanguageModel
        StreamingChatLanguageModel model = OllamaStreamingChatModel.builder()
                .baseUrl(ollama.getEndpoint())
                .temperature(0.0)
                .logRequests(true)
                .logResponses(true)
                .modelName(TINY_DOLPHIN_MODEL)
                .build();

        String userMessage = "Write a 100-word poem about Java and AI";

        // Print tokens as they arrive and complete the future when the response is done.
        CompletableFuture<Response<AiMessage>> futureResponse = new CompletableFuture<>();
        model.generate(userMessage, new StreamingResponseHandler<AiMessage>() {

            @Override
            public void onNext(String token) {
                System.out.print(token);
            }

            @Override
            public void onComplete(Response<AiMessage> response) {
                futureResponse.complete(response);
            }

            @Override
            public void onError(Throwable error) {
                futureResponse.completeExceptionally(error);
            }
        });

        futureResponse.join();
        ollama.stop();
    }
}

If Ollama is running locally, you can also try the following streaming chat example:

import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.model.StreamingResponseHandler;
import dev.langchain4j.model.chat.StreamingChatLanguageModel;
import dev.langchain4j.model.ollama.OllamaStreamingChatModel;
import dev.langchain4j.model.output.Response;

import java.util.concurrent.CompletableFuture;

class OllamaStreamingChatLocalModelTest {

    static String MODEL_NAME = "llama3.2"; // try other local ollama model names
    static String BASE_URL = "http://localhost:11434"; // local ollama base url

    public static void main(String[] args) {
        StreamingChatLanguageModel model = OllamaStreamingChatModel.builder()
                .baseUrl(BASE_URL)
                .modelName(MODEL_NAME)
                .temperature(0.0)
                .build();
        String userMessage = "Write a 100-word poem about Java and AI";

        CompletableFuture<Response<AiMessage>> futureResponse = new CompletableFuture<>();
        model.generate(userMessage, new StreamingResponseHandler<>() {

            @Override
            public void onNext(String token) {
                System.out.print(token);
            }

            @Override
            public void onComplete(Response<AiMessage> response) {
                futureResponse.complete(response);
            }

            @Override
            public void onError(Throwable error) {
                futureResponse.completeExceptionally(error);
            }
        });

        futureResponse.join();
    }
}

Parameters

The OllamaChatModel and OllamaStreamingChatModel classes can be instantiated via the builder pattern with the following parameters:

| Parameter | Description | Type | Example |
|---|---|---|---|
| baseUrl | The base URL of the Ollama server. | String | http://localhost:11434 |
| modelName | The name of the model to use from the Ollama server. | String | |
| temperature | Controls the randomness of the generated responses. Higher values (e.g., 1.0) result in more diverse output, while lower values (e.g., 0.2) produce more deterministic responses. | Double | |
| topK | Specifies the number of highest-probability tokens to consider at each generation step. | Integer | |
| topP | Controls the diversity of the generated responses by setting a threshold for the cumulative probability of top tokens. | Double | |
| repeatPenalty | Penalizes the model for repeating similar tokens in the generated output. | Double | |
| seed | Sets the random seed for reproducibility of generated responses. | Integer | |
| numPredict | The number of predictions to generate for each input prompt. | Integer | |
| stop | A list of strings that, if generated, will mark the end of the response. | List<String> | |
| format | The desired format for the generated output. (Deprecated, see responseFormat) | String | |
| responseFormat | The desired format for the generated output: TEXT or JSON, with an optional JSON Schema definition. | ResponseFormat | |
| supportedCapabilities | Set of model capabilities used by the AiServices API (only OllamaChatModel is supported). | Capability | RESPONSE_FORMAT_JSON_SCHEMA |
| timeout | The maximum time allowed for the API call to complete. | Duration | PT60S |
| maxRetries | The maximum number of retries in case of an API call failure. | Integer | |

Usage Example

OllamaChatModel ollamaChatModel = OllamaChatModel.builder()
        .baseUrl("http://localhost:11434")
        .modelName("llama3.1")
        .temperature(0.8)
        .timeout(Duration.ofSeconds(60))
        .build();

Usage Example with Spring Boot

langchain4j.ollama.chat-model.base-url=http://localhost:11434
langchain4j.ollama.chat-model.model-name=llama3.1
langchain4j.ollama.chat-model.temperature=0.8
langchain4j.ollama.chat-model.timeout=PT60S
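
These properties are picked up when the Ollama Spring Boot starter is on the classpath. Assuming a Maven build, the dependency would look roughly like this (check the current version for your setup):

<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-ollama-spring-boot-starter</artifactId>
    <version>1.0.0-beta1</version>
</dependency>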

JSON mode

JSON mode using builder

OllamaChatModel ollamaChatModel = OllamaChatModel.builder()
        .baseUrl("http://localhost:11434")
        .modelName("llama3.1")
        .responseFormat(ResponseFormat.JSON)
        .temperature(0.8)
        .timeout(Duration.ofSeconds(60))
        .build();

JSON mode using builder (deprecated)

OllamaChatModel ollamaChatModel = OllamaChatModel.builder()
        .baseUrl("http://localhost:11434")
        .modelName("llama3.1")
        .format("json")
        .temperature(0.8)
        .timeout(Duration.ofSeconds(60))
        .build();

Structured outputs

JSON schema definition using builder

OllamaChatModel ollamaChatModel = OllamaChatModel.builder()
        .baseUrl("http://localhost:11434")
        .modelName("llama3.1")
        .responseFormat(ResponseFormat.builder()
                .type(ResponseFormatType.JSON)
                .jsonSchema(JsonSchema.builder().rootElement(JsonObjectSchema.builder()
                                .addProperty("name", JsonStringSchema.builder().build())
                                .addProperty("capital", JsonStringSchema.builder().build())
                                .addProperty(
                                        "languages",
                                        JsonArraySchema.builder()
                                                .items(JsonStringSchema.builder().build())
                                                .build())
                                .required("name", "capital", "languages")
                                .build())
                        .build())
                .build())
        .temperature(0.8)
        .timeout(Duration.ofSeconds(60))
        .build();

JSON Schema using ChatRequest API

OllamaChatModel ollamaChatModel = OllamaChatModel.builder()
        .baseUrl("http://localhost:11434")
        .modelName("llama3.1")
        .build();

ChatResponse chatResponse = ollamaChatModel.chat(ChatRequest.builder()
        .messages(userMessage("Tell me about Canada."))
        .responseFormat(ResponseFormat.builder()
                .type(ResponseFormatType.JSON)
                .jsonSchema(JsonSchema.builder().rootElement(JsonObjectSchema.builder()
                                .addProperty("name", JsonStringSchema.builder().build())
                                .addProperty("capital", JsonStringSchema.builder().build())
                                .addProperty(
                                        "languages",
                                        JsonArraySchema.builder()
                                                .items(JsonStringSchema.builder().build())
                                                .build())
                                .required("name", "capital", "languages")
                                .build())
                        .build())
                .build())
        .build());

String jsonFormattedResponse = chatResponse.aiMessage().text();

/* jsonFormattedResponse value:

{
  "capital" : "Ottawa",
  "languages" : [ "English", "French" ],
  "name" : "Canada"
}

*/
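
Since the response is plain JSON text, it can be mapped onto a Java type with any JSON library. A minimal sketch, assuming Jackson's ObjectMapper is on the classpath (it is not part of the langchain4j Ollama integration) and using an illustrative Country record; exception handling is omitted for brevity:

ObjectMapper objectMapper = new ObjectMapper();

// Illustrative record matching the schema requested above
record Country(String name, String capital, List<String> languages) {}

Country country = objectMapper.readValue(jsonFormattedResponse, Country.class);
System.out.println(country.capital()); // Ottawa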


JSON Schema with AiServices

When OllamaChatModel is created with the supported capability RESPONSE_FORMAT_JSON_SCHEMA, AiServices will automatically generate a JSON schema from the return type of the service interface. Read more about this in Structured Outputs.

OllamaChatModel ollamaChatModel = OllamaChatModel.builder()
        .baseUrl("...")
        .modelName("...")
        .supportedCapabilities(RESPONSE_FORMAT_JSON_SCHEMA)
        .build();
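
For example, a minimal sketch of such a service (the Person record and PersonExtractor interface below are illustrative, not part of the library; AiServices comes from dev.langchain4j.service):

// Illustrative types: the JSON schema sent to Ollama is generated from the Person return type.
record Person(String name, int age, List<String> hobbies) {}

interface PersonExtractor {
    Person extractPerson(String text);
}

PersonExtractor extractor = AiServices.create(PersonExtractor.class, ollamaChatModel);
Person person = extractor.extractPerson("John is 42 years old and enjoys hiking and chess.");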

Custom Messages

OllamaChatModel and OllamaStreamingChatModel support custom chat messages in addition to the standard chat message types. A custom message can carry arbitrary attributes, which is useful for models such as Granite Guardian that rely on non-standard messages to assess the retrieved context used for Retrieval-Augmented Generation (RAG).

Let's see how we can use a CustomMessage to specify a message with arbitrary attributes:

OllamaChatModel ollamaChatModel = OllamaChatModel.builder()
        .baseUrl("http://localhost:11434")
        .modelName("granite3-guardian")
        .build();

String retrievedContext = "One significant part of treaty making is that signing a treaty implies recognition that the other side is a sovereign state and that the agreement being considered is enforceable under international law. Hence, nations can be very careful about terming an agreement to be a treaty. For example, within the United States, agreements between states are compacts and agreements between states and the federal government or between agencies of the government are memoranda of understanding.";

List<ChatMessage> messages = List.of(
        SystemMessage.from("context_relevance"),
        UserMessage.from("What is the history of treaty making?"),
        CustomMessage.from(Map.of(
                "role", "context",
                "content", retrievedContext
        ))
);

ChatResponse chatResponse = ollamaChatModel.chat(ChatRequest.builder().messages(messages).build());

System.out.println(chatResponse.aiMessage().text()); // "Yes" (meaning risk detected by Granite Guardian)