Ollama
What is Ollama?
Ollama is an advanced AI tool that allows users to easily set up and run large language models locally (in CPU and GPU modes). With Ollama, users can leverage powerful language models such as Llama 2 and even customize and create their own models. Ollama bundles model weights, configuration, and data into a single package, defined by a Modelfile. It optimizes setup and configuration details, including GPU usage.
For more details about Ollama, check these out:
Talks
Watch this presentation at DockerCon 23:
Watch this intro by Code to the Moon:
Get started
To get started, add the following dependencies to your project's pom.xml:
<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j-ollama</artifactId>
<version>1.0.0-beta1</version>
</dependency>
<dependency>
<groupId>org.testcontainers</groupId>
<artifactId>ollama</artifactId>
<version>1.19.1</version>
</dependency>
Try this simple chat example, which runs Ollama in Testcontainers (the org.testcontainers:ollama dependency above is only needed for the Testcontainers-based examples):
import com.github.dockerjava.api.DockerClient;
import com.github.dockerjava.api.model.Image;
import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.ollama.OllamaChatModel;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.testcontainers.DockerClientFactory;
import org.testcontainers.containers.Container;
import org.testcontainers.ollama.OllamaContainer;
import org.testcontainers.utility.DockerImageName;
import java.io.IOException;
import java.util.List;
public class OllamaChatExample {
private static final Logger log = LoggerFactory.getLogger(OllamaChatExample.class);
static final String OLLAMA_IMAGE = "ollama/ollama:latest";
static final String TINY_DOLPHIN_MODEL = "tinydolphin";
static final String DOCKER_IMAGE_NAME = "tc-ollama/ollama:latest-tinydolphin";
public static void main(String[] args) {
// Create and start the Ollama container
DockerImageName dockerImageName = DockerImageName.parse(OLLAMA_IMAGE);
DockerClient dockerClient = DockerClientFactory.instance().client();
List<Image> images = dockerClient.listImagesCmd().withReferenceFilter(DOCKER_IMAGE_NAME).exec();
OllamaContainer ollama;
if (images.isEmpty()) {
ollama = new OllamaContainer(dockerImageName);
} else {
ollama = new OllamaContainer(DockerImageName.parse(DOCKER_IMAGE_NAME).asCompatibleSubstituteFor(OLLAMA_IMAGE));
}
ollama.start();
// Pull the model and create an image based on the selected model.
try {
log.info("Start pulling the '{}' model ... would take several minutes ...", TINY_DOLPHIN_MODEL);
Container.ExecResult r = ollama.execInContainer("ollama", "pull", TINY_DOLPHIN_MODEL);
log.info("Model pulling competed! {}", r);
} catch (IOException | InterruptedException e) {
throw new RuntimeException("Error pulling model", e);
}
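// Commit the container to a local image so the pulled model can be reused on subsequent runs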
ollama.commitToImage(DOCKER_IMAGE_NAME);
// Build the ChatLanguageModel
ChatLanguageModel model = OllamaChatModel.builder()
.baseUrl(ollama.getEndpoint())
.temperature(0.0)
.logRequests(true)
.logResponses(true)
.modelName(TINY_DOLPHIN_MODEL)
.build();
// Example usage
String answer = model.generate("Provide 3 short bullet points explaining why Java is awesome");
System.out.println(answer);
// Stop the Ollama container
ollama.stop();
}
}
If Ollama is already running locally, you can try the following chat example instead:
import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.ollama.OllamaChatModel;
import static dev.langchain4j.model.chat.request.ResponseFormat.JSON;
class OllamaChatLocalModelTest {
static String MODEL_NAME = "llama3.2"; // try other local ollama model names
static String BASE_URL = "http://localhost:11434"; // local ollama base url
public static void main(String[] args) {
ChatLanguageModel model = OllamaChatModel.builder()
.baseUrl(BASE_URL)
.modelName(MODEL_NAME)
.build();
String answer = model.generate("List top 10 cites in China");
System.out.println(answer);
model = OllamaChatModel.builder()
.baseUrl(BASE_URL)
.modelName(MODEL_NAME)
.responseFormat(JSON)
.build();
String json = model.generate("List the top 10 cities in the US");
System.out.println(json);
}
}
Try this simple streaming chat example, which also runs Ollama in Testcontainers:
import com.github.dockerjava.api.DockerClient;
import com.github.dockerjava.api.model.Image;
import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.model.StreamingResponseHandler;
import dev.langchain4j.model.chat.StreamingChatLanguageModel;
import dev.langchain4j.model.ollama.OllamaStreamingChatModel;
import dev.langchain4j.model.output.Response;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.testcontainers.DockerClientFactory;
import org.testcontainers.containers.Container;
import org.testcontainers.ollama.OllamaContainer;
import org.testcontainers.utility.DockerImageName;
import java.io.IOException;
import java.util.List;
import java.util.concurrent.CompletableFuture;
public class OllamaStreamingChatExample {
private static final Logger log = LoggerFactory.getLogger(OllamaStreamingChatExample.class);
static final String OLLAMA_IMAGE = "ollama/ollama:latest";
static final String TINY_DOLPHIN_MODEL = "tinydolphin";
static final String DOCKER_IMAGE_NAME = "tc-ollama/ollama:latest-tinydolphin";
public static void main(String[] args) {
DockerImageName dockerImageName = DockerImageName.parse(OLLAMA_IMAGE);
DockerClient dockerClient = DockerClientFactory.instance().client();
List<Image> images = dockerClient.listImagesCmd().withReferenceFilter(DOCKER_IMAGE_NAME).exec();
OllamaContainer ollama;
if (images.isEmpty()) {
ollama = new OllamaContainer(dockerImageName);
} else {
ollama = new OllamaContainer(DockerImageName.parse(DOCKER_IMAGE_NAME).asCompatibleSubstituteFor(OLLAMA_IMAGE));
}
ollama.start();
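// Pull the model, then commit the container to an image so the model is reused on subsequent runs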
try {
log.info("Start pulling the '{}' model ... would take several minutes ...", TINY_DOLPHIN_MODEL);
Container.ExecResult r = ollama.execInContainer("ollama", "pull", TINY_DOLPHIN_MODEL);
log.info("Model pulling competed! {}", r);
} catch (IOException | InterruptedException e) {
throw new RuntimeException("Error pulling model", e);
}
ollama.commitToImage(DOCKER_IMAGE_NAME);
StreamingChatLanguageModel model = OllamaStreamingChatModel.builder()
.baseUrl(ollama.getEndpoint())
.temperature(0.0)
.logRequests(true)
.logResponses(true)
.modelName(TINY_DOLPHIN_MODEL)
.build();
String userMessage = "Write a 100-word poem about Java and AI";
CompletableFuture<Response<AiMessage>> futureResponse = new CompletableFuture<>();
model.generate(userMessage, new StreamingResponseHandler<AiMessage>() {
@Override
public void onNext(String token) {
System.out.print(token);
}
@Override
public void onComplete(Response<AiMessage> response) {
futureResponse.complete(response);
}
@Override
public void onError(Throwable error) {
futureResponse.completeExceptionally(error);
}
});
futureResponse.join();
ollama.stop();
}
}
If Ollama is already running locally, you can try the following streaming chat example instead:
import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.model.StreamingResponseHandler;
import dev.langchain4j.model.chat.StreamingChatLanguageModel;
import dev.langchain4j.model.ollama.OllamaStreamingChatModel;
import dev.langchain4j.model.output.Response;
import java.util.concurrent.CompletableFuture;
class OllamaStreamingChatLocalModelTest {
static String MODEL_NAME = "llama3.2"; // try other local ollama model names
static String BASE_URL = "http://localhost:11434"; // local ollama base url
public static void main(String[] args) {
StreamingChatLanguageModel model = OllamaStreamingChatModel.builder()
.baseUrl(BASE_URL)
.modelName(MODEL_NAME)
.temperature(0.0)
.build();
String userMessage = "Write a 100-word poem about Java and AI";
CompletableFuture<Response<AiMessage>> futureResponse = new CompletableFuture<>();
model.generate(userMessage, new StreamingResponseHandler<>() {
@Override
public void onNext(String token) {
System.out.print(token);
}
@Override
public void onComplete(Response<AiMessage> response) {
futureResponse.complete(response);
}
@Override
public void onError(Throwable error) {
futureResponse.completeExceptionally(error);
}
});
futureResponse.join();
}
}
Parameters
The OllamaChatModel and OllamaStreamingChatModel classes can be configured with the following parameters using the builder pattern:
Parameter | Description | Type | Example |
---|---|---|---|
baseUrl | The base URL of the Ollama server. | String | http://localhost:11434 |
modelName | The name of the model to use from the Ollama server. | String | |
temperature | Controls the randomness of the generated responses. Higher values (e.g., 1.0) result in more diverse output, while lower values (e.g., 0.2) produce more deterministic responses. | Double | |
topK | Specifies the number of highest-probability tokens to consider at each step during generation. | Integer | |
topP | Controls the diversity of the generated responses by setting a threshold for the cumulative probability of top tokens. | Double | |
repeatPenalty | Penalizes the model for repeating similar tokens in the generated output. | Double | |
seed | Sets the random seed for reproducibility of generated responses. | Integer | |
numPredict | The maximum number of tokens to predict when generating a response. | Integer | |
stop | A list of strings that, if generated, will mark the end of the response. | List<String> | |
format | The desired format for the generated output. (Deprecated, see responseFormat) | String | |
responseFormat | The desired format for the generated output: TEXT or JSON, with an optional JSON Schema definition. | ResponseFormat | |
supportedCapabilities | Set of model capabilities used by the AiServices API (supported by OllamaChatModel only). | Set<Capability> | RESPONSE_FORMAT_JSON_SCHEMA |
timeout | The maximum time allowed for the API call to complete. | Duration | PT60S |
maxRetries | The maximum number of retries in case of API call failure. | Integer | |
Usage Example
OllamaChatModel ollamaChatModel = OllamaChatModel.builder()
.baseUrl("http://localhost:11434")
.modelName("llama3.1")
.temperature(0.8)
.timeout(Duration.ofSeconds(60))
.build();
Usage Example with Spring Boot
langchain4j.ollama.chat-model.base-url=http://localhost:11434
langchain4j.ollama.chat-model.model-name=llama3.1
langchain4j.ollama.chat-model.temperature=0.8
langchain4j.ollama.chat-model.timeout=PT60S
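These properties are read by the LangChain4j Ollama Spring Boot starter, which auto-configures a ChatLanguageModel bean from them. A minimal usage sketch, assuming the starter is on the classpath (the ChatController class and /chat endpoint are hypothetical, not part of LangChain4j):
import dev.langchain4j.model.chat.ChatLanguageModel;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
@RestController
class ChatController {
    // injected with the Ollama chat model built from the properties above
    private final ChatLanguageModel chatLanguageModel;
    ChatController(ChatLanguageModel chatLanguageModel) {
        this.chatLanguageModel = chatLanguageModel;
    }
    @GetMapping("/chat")
    String chat(@RequestParam String message) {
        return chatLanguageModel.generate(message);
    }
}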
JSON mode
JSON mode using builder
OllamaChatModel ollamaChatModel = OllamaChatModel.builder()
.baseUrl("http://localhost:11434")
.modelName("llama3.1")
.responseFormat(ResponseFormat.JSON)
.temperature(0.8)
.timeout(Duration.ofSeconds(60))
.build();
JSON mode using builder (deprecated)
OllamaChatModel ollamaChatModel = OllamaChatModel.builder()
.baseUrl("http://localhost:11434")
.modelName("llama3.1")
.format("json")
.temperature(0.8)
.timeout(Duration.ofSeconds(60))
.build();
Structured outputs
JSON schema definition using builder
OllamaChatModel ollamaChatModel = OllamaChatModel.builder()
.baseUrl("http://localhost:11434")
.modelName("llama3.1")
.responseFormat(ResponseFormat.builder()
.type(ResponseFormatType.JSON)
.jsonSchema(JsonSchema.builder().rootElement(JsonObjectSchema.builder()
.addProperty("name", JsonStringSchema.builder().build())
.addProperty("capital", JsonStringSchema.builder().build())
.addProperty(
"languages",
JsonArraySchema.builder()
.items(JsonStringSchema.builder().build())
.build())
.required("name", "capital", "languages")
.build())
.build())
.build())
.temperature(0.8)
.timeout(Duration.ofSeconds(60))
.build();
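With the response format configured on the model itself, every call returns JSON conforming to the schema. A minimal usage sketch (the prompt and printed output below are illustrative):
String answer = ollamaChatModel.generate("Tell me about Canada.");
System.out.println(answer); // e.g. {"name":"Canada","capital":"Ottawa","languages":["English","French"]}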
JSON Schema using ChatRequest API
OllamaChatModel ollamaChatModel = OllamaChatModel.builder()
.baseUrl("http://localhost:11434")
.modelName("llama3.1")
.build();
ChatResponse chatResponse = ollamaChatModel.chat(ChatRequest.builder()
.messages(userMessage("Tell me about Canada."))
.responseFormat(ResponseFormat.builder()
.type(ResponseFormatType.JSON)
.jsonSchema(JsonSchema.builder().rootElement(JsonObjectSchema.builder()
.addProperty("name", JsonStringSchema.builder().build())
.addProperty("capital", JsonStringSchema.builder().build())
.addProperty(
"languages",
JsonArraySchema.builder()
.items(JsonStringSchema.builder().build())
.build())
.required("name", "capital", "languages")
.build())
.build())
.build())
.build());
String jsonFormattedResponse = chatResponse.aiMessage().text();
/* jsonFormattedResponse value:
{
"capital" : "Ottawa",
"languages" : [ "English", "French" ],
"name" : "Canada"
}
*/
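Since the response is plain JSON text, it can be deserialized with any JSON library. A minimal sketch assuming Jackson is on the classpath (the CountryInfo record is hypothetical):
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.List;
record CountryInfo(String name, String capital, List<String> languages) {}
// readValue throws JsonProcessingException, so call it from code that declares or handles it
ObjectMapper objectMapper = new ObjectMapper();
CountryInfo countryInfo = objectMapper.readValue(jsonFormattedResponse, CountryInfo.class);
System.out.println(countryInfo.capital()); // Ottawa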
JSON Schema with AiServices
When OllamaChatModel is created with the RESPONSE_FORMAT_JSON_SCHEMA capability, AiServices will automatically generate a JSON schema from the return type of the service interface. More about this in Structured Outputs.
OllamaChatModel ollamaChatModel = OllamaChatModel.builder()
.baseUrl("...")
.modelName("...")
.supportedCapabilities(RESPONSE_FORMAT_JSON_SCHEMA)
.build();
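For example, an AI service whose method returns a POJO will then receive schema-conforming JSON from Ollama and have it mapped to that type automatically. A minimal sketch (the CountryInfo record, CountryAssistant interface, and prompt are hypothetical):
import dev.langchain4j.service.AiServices;
import java.util.List;
record CountryInfo(String name, String capital, List<String> languages) {}
interface CountryAssistant {
    CountryInfo describe(String query);
}
CountryAssistant assistant = AiServices.create(CountryAssistant.class, ollamaChatModel);
CountryInfo canada = assistant.describe("Tell me about Canada.");
System.out.println(canada.capital()); // Ottawa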
Custom Messages
The OllamaChatModel and OllamaStreamingChatModel support custom chat messages in addition to the standard chat message types. Custom messages can be used to specify a message with arbitrary attributes. This can be useful for models like Granite Guardian, which use non-standard messages to assess the retrieved context used for Retrieval-Augmented Generation (RAG).
Let's see how we can use a CustomMessage to specify a message with arbitrary attributes:
OllamaChatModel ollamaChatModel = OllamaChatModel.builder()
.baseUrl("http://localhost:11434")
.modelName("granite3-guardian")
.build();
String retrievedContext = "One significant part of treaty making is that signing a treaty implies recognition that the other side is a sovereign state and that the agreement being considered is enforceable under international law. Hence, nations can be very careful about terming an agreement to be a treaty. For example, within the United States, agreements between states are compacts and agreements between states and the federal government or between agencies of the government are memoranda of understanding.";
List<ChatMessage> messages = List.of(
SystemMessage.from("context_relevance"),
UserMessage.from("What is the history of treaty making?"),
CustomMessage.from(Map.of(
"role", "context",
"content", retrievedContext
))
);
ChatResponse chatResponse = ollamaChatModel.chat(ChatRequest.builder().messages(messages).build());
System.out.println(chatResponse.aiMessage().text()); // "Yes" (meaning risk detected by Granite Guardian)