
Ollama

What is Ollama?

Ollama is an advanced AI tool that allows users to easily set up and run large language models locally (in CPU and GPU modes). With Ollama, users can leverage powerful language models such as Llama 2 and even customize and create their own models. Ollama bundles model weights, configuration, and data into a single package, defined by a Modelfile. It optimizes setup and configuration details, including GPU usage.
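
For illustration, a minimal Modelfile sketch might look like the following (the base model, parameter value, and system prompt are placeholders, not part of any shipped model); such a file can then be built and run with the ollama create and ollama run commands:

# Build a custom model on top of llama2 with a tweaked temperature and system prompt
FROM llama2
PARAMETER temperature 0.8
SYSTEM """You are a concise assistant that answers in short bullet points."""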

For more details about Ollama, check these out:

Talks

Watch this presentation from DockerCon 23:

Watch this intro by Code to the Moon:

Get started

To get started, add the following dependencies to your project's pom.xml:


<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-ollama</artifactId>
    <version>${langchain4j-ollama.version}</version>
</dependency>

<dependency>
    <groupId>org.testcontainers</groupId>
    <artifactId>testcontainers</artifactId>
    <version>1.19.1</version>
</dependency>

Try out this simple chat example, which uses Testcontainers to run Ollama in Docker:

import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.ollama.OllamaChatModel;
import org.testcontainers.containers.GenericContainer;
import org.testcontainers.ollama.OllamaContainer;
import org.testcontainers.utility.DockerImageName;

public class OllamaChatExample {

    public static void main(String[] args) {
        // The model name to use (e.g., "orca-mini", "mistral", "llama2", "codellama", "phi", or "tinyllama")
        String modelName = "orca-mini";

        // Create and start the Ollama container
        OllamaContainer ollama =
                new OllamaContainer(DockerImageName.parse("langchain4j/ollama-" + modelName + ":latest")
                        .asCompatibleSubstituteFor("ollama/ollama"));
        ollama.start();

        // Build the ChatLanguageModel
        ChatLanguageModel model =
                OllamaChatModel.builder().baseUrl(baseUrl(ollama)).modelName(modelName).build();

        // Example usage
        String answer = model.generate("Provide 3 short bullet points explaining why Java is awesome");
        System.out.println(answer);

        // Stop the Ollama container
        ollama.stop();
    }

    private static String baseUrl(GenericContainer<?> ollama) {
        return String.format("http://%s:%d", ollama.getHost(), ollama.getFirstMappedPort());
    }
}

Try out this simple streaming chat example:

import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.model.StreamingResponseHandler;
import dev.langchain4j.model.chat.StreamingChatLanguageModel;
import dev.langchain4j.model.ollama.OllamaStreamingChatModel;
import dev.langchain4j.model.output.Response;
import org.testcontainers.ollama.OllamaContainer;
import org.testcontainers.utility.DockerImageName;

import java.util.concurrent.CompletableFuture;

public class OllamaStreamingChatExample {

    static String MODEL_NAME = "orca-mini"; // try "mistral", "llama2", "codellama" or "phi"
    static String DOCKER_IMAGE_NAME = "langchain4j/ollama-" + MODEL_NAME + ":latest";

    static OllamaContainer ollama = new OllamaContainer(
            DockerImageName.parse(DOCKER_IMAGE_NAME).asCompatibleSubstituteFor("ollama/ollama"));

    public static void main(String[] args) {
        ollama.start();

        StreamingChatLanguageModel model = OllamaStreamingChatModel.builder()
                .baseUrl(String.format("http://%s:%d", ollama.getHost(), ollama.getFirstMappedPort()))
                .modelName(MODEL_NAME)
                .temperature(0.0)
                .build();

        String userMessage = "Write a 100-word poem about Java and AI";

        // Collect the complete response once streaming finishes
        CompletableFuture<Response<AiMessage>> futureResponse = new CompletableFuture<>();
        model.generate(userMessage, new StreamingResponseHandler<AiMessage>() {

            @Override
            public void onNext(String token) {
                System.out.print(token);
            }

            @Override
            public void onComplete(Response<AiMessage> response) {
                futureResponse.complete(response);
            }

            @Override
            public void onError(Throwable error) {
                futureResponse.completeExceptionally(error);
            }
        });

        futureResponse.join();
        ollama.stop();
    }
}

Parameters

The OllamaChatModel and OllamaStreamingChatModel classes can be instantiated with the following parameters, using the builder pattern:

| Parameter | Description | Type |
|-----------|-------------|------|
| baseUrl | The base URL of the Ollama server. | String |
| modelName | The name of the model to use from the Ollama server. | String |
| temperature | Controls the randomness of the generated responses. Higher values (e.g., 1.0) result in more diverse output, while lower values (e.g., 0.2) produce more deterministic responses. | Double |
| topK | Specifies the number of highest-probability tokens to consider at each step during generation. | Integer |
| topP | Controls the diversity of the generated responses by setting a threshold for the cumulative probability of top tokens. | Double |
| repeatPenalty | Penalizes the model for repeating similar tokens in the generated output. | Double |
| seed | Sets the random seed for reproducibility of generated responses. | Integer |
| numPredict | The number of predictions to generate for each input prompt. | Integer |
| stop | A list of strings that, if generated, will mark the end of the response. | List<String> |
| format | The desired format for the generated output. | String |
| timeout | The maximum time allowed for the API call to complete. | Duration |
| maxRetries | The maximum number of retries in case of API call failure. | Integer |

Usage Example:

OllamaChatModel ollamaChatModel = OllamaChatModel.builder()
        .baseUrl("http://your-ollama-host:your-ollama-port")
        .modelName("llama2")
        .temperature(0.8)
        .build();
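
As a fuller sketch, the same builder can set the other parameters listed in the table above. The host, port, and parameter values below are illustrative placeholders, and java.time.Duration and java.util.List are assumed to be imported:

OllamaChatModel customizedModel = OllamaChatModel.builder()
        .baseUrl("http://your-ollama-host:your-ollama-port")
        .modelName("llama2")
        .temperature(0.8)                 // more diverse output
        .topK(40)                         // sample from the 40 most likely tokens
        .topP(0.9)                        // cumulative-probability cutoff
        .repeatPenalty(1.1)               // discourage repetition
        .seed(42)                         // reproducible outputs
        .numPredict(128)
        .stop(List.of("###"))             // stop generating when this string appears
        .format("json")                   // request JSON-formatted output
        .timeout(Duration.ofSeconds(60))
        .maxRetries(2)
        .build();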