Chat and Language Models

Note: This page describes a low-level LLM API. See AI Services for a high-level LLM API.

Note: All supported LLMs can be found here.

LLMs are currently available in two API types:

LanguageModels. Their API is very simple: they accept a String as input and return a String as output. This API is becoming obsolete in favor of the chat API (the second API type).
ChatModels. These accept multiple ChatMessages as input and return a single AiMessage as output. ChatMessage usually contains text, but some LLMs also support other modalities (e.g., images, audio, etc.). Examples of such chat models include OpenAI's gpt-4o-mini and Google's gemini-1.5-pro.

Support for LanguageModels will no longer be expanded in LangChain4j, so all new features will use the ChatModel API.

ChatModel is the low-level API to interact with LLMs in LangChain4j, offering the most power and flexibility. There is also a high-level API (AI Services) that we will cover later, after we go over the basics.

Apart from ChatModel and LanguageModel, LangChain4j supports the following types of models:

EmbeddingModel - This model can translate text into an Embedding.
ImageModel - This model can generate and edit Images.
ModerationModel - This model can check if the text contains harmful content.
ScoringModel - This model can score (or rank) multiple pieces of text against a query, essentially determining how relevant each piece of text is to the query. This is useful for RAG.

These model types will be covered later.

Now, let's take a closer look at the ChatModel API.

public interface ChatModel {

    String chat(String userMessage);
    
    ...
}

As you can see, there is a simple chat method that takes a String as input and returns a String as output, similar to LanguageModel. This is just a convenience method so you can play around quickly and easily without needing to wrap the String in a UserMessage.

Here are other chat API methods:

    ...
    
    ChatResponse chat(ChatMessage... messages);

    ChatResponse chat(List<ChatMessage> messages);
        
    ...

These versions of the chat methods take one or multiple ChatMessages as input. ChatMessage is a base interface that represents a chat message. The next section will provide more details about chat messages.
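
The relationship between the String-based convenience method and the message-based methods can be pictured with a minimal sketch. The types below (SketchChatModel, SketchUserMessage, SketchAiMessage, SketchChatResponse) are hypothetical stand-ins written for this illustration, not the LangChain4j classes, and the delegation shown is an assumption about how such an overload could be written, not the library's actual implementation.

```java
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT the LangChain4j types.
record SketchUserMessage(String text) {}

record SketchAiMessage(String text) {}

record SketchChatResponse(SketchAiMessage aiMessage) {}

interface SketchChatModel {

    // The general method: takes messages and returns a full response object.
    SketchChatResponse chat(List<SketchUserMessage> messages);

    // The convenience overload: wraps the String, delegates, then unwraps the text.
    default String chat(String userMessage) {
        SketchChatResponse response = chat(List.of(new SketchUserMessage(userMessage)));
        return response.aiMessage().text();
    }
}
```

Under these assumptions, calling the String overload and calling the message-based method with a single wrapped message yield the same text; the only difference is how much of the response object the caller gets to see.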

If you wish to customize the request (e.g., specify model name, temperature, tools, JSON schema, etc.), you can use the chat(ChatRequest) method:

    ...
    
    ChatResponse chat(ChatRequest chatRequest);
        
    ...

ChatRequest chatRequest = ChatRequest.builder()
    .messages(...)
    .modelName(...)
    .temperature(...)
    .topP(...)
    .topK(...)
    .frequencyPenalty(...)
    .presencePenalty(...)
    .maxOutputTokens(...)
    .stopSequences(...)
    .toolSpecifications(...)
    .toolChoice(...)
    .responseFormat(...)
    .parameters(...) // you can also set common or provider-specific parameters all at once
    .build();

ChatResponse chatResponse = chatModel.chat(chatRequest);

Types of `ChatMessage`

There are currently five types of chat messages, one for each "source" of the message:

UserMessage: This is a message from the user. The user can be either an end user of your application (a human) or your application itself. Depending on the modalities supported by the LLM, UserMessage can contain either just text (String), or other modalities.
AiMessage: This is a message that was generated by the AI, in response to the sent message(s). It can contain:
- text(): textual content
- thinking(): thinking/reasoning content
- toolExecutionRequests(): requests to execute tools. We will explore tools in another section.
- attributes(): additional attributes, typically provider-specific
ToolExecutionResultMessage: This is the result of the ToolExecutionRequest.
SystemMessage: This is a message from the system. Usually, you, as the developer, should define its content. It typically contains instructions on the LLM's role in the conversation, how it should behave, and in what style it should answer. LLMs are trained to pay more attention to SystemMessage than to other types of messages, so it is better not to give an end user free access to define or inject input into a SystemMessage. It is usually located at the start of the conversation.
CustomMessage: This is a custom message that can contain arbitrary attributes. This message type can only be used by ChatModel implementations that support it (currently only Ollama).

Now that we know all types of ChatMessage, let's see how we can combine them in the conversation.

In the simplest scenario, we can pass a single UserMessage to the chat method. This is similar to the first version of the chat method, which takes a String as input. The major difference is that it returns a ChatResponse instead of a String. In addition to the AiMessage, ChatResponse also contains ChatResponseMetadata. ChatResponseMetadata contains TokenUsage, which reports how many tokens the input contained (all the ChatMessages that you provided to the chat method), how many tokens were generated as output (in the AiMessage), and the total (input + output). You will need this information to calculate how much a given call to the LLM costs. ChatResponseMetadata also contains FinishReason, an enum with the various reasons why generation stopped. Usually, it will be FinishReason.STOP, meaning the LLM decided to stop generating on its own.
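
The cost calculation mentioned above can be sketched as follows. This is a minimal illustration, assuming the provider prices input and output tokens separately per one million tokens. TokenCostSketch and estimateCost are hypothetical names, not part of LangChain4j, and any prices passed in are placeholders rather than real provider pricing.

```java
import java.math.BigDecimal;

// Hypothetical helper for illustration only; not part of LangChain4j.
final class TokenCostSketch {

    private static final BigDecimal ONE_MILLION = BigDecimal.valueOf(1_000_000);

    /**
     * Estimates the cost of one call from its token counts.
     * Prices are expressed per one million tokens, with input and output priced separately.
     */
    static BigDecimal estimateCost(long inputTokens, long outputTokens,
                                   BigDecimal inputPricePerMillion,
                                   BigDecimal outputPricePerMillion) {
        BigDecimal inputCost = inputPricePerMillion.multiply(BigDecimal.valueOf(inputTokens));
        BigDecimal outputCost = outputPricePerMillion.multiply(BigDecimal.valueOf(outputTokens));
        // Dividing by a power of ten always terminates, so no rounding mode is needed.
        return inputCost.add(outputCost).divide(ONE_MILLION);
    }
}
```

In a real application, the two counts would come from the TokenUsage in the response metadata. BigDecimal is used here instead of double so that small per-call amounts add up without floating-point drift.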

There are multiple ways to create a UserMessage, depending on the contents. The simplest one is new UserMessage("Hi") or UserMessage.from("Hi").

Multiple `ChatMessage`s

Now, why do you need to provide multiple ChatMessages as input, instead of just one? This is because LLMs are stateless by nature, meaning they do not maintain the state of the conversation. So, if you want to support multi-turn conversations, you should take care of managing the state of the conversation.

Let's say you want to build a chatbot. Imagine a simple multi-turn conversation between a user and a chatbot (AI):

User: Hello, my name is Klaus
AI: Hi Klaus, how can I help you?
User: What is my name?
AI: Klaus

This is what interactions with ChatModel will look like:

UserMessage firstUserMessage = UserMessage.from("Hello, my name is Klaus");
AiMessage firstAiMessage = model.chat(firstUserMessage).aiMessage(); // Hi Klaus, how can I help you?
UserMessage secondUserMessage = UserMessage.from("What is my name?");
AiMessage secondAiMessage = model.chat(firstUserMessage, firstAiMessage, secondUserMessage).aiMessage(); // Klaus

As you can see, in the second call of the chat method, we provide not just a single secondUserMessage, but also previous messages in the conversation.

Maintaining and managing these messages manually is cumbersome. Therefore, the concept of ChatMemory exists, which we will explore in the next section.
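
To make the bookkeeping concrete, here is a minimal sketch of managing the history by hand. SketchMessage and ManualHistorySketch are hypothetical stand-ins written for this illustration, not LangChain4j types, and the "model" is assumed to be any function from the full message list to a reply text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-ins for illustration only; these are NOT the LangChain4j types.
record SketchMessage(String role, String text) {}

final class ManualHistorySketch {

    private final List<SketchMessage> history = new ArrayList<>();
    private final Function<List<SketchMessage>, String> model;

    // The "model" is any function from the full message list to a reply text.
    ManualHistorySketch(Function<List<SketchMessage>, String> model) {
        this.model = model;
    }

    // Appends the user message, sends the WHOLE history, then records the reply.
    String send(String userText) {
        history.add(new SketchMessage("user", userText));
        String reply = model.apply(List.copyOf(history));
        history.add(new SketchMessage("ai", reply));
        return reply;
    }

    // Returns an immutable snapshot of the conversation so far.
    List<SketchMessage> history() {
        return List.copyOf(history);
    }
}
```

Every call resends everything said so far, so the input grows with each turn. This sketch never trims the list, which is one of the reasons manual management becomes cumbersome in longer conversations.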

Multimodality

UserMessage can contain not only text, but other types of content as well. UserMessage contains a List<Content> contents. Content is an interface and has the following implementations:

TextContent
ImageContent
AudioContent
VideoContent
PdfFileContent

You can see which LLM providers support which modalities in the comparison table here.

Here is an example of sending both text and an image to the LLM:

UserMessage userMessage = UserMessage.from(
    TextContent.from("Describe the following image"),
    ImageContent.from("https://example.com/cat.jpg")
);
ChatResponse response = model.chat(userMessage);

Text Content

TextContent is the simplest form of Content that represents plain text and wraps a single String. UserMessage.from(TextContent.from("Hello!")) is equivalent to UserMessage.from("Hello!").

One can provide one or multiple TextContents inside the UserMessage:

UserMessage userMessage = UserMessage.from(
    TextContent.from("Hello!"),
    TextContent.from("How are you?")
);

Image Content

Depending on the LLM provider, ImageContent can be created either from the URL of the remote image (see an example above), or from the Base64-encoded binary data:

byte[] imageBytes = readBytes("/home/me/cat.jpg");
String base64Data = Base64.getEncoder().encodeToString(imageBytes);
ImageContent imageContent = ImageContent.from(base64Data, "image/jpeg");
UserMessage userMessage = UserMessage.from(imageContent);
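
The readBytes call in the snippet above is not defined on this page. One way it could be written with the standard library is sketched below. ImageEncodingSketch and its methods are hypothetical helpers, not part of LangChain4j, and wrapping the checked IOException is a choice made here so that the helper can be called without a try/catch block.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

// Hypothetical helper for illustration only; not part of LangChain4j.
final class ImageEncodingSketch {

    // Reads the whole file into memory; fine for typical images, not for very large files.
    static byte[] readBytes(String path) {
        try {
            return Files.readAllBytes(Path.of(path));
        } catch (IOException e) {
            // Rethrown unchecked so callers need no try/catch.
            throw new UncheckedIOException(e);
        }
    }

    // Encodes the file contents as standard Base64 without line breaks.
    static String toBase64(String path) {
        return Base64.getEncoder().encodeToString(readBytes(path));
    }
}
```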

You can also specify a DetailLevel (an enum with LOW, HIGH, and AUTO options) to control how the model processes the image. See more details here.

Audio Content

AudioContent is similar to ImageContent, but represents audio content.

Video Content

VideoContent is similar to ImageContent, but represents video content.

PDF File Content

PdfFileContent is similar to ImageContent, but represents the binary contents of a PDF file.

Kotlin Extensions

The ChatModel Kotlin extensions provide asynchronous methods for handling chat interactions with a language model, utilizing Kotlin's coroutine capabilities. The chatAsync methods allow non-blocking processing of ChatRequest or ChatRequest.Builder configurations, returning ChatResponse with the model's reply. Similarly, generateAsync handles the asynchronous generation of responses from chat messages. These extensions simplify building chat requests and handling conversations efficiently in Kotlin applications. Note that these methods are marked as experimental and may evolve over time.

ChatModel.chatAsync(request: ChatRequest): Designed for Kotlin coroutines, this asynchronous extension function wraps the synchronous chat method within a coroutine scope using Dispatchers.IO. This enables non-blocking operations, crucial for maintaining application responsiveness. It's named chatAsync specifically to avoid conflicts with the existing synchronous chat. Its function signature is: suspend fun ChatModel.chatAsync(request: ChatRequest): ChatResponse. The keyword suspend designates it as a coroutine function.

ChatModel.chat(block: ChatRequestBuilder.() -> Unit): This variant of chat offers a more streamlined approach by using Kotlin's type-safe builder DSL. It simplifies constructing ChatRequest objects while internally using chatAsync for asynchronous execution. This version offers both conciseness and non-blocking behavior through coroutines.
