Chat and Language Models

LLMs are currently available in two API types:

  • LanguageModels. Their API is very simple: they accept a String as input and return a String as output. This API is now becoming obsolete in favor of the chat API (the second type).
  • ChatLanguageModels. These accept either a single ChatMessage or multiple ChatMessages as input and return an AiMessage as output. A ChatMessage usually contains text, but some LLMs also support a mix of text and Images. Examples of such chat models include OpenAI's gpt-3.5-turbo and Google's gemini-pro.

Support for LanguageModels will no longer be expanded in LangChain4j, so all new features will use the ChatLanguageModel API.

ChatLanguageModel is the low-level API in LangChain4j, offering the most power and flexibility. There are also high-level APIs (Chains and AiServices) that we will cover later, after we go over the basics.

Apart from ChatLanguageModel and LanguageModel, LangChain4j supports the following types of models:

  • EmbeddingModel - This model can translate text into an Embedding.
  • ImageModel - This model can generate and edit Images.
  • ModerationModel - This model can check if the text contains harmful content.
  • ScoringModel - This model can score (or rank) multiple pieces of text against a query, essentially determining how relevant each piece of text is to the query. This is useful for RAG and will be covered later.

Now, let's take a closer look at the ChatLanguageModel API.

public interface ChatLanguageModel {

    String generate(String userMessage);

    ...
}

As you can see, there is a generate method that takes a String as input and returns a String as output, similar to LanguageModel. This is just a convenience method so you can play around quickly and easily without needing to wrap the String in a UserMessage.
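For example, a quick one-off call could look like this. This is a minimal sketch: it assumes the langchain4j-open-ai module is on the classpath and uses OpenAiChatModel, one of the available ChatLanguageModel implementations; the builder parameters are illustrative.

import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.openai.OpenAiChatModel;

// Assumes langchain4j-open-ai is on the classpath and the
// OPENAI_API_KEY environment variable is set.
ChatLanguageModel model = OpenAiChatModel.builder()
        .apiKey(System.getenv("OPENAI_API_KEY"))
        .build();

String answer = model.generate("Say 'Hello World'"); // String in, String out
System.out.println(answer); // e.g. "Hello World"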

But here is the actual chat API:

    ...

    Response<AiMessage> generate(ChatMessage... messages);

    Response<AiMessage> generate(List<ChatMessage> messages);

    ...

These versions of the generate method take one or multiple ChatMessages as input. ChatMessage is a base interface that represents a chat message.

Types of ChatMessage

There are currently four types of chat messages, one for each "source" of the message:

  • UserMessage: This is a message from the user. The user can be either an end user of your application (a human) or the application itself. Depending on the modalities the LLM supports, a UserMessage can contain either text only (String) or a combination of text and images (Image).
  • AiMessage: This is a message that was generated by the AI, usually in response to the UserMessage. As you might have noted, the generate method returns an AiMessage wrapped in a Response. AiMessage can contain either a textual response (String), or a request to execute a tool (ToolExecutionRequest). Don't worry, we will explore tools a bit later.
  • ToolExecutionResultMessage: This is the result of the ToolExecutionRequest. We will cover this more a bit later.
  • SystemMessage: This is a message from the system. Usually, you, as a developer, define the content of this message. Typically, you would write instructions here on what the LLM's role in this conversation is, how it should behave, in what style to answer, etc. LLMs are trained to pay more attention to the SystemMessage than to other types of messages, so be careful: it's best not to give end users free access to define or inject input into the SystemMessage. Usually, it is located at the start of the conversation. A sketch showing how these message types are constructed follows this list.
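To make this concrete, here is a sketch that constructs a SystemMessage and a UserMessage and sends both to the model. It reuses the model variable from the earlier example; ToolExecutionResultMessage is omitted, since it only appears in tool-calling flows covered later.

import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.data.message.SystemMessage;
import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.model.output.Response;

SystemMessage systemMessage = SystemMessage.from(
        "You are a helpful assistant. Answer concisely.");
UserMessage userMessage = UserMessage.from("What is the capital of Germany?");

// The model produces the AiMessage; in a normal request/response flow
// you never construct it yourself.
Response<AiMessage> response = model.generate(systemMessage, userMessage);
AiMessage aiMessage = response.content(); // e.g. "Berlin"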

Now that we know all types of ChatMessage, let's see how we can combine them in the conversation.

In the simplest scenario, we can provide a single instance of UserMessage to the generate method. This is similar to the first version of the generate method, which takes a String as input. The major difference is that it now returns not a String, but a Response<AiMessage>.

Response is a wrapper around the content (payload), and you will often see it as the return type of *Model classes. In addition to the content (in this case, an AiMessage), Response also contains meta information about the generation. First, it includes TokenUsage, which reports how many tokens the input (all the ChatMessages you provided to the generate method) contained, how many tokens were generated as output (in the AiMessage), and the total (input + output). You will need this information to calculate the cost of a given LLM call. Second, Response contains FinishReason, an enum with the various reasons why generation stopped. Usually it will be FinishReason.STOP, meaning the LLM decided to stop generating on its own.
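Here is what unpacking a Response looks like in practice (a sketch; the exact token counts naturally depend on the model and the input):

import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.model.output.FinishReason;
import dev.langchain4j.model.output.Response;
import dev.langchain4j.model.output.TokenUsage;

Response<AiMessage> response = model.generate(UserMessage.from("Hello"));

AiMessage aiMessage = response.content(); // the payload: the AI's reply

TokenUsage tokenUsage = response.tokenUsage();
System.out.println(tokenUsage.inputTokenCount());  // tokens in all input ChatMessages
System.out.println(tokenUsage.outputTokenCount()); // tokens generated in the AiMessage
System.out.println(tokenUsage.totalTokenCount());  // input + output

FinishReason finishReason = response.finishReason();
System.out.println(finishReason); // usually STOP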

There are multiple ways to create a UserMessage, depending on the contents. The simplest one is new UserMessage("Hi") or UserMessage.from("Hi").
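For multimodal models, a UserMessage can also be assembled from several contents. The following sketch assumes a vision-capable model and an illustrative image URL; TextContent and ImageContent are the content types used for this.

import dev.langchain4j.data.message.ImageContent;
import dev.langchain4j.data.message.TextContent;
import dev.langchain4j.data.message.UserMessage;

// Text-only, two equivalent ways:
UserMessage textOnly = new UserMessage("Hi");
UserMessage alsoTextOnly = UserMessage.from("Hi");

// Text plus an image (only works with vision-capable models;
// the URL is illustrative):
UserMessage multimodal = UserMessage.from(
        TextContent.from("What do you see in this picture?"),
        ImageContent.from("https://example.com/picture.png"));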

Multiple ChatMessages

Now, why do you need to provide multiple ChatMessages as input instead of just one? Because LLMs are stateless by nature, meaning they do not maintain the state of the conversation. So, if you want to support multi-turn conversations, you need to manage the conversation state yourself.

Let's say you want to build a chatbot. Imagine a simple multi-turn conversation between a user and a chatbot (AI):

  • User: Hello, my name is Klaus
  • AI: Hi Klaus, how can I help you?
  • User: What is my name?
  • AI: Klaus

This is what interactions with ChatLanguageModel will look like:

UserMessage firstUserMessage = UserMessage.from("Hello, my name is Klaus");
AiMessage firstAiMessage = model.generate(firstUserMessage).content(); // Hi Klaus, how can I help you?
UserMessage secondUserMessage = UserMessage.from("What is my name?");
AiMessage secondAiMessage = model.generate(firstUserMessage, firstAiMessage, secondUserMessage).content(); // Klaus

As you can see, in the second call of the generate method, we provide not just the secondUserMessage, but also the previous messages in the conversation.

Maintaining and managing these messages manually is cumbersome. Therefore, the concept of ChatMemory exists, which we will explore in the next section.