watsonx.ai
Maven Dependency
<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j-watsonx</artifactId>
<version>1.4.0-beta10</version>
</dependency>
WatsonxChatModel
The WatsonxChatModel class provides an implementation of the LangChain4j ChatModel interface, fully encapsulating the watsonx.ai API within LangChain4j.
To create an instance, you must specify the mandatory parameters:
- url(...) – IBM Cloud endpoint URL (as String, URI, or CloudRegion)
- apiKey(...) – IBM Cloud IAM API key
- projectId(...) – IBM Cloud Project ID (or use spaceId(...))
- modelName(...) – Foundation model ID for inference
Example
import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.model.watsonx.WatsonxChatModel;
import com.ibm.watsonx.ai.CloudRegion;
ChatModel chatModel = WatsonxChatModel.builder()
.url(CloudRegion.FRANKFURT)
.apiKey("your-api-key")
.projectId("your-project-id")
.modelName("llama-4-maverick-17b-128e-instruct-fp8")
.temperature(0.7)
.maxOutputTokens(512)
.build();
String answer = chatModel.chat("Hello from watsonx.ai");
System.out.println(answer);
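If you work in a deployment space rather than a project, the builder accepts spaceId(...) in place of projectId(...), as noted in the parameter list above. A minimal sketch with placeholder values:

ChatModel chatModel = WatsonxChatModel.builder()
    .url(CloudRegion.FRANKFURT)
    .apiKey("your-api-key")
    .spaceId("your-space-id") // instead of projectId(...)
    .modelName("meta-llama/llama-4-maverick-17b-128e-instruct-fp8")
    .build();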
How to create an IBM Cloud API Key
You can create an API key at https://cloud.ibm.com/iam/apikeys by clicking Create +.
How to find your Project ID
- Visit https://dataplatform.cloud.ibm.com/projects/?context=wx
- Open your project
- Go to the Manage tab
- Copy the Project ID from the Details section
How to find the model name
Available foundation models are listed here.
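In real applications these values are typically not hardcoded. A minimal sketch reading them from environment variables (the variable names WATSONX_API_KEY and WATSONX_PROJECT_ID are illustrative, not a convention of the library):

// Illustrative: read credentials from the environment instead of hardcoding them
String apiKey = System.getenv("WATSONX_API_KEY");
String projectId = System.getenv("WATSONX_PROJECT_ID");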
WatsonxStreamingChatModel
The WatsonxStreamingChatModel
provides streaming support for IBM watsonx.ai within LangChain4j. It is useful when you want to process tokens as they are generated, making it ideal for real-time applications such as chat UIs or long-form text generation.
Streaming uses the same configuration structure and parameters as the non-streaming WatsonxChatModel
. The main difference is that responses are delivered incrementally through a handler interface.
Example
import dev.langchain4j.model.chat.StreamingChatModel;
import dev.langchain4j.model.chat.response.StreamingChatResponseHandler;
import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.model.watsonx.WatsonxStreamingChatModel;
import com.ibm.watsonx.ai.CloudRegion;
StreamingChatModel model = WatsonxStreamingChatModel.builder()
.url(CloudRegion.FRANKFURT)
.apiKey("your-api-key")
.projectId("your-project-id")
.modelName("llama-4-maverick-17b-128e-instruct-fp8")
.maxOutputTokens(0)
.build();
model.chat("What is the capital of Italy?", new StreamingChatResponseHandler() {
@Override
public void onPartialResponse(String partialResponse) {
System.out.println("Partial: " + partialResponse);
}
@Override
public void onCompleteResponse(ChatResponse completeResponse) {
System.out.println("Complete: " + completeResponse);
}
@Override
public void onError(Throwable error) {
error.printStackTrace();
}
});
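If a caller needs to block until streaming finishes, a common pattern is to bridge the handler to a CompletableFuture; this is generic Java, not a watsonx-specific API. A sketch:

import java.util.concurrent.CompletableFuture;

CompletableFuture<ChatResponse> future = new CompletableFuture<>();

model.chat("What is the capital of Italy?", new StreamingChatResponseHandler() {

    @Override
    public void onPartialResponse(String partialResponse) {
        System.out.print(partialResponse); // handle tokens as they arrive
    }

    @Override
    public void onCompleteResponse(ChatResponse completeResponse) {
        future.complete(completeResponse); // release the waiting caller
    }

    @Override
    public void onError(Throwable error) {
        future.completeExceptionally(error);
    }
});

ChatResponse response = future.join(); // blocks until completion or error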
Tool Integration
Both WatsonxChatModel
and WatsonxStreamingChatModel
support LangChain4j Tools, allowing the model to call Java methods annotated with @Tool
.
Here’s an example using the synchronous model (WatsonxChatModel
), but the same approach applies to the streaming variant.
import java.time.LocalDate;
import java.time.LocalTime;
import dev.langchain4j.agent.tool.Tool;
import dev.langchain4j.service.AiServices;

static class Tools {
@Tool
LocalDate currentDate() {
return LocalDate.now();
}
@Tool
LocalTime currentTime() {
return LocalTime.now();
}
}
interface AiService {
String chat(String userMessage);
}
ChatModel chatModel = WatsonxChatModel.builder()
.url(CloudRegion.FRANKFURT)
.apiKey("your-api-key")
.projectId("your-project-id")
.modelName("llama-4-maverick-17b-128e-instruct-fp8")
.maxOutputTokens(0)
.build();
AiService aiService = AiServices.builder(AiService.class)
.chatModel(chatModel)
.tools(new Tools())
.build();
String answer = aiService.chat("What is the date today?");
System.out.println(answer);
NOTE: Ensure your selected model supports tool use.
Enabling Thinking / Reasoning Output
Some foundation models can include "thinking" or "reasoning" steps in their responses.
You can capture and separate this reasoning content from the final answer by using the thinking(...) builder method with ExtractionTags.
ExtractionTags defines the XML-like tags used to extract different parts of the model output:
- Reasoning tag: typically <think> — contains the model's internal reasoning.
- Response tag: typically <response> — contains the user-facing answer.

If no response tag is provided, it defaults to root, meaning that text directly under the root element is treated as the final response.
Example ChatModel
ChatModel chatModel = WatsonxChatModel.builder()
.url(CloudRegion.FRANKFURT)
.apiKey("your-api-key")
.projectId("your-project-id")
.modelName("ibm/granite-3-3-8b-instruct")
.maxOutputTokens(512)
.thinking(ExtractionTags.of("think", "response"))
.build();
ChatResponse chatResponse = chatModel.chat(
UserMessage.userMessage("Why the sky is blue?")
);
AiMessage aiMessage = chatResponse.aiMessage();
System.out.println(aiMessage.thinking());
System.out.println(aiMessage.text());
Example StreamingChatModel
StreamingChatModel model = WatsonxStreamingChatModel.builder()
.url(CloudRegion.FRANKFURT)
.apiKey("your-api-key")
.projectId("your-project-id")
.modelName("ibm/granite-3-3-8b-instruct")
.maxOutputTokens(512)
.thinking(ExtractionTags.of("think", "response"))
.build();
List<ChatMessage> messages = List.of(
UserMessage.userMessage("Why the sky is blue?")
);
ChatRequest chatRequest = ChatRequest.builder()
.messages(messages)
.maxOutputTokens(512)
.build();
model.chat(chatRequest, new StreamingChatResponseHandler() {

    @Override
    public void onPartialResponse(String partialResponse) {
        ...
    }

    @Override
    public void onPartialThinking(PartialThinking partialThinking) {
        ...
    }

    @Override
    public void onCompleteResponse(ChatResponse completeResponse) {
        ...
    }

    @Override
    public void onError(Throwable error) {
        ...
    }
});
You can also provide only the reasoning tag — in that case, the response tag defaults to "root":
ChatModel model = WatsonxChatModel.builder()
...
.thinking(ExtractionTags.of("think"))
.build();
Note: Ensure that the selected model is enabled for reasoning.
WatsonxEmbeddingModel
The WatsonxEmbeddingModel
enables you to generate embeddings using IBM watsonx.ai and integrate them with LangChain4j's vector-based operations such as search, retrieval-augmented generation (RAG), and similarity comparison.
It implements the LangChain4j EmbeddingModel
interface.
EmbeddingModel embeddingModel = WatsonxEmbeddingModel.builder()
.url("https://test.com")
.apiKey("...")
.projectId("...")
.modelName("ibm/granite-embedding-278m-multilingual")
.build();
System.out.println(embeddingModel.embed("Hello from watsonx.ai"));
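The resulting embeddings plug directly into LangChain4j's vector stores. A minimal sketch using the in-memory store from the langchain4j core module (the text is a placeholder):

import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;

InMemoryEmbeddingStore<TextSegment> store = new InMemoryEmbeddingStore<>();

TextSegment segment = TextSegment.from("Hello from watsonx.ai");
// embed(...) returns a Response<Embedding>; content() unwraps the embedding
store.add(embeddingModel.embed(segment).content(), segment);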
WatsonxScoringModel
The WatsonxScoringModel
provides a LangChain4j-compatible implementation of a ScoringModel
using IBM watsonx.ai Rerank (cross-encoder) models.
It is particularly useful for ranking a list of documents (or text segments) based on their relevance to a user query.
Example: LangChain4j Integration
ScoringModel scoringModel = WatsonxScoringModel.builder()
.url("https://test.com")
.apiKey("...")
.projectId("...")
.modelName("cross-encoder/ms-marco-minilm-l-12-v2")
.build();
var scores = scoringModel.scoreAll(
List.of(
TextSegment.from("Example_1"),
TextSegment.from("Example_2")
),
"Hello from watsonx.ai"
);
System.out.println(scores);
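A scoring model is typically used in a RAG pipeline to re-rank retrieved segments before they reach the prompt. A minimal sketch wiring it into LangChain4j's RAG APIs (assumes the langchain4j core module is on the classpath):

import dev.langchain4j.rag.content.aggregator.ContentAggregator;
import dev.langchain4j.rag.content.aggregator.ReRankingContentAggregator;

// Re-ranks retrieved content by relevance to the query using the scoring model
ContentAggregator aggregator = new ReRankingContentAggregator(scoringModel);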
Quarkus
See more details here.