ZhiPu AI
ZhiPu AI is a platform that provides model services including text generation, text embedding, and image generation. You can refer to the ZhiPu AI Open Platform for more details. LangChain4j integrates with ZhiPu AI via its HTTP endpoints. We are considering migrating from the HTTP endpoints to the official SDK and would appreciate any help!
Maven Dependency
You can use ZhiPu AI with LangChain4j in plain Java or Spring Boot applications.
Plain Java
Since 1.0.0-alpha1, langchain4j-zhipu-ai has migrated to langchain4j-community and has been renamed to
langchain4j-community-zhipu-ai.
Before 1.0.0-alpha1:
```xml
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-zhipu-ai</artifactId>
    <version>${previous version here}</version>
</dependency>
```
1.0.0-alpha1 and later:
```xml
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-community-zhipu-ai</artifactId>
    <version>${latest version here}</version>
</dependency>
```
Or, you can use the BOM to manage dependencies consistently:
```xml
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>dev.langchain4j</groupId>
            <artifactId>langchain4j-community-bom</artifactId>
            <version>${latest version here}</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
```
Configurable parameters
ZhipuAiChatModel
ZhipuAiChatModel has the following parameters to configure when you initialize it:
| Property | Description | Default Value |
|---|---|---|
| baseUrl | The URL to connect to | https://open.bigmodel.cn/ |
| apiKey | The API Key | |
| model | The model to use. | glm-4-flash |
| topP | The probability threshold for nucleus sampling, which controls the diversity of the generated text: the higher the top_p, the more diverse the output, and vice versa. Value range: (0, 1.0]. We generally recommend altering this or temperature, but not both. | |
| maxRetries | The maximum number of retries for a request | 3 |
| temperature | Sampling temperature that controls the diversity of the generated text: the higher the temperature, the more diverse the output, and vice versa. Value range: [0, 2) | 0.7 |
| stops | Stop sequences: the model automatically stops generating as soon as the output is about to contain one of the specified strings or token_ids | |
| maxToken | The maximum number of tokens to generate for this request | 512 |
| listeners | Listeners that observe requests, responses, and errors | |
| callTimeout | OkHttp call timeout for the request | |
| connectTimeout | OkHttp connect timeout for the request | |
| writeTimeout | OkHttp write timeout for the request | |
| readTimeout | OkHttp read timeout for the request | |
| logRequests | Whether to log requests | false |
| logResponses | Whether to log responses | false |
| doSample | Whether to use sampling. When set to false, the model uses greedy decoding | |
| toolStream | Whether to enable partial tool-call streaming. When set to true, tool calls are streamed incrementally | false |
ZhipuAiChatRequestParameters
ZhipuAiChatRequestParameters can be used to configure additional parameters when sending a chat request:
| Property | Description | Default Value |
|---|---|---|
| doSample | Whether to use sampling. When set to false, the model uses greedy decoding | |
| toolStream | Whether to enable partial tool-call streaming. When set to true, tool calls are streamed incrementally | false |
| thinking | Configuration for reasoning mode: type specifies the reasoning type, and clearThinking controls whether to include the internal thinking process in the response | |
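As a sketch of how these per-request parameters are applied (assuming the builder setter names match the parameter names above, and given an already-built `model`), greedy decoding can be requested for a single call like this:

```java
// Sketch: request deterministic (greedy) decoding for one call only,
// leaving the model's default sampling behavior untouched for other calls.
ChatResponse response = model.chat(
        ChatRequest.builder()
                .messages(UserMessage.from("List three prime numbers."))
                .parameters(ZhipuAiChatRequestParameters.builder()
                        .doSample(false) // greedy decoding instead of sampling
                        .build())
                .build());
```

Request-level parameters like this override the corresponding model-level settings only for the call they are attached to.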
ZhipuAiStreamingChatModel
Same as ZhipuAiChatModel, except for maxRetries.
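Since the parameters mirror ZhipuAiChatModel, a basic streaming call can be sketched as follows (the handler methods assume LangChain4j's standard StreamingChatResponseHandler interface):

```java
// Sketch: stream a chat response token by token.
ZhipuAiStreamingChatModel model = ZhipuAiStreamingChatModel.builder()
        .apiKey("Your API key here")
        .model("glm-4-flash")
        .build();

model.chat("Tell me a joke", new StreamingChatResponseHandler() {

    @Override
    public void onPartialResponse(String partialResponse) {
        System.out.print(partialResponse); // tokens arrive incrementally
    }

    @Override
    public void onCompleteResponse(ChatResponse completeResponse) {
        System.out.println("\nDone");
    }

    @Override
    public void onError(Throwable error) {
        error.printStackTrace();
    }
});
```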
Examples
Plain Java
You can initialize ZhipuAiChatModel using the following code:
```java
ChatModel model = ZhipuAiChatModel.builder()
        .apiKey("Your API key here")
        .callTimeout(Duration.ofSeconds(60))
        .connectTimeout(Duration.ofSeconds(60))
        .writeTimeout(Duration.ofSeconds(60))
        .readTimeout(Duration.ofSeconds(60))
        .build();
```
Or customize further with other parameters:
```java
ChatModel model = ZhipuAiChatModel.builder()
        .apiKey("Your API key here")
        .model("glm-4")
        .temperature(0.6)
        .maxToken(1024)
        .maxRetries(2)
        .callTimeout(Duration.ofSeconds(60))
        .connectTimeout(Duration.ofSeconds(60))
        .writeTimeout(Duration.ofSeconds(60))
        .readTimeout(Duration.ofSeconds(60))
        .build();
```
Reasoning
You can enable reasoning mode to get the model's internal thinking process:
```java
ChatModel model = ZhipuAiChatModel.builder()
        .apiKey("Your API key here")
        .model(ChatCompletionModel.GLM_4_7) // Use GLM-4.5 or a later model for reasoning support
        .build();

ChatResponse response = model.chat(
        ChatRequest.builder()
                .messages(UserMessage.from("What is the capital of Germany?"))
                .parameters(ZhipuAiChatRequestParameters.builder()
                        .thinking(Thinking.builder()
                                .type("reasoning")
                                .clearThinking(true)
                                .build())
                        .build())
                .build());

AiMessage aiMessage = response.aiMessage();
System.out.println("Answer: " + aiMessage.text());
System.out.println("Thinking: " + aiMessage.thinking());
```
Partial Tool Call (Streaming)
You can stream partial tool calls incrementally using toolStream:
```java
ZhipuAiStreamingChatModel model = ZhipuAiStreamingChatModel.builder()
        .apiKey("Your API key here")
        .model(ChatCompletionModel.GLM_4_7)
        .build();

ToolSpecification calculator = ToolSpecification.builder()
        .name("calculator")
        .description("returns a sum of two numbers")
        .parameters(JsonObjectSchema.builder()
                .addIntegerProperty("first")
                .addIntegerProperty("second")
                .build())
        .build();

StreamingChatResponseHandler handler = new StreamingChatResponseHandler() {

    @Override
    public void onPartialToolCall(ToolExecutionRequest partialToolCall) {
        System.out.println("Partial tool call: " + partialToolCall.name() + " - " + partialToolCall.arguments());
    }

    @Override
    public void onPartialResponse(String partialResponse) {
    }

    @Override
    public void onCompleteResponse(ChatResponse completeResponse) {
    }

    @Override
    public void onError(Throwable error) {
        error.printStackTrace();
    }
};

model.chat(
        ChatRequest.builder()
                .messages(UserMessage.from("2+2=?"))
                .parameters(ZhipuAiChatRequestParameters.builder()
                        .toolSpecifications(calculator)
                        .toolStream(true)
                        .build())
                .build(),
        handler);
```
More Examples
You can find more examples in: