ZhiPu AI

ZhiPu AI is a platform that provides model services, including text generation, text embedding, image generation, and more. Refer to the ZhiPu AI Open Platform for details. LangChain4j integrates with ZhiPu AI through its HTTP endpoint. We are considering migrating from the HTTP endpoint to the official SDK and would appreciate any help!

Maven Dependency

You can use ZhiPu AI with LangChain4j in plain Java or Spring Boot applications.

Plain Java

note

Since 1.0.0-alpha1, langchain4j-zhipu-ai has migrated to langchain4j-community and is renamed to langchain4j-community-zhipu-ai

Before 1.0.0-alpha1:


<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j-zhipu-ai</artifactId>
<version>${previous version here}</version>
</dependency>

1.0.0-alpha1 and later:


<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j-community-zhipu-ai</artifactId>
<version>${latest version here}</version>
</dependency>

Or, you can use BOM to manage dependencies consistently:


<dependencyManagement>
<dependencies>
<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j-community-bom</artifactId>
<version>${latest version here}</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>
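
With the BOM imported, the ZhiPu AI dependency can then be declared without an explicit version — a sketch, assuming the same artifact coordinates as above:

```xml
<dependencies>
    <dependency>
        <groupId>dev.langchain4j</groupId>
        <artifactId>langchain4j-community-zhipu-ai</artifactId>
        <!-- version is managed by langchain4j-community-bom -->
    </dependency>
</dependencies>
```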

Configurable parameters

ZhipuAiChatModel

ZhipuAiChatModel has the following configurable parameters:

| Property | Description | Default Value |
|----------|-------------|---------------|
| baseUrl | The URL to connect to. | https://open.bigmodel.cn/ |
| apiKey | The API key. | |
| model | The model to use. | glm-4-flash |
| topP | The probability threshold for nucleus sampling, which controls the diversity of the generated text. The higher the top_p, the more diverse the generated text, and vice versa. Value range: (0, 1.0]. We generally recommend altering this or temperature, but not both. | |
| maxRetries | The maximum number of retries for a request. | 3 |
| temperature | Sampling temperature that controls the diversity of the generated text. The higher the temperature, the more diverse the generated text, and vice versa. Value range: [0, 2). | 0.7 |
| stops | With the stop parameter, the model automatically stops generating text when the output is about to contain one of the specified strings or token_ids. | |
| maxToken | The maximum number of tokens returned by this request. | 512 |
| listeners | Listeners that listen for requests, responses, and errors. | |
| callTimeout | OkHttp call timeout for the request. | |
| connectTimeout | OkHttp connect timeout for the request. | |
| writeTimeout | OkHttp write timeout for the request. | |
| readTimeout | OkHttp read timeout for the request. | |
| logRequests | Whether to log requests. | false |
| logResponses | Whether to log responses. | false |
| doSample | Whether to use sampling. When set to false, the model uses greedy decoding. | |
| toolStream | Whether to enable partial tool streaming. When set to true, tool calls can be streamed incrementally. | false |

ZhipuAiChatRequestParameters

ZhipuAiChatRequestParameters can be used to configure additional parameters when sending a chat request:

| Property | Description | Default Value |
|----------|-------------|---------------|
| doSample | Whether to use sampling. When set to false, the model uses greedy decoding. | |
| toolStream | Whether to enable partial tool streaming. When set to true, tool calls can be streamed incrementally. | false |
| thinking | Configuration for reasoning mode. type specifies the reasoning type; clearThinking controls whether the internal thinking process is shown in the response. | |

ZhipuAiStreamingChatModel

Same parameters as ZhipuAiChatModel, except maxRetries.
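
A minimal streaming sketch. This assumes the StreamingChatResponseHandler callback interface from the LangChain4j core API and the dev.langchain4j.community.model.zhipu package; the model name and API key are placeholders:

```java
import dev.langchain4j.community.model.zhipu.ZhipuAiStreamingChatModel;
import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.model.chat.response.StreamingChatResponseHandler;

public class ZhipuStreamingExample {

    public static void main(String[] args) {
        ZhipuAiStreamingChatModel model = ZhipuAiStreamingChatModel.builder()
                .apiKey("Your API key here")
                .model("glm-4-flash")
                .build();

        model.chat("Tell me a joke", new StreamingChatResponseHandler() {

            @Override
            public void onPartialResponse(String partialResponse) {
                // tokens arrive incrementally as the model generates them
                System.out.print(partialResponse);
            }

            @Override
            public void onCompleteResponse(ChatResponse completeResponse) {
                System.out.println();
            }

            @Override
            public void onError(Throwable error) {
                error.printStackTrace();
            }
        });
    }
}
```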

Examples

Plain Java

You can initialize ZhipuAiChatModel using the following code:

ChatModel model = ZhipuAiChatModel.builder()
        .apiKey("Your API key here")
        .callTimeout(Duration.ofSeconds(60))
        .connectTimeout(Duration.ofSeconds(60))
        .writeTimeout(Duration.ofSeconds(60))
        .readTimeout(Duration.ofSeconds(60))
        .build();

Or customize other parameters:

ChatModel model = ZhipuAiChatModel.builder()
        .apiKey("Your API key here")
        .model("glm-4")
        .temperature(0.6)
        .maxToken(1024)
        .maxRetries(2)
        .callTimeout(Duration.ofSeconds(60))
        .connectTimeout(Duration.ofSeconds(60))
        .writeTimeout(Duration.ofSeconds(60))
        .readTimeout(Duration.ofSeconds(60))
        .build();

Reasoning

You can enable reasoning mode to get the model's internal thinking process:

ChatModel model = ZhipuAiChatModel.builder()
        .apiKey("Your API key here")
        .model(ChatCompletionModel.GLM_4_7) // Use GLM-4-5 or a later model for reasoning support
        .build();

ChatResponse response = model.chat(
        ChatRequest.builder()
                .messages(UserMessage.from("What is the capital of Germany?"))
                .parameters(ZhipuAiChatRequestParameters.builder()
                        .thinking(Thinking.builder()
                                .type("reasoning")
                                .clearThinking(true)
                                .build())
                        .build())
                .build());

AiMessage aiMessage = response.aiMessage();
System.out.println("Answer: " + aiMessage.text());
System.out.println("Thinking: " + aiMessage.thinking());

Partial Tool Call (Streaming)

You can stream partial tool calls incrementally using toolStream:

ZhipuAiStreamingChatModel model = ZhipuAiStreamingChatModel.builder()
        .apiKey("Your API key here")
        .model(ChatCompletionModel.GLM_4_7)
        .build();

ToolSpecification calculator = ToolSpecification.builder()
        .name("calculator")
        .description("returns a sum of two numbers")
        .parameters(JsonObjectSchema.builder()
                .addIntegerProperty("first")
                .addIntegerProperty("second")
                .build())
        .build();

TestStreamingChatResponseHandler handler = new TestStreamingChatResponseHandler() {

    @Override
    public void onPartialToolCall(ToolExecutionRequest partialToolCall) {
        System.out.println("Partial tool call: " + partialToolCall.name() + " - " + partialToolCall.arguments());
    }
};

model.chat(
        ChatRequest.builder()
                .messages(UserMessage.from("2+2=?"))
                .parameters(ZhipuAiChatRequestParameters.builder()
                        .toolSpecifications(calculator)
                        .toolStream(true)
                        .build())
                .build(),
        handler);

More Examples

You can find more examples in: