Interface StreamingChatResponseHandler
Represents a handler for a StreamingChatModel response.
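For orientation, here is a minimal sketch of how a handler is typically wired to a StreamingChatModel; the model variable is assumed to be built and configured elsewhere, and only the callback plumbing is the point:

StreamingChatModel model = ...; // assumed to be configured elsewhere

model.chat("Why is the sky blue?", new StreamingChatResponseHandler() {

    @Override
    public void onPartialResponse(String partialResponse) {
        System.out.print(partialResponse); // print each token as it arrives
    }

    @Override
    public void onCompleteResponse(ChatResponse completeResponse) {
        System.out.println("\nComplete: " + completeResponse.aiMessage().text());
    }

    @Override
    public void onError(Throwable error) {
        error.printStackTrace();
    }
});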
Method Summary
void onCompleteResponse(ChatResponse completeResponse)
    Invoked when the model has finished streaming a response.
default void onCompleteToolCall(CompleteToolCall completeToolCall)
    Invoked when the model has finished streaming a single tool call.
void onError(Throwable error)
    This method is invoked when an error occurs during streaming.
default void onPartialResponse(PartialResponse partialResponse, PartialResponseContext context)
    Invoked each time the model generates a partial textual response, usually a single token.
default void onPartialResponse(String partialResponse)
    Invoked each time the model generates a partial textual response, usually a single token.
default void onPartialThinking(PartialThinking partialThinking)
    Invoked each time the model generates a partial thinking/reasoning text, usually a single token.
default void onPartialThinking(PartialThinking partialThinking, PartialThinkingContext context)
    Invoked each time the model generates a partial thinking/reasoning text, usually a single token.
default void onPartialToolCall(PartialToolCall partialToolCall)
    This callback is invoked each time the model generates a partial tool call, which contains a single token of the tool's arguments.
default void onPartialToolCall(PartialToolCall partialToolCall, PartialToolCallContext context)
    This callback is invoked each time the model generates a partial tool call, which contains a single token of the tool's arguments.
Method Details
onPartialResponse
default void onPartialResponse(String partialResponse)
Invoked each time the model generates a partial textual response, usually a single token. Please note that some LLM providers do not stream individual tokens, but send responses in batches. In such cases, this callback may receive multiple tokens at once.
Either this or the onPartialResponse(PartialResponse, PartialResponseContext) method should be implemented if you want to consume tokens as soon as they become available.
- Parameters:
  - partialResponse - A partial textual response, usually a single token.
onPartialResponse
@Experimental default void onPartialResponse(PartialResponse partialResponse, PartialResponseContext context)
Invoked each time the model generates a partial textual response, usually a single token. Please note that some LLM providers do not stream individual tokens, but send responses in batches. In such cases, this callback may receive multiple tokens at once.
Either this or the onPartialResponse(String) method should be implemented if you want to consume tokens as soon as they become available.
- Parameters:
  - partialResponse - A partial textual response, usually a single token.
  - context - A partial response context. Contains a StreamingHandle that can be used to cancel streaming.
- Since: 1.8.0
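As a sketch of early cancellation, the handler below stops streaming once enough text has arrived. The streamingHandle() accessor on PartialResponseContext, the cancel() method on StreamingHandle, and the text() accessor on PartialResponse are assumed names for illustration; check the actual API before relying on them.

StreamingChatResponseHandler handler = new StreamingChatResponseHandler() {

    private final StringBuilder buffer = new StringBuilder();

    @Override
    public void onPartialResponse(PartialResponse partialResponse, PartialResponseContext context) {
        buffer.append(partialResponse.text()); // text() is an assumed accessor
        if (buffer.length() > 1_000) {
            context.streamingHandle().cancel(); // assumed names: stop consuming further tokens
        }
    }

    @Override
    public void onCompleteResponse(ChatResponse completeResponse) {
        System.out.println(buffer);
    }

    @Override
    public void onError(Throwable error) {
        error.printStackTrace();
    }
};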
onPartialThinking
default void onPartialThinking(PartialThinking partialThinking)
Invoked each time the model generates a partial thinking/reasoning text, usually a single token. Please note that some LLM providers do not stream individual tokens, but send thinking tokens in batches. In such cases, this callback may receive multiple tokens at once.
Either this or the onPartialThinking(PartialThinking, PartialThinkingContext) method should be implemented if you want to consume thinking tokens as soon as they become available.
- Parameters:
  - partialThinking - A partial thinking text, usually a single token.
- Since: 1.2.0
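A sketch of keeping reasoning separate from the visible answer; the text() accessor on PartialThinking is an assumption here:

StreamingChatResponseHandler handler = new StreamingChatResponseHandler() {

    private final StringBuilder thinking = new StringBuilder();
    private final StringBuilder answer = new StringBuilder();

    @Override
    public void onPartialThinking(PartialThinking partialThinking) {
        thinking.append(partialThinking.text()); // text() is an assumed accessor
    }

    @Override
    public void onPartialResponse(String partialResponse) {
        answer.append(partialResponse);
    }

    @Override
    public void onCompleteResponse(ChatResponse completeResponse) {
        System.out.println("Reasoning: " + thinking);
        System.out.println("Answer: " + answer);
    }

    @Override
    public void onError(Throwable error) {
        error.printStackTrace();
    }
};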
onPartialThinking
@Experimental default void onPartialThinking(PartialThinking partialThinking, PartialThinkingContext context)
Invoked each time the model generates a partial thinking/reasoning text, usually a single token. Please note that some LLM providers do not stream individual tokens, but send thinking tokens in batches. In such cases, this callback may receive multiple tokens at once.
Either this or the onPartialThinking(PartialThinking) method should be implemented if you want to consume thinking tokens as soon as they become available.
- Parameters:
  - partialThinking - A partial thinking text, usually a single token.
  - context - A partial thinking context. Contains a StreamingHandle that can be used to cancel streaming.
- Since: 1.8.0
onPartialToolCall
default void onPartialToolCall(PartialToolCall partialToolCall)
This callback is invoked each time the model generates a partial tool call, which contains a single token of the tool's arguments. It is typically invoked multiple times for a single tool call, until onCompleteToolCall(CompleteToolCall) is eventually invoked, indicating that the streaming for that tool call is finished.
Here's an example of what streaming a single tool call might look like:
1. onPartialToolCall(index = 0, id = "call_abc", name = "get_weather", partialArguments = "{\"")
2. onPartialToolCall(index = 0, id = "call_abc", name = "get_weather", partialArguments = "city")
3. onPartialToolCall(index = 0, id = "call_abc", name = "get_weather", partialArguments = "\":\"")
4. onPartialToolCall(index = 0, id = "call_abc", name = "get_weather", partialArguments = "Mun")
5. onPartialToolCall(index = 0, id = "call_abc", name = "get_weather", partialArguments = "ich")
6. onPartialToolCall(index = 0, id = "call_abc", name = "get_weather", partialArguments = "\"}")
7. onCompleteToolCall(index = 0, id = "call_abc", name = "get_weather", arguments = "{\"city\":\"Munich\"}")
If the model decides to call multiple tools, the index will increment, allowing you to correlate each partial tool call with the tool invocation it belongs to (see the sketch after this section).
Please note that not all LLM providers stream tool calls token by token. Some providers (e.g., Bedrock, Google, Mistral, Ollama) return only complete tool calls. In those cases, this callback won't be invoked; only onCompleteToolCall(CompleteToolCall) will be called.
Either this or the onPartialToolCall(PartialToolCall, PartialToolCallContext) method should be implemented if you want to consume partial tool calls as soon as they become available.
- Parameters:
  - partialToolCall - A partial tool call that contains the index, tool ID, tool name, and partial arguments.
- Since: 1.2.0
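A sketch of buffering streamed arguments per tool call index; the index() and partialArguments() accessors are assumed from the field names shown in the example above:

import java.util.HashMap;
import java.util.Map;

StreamingChatResponseHandler handler = new StreamingChatResponseHandler() {

    // One buffer per tool call index, since several tool calls may be streamed in one response.
    private final Map<Integer, StringBuilder> argsByIndex = new HashMap<>();

    @Override
    public void onPartialToolCall(PartialToolCall partialToolCall) {
        argsByIndex
                .computeIfAbsent(partialToolCall.index(), i -> new StringBuilder())
                .append(partialToolCall.partialArguments()); // accessors assumed from the fields above
    }

    @Override
    public void onCompleteToolCall(CompleteToolCall completeToolCall) {
        // The fully assembled arguments arrive here, so the partial buffer can be discarded.
        argsByIndex.remove(completeToolCall.index()); // index() assumed, mirroring PartialToolCall
    }

    @Override
    public void onCompleteResponse(ChatResponse completeResponse) {
        System.out.println("Streaming finished");
    }

    @Override
    public void onError(Throwable error) {
        error.printStackTrace();
    }
};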
onPartialToolCall
@Experimental default void onPartialToolCall(PartialToolCall partialToolCall, PartialToolCallContext context)
This callback is invoked each time the model generates a partial tool call, which contains a single token of the tool's arguments. It is typically invoked multiple times for a single tool call, until onCompleteToolCall(CompleteToolCall) is eventually invoked, indicating that the streaming for that tool call is finished.
Here's an example of what streaming a single tool call might look like:
1. onPartialToolCall(index = 0, id = "call_abc", name = "get_weather", partialArguments = "{\"")
2. onPartialToolCall(index = 0, id = "call_abc", name = "get_weather", partialArguments = "city")
3. onPartialToolCall(index = 0, id = "call_abc", name = "get_weather", partialArguments = "\":\"")
4. onPartialToolCall(index = 0, id = "call_abc", name = "get_weather", partialArguments = "Mun")
5. onPartialToolCall(index = 0, id = "call_abc", name = "get_weather", partialArguments = "ich")
6. onPartialToolCall(index = 0, id = "call_abc", name = "get_weather", partialArguments = "\"}")
7. onCompleteToolCall(index = 0, id = "call_abc", name = "get_weather", arguments = "{\"city\":\"Munich\"}")
If the model decides to call multiple tools, the index will increment, allowing you to correlate each partial tool call with the tool invocation it belongs to.
Please note that not all LLM providers stream tool calls token by token. Some providers (e.g., Bedrock, Google, Mistral, Ollama) return only complete tool calls. In those cases, this callback won't be invoked; only onCompleteToolCall(CompleteToolCall) will be called.
Either this or the onPartialToolCall(PartialToolCall) method should be implemented if you want to consume partial tool calls as soon as they become available.
- Parameters:
  - partialToolCall - A partial tool call that contains the index, tool ID, tool name, and partial arguments.
  - context - A partial tool call context. Contains a StreamingHandle that can be used to cancel streaming.
- Since: 1.8.0
onCompleteToolCall
default void onCompleteToolCall(CompleteToolCall completeToolCall)
Invoked when the model has finished streaming a single tool call.
- Parameters:
  - completeToolCall - A complete tool call that contains the index, tool ID, tool name, and fully assembled arguments.
- Since: 1.2.0
onCompleteResponse
void onCompleteResponse(ChatResponse completeResponse)
Invoked when the model has finished streaming a response.
- Parameters:
  - completeResponse - The complete response generated by the model, containing all assembled partial text and tool calls.
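Because streaming is asynchronous, callers often need to wait for the final response. A common sketch is to bridge the callback to a CompletableFuture; model is again assumed to be a configured StreamingChatModel:

import java.util.concurrent.CompletableFuture;

CompletableFuture<ChatResponse> future = new CompletableFuture<>();

model.chat("What is the capital of France?", new StreamingChatResponseHandler() {

    @Override
    public void onPartialResponse(String partialResponse) {
        System.out.print(partialResponse);
    }

    @Override
    public void onCompleteResponse(ChatResponse completeResponse) {
        future.complete(completeResponse); // hand the final response to the waiting caller
    }

    @Override
    public void onError(Throwable error) {
        future.completeExceptionally(error);
    }
});

ChatResponse chatResponse = future.join(); // blocks until onCompleteResponse or onError fires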
onError
void onError(Throwable error)
This method is invoked when an error occurs during streaming.
- Parameters:
  - error - The error that occurred.