Class MappingTrackingStreamingChatResponseHandler
- All Implemented Interfaces:
StreamingChatResponseHandler
StreamingChatResponseHandler that records whether the underlying handler received
any typed, user-facing callback (partial response/thinking/tool call, complete tool call, complete response
or error) while processing a single provider streaming event - i.e. whether the event was mapped to a
typed callback.
Providers use this to decide whether to additionally surface a raw event via
StreamingChatResponseHandler.onUnmappedRawEvent(Object): a raw event should only be emitted for events
that were not mapped to a typed callback. Forwarding a raw event (via onUnmappedRawEvent(Object))
does not count as a mapping.
Not thread-safe: it assumes a provider processes streaming events one at a time and calls
resetMappingTracking() before each event.
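The reset/check cycle described above can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: `Handler` and `TrackingHandler` are reduced, hypothetical stand-ins for StreamingChatResponseHandler and MappingTrackingStreamingChatResponseHandler, and the `text:` event prefix is invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class MappingTrackingSketch {

    // Hypothetical, reduced stand-in for StreamingChatResponseHandler.
    interface Handler {
        void onPartialResponse(String partialResponse);
        void onUnmappedRawEvent(Object rawEvent);
    }

    // Hypothetical stand-in for MappingTrackingStreamingChatResponseHandler.
    static class TrackingHandler implements Handler {
        private final Handler delegate;
        private boolean mapped;

        TrackingHandler(Handler delegate) {
            this.delegate = delegate;
        }

        void resetMappingTracking() {
            mapped = false;
        }

        boolean wasMapped() {
            return mapped;
        }

        @Override
        public void onPartialResponse(String partialResponse) {
            mapped = true; // a typed callback counts as a mapping
            delegate.onPartialResponse(partialResponse);
        }

        @Override
        public void onUnmappedRawEvent(Object rawEvent) {
            // Forwarding a raw event does not count as a mapping.
            delegate.onUnmappedRawEvent(rawEvent);
        }
    }

    // Processes events one at a time. Events with the (invented) "text:" prefix
    // are mapped to a typed callback; everything else is surfaced as a raw event.
    static List<String> process(List<String> events) {
        List<String> log = new ArrayList<>();
        TrackingHandler handler = new TrackingHandler(new Handler() {
            @Override
            public void onPartialResponse(String partialResponse) {
                log.add("partial=" + partialResponse);
            }

            @Override
            public void onUnmappedRawEvent(Object rawEvent) {
                log.add("raw=" + rawEvent);
            }
        });
        for (String event : events) {
            handler.resetMappingTracking(); // must happen before each event
            if (event.startsWith("text:")) {
                handler.onPartialResponse(event.substring("text:".length()));
            }
            if (!handler.wasMapped()) {
                handler.onUnmappedRawEvent(event); // only for unmapped events
            }
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(process(List.of("text:Hel", "ping", "text:lo")));
    }
}
```

Because the flag is a plain field that is reset per event, the sketch shares the single-threaded assumption stated above.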
-
Constructor Summary
Constructor | Description
Method Summary
Modifier and Type | Method | Description
void | onCompleteResponse(ChatResponse completeResponse) | Invoked when the model has finished streaming a response.
void | onCompleteToolCall(CompleteToolCall completeToolCall) | Invoked when the model has finished streaming a single tool call.
void | onError(Throwable error) | This method is invoked when an error occurs during streaming.
void | onPartialResponse(PartialResponse partialResponse, PartialResponseContext context) | Invoked each time the model generates a partial textual response, usually a single token.
void | onPartialResponse(String partialResponse) | Invoked each time the model generates a partial textual response, usually a single token.
void | onPartialThinking(PartialThinking partialThinking) | Invoked each time the model generates a partial thinking/reasoning text, usually a single token.
void | onPartialThinking(PartialThinking partialThinking, PartialThinkingContext context) | Invoked each time the model generates a partial thinking/reasoning text, usually a single token.
void | onPartialToolCall(PartialToolCall partialToolCall) | This callback is invoked each time the model generates a partial tool call, which contains a single token of the tool's arguments.
void | onPartialToolCall(PartialToolCall partialToolCall, PartialToolCallContext context) | This callback is invoked each time the model generates a partial tool call, which contains a single token of the tool's arguments.
void | onUnmappedRawEvent(Object rawEvent) | Invoked when a provider emits a raw streaming event that is not already exposed through one of the typed callbacks (such as StreamingChatResponseHandler.onPartialResponse(PartialResponse, PartialResponseContext), StreamingChatResponseHandler.onPartialThinking(PartialThinking, PartialThinkingContext), StreamingChatResponseHandler.onPartialToolCall(PartialToolCall, PartialToolCallContext) or StreamingChatResponseHandler.onCompleteToolCall(CompleteToolCall)).
void | resetMappingTracking() | Resets mapping tracking.
boolean | wasMapped() |
-
Constructor Details
-
MappingTrackingStreamingChatResponseHandler
-
-
Method Details
-
resetMappingTracking
public void resetMappingTracking()
Resets mapping tracking. Must be called before processing each provider streaming event.
wasMapped
public boolean wasMapped()
Returns:
true if a typed, user-facing callback was invoked since the last resetMappingTracking().
-
onPartialResponse
Description copied from interface: StreamingChatResponseHandler
Invoked each time the model generates a partial textual response, usually a single token.
Please note that some LLM providers do not stream individual tokens, but send responses in batches. In such cases, this callback may receive multiple tokens at once.
Either this or the StreamingChatResponseHandler.onPartialResponse(PartialResponse, PartialResponseContext) method should be implemented if you want to consume tokens as soon as they become available.
Specified by:
onPartialResponse in interface StreamingChatResponseHandler
Parameters:
partialResponse - A partial textual response, usually a single token.
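Since a callback may carry a single token or a whole batch, a consumer should not assume any particular chunk size. A minimal sketch of an accumulator that behaves the same either way is shown below; the class is hypothetical and only mirrors the shape of onPartialResponse(String), it does not implement the real interface.

```java
import java.util.List;

public class PartialResponseAccumulator {

    private final StringBuilder text = new StringBuilder();

    // Mirrors the shape of onPartialResponse(String): the argument may hold
    // one token or several tokens delivered as a batch.
    public void onPartialResponse(String partialResponse) {
        text.append(partialResponse);
    }

    public String text() {
        return text.toString();
    }

    // Feeds the given chunks through the callback and returns the assembled text.
    static String assemble(List<String> chunks) {
        PartialResponseAccumulator accumulator = new PartialResponseAccumulator();
        chunks.forEach(accumulator::onPartialResponse);
        return accumulator.text();
    }

    public static void main(String[] args) {
        // Token-by-token and batched delivery yield the same text.
        System.out.println(assemble(List.of("Hel", "lo", ", ", "world")));
        System.out.println(assemble(List.of("Hello, ", "world")));
    }
}
```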
-
onPartialResponse
Description copied from interface: StreamingChatResponseHandler
Invoked each time the model generates a partial textual response, usually a single token.
Please note that some LLM providers do not stream individual tokens, but send responses in batches. In such cases, this callback may receive multiple tokens at once.
Either this or the StreamingChatResponseHandler.onPartialResponse(String) method should be implemented if you want to consume tokens as soon as they become available.
Specified by:
onPartialResponse in interface StreamingChatResponseHandler
Parameters:
partialResponse - A partial textual response, usually a single token.
context - A partial response context. Contains a StreamingHandle that can be used to cancel streaming.
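To illustrate what cancelling through the context might look like, here is a rough sketch. `Handle` and `Context` are hypothetical stand-ins for StreamingHandle and PartialResponseContext; their method names are assumptions made for the example, and the loop plays the role of a provider that stops delivering chunks once cancellation is requested.

```java
public class CancelSketch {

    // Hypothetical stand-in for StreamingHandle.
    static class Handle {
        private boolean cancelled;

        void cancel() {
            cancelled = true;
        }

        boolean isCancelled() {
            return cancelled;
        }
    }

    // Hypothetical stand-in for PartialResponseContext.
    static class Context {
        private final Handle handle;

        Context(Handle handle) {
            this.handle = handle;
        }

        Handle streamingHandle() {
            return handle;
        }
    }

    // Delivers chunks until the handler logic cancels; returns the text received.
    static String streamUntil(String[] chunks, int maxChars) {
        StringBuilder received = new StringBuilder();
        Handle handle = new Handle();
        Context context = new Context(handle);
        for (String chunk : chunks) {
            if (handle.isCancelled()) {
                break; // the stand-in provider stops delivering chunks
            }
            // Handler logic: consume the chunk, then cancel once enough text has arrived.
            received.append(chunk);
            if (received.length() >= maxChars) {
                context.streamingHandle().cancel();
            }
        }
        return received.toString();
    }

    public static void main(String[] args) {
        System.out.println(streamUntil(new String[] {"ab", "cd", "ef"}, 3));
    }
}
```

Note that in this sketch the chunk that triggers cancellation is still consumed in full; only later chunks are suppressed.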
-
onPartialThinking
Description copied from interface: StreamingChatResponseHandler
Invoked each time the model generates a partial thinking/reasoning text, usually a single token.
Please note that some LLM providers do not stream individual tokens, but send thinking tokens in batches. In such cases, this callback may receive multiple tokens at once.
Either this or the StreamingChatResponseHandler.onPartialThinking(PartialThinking, PartialThinkingContext) method should be implemented if you want to consume thinking tokens as soon as they become available.
Specified by:
onPartialThinking in interface StreamingChatResponseHandler
Parameters:
partialThinking - A partial thinking text, usually a single token.
-
onPartialThinking
Description copied from interface: StreamingChatResponseHandler
Invoked each time the model generates a partial thinking/reasoning text, usually a single token.
Please note that some LLM providers do not stream individual tokens, but send thinking tokens in batches. In such cases, this callback may receive multiple tokens at once.
Either this or the StreamingChatResponseHandler.onPartialThinking(PartialThinking) method should be implemented if you want to consume thinking tokens as soon as they become available.
Specified by:
onPartialThinking in interface StreamingChatResponseHandler
Parameters:
partialThinking - A partial thinking text, usually a single token.
context - A partial thinking context. Contains a StreamingHandle that can be used to cancel streaming.
-
onPartialToolCall
Description copied from interface: StreamingChatResponseHandler
This callback is invoked each time the model generates a partial tool call, which contains a single token of the tool's arguments. It is typically invoked multiple times for a single tool call until StreamingChatResponseHandler.onCompleteToolCall(CompleteToolCall) is eventually invoked, indicating that the streaming for that tool call is finished.
Here's an example of what streaming a single tool call might look like:
1. onPartialToolCall(index = 0, id = "call_abc", name = "get_weather", partialArguments = "{\"")
2. onPartialToolCall(index = 0, id = "call_abc", name = "get_weather", partialArguments = "city")
3. onPartialToolCall(index = 0, id = "call_abc", name = "get_weather", partialArguments = "\":\"")
4. onPartialToolCall(index = 0, id = "call_abc", name = "get_weather", partialArguments = "Mun")
5. onPartialToolCall(index = 0, id = "call_abc", name = "get_weather", partialArguments = "ich")
6. onPartialToolCall(index = 0, id = "call_abc", name = "get_weather", partialArguments = "\"}")
7. onCompleteToolCall(index = 0, id = "call_abc", name = "get_weather", arguments = "{\"city\":\"Munich\"}")
If the model decides to call multiple tools, the index will increment, allowing you to correlate each partial tool call with the tool call it belongs to.
Please note that not all LLM providers stream tool calls token by token. Some providers (e.g., Bedrock, Google, Mistral, Ollama) return only complete tool calls. In those cases, this callback won't be invoked - only StreamingChatResponseHandler.onCompleteToolCall(CompleteToolCall) will be called.
Either this or the StreamingChatResponseHandler.onPartialToolCall(PartialToolCall, PartialToolCallContext) method should be implemented if you want to consume partial tool calls as soon as they become available.
Specified by:
onPartialToolCall in interface StreamingChatResponseHandler
Parameters:
partialToolCall - A partial tool call that contains the index, tool ID, tool name and partial arguments.
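The correlation by index described above can be sketched as a small assembler that concatenates partial arguments per tool call. This is an illustrative sketch only: `ToolCallAssembler` is a hypothetical class, and its callback takes the index and partial arguments directly instead of a real PartialToolCall object.

```java
import java.util.Map;
import java.util.TreeMap;

public class ToolCallAssembler {

    // Partial arguments collected so far, keyed by tool call index.
    private final Map<Integer, StringBuilder> argumentsByIndex = new TreeMap<>();

    // Mirrors the data carried by a partial tool call: its index and a fragment
    // of the arguments.
    public void onPartialToolCall(int index, String partialArguments) {
        argumentsByIndex
                .computeIfAbsent(index, i -> new StringBuilder())
                .append(partialArguments);
    }

    // Returns the arguments assembled so far for the given index, or "" if none.
    public String argumentsFor(int index) {
        StringBuilder arguments = argumentsByIndex.get(index);
        return arguments == null ? "" : arguments.toString();
    }

    public static void main(String[] args) {
        ToolCallAssembler assembler = new ToolCallAssembler();
        // The fragments from the example sequence above, all with index 0.
        for (String fragment : new String[] {"{\"", "city", "\":\"", "Mun", "ich", "\"}"}) {
            assembler.onPartialToolCall(0, fragment);
        }
        System.out.println(assembler.argumentsFor(0));
    }
}
```

When onCompleteToolCall delivers the fully assembled arguments, a buffer like this is only needed if partial arguments are to be shown or inspected before the call completes.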
-
onPartialToolCall
Description copied from interface: StreamingChatResponseHandler
This callback is invoked each time the model generates a partial tool call, which contains a single token of the tool's arguments. It is typically invoked multiple times for a single tool call until StreamingChatResponseHandler.onCompleteToolCall(CompleteToolCall) is eventually invoked, indicating that the streaming for that tool call is finished.
Here's an example of what streaming a single tool call might look like:
1. onPartialToolCall(index = 0, id = "call_abc", name = "get_weather", partialArguments = "{\"")
2. onPartialToolCall(index = 0, id = "call_abc", name = "get_weather", partialArguments = "city")
3. onPartialToolCall(index = 0, id = "call_abc", name = "get_weather", partialArguments = "\":\"")
4. onPartialToolCall(index = 0, id = "call_abc", name = "get_weather", partialArguments = "Mun")
5. onPartialToolCall(index = 0, id = "call_abc", name = "get_weather", partialArguments = "ich")
6. onPartialToolCall(index = 0, id = "call_abc", name = "get_weather", partialArguments = "\"}")
7. onCompleteToolCall(index = 0, id = "call_abc", name = "get_weather", arguments = "{\"city\":\"Munich\"}")
If the model decides to call multiple tools, the index will increment, allowing you to correlate each partial tool call with the tool call it belongs to.
Please note that not all LLM providers stream tool calls token by token. Some providers (e.g., Bedrock, Google, Mistral, Ollama) return only complete tool calls. In those cases, this callback won't be invoked - only StreamingChatResponseHandler.onCompleteToolCall(CompleteToolCall) will be called.
Either this or the StreamingChatResponseHandler.onPartialToolCall(PartialToolCall) method should be implemented if you want to consume partial tool calls as soon as they become available.
Specified by:
onPartialToolCall in interface StreamingChatResponseHandler
Parameters:
partialToolCall - A partial tool call that contains the index, tool ID, tool name and partial arguments.
context - A partial tool call context. Contains a StreamingHandle that can be used to cancel streaming.
-
onCompleteToolCall
Description copied from interface: StreamingChatResponseHandler
Invoked when the model has finished streaming a single tool call.
Specified by:
onCompleteToolCall in interface StreamingChatResponseHandler
Parameters:
completeToolCall - A complete tool call that contains the index, tool ID, tool name, and fully assembled arguments.
-
onUnmappedRawEvent
Description copied from interface: StreamingChatResponseHandler
Invoked when a provider emits a raw streaming event that is not already exposed through one of the typed callbacks (such as StreamingChatResponseHandler.onPartialResponse(PartialResponse, PartialResponseContext), StreamingChatResponseHandler.onPartialThinking(PartialThinking, PartialThinkingContext), StreamingChatResponseHandler.onPartialToolCall(PartialToolCall, PartialToolCallContext) or StreamingChatResponseHandler.onCompleteToolCall(CompleteToolCall)).
This acts as an escape hatch for provider-specific events that langchain4j does not model, such as server-tool lifecycle events (e.g., OpenAI's web_search_call.in_progress). Events that are already delivered as partial responses, thinking or tool calls are not repeated here.
The event type depends on the provider implementation. Implementations using the dev.langchain4j.http.client.HttpClient abstraction (e.g., OpenAI, Anthropic, Google AI Gemini) typically expose ServerSentEvent; other implementations can expose provider-specific event objects (e.g., the OpenAI official Responses model exposes the SDK's ResponseStreamEvent).
Specified by:
onUnmappedRawEvent in interface StreamingChatResponseHandler
Parameters:
rawEvent - A raw provider streaming event.
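Because the raw event arrives as a plain Object whose type depends on the provider, a handler usually has to check the runtime type before using it. The sketch below shows one way this might look; `SseEvent` is a hypothetical stand-in for a server-sent event type (it is not the library's ServerSentEvent class), and the `describe` helper is invented for the example.

```java
public class RawEventSketch {

    // Hypothetical stand-in for a server-sent event carrying a name and a payload.
    static class SseEvent {
        final String event;
        final String data;

        SseEvent(String event, String data) {
            this.event = event;
            this.data = data;
        }
    }

    // Mirrors the shape of onUnmappedRawEvent(Object): inspect the runtime type
    // and fall back gracefully for event objects the handler does not know.
    static String describe(Object rawEvent) {
        if (rawEvent instanceof SseEvent) {
            SseEvent sse = (SseEvent) rawEvent;
            return "sse:" + sse.event;
        }
        return "other:" + rawEvent.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(describe(new SseEvent("web_search_call.in_progress", "{}")));
        System.out.println(describe("some provider-specific object"));
    }
}
```

Falling back instead of casting unconditionally keeps the handler usable if the provider implementation, and with it the event type, changes.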
-
onCompleteResponse
Description copied from interface: StreamingChatResponseHandler
Invoked when the model has finished streaming a response.
Specified by:
onCompleteResponse in interface StreamingChatResponseHandler
Parameters:
completeResponse - The complete response generated by the model, containing all assembled partial text and tool calls.
-
onError
Description copied from interface: StreamingChatResponseHandler
This method is invoked when an error occurs during streaming.
Specified by:
onError in interface StreamingChatResponseHandler
Parameters:
error - The error that occurred.
-