dev.langchain4j.internal.MappingTrackingStreamingChatResponseHandler

All Implemented Interfaces: StreamingChatResponseHandler

public class MappingTrackingStreamingChatResponseHandler extends Object implements StreamingChatResponseHandler

Delegating StreamingChatResponseHandler that records whether the underlying handler received any typed, user-facing callback (partial response/thinking/tool call, complete tool call, complete response or error) while processing a single provider streaming event - i.e. whether the event was mapped to a typed callback.

Providers use this to decide whether to additionally surface a raw event via StreamingChatResponseHandler.onUnmappedRawEvent(Object): a raw event should only be emitted for events that were not mapped to a typed callback. Forwarding a raw event (via onUnmappedRawEvent(Object)) does not count as a mapping.

Not thread-safe: it assumes a provider processes streaming events one at a time and calls resetMappingTracking() before each event.
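
The reset / dispatch / check cycle described above can be sketched as follows. This is a minimal illustration, not the library's implementation: `SketchHandler`, `TrackingHandler`, the `"text:"` event prefix and the `process`/`run` helpers are hypothetical stand-ins, and the real StreamingChatResponseHandler has more callbacks than the two modelled here.

```java
import java.util.ArrayList;
import java.util.List;

class ProviderLoopSketch {

    // Hypothetical stand-in for StreamingChatResponseHandler, reduced to two callbacks.
    interface SketchHandler {
        void onPartialResponse(String partialResponse);
        void onUnmappedRawEvent(Object rawEvent);
    }

    // Simplified tracker: only typed callbacks set the flag.
    static final class TrackingHandler implements SketchHandler {
        private final SketchHandler delegate;
        private boolean mapped;

        TrackingHandler(SketchHandler delegate) {
            this.delegate = delegate;
        }

        void resetMappingTracking() {
            mapped = false;
        }

        boolean wasMapped() {
            return mapped;
        }

        @Override
        public void onPartialResponse(String partialResponse) {
            mapped = true; // a typed callback counts as a mapping
            delegate.onPartialResponse(partialResponse);
        }

        @Override
        public void onUnmappedRawEvent(Object rawEvent) {
            // Forwarding a raw event does not count as a mapping.
            delegate.onUnmappedRawEvent(rawEvent);
        }
    }

    // Assumed event format: "text:<token>" maps to onPartialResponse, anything else is not modelled.
    static void process(List<String> rawEvents, TrackingHandler handler) {
        for (String rawEvent : rawEvents) {
            handler.resetMappingTracking(); // before each event
            if (rawEvent.startsWith("text:")) {
                handler.onPartialResponse(rawEvent.substring("text:".length()));
            }
            if (!handler.wasMapped()) {
                handler.onUnmappedRawEvent(rawEvent); // surface only unmapped events
            }
        }
    }

    // Runs the loop against a handler that records every callback it receives.
    static List<String> run(List<String> rawEvents) {
        List<String> log = new ArrayList<>();
        SketchHandler user = new SketchHandler() {
            @Override
            public void onPartialResponse(String partialResponse) {
                log.add("partial=" + partialResponse);
            }

            @Override
            public void onUnmappedRawEvent(Object rawEvent) {
                log.add("raw=" + rawEvent);
            }
        };
        process(rawEvents, new TrackingHandler(user));
        return log;
    }

    public static void main(String[] args) {
        // prints [partial=Hel, raw=web_search_call.in_progress, partial=lo]
        System.out.println(run(List.of("text:Hel", "web_search_call.in_progress", "text:lo")));
    }
}
```

Because the flag is a plain field that is reset per event, the sketch shares the single-threaded, one-event-at-a-time assumption stated above.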

Constructor Summary

Constructors

| Constructor | Description |
| --- | --- |
| MappingTrackingStreamingChatResponseHandler(StreamingChatResponseHandler delegate) | |

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| void | onCompleteResponse(ChatResponse completeResponse) | Invoked when the model has finished streaming a response. |
| void | onCompleteToolCall(CompleteToolCall completeToolCall) | Invoked when the model has finished streaming a single tool call. |
| void | onError(Throwable error) | This method is invoked when an error occurs during streaming. |
| void | onPartialResponse(PartialResponse partialResponse, PartialResponseContext context) | Invoked each time the model generates a partial textual response, usually a single token. |
| void | onPartialResponse(String partialResponse) | Invoked each time the model generates a partial textual response, usually a single token. |
| void | onPartialThinking(PartialThinking partialThinking) | Invoked each time the model generates a partial thinking/reasoning text, usually a single token. |
| void | onPartialThinking(PartialThinking partialThinking, PartialThinkingContext context) | Invoked each time the model generates a partial thinking/reasoning text, usually a single token. |
| void | onPartialToolCall(PartialToolCall partialToolCall) | This callback is invoked each time the model generates a partial tool call, which contains a single token of the tool's arguments. |
| void | onPartialToolCall(PartialToolCall partialToolCall, PartialToolCallContext context) | This callback is invoked each time the model generates a partial tool call, which contains a single token of the tool's arguments. |
| void | onUnmappedRawEvent(Object rawEvent) | Invoked when a provider emits a raw streaming event that is not already exposed through one of the typed callbacks (such as StreamingChatResponseHandler.onPartialResponse(PartialResponse, PartialResponseContext), StreamingChatResponseHandler.onPartialThinking(PartialThinking, PartialThinkingContext), StreamingChatResponseHandler.onPartialToolCall(PartialToolCall, PartialToolCallContext) or StreamingChatResponseHandler.onCompleteToolCall(CompleteToolCall)). |
| void | resetMappingTracking() | Resets mapping tracking. |
| boolean | wasMapped() | |

Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- MappingTrackingStreamingChatResponseHandler
  
  public MappingTrackingStreamingChatResponseHandler(StreamingChatResponseHandler delegate)
Method Details
- resetMappingTracking
  
  public void resetMappingTracking()
  
  Resets mapping tracking. Must be called before processing each provider streaming event.
- wasMapped
  
  public boolean wasMapped()
  
  Returns:
  
  true if a typed, user-facing callback was invoked since the last resetMappingTracking().
- onPartialResponse
  public void onPartialResponse(String partialResponse)
  
  Description copied from interface: StreamingChatResponseHandler
  
  Invoked each time the model generates a partial textual response, usually a single token.
  Please note that some LLM providers do not stream individual tokens, but send responses in batches. In such cases, this callback may receive multiple tokens at once.
  Either this or the StreamingChatResponseHandler.onPartialResponse(PartialResponse, PartialResponseContext) method should be implemented if you want to consume tokens as soon as they become available.
  
  Specified by:
  
  onPartialResponse in interface StreamingChatResponseHandler
  
  Parameters:
  
  partialResponse - A partial textual response, usually a single token.
  
  See Also:
  
  StreamingChatResponseHandler.onPartialResponse(PartialResponse, PartialResponseContext)
- onPartialResponse
  public void onPartialResponse(PartialResponse partialResponse, PartialResponseContext context)
  
  Description copied from interface: StreamingChatResponseHandler
  
  Invoked each time the model generates a partial textual response, usually a single token.
  Please note that some LLM providers do not stream individual tokens, but send responses in batches. In such cases, this callback may receive multiple tokens at once.
  Either this or the StreamingChatResponseHandler.onPartialResponse(String) method should be implemented if you want to consume tokens as soon as they become available.
  
  Specified by:
  
  onPartialResponse in interface StreamingChatResponseHandler
  
  Parameters:
  
  partialResponse - A partial textual response, usually a single token.
  
  context - A partial response context. Contains a StreamingHandle that can be used to cancel streaming.
  
  See Also:
  
  StreamingChatResponseHandler.onPartialResponse(String)
- onPartialThinking
  public void onPartialThinking(PartialThinking partialThinking)
  
  Description copied from interface: StreamingChatResponseHandler
  
  Invoked each time the model generates a partial thinking/reasoning text, usually a single token.
  Please note that some LLM providers do not stream individual tokens, but send thinking tokens in batches. In such cases, this callback may receive multiple tokens at once.
  Either this or the StreamingChatResponseHandler.onPartialThinking(PartialThinking, PartialThinkingContext) method should be implemented if you want to consume thinking tokens as soon as they become available.
  
  Specified by:
  
  onPartialThinking in interface StreamingChatResponseHandler
  
  Parameters:
  
  partialThinking - A partial thinking text, usually a single token.
  
  See Also:
  
  StreamingChatResponseHandler.onPartialThinking(PartialThinking, PartialThinkingContext)
- onPartialThinking
  public void onPartialThinking(PartialThinking partialThinking, PartialThinkingContext context)
  
  Description copied from interface: StreamingChatResponseHandler
  
  Invoked each time the model generates a partial thinking/reasoning text, usually a single token.
  Please note that some LLM providers do not stream individual tokens, but send thinking tokens in batches. In such cases, this callback may receive multiple tokens at once.
  Either this or the StreamingChatResponseHandler.onPartialThinking(PartialThinking) method should be implemented if you want to consume thinking tokens as soon as they become available.
  
  Specified by:
  
  onPartialThinking in interface StreamingChatResponseHandler
  
  Parameters:
  
  partialThinking - A partial thinking text, usually a single token.
  
  context - A partial thinking context. Contains a StreamingHandle that can be used to cancel streaming.
  
  See Also:
  
  StreamingChatResponseHandler.onPartialThinking(PartialThinking)
- onPartialToolCall
  public void onPartialToolCall(PartialToolCall partialToolCall)
  
  Description copied from interface: StreamingChatResponseHandler
  
  This callback is invoked each time the model generates a partial tool call, which contains a single token of the tool's arguments. It is typically invoked multiple times for a single tool call until StreamingChatResponseHandler.onCompleteToolCall(CompleteToolCall) is eventually invoked, indicating that the streaming for that tool call is finished.
  Here's an example of what streaming a single tool call might look like:
  1. onPartialToolCall(index = 0, id = "call_abc", name = "get_weather", partialArguments = "{\"")
  2. onPartialToolCall(index = 0, id = "call_abc", name = "get_weather", partialArguments = "city")
  3. onPartialToolCall(index = 0, id = "call_abc", name = "get_weather", partialArguments = "\":\"")
  4. onPartialToolCall(index = 0, id = "call_abc", name = "get_weather", partialArguments = "Mun")
  5. onPartialToolCall(index = 0, id = "call_abc", name = "get_weather", partialArguments = "ich")
  6. onPartialToolCall(index = 0, id = "call_abc", name = "get_weather", partialArguments = "\"}")
  7. onCompleteToolCall(index = 0, id = "call_abc", name = "get_weather", arguments = "{\"city\":\"Munich\"}")
  
  If the model decides to call multiple tools, the index will increment, allowing you to correlate each partial tool call with the tool call it belongs to.
  Please note that not all LLM providers stream tool calls token by token. Some providers (e.g., Bedrock, Google, Mistral, Ollama) return only complete tool calls. In those cases, this callback won't be invoked - only StreamingChatResponseHandler.onCompleteToolCall(CompleteToolCall) will be called.
  Either this or the StreamingChatResponseHandler.onPartialToolCall(PartialToolCall, PartialToolCallContext) method should be implemented if you want to consume partial tool calls as soon as they become available.
  
  Specified by:
  
  onPartialToolCall in interface StreamingChatResponseHandler
  
  Parameters:
  
  partialToolCall - A partial tool call that contains the index, tool ID, tool name and partial arguments.
  
  See Also:
  
  StreamingChatResponseHandler.onPartialToolCall(PartialToolCall, PartialToolCallContext)
- onPartialToolCall
  public void onPartialToolCall(PartialToolCall partialToolCall, PartialToolCallContext context)
  
  Description copied from interface: StreamingChatResponseHandler
  
  This callback is invoked each time the model generates a partial tool call, which contains a single token of the tool's arguments. It is typically invoked multiple times for a single tool call until StreamingChatResponseHandler.onCompleteToolCall(CompleteToolCall) is eventually invoked, indicating that the streaming for that tool call is finished.
  Here's an example of what streaming a single tool call might look like:
  1. onPartialToolCall(index = 0, id = "call_abc", name = "get_weather", partialArguments = "{\"")
  2. onPartialToolCall(index = 0, id = "call_abc", name = "get_weather", partialArguments = "city")
  3. onPartialToolCall(index = 0, id = "call_abc", name = "get_weather", partialArguments = "\":\"")
  4. onPartialToolCall(index = 0, id = "call_abc", name = "get_weather", partialArguments = "Mun")
  5. onPartialToolCall(index = 0, id = "call_abc", name = "get_weather", partialArguments = "ich")
  6. onPartialToolCall(index = 0, id = "call_abc", name = "get_weather", partialArguments = "\"}")
  7. onCompleteToolCall(index = 0, id = "call_abc", name = "get_weather", arguments = "{\"city\":\"Munich\"}")
  
  If the model decides to call multiple tools, the index will increment, allowing you to correlate each partial tool call with the tool call it belongs to.
  Please note that not all LLM providers stream tool calls token by token. Some providers (e.g., Bedrock, Google, Mistral, Ollama) return only complete tool calls. In those cases, this callback won't be invoked - only StreamingChatResponseHandler.onCompleteToolCall(CompleteToolCall) will be called.
  Either this or the StreamingChatResponseHandler.onPartialToolCall(PartialToolCall) method should be implemented if you want to consume partial tool calls as soon as they become available.
  
  Specified by:
  
  onPartialToolCall in interface StreamingChatResponseHandler
  
  Parameters:
  
  partialToolCall - A partial tool call that contains the index, tool ID, tool name and partial arguments.
  
  context - A partial tool call context. Contains a StreamingHandle that can be used to cancel streaming.
  
  See Also:
  
  StreamingChatResponseHandler.onPartialToolCall(PartialToolCall)
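
The way argument fragments from the onPartialToolCall example add up to the final arguments string can be sketched as below. `PartialCall` is a hypothetical stand-in record carrying only the fields named in the parameter description (index, tool ID, tool name, partial arguments); it is not the library's PartialToolCall class, and `assemble` is an illustrative helper.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ToolCallAssemblySketch {

    // Hypothetical stand-in for a partial tool call.
    record PartialCall(int index, String id, String name, String partialArguments) {}

    // Concatenates argument fragments per tool call index, in arrival order.
    static Map<Integer, String> assemble(Iterable<PartialCall> partialCalls) {
        Map<Integer, StringBuilder> buffers = new TreeMap<>();
        for (PartialCall call : partialCalls) {
            buffers.computeIfAbsent(call.index(), i -> new StringBuilder())
                   .append(call.partialArguments());
        }
        Map<Integer, String> assembled = new TreeMap<>();
        buffers.forEach((index, buffer) -> assembled.put(index, buffer.toString()));
        return assembled;
    }

    public static void main(String[] args) {
        // The six fragments from the example sequence.
        List<PartialCall> calls = List.of(
                new PartialCall(0, "call_abc", "get_weather", "{\""),
                new PartialCall(0, "call_abc", "get_weather", "city"),
                new PartialCall(0, "call_abc", "get_weather", "\":\""),
                new PartialCall(0, "call_abc", "get_weather", "Mun"),
                new PartialCall(0, "call_abc", "get_weather", "ich"),
                new PartialCall(0, "call_abc", "get_weather", "\"}"));
        System.out.println(assemble(calls).get(0)); // prints {"city":"Munich"}
    }
}
```

Keying the buffers by index is what keeps the fragments of different tool calls apart when the model calls more than one tool. With providers that return only complete tool calls, no fragments arrive and the arguments come from onCompleteToolCall alone.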
- onCompleteToolCall
  
  public void onCompleteToolCall(CompleteToolCall completeToolCall)
  
  Description copied from interface: StreamingChatResponseHandler
  
  Invoked when the model has finished streaming a single tool call.
  
  Specified by:
  
  onCompleteToolCall in interface StreamingChatResponseHandler
  
  Parameters:
  
  completeToolCall - A complete tool call that contains the index, tool ID, tool name, and fully assembled arguments.
- onUnmappedRawEvent
  
  public void onUnmappedRawEvent(Object rawEvent)
  
  Description copied from interface: StreamingChatResponseHandler
  
  Invoked when a provider emits a raw streaming event that is not already exposed through one of the typed callbacks (such as StreamingChatResponseHandler.onPartialResponse(PartialResponse, PartialResponseContext), StreamingChatResponseHandler.onPartialThinking(PartialThinking, PartialThinkingContext), StreamingChatResponseHandler.onPartialToolCall(PartialToolCall, PartialToolCallContext) or StreamingChatResponseHandler.onCompleteToolCall(CompleteToolCall)).
  This acts as an escape hatch for provider-specific events that LangChain4j does not model, such as server-tool lifecycle events (e.g., OpenAI's web_search_call.in_progress). Events that are already delivered as partial responses, thinking or tool calls are not repeated here.
  The event type depends on the provider implementation. Implementations using the dev.langchain4j.http.client.HttpClient abstraction (e.g., OpenAI, Anthropic, Google AI Gemini) typically expose ServerSentEvent; other implementations can expose provider-specific event objects (e.g., the OpenAI official Responses model exposes the SDK's ResponseStreamEvent).
  
  Specified by:
  
  onUnmappedRawEvent in interface StreamingChatResponseHandler
  
  Parameters:
  
  rawEvent - A raw provider streaming event.
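
Since the raw event arrives as a plain Object whose type depends on the provider, a handler usually has to check the type before using it. The sketch below shows one way to do that with `instanceof` pattern matching (Java 16+). `SseEvent` is a hypothetical stand-in for an SSE-style event with an event name and a data payload, and `describe` is an illustrative helper; neither is part of the library.

```java
class RawEventDispatchSketch {

    // Hypothetical stand-in for an SSE-style raw event.
    record SseEvent(String event, String data) {}

    // Returns a short label for a raw event without assuming its type.
    static String describe(Object rawEvent) {
        if (rawEvent instanceof SseEvent sse) {
            return "sse:" + sse.event();
        }
        if (rawEvent == null) {
            return "null";
        }
        // Unknown provider-specific object: fall back to its class name.
        return "other:" + rawEvent.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        // prints sse:web_search_call.in_progress
        System.out.println(describe(new SseEvent("web_search_call.in_progress", "{}")));
        // prints other:String
        System.out.println(describe("provider-specific payload"));
    }
}
```

Keeping a fallback branch for unknown types means the handler does not break when a provider exposes a different event class.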
- onCompleteResponse
  
  public void onCompleteResponse(ChatResponse completeResponse)
  
  Description copied from interface: StreamingChatResponseHandler
  
  Invoked when the model has finished streaming a response.
  
  Specified by:
  
  onCompleteResponse in interface StreamingChatResponseHandler
  
  Parameters:
  
  completeResponse - The complete response generated by the model, containing all assembled partial text and tool calls.
- onError
  
  public void onError(Throwable error)
  
  Description copied from interface: StreamingChatResponseHandler
  
  This method is invoked when an error occurs during streaming.
  
  Specified by:
  
  onError in interface StreamingChatResponseHandler
  
  Parameters:
  
  error - The error that occurred.
