Interface StreamingChatResponseHandler


public interface StreamingChatResponseHandler
Represents a handler for a StreamingChatModel response.
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    onCompleteResponse(ChatResponse completeResponse)
    Invoked when the model has finished streaming a response.
    default void
    onCompleteToolCall(CompleteToolCall completeToolCall)
    Invoked when the model has finished streaming a single tool call.
    void
    onError(Throwable error)
    Invoked when an error occurs during streaming.
    void
    onPartialResponse(String partialResponse)
    Invoked each time the model generates a partial textual response, usually a single token.
    default void
    onPartialThinking(PartialThinking partialThinking)
    Invoked each time the model generates partial thinking/reasoning text, usually a single token.
    default void
    onPartialToolCall(PartialToolCall partialToolCall)
    Invoked each time the model generates a partial tool call, which contains a single token of the tool's arguments.
  • Method Details

    • onPartialResponse

      void onPartialResponse(String partialResponse)
      Invoked each time the model generates a partial textual response, usually a single token.

      Please note that some LLM providers do not stream individual tokens, but send responses in batches. In such cases, this callback may receive multiple tokens at once.

      Parameters:
      partialResponse - A partial textual response, usually a single token.
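A minimal sketch of how a handler might accumulate partial responses into the full text. `PartialResponseBuffer` is a hypothetical helper, not part of LangChain4j; it only illustrates that partials (whether single tokens or provider-sent batches) are plain string fragments to be concatenated in order.

```java
// Hypothetical helper (not a LangChain4j type): accumulates partial
// textual responses as they stream in. Each fragment may be a single
// token or a batch, depending on the provider.
class PartialResponseBuffer {

    private final StringBuilder text = new StringBuilder();

    // Mirrors onPartialResponse(String): append each fragment in arrival order.
    public void onPartialResponse(String partialResponse) {
        text.append(partialResponse);
    }

    // The text assembled so far.
    public String assembledText() {
        return text.toString();
    }
}
```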
    • onPartialThinking

      @Experimental default void onPartialThinking(PartialThinking partialThinking)
      Invoked each time the model generates a partial thinking/reasoning text, usually a single token.

      Please note that some LLM providers do not stream individual tokens, but send thinking tokens in batches. In such cases, this callback may receive multiple tokens at once.

      Parameters:
      partialThinking - A partial thinking text, usually a single token.
      Since:
      1.2.0
    • onPartialToolCall

      @Experimental default void onPartialToolCall(PartialToolCall partialToolCall)
      This callback is invoked each time the model generates a partial tool call, which contains a single token of the tool's arguments. It is typically invoked multiple times for a single tool call until onCompleteToolCall(CompleteToolCall) is eventually invoked, indicating that the streaming for that tool call is finished.

      Here's an example of what streaming a single tool call might look like:

       1. onPartialToolCall(index = 0, id = "call_abc", name = "get_weather", partialArguments = "{\"")
       2. onPartialToolCall(index = 0, id = "call_abc", name = "get_weather", partialArguments = "city")
       3. onPartialToolCall(index = 0, id = "call_abc", name = "get_weather", partialArguments = "\":\"")
       4. onPartialToolCall(index = 0, id = "call_abc", name = "get_weather", partialArguments = "Mun")
       5. onPartialToolCall(index = 0, id = "call_abc", name = "get_weather", partialArguments = "ich")
       6. onPartialToolCall(index = 0, id = "call_abc", name = "get_weather", partialArguments = "\"}")
       7. onCompleteToolCall(index = 0, id = "call_abc", name = "get_weather", arguments = "{\"city\":\"Munich\"}")
       

      If the model decides to call multiple tools, the index will increment, allowing you to correlate each partial tool call with the tool call it belongs to.

      Please note that not all LLM providers stream tool calls token by token. Some providers (e.g., Bedrock, Google, Mistral, Ollama) return only complete tool calls. In those cases, this callback won't be invoked; only onCompleteToolCall(CompleteToolCall) will be called.

      Parameters:
      partialToolCall - A partial tool call that contains the index, tool ID, tool name and partial arguments.
      Since:
      1.2.0
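The assembly implied by the numbered example above can be sketched as follows. `PartialCall` is a stand-in record for LangChain4j's PartialToolCall (its accessor names here are assumptions); the point is that argument fragments are concatenated per tool-call index, since the index distinguishes multiple tool calls within one response.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical assembler (not a LangChain4j type): joins streamed
// tool-call argument fragments, keyed by the tool call's index.
class ToolCallAssembler {

    // Stand-in for PartialToolCall; field names are assumptions.
    record PartialCall(int index, String id, String name, String partialArguments) {}

    private final Map<Integer, StringBuilder> argsByIndex = new HashMap<>();

    // Mirrors onPartialToolCall: append this fragment to the buffer
    // for its index. The index increments when the model calls
    // multiple tools in a single response.
    public void onPartialToolCall(PartialCall call) {
        argsByIndex.computeIfAbsent(call.index(), i -> new StringBuilder())
                   .append(call.partialArguments());
    }

    // The arguments assembled so far for a given tool-call index.
    public String argumentsFor(int index) {
        StringBuilder sb = argsByIndex.get(index);
        return sb == null ? "" : sb.toString();
    }
}
```

Feeding it the six fragments from the example above yields the same JSON that onCompleteToolCall would report as the fully assembled arguments.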
    • onCompleteToolCall

      @Experimental default void onCompleteToolCall(CompleteToolCall completeToolCall)
      Invoked when the model has finished streaming a single tool call.
      Parameters:
      completeToolCall - A complete tool call that contains the index, tool ID, tool name, and fully assembled arguments.
      Since:
      1.2.0
    • onCompleteResponse

      void onCompleteResponse(ChatResponse completeResponse)
      Invoked when the model has finished streaming a response.
      Parameters:
      completeResponse - The complete response generated by the model, containing all assembled partial text and tool calls.
    • onError

      void onError(Throwable error)
      This method is invoked when an error occurs during streaming.
      Parameters:
      error - The error that occurred
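A common pattern with callback-style streaming handlers like this one is to bridge the two terminal callbacks (onCompleteResponse and onError) to a CompletableFuture, so a caller can block until streaming finishes. The sketch below uses a plain String in place of ChatResponse and does not implement the actual interface; it only illustrates the lifecycle.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical lifecycle sketch (String stands in for ChatResponse):
// partial callbacks accumulate, terminal callbacks settle a future.
class FutureBackedHandler {

    private final StringBuilder partials = new StringBuilder();
    private final CompletableFuture<String> future = new CompletableFuture<>();

    // Non-terminal: invoked repeatedly while the model streams.
    public void onPartialResponse(String partialResponse) {
        partials.append(partialResponse);
    }

    // Terminal: the complete response contains everything assembled.
    public void onCompleteResponse(String completeResponse) {
        future.complete(completeResponse);
    }

    // Terminal: streaming failed; surface the error to the caller.
    public void onError(Throwable error) {
        future.completeExceptionally(error);
    }

    // Callers can join() or get() to wait for the terminal callback.
    public CompletableFuture<String> result() {
        return future;
    }
}
```

Exactly one of the two terminal callbacks settles the future, which matches the contract above: onCompleteResponse for success, onError for a failure during streaming.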