Interface TokenStream

All Known Implementing Classes:
AiServiceTokenStream

public interface TokenStream
Represents a token stream from the model, to which you can subscribe and receive updates when a new partial response (usually a single token) is available, when the model finishes streaming, or when an error occurs during streaming. It is intended to be used as a return type in an AI Service.
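For example, a minimal usage sketch (the Assistant interface and how it is wired to a streaming model are assumptions for illustration; only the TokenStream methods shown are defined by this interface):

    // Hypothetical AI Service; returning TokenStream enables streaming
    interface Assistant {
        TokenStream chat(String userMessage);
    }

    // 'assistant' is assumed to be created elsewhere (e.g. via AiServices with a streaming chat model)
    TokenStream tokenStream = assistant.chat("Tell me a joke");

    tokenStream
            .onPartialResponse(partial -> System.out.print(partial)) // each new token/fragment
            .onCompleteResponse(response -> System.out.println("\nDone: " + response))
            .onError(Throwable::printStackTrace)
            .start(); // nothing happens until start() is called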
  • Method Details

    • onPartialResponse

      TokenStream onPartialResponse(Consumer<String> partialResponseHandler)
      The provided consumer will be invoked every time a new partial textual response (usually a single token) from a language model is available.
      Parameters:
      partialResponseHandler - lambda that will be invoked when a model generates a new partial textual response
      Returns:
      token stream instance used to configure or start stream processing
    • onPartialThinking

      @Experimental default TokenStream onPartialThinking(Consumer<PartialThinking> partialThinkingHandler)
      The provided consumer will be invoked every time a new partial thinking/reasoning text (usually a single token) from a language model is available.
      Parameters:
      partialThinkingHandler - lambda that will be invoked when a model generates a new partial thinking/reasoning text
      Returns:
      token stream instance used to configure or start stream processing
      Since:
      1.2.0
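      For example, a sketch of subscribing to thinking/reasoning text (combined with the other handlers and start() as shown in the introduction; the PartialThinking object is printed as-is):

          tokenStream.onPartialThinking(thinking -> System.out.println("[thinking] " + thinking));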
    • onRetrieved

      TokenStream onRetrieved(Consumer<List<Content>> contentHandler)
      The provided consumer will be invoked if any Contents are retrieved using RetrievalAugmentor.

      The invocation happens before any call is made to the language model.

      Parameters:
      contentHandler - lambda that consumes all retrieved contents
      Returns:
      token stream instance used to configure or start stream processing
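      For example, a sketch that inspects the retrieved contents before the model is called (assumes the AI Service is configured with a RetrievalAugmentor; the contents are only counted here):

          tokenStream.onRetrieved(contents -> System.out.println("Retrieved " + contents.size() + " content(s) for RAG"));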
    • onIntermediateResponse

      default TokenStream onIntermediateResponse(Consumer<ChatResponse> intermediateResponseHandler)
      The provided consumer will be invoked when a language model finishes streaming an intermediate chat response, as opposed to the final response (see onCompleteResponse(Consumer)). Intermediate chat responses contain ToolExecutionRequests, which the AI Service will execute after returning from this consumer.
      Parameters:
      intermediateResponseHandler - lambda that consumes intermediate chat responses
      Returns:
      token stream instance used to configure or start stream processing
      Since:
      1.2.0
      See Also:
      onCompleteResponse(Consumer)
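      For example, a sketch that observes each intermediate response carrying tool execution requests (the response is logged as-is; no accessors beyond those documented here are assumed):

          tokenStream.onIntermediateResponse(intermediate -> System.out.println("Intermediate response with tool requests: " + intermediate));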
    • beforeToolExecution

      default TokenStream beforeToolExecution(Consumer<BeforeToolExecution> beforeToolExecutionHandler)
      The provided consumer will be invoked right before a tool is executed.
      Parameters:
      beforeToolExecutionHandler - lambda that consumes BeforeToolExecution
      Returns:
      token stream instance used to configure or start stream processing
      Since:
      1.2.0
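      For example, a sketch that logs right before each tool call (the BeforeToolExecution object is logged as-is):

          tokenStream.beforeToolExecution(before -> System.out.println("About to execute tool: " + before));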
    • onToolExecuted

      TokenStream onToolExecuted(Consumer<ToolExecution> toolExecuteHandler)
      The provided consumer will be invoked right after a tool is executed.

      The invocation happens after the tool method has finished and before any other tool is executed.

      Parameters:
      toolExecuteHandler - lambda that consumes ToolExecution
      Returns:
      token stream instance used to configure or start stream processing
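      For example, a sketch that reacts to each completed tool call (the ToolExecution object is logged as-is):

          tokenStream.onToolExecuted(execution -> System.out.println("Tool executed: " + execution));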
    • onCompleteResponse

      TokenStream onCompleteResponse(Consumer<ChatResponse> completeResponseHandler)
      The provided consumer will be invoked when a language model finishes streaming the final chat response, as opposed to the intermediate response (see onIntermediateResponse(Consumer)).

      Please note that ChatResponse.tokenUsage() contains the aggregate token usage across all calls to the LLM. It is the sum of the ChatResponse.tokenUsage() values of all intermediate responses (see onIntermediateResponse(Consumer)).

      Parameters:
      completeResponseHandler - lambda that will be invoked when the language model finishes streaming
      Returns:
      token stream instance used to configure or start stream processing
      See Also:
      onIntermediateResponse(Consumer)
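      For example, a sketch that prints the final response's aggregate token usage (ChatResponse.tokenUsage() is the only accessor assumed here, as referenced above):

          tokenStream
                  .onPartialResponse(System.out::print)
                  .onCompleteResponse(response -> System.out.println("\nAggregate token usage: " + response.tokenUsage()))
                  .onError(Throwable::printStackTrace)
                  .start();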
    • onError

      TokenStream onError(Consumer<Throwable> errorHandler)
      The provided consumer will be invoked when an error occurs during streaming.
      Parameters:
      errorHandler - lambda that will be invoked when an error occurs
      Returns:
      token stream instance used to configure or start stream processing
    • ignoreErrors

      TokenStream ignoreErrors()
      All errors that occur during streaming will be ignored (but logged at the WARN level).
      Returns:
      token stream instance used to configure or start stream processing
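      For example, a fire-and-forget sketch in which streaming errors are only logged (at the WARN level) rather than handled:

          tokenStream
                  .onPartialResponse(System.out::print)
                  .ignoreErrors()
                  .start();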
    • start

      void start()
      Completes building the token stream and starts processing.

      Sends a request to the LLM and starts streaming the response.