Interface TokenStream
- All Known Implementing Classes:
AiServiceTokenStream
-
Method Summary
Modifier and Type | Method | Description
default TokenStream | beforeToolExecution(Consumer<BeforeToolExecution> beforeToolExecutionHandler) | The provided consumer will be invoked right before a tool is executed.
TokenStream | ignoreErrors() | All errors during streaming will be ignored (but will be logged with a WARN log level).
TokenStream | onCompleteResponse(Consumer<ChatResponse> completeResponseHandler) | The provided consumer will be invoked when a language model finishes streaming the final chat response, as opposed to an intermediate response (see onIntermediateResponse(Consumer)).
TokenStream | onError(Consumer<Throwable> errorHandler) | The provided consumer will be invoked when an error occurs during streaming.
default TokenStream | onIntermediateResponse(Consumer<ChatResponse> intermediateResponseHandler) | The provided consumer will be invoked when a language model finishes streaming an intermediate chat response, as opposed to the final response (see onCompleteResponse(Consumer)).
TokenStream | onPartialResponse(Consumer<String> partialResponseHandler) | The provided consumer will be invoked every time a new partial textual response (usually a single token) from a language model is available.
default TokenStream | onPartialResponseWithContext(BiConsumer<PartialResponse, PartialResponseContext> handler) | The provided consumer will be invoked every time a new partial textual response (usually a single token) from a language model is available.
default TokenStream | onPartialThinking(Consumer<PartialThinking> partialThinkingHandler) | The provided consumer will be invoked every time a new partial thinking/reasoning text (usually a single token) from a language model is available.
default TokenStream | onPartialThinkingWithContext(BiConsumer<PartialThinking, PartialThinkingContext> handler) | The provided consumer will be invoked every time a new partial thinking/reasoning text (usually a single token) from a language model is available.
TokenStream | onRetrieved(Consumer<List<Content>> contentHandler) | The provided consumer will be invoked if any Contents are retrieved using RetrievalAugmentor.
TokenStream | onToolExecuted(Consumer<ToolExecution> toolExecuteHandler) | The provided consumer will be invoked right after a tool is executed.
void | start() | Completes the current token stream building and starts processing.
-
Method Details
-
onPartialResponse
TokenStream onPartialResponse(Consumer<String> partialResponseHandler)
The provided consumer will be invoked every time a new partial textual response (usually a single token) from a language model is available. Either this or the onPartialResponseWithContext(BiConsumer) callback can be used if you want to consume tokens as soon as they become available.
- Parameters:
partialResponseHandler - lambda that will be invoked when a model generates a new partial textual response
- Returns:
- token stream instance used to configure or start stream processing
- See Also:
- onPartialResponseWithContext(BiConsumer)
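For illustration, a minimal usage sketch. The Assistant interface and the assistant instance (an AI Service assembled elsewhere, e.g. via AiServices) are hypothetical stand-ins for code that returns a TokenStream:

// Hypothetical AI Service whose method returns a TokenStream.
interface Assistant {
    TokenStream chat(String userMessage);
}

TokenStream stream = assistant.chat("Tell me a story");
stream.onPartialResponse(partial -> System.out.print(partial)) // each partial is usually a single token
        .onCompleteResponse(response -> System.out.println())
        .onError(Throwable::printStackTrace)
        .start(); // nothing is sent to the model until start() is called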
-
onPartialResponseWithContext
@Experimental
default TokenStream onPartialResponseWithContext(BiConsumer<PartialResponse, PartialResponseContext> handler)
The provided consumer will be invoked every time a new partial textual response (usually a single token) from a language model is available. Either this or the onPartialResponse(Consumer) callback can be used if you want to consume tokens as soon as they become available.
- Parameters:
handler - lambda that will be invoked when a model generates a new partial textual response
- Returns:
- token stream instance used to configure or start stream processing
- Since:
- 1.8.0
- See Also:
- onPartialResponse(Consumer)
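A sketch of the BiConsumer form, reusing the hypothetical stream from the previous example. The accessors of PartialResponse and PartialResponseContext are not shown on this page, so the handler below only logs the objects:

// Register either this or onPartialResponse(Consumer), not both.
stream.onPartialResponseWithContext((partialResponse, context) -> {
            System.out.println(partialResponse); // partial text plus additional streaming context
        })
        .onCompleteResponse(response -> System.out.println())
        .onError(Throwable::printStackTrace)
        .start();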
-
onPartialThinking
@Experimental
default TokenStream onPartialThinking(Consumer<PartialThinking> partialThinkingHandler)
The provided consumer will be invoked every time a new partial thinking/reasoning text (usually a single token) from a language model is available. Either this or the onPartialThinkingWithContext(BiConsumer) callback can be used if you want to consume thinking tokens as soon as they become available.
- Parameters:
partialThinkingHandler - lambda that will be invoked when a model generates a new partial thinking/reasoning text
- Returns:
- token stream instance used to configure or start stream processing
- Since:
- 1.2.0
- See Also:
- onPartialThinkingWithContext(BiConsumer)
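A sketch that logs thinking tokens separately from answer tokens; PartialThinking is simply printed, since its accessors are not shown on this page:

stream.onPartialThinking(thinking -> System.out.println("[thinking] " + thinking))
        .onPartialResponse(System.out::print) // regular answer tokens arrive via this callback
        .onError(Throwable::printStackTrace)
        .start();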
-
onPartialThinkingWithContext
@Experimental
default TokenStream onPartialThinkingWithContext(BiConsumer<PartialThinking, PartialThinkingContext> handler)
The provided consumer will be invoked every time a new partial thinking/reasoning text (usually a single token) from a language model is available. Either this or the onPartialThinking(Consumer) callback can be used if you want to consume thinking tokens as soon as they become available.
- Parameters:
handler - lambda that will be invoked when a model generates a new partial thinking/reasoning text
- Returns:
- token stream instance used to configure or start stream processing
- Since:
- 1.8.0
- See Also:
- onPartialThinking(Consumer)
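The same idea with the BiConsumer form; as above, register only one of the two thinking callbacks:

// Register either this or onPartialThinking(Consumer), not both.
stream.onPartialThinkingWithContext((partialThinking, context) ->
                System.out.println("[thinking] " + partialThinking))
        .onPartialResponse(System.out::print)
        .onError(Throwable::printStackTrace)
        .start();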
-
onRetrieved
TokenStream onRetrieved(Consumer<List<Content>> contentHandler)
The provided consumer will be invoked if any Contents are retrieved using RetrievalAugmentor. The invocation happens before any call is made to the language model.
- Parameters:
contentHandler - lambda that consumes all retrieved contents
- Returns:
- token stream instance used to configure or start stream processing
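A sketch that inspects retrieved contents before the model is called; it assumes a RetrievalAugmentor has been configured on the AI service:

stream.onRetrieved(contents -> {
            // Invoked once, before the model is called, with all retrieved Contents.
            System.out.println("Retrieved " + contents.size() + " content(s):");
            contents.forEach(System.out::println);
        })
        .onPartialResponse(System.out::print)
        .onError(Throwable::printStackTrace)
        .start();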
-
onIntermediateResponse
default TokenStream onIntermediateResponse(Consumer<ChatResponse> intermediateResponseHandler)
The provided consumer will be invoked when a language model finishes streaming an intermediate chat response, as opposed to the final response (see onCompleteResponse(Consumer)). Intermediate chat responses contain ToolExecutionRequests; the AI service will execute them after returning from this consumer.
- Parameters:
intermediateResponseHandler - lambda that consumes intermediate chat responses
- Returns:
- token stream instance used to configure or start stream processing
- Since:
- 1.2.0
- See Also:
- onCompleteResponse(Consumer)
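A sketch that observes the tool-execution requests carried by intermediate responses, assuming the standard ChatResponse.aiMessage() and AiMessage.toolExecutionRequests() accessors:

stream.onIntermediateResponse(intermediate -> {
            // These requests will be executed by the AI service after this consumer returns.
            intermediate.aiMessage().toolExecutionRequests()
                    .forEach(request -> System.out.println("Tool requested: " + request.name()));
        })
        .onCompleteResponse(response -> System.out.println(response.aiMessage().text()))
        .onError(Throwable::printStackTrace)
        .start();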
-
beforeToolExecution
default TokenStream beforeToolExecution(Consumer<BeforeToolExecution> beforeToolExecutionHandler)
The provided consumer will be invoked right before a tool is executed.
- Parameters:
beforeToolExecutionHandler - lambda that consumes BeforeToolExecution
- Returns:
- token stream instance used to configure or start stream processing
- Since:
- 1.2.0
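A sketch that logs each tool call just before it runs; the BeforeToolExecution object is simply printed, since its accessors are not shown on this page:

stream.beforeToolExecution(before -> System.out.println("About to execute tool: " + before))
        .onPartialResponse(System.out::print)
        .onError(Throwable::printStackTrace)
        .start();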
-
onToolExecuted
TokenStream onToolExecuted(Consumer<ToolExecution> toolExecuteHandler)
The provided consumer will be invoked right after a tool is executed. The invocation happens after the tool method has finished and before any other tool is executed.
- Parameters:
toolExecuteHandler - lambda that consumes ToolExecution
- Returns:
- token stream instance used to configure or start stream processing
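A sketch that logs tool results, assuming ToolExecution exposes the originating request and its result via request() and result():

stream.onToolExecuted(toolExecution -> System.out.println(
                "Tool " + toolExecution.request().name() + " returned: " + toolExecution.result()))
        .onPartialResponse(System.out::print)
        .onError(Throwable::printStackTrace)
        .start();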
-
onCompleteResponse
TokenStream onCompleteResponse(Consumer<ChatResponse> completeResponseHandler)
The provided consumer will be invoked when a language model finishes streaming the final chat response, as opposed to an intermediate response (see onIntermediateResponse(Consumer)). Please note that ChatResponse.tokenUsage() contains the aggregate token usage across all calls to the LLM: it is the sum of the ChatResponse.tokenUsage() values of all intermediate responses (onIntermediateResponse(Consumer)).
- Parameters:
completeResponseHandler - lambda that will be invoked when the language model finishes streaming
- Returns:
- token stream instance used to configure or start stream processing
- See Also:
- onIntermediateResponse(Consumer)
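A sketch that reports the aggregate token usage from the final response, assuming the standard TokenUsage.totalTokenCount() accessor:

stream.onPartialResponse(System.out::print)
        .onCompleteResponse(response -> {
            // tokenUsage() on the final response aggregates usage across all LLM calls.
            System.out.println("\nTotal tokens: " + response.tokenUsage().totalTokenCount());
        })
        .onError(Throwable::printStackTrace)
        .start();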
-
onError
TokenStream onError(Consumer<Throwable> errorHandler)
The provided consumer will be invoked when an error occurs during streaming.
- Parameters:
errorHandler - lambda that will be invoked when an error occurs
- Returns:
- token stream instance used to configure or start stream processing
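A common pattern is to bridge the callbacks into a java.util.concurrent.CompletableFuture so the caller can block on, or react to, the outcome:

CompletableFuture<ChatResponse> futureResponse = new CompletableFuture<>();

stream.onPartialResponse(System.out::print)
        .onCompleteResponse(futureResponse::complete)
        .onError(futureResponse::completeExceptionally) // surfaces streaming failures to the caller
        .start();

ChatResponse response = futureResponse.join(); // rethrows a streaming error as CompletionException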
-
ignoreErrors
TokenStream ignoreErrors()
All errors during streaming will be ignored (but will be logged with a WARN log level).
- Returns:
- token stream instance used to configure or start stream processing
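A fire-and-forget sketch in which failures are deliberately swallowed instead of handled:

// No onError(...) handler is registered; errors are only logged at WARN level.
stream.onPartialResponse(System.out::print)
        .ignoreErrors()
        .start();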
-
start
void start()
Completes the current token stream building and starts processing. Will send a request to the LLM and start response streaming.
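Putting it all together, a sketch of a fully configured stream (the assistant and its chat method are hypothetical); nothing is sent to the model until start() is invoked:

TokenStream stream = assistant.chat("Plan a weekend trip");
stream.onRetrieved(contents -> System.out.println("Retrieved contents: " + contents.size()))
        .onPartialThinking(thinking -> System.out.println("[thinking] " + thinking))
        .onPartialResponse(System.out::print)
        .beforeToolExecution(before -> System.out.println("[before tool] " + before))
        .onToolExecuted(execution -> System.out.println("[tool executed] " + execution))
        .onCompleteResponse(response -> System.out.println("[complete]"))
        .onError(Throwable::printStackTrace)
        .start(); // only now is the request sent and streaming started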