Package dev.langchain4j.service
Interface TokenStream
- All Known Implementing Classes:
AiServiceTokenStream
public interface TokenStream
Represents a token stream from the model to which you can subscribe and receive updates
when a new partial response (usually a single token) is available,
when the model finishes streaming, or when an error occurs during streaming.
It is intended to be used as a return type in an AI Service.
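A minimal usage sketch follows; the Assistant interface, its chat method, and the pre-configured streaming model named model are illustrative assumptions, not part of this interface:

    import dev.langchain4j.service.AiServices;
    import dev.langchain4j.service.TokenStream;

    // Hypothetical AI Service: a TokenStream return type enables streaming.
    interface Assistant {
        TokenStream chat(String userMessage);
    }

    // "model" is an assumed, already-configured streaming chat model.
    Assistant assistant = AiServices.builder(Assistant.class)
            .streamingChatModel(model)
            .build();

    TokenStream tokenStream = assistant.chat("Tell me a joke");
    tokenStream
            .onPartialResponse(System.out::print)                 // each partial response as it arrives
            .onCompleteResponse(response -> System.out.println()) // the final chat response
            .onError(Throwable::printStackTrace)
            .start();                                             // nothing is sent until start()

The per-method sketches below assume a tokenStream obtained this way.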
-
Method Summary
default TokenStream beforeToolExecution(Consumer<BeforeToolExecution> beforeToolExecutionHandler)
    The provided consumer will be invoked right before a tool is executed.
TokenStream ignoreErrors()
    All errors during streaming will be ignored (but will be logged with a WARN log level).
TokenStream onCompleteResponse(Consumer<ChatResponse> completeResponseHandler)
    The provided consumer will be invoked when a language model finishes streaming the final chat response, as opposed to the intermediate response (see onIntermediateResponse(Consumer)).
TokenStream onError(Consumer<Throwable> errorHandler)
    The provided consumer will be invoked when an error occurs during streaming.
default TokenStream onIntermediateResponse(Consumer<ChatResponse> intermediateResponseHandler)
    The provided consumer will be invoked when a language model finishes streaming the intermediate chat response, as opposed to the final response (see onCompleteResponse(Consumer)).
TokenStream onPartialResponse(Consumer<String> partialResponseHandler)
    The provided consumer will be invoked every time a new partial textual response (usually a single token) from a language model is available.
default TokenStream onPartialThinking(Consumer<PartialThinking> partialThinkingHandler)
    The provided consumer will be invoked every time a new partial thinking/reasoning text (usually a single token) from a language model is available.
TokenStream onRetrieved(Consumer<List<Content>> contentHandler)
    The provided consumer will be invoked if any Contents are retrieved using RetrievalAugmentor.
TokenStream onToolExecuted(Consumer<ToolExecution> toolExecuteHandler)
    The provided consumer will be invoked right after a tool is executed.
void start()
    Completes the current token stream building and starts processing.
-
Method Details
-
onPartialResponse
TokenStream onPartialResponse(Consumer<String> partialResponseHandler)
The provided consumer will be invoked every time a new partial textual response (usually a single token) from a language model is available.
- Parameters:
  partialResponseHandler - lambda that will be invoked when a model generates a new partial textual response
- Returns:
  token stream instance used to configure or start stream processing
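For illustration, a sketch that prints each partial response as it arrives (tokenStream is assumed to come from an AI Service, as in the example above):

    tokenStream
            .onPartialResponse(partialResponse -> System.out.print(partialResponse))
            .onError(Throwable::printStackTrace)
            .start();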
-
onPartialThinking
@Experimental
default TokenStream onPartialThinking(Consumer<PartialThinking> partialThinkingHandler)
The provided consumer will be invoked every time a new partial thinking/reasoning text (usually a single token) from a language model is available.
- Parameters:
  partialThinkingHandler - lambda that will be invoked when a model generates a new partial thinking/reasoning text
- Returns:
  token stream instance used to configure or start stream processing
- Since:
  1.2.0
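A sketch, assuming PartialThinking exposes its text via text() and that the configured model emits thinking/reasoning tokens at all:

    tokenStream
            .onPartialThinking(partialThinking -> System.out.print(partialThinking.text())) // reasoning text
            .onPartialResponse(System.out::print)                                           // answer text
            .onError(Throwable::printStackTrace)
            .start();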
-
onRetrieved
TokenStream onRetrieved(Consumer<List<Content>> contentHandler)
The provided consumer will be invoked if any Contents are retrieved using RetrievalAugmentor. The invocation happens before any call is made to the language model.
- Parameters:
  contentHandler - lambda that consumes all retrieved contents
- Returns:
  token stream instance used to configure or start stream processing
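A sketch that logs retrieved contents before the model is called, assuming each Content wraps a TextSegment accessible via textSegment():

    tokenStream
            .onRetrieved(contents -> contents.forEach(
                    content -> System.out.println("Retrieved: " + content.textSegment().text())))
            .onPartialResponse(System.out::print)
            .onError(Throwable::printStackTrace)
            .start();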
-
onIntermediateResponse
default TokenStream onIntermediateResponse(Consumer<ChatResponse> intermediateResponseHandler)
The provided consumer will be invoked when a language model finishes streaming the intermediate chat response, as opposed to the final response (see onCompleteResponse(Consumer)). Intermediate chat responses contain ToolExecutionRequests; the AI service will execute them after returning from this consumer.
- Parameters:
  intermediateResponseHandler - lambda that consumes intermediate chat responses
- Returns:
  token stream instance used to configure or start stream processing
- Since:
  1.2.0
- See Also:
  onCompleteResponse(Consumer)
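A sketch that logs the tool execution requests carried by each intermediate response, assuming they are reachable via aiMessage().toolExecutionRequests():

    tokenStream
            .onIntermediateResponse(intermediateResponse -> System.out.println(
                    "Requested tools: " + intermediateResponse.aiMessage().toolExecutionRequests()))
            .onCompleteResponse(completeResponse -> System.out.println(
                    "Final answer: " + completeResponse.aiMessage().text()))
            .onError(Throwable::printStackTrace)
            .start();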
-
beforeToolExecution
default TokenStream beforeToolExecution(Consumer<BeforeToolExecution> beforeToolExecutionHandler)
The provided consumer will be invoked right before a tool is executed.
- Parameters:
  beforeToolExecutionHandler - lambda that consumes BeforeToolExecution
- Returns:
  token stream instance used to configure or start stream processing
- Since:
  1.2.0
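A sketch that logs the pending tool call; that BeforeToolExecution exposes the underlying ToolExecutionRequest via request() is an assumption here:

    tokenStream
            .beforeToolExecution(beforeToolExecution -> System.out.println(
                    "About to execute: " + beforeToolExecution.request().name())) // request() is assumed
            .onPartialResponse(System.out::print)
            .onError(Throwable::printStackTrace)
            .start();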
-
onToolExecuted
TokenStream onToolExecuted(Consumer<ToolExecution> toolExecuteHandler)
The provided consumer will be invoked right after a tool is executed. The invocation happens after the tool method has finished and before any other tool is executed.
- Parameters:
  toolExecuteHandler - lambda that consumes ToolExecution
- Returns:
  token stream instance used to configure or start stream processing
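A sketch that logs each finished tool call, assuming ToolExecution exposes the original request via request() and the tool's output via result():

    tokenStream
            .onToolExecuted(toolExecution -> System.out.println(
                    toolExecution.request().name() + " returned: " + toolExecution.result()))
            .onPartialResponse(System.out::print)
            .onError(Throwable::printStackTrace)
            .start();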
-
onCompleteResponse
TokenStream onCompleteResponse(Consumer<ChatResponse> completeResponseHandler)
The provided consumer will be invoked when a language model finishes streaming the final chat response, as opposed to the intermediate response (see onIntermediateResponse(Consumer)).
Please note that ChatResponse.tokenUsage() contains aggregate token usage across all calls to the LLM. It is a sum of the ChatResponse.tokenUsage()s of all intermediate responses (onIntermediateResponse(Consumer)).
- Parameters:
  completeResponseHandler - lambda that will be invoked when the language model finishes streaming
- Returns:
  token stream instance used to configure or start stream processing
- See Also:
  onIntermediateResponse(Consumer)
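A sketch that reads the aggregate token usage from the final response; totalTokenCount() is the assumed TokenUsage accessor:

    tokenStream
            .onPartialResponse(System.out::print)
            .onCompleteResponse(completeResponse -> {
                System.out.println("\nFinish reason: " + completeResponse.finishReason());
                // Aggregate usage across all LLM calls, per the note above:
                System.out.println("Total tokens: " + completeResponse.tokenUsage().totalTokenCount());
            })
            .onError(Throwable::printStackTrace)
            .start();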
-
onError
TokenStream onError(Consumer<Throwable> errorHandler)
The provided consumer will be invoked when an error occurs during streaming.
- Parameters:
  errorHandler - lambda that will be invoked when an error occurs
- Returns:
  token stream instance used to configure or start stream processing
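A sketch that reports streaming failures instead of silently dropping them:

    tokenStream
            .onPartialResponse(System.out::print)
            .onError(error -> System.err.println("Streaming failed: " + error.getMessage()))
            .start();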
-
ignoreErrors
TokenStream ignoreErrors()
All errors during streaming will be ignored (but will be logged with a WARN log level).
- Returns:
  token stream instance used to configure or start stream processing
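A sketch using ignoreErrors() in place of onError(Consumer):

    tokenStream
            .onPartialResponse(System.out::print)
            .ignoreErrors() // failures are only logged at WARN level, not surfaced to the caller
            .start();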
-
start
void start()
Completes the current token stream building and starts processing. Will send a request to the LLM and start response streaming.
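A sketch emphasizing that registering handlers alone does nothing; the request is only sent once start() is called. That the configuration methods return the same stream instance is an assumption consistent with the Returns descriptions above:

    TokenStream tokenStream = assistant.chat("Hi"); // no request sent yet
    tokenStream
            .onPartialResponse(System.out::print)
            .onError(Throwable::printStackTrace);
    tokenStream.start(); // now the LLM request is sent and streaming begins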
-