Class AiServiceTokenStream
All Implemented Interfaces:
TokenStream
Constructor Summary
AiServiceTokenStream(AiServiceTokenStreamParameters parameters)
    Creates a new instance of AiServiceTokenStream with the given parameters.
Method Summary
TokenStream beforeToolExecution(Consumer<BeforeToolExecution> beforeToolExecutionHandler)
    The provided consumer will be invoked right before a tool is executed.
TokenStream ignoreErrors()
    All errors during streaming will be ignored (but will be logged with a WARN log level).
TokenStream onCompleteResponse(Consumer<ChatResponse> completionHandler)
    The provided consumer will be invoked when a language model finishes streaming the final chat response, as opposed to the intermediate response (see TokenStream.onIntermediateResponse(Consumer)).
TokenStream onError(Consumer<Throwable> errorHandler)
    The provided consumer will be invoked when an error occurs during streaming.
TokenStream onIntermediateResponse(Consumer<ChatResponse> intermediateResponseHandler)
    The provided consumer will be invoked when a language model finishes streaming the intermediate chat response, as opposed to the final response (see TokenStream.onCompleteResponse(Consumer)).
TokenStream onPartialResponse(Consumer<String> partialResponseHandler)
    The provided consumer will be invoked every time a new partial textual response (usually a single token) from a language model is available.
TokenStream onPartialResponseWithContext(BiConsumer<PartialResponse, PartialResponseContext> handler)
    The provided consumer will be invoked every time a new partial textual response (usually a single token) from a language model is available.
TokenStream onPartialThinking(Consumer<PartialThinking> partialThinkingHandler)
    The provided consumer will be invoked every time a new partial thinking/reasoning text (usually a single token) from a language model is available.
TokenStream onPartialThinkingWithContext(BiConsumer<PartialThinking, PartialThinkingContext> handler)
    The provided consumer will be invoked every time a new partial thinking/reasoning text (usually a single token) from a language model is available.
TokenStream onRetrieved(Consumer<List<Content>> contentsHandler)
    The provided consumer will be invoked if any Contents are retrieved using RetrievalAugmentor.
TokenStream onToolExecuted(Consumer<ToolExecution> toolExecutionHandler)
    The provided consumer will be invoked right after a tool is executed.
void start()
    Completes the current token stream building and starts processing.
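Each configuration method above returns the stream itself, so handlers are typically chained and the stream is then started. A minimal usage sketch, assuming assistant is an AI service method that returns a TokenStream (the service and model setup are elided):

    TokenStream stream = assistant.chat("Tell me about token streaming"); // assistant: assumed AI service

    stream
            .onPartialResponse(System.out::print)        // print tokens as they arrive
            .onCompleteResponse(response ->              // final ChatResponse, after any tool calls
                    System.out.println("\n" + response.tokenUsage()))
            .onError(Throwable::printStackTrace)         // handle streaming failures
            .start();                                    // nothing happens until start() is called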
Constructor Details
AiServiceTokenStream
public AiServiceTokenStream(AiServiceTokenStreamParameters parameters)
Creates a new instance of AiServiceTokenStream with the given parameters.
Parameters:
    parameters - the parameters for creating the token stream
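Instances of this class are not usually constructed by hand; in LangChain4j they typically back the TokenStream returned by an AI service method. A hedged sketch of how such a stream is normally obtained (Assistant is a hypothetical service interface, and the streamingChatModel builder method is assumed from recent LangChain4j releases and may differ by version):

    interface Assistant {
        TokenStream chat(String userMessage);
    }

    // Build the AI service; streamingModel is an assumed, pre-configured streaming model.
    Assistant assistant = AiServices.builder(Assistant.class)
            .streamingChatModel(streamingModel)
            .build();

    TokenStream stream = assistant.chat("Hello"); // typically backed by AiServiceTokenStream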
Method Details
onPartialResponse
public TokenStream onPartialResponse(Consumer<String> partialResponseHandler)
Description copied from interface: TokenStream
The provided consumer will be invoked every time a new partial textual response (usually a single token) from a language model is available. Either this or the TokenStream.onPartialResponseWithContext(BiConsumer) callback can be used if you want to consume tokens as soon as they become available.
Specified by:
    onPartialResponse in interface TokenStream
Parameters:
    partialResponseHandler - lambda that will be invoked when a model generates a new partial textual response
Returns:
    token stream instance used to configure or start stream processing
See Also:
    TokenStream.onPartialResponseWithContext(BiConsumer)
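A minimal handler sketch: printing each chunk as it arrives. System.out is used purely for illustration.

    stream.onPartialResponse(partialResponse -> {
            // Each callback delivers the next chunk of model output, often a single token.
            System.out.print(partialResponse);
            System.out.flush(); // make each token visible immediately
    });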
onPartialResponseWithContext
public TokenStream onPartialResponseWithContext(BiConsumer<PartialResponse, PartialResponseContext> handler)
Description copied from interface: TokenStream
The provided consumer will be invoked every time a new partial textual response (usually a single token) from a language model is available. Either this or the TokenStream.onPartialResponse(Consumer) callback can be used if you want to consume tokens as soon as they become available.
Specified by:
    onPartialResponseWithContext in interface TokenStream
Parameters:
    handler - lambda that will be invoked when a model generates a new partial textual response
Returns:
    token stream instance used to configure or start stream processing
See Also:
    TokenStream.onPartialResponse(Consumer)
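The BiConsumer variant also receives a per-chunk context object. A sketch; the text() accessor on PartialResponse is an assumption, not confirmed by this page:

    stream.onPartialResponseWithContext((partialResponse, context) -> {
            // partialResponse carries the chunk; context carries additional per-chunk metadata.
            System.out.print(partialResponse.text()); // text() is assumed here
    });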
onPartialThinking
public TokenStream onPartialThinking(Consumer<PartialThinking> partialThinkingHandler)
Description copied from interface: TokenStream
The provided consumer will be invoked every time a new partial thinking/reasoning text (usually a single token) from a language model is available. Either this or the TokenStream.onPartialThinkingWithContext(BiConsumer) callback can be used if you want to consume thinking tokens as soon as they become available.
Specified by:
    onPartialThinking in interface TokenStream
Parameters:
    partialThinkingHandler - lambda that will be invoked when a model generates a new partial thinking/reasoning text
Returns:
    token stream instance used to configure or start stream processing
See Also:
    TokenStream.onPartialThinkingWithContext(BiConsumer)
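For reasoning-capable models, thinking tokens can be surfaced separately from the answer text. A sketch; the text() accessor on PartialThinking is an assumption:

    stream
            .onPartialThinking(thinking ->
                    System.out.print("[thinking] " + thinking.text())) // text() assumed
            .onPartialResponse(System.out::print)
            .start();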
onPartialThinkingWithContext
public TokenStream onPartialThinkingWithContext(BiConsumer<PartialThinking, PartialThinkingContext> handler)
Description copied from interface: TokenStream
The provided consumer will be invoked every time a new partial thinking/reasoning text (usually a single token) from a language model is available. Either this or the TokenStream.onPartialThinking(Consumer) callback can be used if you want to consume thinking tokens as soon as they become available.
Specified by:
    onPartialThinkingWithContext in interface TokenStream
Parameters:
    handler - lambda that will be invoked when a model generates a new partial thinking/reasoning text
Returns:
    token stream instance used to configure or start stream processing
See Also:
    TokenStream.onPartialThinking(Consumer)
onRetrieved
public TokenStream onRetrieved(Consumer<List<Content>> contentsHandler)
Description copied from interface: TokenStream
The provided consumer will be invoked if any Contents are retrieved using RetrievalAugmentor. The invocation happens before any call is made to the language model.
Specified by:
    onRetrieved in interface TokenStream
Parameters:
    contentsHandler - lambda that consumes all retrieved contents
Returns:
    token stream instance used to configure or start stream processing
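This is useful for surfacing RAG sources before the model answers. A sketch; Content.textSegment() is the accessor in LangChain4j's RAG API, but treat it as an assumption here:

    stream.onRetrieved(contents -> {
            // Invoked once, before the LLM call, with everything the RetrievalAugmentor found.
            System.out.println("Retrieved " + contents.size() + " content(s)");
            contents.forEach(content -> System.out.println(content.textSegment())); // textSegment() assumed
    });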
onIntermediateResponse
public TokenStream onIntermediateResponse(Consumer<ChatResponse> intermediateResponseHandler)
Description copied from interface: TokenStream
The provided consumer will be invoked when a language model finishes streaming an intermediate chat response, as opposed to the final response (see TokenStream.onCompleteResponse(Consumer)). Intermediate chat responses contain ToolExecutionRequests; the AI service will execute them after returning from this consumer.
Specified by:
    onIntermediateResponse in interface TokenStream
Parameters:
    intermediateResponseHandler - lambda that consumes intermediate chat responses
Returns:
    token stream instance used to configure or start stream processing
See Also:
    TokenStream.onCompleteResponse(Consumer)
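An intermediate response is the point where the model has asked for tools rather than produced a final answer. A sketch; aiMessage() and toolExecutionRequests() are the standard LangChain4j ChatResponse/AiMessage accessors, assumed here:

    stream.onIntermediateResponse(response -> {
            // Fires once per LLM round that requests tools; the service runs the tools afterwards.
            response.aiMessage().toolExecutionRequests()
                    .forEach(request -> System.out.println("Tool requested: " + request.name()));
    });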
beforeToolExecution
public TokenStream beforeToolExecution(Consumer<BeforeToolExecution> beforeToolExecutionHandler)
Description copied from interface: TokenStream
The provided consumer will be invoked right before a tool is executed.
Specified by:
    beforeToolExecution in interface TokenStream
Parameters:
    beforeToolExecutionHandler - lambda that consumes BeforeToolExecution
Returns:
    token stream instance used to configure or start stream processing
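Handy for logging or driving a progress UI. A sketch; the request() accessor on BeforeToolExecution is an assumption:

    stream.beforeToolExecution(before -> {
            // Runs just before each tool invocation.
            System.out.println("About to execute: " + before.request().name()); // request() assumed
    });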
onToolExecuted
public TokenStream onToolExecuted(Consumer<ToolExecution> toolExecutionHandler)
Description copied from interface: TokenStream
The provided consumer will be invoked right after a tool is executed. The invocation happens after the tool method has finished and before any other tool is executed.
Specified by:
    onToolExecuted in interface TokenStream
Parameters:
    toolExecutionHandler - lambda that consumes ToolExecution
Returns:
    token stream instance used to configure or start stream processing
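A sketch pairing each request with its result; request() and result() are the ToolExecution accessors in LangChain4j, assumed here:

    stream.onToolExecuted(execution -> {
            // Runs after each tool returns, before the next tool (if any) starts.
            System.out.println(execution.request().name() + " -> " + execution.result()); // accessors assumed
    });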
onCompleteResponse
public TokenStream onCompleteResponse(Consumer<ChatResponse> completionHandler)
Description copied from interface: TokenStream
The provided consumer will be invoked when a language model finishes streaming the final chat response, as opposed to an intermediate response (see TokenStream.onIntermediateResponse(Consumer)). Please note that ChatResponse.tokenUsage() contains aggregate token usage across all calls to the LLM: it is the sum of the ChatResponse.tokenUsage() values of all intermediate responses (TokenStream.onIntermediateResponse(Consumer)).
Specified by:
    onCompleteResponse in interface TokenStream
Parameters:
    completionHandler - lambda that will be invoked when the language model finishes streaming
Returns:
    token stream instance used to configure or start stream processing
See Also:
    TokenStream.onIntermediateResponse(Consumer)
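Because the stream is asynchronous, callers often bridge completion back to a synchronous flow. A sketch using java.util.concurrent.CompletableFuture; tokenUsage() and totalTokenCount() are standard LangChain4j accessors, assumed here:

    CompletableFuture<ChatResponse> future = new CompletableFuture<>();

    stream
            .onPartialResponse(System.out::print)
            .onCompleteResponse(future::complete)
            .onError(future::completeExceptionally)
            .start();

    ChatResponse response = future.join(); // blocks until streaming finishes
    System.out.println("Total tokens: " + response.tokenUsage().totalTokenCount()); // aggregate across all LLM calls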
onError
public TokenStream onError(Consumer<Throwable> errorHandler)
Description copied from interface: TokenStream
The provided consumer will be invoked when an error occurs during streaming.
Specified by:
    onError in interface TokenStream
Parameters:
    errorHandler - lambda that will be invoked when an error occurs
Returns:
    token stream instance used to configure or start stream processing
ignoreErrors
public TokenStream ignoreErrors()
Description copied from interface: TokenStream
All errors during streaming will be ignored (but will be logged with a WARN log level).
Specified by:
    ignoreErrors in interface TokenStream
Returns:
    token stream instance used to configure or start stream processing
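onError and ignoreErrors are the two ways to dispose of streaming failures: handle them explicitly, or deliberately drop them. A sketch of each:

    // Option 1: handle failures explicitly.
    stream.onError(error -> System.err.println("Streaming failed: " + error.getMessage()));

    // Option 2: deliberately ignore failures (they are still logged at WARN level).
    stream.ignoreErrors();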
start
public void start()
Description copied from interface: TokenStream
Completes the current token stream building and starts processing. Will send a request to the LLM and start response streaming.
Specified by:
    start in interface TokenStream