Class AiServiceTokenStream
- All Implemented Interfaces:
TokenStream
Constructor Summary
- AiServiceTokenStream(AiServiceTokenStreamParameters parameters): Creates a new instance of AiServiceTokenStream with the given parameters.
Method Summary
- TokenStream beforeToolExecution(Consumer<BeforeToolExecution> beforeToolExecutionHandler): The provided consumer will be invoked right before a tool is executed.
- TokenStream ignoreErrors(): All errors during streaming will be ignored (but will be logged with a WARN log level).
- TokenStream onCompleteResponse(Consumer<ChatResponse> completionHandler): The provided consumer will be invoked when a language model finishes streaming the final chat response, as opposed to the intermediate response (see TokenStream.onIntermediateResponse(Consumer)).
- TokenStream onError(Consumer<Throwable> errorHandler): The provided consumer will be invoked when an error occurs during streaming.
- TokenStream onIntermediateResponse(Consumer<ChatResponse> intermediateResponseHandler): The provided consumer will be invoked when a language model finishes streaming the intermediate chat response, as opposed to the final response (see TokenStream.onCompleteResponse(Consumer)).
- TokenStream onPartialResponse(Consumer<String> partialResponseHandler): The provided consumer will be invoked every time a new partial textual response (usually a single token) from a language model is available.
- TokenStream onPartialResponseWithContext(BiConsumer<PartialResponse, PartialResponseContext> handler): The provided consumer will be invoked every time a new partial textual response (usually a single token) from a language model is available.
- TokenStream onPartialThinking(Consumer<PartialThinking> partialThinkingHandler): The provided consumer will be invoked every time a new partial thinking/reasoning text (usually a single token) from a language model is available.
- TokenStream onPartialThinkingWithContext(BiConsumer<PartialThinking, PartialThinkingContext> handler): The provided consumer will be invoked every time a new partial thinking/reasoning text (usually a single token) from a language model is available.
- TokenStream onPartialToolCall(Consumer<PartialToolCall> partialToolCallHandler): The provided consumer will be invoked every time a new partial tool call (usually containing a single token of the tool's arguments) from a language model is available.
- TokenStream onPartialToolCallWithContext(BiConsumer<PartialToolCall, PartialToolCallContext> handler): The provided consumer will be invoked every time a new partial tool call (usually containing a single token of the tool's arguments) from a language model is available.
- TokenStream onRetrieved(Consumer<List<Content>> contentsHandler): The provided consumer will be invoked if any Contents are retrieved using RetrievalAugmentor.
- TokenStream onToolExecuted(Consumer<ToolExecution> toolExecutionHandler): The provided consumer will be invoked right after a tool is executed.
- TokenStream onUnmappedRawEvent(Consumer<Object> rawEventHandler): The provided consumer will be invoked when a provider emits a raw streaming event that is not already exposed through one of the typed callbacks (such as TokenStream.onPartialResponse(Consumer), TokenStream.onPartialThinking(Consumer) or TokenStream.onToolExecuted(Consumer)).
- void start(): Completes the current token stream building and starts processing.
Constructor Details

AiServiceTokenStream
AiServiceTokenStream(AiServiceTokenStreamParameters parameters)
Creates a new instance of AiServiceTokenStream with the given parameters.
Parameters: parameters - the parameters for creating the token stream

Method Details

onPartialResponse
public TokenStream onPartialResponse(Consumer<String> partialResponseHandler)
Description copied from interface: TokenStream
The provided consumer will be invoked every time a new partial textual response (usually a single token) from a language model is available. Either this or the TokenStream.onPartialResponseWithContext(BiConsumer) callback can be used if you want to consume tokens as soon as they become available.
Specified by: onPartialResponse in interface TokenStream
Parameters: partialResponseHandler - lambda that will be invoked when a model generates a new partial textual response
Returns: token stream instance used to configure or start stream processing
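The following is a minimal sketch of the callback style described above. Handlers are registered fluently, with each call returning the stream, and nothing happens until start() is called. DemoTokenStream is a hypothetical stand-in that replays canned tokens and passes the joined text to its completion handler as a String. The real AiServiceTokenStream streams from a language model and passes a ChatResponse to onCompleteResponse.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for a token stream: replays canned tokens instead of calling a model.
class DemoTokenStream {
    private final List<String> tokens;
    private Consumer<String> partialHandler = token -> { };
    private Consumer<String> completeHandler = text -> { };

    DemoTokenStream(List<String> tokens) {
        this.tokens = tokens;
    }

    // Registration methods return this, mirroring the fluent TokenStream style.
    DemoTokenStream onPartialResponse(Consumer<String> handler) {
        this.partialHandler = handler;
        return this;
    }

    DemoTokenStream onCompleteResponse(Consumer<String> handler) {
        this.completeHandler = handler;
        return this;
    }

    // Nothing is emitted until start() is called.
    void start() {
        StringBuilder full = new StringBuilder();
        for (String token : tokens) {
            partialHandler.accept(token); // one callback per partial response
            full.append(token);
        }
        completeHandler.accept(full.toString()); // once, after the last token
    }
}

class PartialResponseDemo {
    // Returns the partial responses in the order the handler received them.
    static List<String> collect(List<String> tokens) {
        List<String> seen = new ArrayList<>();
        new DemoTokenStream(tokens)
                .onPartialResponse(seen::add)
                .onCompleteResponse(text -> System.out.println("complete: " + text))
                .start();
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(collect(List.of("Hel", "lo", ", world")));
    }
}
```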

onPartialResponseWithContext
public TokenStream onPartialResponseWithContext(BiConsumer<PartialResponse, PartialResponseContext> handler)
Description copied from interface: TokenStream
The provided consumer will be invoked every time a new partial textual response (usually a single token) from a language model is available. Either this or the TokenStream.onPartialResponse(Consumer) callback can be used if you want to consume tokens as soon as they become available.
Specified by: onPartialResponseWithContext in interface TokenStream
Parameters: handler - lambda that will be invoked when a model generates a new partial textual response
Returns: token stream instance used to configure or start stream processing
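The two partial-response callbacks are described as alternatives ("Either this or ..."). This sketch assumes the strictest reading of that wording and rejects a configuration that registers both callbacks or neither. EitherOrTokenStream and DemoPartialContext are hypothetical. The token index carried by the context is invented for illustration, and the real PartialResponseContext may expose different data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

// Hypothetical context type: the real PartialResponseContext may expose different data.
record DemoPartialContext(int tokenIndex) { }

// Hypothetical stand-in that accepts exactly one of the two partial-response callbacks.
class EitherOrTokenStream {
    private final List<String> tokens;
    private Consumer<String> plain;
    private BiConsumer<String, DemoPartialContext> withContext;

    EitherOrTokenStream(List<String> tokens) {
        this.tokens = tokens;
    }

    EitherOrTokenStream onPartialResponse(Consumer<String> handler) {
        plain = handler;
        return this;
    }

    EitherOrTokenStream onPartialResponseWithContext(BiConsumer<String, DemoPartialContext> handler) {
        withContext = handler;
        return this;
    }

    void start() {
        // Assumption: registering both callbacks, or neither, is a configuration error.
        if ((plain == null) == (withContext == null)) {
            throw new IllegalStateException("register exactly one partial-response callback");
        }
        for (int i = 0; i < tokens.size(); i++) {
            if (plain != null) {
                plain.accept(tokens.get(i));
            } else {
                withContext.accept(tokens.get(i), new DemoPartialContext(i));
            }
        }
    }
}

class WithContextDemo {
    // Builds "index:token" strings from the context-aware callback.
    static List<String> indexed(List<String> tokens) {
        List<String> out = new ArrayList<>();
        new EitherOrTokenStream(tokens)
                .onPartialResponseWithContext((token, ctx) -> out.add(ctx.tokenIndex() + ":" + token))
                .start();
        return out;
    }

    // True if start() rejects a stream that has both callbacks registered.
    static boolean rejectsBoth() {
        try {
            new EitherOrTokenStream(List.of("x"))
                    .onPartialResponse(t -> { })
                    .onPartialResponseWithContext((t, c) -> { })
                    .start();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }
}
```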

onPartialThinking
public TokenStream onPartialThinking(Consumer<PartialThinking> partialThinkingHandler)
Description copied from interface: TokenStream
The provided consumer will be invoked every time a new partial thinking/reasoning text (usually a single token) from a language model is available. Either this or the TokenStream.onPartialThinkingWithContext(BiConsumer) callback can be used if you want to consume thinking tokens as soon as they become available.
Specified by: onPartialThinking in interface TokenStream
Parameters: partialThinkingHandler - lambda that will be invoked when a model generates a new partial thinking/reasoning text
Returns: token stream instance used to configure or start stream processing

onPartialThinkingWithContext
public TokenStream onPartialThinkingWithContext(BiConsumer<PartialThinking, PartialThinkingContext> handler)
Description copied from interface: TokenStream
The provided consumer will be invoked every time a new partial thinking/reasoning text (usually a single token) from a language model is available. Either this or the TokenStream.onPartialThinking(Consumer) callback can be used if you want to consume thinking tokens as soon as they become available.
Specified by: onPartialThinkingWithContext in interface TokenStream
Parameters: handler - lambda that will be invoked when a model generates a new partial thinking/reasoning text
Returns: token stream instance used to configure or start stream processing
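Thinking tokens and answer tokens arrive through separate callbacks, so a caller can keep them in separate buffers. In this sketch, ThinkingTokenStream and DemoChunk are hypothetical stand-ins. The thinking callback also receives a plain String here rather than LangChain4j's PartialThinking.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical event emitted by the stand-in stream: either reasoning text or answer text.
record DemoChunk(boolean thinking, String text) { }

// Hypothetical stand-in: routes thinking chunks and answer chunks to separate callbacks.
class ThinkingTokenStream {
    private final List<DemoChunk> chunks;
    private Consumer<String> onThinking = t -> { };
    private Consumer<String> onText = t -> { };

    ThinkingTokenStream(List<DemoChunk> chunks) {
        this.chunks = chunks;
    }

    ThinkingTokenStream onPartialThinking(Consumer<String> handler) {
        onThinking = handler;
        return this;
    }

    ThinkingTokenStream onPartialResponse(Consumer<String> handler) {
        onText = handler;
        return this;
    }

    void start() {
        for (DemoChunk c : chunks) {
            // Each chunk goes to exactly one of the two callbacks.
            (c.thinking() ? onThinking : onText).accept(c.text());
        }
    }
}

class ThinkingDemo {
    // Returns {reasoning, answer}, each assembled from its own callback.
    static String[] split(List<DemoChunk> chunks) {
        StringBuilder reasoning = new StringBuilder();
        StringBuilder answer = new StringBuilder();
        new ThinkingTokenStream(chunks)
                .onPartialThinking(reasoning::append)
                .onPartialResponse(answer::append)
                .start();
        return new String[] { reasoning.toString(), answer.toString() };
    }
}
```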

onPartialToolCall
public TokenStream onPartialToolCall(Consumer<PartialToolCall> partialToolCallHandler)
Description copied from interface: TokenStream
The provided consumer will be invoked every time a new partial tool call (usually containing a single token of the tool's arguments) from a language model is available. Either this or the TokenStream.onPartialToolCallWithContext(BiConsumer) callback can be used if you want to consume partial tool calls as soon as they become available.
Specified by: onPartialToolCall in interface TokenStream
Parameters: partialToolCallHandler - lambda that will be invoked when a model generates a new partial tool call
Returns: token stream instance used to configure or start stream processing

onPartialToolCallWithContext
public TokenStream onPartialToolCallWithContext(BiConsumer<PartialToolCall, PartialToolCallContext> handler)
Description copied from interface: TokenStream
The provided consumer will be invoked every time a new partial tool call (usually containing a single token of the tool's arguments) from a language model is available. Either this or the TokenStream.onPartialToolCall(Consumer) callback can be used if you want to consume partial tool calls as soon as they become available.
Specified by: onPartialToolCallWithContext in interface TokenStream
Parameters: handler - lambda that will be invoked when a model generates a new partial tool call
Returns: token stream instance used to configure or start stream processing
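A partial tool call usually carries only one token of the arguments. The full arguments therefore have to be rebuilt by concatenating the fragments that belong to the same call, and they are typically not valid JSON until the last fragment arrives. The sketch assumes each fragment carries an index that identifies its tool call. DemoPartialToolCall is a hypothetical stand-in whose fields may not match the real PartialToolCall class. The assembler implements Consumer, so it could be passed wherever a partial-tool-call callback is expected.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Consumer;

// Hypothetical shape of a partial tool call: which call it belongs to, the tool name,
// and a fragment of the JSON arguments. The real PartialToolCall class may differ.
record DemoPartialToolCall(int index, String name, String partialArguments) { }

// Concatenates argument fragments per tool-call index, in arrival order.
class ToolCallAssembler implements Consumer<DemoPartialToolCall> {
    private final Map<Integer, StringBuilder> buffers = new TreeMap<>();

    @Override
    public void accept(DemoPartialToolCall fragment) {
        buffers.computeIfAbsent(fragment.index(), i -> new StringBuilder())
               .append(fragment.partialArguments());
    }

    // Arguments received so far for the given call (empty if none arrived yet).
    String arguments(int index) {
        return buffers.getOrDefault(index, new StringBuilder()).toString();
    }
}
```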

onRetrieved
public TokenStream onRetrieved(Consumer<List<Content>> contentsHandler)
Description copied from interface: TokenStream
The provided consumer will be invoked if any Contents are retrieved using the RetrievalAugmentor. The invocation happens before any call is made to the language model.
Specified by: onRetrieved in interface TokenStream
Parameters: contentsHandler - lambda that consumes all retrieved contents
Returns: token stream instance used to configure or start stream processing
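The ordering guarantee above, where the retrieval callback fires before the model is called, can be sketched with a hypothetical RagTokenStream. In this sketch a plain Function stands in for RetrievalAugmentor and retrieved contents are modeled as Strings. The sketch also reads "if any Contents are retrieved" as "only when the result is non-empty", which is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical stand-in for a RAG-enabled stream: retrieval runs first, then the model call.
class RagTokenStream {
    private final String query;
    private final Function<String, List<String>> retriever; // stands in for RetrievalAugmentor
    private final List<String> log;
    private Consumer<List<String>> onRetrieved = contents -> { };

    RagTokenStream(String query, Function<String, List<String>> retriever, List<String> log) {
        this.query = query;
        this.retriever = retriever;
        this.log = log;
    }

    RagTokenStream onRetrieved(Consumer<List<String>> handler) {
        onRetrieved = handler;
        return this;
    }

    void start() {
        List<String> contents = retriever.apply(query);
        if (!contents.isEmpty()) {
            onRetrieved.accept(contents); // fires before the model is contacted
        }
        log.add("model called"); // placeholder for the actual LLM request
    }
}

class RetrievedDemo {
    // Returns the order in which retrieval and the model call happened.
    static List<String> run(List<String> docs) {
        List<String> log = new ArrayList<>();
        new RagTokenStream("question", q -> docs, log)
                .onRetrieved(contents -> log.add("retrieved " + contents.size()))
                .start();
        return log;
    }
}
```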

onIntermediateResponse
public TokenStream onIntermediateResponse(Consumer<ChatResponse> intermediateResponseHandler)
Description copied from interface: TokenStream
The provided consumer will be invoked when a language model finishes streaming the intermediate chat response, as opposed to the final response (see TokenStream.onCompleteResponse(Consumer)). Intermediate chat responses contain ToolExecutionRequests; the AI service executes them after this consumer returns.
Specified by: onIntermediateResponse in interface TokenStream
Parameters: intermediateResponseHandler - lambda that consumes intermediate chat responses
Returns: token stream instance used to configure or start stream processing
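The callback sequence described for intermediate responses and tool execution can be sketched as follows. The intermediate handler returns first. Each requested tool then runs between its beforeToolExecution and onToolExecuted callbacks, one tool at a time. In this single-round simplification, the final response follows after all tools have run. ToolLoopStream is a hypothetical stand-in that models requests as tool names and results as Strings, instead of ToolExecutionRequest and ToolExecution objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical stand-in for one round of the AI-service tool loop. The intermediate
// response is modeled as the list of requested tool names.
class ToolLoopStream {
    private final List<String> requestedTools;
    private final Function<String, String> tools; // runs a tool by name, returns its result
    private Consumer<List<String>> onIntermediate = r -> { };
    private Consumer<String> beforeTool = t -> { };
    private Consumer<String> afterTool = t -> { };
    private Consumer<String> onComplete = r -> { };

    ToolLoopStream(List<String> requestedTools, Function<String, String> tools) {
        this.requestedTools = requestedTools;
        this.tools = tools;
    }

    ToolLoopStream onIntermediateResponse(Consumer<List<String>> h) { onIntermediate = h; return this; }
    ToolLoopStream beforeToolExecution(Consumer<String> h) { beforeTool = h; return this; }
    ToolLoopStream onToolExecuted(Consumer<String> h) { afterTool = h; return this; }
    ToolLoopStream onCompleteResponse(Consumer<String> h) { onComplete = h; return this; }

    void start() {
        onIntermediate.accept(requestedTools); // tools run only after this handler returns
        List<String> results = new ArrayList<>();
        for (String tool : requestedTools) {   // one tool at a time
            beforeTool.accept(tool);
            String result = tools.apply(tool);
            afterTool.accept(tool + "=" + result);
            results.add(result);
        }
        onComplete.accept("answer from " + results); // simplification: a single tool round
    }
}

class ToolLoopDemo {
    // Records every callback in the order it fires.
    static List<String> trace() {
        List<String> log = new ArrayList<>();
        new ToolLoopStream(List.of("a", "b"), name -> name.toUpperCase())
                .onIntermediateResponse(r -> log.add("intermediate " + r))
                .beforeToolExecution(t -> log.add("before " + t))
                .onToolExecuted(t -> log.add("executed " + t))
                .onCompleteResponse(r -> log.add("complete " + r))
                .start();
        return log;
    }
}
```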

beforeToolExecution
public TokenStream beforeToolExecution(Consumer<BeforeToolExecution> beforeToolExecutionHandler)
Description copied from interface: TokenStream
The provided consumer will be invoked right before a tool is executed.
Specified by: beforeToolExecution in interface TokenStream
Parameters: beforeToolExecutionHandler - lambda that consumes BeforeToolExecution
Returns: token stream instance used to configure or start stream processing

onUnmappedRawEvent
public TokenStream onUnmappedRawEvent(Consumer<Object> rawEventHandler)
Description copied from interface: TokenStream
The provided consumer will be invoked when a provider emits a raw streaming event that is not already exposed through one of the typed callbacks (such as TokenStream.onPartialResponse(Consumer), TokenStream.onPartialThinking(Consumer) or TokenStream.onToolExecuted(Consumer)). This acts as an escape hatch for provider-specific events that langchain4j does not model, such as server-tool lifecycle events (e.g., OpenAI's web_search_call.in_progress). Events that are already delivered as partial responses, thinking, or tool calls are not repeated here.
The event type depends on the provider implementation. Implementations using the dev.langchain4j.http.client.HttpClient abstraction (e.g., OpenAI, Anthropic, Google AI Gemini) typically expose ServerSentEvent; other implementations can expose provider-specific event objects (e.g., the OpenAI official Responses model exposes the SDK's ResponseStreamEvent).
Specified by: onUnmappedRawEvent in interface TokenStream
Parameters: rawEventHandler - lambda that consumes raw provider streaming events
Returns: token stream instance used to configure or start stream processing
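The raw event type is provider-specific, so a handler should check an event's runtime type before using it. In this sketch, DemoSse is a hypothetical stand-in for ServerSentEvent with assumed event() and data() accessors. The filter keeps only events whose name starts with "web_search_call." and ignores everything else. Any event names other than web_search_call.in_progress are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for a server-sent event; the real ServerSentEvent class
// in langchain4j's HTTP client module may have a different shape.
record DemoSse(String event, String data) { }

// Collects the status suffix of web-search lifecycle events from raw provider events.
class RawEventFilter implements Consumer<Object> {
    private static final String PREFIX = "web_search_call.";
    final List<String> webSearchStatuses = new ArrayList<>();

    @Override
    public void accept(Object rawEvent) {
        // The payload type is provider-specific, so check it before using it.
        if (rawEvent instanceof DemoSse sse && sse.event().startsWith(PREFIX)) {
            webSearchStatuses.add(sse.event().substring(PREFIX.length()));
        }
        // Events of any other type or name are ignored rather than failing the stream.
    }
}
```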

onToolExecuted
public TokenStream onToolExecuted(Consumer<ToolExecution> toolExecutionHandler)
Description copied from interface: TokenStream
The provided consumer will be invoked right after a tool is executed. The invocation happens after the tool method has finished and before any other tool is executed.
Specified by: onToolExecuted in interface TokenStream
Parameters: toolExecutionHandler - lambda that consumes ToolExecution
Returns: token stream instance used to configure or start stream processing

onCompleteResponse
public TokenStream onCompleteResponse(Consumer<ChatResponse> completionHandler)
Description copied from interface: TokenStream
The provided consumer will be invoked when a language model finishes streaming the final chat response, as opposed to the intermediate response (see TokenStream.onIntermediateResponse(Consumer)). Note that ChatResponse.tokenUsage() contains the aggregate token usage across all calls to the LLM: it is the sum of the ChatResponse.tokenUsage() values of all intermediate responses (TokenStream.onIntermediateResponse(Consumer)).
Specified by: onCompleteResponse in interface TokenStream
Parameters: completionHandler - lambda that will be invoked when the language model finishes streaming
Returns: token stream instance used to configure or start stream processing
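The final ChatResponse.tokenUsage() already aggregates usage across all LLM calls. Adding the intermediate usages on top of it would therefore count them twice. This sketch shows the aggregation itself using a hypothetical DemoUsage record, which is not LangChain4j's TokenUsage. It also assumes that "all calls" includes the call that produced the final response.

```java
import java.util.List;

// Hypothetical stand-in for per-call token usage; not langchain4j's TokenUsage class.
record DemoUsage(int inputTokens, int outputTokens) {
    DemoUsage plus(DemoUsage other) {
        return new DemoUsage(inputTokens + other.inputTokens, outputTokens + other.outputTokens);
    }

    int total() {
        return inputTokens + outputTokens;
    }
}

class UsageAggregation {
    // Sums the usage reported by every LLM call made while answering one request.
    static DemoUsage aggregate(List<DemoUsage> perCall) {
        DemoUsage sum = new DemoUsage(0, 0);
        for (DemoUsage u : perCall) {
            sum = sum.plus(u);
        }
        return sum;
    }
}
```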

onError
public TokenStream onError(Consumer<Throwable> errorHandler)
Description copied from interface: TokenStream
The provided consumer will be invoked when an error occurs during streaming.
Specified by: onError in interface TokenStream
Parameters: errorHandler - lambda that will be invoked when an error occurs
Returns: token stream instance used to configure or start stream processing

ignoreErrors
public TokenStream ignoreErrors()
Description copied from interface: TokenStream
All errors during streaming will be ignored (but will be logged with a WARN log level).
Specified by: ignoreErrors in interface TokenStream
Returns: token stream instance used to configure or start stream processing
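The two error-handling modes can be contrasted with a hypothetical ErrorModeStream. With onError, the Throwable is handed to the caller. With ignoreErrors, the stream installs a handler that only logs the error. The sketch uses java.util.logging's Level.WARNING as the counterpart of the WARN level mentioned above; the logging framework LangChain4j actually uses may differ.

```java
import java.util.function.Consumer;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in contrasting onError with ignoreErrors.
class ErrorModeStream {
    private static final Logger LOG = Logger.getLogger(ErrorModeStream.class.getName());
    private final RuntimeException failure; // simulated streaming failure, or null for success
    private Consumer<Throwable> errorHandler;

    ErrorModeStream(RuntimeException failure) {
        this.failure = failure;
    }

    // The caller decides what to do with the error.
    ErrorModeStream onError(Consumer<Throwable> handler) {
        errorHandler = handler;
        return this;
    }

    // Errors are swallowed but still logged at warning level.
    ErrorModeStream ignoreErrors() {
        errorHandler = error -> LOG.log(Level.WARNING, "Ignoring streaming error", error);
        return this;
    }

    void start() {
        if (failure == null) {
            return; // nothing went wrong in this simulation
        }
        if (errorHandler == null) {
            throw failure; // no mode chosen: surface the error to the caller
        }
        errorHandler.accept(failure);
    }
}

class ErrorDemo {
    // Returns the message seen by an onError handler.
    static String captured(String message) {
        StringBuilder seen = new StringBuilder();
        new ErrorModeStream(new RuntimeException(message))
                .onError(error -> seen.append(error.getMessage()))
                .start();
        return seen.toString();
    }
}
```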

start
public void start()
Description copied from interface: TokenStream
Completes the configuration of the token stream and starts processing: sends a request to the LLM and starts streaming the response.
Specified by: start in interface TokenStream
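The request is only sent when start() is called. A common way to wait for the result is to complete a CompletableFuture from the completion and error callbacks. LazyTokenStream is a hypothetical stand-in that runs its "stream" on a separate thread and fails when it has no tokens. With the real API, the same bridging would register onCompleteResponse and onError before calling start().

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

// Hypothetical stand-in: callbacks are only stored until start() is called.
class LazyTokenStream {
    private final List<String> tokens;
    private Consumer<String> onComplete = text -> { };
    private Consumer<Throwable> onError = error -> { };
    boolean requestSent = false;

    LazyTokenStream(List<String> tokens) {
        this.tokens = tokens;
    }

    LazyTokenStream onCompleteResponse(Consumer<String> handler) {
        onComplete = handler;
        return this;
    }

    LazyTokenStream onError(Consumer<Throwable> handler) {
        onError = handler;
        return this;
    }

    void start() {
        requestSent = true; // only now would a real stream contact the model
        // Deliver the result on another thread to imitate asynchronous streaming.
        new Thread(() -> {
            if (tokens.isEmpty()) {
                onError.accept(new IllegalStateException("empty response")); // simulated failure
            } else {
                onComplete.accept(String.join("", tokens));
            }
        }).start();
    }
}

class StartDemo {
    // Bridges the callback API to a future the caller can wait on.
    static CompletableFuture<String> ask(LazyTokenStream stream) {
        CompletableFuture<String> future = new CompletableFuture<>();
        stream.onCompleteResponse(future::complete)
              .onError(future::completeExceptionally)
              .start();
        return future;
    }
}
```

Calling join() on the returned future blocks until one of the two callbacks has fired.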