Observability
Chat Model Observability
Certain implementations of `ChatLanguageModel` and `StreamingChatLanguageModel` (see the "Observability" column) allow configuring one or more `ChatModelListener`s to listen for events such as:
- Requests to the LLM
- Response from the LLM
- Errors
These events include various attributes, as described in the OpenTelemetry Generative AI Semantic Conventions, such as:
- Request:
  - Messages
  - Model
  - Temperature
  - Top P
  - Max Tokens
  - Tools
  - Response Format
  - etc.
- Response:
  - Assistant Message
  - ID
  - Model
  - Token Usage
  - Finish Reason
  - etc.
Here is an example of using `ChatModelListener`:
```java
ChatModelListener listener = new ChatModelListener() {

    @Override
    public void onRequest(ChatModelRequestContext requestContext) {
        ChatRequest chatRequest = requestContext.chatRequest();

        List<ChatMessage> messages = chatRequest.messages();
        System.out.println(messages);

        ChatRequestParameters parameters = chatRequest.parameters();
        System.out.println(parameters.modelName());
        System.out.println(parameters.temperature());
        System.out.println(parameters.topP());
        System.out.println(parameters.topK());
        System.out.println(parameters.frequencyPenalty());
        System.out.println(parameters.presencePenalty());
        System.out.println(parameters.maxOutputTokens());
        System.out.println(parameters.stopSequences());
        System.out.println(parameters.toolSpecifications());
        System.out.println(parameters.toolChoice());
        System.out.println(parameters.responseFormat());

        if (parameters instanceof OpenAiChatRequestParameters openAiParameters) {
            System.out.println(openAiParameters.maxCompletionTokens());
            System.out.println(openAiParameters.logitBias());
            System.out.println(openAiParameters.parallelToolCalls());
            System.out.println(openAiParameters.seed());
            System.out.println(openAiParameters.user());
            System.out.println(openAiParameters.store());
            System.out.println(openAiParameters.metadata());
            System.out.println(openAiParameters.serviceTier());
            System.out.println(openAiParameters.reasoningEffort());
        }

        Map<Object, Object> attributes = requestContext.attributes();
        attributes.put("my-attribute", "my-value");
    }

    @Override
    public void onResponse(ChatModelResponseContext responseContext) {
        ChatResponse chatResponse = responseContext.chatResponse();

        AiMessage aiMessage = chatResponse.aiMessage();
        System.out.println(aiMessage);

        ChatResponseMetadata metadata = chatResponse.metadata();
        System.out.println(metadata.id());
        System.out.println(metadata.modelName());
        System.out.println(metadata.finishReason());

        if (metadata instanceof OpenAiChatResponseMetadata openAiMetadata) {
            System.out.println(openAiMetadata.created());
            System.out.println(openAiMetadata.serviceTier());
            System.out.println(openAiMetadata.systemFingerprint());
        }

        TokenUsage tokenUsage = metadata.tokenUsage();
        System.out.println(tokenUsage.inputTokenCount());
        System.out.println(tokenUsage.outputTokenCount());
        System.out.println(tokenUsage.totalTokenCount());
        if (tokenUsage instanceof OpenAiTokenUsage openAiTokenUsage) {
            System.out.println(openAiTokenUsage.inputTokensDetails().cachedTokens());
            System.out.println(openAiTokenUsage.outputTokensDetails().reasoningTokens());
        }

        ChatRequest chatRequest = responseContext.chatRequest();
        System.out.println(chatRequest);

        Map<Object, Object> attributes = responseContext.attributes();
        System.out.println(attributes.get("my-attribute"));
    }

    @Override
    public void onError(ChatModelErrorContext errorContext) {
        Throwable error = errorContext.error();
        error.printStackTrace();

        ChatRequest chatRequest = errorContext.chatRequest();
        System.out.println(chatRequest);

        Map<Object, Object> attributes = errorContext.attributes();
        System.out.println(attributes.get("my-attribute"));
    }
};

ChatLanguageModel model = OpenAiChatModel.builder()
        .apiKey(System.getenv("OPENAI_API_KEY"))
        .modelName(GPT_4_O_MINI)
        .listeners(List.of(listener))
        .build();

model.chat("Tell me a joke about Java");
```
The `attributes` map allows passing information between the `onRequest`, `onResponse`, and `onError` methods of the same `ChatModelListener`, as well as between multiple `ChatModelListener`s.
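For instance, the `attributes` map can carry a timestamp from `onRequest` to `onResponse` to measure request latency. The sketch below illustrates the pattern using a plain `Map<Object, Object>` standing in for the shared attributes map; the method names mirror the listener callbacks, but the LangChain4j context classes are omitted for brevity:

```java
import java.util.HashMap;
import java.util.Map;

public class ListenerAttributesDemo {

    // Called where ChatModelListener.onRequest() would run:
    // records the start time in the shared attributes map.
    static void onRequest(Map<Object, Object> attributes) {
        attributes.put("request-start-nanos", System.nanoTime());
    }

    // Called where ChatModelListener.onResponse() would run:
    // reads the start time back and computes the elapsed duration.
    static void onResponse(Map<Object, Object> attributes) {
        long start = (long) attributes.get("request-start-nanos");
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("LLM call took " + elapsedMillis + " ms");
    }

    public static void main(String[] args) throws InterruptedException {
        Map<Object, Object> attributes = new HashMap<>();
        onRequest(attributes);
        Thread.sleep(50); // simulate the LLM call
        onResponse(attributes);
    }
}
```

Because the same `attributes` map instance is passed to all callbacks of a request, this pattern also works for propagating data to `onError`, and (as noted below) across threads in the streaming case.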
How listeners work
- Listeners are specified as a `List<ChatModelListener>` and are called in the order of iteration.
- Listeners are called synchronously and on the same thread (see more details about the streaming case below). The second listener is not called until the first one returns.
- The `ChatModelListener.onRequest()` method is called right before calling the LLM provider API.
- The `ChatModelListener.onRequest()` method is called only once per request. If an error occurs while calling the LLM provider API and a retry happens, `ChatModelListener.onRequest()` will not be called for every retry.
- The `ChatModelListener.onResponse()` method is called only once, immediately after receiving a successful response from the LLM provider.
- The `ChatModelListener.onError()` method is called only once. If an error occurs while calling the LLM provider API and a retry happens, `ChatModelListener.onError()` will not be called for every retry.
- If an exception is thrown from one of the `ChatModelListener` methods, it will be logged at the `WARN` level. The execution of subsequent listeners will continue as usual.
- The `ChatRequest` provided via `ChatModelRequestContext`, `ChatModelResponseContext`, and `ChatModelErrorContext` is the final request, containing both the default `ChatRequestParameters` configured on the `ChatLanguageModel` and the request-specific `ChatRequestParameters` merged together.
- For `StreamingChatLanguageModel`, `ChatModelListener.onResponse()` and `ChatModelListener.onError()` are called on a different thread than `ChatModelListener.onRequest()`. The thread context is currently not propagated automatically, so you might want to use the `attributes` map to propagate any necessary data from `ChatModelListener.onRequest()` to `ChatModelListener.onResponse()` or `ChatModelListener.onError()`.
- For `StreamingChatLanguageModel`, `ChatModelListener.onResponse()` is called before `StreamingChatResponseHandler.onCompleteResponse()`, and `ChatModelListener.onError()` is called before `StreamingChatResponseHandler.onError()`.
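The ordering and error-isolation rules above can be sketched with a minimal dispatch loop (this is an illustration of the described semantics, not the actual LangChain4j implementation): each listener runs in iteration order, and an exception from one listener is logged rather than propagated, so the remaining listeners still run.

```java
import java.util.List;
import java.util.function.Consumer;

public class ListenerDispatchSketch {

    // Invokes each listener in iteration order. A failing listener is
    // logged (LangChain4j logs at WARN level) and does not prevent the
    // remaining listeners from being notified.
    static <T> void notifyListeners(List<Consumer<T>> listeners, T event) {
        for (Consumer<T> listener : listeners) {
            try {
                listener.accept(event);
            } catch (Exception e) {
                System.out.println("WARN: listener failed: " + e.getMessage());
            }
        }
    }

    public static void main(String[] args) {
        List<Consumer<String>> listeners = List.of(
                event -> System.out.println("listener 1 saw: " + event),
                event -> { throw new RuntimeException("boom"); },
                event -> System.out.println("listener 3 saw: " + event)
        );
        notifyListeners(listeners, "onRequest");
    }
}
```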
Observability in a Spring Boot Application
See more details here.