Observability
AI Service Observability
AI Service observability is an experimental feature. Its API and behavior may change in future versions.
AI Service observability mechanisms allow users to track what happens during an AiService invocation. A single invocation may involve multiple LLM invocations, any of which may succeed or fail. AI Service observability allows users to track the full sequence of invocations and their outcomes.
The AI Service observability capabilities are only available when using AI Services. They are a higher-level construct that cannot be applied to a ChatModel or StreamingChatModel.
This feature was originally implemented in the Quarkus LangChain4j extension and was backported here.
Types of events
Each type of event has a unique identifier, which can be used to correlate events across multiple invocations.
Each type of event includes information encapsulated inside an InvocationContext.
The following types of events are currently available:
Event Name | Description |
---|---|
AiServiceStartedEvent | Invoked when an LLM invocation has started. |
AiServiceResponseReceivedEvent | Invoked with a response from an LLM. Note that this can be invoked multiple times during a single AiService invocation when tools or guardrails exist. Contains information such as the system message and the user message. Not every invocation will receive this event. If an invocation fails, it will receive an AiServiceErrorEvent instead. |
AiServiceErrorEvent | Fired when an invocation with an LLM fails. The failure could be caused by a network failure, the AiService being unavailable, input/output guardrails blocking the request, or many other reasons. Contains information about the failure that occurred. |
AiServiceCompletedEvent | Invoked when an LLM invocation has completed successfully. Not every invocation will receive this event. If an invocation fails, it will receive an AiServiceErrorEvent instead. Contains information about the result of the invocation. |
ToolExecutedEvent | Invoked when a tool invocation has completed. Note that this can be invoked multiple times within a single LLM invocation. Contains information about the tool request and result. |
InputGuardrailExecutedEvent | Invoked when an input guardrail validation has been executed. One of these events is fired for each invocation of a guardrail. Contains information about the input to an individual input guardrail as well as its output (i.e., whether it succeeded or failed). |
OutputGuardrailExecutedEvent | Invoked when an output guardrail validation has been executed. One of these events is fired for each invocation of a guardrail. Contains information about the input to an individual output guardrail as well as its output (i.e., whether it succeeded, failed, or triggered a retry or reprompt). |
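Because every event carries an InvocationContext whose invocation ID is shared by all events of one invocation, a listener can group events and reconstruct what happened. The sketch below uses hypothetical, simplified stand-ins (`EventType`, `ObservedEvent`, `InvocationSummary`, `InvocationCorrelator`) rather than the real LangChain4j event classes. It only illustrates the rules from the table: response and tool events may occur several times, and an invocation ends in either a completed event or an error event.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical, simplified stand-ins for the event types in the table (not the real API).
enum EventType { STARTED, RESPONSE_RECEIVED, TOOL_EXECUTED, ERROR, COMPLETED }

record ObservedEvent(UUID invocationId, EventType type) {}

// Summary of one invocation, rebuilt from its events.
record InvocationSummary(UUID invocationId, int llmResponses, int toolCalls, String outcome) {}

class InvocationCorrelator {

    // Groups events by invocation ID (keeping first-seen order) and summarizes each group.
    static List<InvocationSummary> summarize(List<ObservedEvent> events) {
        Map<UUID, List<ObservedEvent>> byInvocation = new LinkedHashMap<>();
        for (ObservedEvent e : events) {
            byInvocation.computeIfAbsent(e.invocationId(), id -> new ArrayList<>()).add(e);
        }
        List<InvocationSummary> summaries = new ArrayList<>();
        for (Map.Entry<UUID, List<ObservedEvent>> entry : byInvocation.entrySet()) {
            int responses = 0;
            int tools = 0;
            String outcome = "IN_PROGRESS"; // no terminal event seen yet
            for (ObservedEvent e : entry.getValue()) {
                switch (e.type()) {
                    case RESPONSE_RECEIVED -> responses++;
                    case TOOL_EXECUTED -> tools++;
                    case COMPLETED -> outcome = "COMPLETED";
                    case ERROR -> outcome = "FAILED";
                    default -> { }
                }
            }
            summaries.add(new InvocationSummary(entry.getKey(), responses, tools, outcome));
        }
        return summaries;
    }
}
```

In a real listener, the ID would come from `event.invocationContext().invocationId()`, as in the listener example further down this page.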
Listening for an event
Each event type has its own listener interface that you can implement to receive that event, so you can pick and choose which events to listen for.
To listen for an event, create a class that implements the corresponding listener interface. For example, AiServiceCompletedListener (in the dev.langchain4j.observability.api.listener package) receives AiServiceCompletedEvent.
Once you've defined your listener(s), register them when you create your AI Services. There are various registerListener method variants on the AiServices class.
For example, you could do the following to create and register a listener for an AiServiceCompletedEvent:
```java
import java.time.Instant;
import java.util.List;
import java.util.Optional;
import java.util.UUID;

import dev.langchain4j.invocation.InvocationContext;
import dev.langchain4j.observability.api.event.AiServiceCompletedEvent;
import dev.langchain4j.observability.api.listener.AiServiceCompletedListener;

public class MyAiServiceCompletedListener implements AiServiceCompletedListener {

    @Override
    public void onEvent(AiServiceCompletedEvent event) {
        InvocationContext invocationContext = event.invocationContext();
        Optional<Object> result = event.result();

        // The invocationId will be the same for all events related to the same AI Service invocation
        UUID invocationId = invocationContext.invocationId();
        String aiServiceInterfaceName = invocationContext.interfaceName();
        String aiServiceMethodName = invocationContext.methodName();
        List<Object> aiServiceMethodArgs = invocationContext.methodArguments();
        Object chatMemoryId = invocationContext.chatMemoryId();
        Instant eventTimestamp = invocationContext.timestamp();

        // Do something with the data
    }
}

// When creating your AI Service
MyAiServiceCompletedListener myListener = new MyAiServiceCompletedListener();
var myService = AiServices.builder(MyAiService.class)
        .chatModel(chatModel) // Could also be .streamingChatModel(...)
        .registerListener(myListener)
        .build();
```
Creating your own events and listeners
The AI Service observability capabilities are designed to be extensible. If you'd like to create your own events, you can do so by implementing the AiServiceEvent interface to define your own event.
Then, create your own event listener by implementing the AiServiceListener interface.
Once you have your event and listener, you need to fire the event by obtaining or managing an instance of AiServiceListenerRegistrar and calling its fireEvent(event) method.
Once the event is being fired, you can create and register listeners for it just as you would for the built-in events.
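The dispatch idea behind a registrar can be sketched in plain Java. Each listener declares which event class it handles, and firing an event reaches only the listeners whose declared class matches. `SimpleEvent`, `SimpleListener`, and `SimpleRegistrar` are hypothetical stand-ins for AiServiceEvent, AiServiceListener, and AiServiceListenerRegistrar and do not reproduce their actual signatures. `CacheHitEvent` is an invented custom event.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for an event marker interface such as AiServiceEvent.
interface SimpleEvent {}

// Hypothetical stand-in for a typed listener such as AiServiceListener.
interface SimpleListener<T extends SimpleEvent> {
    Class<T> eventClass();
    void onEvent(T event);
}

// A user-defined custom event (invented for this sketch).
record CacheHitEvent(String cacheKey) implements SimpleEvent {}

// Hypothetical registrar: stores listeners and dispatches each fired event by type.
class SimpleRegistrar {
    private final List<SimpleListener<?>> listeners = new ArrayList<>();

    void register(SimpleListener<?> listener) {
        listeners.add(listener);
    }

    void fireEvent(SimpleEvent event) {
        for (SimpleListener<?> listener : listeners) {
            dispatch(listener, event);
        }
    }

    // Delivers the event only if it is an instance of the listener's declared event class.
    private static <T extends SimpleEvent> void dispatch(SimpleListener<T> listener, SimpleEvent event) {
        if (listener.eventClass().isInstance(event)) {
            listener.onEvent(listener.eventClass().cast(event));
        }
    }

    // Convenience factory so a listener can be written as a lambda.
    static <T extends SimpleEvent> SimpleListener<T> listener(Class<T> type, Consumer<T> handler) {
        return new SimpleListener<>() {
            @Override public Class<T> eventClass() { return type; }
            @Override public void onEvent(T event) { handler.accept(event); }
        };
    }
}
```

Because dispatch is driven by the declared event class, a new event type needs no changes to the registrar. The same property is what lets custom events reuse the built-in registration mechanism described above.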
Extension points
You can also create your own custom AiServiceListenerRegistrar by implementing AiServiceListenerRegistrarFactory and registering it with the Java Service Provider Interface (Java SPI).
This is useful if you want to control how listeners are registered and unregistered, how events are fired, or both.
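Java SPI discovery goes through java.util.ServiceLoader. A provider class is listed in a META-INF/services file named after the service interface's fully qualified name, and ServiceLoader instantiates that provider at runtime. The sketch below shows the lookup, with a fallback for when no provider is registered. `RegistrarFactory`, `DefaultRegistrarFactory`, and `RegistrarFactoryLookup` are hypothetical. With LangChain4j, you would instead list your implementation of AiServiceListenerRegistrarFactory in the services file for that interface.

```java
import java.util.ServiceLoader;

// Hypothetical stand-in for a factory interface such as AiServiceListenerRegistrarFactory.
interface RegistrarFactory {
    String describe();
}

// Fallback used when no provider is found on the classpath.
class DefaultRegistrarFactory implements RegistrarFactory {
    @Override
    public String describe() {
        return "default";
    }
}

class RegistrarFactoryLookup {

    // Returns the first provider discovered via Java SPI, or the default if none is registered.
    // Providers are declared in META-INF/services/<fully qualified name of RegistrarFactory>.
    static RegistrarFactory load() {
        return ServiceLoader.load(RegistrarFactory.class)
                .findFirst()
                .orElseGet(DefaultRegistrarFactory::new);
    }
}
```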
Chat Model Observability
Certain implementations of ChatModel and StreamingChatModel (see the "Observability" column) allow configuring ChatModelListener(s) to listen for events such as:
- Requests to the LLM
- Responses from the LLM
- Errors
These events include various attributes, as described in the OpenTelemetry Generative AI Semantic Conventions, such as:
- Request:
  - Messages
  - Model
  - Temperature
  - Top P
  - Max Tokens
  - Tools
  - Response Format
  - etc.
- Response:
  - Assistant Message
  - ID
  - Model
  - Token Usage
  - Finish Reason
  - etc.
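To show how these fields map onto the conventions, the sketch below flattens a simplified request/response pair into a map keyed by OpenTelemetry GenAI attribute names. The `RequestInfo`, `ResponseInfo`, and `GenAiAttributes` types are hypothetical. The attribute keys reflect the semantic conventions at the time of writing, and those conventions are still evolving, so check the current specification before relying on exact names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified views of a chat request and response (not LangChain4j types).
record RequestInfo(String model, Double temperature, Double topP, Integer maxTokens) {}

record ResponseInfo(String id, String model, String finishReason, int inputTokens, int outputTokens) {}

class GenAiAttributes {

    // Builds span attributes using OpenTelemetry GenAI semantic-convention keys.
    // Unset (null) values are skipped rather than recorded.
    static Map<String, Object> toAttributes(RequestInfo req, ResponseInfo res) {
        Map<String, Object> attrs = new LinkedHashMap<>();
        putIfPresent(attrs, "gen_ai.request.model", req.model());
        putIfPresent(attrs, "gen_ai.request.temperature", req.temperature());
        putIfPresent(attrs, "gen_ai.request.top_p", req.topP());
        putIfPresent(attrs, "gen_ai.request.max_tokens", req.maxTokens());
        putIfPresent(attrs, "gen_ai.response.id", res.id());
        putIfPresent(attrs, "gen_ai.response.model", res.model());
        // finish_reasons is defined as an array attribute in the conventions
        if (res.finishReason() != null) {
            attrs.put("gen_ai.response.finish_reasons", List.of(res.finishReason()));
        }
        attrs.put("gen_ai.usage.input_tokens", res.inputTokens());
        attrs.put("gen_ai.usage.output_tokens", res.outputTokens());
        return attrs;
    }

    private static void putIfPresent(Map<String, Object> attrs, String key, Object value) {
        if (value != null) {
            attrs.put(key, value);
        }
    }
}
```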
Here is an example of using ChatModelListener:
```java
import static dev.langchain4j.model.openai.OpenAiChatModelName.GPT_4_O_MINI;

import java.util.List;
import java.util.Map;

import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.data.message.ChatMessage;
import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.model.chat.listener.ChatModelErrorContext;
import dev.langchain4j.model.chat.listener.ChatModelListener;
import dev.langchain4j.model.chat.listener.ChatModelRequestContext;
import dev.langchain4j.model.chat.listener.ChatModelResponseContext;
import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.model.chat.request.ChatRequestParameters;
import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.model.chat.response.ChatResponseMetadata;
import dev.langchain4j.model.openai.OpenAiChatModel;
import dev.langchain4j.model.openai.OpenAiChatRequestParameters;
import dev.langchain4j.model.openai.OpenAiChatResponseMetadata;
import dev.langchain4j.model.openai.OpenAiTokenUsage;
import dev.langchain4j.model.output.TokenUsage;

ChatModelListener listener = new ChatModelListener() {

    @Override
    public void onRequest(ChatModelRequestContext requestContext) {
        ChatRequest chatRequest = requestContext.chatRequest();

        List<ChatMessage> messages = chatRequest.messages();
        System.out.println(messages);

        ChatRequestParameters parameters = chatRequest.parameters();
        System.out.println(parameters.modelName());
        System.out.println(parameters.temperature());
        System.out.println(parameters.topP());
        System.out.println(parameters.topK());
        System.out.println(parameters.frequencyPenalty());
        System.out.println(parameters.presencePenalty());
        System.out.println(parameters.maxOutputTokens());
        System.out.println(parameters.stopSequences());
        System.out.println(parameters.toolSpecifications());
        System.out.println(parameters.toolChoice());
        System.out.println(parameters.responseFormat());

        // Provider-specific parameters are available by checking the concrete type
        if (parameters instanceof OpenAiChatRequestParameters openAiParameters) {
            System.out.println(openAiParameters.maxCompletionTokens());
            System.out.println(openAiParameters.logitBias());
            System.out.println(openAiParameters.parallelToolCalls());
            System.out.println(openAiParameters.seed());
            System.out.println(openAiParameters.user());
            System.out.println(openAiParameters.store());
            System.out.println(openAiParameters.metadata());
            System.out.println(openAiParameters.serviceTier());
            System.out.println(openAiParameters.reasoningEffort());
        }

        System.out.println(requestContext.modelProvider());

        // Data stored here can be read later in onResponse() or onError()
        Map<Object, Object> attributes = requestContext.attributes();
        attributes.put("my-attribute", "my-value");
    }

    @Override
    public void onResponse(ChatModelResponseContext responseContext) {
        ChatResponse chatResponse = responseContext.chatResponse();

        AiMessage aiMessage = chatResponse.aiMessage();
        System.out.println(aiMessage);

        ChatResponseMetadata metadata = chatResponse.metadata();
        System.out.println(metadata.id());
        System.out.println(metadata.modelName());
        System.out.println(metadata.finishReason());

        if (metadata instanceof OpenAiChatResponseMetadata openAiMetadata) {
            System.out.println(openAiMetadata.created());
            System.out.println(openAiMetadata.serviceTier());
            System.out.println(openAiMetadata.systemFingerprint());
        }

        TokenUsage tokenUsage = metadata.tokenUsage();
        System.out.println(tokenUsage.inputTokenCount());
        System.out.println(tokenUsage.outputTokenCount());
        System.out.println(tokenUsage.totalTokenCount());
        if (tokenUsage instanceof OpenAiTokenUsage openAiTokenUsage) {
            System.out.println(openAiTokenUsage.inputTokensDetails().cachedTokens());
            System.out.println(openAiTokenUsage.outputTokensDetails().reasoningTokens());
        }

        ChatRequest chatRequest = responseContext.chatRequest();
        System.out.println(chatRequest);

        System.out.println(responseContext.modelProvider());

        // Read the value stored in onRequest()
        Map<Object, Object> attributes = responseContext.attributes();
        System.out.println(attributes.get("my-attribute"));
    }

    @Override
    public void onError(ChatModelErrorContext errorContext) {
        Throwable error = errorContext.error();
        error.printStackTrace();

        ChatRequest chatRequest = errorContext.chatRequest();
        System.out.println(chatRequest);

        System.out.println(errorContext.modelProvider());

        Map<Object, Object> attributes = errorContext.attributes();
        System.out.println(attributes.get("my-attribute"));
    }
};

ChatModel model = OpenAiChatModel.builder()
        .apiKey(System.getenv("OPENAI_API_KEY"))
        .modelName(GPT_4_O_MINI)
        .listeners(List.of(listener))
        .build();

model.chat("Tell me a joke about Java");
```
The attributes map allows passing information between the onRequest, onResponse, and onError methods of the same ChatModelListener, as well as between multiple ChatModelListeners.
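A common use of the attributes map is measuring latency. onRequest stores a start time, and onResponse reads it back, even when the two callbacks run on different threads (the StreamingChatModel case). The sketch models this with a hypothetical `ListenerContext` that holds a shared map and a hypothetical `LatencyListener`. The real contexts expose attributes() as shown above, but everything else here is illustrative.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a per-request listener context; only the shared attributes map matters here.
record ListenerContext(Map<Object, Object> attributes) {}

class LatencyListener {
    private static final String START_KEY = "start-nanos";

    // Called before the request is sent: remember when it started.
    void onRequest(ListenerContext ctx) {
        ctx.attributes().put(START_KEY, System.nanoTime());
    }

    // Called when the response arrives (possibly on another thread): compute elapsed time.
    long onResponse(ListenerContext ctx) {
        long start = (Long) ctx.attributes().get(START_KEY);
        return System.nanoTime() - start;
    }

    // Simulates the streaming case: onRequest on the caller thread,
    // onResponse on an asynchronous executor thread.
    static long simulateStreamingCall() {
        LatencyListener listener = new LatencyListener();
        ListenerContext ctx = new ListenerContext(new ConcurrentHashMap<>());
        listener.onRequest(ctx);
        return CompletableFuture.supplyAsync(() -> listener.onResponse(ctx)).join();
    }
}
```

Per-request data belongs in the attributes map rather than in listener fields. One listener instance serves many concurrent requests, so a field would be overwritten.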
How listeners work
- Listeners are specified as a List<ChatModelListener> and are called in the order of iteration.
- Listeners are called synchronously and on the same thread (see the streaming case below). The second listener is not called until the first one returns.
- The ChatModelListener.onRequest() method is called right before calling the LLM provider API.
- The ChatModelListener.onRequest() method is called only once per request. If an error occurs while calling the LLM provider API and a retry happens, ChatModelListener.onRequest() will not be called for every retry.
- The ChatModelListener.onResponse() method is called only once, immediately after receiving a successful response from the LLM provider.
- The ChatModelListener.onError() method is called only once. If an error occurs while calling the LLM provider API and a retry happens, ChatModelListener.onError() will not be called for every retry.
- If an exception is thrown from one of the ChatModelListener methods, it will be logged at the WARN level. The execution of subsequent listeners will continue as usual.
- The ChatRequest provided via ChatModelRequestContext, ChatModelResponseContext, and ChatModelErrorContext is the final request. It contains both the default ChatRequestParameters configured on the ChatModel and the request-specific ChatRequestParameters, merged together.
- For StreamingChatModel, ChatModelListener.onResponse() and ChatModelListener.onError() are called on a different thread than ChatModelListener.onRequest(). The thread context is currently not propagated automatically, so you might want to use the attributes map to pass any necessary data from ChatModelListener.onRequest() to ChatModelListener.onResponse() or ChatModelListener.onError().
- For StreamingChatModel, ChatModelListener.onResponse() is called before StreamingChatResponseHandler.onCompleteResponse(), and ChatModelListener.onError() is called before StreamingChatResponseHandler.onError().
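The ordering and error-isolation rules above can be modeled in a few lines. This is a hypothetical sketch of the described behavior, not LangChain4j's implementation. `ListenerDispatcher` is an invented name, and the sketch uses java.util.logging at WARNING to stand in for the library's WARN-level logging.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.logging.Level;
import java.util.logging.Logger;

class ListenerDispatcher {
    private static final Logger LOG = Logger.getLogger(ListenerDispatcher.class.getName());

    // Calls each listener synchronously, in list order, on the current thread.
    // A listener that throws is logged and skipped; later listeners still run.
    static <T> void fire(List<Consumer<T>> listeners, T event) {
        for (Consumer<T> listener : listeners) {
            try {
                listener.accept(event);
            } catch (Exception e) {
                LOG.log(Level.WARNING, "Listener threw an exception", e);
            }
        }
    }
}
```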
Observability in Spring Boot Application
See more details here.
Third-party Integrations
OpenTelemetry GenAI instrumentation
The community-maintained otel-genai-bridges project ships a Spring Boot starter that auto-instruments LangChain4j chat applications using the OpenTelemetry Generative AI semantic conventions.
Why use it?
- Wraps any ChatLanguageModel bean and emits spans, events, and metrics.
- Captures prompts, completions, tool calls, latency, token usage, cost, and RAG retrieval latency out of the box.
- Provides Docker Compose samples (Collector → Tempo/Prometheus → Grafana) with prebuilt Grafana dashboards.
Getting started
Add the starter to your Spring Boot project:
```xml
<!-- pom.xml -->
<dependency>
    <groupId>com.dineshkumarkummara.otel</groupId>
    <artifactId>langchain4j-otel</artifactId>
    <version>0.1.0-SNAPSHOT</version>
</dependency>
```
Enable the starter via application.yaml:
```yaml
otel:
  langchain4j:
    enabled: true
    system: openai
    default-model: gpt-4o
    capture-prompts: true
    capture-completions: true
    cost:
      enabled: true
      input-per-thousand: 0.0005
      output-per-thousand: 0.0015
```
The nested cost stanza is optional; include it when you want cost-per-token metrics.
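With per-thousand prices like the ones in the cost stanza, a call's cost is presumably computed separately for input and output as tokens / 1000 × price. This is an assumption about how the starter interprets input-per-thousand and output-per-thousand, so check its source before relying on it. `TokenPricing` is a hypothetical type. BigDecimal keeps the arithmetic exact.

```java
import java.math.BigDecimal;

// Hypothetical per-thousand-token pricing, mirroring the cost.* keys in the YAML configuration.
record TokenPricing(BigDecimal inputPerThousand, BigDecimal outputPerThousand) {

    private static final BigDecimal THOUSAND = BigDecimal.valueOf(1000);

    // cost = inputTokens / 1000 * inputPrice + outputTokens / 1000 * outputPrice
    BigDecimal cost(long inputTokens, long outputTokens) {
        BigDecimal input = BigDecimal.valueOf(inputTokens).multiply(inputPerThousand);
        BigDecimal output = BigDecimal.valueOf(outputTokens).multiply(outputPerThousand);
        // Dividing by 1000 only shifts the decimal point, so the result is always exact.
        return input.add(output).divide(THOUSAND);
    }
}
```

With the prices from the example configuration, 2,000 input tokens and 1,000 output tokens come to 0.001 + 0.0015 = 0.0025.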
With the dependency on the classpath, the starter locates ChatLanguageModel beans automatically and wraps them with telemetry.
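Wrapping a bean with telemetry is typically done with the decorator pattern. The wrapper implements the same interface, times the delegate call, and records the outcome. The sketch uses a hypothetical single-method `TextModel` interface and a hypothetical `Recorder` callback in place of ChatLanguageModel and OpenTelemetry. It is not the starter's code.

```java
// Hypothetical minimal model interface standing in for a chat model bean.
interface TextModel {
    String chat(String prompt);
}

// Hypothetical telemetry sink; a real integration would emit OpenTelemetry spans and metrics.
interface Recorder {
    void record(String operation, long durationNanos, boolean success);
}

// Decorator: exposes the same interface as the delegate and adds timing and outcome recording.
class TelemetryTextModel implements TextModel {
    private final TextModel delegate;
    private final Recorder recorder;

    TelemetryTextModel(TextModel delegate, Recorder recorder) {
        this.delegate = delegate;
        this.recorder = recorder;
    }

    @Override
    public String chat(String prompt) {
        long start = System.nanoTime();
        boolean success = false;
        try {
            String answer = delegate.chat(prompt);
            success = true;
            return answer;
        } finally {
            // Runs for both normal returns and exceptions, so failures are recorded too.
            recorder.record("chat", System.nanoTime() - start, success);
        }
    }
}
```

Because callers depend only on the interface, the wrapped instance can replace the original bean without changes to application code.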
Observability view
For a full working example (including the observability stack and Semantic Kernel parity), see dineshkumarkummara/otel-genai-bridges.