Observability

AI Service Observability

Note: AI Service observability is an experimental feature. Its API and behavior may change in future versions.

AI Service observability lets you track what happens during an AiService invocation. A single invocation may involve multiple LLM calls (for example, when tools or guardrails are involved), any of which may succeed or fail. AI Service observability lets you follow the full sequence of calls and their outcomes.

Note: AI Service observability is available only when using AI Services. It is a higher-level construct and cannot be applied directly to a ChatModel or StreamingChatModel.

This feature was originally implemented in the Quarkus LangChain4j extension and was later backported to LangChain4j.

Types of events

Every event carries an InvocationContext with information about the AI Service invocation, including a unique invocation identifier that can be used to correlate all events belonging to the same invocation.

The following types of events are currently available:

Event Name	Description
`AiServiceStartedEvent`	Fired when an AI Service invocation starts. Contains information such as the system message and the user message.
`AiServiceResponseReceivedEvent`	Fired when a response is received from the LLM. This event can be fired multiple times during a single AI Service invocation when tools or guardrails are involved. If an LLM call fails, an `AiServiceErrorEvent` is fired instead. Contains information about the LLM response.
`AiServiceErrorEvent`	Fired when an AI Service invocation fails. The failure could be caused by a network error, the LLM provider being unavailable, an input or output guardrail blocking the request, or many other reasons. Contains information about the failure.
`AiServiceCompletedEvent`	Fired when an AI Service invocation completes successfully. Not every invocation receives this event: a failed invocation receives an `AiServiceErrorEvent` instead. Contains information about the result of the invocation.
`ToolExecutedEvent`	Fired when a tool execution completes. This event can be fired multiple times within a single AI Service invocation. Contains information about the tool request and its result.
`InputGuardrailExecutedEvent`	Fired when an input guardrail validation has been executed. One event is fired for each invocation of a guardrail. Contains the input to the individual input guardrail as well as its output (i.e., whether it succeeded or failed).
`OutputGuardrailExecutedEvent`	Fired when an output guardrail validation has been executed. One event is fired for each invocation of a guardrail. Contains the input to the individual output guardrail as well as its output (i.e., whether it succeeded, failed, or requested a retry or reprompt).
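To make the correlation idea concrete, here is a minimal, self-contained sketch that groups events by invocation ID and checks whether each invocation has finished. It does not use the LangChain4j API: `SketchEvent` is a hypothetical stand-in that models only an invocation ID and an event type name, and the event sequence in `main` is illustrative, not a guaranteed ordering.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical, simplified stand-in for an observability event:
// only the invocation ID and the event type name are modeled.
record SketchEvent(UUID invocationId, String type) {}

class EventCorrelationSketch {

    // Groups event types by invocation ID, preserving arrival order within each group.
    static Map<UUID, List<String>> groupByInvocation(List<SketchEvent> events) {
        Map<UUID, List<String>> byInvocation = new LinkedHashMap<>();
        for (SketchEvent e : events) {
            byInvocation.computeIfAbsent(e.invocationId(), id -> new ArrayList<>()).add(e.type());
        }
        return byInvocation;
    }

    // Treats an invocation as finished once it has a completed or an error event.
    static boolean isFinished(List<String> types) {
        return types.contains("AiServiceCompletedEvent") || types.contains("AiServiceErrorEvent");
    }

    public static void main(String[] args) {
        UUID first = UUID.randomUUID();
        UUID second = UUID.randomUUID();
        // Illustrative interleaving of two concurrent invocations.
        List<SketchEvent> events = List.of(
                new SketchEvent(first, "AiServiceStartedEvent"),
                new SketchEvent(second, "AiServiceStartedEvent"),
                new SketchEvent(first, "AiServiceResponseReceivedEvent"),
                new SketchEvent(first, "ToolExecutedEvent"),
                new SketchEvent(first, "AiServiceResponseReceivedEvent"),
                new SketchEvent(first, "AiServiceCompletedEvent"),
                new SketchEvent(second, "AiServiceErrorEvent"));
        groupByInvocation(events).forEach((id, types) ->
                System.out.println(id + " finished=" + isFinished(types) + " " + types));
    }
}
```

In a real listener, the key would come from `event.invocationContext().invocationId()`, as in the listener example later on this page.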

Listening for an event

Each event type has its own listener interface, so you can choose which events to listen for.

To listen for an event, create a class that implements the corresponding listener interface. The available listener interfaces are:

Listener Name	Event
`AiServiceStartedListener`	`AiServiceStartedEvent`
`AiServiceResponseReceivedListener`	`AiServiceResponseReceivedEvent`
`AiServiceErrorListener`	`AiServiceErrorEvent`
`AiServiceCompletedListener`	`AiServiceCompletedEvent`
`ToolExecutedEventListener`	`ToolExecutedEvent`
`InputGuardrailExecutedListener`	`InputGuardrailExecutedEvent`
`OutputGuardrailExecutedListener`	`OutputGuardrailExecutedEvent`

Once you've defined your listener(s), register them when you create your AI Services. There are various registerListener method variants on the AiServices class.

For example, you could do the following to create and register a listener for an AiServiceCompletedEvent:

import java.time.Instant;
import java.util.List;
import java.util.Optional;
import java.util.UUID;

import dev.langchain4j.observability.api.event.AiServiceCompletedEvent;
import dev.langchain4j.observability.api.listener.AiServiceCompletedListener;
import dev.langchain4j.invocation.InvocationContext;
import dev.langchain4j.service.AiServices;

public class MyAiServiceCompletedListener implements AiServiceCompletedListener {
    @Override
    public void onEvent(AiServiceCompletedEvent event) {
        InvocationContext invocationContext = event.invocationContext();
        Optional<Object> result = event.result();

        // The invocationId is the same for all events related to the same AI Service invocation
        UUID invocationId = invocationContext.invocationId();
        String aiServiceInterfaceName = invocationContext.interfaceName();
        String aiServiceMethodName = invocationContext.methodName();
        List<Object> aiServiceMethodArgs = invocationContext.methodArguments();
        Object chatMemoryId = invocationContext.chatMemoryId();
        Instant eventTimestamp = invocationContext.timestamp();

        // Do something with the data
    }
}

// When creating your AI Service
MyAiServiceCompletedListener myListener = new MyAiServiceCompletedListener();

var myService = AiServices.builder(MyAiService.class)
        .chatModel(chatModel)  // Could also be .streamingChatModel(...)
        .registerListener(myListener)
        .build();

Creating your own events and listeners

The AI Service observability capabilities are designed to be extensible. To define your own event, implement the AiServiceEvent interface.

Then, create a listener for it by implementing the AiServiceListener interface.

Once you have your event and listener, fire the event by obtaining (or managing) an instance of AiServiceListenerRegistrar and calling its fireEvent(event) method.

Once the event is being fired, you can create and register listeners for it just as you would for the built-in events.
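The sketch below illustrates the define-event, define-listener, fire-event pattern using hypothetical stand-in types (`CustomEvent`, `CustomListener`, `CacheHitEvent`, `MiniRegistrar`). It is not LangChain4j's `AiServiceEvent`, `AiServiceListener`, or `AiServiceListenerRegistrar`; the real interfaces may declare different methods, so check their API before implementing them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for an event marker interface.
interface CustomEvent {}

// Hypothetical stand-in for a typed listener: declares which event class it handles.
interface CustomListener<T extends CustomEvent> {
    Class<T> eventClass();
    void onEvent(T event);
}

// Hypothetical custom event carrying application-specific data.
record CacheHitEvent(String key) implements CustomEvent {}

// Minimal registrar: stores listeners and dispatches events by type.
class MiniRegistrar {
    private final List<CustomListener<?>> listeners = new CopyOnWriteArrayList<>();

    void register(CustomListener<?> listener) { listeners.add(listener); }

    void unregister(CustomListener<?> listener) { listeners.remove(listener); }

    // Delivers the event to every listener whose declared event class matches.
    void fireEvent(CustomEvent event) {
        for (CustomListener<?> listener : listeners) {
            if (listener.eventClass().isInstance(event)) {
                deliver(listener, event);
            }
        }
    }

    // Captures the listener's type parameter so the event can be cast safely.
    private static <T extends CustomEvent> void deliver(CustomListener<T> listener, CustomEvent event) {
        listener.onEvent(listener.eventClass().cast(event));
    }
}

class CustomEventDemo {
    public static void main(String[] args) {
        MiniRegistrar registrar = new MiniRegistrar();
        List<String> seen = new ArrayList<>();
        registrar.register(new CustomListener<CacheHitEvent>() {
            public Class<CacheHitEvent> eventClass() { return CacheHitEvent.class; }
            public void onEvent(CacheHitEvent event) { seen.add(event.key()); }
        });
        registrar.fireEvent(new CacheHitEvent("user-42"));
        System.out.println(seen); // prints [user-42]
    }
}
```

Dispatching on the listener's declared event class means a listener only ever receives events it can handle, which mirrors the one-listener-per-event-type design of the built-in listeners.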

Extension points

You can also create your own custom AiServiceListenerRegistrar by implementing the AiServiceListenerRegistrarFactory interface and registering your implementation via the Java Service Provider Interface (SPI).

This can be useful if you want to control how listeners are registered and unregistered, or how events are fired.
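Java SPI discovery is typically done with `java.util.ServiceLoader`. The following sketch shows the general lookup-with-fallback pattern using a hypothetical `RegistrarFactory` interface in place of `AiServiceListenerRegistrarFactory`; it does not show how LangChain4j itself performs the lookup, which may differ.

```java
import java.util.ServiceLoader;

// Hypothetical factory interface standing in for AiServiceListenerRegistrarFactory.
interface RegistrarFactory {
    String describe();
}

// Fallback used when no provider is registered on the classpath.
class DefaultRegistrarFactory implements RegistrarFactory {
    public String describe() { return "default"; }
}

class RegistrarLookup {
    // Returns the first provider discovered via META-INF/services, else the default.
    static RegistrarFactory load() {
        return ServiceLoader.load(RegistrarFactory.class)
                .findFirst()
                .orElseGet(DefaultRegistrarFactory::new);
    }

    public static void main(String[] args) {
        System.out.println(load().describe());
    }
}
```

To plug in a provider, list its fully qualified class name in a file under `META-INF/services/` named after the fully qualified name of the factory interface; the provider class needs a public no-argument constructor. With no such file on the classpath, the sketch falls back to the default.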

Chat Model Observability

Certain implementations of ChatModel and StreamingChatModel (see the "Observability" column in the list of supported models) allow configuring one or more ChatModelListeners to listen for events such as:

- Requests to the LLM
- Responses from the LLM
- Errors

These events include various attributes, as described in the OpenTelemetry Generative AI Semantic Conventions, such as:

Request:
- Messages
- Model
- Temperature
- Top P
- Max Tokens
- Tools
- Response Format
- etc.
Response:
- Assistant Message
- ID
- Model
- Token Usage
- Finish Reason
- etc.

Here is an example of using ChatModelListener:

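import java.util.List;
import java.util.Map;

import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.data.message.ChatMessage;
import dev.langchain4j.model.chat.ChatModel;
import dev.langchain4j.model.chat.listener.ChatModelErrorContext;
import dev.langchain4j.model.chat.listener.ChatModelListener;
import dev.langchain4j.model.chat.listener.ChatModelRequestContext;
import dev.langchain4j.model.chat.listener.ChatModelResponseContext;
import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.model.chat.request.ChatRequestParameters;
import dev.langchain4j.model.chat.response.ChatResponse;
import dev.langchain4j.model.chat.response.ChatResponseMetadata;
import dev.langchain4j.model.openai.OpenAiChatModel;
import dev.langchain4j.model.openai.OpenAiChatRequestParameters;
import dev.langchain4j.model.openai.OpenAiChatResponseMetadata;
import dev.langchain4j.model.openai.OpenAiTokenUsage;
import dev.langchain4j.model.output.TokenUsage;

import static dev.langchain4j.model.openai.OpenAiChatModelName.GPT_4_O_MINI;

ChatModelListener listener = new ChatModelListener() {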

    @Override
    public void onRequest(ChatModelRequestContext requestContext) {
        ChatRequest chatRequest = requestContext.chatRequest();

        List<ChatMessage> messages = chatRequest.messages();
        System.out.println(messages);

        ChatRequestParameters parameters = chatRequest.parameters();
        System.out.println(parameters.modelName());
        System.out.println(parameters.temperature());
        System.out.println(parameters.topP());
        System.out.println(parameters.topK());
        System.out.println(parameters.frequencyPenalty());
        System.out.println(parameters.presencePenalty());
        System.out.println(parameters.maxOutputTokens());
        System.out.println(parameters.stopSequences());
        System.out.println(parameters.toolSpecifications());
        System.out.println(parameters.toolChoice());
        System.out.println(parameters.responseFormat());

        if (parameters instanceof OpenAiChatRequestParameters openAiParameters) {
            System.out.println(openAiParameters.maxCompletionTokens());
            System.out.println(openAiParameters.logitBias());
            System.out.println(openAiParameters.parallelToolCalls());
            System.out.println(openAiParameters.seed());
            System.out.println(openAiParameters.user());
            System.out.println(openAiParameters.store());
            System.out.println(openAiParameters.metadata());
            System.out.println(openAiParameters.serviceTier());
            System.out.println(openAiParameters.reasoningEffort());
        }

        System.out.println(requestContext.modelProvider());

        Map<Object, Object> attributes = requestContext.attributes();
        attributes.put("my-attribute", "my-value");
    }

    @Override
    public void onResponse(ChatModelResponseContext responseContext) {
        ChatResponse chatResponse = responseContext.chatResponse();

        AiMessage aiMessage = chatResponse.aiMessage();
        System.out.println(aiMessage);

        ChatResponseMetadata metadata = chatResponse.metadata();
        System.out.println(metadata.id());
        System.out.println(metadata.modelName());
        System.out.println(metadata.finishReason());

        if (metadata instanceof OpenAiChatResponseMetadata openAiMetadata) {
            System.out.println(openAiMetadata.created());
            System.out.println(openAiMetadata.serviceTier());
            System.out.println(openAiMetadata.systemFingerprint());
        }

        TokenUsage tokenUsage = metadata.tokenUsage();
        System.out.println(tokenUsage.inputTokenCount());
        System.out.println(tokenUsage.outputTokenCount());
        System.out.println(tokenUsage.totalTokenCount());
        if (tokenUsage instanceof OpenAiTokenUsage openAiTokenUsage) {
            System.out.println(openAiTokenUsage.inputTokensDetails().cachedTokens());
            System.out.println(openAiTokenUsage.outputTokensDetails().reasoningTokens());
        }

        ChatRequest chatRequest = responseContext.chatRequest();
        System.out.println(chatRequest);

        System.out.println(responseContext.modelProvider());

        Map<Object, Object> attributes = responseContext.attributes();
        System.out.println(attributes.get("my-attribute"));
    }

    @Override
    public void onError(ChatModelErrorContext errorContext) {
        Throwable error = errorContext.error();
        error.printStackTrace();

        ChatRequest chatRequest = errorContext.chatRequest();
        System.out.println(chatRequest);

        System.out.println(errorContext.modelProvider());

        Map<Object, Object> attributes = errorContext.attributes();
        System.out.println(attributes.get("my-attribute"));
    }
};

ChatModel model = OpenAiChatModel.builder()
        .apiKey(System.getenv("OPENAI_API_KEY"))
        .modelName(GPT_4_O_MINI)
        .listeners(List.of(listener))
        .build();

model.chat("Tell me a joke about Java");

The attributes map allows passing information between the onRequest, onResponse, and onError methods of the same ChatModelListener, as well as between multiple ChatModelListeners.
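As a sketch of how the attributes map can carry state from `onRequest` to `onResponse`, the following measures latency by storing a start time. It uses a hypothetical, simplified `MiniListener` interface that receives the map directly (the real `ChatModelListener` obtains it via `attributes()` on its context objects), and it calls `onResponse` on a worker thread to mimic the streaming case, where a `ThreadLocal` set in `onRequest` would not be visible.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical, simplified listener shape that receives the shared attributes map.
interface MiniListener {
    void onRequest(Map<Object, Object> attributes);
    void onResponse(Map<Object, Object> attributes);
}

class LatencyListener implements MiniListener {
    private static final String START_KEY = "latency-start-nanos";
    volatile long lastLatencyNanos = -1;
    volatile String responseThread;

    public void onRequest(Map<Object, Object> attributes) {
        // Store the start time in the map rather than in thread-local state.
        attributes.put(START_KEY, System.nanoTime());
    }

    public void onResponse(Map<Object, Object> attributes) {
        // Works even when called on a different thread than onRequest.
        long start = (Long) attributes.get(START_KEY);
        lastLatencyNanos = System.nanoTime() - start;
        responseThread = Thread.currentThread().getName();
    }
}

class StreamingSimulation {
    // Calls onRequest on the caller thread and onResponse on a worker thread.
    static void run(MiniListener listener) {
        Map<Object, Object> attributes = new ConcurrentHashMap<>();
        listener.onRequest(attributes);
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            CompletableFuture.runAsync(() -> listener.onResponse(attributes), worker).join();
        } finally {
            worker.shutdown();
        }
    }

    public static void main(String[] args) {
        LatencyListener listener = new LatencyListener();
        run(listener);
        System.out.println("latency ns: " + listener.lastLatencyNanos
                + " (response on " + listener.responseThread + ")");
    }
}
```

The same key-in-the-map approach also lets a second listener read a value written by the first one, since all listeners share the map for a given request.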

How listeners work

- Listeners are specified as a List<ChatModelListener> and are called in iteration order.
- Listeners are called synchronously, in the same thread: the second listener is not called until the first one returns. The streaming case differs; see below.
- ChatModelListener.onRequest() is called right before the LLM provider API is called.
- ChatModelListener.onRequest() is called only once per request. If an error occurs while calling the LLM provider API and the call is retried, onRequest() is not called again for each retry.
- ChatModelListener.onResponse() is called only once, immediately after a successful response is received from the LLM provider.
- ChatModelListener.onError() is called only once. If an error occurs while calling the LLM provider API and the call is retried, onError() is not called for each retry.
- If an exception is thrown from one of the ChatModelListener methods, it is logged at the WARN level, and the remaining listeners are still called as usual.
- The ChatRequest provided via ChatModelRequestContext, ChatModelResponseContext, and ChatModelErrorContext is the final request: the default ChatRequestParameters configured on the ChatModel merged with the request-specific ChatRequestParameters.
- For StreamingChatModel, ChatModelListener.onResponse() and ChatModelListener.onError() are called on a different thread than ChatModelListener.onRequest(). The thread context is currently not propagated automatically, so you may want to use the attributes map to pass any necessary data from onRequest() to onResponse() or onError().
- For StreamingChatModel, ChatModelListener.onResponse() is called before StreamingChatResponseHandler.onCompleteResponse(), and ChatModelListener.onError() is called before StreamingChatResponseHandler.onError().
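The non-streaming rules above can be summarized in a small dispatcher sketch: listeners are called in list order, an exception from one listener is logged at warning level without stopping the others, `onRequest` fires once even when the call is retried, and exactly one of `onResponse`/`onError` fires at the end. `CallListener` and `ListenerDispatcher` are hypothetical names; this illustrates the documented behavior and is not LangChain4j's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical, simplified listener with the same three callbacks.
interface CallListener {
    default void onRequest() {}
    default void onResponse(String response) {}
    default void onError(Throwable error) {}
}

class ListenerDispatcher {
    private static final Logger LOG = Logger.getLogger(ListenerDispatcher.class.getName());
    private final List<CallListener> listeners;

    ListenerDispatcher(List<CallListener> listeners) {
        this.listeners = List.copyOf(listeners);
    }

    // Calls listeners in list order; a failing listener is logged and the rest still run.
    private void forEachListener(Consumer<CallListener> action) {
        for (CallListener listener : listeners) {
            try {
                action.accept(listener);
            } catch (RuntimeException e) {
                LOG.log(Level.WARNING, "Listener failed", e);
            }
        }
    }

    // onRequest fires once, retries are invisible to listeners,
    // then exactly one of onResponse/onError fires.
    String call(Supplier<String> provider, int maxAttempts) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        forEachListener(CallListener::onRequest);
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                String response = provider.get();
                forEachListener(l -> l.onResponse(response));
                return response;
            } catch (RuntimeException e) {
                last = e;
            }
        }
        RuntimeException failure = last;
        forEachListener(l -> l.onError(failure));
        throw failure;
    }
}

class DispatcherDemo {
    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        CallListener first = new CallListener() {
            @Override public void onRequest() {
                log.add("first.onRequest");
                throw new IllegalStateException("simulated listener bug");
            }
            @Override public void onResponse(String response) { log.add("first.onResponse"); }
        };
        CallListener second = new CallListener() {
            @Override public void onRequest() { log.add("second.onRequest"); }
            @Override public void onResponse(String response) { log.add("second.onResponse"); }
        };
        int[] calls = {0};
        String result = new ListenerDispatcher(List.of(first, second)).call(() -> {
            // Fails on the first attempt, succeeds on the second.
            if (++calls[0] == 1) throw new RuntimeException("transient");
            return "ok";
        }, 3);
        System.out.println(result + " " + log);
        // prints: ok [first.onRequest, second.onRequest, first.onResponse, second.onResponse]
    }
}
```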

Observability in Spring Boot Application

See the Spring Boot integration documentation for more details.

Third-party Integrations

Arize Phoenix

OpenTelemetry GenAI instrumentation

The community-maintained otel-genai-bridges project ships a Spring Boot starter that auto-instruments LangChain4j chat applications using the OpenTelemetry Generative AI semantic conventions.

Why use it?

- Wraps any ChatLanguageModel bean and emits spans, events, and metrics.
- Captures prompts, completions, tool calls, latency, token usage, cost, and RAG retrieval latency out of the box.
- Provides Docker Compose samples (Collector → Tempo/Prometheus → Grafana) with prebuilt Grafana dashboards.

Getting started

Add the starter to your Spring Boot project:

<!-- pom.xml -->
<dependency>
  <groupId>com.dineshkumarkummara.otel</groupId>
  <artifactId>langchain4j-otel</artifactId>
  <version>0.1.0-SNAPSHOT</version>
</dependency>

Enable the starter via application.yaml:

otel:
  langchain4j:
    enabled: true
    system: openai
    default-model: gpt-4o
    capture-prompts: true
    capture-completions: true
    cost:
      enabled: true
      input-per-thousand: 0.0005
      output-per-thousand: 0.0015

The nested cost stanza is optional; include it when you want cost-per-token metrics.
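Assuming the prices are quoted per 1,000 tokens (as the property names `input-per-thousand` and `output-per-thousand` suggest), a request with 2,000 input tokens and 500 output tokens would cost 2 × 0.0005 + 0.5 × 0.0015 = 0.00175 in whatever currency the prices are expressed in. The hypothetical helper below performs the same arithmetic; the starter's actual cost formula is not documented here and may differ.

```java
// Hypothetical helper; assumes prices are per 1,000 tokens.
class CostEstimate {
    static double cost(long inputTokens, long outputTokens,
                       double inputPerThousand, double outputPerThousand) {
        return inputTokens / 1000.0 * inputPerThousand
                + outputTokens / 1000.0 * outputPerThousand;
    }

    public static void main(String[] args) {
        // 2,000 input tokens and 500 output tokens at the prices configured above.
        System.out.printf("%.6f%n", cost(2_000, 500, 0.0005, 0.0015)); // prints 0.001750
    }
}
```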

With the dependency on the classpath, the starter locates ChatLanguageModel beans automatically and wraps them with telemetry.

Observability view

[Figure: Grafana latency panel]

For a full working example (including the observability stack and Semantic Kernel parity), see dineshkumarkummara/otel-genai-bridges.
