Cohere

note

This is the documentation for the community Cohere chat model integration. It is built on Cohere's V2 Chat API.

Maven Dependency

1.0.0-alpha1 and later:

```xml
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-community-cohere</artifactId>
    <version>${latest version here}</version>
</dependency>
```

Or, you can use the BOM to manage dependency versions consistently:

```xml
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>dev.langchain4j</groupId>
            <artifactId>langchain4j-community-bom</artifactId>
            <version>${latest version here}</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
```
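With the BOM imported, you can then declare the module itself without a version, since the BOM supplies it (a standard Maven BOM pattern, not specific to this integration):

```xml
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-community-cohere</artifactId>
</dependency>
```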

Chat Model Support

You can instantiate CohereChatModel using the following code:

```java
ChatModel model = CohereChatModel.builder()
        .apiKey(System.getenv("CO_API_KEY"))
        .modelName("command-r7b-12-2024")
        .logRequests(true)
        .logResponses(true)
        .build();
```
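Once built, the model can be used like any other LangChain4j ChatModel. A minimal call might look like this (assumes a valid CO_API_KEY is set in the environment; the prompt is illustrative):

```java
// Send a single user message and read the model's reply.
// Requires a valid CO_API_KEY environment variable.
String answer = model.chat("What is the capital of France?");
System.out.println(answer);
```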

For streamed responses, use CohereStreamingChatModel:

```java
StreamingChatModel streamingModel = CohereStreamingChatModel.builder()
        .apiKey(System.getenv("CO_API_KEY"))
        .modelName("command-r7b-12-2024")
        .logRequests(true)
        .logResponses(true)
        .build();
```
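A streaming model delivers its reply through a StreamingChatResponseHandler rather than returning it directly. A minimal sketch, assuming the streamingModel built above and a valid CO_API_KEY:

```java
// Print tokens as they arrive, then the full reply on completion.
streamingModel.chat("Tell me a joke", new StreamingChatResponseHandler() {

    @Override
    public void onPartialResponse(String partialResponse) {
        System.out.print(partialResponse); // each chunk as it streams in
    }

    @Override
    public void onCompleteResponse(ChatResponse completeResponse) {
        System.out.println("\nComplete: " + completeResponse.aiMessage().text());
    }

    @Override
    public void onError(Throwable error) {
        error.printStackTrace();
    }
});
```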

Configurable Parameters

CohereChatModel and CohereStreamingChatModel accept the following parameters:

| Property | Description | Default Value |
|---|---|---|
| baseUrl | The URL to connect to the Cohere API. | https://api.cohere.com/v2/ |
| apiKey | The API key. | |
| modelName | The model to use, e.g. command-r7b-12-2024 or command-r-plus. | |
| timeout | HTTP client timeout for requests. | |
| maxRetries | Maximum number of retries per request. Only available on CohereChatModel. | 3 |
| temperature | Sampling temperature. | |
| topP | Nucleus sampling threshold. | |
| topK | Limits sampling to the topK most likely tokens at each step. | |
| frequencyPenalty | Penalty for tokens based on how often they have already appeared. | |
| presencePenalty | Penalty for tokens that have appeared at least once. | |
| maxTokens | The maximum number of tokens to generate for this request. | |
| stopSequences | Sequences that cause the model to stop generating further text. | |
| toolSpecifications | Tool (function) definitions the model can call. | |
| toolChoice | A ToolChoice controlling how the model selects tools. Possible values: AUTO, REQUIRED. | |
| responseFormat | The response format, e.g. TEXT or JSON. | |
| thinkingType | A CohereThinkingType enabling or disabling extended thinking for reasoning-capable models. | |
| thinkingTokenBudget | Maximum number of tokens the model may spend on internal thinking. | |
| safetyMode | A CohereSafetyMode inserted into the prompt. Possible values: CONTEXTUAL, STRICT, OFF. | |
| priority | Request priority when the Cohere API is under load. | |
| seed | If set, the model samples tokens deterministically. | |
| logprobs | Whether to include token log probabilities in the response. | |
| strictTools | Whether to enforce strict adherence to tool definitions. | |
| defaultRequestParameters | Default ChatRequestParameters applied to every request. | |
| listeners | Listeners notified of requests, responses, and errors. | |
| logRequests | Whether to log requests. | false |
| logResponses | Whether to log responses. | false |
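Several of these parameters can be combined on the builder. A sketch configuring a few common ones (the specific values here are illustrative, not recommendations):

```java
import java.time.Duration;
import java.util.List;

// Sketch: combining several builder parameters from the table above.
ChatModel configuredModel = CohereChatModel.builder()
        .apiKey(System.getenv("CO_API_KEY"))
        .modelName("command-r7b-12-2024")
        .temperature(0.3)               // lower temperature for more deterministic output
        .topP(0.9)                      // nucleus sampling threshold
        .maxTokens(512)                 // cap on generated tokens
        .stopSequences(List.of("END"))  // stop generating when "END" appears
        .timeout(Duration.ofSeconds(30))
        .maxRetries(2)
        .build();
```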

Response Metadata

You can access Cohere-specific response metadata:

```java
ChatResponse response = model.chat(UserMessage.from("Hello"));
CohereChatResponseMetadata metadata = (CohereChatResponseMetadata) response.metadata();

List<CohereLogprobs> logprobs = metadata.logprobs();
CohereBilledUnits billedUnits = metadata.billedUnits();
Integer cachedTokens = metadata.cachedTokens();
```
| Property | Description |
|---|---|
| logprobs | Log probabilities for generated tokens. Returned when logprobs is enabled. |
| billedUnits | Billing breakdown for the request (input tokens, output tokens, search units, classifications). |
| cachedTokens | Number of tokens served from Cohere's prompt cache. |

Examples