Class OpenAiTokenizer

java.lang.Object
dev.langchain4j.model.openai.OpenAiTokenizer
All Implemented Interfaces:
Tokenizer

public class OpenAiTokenizer extends Object implements Tokenizer
Estimates the cost (in tokens) of a request before calling OpenAI, or while using streaming. The magic numbers in this class were found empirically during testing. Integration tests verify that the calculations here stay very close to those of OpenAI.
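
As a rough illustration of estimating cost before a call, the sketch below checks whether a prompt, plus the tokens set aside for the reply, fits inside a context window. The class name TokenBudget, the method fits, and the window sizes are hypothetical and not part of this library. The estimator is passed as a plain ToIntFunction<String> so the example runs without any dependency; a method reference such as tokenizer::estimateTokenCountInText has a matching shape and could be supplied in its place.

```java
import java.util.function.ToIntFunction;

// Hypothetical helper, not part of langchain4j.
class TokenBudget {

    /**
     * Returns true when the estimated prompt tokens plus the tokens
     * reserved for the reply do not exceed the context window.
     */
    static boolean fits(ToIntFunction<String> estimator,
                        String prompt,
                        int reservedForReply,
                        int contextWindow) {
        int promptTokens = estimator.applyAsInt(prompt);
        return promptTokens + reservedForReply <= contextWindow;
    }

    public static void main(String[] args) {
        // Stand-in estimator: one token per whitespace-separated word.
        // This is NOT how OpenAI tokenizes; it only keeps the sketch runnable.
        ToIntFunction<String> wordCount = text ->
                text.trim().isEmpty() ? 0 : text.trim().split("\\s+").length;

        String prompt = "Summarize the following paragraph in one sentence";
        System.out.println(fits(wordCount, prompt, 50, 4096));
        System.out.println(fits(wordCount, prompt, 50, 10));
    }
}
```

Because the result is an estimate, leaving some headroom below the model's real limit is probably wise.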
  • Constructor Details

    • OpenAiTokenizer

      @Deprecated(forRemoval=true) public OpenAiTokenizer()
      Deprecated, for removal: This API element is subject to removal in a future version.
      Please use other constructors and specify the model name explicitly.
      Creates an instance of the OpenAiTokenizer for the "gpt-3.5-turbo" model. It should be suitable for all current OpenAI models, as they all use the same cl100k_base encoding.
    • OpenAiTokenizer

      public OpenAiTokenizer(OpenAiChatModelName modelName)
      Creates an instance of the OpenAiTokenizer for a given OpenAiChatModelName.
    • OpenAiTokenizer

      public OpenAiTokenizer(OpenAiEmbeddingModelName modelName)
      Creates an instance of the OpenAiTokenizer for a given OpenAiEmbeddingModelName.
    • OpenAiTokenizer

      public OpenAiTokenizer(OpenAiLanguageModelName modelName)
      Creates an instance of the OpenAiTokenizer for a given OpenAiLanguageModelName.
    • OpenAiTokenizer

      public OpenAiTokenizer(String modelName)
      Creates an instance of the OpenAiTokenizer for a given model name.
  • Method Details

    • estimateTokenCountInText

      public int estimateTokenCountInText(String text)
      Description copied from interface: Tokenizer
      Estimates the count of tokens in the given text.
      Specified by:
      estimateTokenCountInText in interface Tokenizer
      Parameters:
      text - the text.
      Returns:
      the estimated count of tokens.
    • estimateTokenCountInMessage

      public int estimateTokenCountInMessage(ChatMessage message)
      Description copied from interface: Tokenizer
      Estimates the count of tokens in the given message.
      Specified by:
      estimateTokenCountInMessage in interface Tokenizer
      Parameters:
      message - the message.
      Returns:
      the estimated count of tokens.
    • estimateTokenCountInMessages

      public int estimateTokenCountInMessages(Iterable<ChatMessage> messages)
      Description copied from interface: Tokenizer
      Estimates the count of tokens in the given messages.
      Specified by:
      estimateTokenCountInMessages in interface Tokenizer
      Parameters:
      messages - the messages.
      Returns:
      the estimated count of tokens.
    • encode

      public List<Integer> encode(String text)
    • encode

      public List<Integer> encode(String text, int maxTokensToEncode)
    • decode

      public String decode(List<Integer> tokens)
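
The encode and decode methods above carry no description, so the following is only a sketch of one way they might be combined: cutting a text down to a token limit by encoding a bounded prefix and decoding it back. It assumes that encode(text, maxTokensToEncode) returns at most that many token ids taken from the start of the text, which this page does not state. TokenTruncation and ToyCodec are hypothetical names, and ToyCodec is a word-level stand-in used only so the example runs without the library. With a real instance, tokenizer::encode and tokenizer::decode have matching shapes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical helper, not part of langchain4j.
class TokenTruncation {

    /** Cuts text to at most maxTokens tokens by encoding a prefix and decoding it back. */
    static String truncate(BiFunction<String, Integer, List<Integer>> encode,
                           Function<List<Integer>, String> decode,
                           String text,
                           int maxTokens) {
        List<Integer> prefix = encode.apply(text, maxTokens);
        return decode.apply(prefix);
    }

    /** Toy codec: one token per word, ids assigned in order of first appearance. */
    static class ToyCodec {
        private final List<String> vocabulary = new ArrayList<>();

        List<Integer> encode(String text, int maxTokensToEncode) {
            List<Integer> ids = new ArrayList<>();
            for (String word : text.trim().split("\\s+")) {
                // Stop once the limit is reached, or when the input was blank.
                if (ids.size() >= maxTokensToEncode || word.isEmpty()) {
                    break;
                }
                int id = vocabulary.indexOf(word);
                if (id < 0) {
                    vocabulary.add(word);
                    id = vocabulary.size() - 1;
                }
                ids.add(id);
            }
            return ids;
        }

        String decode(List<Integer> tokens) {
            StringBuilder text = new StringBuilder();
            for (int id : tokens) {
                if (text.length() > 0) {
                    text.append(' ');
                }
                text.append(vocabulary.get(id));
            }
            return text.toString();
        }
    }

    public static void main(String[] args) {
        ToyCodec codec = new ToyCodec();
        String shortened = truncate(codec::encode, codec::decode,
                "the quick brown fox jumps", 3);
        System.out.println(shortened);
    }
}
```

With a real subword encoding the cut can fall in the middle of a word, so the decoded prefix may end less cleanly than it does with this toy codec.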