Class EmbeddingModelTextClassifier<L>

java.lang.Object
dev.langchain4j.classification.EmbeddingModelTextClassifier<L>
Type Parameters:
L - The type of the label (e.g., String, Enum, etc.)
All Implemented Interfaces:
TextClassifier<L>

public class EmbeddingModelTextClassifier<L> extends Object implements TextClassifier<L>
A TextClassifier that uses an EmbeddingModel and predefined examples to perform classification. Classification is done by comparing the embedding of the text being classified with the embeddings of predefined examples. The classification quality improves with a greater number of examples for each label. Examples can be easily generated with the help of an LLM.

Example:


 enum Sentiment {
     POSITIVE, NEUTRAL, NEGATIVE
 }

 Map<Sentiment, List<String>> examples = Map.of(
     Sentiment.POSITIVE, List.of("This is great!", "Wow, awesome!"),
     Sentiment.NEUTRAL,  List.of("Well, it's fine", "It's ok"),
     Sentiment.NEGATIVE, List.of("It is pretty bad", "Worst experience ever!")
 );

 EmbeddingModel embeddingModel = new AllMiniLmL6V2QuantizedEmbeddingModel();

 TextClassifier<Sentiment> classifier = new EmbeddingModelTextClassifier<>(embeddingModel, examples);

 List<Sentiment> sentiments = classifier.classify("Awesome!");
 System.out.println(sentiments); // [POSITIVE]
 
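The comparison step described in the class summary can be pictured with a minimal, library-free sketch. It assumes cosine similarity as the comparison measure and uses hand-written 2-D vectors in place of real embeddings; the class and method names (NearestExampleSketch, cosine, nearestLabel) are hypothetical and are not part of this API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NearestExampleSketch {

    // Cosine similarity of two vectors of equal length (assumed measure).
    public static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the label that owns the example vector most similar to the query.
    public static String nearestLabel(Map<String, List<double[]>> examplesByLabel, double[] query) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, List<double[]>> entry : examplesByLabel.entrySet()) {
            for (double[] example : entry.getValue()) {
                double score = cosine(example, query);
                if (score > bestScore) {
                    bestScore = score;
                    best = entry.getKey();
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy vectors standing in for the embeddings of the predefined examples.
        Map<String, List<double[]>> examples = new LinkedHashMap<>();
        examples.put("POSITIVE", List.of(new double[]{1.0, 0.1}, new double[]{0.9, 0.2}));
        examples.put("NEGATIVE", List.of(new double[]{-1.0, 0.1}, new double[]{-0.8, 0.3}));

        // Toy vector standing in for the embedding of the text being classified.
        double[] query = {0.95, 0.15};
        System.out.println(nearestLabel(examples, query)); // POSITIVE
    }
}
```

This sketch picks a single label from the single closest example. The classifier itself combines per-label scores and can return several labels, as the constructor parameters describe.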
  • Constructor Details

    • EmbeddingModelTextClassifier

      public EmbeddingModelTextClassifier(EmbeddingModel embeddingModel, Map<L,? extends Collection<String>> examplesByLabel)
      Creates a classifier with the default values for maxResults (1), minScore (0) and meanToMaxScoreRatio (0.5).
      Parameters:
      embeddingModel - The embedding model used for embedding both the examples and the text to be classified.
      examplesByLabel - A map containing examples of texts for each label. The more examples, the better. Examples can be easily generated with the help of an LLM.
    • EmbeddingModelTextClassifier

      public EmbeddingModelTextClassifier(EmbeddingModel embeddingModel, Map<L,? extends Collection<String>> examplesByLabel, int maxResults, double minScore, double meanToMaxScoreRatio)
      Creates a classifier.
      Parameters:
      embeddingModel - The embedding model used for embedding both the examples and the text to be classified.
      examplesByLabel - A map containing examples of texts for each label. The more examples, the better. Examples can be easily generated with the help of an LLM.
      maxResults - The maximum number of labels to return for each classification.
      minScore - The minimum similarity score required for classification, in the range [0..1]. Labels scoring lower than this value will be discarded.
      meanToMaxScoreRatio - A ratio, in the range [0..1], between the mean and max scores used for calculating the final score. During classification, the embeddings of examples for each label are compared to the embedding of the text being classified. This results in two metrics: the mean and max scores. The mean score is the average similarity score for all examples associated with a given label. The max score is the highest similarity score, corresponding to the example most similar to the text being classified. A value of 0 means that only the mean score will be used for ranking labels. A value of 0.5 means that both scores will contribute equally to the final score. A value of 1 means that only the max score will be used for ranking labels.
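How maxResults, minScore and meanToMaxScoreRatio interact may be easier to see in a small standalone sketch. It takes precomputed similarity scores per label and applies a linear blend that matches the parameter description above (0 uses only the mean, 1 uses only the max). The linear formula is an assumption derived from that description, not the library's confirmed internal code, and ScoreAggregationSketch, blend and rank are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ScoreAggregationSketch {

    // Assumed blend: ratio 0 -> mean only, 0.5 -> equal weight, 1 -> max only.
    public static double blend(double mean, double max, double meanToMaxScoreRatio) {
        return (1 - meanToMaxScoreRatio) * mean + meanToMaxScoreRatio * max;
    }

    // Scores each label, drops labels below minScore, and keeps the top maxResults.
    public static List<Map.Entry<String, Double>> rank(Map<String, List<Double>> similaritiesByLabel,
                                                       int maxResults,
                                                       double minScore,
                                                       double meanToMaxScoreRatio) {
        List<Map.Entry<String, Double>> ranked = new ArrayList<>();
        for (Map.Entry<String, List<Double>> e : similaritiesByLabel.entrySet()) {
            double mean = e.getValue().stream().mapToDouble(Double::doubleValue).average().orElse(0);
            double max = e.getValue().stream().mapToDouble(Double::doubleValue).max().orElse(0);
            double score = blend(mean, max, meanToMaxScoreRatio);
            if (score >= minScore) { // labels scoring lower than minScore are discarded
                ranked.add(Map.entry(e.getKey(), score));
            }
        }
        // Highest final score first.
        ranked.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        return ranked.subList(0, Math.min(maxResults, ranked.size()));
    }

    public static void main(String[] args) {
        // Made-up similarity scores between one text and the examples of each label.
        Map<String, List<Double>> similarities = Map.of(
                "POSITIVE", List.of(0.9, 0.7),
                "NEUTRAL", List.of(0.6, 0.5),
                "NEGATIVE", List.of(0.3, 0.1));

        // Prints POSITIVE, then NEUTRAL; NEGATIVE falls below minScore.
        for (Map.Entry<String, Double> entry : rank(similarities, 2, 0.5, 0.5)) {
            System.out.println(entry.getKey());
        }
    }
}
```

With the default maxResults of 1, only the first entry of such a ranking would be returned. A higher ratio favors labels that have one very close example, while a lower ratio favors labels whose examples are close on average.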
  • Method Details

    • classifyWithScores

      public ClassificationResult<L> classifyWithScores(String text)
      Description copied from interface: TextClassifier
      Classifies the given text and returns labels with scores.
      Specified by:
      classifyWithScores in interface TextClassifier<L>
      Parameters:
      text - Text to classify.
      Returns:
      a result object containing a list of labels with corresponding scores. Can contain zero, one, or multiple labels.