In-process (ONNX)

LangChain4j provides a few popular local embedding models packaged as Maven dependencies. They are powered by the ONNX Runtime and run in the same Java process as your application.

Each model is provided in two flavours: original and quantized (the latter has a -q suffix in the Maven artifact name and Quantized in the class name).

For example:

<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-embeddings-all-minilm-l6-v2</artifactId>
    <version>0.36.2</version>
</dependency>
EmbeddingModel embeddingModel = new AllMiniLmL6V2EmbeddingModel();
Response<Embedding> response = embeddingModel.embed("test");
Embedding embedding = response.content();

Or quantized:

<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-embeddings-all-minilm-l6-v2-q</artifactId>
    <version>0.36.2</version>
</dependency>
EmbeddingModel embeddingModel = new AllMiniLmL6V2QuantizedEmbeddingModel();
Response<Embedding> response = embeddingModel.embed("test");
Embedding embedding = response.content();

The complete list of embedding models can be found here.

Parallelization

By default, the embedding process is parallelized using all available CPU cores, so each TextSegment is embedded in a separate thread.

The parallelization is done by using an Executor. By default, in-process embedding models use a cached thread pool with the number of threads equal to the number of available processors. Threads are cached for 1 second.

You can provide a custom instance of the Executor when creating a model:

Executor executor = ...;
EmbeddingModel embeddingModel = new AllMiniLmL6V2QuantizedEmbeddingModel(executor);
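As a sketch, a custom Executor that mirrors the described default (one thread per available processor, idle threads kept alive for 1 second) can be built with plain java.util.concurrent. The pool configuration below is an illustrative assumption, not the library's exact internal setup:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Pool sized to the number of available processors; idle threads
// time out after 1 second, similar to the default described above.
int threads = Runtime.getRuntime().availableProcessors();
ThreadPoolExecutor executor = new ThreadPoolExecutor(
        threads, threads,
        1, TimeUnit.SECONDS,
        new LinkedBlockingQueue<>());
executor.allowCoreThreadTimeOut(true); // let core threads die when idle
```

A bounded pool like this also lets you cap how much CPU the embedding process consumes, e.g. by choosing a thread count lower than the number of processors when the JVM does other work.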

Embedding on a GPU is not supported yet.

Custom models

Many models (e.g., from Hugging Face) can be used, as long as they are in the ONNX format.

Information on how to convert models into ONNX format can be found here.

Many models already converted to ONNX format are available here.

An example of using a custom embedding model:

<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-embeddings</artifactId>
    <version>0.36.2</version>
</dependency>
String pathToModel = "/home/langchain4j/model.onnx";
String pathToTokenizer = "/home/langchain4j/tokenizer.json";
PoolingMode poolingMode = PoolingMode.MEAN;
EmbeddingModel embeddingModel = new OnnxEmbeddingModel(pathToModel, pathToTokenizer, poolingMode);

Response<Embedding> response = embeddingModel.embed("test");
Embedding embedding = response.content();

Examples