Class LanguageModelSqlFilterBuilder

java.lang.Object
dev.langchain4j.store.embedding.filter.builder.sql.LanguageModelSqlFilterBuilder

public class LanguageModelSqlFilterBuilder extends Object
Given a natural language Query, this class creates a suitable Filter using a language model.
This approach is also known as self-querying.
It is useful for improving retrieval from an EmbeddingStore by narrowing down the search space.
For instance, if you store internal company documentation for multiple products in the same EmbeddingStore and want to search the documentation of one specific product without forcing the user to specify the Filter manually, LanguageModelSqlFilterBuilder can create the filter automatically using a language model.

First, describe the Metadata of your TextSegment as if it were an SQL table using TableDefinition:
 TableDefinition tableDefinition = TableDefinition.builder()
     .name("documentation") // table name
     .addColumn("product", "VARCHAR", "one of [iPhone, iPad, MacBook]") // column name, column type, comment
     // ... other relevant metadata keys (columns) ...
     .build();
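
To make the "as if it were an SQL table" idea concrete, here is a minimal, self-contained sketch of how such a description could be rendered as a CREATE TABLE statement for a text-to-SQL prompt. It does not use the library: TableDefinitionSketch, Column and toCreateTable are hypothetical names, and the exact text the library sends to the model is an assumption here.

```java
import java.util.List;

public class TableDefinitionSketch {

    // Hypothetical stand-in for one metadata key: column name, SQL type, free-text comment.
    public static final class Column {
        public final String name;
        public final String type;
        public final String comment;

        public Column(String name, String type, String comment) {
            this.name = name;
            this.type = type;
            this.comment = comment;
        }
    }

    // Renders the description as a CREATE TABLE statement, with each comment as a trailing SQL comment.
    public static String toCreateTable(String tableName, List<Column> columns) {
        StringBuilder sb = new StringBuilder("CREATE TABLE " + tableName + " (\n");
        for (int i = 0; i < columns.size(); i++) {
            Column c = columns.get(i);
            sb.append("    ").append(c.name).append(' ').append(c.type);
            if (i < columns.size() - 1) {
                sb.append(',');
            }
            if (c.comment != null && !c.comment.isEmpty()) {
                sb.append(" -- ").append(c.comment);
            }
            sb.append('\n');
        }
        return sb.append(");").toString();
    }
}
```

The comment on each column is what tells the model which values are valid, so it is worth listing the allowed values there, as in the "one of [iPhone, iPad, MacBook]" example.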
 
Then, create a LanguageModelSqlFilterBuilder by providing a language model and a TableDefinition, and use it with EmbeddingStoreContentRetriever:
 LanguageModelSqlFilterBuilder sqlFilterBuilder = new LanguageModelSqlFilterBuilder(model, tableDefinition);
 ContentRetriever contentRetriever = EmbeddingStoreContentRetriever.builder()
                 .embeddingStore(embeddingStore)
                 .embeddingModel(embeddingModel)
                 .dynamicFilter(sqlFilterBuilder::build)
                 .build();
 
When the user asks, for example, "How to make the screen of my phone brighter?", the language model will generate an SQL query like SELECT * FROM documentation WHERE product = 'iPhone'.
Then, SqlFilterParser will parse the generated SQL into the following Filter object: metadataKey("product").isEqualTo("iPhone").
This filter will be applied during similarity search in the EmbeddingStore. This means that only those TextSegments with a Metadata entry product = "iPhone" will be considered for the search.
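
The effect of such an equality filter can be sketched in plain Java. This is an illustration only, not the library's implementation: Segment, isEqualTo and candidates are hypothetical names, and a real EmbeddingStore may apply the filter inside its own query engine rather than in memory.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class MetadataFilterSketch {

    // Hypothetical stand-in for a TextSegment: text plus string-valued metadata.
    public static final class Segment {
        public final String text;
        public final Map<String, String> metadata;

        public Segment(String text, Map<String, String> metadata) {
            this.text = text;
            this.metadata = metadata;
        }
    }

    // Plays the role of metadataKey(key).isEqualTo(value): true only when the entry exists and matches.
    public static Predicate<Segment> isEqualTo(String key, String value) {
        return segment -> value.equals(segment.metadata.get(key));
    }

    // Keeps only the segments that pass the filter; similarity scoring would then run on these.
    public static List<Segment> candidates(List<Segment> all, Predicate<Segment> filter) {
        return all.stream().filter(filter).collect(Collectors.toList());
    }
}
```

With segments tagged product = "iPhone" and product = "MacBook", calling candidates(all, isEqualTo("product", "iPhone")) leaves only the iPhone segments, and segments without a product entry are excluded as well.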

It is recommended to use either a capable language model, such as gpt-3.5-turbo, or a smaller model fine-tuned for the text-to-SQL task, such as SQLCoder. SQLCoder is also available via Ollama.
The default PromptTemplate in this class is suited for SQLCoder, but should also work well with capable language models such as gpt-3.5-turbo or better.
You can override the default PromptTemplate using the builder.

If SQL parsing fails (e.g., the generated SQL is invalid or contains text in addition to the SQL statement), LanguageModelSqlFilterBuilder will first try to extract a valid SQL statement from the generated text. If parsing fails again, it will return null, meaning no filtering will be applied during the search.
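
The two-step fallback described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the class's actual code: SqlFallbackSketch, extractSelect and buildFilter are hypothetical names, the parser is passed in as a function that throws a RuntimeException on invalid input, and the regular expression is only one possible way to locate the statement.

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SqlFallbackSketch {

    // Finds the first SELECT statement, up to the next semicolon or the end of the input.
    // Case-insensitive and spanning line breaks. Simplification: a ';' inside a string
    // literal would cut the statement short.
    private static final Pattern SELECT = Pattern.compile("(?is)\\bSELECT\\b.*?(?=;|$)");

    // Returns the extracted statement without surrounding chatter, or null if none is found.
    public static String extractSelect(String modelOutput) {
        Matcher matcher = SELECT.matcher(modelOutput);
        return matcher.find() ? matcher.group().trim() : null;
    }

    // Tries the raw model output first, then the extracted statement; null means "no filter".
    public static <F> F buildFilter(String modelOutput, Function<String, F> parser) {
        try {
            return parser.apply(modelOutput);
        } catch (RuntimeException firstFailure) {
            String sql = extractSelect(modelOutput);
            if (sql == null) {
                return null;
            }
            try {
                return parser.apply(sql);
            } catch (RuntimeException secondFailure) {
                return null;
            }
        }
    }
}
```

Because null means an unfiltered search, a failed generation degrades retrieval precision instead of failing the request; callers that need strict filtering would have to check for null themselves.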