# RAG (Retrieval-Augmented Generation)
An LLM's knowledge is limited to the data it was trained on. If you want to make an LLM aware of domain-specific knowledge or proprietary data, you can:
- Use RAG, which we will cover in this section
- Fine-tune the LLM with your data
- Combine both RAG and fine-tuning
## What is RAG?
Simply put, RAG is a way to find and inject relevant pieces of information from your data into the prompt before sending it to the LLM. This way, the LLM receives (hopefully) relevant information and can use it when answering, which should reduce the probability of hallucinations.
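Conceptually, the flow can be sketched in a few lines of plain Java. Note that `retrieveRelevantSegments` and `llm` below are hypothetical placeholders, standing in for whichever retrieval mechanism and chat model you use:

```java
// Minimal, framework-free sketch of the RAG idea.
// `retrieveRelevantSegments` and `llm` are hypothetical placeholders:
// any retrieval mechanism and any chat model would do.
String userQuery = "How do I configure a retry policy?";
List<String> relevantSegments = retrieveRelevantSegments(userQuery);

String augmentedPrompt = """
        Answer the question using the information below.

        Information:
        %s

        Question:
        %s
        """.formatted(String.join("\n---\n", relevantSegments), userQuery);

// The LLM now sees the retrieved context alongside the original question
String answer = llm.chat(augmentedPrompt);
```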
Relevant pieces of information can be found using various information retrieval methods. The most popular are:
- Full-text (keyword) search. This method uses techniques like TF-IDF and BM25 to search documents by matching the keywords in a query (e.g., what the user is asking) against a database of documents. It ranks results based on the frequency and relevance of these keywords in each document.
- Vector search, also known as "semantic search". Text documents are converted into numeric vectors (embeddings) using embedding models. Documents are then found and ranked based on cosine similarity or another similarity/distance measure between the query vector and the document vectors, thus capturing deeper semantic meaning (a minimal cosine-similarity sketch follows this list).
- Hybrid. Combining multiple search methods (e.g., full-text + vector) usually improves the effectiveness of the search.
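To make the vector-search bullet concrete, here is a minimal sketch of cosine similarity between two embedding vectors, written in plain Java with no library assumed:

```java
// Cosine similarity: the measure most commonly used to rank document
// embeddings against a query embedding in vector search.
public class CosineSimilarity {

    public static double cosineSimilarity(float[] a, float[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];   // dot product of the two vectors
            normA += a[i] * a[i]; // squared magnitude of a
            normB += b[i] * b[i]; // squared magnitude of b
        }
        // 1.0 = same direction, 0.0 = orthogonal, -1.0 = opposite
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```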
Currently, this page focuses mostly on vector search.
Full-text and hybrid search are currently supported only by the Azure AI Search and Elasticsearch integrations; see AzureAiSearchContentRetriever and ElasticsearchContentRetriever for more details.
We plan to expand the RAG toolbox to include full-text and hybrid search in the near future.
## RAG Stages
The RAG process is divided into two distinct stages: indexing and retrieval. LangChain4j provides tooling for both stages.
### Indexing
During the indexing stage, documents are pre-processed in a way that enables efficient search during the retrieval stage.
This process can vary depending on the information retrieval method used. For vector search, this typically involves cleaning the documents, enriching them with additional data and metadata, splitting them into smaller segments (aka chunking), embedding these segments, and finally storing them in an embedding store (aka vector database).
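As an illustration, below is a sketch of such an indexing pipeline built with LangChain4j's EmbeddingStoreIngestor, assuming a recent LangChain4j version. The document path, the splitter parameters, and the AllMiniLmL6V2EmbeddingModel (from the langchain4j-embeddings-all-minilm-l6-v2 module, whose package name varies slightly between versions) are illustrative choices, not requirements:

```java
import java.util.List;

import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.loader.FileSystemDocumentLoader;
import dev.langchain4j.data.document.splitter.DocumentSplitters;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.AllMiniLmL6V2EmbeddingModel;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;

// Load the raw documents to be indexed (path is illustrative)
List<Document> documents = FileSystemDocumentLoader.loadDocuments("/home/me/company-docs");

// A local embedding model and an in-memory store keep this sketch self-contained
EmbeddingModel embeddingModel = new AllMiniLmL6V2EmbeddingModel();
InMemoryEmbeddingStore<TextSegment> embeddingStore = new InMemoryEmbeddingStore<>();

EmbeddingStoreIngestor ingestor = EmbeddingStoreIngestor.builder()
        .documentSplitter(DocumentSplitters.recursive(300, 30)) // split into ~300-char segments with 30-char overlap
        .embeddingModel(embeddingModel)                         // embed each segment
        .embeddingStore(embeddingStore)                         // store segment + embedding
        .build();

ingestor.ingest(documents);
```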
The indexing stage usually occurs offline, meaning it does not require end users to wait for its completion. This can be achieved through, for example, a cron job that re-indexes internal company documentation once a week during the weekend. The code responsible for indexing can also be a separate application that only handles indexing tasks.
However, in some scenarios, end users may want to upload their custom documents to make them accessible to the LLM. In this case, indexing should be performed online and be a part of the main application.
Here is a simplified diagram of the indexing stage:

### Retrieval
The retrieval stage usually occurs online, when a user submits a question that should be answered using the indexed documents.
This process can vary depending on the information retrieval method used. For vector search, this typically involves embedding the user's query (question) and performing a similarity search in the embedding store. Relevant segments (pieces of the original documents) are then injected into the prompt and sent to the LLM.
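A matching retrieval-side sketch wires the store populated during indexing into an AI Service through a ContentRetriever. The maxResults and minScore values and the chatModel are illustrative, and builder method names may differ slightly between LangChain4j versions:

```java
import dev.langchain4j.rag.content.retriever.ContentRetriever;
import dev.langchain4j.rag.content.retriever.EmbeddingStoreContentRetriever;
import dev.langchain4j.service.AiServices;

interface Assistant {
    String chat(String userMessage);
}

// Reuses embeddingStore and embeddingModel from the indexing sketch above:
// the query must be embedded with the same model as the documents
ContentRetriever contentRetriever = EmbeddingStoreContentRetriever.builder()
        .embeddingStore(embeddingStore)
        .embeddingModel(embeddingModel)
        .maxResults(5)   // inject at most 5 segments into the prompt
        .minScore(0.6)   // ignore segments with low similarity to the query
        .build();

Assistant assistant = AiServices.builder(Assistant.class)
        .chatLanguageModel(chatModel) // any chat model, e.g. OpenAiChatModel
        .contentRetriever(contentRetriever)
        .build();

String answer = assistant.chat("How do I configure a retry policy?");
```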
Here is a simplified diagram of the retrieval stage:

## RAG Flavours in LangChain4j
LangChain4j offers three flavours of RAG:
- Easy RAG: the easiest way to start with RAG (a short sketch follows this list)
- Naive RAG: a basic implementation of RAG using vector search
- Advanced RAG: a modular RAG framework that allows for additional steps such as query transformation, retrieval from multiple sources, and re-ranking
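As an example of the Easy RAG flavour: with the langchain4j-easy-rag module on the classpath, a working setup can be sketched in a few lines. The static EmbeddingStoreIngestor.ingest(...) helper and EmbeddingStoreContentRetriever.from(...) are assumed from a recent LangChain4j version, and Assistant and chatModel are the same illustrative names used in the retrieval sketch above:

```java
// Easy RAG: parsing, splitting, and embedding happen with sensible defaults
List<Document> documents = FileSystemDocumentLoader.loadDocuments("/home/me/company-docs");

InMemoryEmbeddingStore<TextSegment> embeddingStore = new InMemoryEmbeddingStore<>();
EmbeddingStoreIngestor.ingest(documents, embeddingStore); // static convenience method

Assistant assistant = AiServices.builder(Assistant.class)
        .chatLanguageModel(chatModel)
        .contentRetriever(EmbeddingStoreContentRetriever.from(embeddingStore))
        .build();
```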