RAG with Spring AI: From Naive to Advanced

In the previous article, we built the ingestion pipeline to prepare our data in a vector database. Now it's time to query it. But not all retrieval approaches are equal.

In this article, we will compare two approaches:

Naive RAG: direct similarity search + prompt injection
Advanced RAG: pre-retrieval strategies (query rewriting, expansion, compression)

A- Naive RAG: Direct Similarity

The naive-rag module implements the simplest approach: searching for documents most similar to the user's question and injecting them directly into the prompt.

NaiveService Code

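import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.SimpleLoggerAdvisor;
import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.chat.prompt.SystemPromptTemplate;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;

@Service
public class NaiveService {

    private final ChatClient chatClient;
    private final VectorStore vectorStore;

    public NaiveService(ChatClient.Builder chatClientBuilder,
                        VectorStore vectorStore) {
        this.vectorStore = vectorStore;
        // SimpleLoggerAdvisor logs each request and response, handy for debugging
        this.chatClient = chatClientBuilder
                .defaultAdvisors(new SimpleLoggerAdvisor())
                .build();
    }

    public String rag(String question) {
        // 1 - Search for similar documents
        var documents = vectorStore.similaritySearch(
            SearchRequest.builder()
                .query(question)
                .similarityThreshold(0.0)
                .topK(2)
                .build());

        // 2 - Join the document texts; passing the list itself would render
        //     its toString(), including ids and metadata, into the prompt
        var context = documents.stream()
                .map(Document::getText)
                .collect(Collectors.joining("\n\n"));

        // 3 - Build prompt with context
        var systemMessage = new SystemPromptTemplate("""
            Context information is below.
            CONTEXT: {context}
            Given the context information and not prior knowledge,
            answer the question in the same language.
            QUESTION: {question}
            """).createMessage(
                Map.of("question", question, "context", context));

        var userMessage = new UserMessage(question);
        var prompt = new Prompt(List.of(systemMessage, userMessage));

        // 4 - Call the model and return the text of its answer
        return chatClient.prompt(prompt).call().content();
    }
}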

How It Works

Similarity search: vectorStore.similaritySearch() compares the question's vector with stored vectors and returns the topK closest documents
Prompt construction: found documents are injected into a SystemPromptTemplate as context
LLM call: the model generates its response based on the provided context

Search Parameters

Parameter	Description	Value
`query`	The user's question	Free text
`similarityThreshold`	Minimum similarity threshold (0.0 to 1.0)	`0.0` (no filter)
`topK`	Maximum number of results	`2`
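
To make the two tuning parameters concrete, here is a minimal, self-contained sketch of a threshold-plus-top-K search over a tiny in-memory store. It is not Spring AI's implementation: the TopKSearchSketch class, its Entry and Hit records, and the hand-written two-dimensional vectors are all hypothetical, and a real vector store delegates embedding and scoring to the model and the database.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy in-memory similarity search; a sketch, not Spring AI's implementation. */
final class TopKSearchSketch {

    /** A stored chunk with a hypothetical, hand-written embedding. */
    record Entry(String text, double[] vector) {}

    /** A search hit: the chunk text and its similarity to the query. */
    record Hit(String text, double score) {}

    /** Cosine similarity of two vectors of equal length. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Keeps entries scoring at least threshold, best first, at most topK of them. */
    static List<Hit> search(List<Entry> store, double[] query, double threshold, int topK) {
        List<Hit> hits = new ArrayList<>();
        for (Entry entry : store) {
            double score = cosine(query, entry.vector());
            if (score >= threshold) {
                hits.add(new Hit(entry.text(), score));
            }
        }
        hits.sort((x, y) -> Double.compare(y.score(), x.score())); // descending score
        return List.copyOf(hits.subList(0, Math.min(topK, hits.size())));
    }

    public static void main(String[] args) {
        List<Entry> store = List.of(
            new Entry("risotto", new double[] {1.0, 0.0}),
            new Entry("thieboudienne", new double[] {0.6, 0.8}),
            new Entry("sushi", new double[] {0.0, 1.0}));
        double[] query = {1.0, 0.0};
        System.out.println(search(store, query, 0.0, 2)); // only topK limits the result
        System.out.println(search(store, query, 0.9, 2)); // threshold drops weak matches
    }
}
```

With a threshold of 0.0 nothing is filtered and only topK bounds the result. Raising the threshold drops weakly related chunks, even if that leaves fewer than topK results.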

Limitations of Naive RAG

Quality depends directly on question phrasing
Vague or poorly worded questions yield poor results
No query transformation or optimization before search

B- Advanced RAG: Pre-Retrieval Strategies

The advanced-rag module uses Spring AI's RetrievalAugmentationAdvisor with three pre-retrieval strategies to improve result quality.

The RetrievalAugmentationAdvisor

This is a dedicated RAG Advisor that automatically orchestrates:

Query transformation (pre-retrieval)
Vector store search
Context injection into the prompt
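
The three stages above can be pictured as a small pipeline. The sketch below uses plain Java functions as stand-ins; RagPipelineSketch, buildPrompt and augment are hypothetical names, and the stubbed transformer and retriever only imitate what an LLM and a vector store would do. It illustrates the flow, not the advisor's actual code.

```java
import java.util.List;
import java.util.function.Function;

/** Toy transform -> retrieve -> augment pipeline; all names are hypothetical. */
final class RagPipelineSketch {

    /** Stage 3: injects the retrieved documents into the prompt. */
    static String augment(String question, List<String> documents) {
        return "CONTEXT:\n" + String.join("\n", documents)
                + "\nQUESTION: " + question;
    }

    static String buildPrompt(String question,
                              Function<String, String> transformer,
                              Function<String, List<String>> retriever) {
        String searchQuery = transformer.apply(question);      // 1. pre-retrieval
        List<String> documents = retriever.apply(searchQuery); // 2. vector store search
        return augment(question, documents);                   // 3. context injection
    }

    public static void main(String[] args) {
        // Stub transformer: normalizes the text instead of calling an LLM
        Function<String, String> transformer = q -> q.trim().toLowerCase();
        // Stub retriever: a keyword check instead of a vector search
        Function<String, List<String>> retriever = q -> q.contains("risotto")
                ? List.of("Risotto is an Italian rice dish.")
                : List.of();
        System.out.println(buildPrompt("  How do I cook RISOTTO?  ", transformer, retriever));
    }
}
```

Note that in this sketch only the search uses the transformed query; the final prompt keeps the user's original wording.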

Additional Dependencies

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-advisors-vector-store</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-rag</artifactId>
</dependency>

Strategy 1: Query Rewriting (RewriteQueryTransformer)

public String withQueryRewrite(String input) {
    var advisor = RetrievalAugmentationAdvisor.builder()
            .queryTransformers(
                RewriteQueryTransformer.builder()
                    .chatClientBuilder(chatClient.mutate())
                    .promptTemplate(new PromptTemplate(rewritePrompt))
                    .build())
            .documentRetriever(
                VectorStoreDocumentRetriever.builder()
                    .vectorStore(vectorStore)
                    .build())
            .build();
 
    return chatClient.prompt()
            .advisors(advisor)
            .user(input)
            .call()
            .content();
}

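The RewriteQueryTransformer uses an LLM to reformulate the user's question into a version better suited for vector search. Here rewritePrompt is a custom template string defined elsewhere in the module (not shown); leaving out promptTemplate(...) falls back to the transformer's built-in prompt. For example: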

Original: "what's that Italian thing with rice?"
Rewritten: "Italian risotto recipe with parmesan"

Strategy 2: Query Expansion (MultiQueryExpander)

public String withQueryExpansion(String input) {
    var advisor = RetrievalAugmentationAdvisor.builder()
            .queryExpander(
                MultiQueryExpander.builder()
                    .chatClientBuilder(chatClient.mutate())
                    .build())
            .documentRetriever(
                VectorStoreDocumentRetriever.builder()
                    .vectorStore(vectorStore)
                    .build())
            .build();
 
    return chatClient.prompt()
            .advisors(advisor)
            .user(input)
            .call()
            .content();
}

The MultiQueryExpander uses an LLM to generate several variants of the original question. The advisor then runs a search for each variant and merges the results, which broadens search coverage:

Original: "African recipe"
Variant 1: "traditional dish from Africa"
Variant 2: "popular African cuisine"
Variant 3: "typical recipe from the African continent"
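
The search-and-merge step can be sketched in plain Java as follows. This is an illustration under assumptions, not the advisor's code: MultiQueryMergeSketch, its Doc record, and the hard-coded index that plays the role of the vector store are hypothetical, and duplicates are dropped by document id.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Toy fan-out retrieval with de-duplication; all names are hypothetical. */
final class MultiQueryMergeSketch {

    record Doc(String id, String text) {}

    /** Runs the retriever once per query and merges the hits, first occurrence wins. */
    static List<Doc> retrieveAll(List<String> queries,
                                 Function<String, List<Doc>> retriever) {
        Map<String, Doc> merged = new LinkedHashMap<>(); // keeps insertion order
        for (String query : queries) {
            for (Doc doc : retriever.apply(query)) {
                merged.putIfAbsent(doc.id(), doc);
            }
        }
        return List.copyOf(merged.values());
    }

    public static void main(String[] args) {
        Doc yassa = new Doc("d1", "Chicken yassa");
        Doc jollof = new Doc("d2", "Jollof rice");
        Doc tagine = new Doc("d3", "Lamb tagine");
        // Hard-coded stand-in for a vector store: query text -> matching documents
        Map<String, List<Doc>> index = Map.of(
            "African recipe", List.of(yassa),
            "traditional dish from Africa", List.of(yassa, jollof),
            "popular African cuisine", List.of(tagine));
        List<String> queries = List.of(
            "African recipe", "traditional dish from Africa", "popular African cuisine");
        System.out.println(retrieveAll(queries, q -> index.getOrDefault(q, List.of())));
    }
}
```

Each variant surfaces documents the others miss, while a document matched by several variants appears only once in the merged context.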

Strategy 3: Query Compression (CompressionQueryTransformer)

public String withQueryCompression(String input) {
    var advisor = RetrievalAugmentationAdvisor.builder()
            .queryTransformers(
                CompressionQueryTransformer.builder()
                    .chatClientBuilder(chatClient.mutate())
                    .build())
            .documentRetriever(
                VectorStoreDocumentRetriever.builder()
                    .vectorStore(vectorStore)
                    .build())
            .build();
 
    return chatClient.prompt()
            .advisors(advisor)
            .user(input)
            .call()
            .content();
}

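The CompressionQueryTransformer condenses a multi-turn conversation into a single standalone query, which is useful when the user refers to previous messages. It only has something to compress when the earlier turns are part of the request, for example when chat memory is configured on the ChatClient (not shown in this method):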

History: "Tell me about African dishes" → "The one from Senegal"
Compressed: "Thiéboudienne recipe from Senegal"
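
To show what the compression step needs as input, here is a rough sketch that formats the history and the follow-up into one instruction for a model. The prompt wording, the Turn record, and the CompressionPromptSketch class are assumptions made for illustration; Spring AI ships its own prompt, and the model is stubbed here with a plain function.

```java
import java.util.List;
import java.util.function.Function;

/** Toy query compression; prompt text and names are hypothetical. */
final class CompressionPromptSketch {

    /** One conversation turn, e.g. role "user" or "assistant". */
    record Turn(String role, String text) {}

    /** Builds the instruction sent to the model: history first, then the follow-up. */
    static String compressionPrompt(List<Turn> history, String followUp) {
        StringBuilder prompt = new StringBuilder(
            "Rewrite the follow-up as a standalone search query.\nHistory:\n");
        for (Turn turn : history) {
            prompt.append(turn.role()).append(": ").append(turn.text()).append('\n');
        }
        return prompt.append("Follow-up: ").append(followUp).toString();
    }

    /** Delegates to the model (any String -> String function) and trims its answer. */
    static String compress(List<Turn> history, String followUp,
                           Function<String, String> model) {
        return model.apply(compressionPrompt(history, followUp)).trim();
    }

    public static void main(String[] args) {
        List<Turn> history = List.of(
            new Turn("user", "Tell me about African dishes"),
            new Turn("assistant", "Popular ones include yassa, jollof rice and tagine."));
        // Stub model: a real LLM call would go here
        Function<String, String> model = prompt -> " African dish from Senegal \n";
        System.out.println(compress(history, "The one from Senegal", model));
    }
}
```

The follow-up "The one from Senegal" is meaningless to a vector search on its own; it only becomes a usable query once the history is folded in.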

C- Naive vs Advanced Comparison

Aspect	Naive RAG	Advanced RAG
Query transformation	None	Rewriting, expansion, compression
Result quality	Depends on phrasing	Less sensitive to phrasing (query is transformed before search)
Cost (LLM calls)	1 call	2+ calls (transformation + response)
Code complexity	Simple	Moderate (but encapsulated by Advisor)
Use case	Precise queries	Vague, conversational queries

D- Combining Strategies

The RetrievalAugmentationAdvisor allows combining multiple transformers and expanders:

var advisor = RetrievalAugmentationAdvisor.builder()
        .queryTransformers(
            RewriteQueryTransformer.builder()
                .chatClientBuilder(chatClient.mutate())
                .build(),
            CompressionQueryTransformer.builder()
                .chatClientBuilder(chatClient.mutate())
                .build())
        .queryExpander(
            MultiQueryExpander.builder()
                .chatClientBuilder(chatClient.mutate())
                .build())
        .documentRetriever(
            VectorStoreDocumentRetriever.builder()
                .vectorStore(vectorStore)
                .build())
        .build();

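Transformers execute sequentially in the order they are passed to queryTransformers (here rewriting, then compression), each one receiving the previous one's output. The expander then generates variants of the final transformed query, and the retriever performs one search per variant.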
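
The ordering described above can be made visible with a small sketch in which each stage simply tags the query it receives. QueryStagesSketch and prepareQueries are hypothetical names, and the lambdas stand in for the LLM-backed transformers and expander.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.UnaryOperator;

/** Toy staging of transformers and expander; all names are hypothetical. */
final class QueryStagesSketch {

    static List<String> prepareQueries(String input,
                                       List<UnaryOperator<String>> transformers,
                                       Function<String, List<String>> expander) {
        String query = input;
        for (UnaryOperator<String> transformer : transformers) {
            query = transformer.apply(query); // each one sees the previous output
        }
        return expander.apply(query); // variants derive from the final query
    }

    public static void main(String[] args) {
        UnaryOperator<String> rewrite = q -> q + " [rewritten]";
        UnaryOperator<String> compress = q -> q + " [compressed]";
        List<UnaryOperator<String>> transformers = List.of(rewrite, compress);
        Function<String, List<String>> expander = q -> List.of(q, q + " (variant)");
        System.out.println(prepareQueries("African recipe", transformers, expander));
    }
}
```

Swapping the two transformers in the list changes the order of the tags, which is a quick way to see that declaration order matters.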

E- The chatClient.mutate() Pattern

You may have noticed the use of chatClient.mutate() in the transformer builders:

RewriteQueryTransformer.builder()
    .chatClientBuilder(chatClient.mutate())
    .build()

The mutate() method creates a new ChatClient.Builder from an existing ChatClient, inheriting its configuration. This allows transformers to use the same AI model as the main ChatClient.
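
The idea behind mutate() can be illustrated with a small immutable object whose builder starts from a copy of the existing settings. MiniClient and its Builder are hypothetical stand-ins written for this sketch, not Spring AI's ChatClient; the model name and prompts are made up.

```java
/** Toy version of the mutate() pattern; MiniClient is a hypothetical stand-in. */
final class MutateSketch {

    /** Immutable client holding two settings. */
    static final class MiniClient {
        private final String model;
        private final String systemPrompt;

        private MiniClient(String model, String systemPrompt) {
            this.model = model;
            this.systemPrompt = systemPrompt;
        }

        String model() { return model; }

        String systemPrompt() { return systemPrompt; }

        static Builder builder() { return new Builder(); }

        /** Returns a new builder pre-filled with this client's settings. */
        Builder mutate() {
            return new Builder().model(model).systemPrompt(systemPrompt);
        }
    }

    static final class Builder {
        private String model = "default-model";
        private String systemPrompt = "";

        Builder model(String model) { this.model = model; return this; }

        Builder systemPrompt(String systemPrompt) { this.systemPrompt = systemPrompt; return this; }

        MiniClient build() { return new MiniClient(model, systemPrompt); }
    }

    public static void main(String[] args) {
        MiniClient main = MiniClient.builder()
                .model("some-model")
                .systemPrompt("You are a chef.")
                .build();
        // Derived client: same model, different system prompt
        MiniClient rewriter = main.mutate()
                .systemPrompt("Rewrite the query for search.")
                .build();
        System.out.println(rewriter.model() + " / " + rewriter.systemPrompt());
        System.out.println(main.model() + " / " + main.systemPrompt()); // unchanged
    }
}
```

The derived client inherits everything it does not override, and the original is left untouched, so a transformer can adjust its own builder without affecting the main client.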

Conclusion

Spring AI's advanced RAG turns vague queries into more precise searches, which can noticeably improve response quality at the cost of extra LLM calls. Pre-retrieval strategies are encapsulated in the RetrievalAugmentationAdvisor, keeping application code simple and readable.

Key takeaways:

Naive RAG works well for precise queries
Rewriting improves phrasing for vector search
Expansion broadens search coverage
Compression handles multi-turn conversations

In the next article, we will explore Function Calling: when the LLM directly calls your Java methods.

I hope you found this article useful. Thank you for reading.

To learn more:

RAG Documentation: https://docs.spring.io/spring-ai/reference/api/retrieval-augmented-generation.html
Project source code: spring-ai-en-action
Find our #autourducode videos on our YouTube channel