ChatClient API: Getting Started with the Spring AI ChatClient

The ChatClient is the main entry point for interacting with AI models in Spring AI. In this article, we explore its usage through three sub-modules: a simple prompt with token tracking, a REST controller with multi-model streaming, and multimodality support.

After laying the foundations of Spring AI in the introductory article, it's time to get hands-on with code. The ChatClient is the framework's central component: it's what allows us to communicate with AI models.

In this article, we will explore the ChatClient through three sub-modules of the demo project:

  • single-chat-model: first prompt and token tracking
  • multi-chat-model: REST controller with synchronous calls and streaming
  • multimodality-chat-model: vision model support (multimodality)

A- The ChatClient: A Familiar Fluent API

Spring AI's ChatClient provides a fluent API for communicating with AI models. If you've already used Spring's WebClient or RestClient, you'll feel right at home.

The principle is simple:

  1. Build a ChatClient from the ChatClient.Builder (auto-configured by Spring Boot)
  2. Send a prompt with .prompt("your question")
  3. Call the model with .call() (synchronous) or .stream() (reactive)
  4. Retrieve the response with .content() (raw text) or .chatResponse() (full response with metadata)

ChatClient chatClient = chatClientBuilder.build();
 
// Simple response (text)
String text = chatClient.prompt("Hello!").call().content();
 
// Full response (with metadata)
ChatResponse response = chatClient.prompt("Hello!").call().chatResponse();
 
// Reactive streaming
Flux<String> stream = chatClient.prompt("Hello!").stream().content();
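The builder can also carry defaults shared by every call, which keeps individual prompts short. A minimal sketch using the builder's defaultSystem method (the system text below is illustrative, not from the demo project):

```java
// Build a ChatClient with a default system prompt applied to every call.
ChatClient chatClient = chatClientBuilder
        .defaultSystem("You are a concise assistant. Answer in at most three sentences.")
        .build();

// Individual calls inherit the default system prompt automatically
String answer = chatClient.prompt("What is Spring AI?").call().content();
```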

B- Single Chat Model: First Prompt and Token Tracking

The first sub-module, single-chat-model, illustrates the most basic usage of the ChatClient: sending a prompt to a model and retrieving the complete response with token consumption tracking.

Dependency

To use Ollama as an inference engine, a single dependency is sufficient:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-ollama</artifactId>
</dependency>

Configuration

Model configuration is done in application.yaml:

spring:
  ai:
    ollama:
      chat:
        model: qwen3:0.6b

Here, we use qwen3:0.6b, a lightweight Qwen model ideal for local testing. You need to download it first with ollama pull qwen3:0.6b.

Code

The code uses a CommandLineRunner to execute the prompt at application startup:

@SpringBootApplication
public class SingleChatApplication {
 
    public static void main(String[] args) {
        SpringApplication.run(SingleChatApplication.class, args);
    }
 
    @Bean
    CommandLineRunner runnerSingleChat(ChatClient.Builder chatClientBuilder) {
        return _ -> {
            ChatClient chatClient = chatClientBuilder.build();
            var response = chatClient
                    .prompt("Can you briefly explain how LLMs work?")
                    .call().chatResponse();
            var tokenUsage = response.getMetadata().getUsage();
            System.out.printf(
                "Tokens used: %d (prompt: %d, response: %d)%n",
                tokenUsage.getTotalTokens(),
                tokenUsage.getPromptTokens(),
                tokenUsage.getCompletionTokens()
            );
        };
    }
}

Key Points

  • The ChatClient.Builder is auto-configured by Spring Boot thanks to the Ollama starter. Just inject it.
  • The call .call().chatResponse() returns a ChatResponse object containing the text response as well as metadata like token usage.
  • The _ -> pattern in the lambda is a Java 22+ feature (unnamed variables): we don't use the args parameter from CommandLineRunner.
  • Token tracking is essential in production for monitoring costs and consumption.

Token Details

| Metric | Method | Description |
|--------|--------|-------------|
| Prompt tokens | getPromptTokens() | Number of tokens sent to the model |
| Response tokens | getCompletionTokens() | Number of tokens generated by the model |
| Total | getTotalTokens() | Sum of both |
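These counts map directly to billing on paid providers. As a framework-free sketch of what a cost estimate could look like (the per-token prices below are placeholders, not real provider pricing):

```java
// Estimate the cost of a call from token counts.
// Prices are hypothetical placeholders, expressed in USD per 1M tokens.
public class TokenCostEstimator {

    static final double PROMPT_PRICE_PER_1M = 0.50;      // hypothetical price
    static final double COMPLETION_PRICE_PER_1M = 1.50;  // hypothetical price

    static double estimateUsd(long promptTokens, long completionTokens) {
        return promptTokens * PROMPT_PRICE_PER_1M / 1_000_000
             + completionTokens * COMPLETION_PRICE_PER_1M / 1_000_000;
    }

    public static void main(String[] args) {
        // e.g. 12 prompt tokens and 250 completion tokens
        System.out.printf("Estimated cost: $%.6f%n", estimateUsd(12, 250));
    }
}
```

Feed the values returned by getPromptTokens() and getCompletionTokens() into such a helper to log a running cost alongside each response.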

C- Multi Chat Model: REST + Streaming + Multi-Provider

The second sub-module, multi-chat-model, goes further by exposing the ChatClient via a REST controller with streaming support and multiple providers (Ollama + OpenAI).

Dependencies

This module declares two model starters:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-ollama</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-openai</artifactId>
</dependency>

Configuration

spring:
  ai:
    ollama:
      chat:
        model: qwen3:0.6b
    openai:
      api-key: ${OPENAI_API_KEY}
    model:
      chat: none

The spring.ai.model.chat: none property is important when multiple chat models are available. It disables the default model auto-configuration, giving you full control over model selection.
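With the default auto-configuration disabled, one possible approach is to define one ChatClient per provider yourself. The sketch below assumes the provider-specific ChatModel beans (here named ollamaChatModel and openAiChatModel) are available in your context; the exact bean setup depends on your Spring AI version:

```java
@Configuration
class ChatClientConfig {

    // One ChatClient per provider; pick the one you need by qualifier at injection time.
    @Bean
    ChatClient ollamaClient(@Qualifier("ollamaChatModel") ChatModel chatModel) {
        return ChatClient.create(chatModel);
    }

    @Bean
    ChatClient openAiClient(@Qualifier("openAiChatModel") ChatModel chatModel) {
        return ChatClient.create(chatModel);
    }
}
```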

REST Controller

@RestController
@RequestMapping("/chat")
public class DemoController {
 
    private final ChatClient chatClient;
 
    public DemoController(ChatClient.Builder chatClientBuilder) {
        this.chatClient = chatClientBuilder.build();
    }
 
    @GetMapping
    public String sync(String message) {
        return chatClient.prompt(message).call().content();
    }
 
    @GetMapping("/stream")
    public Flux<String> stream(String message) {
        return chatClient.prompt(message).stream().content();
    }
}

Key Points

  • Synchronous call (/chat?message=...): .call().content() blocks until the full response is received, then returns the text.
  • Streaming (/chat/stream?message=...): .stream().content() returns a Flux<String> that emits tokens as they are generated. Ideal for a smooth user experience with progressive display.
  • The ChatClient is built once in the constructor and reused. It is thread-safe.
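For browser clients, the streaming endpoint can also be exposed as Server-Sent Events. This is standard Spring WebFlux rather than a Spring AI feature; the endpoint path below is an assumption, sketched as a variant of the controller above:

```java
// Variant of the streaming endpoint emitting Server-Sent Events,
// which browsers can consume directly with the EventSource API.
@GetMapping(value = "/stream/sse", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<ServerSentEvent<String>> streamSse(String message) {
    return chatClient.prompt(message)
            .stream().content()
            .map(token -> ServerSentEvent.builder(token).build());
}
```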

Testing the Endpoints

# Synchronous call
curl "http://localhost:8080/chat?message=Hello"
 
# Streaming (tokens arrive progressively)
curl -N "http://localhost:8080/chat/stream?message=Tell+me+a+story"

The -N flag in curl disables buffering to see tokens arriving in real time.

D- Multimodality: when AI understands your images

The third sub-module, multimodality-chat-model, introduces multimodality support: the ability to send not only text, but also images, audio, and videos to an AI model.

Configuration

spring:
  ai:
    ollama:
      chat:
        model: qwen3-vl:2b

The qwen3-vl:2b model is a vision-language model that can analyze images and answer questions about them.

Code

@SpringBootApplication
public class MultimodalityChatApplication {
 
    public static void main(String[] args) {
        SpringApplication.run(MultimodalityChatApplication.class, args);
    }
 
    @Bean
    CommandLineRunner runnerMultimodalityChat(ChatClient.Builder chatClientBuilder) {
        return _ -> {
            var chatClient = chatClientBuilder.build();
            var response = chatClient
                    .prompt("Can you briefly explain how LLMs work?")
                    .call().content();
            System.out.println(response);
        };
    }
}

Multimodal Support with Spring AI

Spring AI offers native multimodality support through the Resource and MimeTypeUtils classes. Here's an example of a multimodal prompt with an image:

@Value("classpath:test-image.png")
private Resource imageResource;
 
var response = chatClient.prompt()
    .user(u -> u.text("Describe this image")
        .media(MimeTypeUtils.IMAGE_PNG, imageResource))
    .call()
    .content();
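The same pattern works for images supplied at runtime, for instance via a file upload. A hedged sketch (the /describe endpoint and its name are assumptions, not part of the demo project):

```java
// Accept an uploaded image and ask the vision model to describe it.
// MultipartFile.getResource() adapts the upload to the Resource the media API expects.
@PostMapping("/describe")
public String describe(@RequestParam("file") MultipartFile file) {
    return chatClient.prompt()
            .user(u -> u.text("Describe this image")
                    .media(MimeTypeUtils.parseMimeType(file.getContentType()),
                           file.getResource()))
            .call()
            .content();
}
```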

Key Points

  • Multimodality requires a compatible model (such as qwen3-vl, GPT-4o, Claude, etc.)
  • Spring AI handles image serialization and transmission to the model transparently
  • The same ChatClient is used — only the prompt changes to include media
  • Supported media types depend on the model: images (PNG, JPEG), audio, etc.

E- Portability in Action

One of Spring AI's great advantages is code portability. Look at the REST controller from the multi-chat-model module: no reference to Ollama or OpenAI in the Java code. The provider choice is entirely driven by configuration.

To switch from Ollama to OpenAI, just change the configuration:

# Before: local Ollama
spring:
  ai:
    ollama:
      chat:
        model: qwen3:0.6b
 
# After: cloud OpenAI
spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4o

The Java code remains strictly identical. That's the power of Spring AI's abstraction.
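One practical way to manage this switch is Spring profiles: keep one YAML document per provider in the same file and activate the one you need at launch. A sketch (the profile names local and cloud are arbitrary):

```yaml
# application.yaml — provider selected by profile, Java code untouched
spring:
  config:
    activate:
      on-profile: local
  ai:
    ollama:
      chat:
        model: qwen3:0.6b
---
spring:
  config:
    activate:
      on-profile: cloud
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4o
```

Start the application with --spring.profiles.active=local or --spring.profiles.active=cloud to choose the provider.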

F- Summary

| Feature | Module | API |
|---------|--------|-----|
| Simple prompt + tokens | single-chat-model | .call().chatResponse() |
| Synchronous REST call | multi-chat-model | .call().content() |
| Reactive streaming | multi-chat-model | .stream().content() returning Flux<String> |
| Multimodality | multimodality-chat-model | .user(u -> u.text().media()) |
| Multi-provider | multi-chat-model | YAML configuration only |

Conclusion

Spring AI's ChatClient is a powerful and elegant component. In just a few lines of code, we were able to:

  • Send a prompt and retrieve the response with token metadata
  • Expose a synchronous REST endpoint and a streaming endpoint
  • Use vision models for multimodality
  • Switch between providers without modifying Java code

In the next article, we will see how to give memory to our AI by managing conversational context with Spring AI.

I hope you found this article useful. Thank you for reading.

To learn more:


"Spring AI in Action" Series

  1. Introduction to Spring AI
  2. ChatClient API: Getting Started with the API
  3. Chat Memory: Conversational Context
  4. RAG: Ingestion Pipeline
  5. RAG: From Naive to Advanced
  6. Function Calling
  7. Tools + Security
  8. Multi-Agent Orchestration
  9. Model Context Protocol (MCP)