ChatClient API: Getting Started with the Spring AI ChatClient
The ChatClient is the main entry point for interacting with AI models in Spring AI. In this article, we explore its usage through three sub-modules: a simple prompt with token tracking, a REST controller with multi-model streaming, and multimodality support.
After laying the foundations of Spring AI in the introductory article, it's time to get hands-on with code. The ChatClient is the framework's central component: it's what allows us to communicate with AI models.
In this article, we will explore the ChatClient through three sub-modules of the demo project:
- single-chat-model: first prompt and token tracking
- multi-chat-model: REST controller with synchronous calls and streaming
- multimodality-chat-model: vision model support (multimodality)
A- The ChatClient: A Familiar Fluent API
Spring AI's ChatClient provides a fluent API for communicating with AI models. If you've already used Spring's WebClient or RestClient, you'll feel right at home.
The principle is simple:
- Build a ChatClient from the ChatClient.Builder (auto-configured by Spring Boot)
- Send a prompt with .prompt("your question")
- Call the model with .call() (synchronous) or .stream() (reactive)
- Retrieve the response with .content() (raw text) or .chatResponse() (full response with metadata)
ChatClient chatClient = chatClientBuilder.build();
// Simple response (text)
String text = chatClient.prompt("Hello!").call().content();
// Full response (with metadata)
ChatResponse response = chatClient.prompt("Hello!").call().chatResponse();
// Reactive streaming
Flux<String> stream = chatClient.prompt("Hello!").stream().content();

B- Single Chat Model: First Prompt and Token Tracking
The first sub-module, single-chat-model, illustrates the most basic usage of the ChatClient: sending a prompt to a model and retrieving the complete response with token consumption tracking.
Dependency
To use Ollama as an inference engine, a single dependency is sufficient:
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-ollama</artifactId>
</dependency>

Configuration
Model configuration is done in application.yaml:
spring:
  ai:
    ollama:
      chat:
        model: qwen3:0.6b

Here, we use qwen3:0.6b, a lightweight Qwen model ideal for local testing. You need to download it first with ollama pull qwen3:0.6b.
Code
The code uses a CommandLineRunner to execute the prompt at application startup:
@SpringBootApplication
public class SingleChatApplication {

    public static void main(String[] args) {
        SpringApplication.run(SingleChatApplication.class, args);
    }

    @Bean
    CommandLineRunner runnerSingleChat(ChatClient.Builder chatClientBuilder) {
        return _ -> {
            ChatClient chatClient = chatClientBuilder.build();
            var response = chatClient
                .prompt("Can you briefly explain how LLMs work?")
                .call().chatResponse();
            var tokenUsage = response.getMetadata().getUsage();
            System.out.printf(
                "Tokens used: %d (prompt: %d, response: %d)%n",
                tokenUsage.getTotalTokens(),
                tokenUsage.getPromptTokens(),
                tokenUsage.getCompletionTokens()
            );
        };
    }
}

Key Points
- The ChatClient.Builder is auto-configured by Spring Boot thanks to the Ollama starter. Just inject it.
- The call .call().chatResponse() returns a ChatResponse object containing the text response as well as metadata such as token usage.
- The _ -> pattern in the lambda is a Java 22+ feature (unnamed variables): we don't use the args parameter from CommandLineRunner.
- Token tracking is essential in production for monitoring costs and consumption.
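To illustrate that last point, here is a minimal, framework-free sketch of how the counts returned by getUsage() could feed a consumption cap. TokenBudget is a hypothetical helper for illustration, not part of Spring AI:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: accumulates token usage across calls so total
// consumption can be monitored or capped. Not a Spring AI class.
class TokenBudget {
    private final long limit;
    private final AtomicLong used = new AtomicLong();

    TokenBudget(long limit) { this.limit = limit; }

    /** Records a call's total tokens; returns false once the budget is exceeded. */
    boolean record(long totalTokens) {
        return used.addAndGet(totalTokens) <= limit;
    }

    long remaining() { return Math.max(0, limit - used.get()); }
}

public class TokenBudgetDemo {
    public static void main(String[] args) {
        TokenBudget budget = new TokenBudget(1_000);
        // In real code the totals would come from
        // response.getMetadata().getUsage().getTotalTokens().
        System.out.println(budget.record(400));   // prints true
        System.out.println(budget.record(700));   // prints false (1100 > 1000)
        System.out.println(budget.remaining());   // prints 0
    }
}
```

In a real application you would call record() after each .call().chatResponse() and, for example, refuse further requests or alert when it returns false.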
Token Details
| Metric | Method | Description |
|---|---|---|
| Prompt tokens | getPromptTokens() | Number of tokens sent to the model |
| Response tokens | getCompletionTokens() | Number of tokens generated by the model |
| Total | getTotalTokens() | Sum of both |
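These counts translate directly into cost, since cloud providers usually bill prompt and completion tokens at different rates. A minimal sketch of the arithmetic (the per-million-token prices below are illustrative assumptions, not real provider pricing):

```java
import java.util.Locale;

public class TokenCost {
    // Cost of one call, given token counts and per-million-token prices.
    static double cost(long promptTokens, long completionTokens,
                       double promptPricePerMTok, double completionPricePerMTok) {
        return promptTokens / 1_000_000.0 * promptPricePerMTok
             + completionTokens / 1_000_000.0 * completionPricePerMTok;
    }

    public static void main(String[] args) {
        // Example: 1200 prompt tokens and 350 completion tokens at
        // $2.50/M (prompt) and $10.00/M (completion) -- illustrative rates.
        System.out.printf(Locale.US, "$%.6f%n", cost(1_200, 350, 2.50, 10.00));
        // prints $0.006500
    }
}
```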
C- Multi Chat Model: REST + Streaming + Multi-Provider
The second sub-module, multi-chat-model, goes further by exposing the ChatClient via a REST controller with streaming support and multiple providers (Ollama + OpenAI).
Dependencies
This module declares two model starters:
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-ollama</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-openai</artifactId>
</dependency>

Configuration
spring:
  ai:
    ollama:
      chat:
        model: qwen3:0.6b
    openai:
      api-key: ${OPENAI_API_KEY}
    model:
      chat: none

The spring.ai.model.chat: none property is important when multiple chat models are available. It disables the default chat model auto-configuration, giving you full control over model selection.
REST Controller
@RestController
@RequestMapping("/chat")
public class DemoController {

    private final ChatClient chatClient;

    public DemoController(ChatClient.Builder chatClientBuilder) {
        this.chatClient = chatClientBuilder.build();
    }

    @GetMapping
    public String sync(String message) {
        return chatClient.prompt(message).call().content();
    }

    @GetMapping("/stream")
    public Flux<String> stream(String message) {
        return chatClient.prompt(message).stream().content();
    }
}

Key Points
- Synchronous call (/chat?message=...): .call().content() blocks until the full response is received, then returns the text.
- Streaming (/chat/stream?message=...): .stream().content() returns a Flux<String> that emits tokens as they are generated. Ideal for a smooth user experience with progressive display.
- The ChatClient is built once in the constructor and reused. It is thread-safe.
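To see why streaming improves perceived latency, here is a JDK-only sketch of the idea behind .stream().content(), using java.util.concurrent.Flow instead of Spring or Reactor: the subscriber consumes tokens as they are published, rather than waiting for the full response. This is an analogy, not Spring AI's actual implementation.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

public class StreamingSketch {
    // Publishes tokens one by one and appends them as they arrive,
    // the way a UI would render a streamed LLM response progressively.
    static String collect(String... tokens) {
        var out = new StringBuilder();
        var latch = new CountDownLatch(1);
        try (var publisher = new SubmissionPublisher<String>()) {
            publisher.subscribe(new Flow.Subscriber<String>() {
                public void onSubscribe(Flow.Subscription s) { s.request(Long.MAX_VALUE); }
                public void onNext(String token) { out.append(token); } // progressive display
                public void onError(Throwable t) { latch.countDown(); }
                public void onComplete() { latch.countDown(); }
            });
            for (String token : tokens) {
                publisher.submit(token);  // each token emitted as soon as it exists
            }
        } // close() delivers onComplete after all submitted tokens
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(collect("Once", " upon", " a", " time"));
        // prints Once upon a time
    }
}
```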
Testing the Endpoints
# Synchronous call
curl "http://localhost:8080/chat?message=Hello"
# Streaming (tokens arrive progressively)
curl -N "http://localhost:8080/chat/stream?message=Tell+me+a+story"

The -N flag in curl disables buffering, so you see tokens arriving in real time.
D- Multimodality: when AI understands your images
The third sub-module, multimodality-chat-model, introduces multimodality support: the ability to send not only text, but also images, audio, and videos to an AI model.
Configuration
spring:
  ai:
    ollama:
      chat:
        model: qwen3-vl:2b

The qwen3-vl:2b model is a vision-language model that can analyze images and answer questions about them.
Code
@SpringBootApplication
public class MultimodalityChatApplication {

    public static void main(String[] args) {
        SpringApplication.run(MultimodalityChatApplication.class, args);
    }

    @Bean
    CommandLineRunner runnerMultimodalityChat(ChatClient.Builder chatClientBuilder) {
        return _ -> {
            var chatClient = chatClientBuilder.build();
            var response = chatClient
                .prompt("Can you briefly explain how LLMs work?")
                .call().content();
            System.out.println(response);
        };
    }
}

Multimodal Support with Spring AI
Spring AI offers native multimodality support through the Resource and MimeTypeUtils classes. Here's an example of a multimodal prompt with an image:
@Value("classpath:test-image.png")
private Resource imageResource;

var response = chatClient.prompt()
    .user(u -> u.text("Describe this image")
        .media(MimeTypeUtils.IMAGE_PNG, imageResource))
    .call()
    .content();

Key Points
- Multimodality requires a compatible model (such as qwen3-vl, GPT-4o, Claude, etc.)
- Spring AI handles image serialization and transmission to the model transparently
- The same ChatClient is used; only the prompt changes to include media
- Supported media types depend on the model: images (PNG, JPEG), audio, etc.
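As an aside on "transparent transmission": chat APIs that accept images typically receive them as base64-encoded bytes inside the request payload, which is the kind of plumbing Spring AI takes care of for you. A toy illustration with plain JDK classes (the byte array is just the start of the PNG file signature, not a real image):

```java
import java.util.Base64;

public class MediaEncodingDemo {
    // Encode raw image bytes the way vision APIs commonly expect them.
    static String encode(byte[] imageBytes) {
        return Base64.getEncoder().encodeToString(imageBytes);
    }

    public static void main(String[] args) {
        // First four bytes of the PNG file signature, for illustration only.
        byte[] pngSignatureStart = {(byte) 0x89, 'P', 'N', 'G'};
        System.out.println(encode(pngSignatureStart));  // prints iVBORw==
    }
}
```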
E- Portability in Action
One of Spring AI's great advantages is code portability. Look at the REST controller from the multi-chat-model module: no reference to Ollama or OpenAI in the Java code. The provider choice is entirely driven by configuration.
To switch from Ollama to OpenAI, just change the configuration:
# Before: local Ollama
spring:
  ai:
    ollama:
      chat:
        model: qwen3:0.6b

# After: cloud OpenAI
spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4o

The Java code remains strictly identical. That's the power of Spring AI's abstraction.
F- Summary
| Feature | Module | API |
|---|---|---|
| Simple prompt + tokens | single-chat-model | .call().chatResponse() |
| Synchronous REST call | multi-chat-model | .call().content() |
| Reactive streaming | multi-chat-model | .stream().content() → Flux<String> |
| Multimodality | multimodality-chat-model | .user(u -> u.text().media()) |
| Multi-provider | multi-chat-model | YAML configuration only |
Conclusion
Spring AI's ChatClient is a powerful and elegant component. In just a few lines of code, we were able to:
- Send a prompt and retrieve the response with token metadata
- Expose a synchronous REST endpoint and a streaming endpoint
- Use vision models for multimodality
- Switch between providers without modifying Java code
In the next article, we will see how to give memory to our AI by managing conversational context with Spring AI.
I hope you found this article useful. Thank you for reading.
To learn more:
- ChatClient Documentation: https://docs.spring.io/spring-ai/reference/api/chatclient.html
- Project source code: spring-ai-en-action
- Find our #autourducode videos on our YouTube channel