ChatClient API: Getting Started with the Spring AI ChatClient

The ChatClient is the main entry point for interacting with AI models in Spring AI. In this article, we explore its usage through three sub-modules: a simple prompt with token tracking, a REST controller with multi-model streaming, and multimodality support.

After laying the foundations of Spring AI in the introductory article, it's time to get hands-on with code. The ChatClient is the framework's central component: it's what allows us to communicate with AI models.

In this article, we will explore the ChatClient through three sub-modules of the demo project:

  • single-chat-model: first prompt and token tracking
  • multi-chat-model: REST controller with synchronous calls and streaming
  • multimodality-chat-model: vision model support (multimodality)

A- The ChatClient: A Familiar Fluent API

Spring AI's ChatClient provides a fluent API for communicating with AI models. If you've already used Spring's WebClient or RestClient, you'll feel right at home.

The principle is simple:

  1. Build a ChatClient from the ChatClient.Builder (auto-configured by Spring Boot)
  2. Send a prompt with .prompt("your question")
  3. Call the model with .call() (synchronous) or .stream() (reactive)
  4. Retrieve the response with .content() (raw text) or .chatResponse() (full response with metadata)

ChatClient chatClient = chatClientBuilder.build();
 
// Simple response (text)
String text = chatClient.prompt("Hello!").call().content();
 
// Full response (with metadata)
ChatResponse response = chatClient.prompt("Hello!").call().chatResponse();
 
// Reactive streaming
Flux<String> stream = chatClient.prompt("Hello!").stream().content();
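The builder can also carry defaults shared by every call, which keeps individual prompts short. A minimal sketch using the builder's defaultSystem method (the system text below is illustrative, not from the demo project):

```java
// Build a ChatClient with a default system prompt applied to every call.
ChatClient chatClient = chatClientBuilder
        .defaultSystem("You are a concise assistant. Answer in at most three sentences.")
        .build();

// Individual calls inherit the default system prompt automatically
String answer = chatClient.prompt("What is Spring AI?").call().content();
```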

B- Single Chat Model: First Prompt and Token Tracking

The first sub-module, single-chat-model, illustrates the most basic usage of the ChatClient: sending a prompt to a model and retrieving the complete response with token consumption tracking.

Dependency

To use Ollama as an inference engine, a single dependency is sufficient:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-ollama</artifactId>
</dependency>

Configuration

Model configuration is done in application.yaml:

spring:
  ai:
    ollama:
      chat:
        model: qwen3:0.6b

Here, we use qwen3:0.6b, a lightweight Qwen model ideal for local testing. You need to download it first with ollama pull qwen3:0.6b.

Code

The code uses a CommandLineRunner to execute the prompt at application startup:

@SpringBootApplication
public class SingleChatApplication {
 
    public static void main(String[] args) {
        SpringApplication.run(SingleChatApplication.class, args);
    }
 
    @Bean
    CommandLineRunner runnerSingleChat(ChatClient.Builder chatClientBuilder) {
        return _ -> {
            ChatClient chatClient = chatClientBuilder.build();
            var response = chatClient
                    .prompt("Can you briefly explain how LLMs work?")
                    .call().chatResponse();
            var tokenUsage = response.getMetadata().getUsage();
            System.out.printf(
                "Tokens used: %d (prompt: %d, response: %d)%n",
                tokenUsage.getTotalTokens(),
                tokenUsage.getPromptTokens(),
                tokenUsage.getCompletionTokens()
            );
        };
    }
}

Key Points

  • The ChatClient.Builder is auto-configured by Spring Boot thanks to the Ollama starter. Just inject it.
  • The call .call().chatResponse() returns a ChatResponse object containing the text response as well as metadata like token usage.
  • The _ -> pattern in the lambda is a Java 22+ feature (unnamed variables): we don't use the args parameter from CommandLineRunner.
  • Token tracking is essential in production for monitoring costs and consumption.

Token Details

| Metric | Method | Description |
|--------|--------|-------------|
| Prompt tokens | getPromptTokens() | Number of tokens sent to the model |
| Response tokens | getCompletionTokens() | Number of tokens generated by the model |
| Total | getTotalTokens() | Sum of both |
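These counts map directly to billing on paid providers. As a framework-free sketch of what a cost estimate could look like (the per-token prices below are placeholders, not real provider pricing):

```java
// Estimate the cost of a call from token counts.
// Prices are hypothetical placeholders, expressed in USD per 1M tokens.
public class TokenCostEstimator {

    static final double PROMPT_PRICE_PER_1M = 0.50;      // hypothetical price
    static final double COMPLETION_PRICE_PER_1M = 1.50;  // hypothetical price

    static double estimateUsd(long promptTokens, long completionTokens) {
        return promptTokens * PROMPT_PRICE_PER_1M / 1_000_000
             + completionTokens * COMPLETION_PRICE_PER_1M / 1_000_000;
    }

    public static void main(String[] args) {
        // e.g. 12 prompt tokens and 250 completion tokens
        System.out.printf("Estimated cost: $%.6f%n", estimateUsd(12, 250));
    }
}
```

Feed the values returned by getPromptTokens() and getCompletionTokens() into such a helper to log a running cost alongside each response.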

C- Multi Chat Model: REST + Streaming + Multi-Provider

The second sub-module, multi-chat-model, goes further by exposing the ChatClient via a REST controller with streaming support and multiple providers (Ollama + OpenAI).

Dependencies

This module declares two model starters:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-ollama</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-openai</artifactId>
</dependency>

Configuration

spring:
  ai:
    ollama:
      chat:
        model: qwen3:0.6b
    openai:
      api-key: ${OPENAI_API_KEY}
    model:
      chat: none

The spring.ai.model.chat: none property is important when multiple chat models are available. It disables the default model auto-configuration, giving you full control over model selection.
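With the default auto-configuration disabled, one possible approach is to define one ChatClient per provider yourself. The sketch below assumes the provider-specific ChatModel beans (here named ollamaChatModel and openAiChatModel) are available in your context; the exact bean setup depends on your Spring AI version:

```java
@Configuration
class ChatClientConfig {

    // One ChatClient per provider; pick the one you need by qualifier at injection time.
    @Bean
    ChatClient ollamaClient(@Qualifier("ollamaChatModel") ChatModel chatModel) {
        return ChatClient.create(chatModel);
    }

    @Bean
    ChatClient openAiClient(@Qualifier("openAiChatModel") ChatModel chatModel) {
        return ChatClient.create(chatModel);
    }
}
```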

REST Controller

@RestController
@RequestMapping("/chat")
public class DemoController {
 
    private final ChatClient chatClient;
 
    public DemoController(ChatClient.Builder chatClientBuilder) {
        this.chatClient = chatClientBuilder.build();
    }
 
    @GetMapping
    public String sync(String message) {
        return chatClient.prompt(message).call().content();
    }
 
    @GetMapping("/stream")
    public Flux<String> stream(String message) {
        return chatClient.prompt(message).stream().content();
    }
}

Key Points

  • Synchronous call (/chat?message=...): .call().content() blocks until the full response is received, then returns the text.
  • Streaming (/chat/stream?message=...): .stream().content() returns a Flux<String> that emits tokens as they are generated. Ideal for a smooth user experience with progressive display.
  • The ChatClient is built once in the constructor and reused. It is thread-safe.
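For browser clients, the streaming endpoint can also be exposed as Server-Sent Events. This is standard Spring WebFlux rather than a Spring AI feature; the endpoint path below is an assumption, sketched as a variant of the controller above:

```java
// Variant of the streaming endpoint emitting Server-Sent Events,
// which browsers can consume directly with the EventSource API.
@GetMapping(value = "/stream/sse", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<ServerSentEvent<String>> streamSse(String message) {
    return chatClient.prompt(message)
            .stream().content()
            .map(token -> ServerSentEvent.builder(token).build());
}
```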

Testing the Endpoints

# Synchronous call
curl "http://localhost:8080/chat?message=Hello"
 
# Streaming (tokens arrive progressively)
curl -N "http://localhost:8080/chat/stream?message=Tell+me+a+story"

The -N flag in curl disables buffering to see tokens arriving in real time.

D- Multimodality: when AI understands your images

The third sub-module, multimodality-chat-model, introduces multimodality support: the ability to send not only text, but also images, audio, and videos to an AI model.

Configuration

spring:
  ai:
    ollama:
      chat:
        model: qwen3-vl:2b

The qwen3-vl:2b model is a vision-language model that can analyze images and answer questions about them.

Code

@SpringBootApplication
public class MultimodalityChatApplication {
 
    public static void main(String[] args) {
        SpringApplication.run(MultimodalityChatApplication.class, args);
    }
 
    @Bean
    CommandLineRunner runnerMultimodalityChat(ChatClient.Builder chatClientBuilder) {
        return _ -> {
            var chatClient = chatClientBuilder.build();
            var response = chatClient
                    .prompt("Can you briefly explain how LLMs work?")
                    .call().content();
            System.out.println(response);
        };
    }
}

Multimodal Support with Spring AI

Spring AI offers native multimodality support through the Resource and MimeTypeUtils classes. Here's an example of a multimodal prompt with an image:

@Value("classpath:test-image.png")
private Resource imageResource;
 
var response = chatClient.prompt()
    .user(u -> u.text("Describe this image")
        .media(MimeTypeUtils.IMAGE_PNG, imageResource))
    .call()
    .content();
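The same pattern works for images supplied at runtime, for instance via a file upload. A hedged sketch (the /describe endpoint and its name are assumptions, not part of the demo project):

```java
// Accept an uploaded image and ask the vision model to describe it.
// MultipartFile.getResource() adapts the upload to the Resource the media API expects.
@PostMapping("/describe")
public String describe(@RequestParam("file") MultipartFile file) {
    return chatClient.prompt()
            .user(u -> u.text("Describe this image")
                    .media(MimeTypeUtils.parseMimeType(file.getContentType()),
                           file.getResource()))
            .call()
            .content();
}
```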

Key Points

  • Multimodality requires a compatible model (such as qwen3-vl, GPT-4o, Claude, etc.)
  • Spring AI handles image serialization and transmission to the model transparently
  • The same ChatClient is used — only the prompt changes to include media
  • Supported media types depend on the model: images (PNG, JPEG), audio, etc.

E- Portability in Action

One of Spring AI's great advantages is code portability. Look at the REST controller from the multi-chat-model module: no reference to Ollama or OpenAI in the Java code. The provider choice is entirely driven by configuration.

To switch from Ollama to OpenAI, just change the configuration:

# Before: local Ollama
spring:
  ai:
    ollama:
      chat:
        model: qwen3:0.6b
 
# After: cloud OpenAI
spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4o

The Java code remains strictly identical. That's the power of Spring AI's abstraction.
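One practical way to manage this switch is Spring profiles: keep one YAML document per provider in the same file and activate the one you need at launch. A sketch (the profile names local and cloud are arbitrary):

```yaml
# application.yaml — provider selected by profile, Java code untouched
spring:
  config:
    activate:
      on-profile: local
  ai:
    ollama:
      chat:
        model: qwen3:0.6b
---
spring:
  config:
    activate:
      on-profile: cloud
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4o
```

Start the application with --spring.profiles.active=local or --spring.profiles.active=cloud to choose the provider.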

F- Summary

| Feature | Module | API |
|---------|--------|-----|
| Simple prompt + tokens | single-chat-model | .call().chatResponse() |
| Synchronous REST call | multi-chat-model | .call().content() |
| Reactive streaming | multi-chat-model | .stream().content() returning Flux<String> |
| Multimodality | multimodality-chat-model | .user(u -> u.text().media()) |
| Multi-provider | multi-chat-model | YAML configuration only |

Conclusion

Spring AI's ChatClient is a powerful and elegant component. In just a few lines of code, we were able to:

  • Send a prompt and retrieve the response with token metadata
  • Expose a synchronous REST endpoint and a streaming endpoint
  • Use vision models for multimodality
  • Switch between providers without modifying Java code

In the next article, we will see how to give memory to our AI by managing conversational context with Spring AI.

I hope you found this article useful. Thank you for reading.

To learn more:


"Spring AI in Action" Series

  1. Introduction to Spring AI
  2. ChatClient API: Getting Started with the API
  3. Chat Memory: Conversational Context
  4. RAG: Ingestion Pipeline
  5. RAG: From Naive to Advanced
  6. Function Calling
  7. Tools + Security
  8. Multi-Agent Orchestration
  9. Model Context Protocol (MCP)