Some models expose their internal reasoning as "thinking" content -- Claude with extended thinking, OpenAI o-series models with reasoning, and others. TanStack AI captures this as ThinkingPart in messages, streamed to your UI in real-time alongside text and tool calls.
Thinking content is UI-only. It is never sent back to the model in subsequent requests.
When a model emits reasoning tokens, the adapter converts them into AG-UI STEP_STARTED and STEP_FINISHED events. The stream processor accumulates these into a single ThinkingPart on the assistant's UIMessage:
```ts
interface ThinkingPart {
  type: "thinking";
  content: string;
}
```
The ThinkingPart appears in UIMessage.parts alongside TextPart and ToolCallPart entries. Each STEP_FINISHED event carries an incremental delta and the full accumulated content, so you always have both the latest token and the complete thinking so far.
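Pulling the thinking content out of a message's parts is a simple filter. The sketch below uses pared-down stand-in types (the real UIMessage, ThinkingPart, and TextPart types come from @tanstack/ai and carry more fields):

```typescript
// Simplified stand-ins for the library's message part types.
interface ThinkingPart {
  type: "thinking";
  content: string;
}

interface TextPart {
  type: "text";
  content: string;
}

type Part = ThinkingPart | TextPart;

interface UIMessage {
  role: "assistant" | "user";
  parts: Part[];
}

// Pull the thinking content (if any) out of a message.
function getThinkingText(message: UIMessage): string | undefined {
  const part = message.parts.find(
    (p): p is ThinkingPart => p.type === "thinking",
  );
  return part?.content;
}

const message: UIMessage = {
  role: "assistant",
  parts: [
    { type: "thinking", content: "The user wants a summary..." },
    { type: "text", content: "Here is a summary:" },
  ],
};
```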
How you enable thinking depends on the provider.
With the Anthropic adapter, pass the thinking option in providerOptions. Anthropic requires an explicit budget_tokens (minimum 1024):
```ts
import { chat } from "@tanstack/ai";
import { anthropicText } from "@tanstack/ai-anthropic";

const stream = chat({
  adapter: anthropicText("claude-sonnet-4-20250514"),
  messages,
  providerOptions: {
    thinking: { type: "enabled", budget_tokens: 10000 },
  },
});
```
For Claude Opus 4.6 and later, you can use adaptive thinking, where the model decides how much to think:
```ts
const stream = chat({
  adapter: anthropicText("claude-opus-4-6-20250514"),
  messages,
  providerOptions: {
    thinking: { type: "adaptive" },
    effort: "high", // 'max' | 'high' | 'medium' | 'low'
  },
});
```
OpenAI o-series models (o1, o3, o3-mini, o3-pro) perform reasoning automatically. You can control the depth with the reasoning option:
```ts
import { chat } from "@tanstack/ai";
import { openaiText } from "@tanstack/ai-openai";

const stream = chat({
  adapter: openaiText("o3-mini"),
  messages,
  providerOptions: {
    reasoning: {
      effort: "medium", // 'low' | 'medium' | 'high'
      summary: "auto", // 'auto' | 'detailed'
    },
  },
});
```
When reasoning.summary is set, the adapter streams reasoning summary text as thinking content. Without it, reasoning tokens are still used internally but may not be surfaced depending on the model.
GPT-5 and later models also support reasoning when you set effort to any value other than "none":
```ts
const stream = chat({
  adapter: openaiText("gpt-5"),
  messages,
  providerOptions: {
    reasoning: { effort: "high" },
  },
});
```
Thinking parts appear in message.parts just like text and tool calls. A common pattern is to render them in a collapsible element so they don't dominate the UI:
```tsx
function MessageContent({ message }) {
  return (
    <div>
      {message.parts.map((part, idx) => {
        if (part.type === "thinking") {
          return (
            <details key={idx}>
              <summary>Thinking...</summary>
              <pre style={{ whiteSpace: "pre-wrap" }}>{part.content}</pre>
            </details>
          );
        }
        if (part.type === "text") {
          return <p key={idx}>{part.content}</p>;
        }
        return null;
      })}
    </div>
  );
}
```
The Quick Start guide shows a simpler inline pattern where thinking is rendered as italic text above the response.
Thinking content streams before the final text response. As reasoning tokens arrive, ThinkingPart.content accumulates token by token, the same way TextPart.content does for the response text.
The typical streaming order is:

1. ThinkingPart content accumulates as reasoning tokens arrive.
2. TextPart content (and any ToolCallParts) streams for the final response.
The stream processor handles all of this for you. If you use useChat from @tanstack/ai-react (or the Solid/Vue/Svelte equivalents), your messages array updates automatically with both thinking and text parts as they arrive.
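The accumulation the stream processor performs can be sketched as a simple fold over step events. The event shape below (delta plus full accumulated content, as described earlier) is an illustrative assumption, not the library's exact event type:

```typescript
// Illustrative event shape: each STEP_FINISHED carries the incremental
// delta and the full accumulated thinking content so far.
interface StepFinishedEvent {
  type: "STEP_FINISHED";
  delta: string;
  content: string; // full accumulated thinking so far
}

interface ThinkingPart {
  type: "thinking";
  content: string;
}

// Fold a sequence of step events into a single ThinkingPart, the way
// the stream processor accumulates reasoning tokens.
function applyThinkingEvents(events: StepFinishedEvent[]): ThinkingPart {
  return events.reduce<ThinkingPart>(
    (part, event) => ({ ...part, content: part.content + event.delta }),
    { type: "thinking", content: "" },
  );
}

const events: StepFinishedEvent[] = [
  { type: "STEP_FINISHED", delta: "First, ", content: "First, " },
  { type: "STEP_FINISHED", delta: "check the docs.", content: "First, check the docs." },
];

const thinking = applyThinkingEvents(events);
```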