TanStack AI supports multimodal content in messages, allowing you to send images, audio, video, and documents alongside text to AI models that support these modalities.
When sending messages to AI models, you can include different types of content. Multimodal messages use the ContentPart type to represent each piece of content:
import type { ContentPart, ImagePart, TextPart } from '@tanstack/ai'

// Text content
const textPart: TextPart = {
  type: 'text',
  text: 'What do you see in this image?'
}

// Image from base64 data
const imagePart: ImagePart = {
  type: 'image',
  source: {
    type: 'data',
    value: 'base64EncodedImageData...'
  },
  metadata: {
    // Provider-specific metadata
    detail: 'high' // OpenAI detail level
  }
}

// Image from URL
const imageUrlPart: ImagePart = {
  type: 'image',
  source: {
    type: 'url',
    value: 'https://example.com/image.jpg'
  }
}
Messages can have content as either a string or an array of ContentPart:
import { chat } from '@tanstack/ai'
import { OpenAI } from '@tanstack/ai-openai'

const openai = new OpenAI({ apiKey: 'your-key' })

const response = await chat({
  adapter: openai,
  model: 'gpt-4o',
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'What is in this image?' },
        {
          type: 'image',
          source: {
            type: 'url',
            value: 'https://example.com/photo.jpg'
          }
        }
      ]
    }
  ]
})
OpenAI supports images and audio in their vision and audio models:
import { OpenAI } from '@tanstack/ai-openai'

const openai = new OpenAI({ apiKey: 'your-key' })

// Image with detail level metadata
const message = {
  role: 'user',
  content: [
    { type: 'text', text: 'Describe this image' },
    {
      type: 'image',
      source: { type: 'data', value: imageBase64 },
      metadata: { detail: 'high' } // 'auto' | 'low' | 'high'
    }
  ]
}
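Audio input follows the same part structure. Below is a minimal sketch, assuming an AudioPart with base64-encoded data; the metadata field name (format) is an assumption, so check the OpenAI adapter's audio metadata types for the exact shape:

// Audio clip sent as base64 data (sketch; audioBase64 is assumed to be defined elsewhere)
const audioMessage = {
  role: 'user',
  content: [
    { type: 'text', text: 'Transcribe this audio clip' },
    {
      type: 'audio',
      source: { type: 'data', value: audioBase64 },
      metadata: { format: 'wav' } // assumed field name; verify against the adapter's metadata types
    }
  ]
}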
Anthropic's Claude models support images and PDF documents:
import { Anthropic } from '@tanstack/ai-anthropic'

const anthropic = new Anthropic({ apiKey: 'your-key' })

// Image with media type
const imageMessage = {
  role: 'user',
  content: [
    { type: 'text', text: 'What do you see?' },
    {
      type: 'image',
      source: { type: 'data', value: imageBase64 },
      metadata: { media_type: 'image/jpeg' }
    }
  ]
}

// PDF document
const docMessage = {
  role: 'user',
  content: [
    { type: 'text', text: 'Summarize this document' },
    {
      type: 'document',
      source: { type: 'data', value: pdfBase64 }
    }
  ]
}
Google's Gemini models support a wide range of modalities:
import { GeminiAdapter } from '@tanstack/ai-gemini'

const gemini = new GeminiAdapter({ apiKey: 'your-key' })

// Image with mimeType
const message = {
  role: 'user',
  content: [
    { type: 'text', text: 'Analyze this image' },
    {
      type: 'image',
      source: { type: 'data', value: imageBase64 },
      metadata: { mimeType: 'image/png' }
    }
  ]
}
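Other media parts follow the same shape. As a sketch, this assumes video and audio parts accept the same mimeType metadata shown above for images; verify against the Gemini adapter's metadata types:

// Video and audio parts (sketch; videoBase64 and audioBase64 are assumed to be defined elsewhere)
const videoPart = {
  type: 'video',
  source: { type: 'data', value: videoBase64 },
  metadata: { mimeType: 'video/mp4' } // assumed to apply to video as well
}

const audioPart = {
  type: 'audio',
  source: { type: 'data', value: audioBase64 },
  metadata: { mimeType: 'audio/mpeg' } // assumed to apply to audio as well
}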
Ollama supports images in compatible models:
import { OllamaAdapter } from '@tanstack/ai-ollama'

const ollama = new OllamaAdapter({ host: 'http://localhost:11434' })

// Image as base64
const message = {
  role: 'user',
  content: [
    { type: 'text', text: 'What is in this image?' },
    {
      type: 'image',
      source: { type: 'data', value: imageBase64 }
    }
  ]
}
Note: Ollama support varies by model. Check the specific model documentation for multimodal capabilities.
Content can be provided as either inline data or a URL:
Use type: 'data' for inline base64-encoded content:
const imagePart = {
  type: 'image',
  source: {
    type: 'data',
    value: 'iVBORw0KGgoAAAANSUhEUgAAAAUA...' // Base64 string
  }
}
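If your content starts as a file on disk, you typically base64-encode it before building the part. A minimal Node.js sketch (the file path is a placeholder):

import { readFileSync } from 'node:fs'

// Read a local image and base64-encode it for an inline data source
const imageBase64 = readFileSync('./photo.jpg').toString('base64')

const imagePart = {
  type: 'image',
  source: { type: 'data', value: imageBase64 }
}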
Use type: 'url' for content hosted at a URL:
const imagePart = {
  type: 'image',
  source: {
    type: 'url',
    value: 'https://example.com/image.jpg'
  }
}
Note: Not all providers support URL-based content for all modalities. Check provider documentation for specifics.
String content continues to work as before:
// This still works
const message = {
  role: 'user',
  content: 'Hello, world!'
}

// And this works for multimodal
const multimodalMessage = {
  role: 'user',
  content: [
    { type: 'text', text: 'Hello, world!' },
    { type: 'image', source: { type: 'url', value: '...' } }
  ]
}
The multimodal types are fully typed. Provider-specific metadata types are available:
import type {
  ContentPart,
  ImagePart,
  DocumentPart,
  AudioPart,
  VideoPart,
  TextPart
} from '@tanstack/ai'

// Provider-specific metadata types
import type { OpenAIImageMetadata } from '@tanstack/ai-openai'
import type { AnthropicImageMetadata } from '@tanstack/ai-anthropic'
import type { GeminiMediaMetadata } from '@tanstack/ai-gemini'
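These metadata types can be used to type the metadata you attach to a part. A small sketch, assuming OpenAIImageMetadata includes the detail option shown earlier; check the exported type for its full shape:

import type { OpenAIImageMetadata } from '@tanstack/ai-openai'

// Typing the metadata object catches typos like detail: 'hgih' at compile time
const imageMetadata: OpenAIImageMetadata = { detail: 'high' }

const imagePart = {
  type: 'image',
  source: { type: 'url', value: 'https://example.com/image.jpg' },
  metadata: imageMetadata
}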
When receiving messages from external sources (like request.json()), the data is typed as any, which bypasses TypeScript's type checking. Use assertMessages to restore type safety:
import { assertMessages, chat } from '@tanstack/ai'
import { openai } from '@tanstack/ai-openai'

// In an API route handler
const { messages: incomingMessages } = await request.json()

const adapter = openai()

// Assert incoming messages are compatible with gpt-4o (text + image only)
const typedMessages = assertMessages({ adapter, model: 'gpt-4o' }, incomingMessages)

// Now TypeScript will properly check any additional messages you add
const stream = chat({
  adapter,
  model: 'gpt-4o',
  messages: [
    ...typedMessages,
    // This will error if you try to add unsupported content types
    {
      role: 'user',
      content: [
        { type: 'text', text: 'What do you see?' },
        { type: 'image', source: { type: 'url', value: '...' } }
      ]
    }
  ]
})
Note: assertMessages is a type-level assertion only. It does not perform runtime validation. For runtime validation of message content, use a schema validation library like Zod.
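For example, here is a minimal Zod sketch that validates incoming multimodal messages before they reach assertMessages. The schema below only covers text and image parts and is an illustration, not the library's own schema:

import { z } from 'zod'

// Minimal, illustrative schema for text + image content parts
const contentPartSchema = z.union([
  z.object({ type: z.literal('text'), text: z.string() }),
  z.object({
    type: z.literal('image'),
    source: z.object({ type: z.enum(['data', 'url']), value: z.string() }),
    metadata: z.record(z.unknown()).optional()
  })
])

const messageSchema = z.object({
  role: z.enum(['user', 'assistant', 'system']),
  content: z.union([z.string(), z.array(contentPartSchema)])
})

// Throws if the request body doesn't match the expected shape
const incoming = z.array(messageSchema).parse((await request.json()).messages)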
Use appropriate source type: Use data for small content or when you need to include content inline. Use url for large files or when the content is already hosted.
Include metadata: Provide relevant metadata (like mimeType or detail) to help the model process the content correctly.
Check model support: Not all models support all modalities. Verify the model you're using supports the content types you want to send.
Handle errors gracefully: When a model doesn't support a particular modality, it may throw an error. Handle these cases in your application.
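For that last point, a simple try/catch sketch is shown below; openai and multimodalMessages are assumed to be defined as in the earlier examples, and the error shape is provider-dependent:

try {
  const response = await chat({
    adapter: openai,
    model: 'gpt-4o',
    messages: multimodalMessages
  })
  // ...use the response
} catch (error) {
  // The model may reject unsupported modalities; surface a useful message to the user
  console.error('Multimodal request failed:', error)
  // Optionally retry with text-only content or a different model here
}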
