TanStack AI provides support for text-to-speech generation through dedicated TTS adapters. This guide covers how to convert text into spoken audio using OpenAI and Gemini providers.
Text-to-speech (TTS) is handled by TTS adapters that follow the same tree-shakeable architecture as other adapters in TanStack AI. The TTS adapters support:
import { generateSpeech } from '@tanstack/ai'
import { openaiTTS } from '@tanstack/ai-openai'
// Create a TTS adapter (uses OPENAI_API_KEY from environment)
const adapter = openaiSpeech()
// Generate speech from text
const result = await generateSpeech({
adapter: openaiTTS('tts-1'),
text: 'Hello, welcome to TanStack AI!',
voice: 'alloy',
})
// result.audio contains base64-encoded audio data
console.log(result.format) // 'mp3'
console.log(result.contentType) // 'audio/mpeg'
import { generateSpeech } from '@tanstack/ai'
import { geminiSpeech } from '@tanstack/ai-gemini'
// Create a TTS adapter (uses GOOGLE_API_KEY from environment)
const adapter = geminiSpeech()
// Generate speech from text
const result = await generateSpeech({
adapter: geminiTTS('gemini-2.5-flash-preview-tts'),
text: 'Hello from Gemini TTS!',
})
console.log(result.audio) // Base64 encoded audio
All TTS adapters support these common options:
| Option | Type | Description |
|---|---|---|
| text | string | The text to convert to speech (required) |
| voice | string | The voice to use for generation |
| format | string | Output audio format (e.g., "mp3", "wav") |
OpenAI provides several distinct voices:
| Voice | Description |
|---|---|
| alloy | Neutral, balanced voice |
| echo | Warm, conversational voice |
| fable | Expressive, storytelling voice |
| onyx | Deep, authoritative voice |
| nova | Friendly, upbeat voice |
| shimmer | Clear, gentle voice |
| ash | Calm, measured voice |
| ballad | Melodic, flowing voice |
| coral | Bright, energetic voice |
| sage | Wise, thoughtful voice |
| verse | Poetic, rhythmic voice |
| Format | Description |
|---|---|
| mp3 | MP3 audio (default) |
| opus | Opus audio (good for streaming) |
| aac | AAC audio |
| flac | FLAC audio (lossless) |
| wav | WAV audio (uncompressed) |
| pcm | Raw PCM audio |
const result = await generateSpeech({
adapter: openaiTTS('tts-1-hd'),
text: 'High quality speech synthesis',
voice: 'nova',
format: 'mp3',
modelOptions: {
speed: 1.0, // 0.25 to 4.0
},
})
| Option | Type | Description |
|---|---|---|
| speed | number | Playback speed (0.25 to 4.0, default 1.0) |
| instructions | string | Voice style instructions (GPT-4o audio models only) |
Note: The instructions and stream_format options are only available with gpt-4o-audio-preview and gpt-4o-mini-audio-preview models, not with tts-1 or tts-1-hd.
The TTS result includes:
interface TTSResult {
id: string // Unique identifier for this generation
model: string // The model used
audio: string // Base64-encoded audio data
format: string // Audio format (e.g., "mp3")
contentType: string // MIME type (e.g., "audio/mpeg")
duration?: number // Duration in seconds (if available)
}
// Convert base64 to audio and play
function playAudio(result: TTSResult) {
const audioData = atob(result.audio)
const bytes = new Uint8Array(audioData.length)
for (let i = 0; i < audioData.length; i++) {
bytes[i] = audioData.charCodeAt(i)
}
const blob = new Blob([bytes], { type: result.contentType })
const url = URL.createObjectURL(blob)
const audio = new Audio(url)
audio.play()
// Clean up when done
audio.onended = () => URL.revokeObjectURL(url)
}
import { writeFile } from 'fs/promises'
async function saveAudio(result: TTSResult, filename: string) {
const audioBuffer = Buffer.from(result.audio, 'base64')
await writeFile(filename, audioBuffer)
console.log(`Saved to ${filename}`)
}
// Usage
const result = await generateSpeech({
adapter: openaiTTS('tts-1'),
text: 'Hello world!',
})
await saveAudio(result, 'output.mp3')
TanStack AI provides React hooks and server-side streaming helpers to build full-stack text-to-speech with minimal boilerplate.
Server — Create an API route that wraps generateSpeech as a streaming response:
// routes/api/generate/speech.ts
import { generateSpeech, toServerSentEventsResponse } from '@tanstack/ai'
import { openaiTTS } from '@tanstack/ai-openai'
import { createFileRoute } from '@tanstack/react-router'
export const Route = createFileRoute('/api/generate/speech')({
server: {
handlers: {
POST: async ({ request }) => {
const body = await request.json()
const { text, voice, format, model } = body.data
const stream = generateSpeech({
adapter: openaiTTS(model ?? 'tts-1'),
text,
voice,
format,
stream: true,
})
return toServerSentEventsResponse(stream)
},
},
},
})
Client — Use the useGenerateSpeech hook with a connection adapter:
import { useGenerateSpeech, fetchServerSentEvents } from '@tanstack/ai-react'
function SpeechGenerator() {
const { generate, result, isLoading, error } = useGenerateSpeech({
connection: fetchServerSentEvents('/api/generate/speech'),
})
const playAudio = () => {
if (!result) return
const audioData = atob(result.audio)
const bytes = new Uint8Array(audioData.length)
for (let i = 0; i < audioData.length; i++) {
bytes[i] = audioData.charCodeAt(i)
}
const blob = new Blob([bytes], { type: result.contentType })
const url = URL.createObjectURL(blob)
const audio = new Audio(url)
audio.play()
audio.onended = () => URL.revokeObjectURL(url)
}
return (
<div>
<button
onClick={() => generate({ text: 'Hello, welcome to TanStack AI!' })}
disabled={isLoading}
>
{isLoading ? 'Generating...' : 'Generate Speech'}
</button>
{error && <p>Error: {error.message}</p>}
{result && <button onClick={playAudio}>Play Audio</button>}
</div>
)
}
For non-streaming usage with TanStack Start server functions:
// lib/server-functions.ts
import { createServerFn } from '@tanstack/react-start'
import { generateSpeech } from '@tanstack/ai'
import { openaiTTS } from '@tanstack/ai-openai'
export const generateSpeechFn = createServerFn({ method: 'POST' })
.inputValidator((data: { text: string; voice?: string }) => data)
.handler(async ({ data }) => {
return generateSpeech({
adapter: openaiTTS('tts-1'),
text: data.text,
voice: data.voice,
})
})
import { useGenerateSpeech } from '@tanstack/ai-react'
import { generateSpeechFn } from '../lib/server-functions'
function SpeechGenerator() {
const { generate, result, isLoading } = useGenerateSpeech({
fetcher: (input) => generateSpeechFn({ data: input }),
})
// ... same UI as above
}
For TanStack Start server functions that stream results. The fetcher receives type-safe input and returns an SSE Response — the client parses it automatically:
// lib/server-functions.ts
import { createServerFn } from '@tanstack/react-start'
import { generateSpeech, toServerSentEventsResponse } from '@tanstack/ai'
import { openaiTTS } from '@tanstack/ai-openai'
export const generateSpeechStreamFn = createServerFn({ method: 'POST' })
.inputValidator((data: { text: string; voice?: string }) => data)
.handler(({ data }) => {
return toServerSentEventsResponse(
generateSpeech({
adapter: openaiTTS('tts-1'),
text: data.text,
voice: data.voice,
stream: true,
}),
)
})
import { useGenerateSpeech } from '@tanstack/ai-react'
import { generateSpeechStreamFn } from '../lib/server-functions'
function SpeechGenerator() {
const { generate, result, isLoading } = useGenerateSpeech({
fetcher: (input) => generateSpeechStreamFn({ data: input }),
})
// ... same UI as above
}
The useGenerateSpeech hook accepts:
| Option | Type | Description |
|---|---|---|
| connection | ConnectionAdapter | Streaming transport (SSE, HTTP stream, custom) |
| fetcher | (input) => Promise<TTSResult | Response> | Direct async function, or server function returning an SSE Response |
| onResult | (result) => TOutput | null | void | Callback when audio is generated. Optionally return a transformed value (see Result Transform) |
| onError | (error) => void | Callback on error |
| onProgress | (progress, message?) => void | Progress updates (0-100) |
And returns:
| Property | Type | Description |
|---|---|---|
| generate | (input: SpeechGenerateInput) => Promise<void> | Trigger generation |
| result | TOutput | null | The result (or transformed result), or null |
| isLoading | boolean | Whether generation is in progress |
| error | Error | undefined | Current error, if any |
| status | GenerationClientState | 'idle' | 'generating' | 'success' | 'error' |
| stop | () => void | Abort the current generation |
| reset | () => void | Clear result, error, and return to idle |
The onResult callback can optionally return a transformed value that replaces the stored result. This is useful for converting raw API responses into a more convenient format for your components.
Transform behavior:
Example: Convert base64 audio to a playable Audio element
import { useGenerateSpeech, fetchServerSentEvents } from '@tanstack/ai-react'
function SpeechPlayer() {
const { generate, result, isLoading } = useGenerateSpeech({
connection: fetchServerSentEvents('/api/generate/speech'),
onResult: (raw) => {
const audioData = atob(raw.audio)
const bytes = new Uint8Array(audioData.length)
for (let i = 0; i < audioData.length; i++) {
bytes[i] = audioData.charCodeAt(i)
}
const blob = new Blob([bytes], { type: raw.contentType })
const url = URL.createObjectURL(blob)
return {
audio: new Audio(url),
duration: raw.duration,
}
},
})
return (
<div>
<button
onClick={() => generate({ text: 'Hello world!', voice: 'alloy' })}
disabled={isLoading}
>
Generate
</button>
{result && (
<button onClick={() => result.audio.play()}>
Play Audio
</button>
)}
</div>
)
}
TypeScript automatically infers the result type from your onResult return value — no explicit generic parameter needed. In this example, result is inferred as { audio: HTMLAudioElement; duration?: number } | null, so result.audio.play() is fully type-safe.
| Model | Quality | Speed | Use Case |
|---|---|---|---|
| tts-1 | Standard | Fast | Real-time applications |
| tts-1-hd | High | Slower | Production audio |
| gpt-4o-audio-preview | Highest | Variable | Advanced voice control |
| gpt-4o-mini-audio-preview | High | Fast | Balanced quality/speed |
| Model | Status | Notes |
|---|---|---|
| gemini-2.5-flash-preview-tts | Experimental | May require Live API for full features |
try {
const result = await generateSpeech({
adapter: openaiTTS('tts-1'),
text: 'Hello!',
})
} catch (error) {
if (error.message.includes('exceeds maximum length')) {
console.error('Text is too long (max 4096 characters)')
} else if (error.message.includes('Speed must be between')) {
console.error('Invalid speed value')
} else {
console.error('TTS error:', error.message)
}
}
The TTS adapters use the same environment variables as other adapters:
For production use or when you need explicit control:
import { createOpenaiTTS } from '@tanstack/ai-openai'
import { createGeminiTTS } from '@tanstack/ai-gemini'
// OpenAI
const openaiAdapter = createOpenaiTTS('your-openai-api-key')
// Gemini
const geminiAdapter = createGeminiTTS('your-google-api-key')
Text Length: OpenAI TTS supports up to 4096 characters per request. For longer content, split into chunks.
Voice Selection: Choose voices appropriate for your content—use onyx for authoritative content, nova for friendly interactions.
Format Selection: Use mp3 for general use, opus for streaming, wav for further processing.
Caching: Cache generated audio to avoid regenerating the same content.
Error Handling: Always handle errors gracefully, especially for user-facing applications.