by Alem Tuzlak on Apr 24, 2026.

The AI audio ecosystem is a mess. Gemini's Lyria wants a natural-language prompt and returns raw PCM you have to wrap in a RIFF header yourself. Fal hosts dozens of audio models where one wants music_length_ms in milliseconds, the next wants seconds_total, and most want plain duration. ElevenLabs has its own shape. Whisper has another. Every provider disagrees on whether you get a URL, a base64 blob, or a raw buffer.
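That RIFF wrapping, for instance, is just a fixed 44-byte header prepended to the samples. A minimal sketch, assuming 16-bit little-endian PCM — the actual sample rate and channel count are whatever your model documents, and this helper is illustrative rather than part of any library:

```typescript
// Wrap raw 16-bit little-endian PCM in a standard 44-byte RIFF/WAVE header.
// Minimal sketch: sampleRate and channels depend on the model's documented output.
function pcmToWav(pcm: Uint8Array, sampleRate: number, channels: number): Uint8Array {
  const blockAlign = channels * 2          // 16-bit samples => 2 bytes each
  const byteRate = sampleRate * blockAlign
  const header = new ArrayBuffer(44)
  const v = new DataView(header)
  const ascii = (off: number, s: string) => {
    for (let i = 0; i < s.length; i++) v.setUint8(off + i, s.charCodeAt(i))
  }
  ascii(0, 'RIFF')
  v.setUint32(4, 36 + pcm.length, true)    // total size minus the first 8 bytes
  ascii(8, 'WAVE')
  ascii(12, 'fmt ')
  v.setUint32(16, 16, true)                // fmt chunk size
  v.setUint16(20, 1, true)                 // format 1 = uncompressed PCM
  v.setUint16(22, channels, true)
  v.setUint32(24, sampleRate, true)
  v.setUint32(28, byteRate, true)
  v.setUint16(32, blockAlign, true)
  v.setUint16(34, 16, true)                // bits per sample
  ascii(36, 'data')
  v.setUint32(40, pcm.length, true)
  const out = new Uint8Array(44 + pcm.length)
  out.set(new Uint8Array(header), 0)
  out.set(pcm, 44)
  return out
}
```

Exactly the kind of boilerplate an adapter should own so you never write it again.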
If you are shipping an AI product that needs music, sound effects, speech, or transcription, you end up writing the same boring glue code five times.
TanStack AI just removed that glue. The latest release lands a full audio stack: a new generateAudio activity, streaming support, fal and Gemini Lyria adapters, and framework hooks for React, Solid, Vue, and Svelte. One typed API, any provider.
Here is what shipped and why you should care.
The new generateAudio() activity sits alongside generateImage, generateSpeech, generateVideo, and generateTranscription in @tanstack/ai. It takes a text prompt, dispatches to whatever adapter you hand it, and returns a GeneratedAudio object with exactly one of url or b64Json.
```ts
import { generateAudio } from '@tanstack/ai'
import { geminiAudio } from '@tanstack/ai-gemini/adapters'

const adapter = geminiAudio('lyria-3-pro-preview')

const result = await generateAudio({
  adapter,
  prompt: 'A cinematic orchestral piece with a rising string motif',
})

// result.audio is { url: string } | { b64Json: string } — exactly one, enforced by the type
```

Swap geminiAudio for falAudio and the exact same call generates music through MiniMax, DiffRhythm, Stable Audio 2.5, or any of the other models in fal's catalog. The adapter translates per-model details (like fal's music_length_ms vs seconds_total vs duration naming) so your app code never sees them.
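Because the result is a union, narrowing with the `in` operator handles both arms in one place. A sketch — the helper and the audio/wav MIME assumption are illustrative, not part of the library:

```typescript
// Narrow the { url } | { b64Json } union before touching either field.
// The type mirrors the shape described above; toPlayableSrc is a hypothetical helper.
type GeneratedAudioPayload = { url: string } | { b64Json: string }

function toPlayableSrc(audio: GeneratedAudioPayload): string {
  if ('url' in audio) {
    return audio.url // provider hosted the file for us
  }
  // base64 payload: hand it to <audio> as a data URI
  // (container format varies by model; audio/wav is an assumption here)
  return `data:audio/wav;base64,${audio.b64Json}`
}
```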
Music and SFX generation is slow. Lyria 3 Pro takes several seconds. Stable Audio takes longer. If you are building a UI, blocking the request the whole time is a bad experience.
generateAudio now supports stream: true, returning an AsyncIterable<StreamChunk> you can pipe straight through toServerSentEventsResponse():
```ts
export async function POST(req: Request) {
  const { prompt } = await req.json()

  const stream = await generateAudio({
    adapter: falAudio('fal-ai/minimax-music/v2.6'),
    prompt,
    stream: true,
  })

  return toServerSentEventsResponse(stream)
}
```

The client receives progress events and the final audio over a single SSE connection, the same transport model already used by generateImage and generateVideo. No new infrastructure, no special-case code paths.
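The framework hooks below consume that stream for you, but the wire format is plain SSE, so nothing stops you from reading it by hand. A minimal sketch, assuming `data:`-prefixed JSON events separated by blank lines (the event payload shapes are not specified here):

```typescript
// Hand-rolled SSE reader: splits the byte stream on blank lines and
// parses each `data:` field as JSON. Event shapes are an assumption.
async function readAudioStream(res: Response, onEvent: (e: unknown) => void) {
  const reader = res.body!.getReader()
  const decoder = new TextDecoder()
  let buffer = ''
  while (true) {
    const { done, value } = await reader.read()
    if (done) break
    buffer += decoder.decode(value, { stream: true })
    let idx: number
    // SSE events are separated by a blank line
    while ((idx = buffer.indexOf('\n\n')) !== -1) {
      const rawEvent = buffer.slice(0, idx)
      buffer = buffer.slice(idx + 2)
      for (const line of rawEvent.split('\n')) {
        if (line.startsWith('data:')) onEvent(JSON.parse(line.slice(5).trim()))
      }
    }
  }
}
```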
Every framework integration gets a new hook matching the existing media-hook shape; the API is identical to useGenerateImage and friends:
```tsx
import { useGenerateAudio } from '@tanstack/ai-react'

function MusicGen() {
  const { generate, result, isLoading, error, stop, reset } = useGenerateAudio({
    connection,
  })

  return (
    <>
      <button onClick={() => generate({ prompt: 'Lo-fi hip-hop beat' })}>
        Generate
      </button>
      {isLoading && <button onClick={stop}>Stop</button>}
      {result?.audio.url && <audio src={result.audio.url} controls />}
    </>
  )
}
```

Both connection (SSE) and fetcher (plain HTTP) transports are supported, so this works with TanStack Start, Next.js, Remix, or any backend you already have.
Gemini gets two new entry points, and fal gets three tree-shakeable adapters.
All four follow the tree-shakeable subpath-import pattern, so your bundle only grows by the adapters you actually import.
The new activity is live in @tanstack/ai and the two provider packages:
```sh
pnpm add @tanstack/ai @tanstack/ai-fal
# or
pnpm add @tanstack/ai @tanstack/ai-gemini
```

Then open the audio generation guide for the full adapter matrix, or pull the ts-react-chat example to see working TTS and transcription tabs plus a /generations/audio route covering Lyria and fal side by side.
Star TanStack AI on GitHub if you want to see where this goes next.