The Soniox adapter provides access to Soniox transcription models.
```bash
npm install @soniox/tanstack-ai-adapter
```
Set `SONIOX_API_KEY` in your environment or pass `apiKey` when creating the adapter. Get your API key from the Soniox Console.
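For example, you can export the variable in your shell before starting your app (the value below is a placeholder — use your own key from the console):

```shell
# Make the key available to the app's process
export SONIOX_API_KEY="your-api-key"
```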
```typescript
import { generateTranscription } from "@tanstack/ai";
import { sonioxTranscription } from "@soniox/tanstack-ai-adapter";

const result = await generateTranscription({
  adapter: sonioxTranscription("stt-async-v3"),
  audio: audioFile,
  modelOptions: {
    enableLanguageIdentification: true,
    enableSpeakerDiarization: true,
  },
});

console.log(result.text);
console.log(result.segments);
```
To pass the API key explicitly instead of reading it from the environment:

```typescript
import { generateTranscription } from "@tanstack/ai";
import { createSonioxTranscription } from "@soniox/tanstack-ai-adapter";

const adapter = createSonioxTranscription(
  "stt-async-v3",
  process.env.SONIOX_API_KEY!,
);

const result = await generateTranscription({
  adapter,
  audio: audioFile,
});
```
Use `createSonioxTranscription` to customize the adapter instance:
```typescript
import { createSonioxTranscription } from "@soniox/tanstack-ai-adapter";

const adapter = createSonioxTranscription("stt-async-v3", process.env.SONIOX_API_KEY!, {
  baseUrl: "https://api.soniox.com",
  pollingIntervalMs: 1000,
  timeout: 180000,
});
```
Options: `baseUrl` sets the API endpoint (see the Soniox regional endpoints if you need data residency), `pollingIntervalMs` controls how often the adapter polls for job completion, and `timeout` caps the total wait in milliseconds.
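Async models are polled until the transcription job finishes. A minimal sketch of what such a polling loop might look like, given `pollingIntervalMs` and `timeout` (the `getJobStatus` callback is hypothetical — the real adapter calls the Soniox API internally):

```typescript
type JobStatus = "queued" | "processing" | "completed" | "error";

// Poll a status callback until the job finishes or the deadline passes.
async function pollUntilDone(
  getJobStatus: () => Promise<JobStatus>,
  pollingIntervalMs: number,
  timeoutMs: number,
): Promise<JobStatus> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const status = await getJobStatus();
    if (status === "completed" || status === "error") return status;
    // Wait before asking again, so we don't hammer the API.
    await new Promise((resolve) => setTimeout(resolve, pollingIntervalMs));
  }
  throw new Error(`Transcription did not finish within ${timeoutMs} ms`);
}
```

A shorter `pollingIntervalMs` surfaces results sooner at the cost of more requests; `timeout` bounds how long a stuck job can block your code.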
Per-request options are passed via `modelOptions`:
```typescript
const result = await generateTranscription({
  adapter: sonioxTranscription("stt-async-v3"),
  audio,
  modelOptions: {
    languageHints: ["en", "es"],
    enableLanguageIdentification: true,
    enableSpeakerDiarization: true,
    context: {
      terms: ["Soniox", "TanStack"],
    },
  },
});
```
For the full list of available options, check the Soniox API reference.
Soniox automatically detects and transcribes speech in 60+ languages. When you know which languages are likely to appear in your audio, provide `languageHints` to improve accuracy by biasing recognition toward those languages.

Language hints do not restrict recognition. If you pass the TanStack `language` option, this adapter merges it into `languageHints`.
```typescript
const result = await generateTranscription({
  adapter: sonioxTranscription("stt-async-v3"),
  audio,
  modelOptions: {
    languageHints: ["en", "es"],
  },
});
```
For more details, see the Soniox language hints documentation.
Provide custom context to improve transcription and translation accuracy. Context helps the model understand your domain, recognize important terms, and apply custom vocabulary.

The `context` object supports four optional sections:
```typescript
const result = await generateTranscription({
  adapter: sonioxTranscription("stt-async-v3"),
  audio,
  modelOptions: {
    context: {
      general: [
        { key: "domain", value: "Healthcare" },
        { key: "topic", value: "Diabetes management consultation" },
        { key: "doctor", value: "Dr. Martha Smith" },
      ],
      text: "The patient has a history of...",
      terms: ["Celebrex", "Zyrtec", "Xanax"],
      translationTerms: [
        { source: "Mr. Smith", target: "Sr. Smith" },
        { source: "MRI", target: "RM" },
      ],
    },
  },
});
```
For more details, see the Soniox context documentation.
Configure translation for your transcriptions:
```typescript
const result = await generateTranscription({
  adapter: sonioxTranscription("stt-async-v3"),
  audio,
  modelOptions: {
    translation: {
      type: "one_way",
      targetLanguage: "es",
    },
  },
});
```
For two-way translation:
```typescript
const result = await generateTranscription({
  adapter: sonioxTranscription("stt-async-v3"),
  audio,
  modelOptions: {
    translation: {
      type: "two_way",
      languageA: "en",
      languageB: "es",
    },
  },
});
```
When translation is enabled, the API returns both transcription tokens and translation tokens. The `segments` array only ever includes transcription tokens; to access translation tokens, use `providerMetadata` and filter by `translation_status === "translation"`.
When using translation or working with multilingual audio, you may need access to raw tokens with per-token language information and translation status. The adapter attaches a non-standard `providerMetadata` field at runtime:
```typescript
const result = await generateTranscription({
  adapter: sonioxTranscription("stt-async-v3"),
  audio,
  modelOptions: {
    translation: { type: "one_way", targetLanguage: "es" },
  },
});

// providerMetadata is not part of the standard result type, so cast first.
const rawTokens = (result as any).providerMetadata?.soniox?.tokens;

if (rawTokens) {
  rawTokens.forEach((token: any) => {
    // token.text               - token text
    // token.start_ms           - start time in milliseconds
    // token.end_ms             - end time in milliseconds
    // token.language           - detected language for this token
    // token.translation_status - translation status (if translation is enabled)
    // token.speaker            - speaker identifier
    // token.confidence         - confidence score
  });
}
```
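Building on the raw tokens above, separating transcription tokens from translation tokens is a straightforward filter. A sketch, with the token shape assumed from the fields listed in the comments (only the `"translation"` status value is confirmed by the text above):

```typescript
interface SonioxToken {
  text: string;
  start_ms: number;
  end_ms: number;
  language?: string;
  translation_status?: string;
  speaker?: string;
  confidence?: number;
}

// Split tokens into the original transcription and its translation.
function splitByTranslationStatus(tokens: SonioxToken[]) {
  return {
    transcription: tokens.filter((t) => t.translation_status !== "translation"),
    translation: tokens.filter((t) => t.translation_status === "translation"),
  };
}
```

Joining each group's `text` fields then yields the transcript and its translation as two separate strings.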