Async pipeline (delayed service)
core-dubbing is a delayed service (is_delayed: true). A POST returns 202 with a gateway job_id, and the worker handles the upstream POST and status polling. Use the standard /v2/jobs/{job_id} endpoint for status.
Turn transcribed segments into time-aligned dubbed audio (async, gateway-job pattern)
Send a transcription (array of segments with start, end, text), pick a target_language, provider (elevenlabs / minimax) and gender for the voice. The service stitches segments into natural sentences (optional), keeps timing within the given ranges, and produces a single dubbed audio file. `core-dubbing` is an **async/delayed** service: the gateway returns 202 + a gateway job_id immediately and processes the upstream call in the worker. Poll `GET /v2/jobs/{job_id}` for status. The result is delivered when complete (or via your configured webhook). Available languages and genders per provider come from the free `core-dubbing-languages` endpoint. The gateway also exposes `core-dubbing-status` for direct upstream-job inspection — but most clients should rely on `/v2/jobs/{job_id}` instead.
elevenlabs (13 languages including Turkish) and minimax (13 languages including Arabic and Cantonese). Both support female/male per language. See core-dubbing-languages.
use_stitching merges short segments into natural sentences for better prosody. use_precise_timing keeps each segment locked to its (start, end) window.
Optional voice_settings (stability, similarity_boost, etc.) — same shape as core-tts; provider-dependent.
Pass clone_from_audio or clone_from_video to clone the voice and dub with it in one go (same body fields as core-tts). For dedicated clone+dub flows, see Voice Clone Dubbing.
Time-aligned dubs of an existing transcript into another language.
Transcribe once, dub into multiple languages with the same segment boundaries — keeps sync with on-screen visuals.
Re-render scripts in different locales while keeping timings intact.
Same content in multiple languages and voices (female/male per language) from a single transcript.
Input
transcription (array of {start,end,text} segments) + target_language + provider + gender + optional flags/settings
Output
Initial 202 + gateway job_id; eventual result_data carries the dubbed audio URL + duration + metadata
Prerequisites
List supported languages and gender options per provider. Use these to pick provider, target_language, gender.
GET /v1/proxy/core-dubbing-languages
Response
{
"status": "success",
"data": {
"providers": {
"elevenlabs": {
"languages": ["dutch", "english", "french", "german", "hindi", "indonesian", "italian", "japanese", "korean", "mandarin", "portuguese", "spanish", "turkish"],
"genders_by_language": {
"english": ["female", "male"],
"turkish": ["female", "male"]
}
},
"minimax": {
"languages": ["arabic", "cantonese", "english", "french", "german", "indonesian", "italian", "japanese", "korean", "mandarin", "portuguese", "spanish", "turkish"],
"genders_by_language": {
"english": ["female", "male"],
"turkish": ["female", "male"]
}
}
}
}
}
OpenAI is NOT a dubbing provider; only elevenlabs and minimax are supported. Cantonese and Arabic are minimax-only; Hindi and Dutch are elevenlabs-only.
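The provider choice described above can be derived from the languages response rather than hard-coded. The following is a minimal sketch, not part of the gateway: `providers_for` is a hypothetical helper, and `PROVIDERS` is a copy of the `data.providers` object from the example response. Because the example lists `genders_by_language` for only two languages, the sketch assumes a missing entry means "unknown" rather than "unsupported".

```python
# Copy of "data.providers" from the example core-dubbing-languages response.
PROVIDERS = {
    "elevenlabs": {
        "languages": ["dutch", "english", "french", "german", "hindi",
                      "indonesian", "italian", "japanese", "korean",
                      "mandarin", "portuguese", "spanish", "turkish"],
        "genders_by_language": {
            "english": ["female", "male"],
            "turkish": ["female", "male"],
        },
    },
    "minimax": {
        "languages": ["arabic", "cantonese", "english", "french", "german",
                      "indonesian", "italian", "japanese", "korean",
                      "mandarin", "portuguese", "spanish", "turkish"],
        "genders_by_language": {
            "english": ["female", "male"],
            "turkish": ["female", "male"],
        },
    },
}


def providers_for(providers, language, gender=None):
    """Return the sorted provider names that list `language` (hypothetical helper)."""
    matches = []
    for name, info in providers.items():
        if language not in info.get("languages", []):
            continue
        if gender is not None:
            genders = info.get("genders_by_language", {}).get(language)
            # Assumption: no gender entry means "unknown", so do not exclude.
            if genders is not None and gender not in genders:
                continue
        matches.append(name)
    return sorted(matches)


print(providers_for(PROVIDERS, "cantonese"))
print(providers_for(PROVIDERS, "turkish", "female"))
```

A language that neither provider lists yields an empty list, which a client can treat as "not dubbable" before submitting a paid job.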
POST the transcription with target language and gender. Response is 202 + gateway job_id. Gateway worker handles the upstream pipeline.
{
"transcription": [
{ "start": 0.0, "end": 2.5, "text": "Hello and welcome to our platform." },
{ "start": 2.6, "end": 5.0, "text": "Today we will show you how to get started." }
],
"target_language": "turkish",
"provider": "elevenlabs",
"gender": "female",
"use_stitching": true,
"use_precise_timing": false,
"voice_settings": { "stability": 0.85, "similarity_boost": 0.85 }
}
Response
{
"status": "accepted",
"message": "Job queued for processing",
"job": {
"id": "550e8400-e29b-41d4-a716-446655440000",
"status": "queued",
"service": "core-dubbing",
"created_at": "2026-04-27T10:30:00Z"
}
}
Save the gateway job.id and poll GET /v2/jobs/{job_id} until status=completed.
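The request body and the 202 response above can be handled with two small helpers. This is a sketch under stated assumptions: `build_dubbing_request` and `extract_job_id` are hypothetical names, and the checks on segments (start before end, no overlap, non-empty text) are client-side sanity checks chosen for illustration, not rules documented by the service. Field names match the example body.

```python
def build_dubbing_request(segments, target_language, provider, gender,
                          use_stitching=True, use_precise_timing=False,
                          voice_settings=None):
    """Assemble a core-dubbing request body (hypothetical helper)."""
    if provider not in ("elevenlabs", "minimax"):
        raise ValueError(f"unsupported provider: {provider!r}")

    previous_end = 0.0
    for index, segment in enumerate(segments):
        start, end, text = segment["start"], segment["end"], segment["text"]
        # Assumed sanity checks; the service may accept or reject other shapes.
        if not 0 <= start < end:
            raise ValueError(f"segment {index}: start must be >= 0 and < end")
        if start < previous_end:
            raise ValueError(f"segment {index}: overlaps the previous segment")
        if not text.strip():
            raise ValueError(f"segment {index}: text is empty")
        previous_end = end

    body = {
        "transcription": [dict(segment) for segment in segments],
        "target_language": target_language,
        "provider": provider,
        "gender": gender,
        "use_stitching": use_stitching,
        "use_precise_timing": use_precise_timing,
    }
    if voice_settings:
        body["voice_settings"] = dict(voice_settings)
    return body


def extract_job_id(accepted_response):
    """Pull the gateway job id out of the 202 response body."""
    return accepted_response["job"]["id"]


body = build_dubbing_request(
    [
        {"start": 0.0, "end": 2.5, "text": "Hello and welcome to our platform."},
        {"start": 2.6, "end": 5.0, "text": "Today we will show you how to get started."},
    ],
    target_language="turkish",
    provider="elevenlabs",
    gender="female",
    voice_settings={"stability": 0.85, "similarity_boost": 0.85},
)
```

The resulting `body` would be sent as JSON to `/v1/proxy/core-dubbing`; the base URL and authentication are not shown in this section, so they are left out here.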
Standard async-job polling. The gateway worker runs the upstream POST and the upstream's own status polling internally — you only see the gateway job.
GET /v2/jobs/{job_id}
Response
{
"id": "550e8400-...",
"status": "completed",
"service_name": "core-dubbing",
"created_at": "2026-04-27T10:30:00Z",
"completed_at": "2026-04-27T10:32:15Z",
"result_data": {
"status": "success",
"audio_url": "https://s3.example.com/.../dubbed.mp3",
"duration_seconds": 5.0,
"target_language": "turkish",
"provider": "elevenlabs"
},
"units_consumed": 5.0,
"token_cost": 0.25
}
Field shapes within result_data depend on the upstream; the example shows the typical fields. Billing: 5s × $0.05 = $0.25 (see cost.tokens / token_cost).
Async dubbing job. Returns 202 + gateway job_id. The worker calls the upstream and polls until the dubbed audio is ready. Billed per second of generated dubbed audio at $0.05/s.
/v1/proxy/core-dubbing
Direct upstream-job status (advanced/debug). Most clients should use the standard /v2/jobs/{job_id} endpoint instead — the gateway worker manages upstream polling automatically.
/v1/proxy/core-dubbing-status/{job_id}
Browse supported languages + gender options per dubbing provider (elevenlabs and minimax).
/v1/proxy/core-dubbing-languages
Per second of generated dubbed audio. Status polling via the standard jobs endpoint and language listing are free.
| Service | Unit | Price |
|---|---|---|
| Dubbing | second | $0.05/second |
| Voice-Clone Dubbing | second | $0.08/second (see voice-clone-dubbing) |
| Languages / Status | item | Free |
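The per-second prices in the table translate into a simple estimate. The sketch below uses the two rates from the table; `estimate_cost` and the service keys are hypothetical names, and any rounding the gateway applies is not specified here, so none is applied.

```python
from decimal import Decimal

# Rates from the pricing table, in dollars per second of generated audio.
PRICE_PER_SECOND = {
    "dubbing": Decimal("0.05"),
    "voice-clone-dubbing": Decimal("0.08"),
}


def estimate_cost(duration_seconds, service="dubbing"):
    """Return the estimated charge for a dub of the given length (hypothetical helper)."""
    # Decimal(str(...)) avoids binary floating-point artifacts in money math.
    return Decimal(str(duration_seconds)) * PRICE_PER_SECOND[service]


example_cost = estimate_cost(5.0)  # matches the 5s x $0.05 = $0.25 example
clone_cost = estimate_cost(135, "voice-clone-dubbing")
```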
A: Typically 1-5 minutes depending on total transcription length. Async — poll GET /v2/jobs/{job_id} for status.
A: OpenAI doesn't expose a dubbing API. Only elevenlabs and minimax are supported here.
A: Almost never — the gateway's /v2/jobs/{job_id} is the canonical job endpoint. core-dubbing-status takes the UPSTREAM's job_id (which clients don't see in the normal flow); it's mostly for debugging.
A: No — as of 2026-04-27, the worker detects upstream {status:'error'} envelopes and HTTP 4xx/5xx, marks the job FAILED, and skips billing. See worker.py:_complete_job.
A: Two options: (a) pass clone_from_audio / clone_from_video on this endpoint per call, or (b) use the dedicated Voice Clone Dubbing service (core-dubbing-clone).
1.1 (2026-04-27)
1.0 (2026-01-26)