Dubbing

Turn transcribed segments into time-aligned dubbed audio (async, gateway-job pattern)

Send a transcription (an array of segments with start, end, and text) and choose a target_language, a provider (elevenlabs or minimax), and a gender for the voice. The service can optionally stitch segments into natural sentences, keeps timing within the given ranges, and produces a single dubbed audio file. `core-dubbing` is an **async/delayed** service: the gateway immediately returns 202 with a gateway job_id and makes the upstream call in its worker. Poll `GET /v2/jobs/{job_id}` for status; the result is available there once the job completes, and is also sent to your webhook if you have configured one. Available languages and genders per provider come from the free `core-dubbing-languages` endpoint. The gateway also exposes `core-dubbing-status` for direct inspection of upstream jobs, but most clients should rely on `/v2/jobs/{job_id}` instead.

Tags: audio, dubbing, tts, transcription, async

Overview

Features

Async pipeline (delayed service)

core-dubbing is configured with is_delayed:true. POST returns 202 and a gateway job_id; the worker makes the upstream POST and polls the upstream status. Read status from the standard /v2/jobs/{job_id} endpoint.

Two providers

elevenlabs (13 languages, including Hindi and Dutch, which minimax does not offer) and minimax (13 languages, including Arabic and Cantonese, which elevenlabs does not offer). Both support female/male voices per language. See core-dubbing-languages.

Stitching & precise timing

use_stitching merges short segments into natural sentences for better prosody. use_precise_timing keeps each segment locked to its (start, end) window.

Voice settings

Optional voice_settings (stability, similarity_boost, etc.) use the same shape as core-tts; which keys are honored depends on the provider.

Per-call voice cloning

Pass clone_from_audio or clone_from_video to clone the voice and dub with it in one go (same body fields as core-tts). For dedicated clone+dub flows, see Voice Clone Dubbing.

Use Cases

Localized narration

Time-aligned dubs of an existing transcript into another language.

E-learning + explainers

Transcribe once, dub into multiple languages with the same segment boundaries — keeps sync with on-screen visuals.

Short-form / social repurposing

Re-render scripts in different locales while keeping timings intact.

Accessibility / multilingual

Same content in multiple languages and voices (female/male per language) from a single transcript.

Input / Output

Input

transcription (array of {start,end,text} segments) + target_language + provider + gender + optional flags/settings

JSON body

Output

Initial 202 + gateway job_id; eventual result_data carries the dubbed audio URL + duration + metadata

JSON (job envelope); audio URL (in result_data)
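Before submitting, it can help to check the transcription input described above on the client side. The sketch below is a hypothetical helper, `validate_transcription`; the rules it enforces (numeric start/end, non-empty text, start < end, sorted and non-overlapping segments) are assumptions about what a well-formed transcription looks like, not the service's documented validation.

```python
from typing import Any


def validate_transcription(segments: list[dict[str, Any]]) -> list[str]:
    """Return a list of problems found in a transcription (empty list means OK).

    Assumed rules (hypothetical client-side checks, not the service's own
    validation): numeric start/end, non-empty text, start < end, and segments
    sorted in time without overlapping.
    """
    problems: list[str] = []
    prev_end = None
    for i, seg in enumerate(segments):
        start, end, text = seg.get("start"), seg.get("end"), seg.get("text")
        if not isinstance(start, (int, float)) or not isinstance(end, (int, float)):
            problems.append(f"segment {i}: start/end must be numbers")
            continue  # cannot check timing without numeric bounds
        if not isinstance(text, str) or not text.strip():
            problems.append(f"segment {i}: text must be a non-empty string")
        if start >= end:
            problems.append(f"segment {i}: start ({start}) must be < end ({end})")
        if prev_end is not None and start < prev_end:
            problems.append(
                f"segment {i}: overlaps previous segment "
                f"(starts at {start}, previous ends at {prev_end})"
            )
        prev_end = end
    return problems
```

For the two-segment example in the Quickstart (0.0 to 2.5 and 2.6 to 5.0), the helper returns an empty list.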

Specs

Latency: async; typically 1-5 minutes depending on segment count and total duration
Async: true
Rate limit: 60 req/min per API key
Max input: transcription size and segment count depend on provider limits
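To stay under the 60 req/min per API key limit when submitting many jobs, a client can throttle itself. The sketch below is a hypothetical sliding-window limiter (`SlidingWindowLimiter` is an illustrative name); how the gateway actually counts requests is not documented here, so treat the window logic as an assumption.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Client-side throttle allowing at most `limit` calls per `window` seconds.

    Assumption: the gateway counts requests over a rolling 60-second window.
    The clock is injectable so the limiter can be tested without sleeping.
    """

    def __init__(self, limit: int = 60, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self.calls: deque = deque()  # timestamps of recent calls, oldest first

    def wait_time(self) -> float:
        """Seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Forget calls that have fallen out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.limit:
            return 0.0
        # The oldest call in the window must expire before another is allowed.
        return self.window - (now - self.calls[0])

    def record(self) -> None:
        """Record that a call was made now."""
        self.calls.append(self.clock())
```

Typical use before each request: `time.sleep(limiter.wait_time())`, then `limiter.record()`, then send.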

Quickstart

Prerequisites

  • -A CN8 Gateway API key with core-dubbing in allowed_services
  • -A transcription: array of segments with start, end, text

1. Browse languages and genders (free)

core-dubbing-languages

List supported languages and gender options per provider. Use these to pick provider, target_language, gender.

GET /v1/proxy/core-dubbing-languages

Response

{
  "status": "success",
  "data": {
    "providers": {
      "elevenlabs": {
        "languages": ["dutch", "english", "french", "german", "hindi", "indonesian", "italian", "japanese", "korean", "mandarin", "portuguese", "spanish", "turkish"],
        "genders_by_language": {
          "english": ["female", "male"],
          "turkish": ["female", "male"]
        }
      },
      "minimax": {
        "languages": ["arabic", "cantonese", "english", "french", "german", "indonesian", "italian", "japanese", "korean", "mandarin", "portuguese", "spanish", "turkish"],
        "genders_by_language": {
          "english": ["female", "male"],
          "turkish": ["female", "male"]
        }
      }
    }
  }
}

OpenAI is NOT a dubbing provider — only elevenlabs and minimax. Cantonese / Arabic are minimax-only; Hindi / Dutch are elevenlabs-only.
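The languages response can be used to pick a provider programmatically. The sketch below is a hypothetical helper, `providers_for`, that reads `data.providers.{provider}.languages` and `genders_by_language` from the response shape shown above. Because the example response lists genders for only some languages, the helper conservatively excludes a provider when a requested gender is not listed for that language; that policy is an assumption.

```python
from typing import Any, Optional


def providers_for(languages_response: dict[str, Any], language: str,
                  gender: Optional[str] = None) -> list[str]:
    """Return (sorted) provider names that list `language`.

    If `gender` is given, also require it in genders_by_language[language];
    a language missing from genders_by_language is treated as unknown and
    excluded (a conservative assumption).
    """
    providers = languages_response.get("data", {}).get("providers", {})
    matches = []
    for name, info in providers.items():
        if language not in info.get("languages", []):
            continue
        if gender is not None:
            genders = info.get("genders_by_language", {}).get(language, [])
            if gender not in genders:
                continue
        matches.append(name)
    return sorted(matches)
```

With the example response above, `providers_for(resp, "turkish", "female")` returns both providers, while `providers_for(resp, "cantonese")` returns only `minimax`.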

2. Submit a dubbing job (async)

core-dubbing

POST the transcription with target_language, provider, and gender. The response is 202 with a gateway job_id; the gateway worker handles the upstream pipeline.

POST /v1/proxy/core-dubbing
{
  "transcription": [
    { "start": 0.0, "end": 2.5, "text": "Hello and welcome to our platform." },
    { "start": 2.6, "end": 5.0, "text": "Today we will show you how to get started." }
  ],
  "target_language": "turkish",
  "provider": "elevenlabs",
  "gender": "female",
  "use_stitching": true,
  "use_precise_timing": false,
  "voice_settings": { "stability": 0.85, "similarity_boost": 0.85 }
}

Response

{
  "status": "accepted",
  "message": "Job queued for processing",
  "job": {
    "id": "550e8400-e29b-41d4-a716-446655440000",
    "status": "queued",
    "service": "core-dubbing",
    "created_at": "2026-04-27T10:30:00Z"
  }
}

Save the gateway job.id and poll GET /v2/jobs/{job_id} until status=completed.
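A minimal Python sketch of this step using only the standard library. `BASE_URL`, `API_KEY`, and the `Authorization: Bearer` header are assumptions (this page does not document the gateway host or auth header format), and `build_dubbing_request` / `submit_job` are hypothetical helper names. The body fields mirror the example request above.

```python
import json
import urllib.request
from typing import Any, Optional

# Hypothetical placeholders: replace with your gateway host and API key.
BASE_URL = "https://gateway.example.com"
API_KEY = "YOUR_API_KEY"


def build_dubbing_request(transcription: list[dict[str, Any]], target_language: str,
                          provider: str, gender: str,
                          use_stitching: bool = True,
                          use_precise_timing: bool = False,
                          voice_settings: Optional[dict[str, float]] = None,
                          ) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/proxy/core-dubbing request."""
    body: dict[str, Any] = {
        "transcription": transcription,
        "target_language": target_language,
        "provider": provider,
        "gender": gender,
        "use_stitching": use_stitching,
        "use_precise_timing": use_precise_timing,
    }
    if voice_settings is not None:
        body["voice_settings"] = voice_settings
    return urllib.request.Request(
        f"{BASE_URL}/v1/proxy/core-dubbing",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",  # assumed auth scheme
        },
        method="POST",
    )


def submit_job(req: urllib.request.Request) -> str:
    """Send the request and return the gateway job id from the 202 envelope."""
    with urllib.request.urlopen(req) as resp:
        if resp.status != 202:
            raise RuntimeError(f"expected 202, got {resp.status}")
        payload = json.load(resp)
    return payload["job"]["id"]
```

Keeping request construction separate from sending makes the body easy to inspect or log before any network call is made.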

3. Poll for status

Standard async-job polling. The gateway worker runs the upstream POST and the upstream's own status polling internally — you only see the gateway job.

GET /v2/jobs/{job_id}

Response

{
  "id": "550e8400-...",
  "status": "completed",
  "service_name": "core-dubbing",
  "created_at": "2026-04-27T10:30:00Z",
  "completed_at": "2026-04-27T10:32:15Z",
  "result_data": {
    "status": "success",
    "audio_url": "https://s3.example.com/.../dubbed.mp3",
    "duration_seconds": 5.0,
    "target_language": "turkish",
    "provider": "elevenlabs"
  },
  "units_consumed": 5.0,
  "token_cost": 0.25
}

Field shapes within result_data depend on the upstream; the example shows the typical fields. Billing for this example: units_consumed (5.0 seconds of dubbed audio) × $0.05/second = $0.25, reported as token_cost (see also cost.tokens).
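A polling loop for `GET /v2/jobs/{job_id}` can be sketched as follows. `poll_job` takes an injectable `fetch` function so the loop can be tested without a network; `http_fetch` is a hypothetical fetcher whose base URL and Bearer header are assumptions. The terminal statuses are also an assumption: this page shows `completed` and says failed jobs are marked FAILED, so statuses are compared case-insensitively.

```python
import json
import time
import urllib.request
from typing import Any, Callable

# Assumed terminal statuses (compared in lowercase).
TERMINAL = {"completed", "failed"}


def poll_job(fetch: Callable[[str], dict[str, Any]], job_id: str,
             interval: float = 10.0, timeout: float = 900.0,
             sleep: Callable[[float], None] = time.sleep) -> dict[str, Any]:
    """Call fetch(job_id) until the job is terminal; raise TimeoutError otherwise."""
    waited = 0.0
    while True:
        job = fetch(job_id)
        if str(job.get("status", "")).lower() in TERMINAL:
            return job
        if waited >= timeout:
            raise TimeoutError(
                f"job {job_id} still {job.get('status')!r} after {waited:.0f}s")
        sleep(interval)
        waited += interval


def http_fetch(job_id: str, base_url: str = "https://gateway.example.com",
               api_key: str = "YOUR_API_KEY") -> dict[str, Any]:
    """Hypothetical fetcher for GET /v2/jobs/{job_id} (host and auth assumed)."""
    req = urllib.request.Request(f"{base_url}/v2/jobs/{job_id}",
                                 headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With jobs typically finishing in 1-5 minutes, a 10-second interval and a 15-minute timeout are reasonable starting points; adjust both to your workload.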

Create Dubbing

POST (async)

Async dubbing job. Returns 202 + gateway job_id. The worker calls the upstream and polls until the dubbed audio is ready. Billed per second of generated dubbed audio at $0.05/s.

/v1/proxy/core-dubbing

Dubbing Status (upstream-job direct)

GET (sync)

Direct upstream-job status (advanced/debug). Most clients should use the standard /v2/jobs/{job_id} endpoint instead — the gateway worker manages upstream polling automatically.

/v1/proxy/core-dubbing-status/{job_id}

List Dubbing Languages

GET (sync)

Browse supported languages + gender options per dubbing provider (elevenlabs and minimax).

/v1/proxy/core-dubbing-languages

Pricing

Per second of generated dubbed audio. Status polling via the standard jobs endpoint and language listing are free.

Service (unit): price
Dubbing (per second): $0.05/second
Voice-Clone Dubbing (per second): $0.08/second (see voice-clone-dubbing)
Languages / Status (per item): free
  • core-dubbing is async: billing happens once, when the gateway job completes (that is, when the upstream returns the dubbed audio).
  • If the upstream returns an error envelope (status:error), the gateway marks the job FAILED and does NOT bill (see worker.py fix v2026-04-27).
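The pricing rules above reduce to seconds of dubbed audio times a per-second rate, with nothing charged for failed jobs. The sketch below estimates that charge with `decimal.Decimal` to avoid floating-point drift. `dubbing_cost` and the dictionary keys are illustrative names, and rounding to four decimal places is an assumption; the gateway's own rounding is not documented here.

```python
from decimal import Decimal

# Per-second rates (USD) from the pricing table; keys are illustrative labels.
RATES_PER_SECOND = {
    "dubbing": Decimal("0.05"),
    "voice-clone-dubbing": Decimal("0.08"),
}


def dubbing_cost(service: str, seconds: float, succeeded: bool = True) -> Decimal:
    """Estimated charge: seconds of dubbed audio x per-second rate.

    Failed jobs are not billed, per the pricing notes. Rounding to 4 decimal
    places is an assumption.
    """
    if not succeeded:
        return Decimal("0")
    # str() first so 5.0 becomes Decimal("5.0"), not a binary float expansion.
    return (Decimal(str(seconds)) * RATES_PER_SECOND[service]).quantize(Decimal("0.0001"))
```

For the Quickstart job, `dubbing_cost("dubbing", 5.0)` gives 0.25, matching the token_cost in the example response.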

Guides & Tips

Async lifecycle

  1. POST core-dubbing → 202 + gateway job.id.
  2. Worker calls upstream POST /api/dubbing/create → upstream returns its own job_id.
  3. Worker polls upstream `/api/dubbing/status/{upstream_job_id}` automatically.
  4. When upstream returns completed/failed, gateway updates its job record and bills (only on success).
  5. Optionally, gateway publishes a webhook event (see Webhook docs).
  6. Client polls GET /v2/jobs/{gateway_job_id} for the final result.
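Once the gateway job reaches a terminal state (step 6), the client still has to turn it into either an audio URL or an error. The sketch below is a hypothetical helper, `dubbed_audio_url`, that assumes the field names from the example job response (`status`, `result_data.audio_url`, `result_data.status`); since result_data shape depends on the upstream, adjust the lookups to what your jobs actually return.

```python
from typing import Any


class DubbingFailed(Exception):
    """Raised when a gateway job did not produce dubbed audio."""


def dubbed_audio_url(job: dict[str, Any]) -> str:
    """Return result_data.audio_url from a terminal gateway job, or raise.

    Treats any status other than "completed" (case-insensitive), or an
    upstream error envelope inside result_data, as a failure.
    """
    status = str(job.get("status", "")).lower()
    result = job.get("result_data") or {}
    if status != "completed" or result.get("status") == "error":
        raise DubbingFailed(f"job {job.get('id')} ended with status {job.get('status')!r}")
    url = result.get("audio_url")
    if not url:
        raise DubbingFailed(f"job {job.get('id')} completed without an audio_url")
    return url
```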

Choosing a provider

  • ElevenLabs: covers English, French, German, Spanish, Italian, Portuguese, and Turkish, plus Mandarin, Japanese, Korean, and Indonesian. Dutch and Hindi are available only from ElevenLabs.
  • MiniMax: covers the same Western European languages, Turkish, Mandarin, Japanese, Korean, and Indonesian. Cantonese and Arabic are available only from MiniMax.
  • Cross-check `core-dubbing-languages.providers.{provider}.languages` before submitting a job.

Stitching vs precise timing

  • `use_stitching:true` (default): merges short adjacent segments into natural sentences for better prosody. Use this for narration / longer-form content.
  • `use_precise_timing:true`: each segment's output is locked to its (start, end) window. Useful when audio must align with on-screen captions or other sync-critical content.
  • The two flags are independent; you can stitch within precise-timing windows.
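To make the idea of stitching concrete, here is an illustrative sketch only; it is not the service's actual algorithm, which this page does not describe. The hypothetical `stitch_segments` merges a segment into the previous one when the previous text has not yet ended a sentence and the gap between them is small (`max_gap` is an invented parameter). The merged segment spans the first start to the last end, which shows why stitched output can still respect the original overall timing window.

```python
from typing import Any

Segment = dict[str, Any]


def stitch_segments(segments: list[Segment], max_gap: float = 0.3) -> list[Segment]:
    """Illustrative only: merge adjacent segments until one ends a sentence.

    A segment joins the previous one when the previous text does not end with
    . ! or ? and the gap between them is <= max_gap seconds.
    """
    stitched: list[Segment] = []
    for seg in segments:
        if stitched:
            prev = stitched[-1]
            open_sentence = not prev["text"].rstrip().endswith((".", "!", "?"))
            if open_sentence and seg["start"] - prev["end"] <= max_gap:
                prev["text"] = f'{prev["text"].rstrip()} {seg["text"].lstrip()}'
                prev["end"] = seg["end"]
                continue
        stitched.append(dict(seg))  # copy so the caller's list is not mutated
    return stitched
```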

FAQ

Q: How long does a dubbing job take?

A: Typically 1-5 minutes depending on total transcription length. Async — poll GET /v2/jobs/{job_id} for status.

Q: Why don't I see OpenAI as a provider?

A: OpenAI doesn't expose a dubbing API. Only elevenlabs and minimax are supported here.

Q: When should I use core-dubbing-status instead of /v2/jobs/{job_id}?

A: Almost never — the gateway's /v2/jobs/{job_id} is the canonical job endpoint. core-dubbing-status takes the UPSTREAM's job_id (which clients don't see in the normal flow); it's mostly for debugging.

Q: Will I be billed if the upstream fails?

A: No — as of 2026-04-27, the worker detects upstream {status:'error'} envelopes and HTTP 4xx/5xx, marks the job FAILED, and skips billing. See worker.py:_complete_job.

Q: Can I dub with a cloned voice?

A: Two options: (a) pass clone_from_audio / clone_from_video on this endpoint per call, or (b) use the dedicated Voice Clone Dubbing service (core-dubbing-clone).


Changelog

1.1 (2026-04-27)

  • Documented the full async/delayed lifecycle (gateway job vs upstream job; worker auto-polling).
  • Clarified core-dubbing-status semantics (advanced/debug; not the primary status endpoint).
  • Added missing request fields from the upstream Pydantic DubbingRequest: voice_id, clone_from_audio, clone_from_video, keep_source_file, auto_delete_after_hours, cinema8_env.
  • Confirmed only two providers (elevenlabs, minimax); removed any OpenAI references.
  • Documented the no-bill-on-failure behaviour (worker.py fix).

1.0 (2026-01-26)

  • Initial release with multi-provider dubbing and language listing.