Video from Text

Generate short videos from a text prompt (Google Veo, async/delayed)

Async video generation from a text prompt using Google Veo. Choose model (`veo (Audio)` or `veo fast (Audio)`), resolution (512P / 1080P), duration (4s / 8s), and aspect_ratio (16:9 / 9:16). `media-video-from-text` is a delayed service (**is_delayed:true**): the gateway returns 202 and a gateway job_id immediately. The worker then calls the upstream, which queues its own task in a dedicated video worker pool, and the gateway worker polls the upstream's `/api/video/status/{job_id}` automatically. Pricing: $1.50 per second of generated video (catalog rate), so a 4s video costs $6 per generation.

video, generation, text-to-video, veo, async

Overview

Features

Google Veo backend

Two model options: `veo (Audio)` (default, includes generated audio) and `veo fast (Audio)` (faster). Both produce video with audio.

Async pipeline (delayed service + double polling)

The gateway returns 202 + job.id. The worker calls the upstream, the upstream returns its own job_id, and the worker polls the upstream's /api/video/status/{job_id} until the task completes. The gateway times out after 600s (10 min).

Configurable resolution / duration / aspect

resolution: 512P or 1080P. duration: 4s or 8s. aspect_ratio: 16:9 (landscape) or 9:16 (portrait).

Per-second pricing

$1.50/s — a 4s video is $6, an 8s video is $12. usage.units = duration in seconds.

Use Cases

Marketing snippets

Short promo / ad / social video from a brief text prompt.

Animated product hero

Generate a short looping hero video for landing pages.

Storyboard / pre-vis

Quick visual pre-vis from a script line before commissioning real production.

Vertical (TikTok / Reels) content

Use aspect_ratio: 9:16 to generate portrait-format videos for short-form platforms.

Input / Output

Input

prompt + optional model, resolution, duration, aspect_ratio

JSON body

Output

Initial 202 + gateway job_id; eventual result_data carries the video URL + duration + metadata

JSON (job envelope), Video URL (in result_data)

Specs

Latency: Async, typically 1-5 minutes (longer for 1080P / 8s). Gateway timeout 600s (10 min).
Async: true
Rate Limit: 60 req/min per API key
Max Input: Prompt length depends on Veo's prompt limits

Quickstart

Prerequisites

  • A CN8 Gateway API key with media-video-from-text in allowed_services

1. Submit a generation job

media-video-from-text

POST a prompt + optional parameters. The gateway returns 202 + job.id immediately.

POST /v1/proxy/media-video-from-text
{
  "prompt": "A cinematic shot of a serene mountain lake at sunset, gentle ripples on the water, soft mist rising",
  "model": "veo (Audio)",
  "resolution": "1080P",
  "duration": "4s",
  "aspect_ratio": "16:9"
}

Response

{
  "status": "accepted",
  "message": "Job queued for processing",
  "job": {
    "id": "550e8400-e29b-41d4-a716-446655440000",
    "status": "queued",
    "service": "media-video-from-text",
    "created_at": "2026-04-27T10:30:00Z"
  }
}

Save the gateway job.id and poll GET /v2/jobs/{job_id} until status=completed. The worker handles the upstream call and the upstream status polling automatically.
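
The submit step can be sketched in Python as follows. This is a minimal illustration, not an official client: `BASE_URL`, the `Authorization: Bearer` header, and the helper names `build_payload`, `build_request`, and `submit` are assumptions, since this page does not state the gateway host or the auth scheme. The allowed option values are the ones listed above.

```python
import json
import urllib.request

# Assumptions: the real gateway host and auth header are not given on this page.
BASE_URL = "https://gateway.example.com"  # hypothetical host
API_KEY_HEADER = "Authorization"          # hypothetical header name

# Option values documented for media-video-from-text.
ALLOWED = {
    "model": {"veo (Audio)", "veo fast (Audio)"},
    "resolution": {"512P", "1080P"},
    "duration": {"4s", "8s"},
    "aspect_ratio": {"16:9", "9:16"},
}


def build_payload(prompt, **options):
    """Return the JSON body, rejecting option values the docs do not list."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required")
    payload = {"prompt": prompt}
    for name, value in options.items():
        if name not in ALLOWED:
            raise ValueError(f"unknown option: {name}")
        if value not in ALLOWED[name]:
            raise ValueError(f"{name} must be one of {sorted(ALLOWED[name])}")
        payload[name] = value
    return payload


def build_request(api_key, payload):
    """Build (but do not send) the POST request for the submit endpoint."""
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/proxy/media-video-from-text",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            API_KEY_HEADER: f"Bearer {api_key}",
        },
        method="POST",
    )


def submit(api_key, payload):
    """Send the request and return the gateway job id from the 202 response."""
    with urllib.request.urlopen(build_request(api_key, payload)) as resp:
        body = json.load(resp)
    return body["job"]["id"]
```

Validating the options client-side avoids spending a request on a value the upstream would reject, such as an integer duration.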

2. Poll for the video

Standard async-job polling: repeat the request until the job status is completed. The worker does the upstream work in the background.

GET /v2/jobs/{job_id}

Response

{
  "id": "550e8400-...",
  "status": "completed",
  "service_name": "media-video-from-text",
  "created_at": "2026-04-27T10:30:00Z",
  "completed_at": "2026-04-27T10:33:42Z",
  "result_data": {
    "status": "success",
    "video_url": "https://s3.example.com/.../video.mp4",
    "duration_seconds": 4.0,
    "model": "veo (Audio)",
    "resolution": "1080P",
    "aspect_ratio": "16:9"
  },
  "units_consumed": 4.0,
  "token_cost": 6.0
}

Billing: 4s × $1.50 = $6.00. result_data shape comes from the upstream — typical fields shown.
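
A client-side polling loop might look like the sketch below. The `wait_for_job` helper and its `fetch_job` callback are hypothetical: `fetch_job(job_id)` is assumed to perform GET /v2/jobs/{job_id} and return the parsed JSON body. The `failed` status string is also an assumption (this page only shows `queued` and `completed` in responses), and the default client deadline is set slightly above the 600s gateway timeout.

```python
import time


def wait_for_job(fetch_job, job_id, timeout_s=660.0, interval_s=5.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll until the job reaches a terminal status or the deadline passes.

    fetch_job(job_id) is supplied by the caller and should return the parsed
    JSON body of GET /v2/jobs/{job_id}.
    """
    deadline = clock() + timeout_s
    while True:
        job = fetch_job(job_id)
        # Compare case-insensitively; the exact casing of failure statuses
        # is not shown on this page.
        status = str(job.get("status", "")).lower()
        if status == "completed":
            return job["result_data"]
        if status == "failed":
            raise RuntimeError(f"job {job_id} failed")
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still {status!r} after {timeout_s}s")
        sleep(interval_s)
```

With a 5s interval, a generation that takes 5 minutes costs about 60 status requests, which is worth keeping in mind against the 60 req/min rate limit when polling several jobs at once.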

Video from Text

POST (async)

Async text-to-video via Google Veo. Returns 202 + gateway job_id. The worker handles upstream POST + status polling. Billed per second of generated video.

/v1/proxy/media-video-from-text

Pricing

Per second of generated video. Async — billing happens once when the gateway job completes.

Service | Unit | Price
Video from Text | second | $1.50/second
  • Examples: 4s 1080P video = $6.00; 8s 1080P video = $12.00.
  • Pricing is the same for veo (Audio) and veo fast (Audio): model choice is about latency, not cost.
  • If the upstream fails (HTTP 4xx/5xx or {status:'error'} envelope), the gateway marks the job FAILED and skips billing (worker.py fix 2026-04-27).
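
The pricing rule above reduces to a one-line calculation. The sketch below is for estimating costs on the client side; `parse_duration` and `job_cost` are illustrative names, not part of the gateway.

```python
from decimal import Decimal

RATE_PER_SECOND = Decimal("1.50")  # catalog rate quoted above, in USD


def parse_duration(duration):
    """Convert '4s' or '8s' to a whole number of seconds."""
    if duration not in ("4s", "8s"):
        raise ValueError("duration must be '4s' or '8s'")
    return int(duration[:-1])


def job_cost(duration, succeeded=True):
    """Cost in USD for one job; failed jobs are not billed."""
    if not succeeded:
        return Decimal("0.00")
    return RATE_PER_SECOND * parse_duration(duration)
```

`Decimal` is used instead of `float` so that currency amounts stay exact.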

Guides & Tips

Async lifecycle (double polling)

  1. POST media-video-from-text → 202 + gateway job.id.
  2. Worker calls upstream POST /api/video/from-text → upstream returns its own job_id (queued in upstream's video worker pool).
  3. Gateway worker polls upstream `/api/video/status/{upstream_job_id}` automatically.
  4. When upstream returns success, gateway updates its job and bills (only on success).
  5. Optionally publishes a webhook event.
  6. Client polls GET /v2/jobs/{gateway_job_id} for the final result.
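
To make steps 2 to 4 concrete, here is a simplified model of the worker side. It is not the gateway's actual worker.py: the function names, the `(http_status, body)` return shape, the `job_id` field on the upstream response, and the `max_polls` cap are all assumptions chosen for illustration. The failure rule is the one stated in the Pricing section (HTTP 4xx/5xx or a `{status:'error'}` envelope).

```python
def is_upstream_failure(http_status, body):
    """Failure rule from the Pricing section: HTTP 4xx/5xx or an error envelope."""
    return http_status >= 400 or body.get("status") == "error"


def run_worker(start_upstream, poll_upstream, max_polls=120):
    """Drive one gateway job through the upstream and decide whether to bill.

    start_upstream() -> (http_status, body) for the upstream POST.
    poll_upstream(upstream_job_id) -> (http_status, body) for one status check.
    Returns (gateway_status, billed_seconds).
    """
    http_status, body = start_upstream()
    if is_upstream_failure(http_status, body):
        return "failed", 0.0
    upstream_job_id = body["job_id"]  # assumed field name
    for _ in range(max_polls):
        http_status, body = poll_upstream(upstream_job_id)
        if is_upstream_failure(http_status, body):
            return "failed", 0.0
        if body.get("status") == "success":
            # Bill only on success, using the generated duration in seconds.
            return "completed", float(body.get("duration_seconds", 0.0))
    # Ran out of polls: modelled here as a timeout, so nothing is billed.
    return "failed", 0.0
```

The point of the model is that billing has a single exit path: every failure or timeout branch returns zero billed seconds.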

Veo vs Veo fast

  • veo (Audio) (default): higher quality, slower. Use for final outputs.
  • veo fast (Audio): faster generation, slightly lower quality. Use for iteration / drafts.
  • Both include generated audio.

Aspect ratio choice

  • `16:9`: landscape, default. Best for desktop / YouTube / TV.
  • `9:16`: portrait. Best for TikTok / Instagram Reels / Shorts.

FAQ

Q: How long does video generation take?

A: Async, typically 1-5 minutes for 4s/1080P. Longer for 8s or with veo (Audio). Gateway timeout 600s (10 min) — if the upstream doesn't return by then, the job fails.

Q: Can I customise the audio?

A: Both models include generated audio aligned with the visuals. The audio is not separately controllable via this endpoint.

Q: Why does duration take a string ('4s') instead of a number?

A: Upstream Pydantic expects exact string values '4s' or '8s'. Passing 4 (int) returns a 400.
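
If your own code stores durations as numbers, a small client-side helper can convert them before the request is sent. `to_duration_string` is a hypothetical convenience function, not part of the API:

```python
def to_duration_string(value):
    """Coerce 4, 8, '4', '8', '4s', or '8s' to the string form the API expects."""
    text = str(value).strip().lower()
    if not text.endswith("s"):
        text += "s"
    if text not in ("4s", "8s"):
        raise ValueError(f"unsupported duration: {value!r}")
    return text
```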

Q: Will I be billed if generation fails?

A: No — as of 2026-04-27, the worker detects upstream failures (HTTP 4xx/5xx or {status:'error'} envelope) and skips billing for FAILED jobs.

Changelog

1.1 (2026-04-27)

  • Documented the full async lifecycle (gateway job + worker auto-polling of the upstream's own status endpoint).
  • Confirmed Pydantic VideoFromTextRequest defaults: model='veo (Audio)', resolution='1080P', duration='4s', aspect_ratio='16:9'.
  • Documented enum values exactly: model is one of 'veo (Audio)' / 'veo fast (Audio)' (with the parenthesised 'Audio' suffix), duration is '4s'/'8s' (string with s suffix).
  • Documented the 600s gateway timeout.
  • Added the no-bill-on-failure note (worker.py fix 2026-04-27).

1.0 (2026-01-26)

  • Initial catalog: media-video-from-text.