Google Veo backend
Two model options: `veo (Audio)` (default, includes generated audio) and `veo fast (Audio)` (faster). Both produce video with audio.
Generate short videos from a text prompt (Google Veo, async/delayed)
Async video generation from a text prompt using Google Veo. Choose model (`veo (Audio)` or `veo fast (Audio)`), resolution (512P / 1080P), duration (4s / 8s), and aspect_ratio (16:9 / 9:16). `media-video-from-text` is a delayed service (`is_delayed: true`): the gateway returns 202 + a gateway job_id immediately. The worker then calls the upstream, which queues its own task in a dedicated video worker pool, and the gateway worker polls the upstream's `/api/video/status/{job_id}` automatically. Pricing: $1.50 per second of generated video (catalog rate), so a 4s video costs $6 per generation.
Gateway returns 202 + job.id; worker calls upstream, upstream returns its own job_id, worker polls upstream's /api/video/status/{job_id} until complete. Up to 600s timeout (10 min).
resolution: 512P or 1080P. duration: 4s or 8s. aspect_ratio: 16:9 (landscape) or 9:16 (portrait).
$1.50/s — a 4s video is $6, an 8s video is $12. usage.units = duration in seconds.
Short promo / ad / social video from a brief text prompt.
Generate a short looping hero video for landing pages.
Quick visual pre-vis from a script line before commissioning real production.
Use aspect_ratio: 9:16 to generate portrait-format videos for short-form platforms.
Input
prompt + optional model, resolution, duration, aspect_ratio
Output
Initial 202 + gateway job_id; eventual result_data carries the video URL + duration + metadata
Prerequisites
POST a prompt + optional parameters. The gateway returns 202 + job.id immediately.
{
  "prompt": "A cinematic shot of a serene mountain lake at sunset, gentle ripples on the water, soft mist rising",
  "model": "veo (Audio)",
  "resolution": "1080P",
  "duration": "4s",
  "aspect_ratio": "16:9"
}
Response
{
  "status": "accepted",
  "message": "Job queued for processing",
  "job": {
    "id": "550e8400-e29b-41d4-a716-446655440000",
    "status": "queued",
    "service": "media-video-from-text",
    "created_at": "2026-04-27T10:30:00Z"
  }
}
Save the gateway job.id and poll GET /v2/jobs/{job_id} until status=completed. Worker handles upstream + upstream-status-polling automatically.
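As a rough illustration of the submit step, the sketch below builds the POST request and reads the gateway job id out of the 202 response. The host `https://gateway.example.com`, the helper names, and the optional `headers` argument (this page does not describe authentication) are assumptions made for illustration; only the path `/v1/proxy/media-video-from-text` and the response shape are taken from this page.

```python
import json
import urllib.request
from typing import Optional

SERVICE_PATH = "/v1/proxy/media-video-from-text"  # endpoint path listed on this page


def build_submit_request(
    base_url: str, payload: dict, headers: Optional[dict] = None
) -> urllib.request.Request:
    """Build (but do not send) the POST request for a text-to-video job.

    base_url and headers are hypothetical: supply whatever host and auth
    headers your gateway account actually uses.
    """
    all_headers = {"Content-Type": "application/json", **(headers or {})}
    return urllib.request.Request(
        base_url.rstrip("/") + SERVICE_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers=all_headers,
        method="POST",
    )


def extract_job_id(status_code: int, body: dict) -> str:
    """Return the gateway job id from a 202 response, or raise otherwise."""
    if status_code != 202:
        raise RuntimeError(f"expected HTTP 202, got {status_code}")
    return body["job"]["id"]
```

Sending the request with `urllib.request.urlopen(request)` and passing the response status plus the decoded JSON body to `extract_job_id` would complete the step.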
Standard async-job polling. Worker is doing the heavy work behind the scenes.
GET /v2/jobs/{job_id}
Response
{
  "id": "550e8400-...",
  "status": "completed",
  "service_name": "media-video-from-text",
  "created_at": "2026-04-27T10:30:00Z",
  "completed_at": "2026-04-27T10:33:42Z",
  "result_data": {
    "status": "success",
    "video_url": "https://s3.example.com/.../video.mp4",
    "duration_seconds": 4.0,
    "model": "veo (Audio)",
    "resolution": "1080P",
    "aspect_ratio": "16:9"
  },
  "units_consumed": 4.0,
  "token_cost": 6.0
}
Billing: 4s × $1.50 = $6.00. result_data shape comes from the upstream; typical fields shown.
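The polling step can be sketched as a small client-side loop. This is a minimal sketch under stated assumptions: `fetch_job` is a caller-supplied function that performs `GET /v2/jobs/{job_id}` and returns the decoded JSON; the literal status value `"failed"` is a guess (this page shows only `queued` and `completed`); and the 5-second interval and 660-second client deadline (the gateway's 600s timeout plus a margin) are illustrative choices.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_job: Callable[[str], dict],
    job_id: str,
    poll_interval: float = 5.0,   # illustrative, not a documented value
    timeout: float = 660.0,       # gateway timeout (600s) plus a margin
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll the gateway job until it completes, fails, or the deadline passes."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job(job_id)
        status = str(job.get("status", "")).lower()
        if status == "completed":
            return job
        if status == "failed":  # assumed failure status value
            raise RuntimeError(f"job {job_id} failed: {job}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still '{status}' after {timeout}s")
        sleep(poll_interval)
```

On success, the returned dictionary is the full job record, so the video URL would be read from `job["result_data"]["video_url"]` when the upstream includes that field.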
Async text-to-video via Google Veo. Returns 202 + gateway job_id. The worker handles upstream POST + status polling. Billed per second of generated video.
/v1/proxy/media-video-from-text
Per second of generated video. Async — billing happens once when the gateway job completes.
| Service | Unit | Price |
|---|---|---|
| Video from Text | second | $1.50/second |
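For budgeting before a request is submitted, the per-second rate in the table can be turned into a quick estimate. The function name `estimate_cost` is hypothetical; the rate and the two allowed durations come from this page.

```python
PRICE_PER_SECOND_USD = 1.50  # catalog rate from the pricing table


def estimate_cost(duration: str) -> float:
    """Return the expected charge in USD for a duration string ('4s' or '8s')."""
    if duration not in ("4s", "8s"):
        raise ValueError(f"duration must be '4s' or '8s', got {duration!r}")
    seconds = int(duration[:-1])  # strip the trailing 's'
    return seconds * PRICE_PER_SECOND_USD


print(estimate_cost("4s"))  # 6.0
print(estimate_cost("8s"))  # 12.0
```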
A: Generation is async and typically takes 1-5 minutes for 4s/1080P, longer for 8s or with veo (Audio). The gateway timeout is 600s (10 min); if the upstream doesn't return by then, the job fails.
A: Both models include generated audio aligned with the visuals. The audio is not separately controllable via this endpoint.
A: The upstream's Pydantic model expects the exact string values '4s' or '8s'. Passing the integer 4 returns a 400.
A: No — as of 2026-04-27, the worker detects upstream failures (HTTP 4xx/5xx or {status:'error'} envelope) and skips billing for FAILED jobs.
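Because the upstream rejects anything other than the exact string values, it can help to validate a payload client-side before submitting. The sketch below is one way to do that; the helper name `build_payload` and the idea of rejecting unknown keys are assumptions, while the allowed values are the ones listed on this page.

```python
# Allowed values as listed on this page; all are strings.
ALLOWED_VALUES = {
    "model": ("veo (Audio)", "veo fast (Audio)"),
    "resolution": ("512P", "1080P"),
    "duration": ("4s", "8s"),
    "aspect_ratio": ("16:9", "9:16"),
}


def build_payload(prompt: str, **options) -> dict:
    """Return a request body, raising ValueError on values the upstream would reject."""
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt must be a non-empty string")
    payload = {"prompt": prompt}
    for key, value in options.items():
        if key not in ALLOWED_VALUES:
            raise ValueError(f"unknown parameter: {key}")
        if value not in ALLOWED_VALUES[key]:
            # e.g. duration=4 (int) is rejected; only "4s" or "8s" pass
            raise ValueError(
                f"{key} must be one of {ALLOWED_VALUES[key]}, got {value!r}"
            )
        payload[key] = value
    return payload
```

For example, `build_payload("A lake at sunset", duration="4s", aspect_ratio="9:16")` returns a body ready to POST, while `build_payload("A lake at sunset", duration=4)` raises `ValueError` locally instead of costing a round trip that ends in a 400.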
1.1 (2026-04-27)
1.0 (2026-01-26)