AI Video Generation Reaches Commercial Production Scale

Google DeepMind’s Veo 3, made available through the Gemini API in June 2026, generates video with synchronised audio from a text or image prompt. It is the first commercially available text-to-video model capable of producing audio and video together without a separate post-production step. The company’s Veo 3 technical overview describes resolution and audio fidelity benchmarks that, for the first time, place generated video within the quality range acceptable for digital advertising placements without reshooting. Advertising agencies have been testing the model since its limited preview in Q1 2026; the commercial tier opened for enterprise access in May.

OpenAI’s Sora, available to enterprise API customers since late 2025, has a different profile: greater control over camera motion and scene consistency, but audio generation that requires a separate pipeline. Neither model eliminates the human direction and curation that production-quality commercial work requires. What they have changed is the cost and speed of the iteration stages that precede final production.

Veo 3 and Sora: The Commercial Quality Gap That Closed

The quality threshold for commercial video is more precisely defined than popular coverage suggests. Digital advertising — social media placements, pre-roll video, display — operates at lower resolution and shorter runtime than broadcast or theatrical content. A 15-second social ad at 1080p with synchronised ambient audio is achievable with current AI video generation models. A 60-second brand film with principal photography, dialogue, and performance is not.

The commercial case for AI video generation in 2026 is strongest in the use cases that live between these poles: concept visualisation (showing a client what a campaign could look like before production commitment), product placement and lifestyle context shots (placing a product in a generated scene rather than building a physical set), and social content iteration (generating 20 variants of a 10-second clip to test performance, then producing only the winning version at full cost).

Advertising holding companies — WPP, Publicis, Omnicom — have disclosed AI video tooling in their capability stack, though none have published specifics on generated video’s share of delivered work. Independent evidence from creative agencies suggests that AI video is being used for client presentations and internal creative exploration substantially more than for delivered client assets, with a 3-6 month lag expected before delivered work volumes follow.

Where Agencies Are Actually Deploying Generated Video

The deployment patterns differ by agency type. Digital-first performance marketing agencies are adopting AI video most aggressively: for these teams, A/B testing video creative at scale has always been limited by production cost, and AI generation removes that constraint. A performance agency can generate 50 variants of a product video for a fraction of the cost of shooting 5, run them against real audiences at the top of the funnel, and commission full production only for the concepts with proven performance data.
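As a rough illustration of the test-then-commission loop described above, the Python sketch below ranks generated variants by click-through rate and shortlists the strongest for full production. The `VariantResult` type, the `shortlist` function, the choice of click-through rate as the metric, and the minimum-impressions filter are all assumptions made for illustration; the agencies described here may well use different metrics and selection rules.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class VariantResult:
    """Audience test result for one generated clip (hypothetical schema)."""
    variant_id: str
    impressions: int
    clicks: int

    @property
    def ctr(self) -> float:
        # Click-through rate; 0.0 when the variant was never shown.
        return self.clicks / self.impressions if self.impressions else 0.0


def shortlist(
    results: Sequence[VariantResult], top_n: int, min_impressions: int
) -> List[VariantResult]:
    """Return up to top_n variants, best CTR first.

    Variants with fewer than min_impressions are dropped so that a clip
    with a handful of lucky clicks cannot outrank a well-tested one.
    """
    eligible = [r for r in results if r.impressions >= min_impressions]
    ranked = sorted(eligible, key=lambda r: r.ctr, reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    # Illustrative numbers only, not measured campaign data.
    results = [
        VariantResult("variant-01", impressions=1000, clicks=10),
        VariantResult("variant-02", impressions=1000, clicks=30),
        VariantResult("variant-03", impressions=50, clicks=10),
        VariantResult("variant-04", impressions=1000, clicks=20),
    ]
    for winner in shortlist(results, top_n=2, min_impressions=500):
        print(winner.variant_id, f"{winner.ctr:.2%}")
```

In this sketch the under-tested `variant-03` is excluded despite its high raw rate, which mirrors the point that full production should follow only concepts with proven performance data.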

Traditional brand-focused agencies are adopting more cautiously, primarily because their clients’ brand standards apply to generated outputs as directly as to produced work. A generated video asset bearing a luxury brand’s visual identity must meet the same colour, composition, and talent standards as a produced one — and the curation cost of ensuring compliance at scale is not trivial. These agencies are using AI video for internal concepting and client pitch decks, where the brand exposure risk is managed.

The infrastructure investment that hyperscalers have committed to AI in 2026 is pushing the compute cost of video generation down a steeper curve than language model inference costs followed in 2022-2023. Video generation is more compute-intensive per token-equivalent than text, but the cost trajectory follows the same pattern: each generation of model infrastructure cuts the marginal cost of generation far enough to make previously uneconomic use cases viable.

The Rights and Billing Economics of Generated Video

The intellectual property landscape for AI-generated video remains partially unsettled, which is creating predictable risk tiering in enterprise adoption. The unresolved questions — whether training data rights transfer to outputs, who owns generated video for commercial purposes, what disclosure requirements apply — are being handled differently by different clients. Regulated industries (financial services, pharmaceuticals) are moving slowly because their legal review processes apply to generated outputs. Consumer goods, e-commerce, and direct-to-consumer brands are moving faster because their legal exposure from video content is lower.

The billing model that has emerged from both Google (Veo 3 API) and OpenAI (Sora API) is per-second of generated video, with tiers for resolution and audio inclusion. Enterprise clients running large-scale creative testing programmes are generating thousands of seconds of video monthly — a cost structure that functions as a production retainer rather than a per-asset spend. The economics are favourable compared to producing the equivalent volume of traditional video, but the comparison is only meaningful for the agencies that have the workflow infrastructure to manage and curate generated output at volume.
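To make the per-second billing structure concrete, the sketch below estimates a monthly bill from seconds generated, resolution tier, and whether audio is included. Every rate in it is a hypothetical placeholder rather than published Veo 3 or Sora pricing, and the `RateCard` and `monthly_cost` names are invented for this example.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class RateCard:
    """Hypothetical per-second prices in USD (NOT real vendor pricing)."""
    base_per_second: Dict[str, float]  # resolution tier -> price per second
    audio_surcharge_per_second: float  # added when audio is generated


def monthly_cost(
    rates: RateCard, seconds: float, resolution: str, with_audio: bool
) -> float:
    """Estimate a monthly bill under simple per-second, tiered pricing."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    per_second = rates.base_per_second[resolution]  # KeyError if tier unknown
    if with_audio:
        per_second += rates.audio_surcharge_per_second
    return seconds * per_second


# Placeholder figures chosen only to make the arithmetic easy to follow.
EXAMPLE_RATES = RateCard(
    base_per_second={"720p": 0.25, "1080p": 0.5},
    audio_surcharge_per_second=0.125,
)

if __name__ == "__main__":
    bill = monthly_cost(
        EXAMPLE_RATES, seconds=4000, resolution="1080p", with_audio=True
    )
    print(f"Estimated monthly bill: ${bill:,.2f}")
```

Because the bill scales linearly with seconds generated, a testing programme that produces a steady volume each month sees a predictable recurring charge, which is why the spend behaves more like a retainer than a per-asset cost.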

OpenAI’s model release cadence suggests that Sora’s commercial capability will continue to evolve rapidly — the same pattern of regular capability updates that has characterised the GPT series applies to multimodal models. The implication for agencies is that workflows built on current Sora capability will need to accommodate models that are substantially more capable 12 months from now, which counsels against deep workflow integration that assumes a fixed output quality ceiling.

The agencies building durable advantage from AI video generation are those treating it as a workflow redesign exercise — rethinking how creative concepting, client review, and production sequencing operate — rather than those treating it as a tool that accelerates existing workflows. The enterprise AI market is bifurcating along the same line in text-generation applications: the firms redesigning workflows around AI capability are outcompeting the firms using AI to do old workflows faster.

The Product Decision AI Video Generation Forces on Every Creative Agency

Marty Cagan’s framework for product decisions centres on four risks (value, viability, feasibility, and usability), and the reason most enterprise technology adoption stalls is that all four gates need to be cleared simultaneously, not sequentially. AI video generation in 2026 has cleared feasibility (the tools work) and usability (the interface is accessible to non-engineers) faster than creative agencies have cleared the value and viability questions. The result is a capability gap that is visible in commercial terms: agencies that restructured their production workflows around AI capability are outcompeting on margin, not on creative output quality.

The product decision that every creative agency with a video production practice now faces is not whether to adopt AI generation; that question is settled. The decision is whether to use the cost reduction to price more competitively, reinvest in creative talent that works at the AI-augmented layer, or extend into services that were previously not economically viable at human-only production rates. Each is a coherent strategy. The agencies that are visibly struggling are the ones that have not made the decision consciously and are drifting among all three simultaneously.

Veo 3’s transition from a quality benchmark to a commercial production tool happened in roughly six months. The agency-facing question is not about Veo 3 specifically — it is about the rate at which AI video tooling improves relative to the rate at which agencies can build institutional knowledge about deploying it. The institutional knowledge gap is real: knowing that Veo 3 can generate high-quality footage at scale is not the same as knowing which brief types it excels at, which legal review workflows apply to generated content, or how to price the hybrid human-AI production engagement to clients who are still calibrating what they should be paying. Those are the product questions that determine which agencies emerge from the transition as category leaders and which ones become commoditised fulfilment shops for clients who have their own prompting capability in-house.

Zoe Kessler
Zoe Kessler read mathematics at Cambridge before a postgraduate year at Imperial College, where her thesis examined interpretability methods for financial AI systems. She spent three years at a Brussels-based AI governance think tank before going independent. She splits her time between London and Berlin, covering AI policy with rare technical precision.