Veo 3 can generate videos — and soundtracks to go along with them

Google’s latest video-generating AI model, Veo 3, can create audio to go along with the clips that it generates.

On Tuesday during the Google I/O 2025 developer conference, Google unveiled Veo 3, which the company claims can generate sound effects, background noises, and even dialogue to accompany the videos it creates. Veo 3 also improves upon its predecessor, Veo 2, in terms of the quality of footage it can generate, Google says.

Veo 3 is available beginning Tuesday in Google’s Gemini chatbot app for subscribers to Google’s $249.99-per-month AI Ultra plan, where it can be prompted with text or an image.

“For the first time, we’re emerging from the silent era of video generation,” Demis Hassabis, the CEO of Google DeepMind, Google’s AI R&D division, said during a press briefing. “[You can give Veo 3] a prompt describing characters and an environment, and suggest dialogue with a description of how you want it to sound.”

The wide availability of tools to build video generators has led to such an explosion of providers that the space is becoming saturated. Startups including Runway, Lightricks, Genmo, Pika, Higgsfield, Kling, and Luma, as well as tech giants such as OpenAI and Alibaba, are releasing models at a fast clip. In many cases, little distinguishes one model from another.

Audio output stands to be a big differentiator for Veo 3, if Google can deliver on its promises. AI-powered sound-generating tools aren’t novel, nor are models to create video sound effects. But per Google, Veo 3 is unique in that it can understand the raw pixels from its videos and automatically sync generated sounds with clips.

Veo 3 was likely made possible by DeepMind’s earlier work in “video-to-audio” AI. In June 2024, DeepMind revealed that it was developing AI tech to generate soundtracks for videos by training a model on a combination of sounds and dialogue transcripts as well as video clips.

DeepMind won’t say exactly where it sourced the content to train Veo 3, but YouTube is a strong possibility. Google owns YouTube, and DeepMind previously told TechCrunch that Google models like Veo “may” be trained on some YouTube material.

To mitigate the risk of deepfakes, DeepMind says it’s using its proprietary watermarking technology, SynthID, to embed invisible markers into frames Veo 3 generates.

While companies like Google pitch models such as Veo 3 as powerful creative tools, many artists are understandably wary of them, since they threaten to upend entire industries. A 2024 study commissioned by the Animation Guild, a union representing Hollywood animators and cartoonists, estimates that more than 100,000 U.S.-based film, television, and animation jobs will be disrupted by AI by 2026.

Google today also rolled out new capabilities for Veo 2, including a feature that lets users give the model images of characters, scenes, objects, and styles for better consistency. The latest Veo 2 can understand camera movements like rotations, dollies, and zooms, and it allows users to add or erase objects from videos or broaden the frames of clips to, for example, turn them from portrait into landscape.

Google says that all of these new Veo 2 capabilities will come to its Vertex AI API platform in the coming weeks.