Since OpenAI unveiled its text-to-video (T2V) tool, Sora, earlier this week, a flurry of predictions has emerged, characterized by a mix of wonder and apprehension, praise and concern. Observers have marveled at what appears to be a significant advancement in T2V technology compared to other existing tools like Runway's Gen-2 and Pika.
[Chart: Generative AI Software Market Five-Year Revenue Growth, by Segment]
Based solely on its sample results, OpenAI's Sora stands out as the most impressive video diffusion model to date. However, despite its advancements, Sora and similar models still have limitations that would hinder their effectiveness in Hollywood filmmaking.
"Sora represents a significant step forward in producing realistic content suitable for high-end entertainment. However, the demands of creatives for complete control over performances and scene content mean there's a considerable distance to cover before diffusion models can autonomously create Hollywood movies," stated Tom Graham, CEO and co-founder of AI firm Metaphysic, renowned for de-aging Tom Hanks in the Miramax film "Here."
What sets Sora apart exactly?
While Sora offers capabilities akin to other video diffusion models like those from Runway and Pika, such as video generation and editing, it excels in several areas:
- Video Quality and Realism: Sora's outputs exhibit notably higher photorealism and fidelity compared to other models.
- Video Length: Sora can generate videos up to a minute long while maintaining coherence, a significant improvement over Runway's Gen-2, which could only manage up to 18 seconds per generation as of August 2023.
- Spatiotemporal Consistency: Sora can extend generated videos seamlessly, ensuring consistency even when subjects briefly exit the frame.
This ability to maintain consistency addresses a common challenge in AI-generated filmmaking: when multiple video outputs are stitched together, character and scene continuity is hard to preserve because each generation produces slightly different results.
However, despite these advancements, several challenges remain for models like Sora to be adopted in Hollywood:
- Continuity: Sora's improvements don't guarantee complete subject/object and environment continuity, crucial for a coherent narrative or visual consistency.
- Controllability: Current tools lack the level of creative control filmmakers need, making AI adoption potentially more constraining than traditional methods.
- Copyright Concerns: Legal uncertainties surrounding copyright protection and liability hinder the use of AI-generated material in Hollywood productions.
Until these challenges are addressed, AI tools like Sora are likely to find utility mainly in previsualization stages of projects, aiding in concept development and iteration.
Even in early-stage concept work, concerns about copyright infringement and protectability persist, especially if AI-generated elements are incorporated into human-created media like TV shows, movies, or video games.
[Chart: Creative Tasks Entertainment Workers Think Generative AI Will Be Able to Do Effectively]
Before it becomes publicly available, Sora is undergoing red-teaming to identify vulnerabilities and potential misuse. OpenAI has committed to gathering feedback from policymakers, educators, and artists worldwide to understand concerns and pinpoint beneficial applications.
This mirrors the stance taken by Google researchers with their T2V diffusion model, Lumiere, presented in late January. While recognizing its creative potential, they acknowledged the risk of misuse for generating fake or harmful content.
Upon its release, Sora could let social media users and other non-professionals produce video at scale, flooding platforms with generated content. Advertisers and content marketers may also leverage its capabilities. However, the proliferation of deepfake disinformation is a concerning possibility, even with watermarking in place. The ultimate impact of OpenAI's consultations remains uncertain.