Everything You Need To Know About Sora, OpenAI’s Text-To-Video Generator
Why people are worried, plus the environmental impact
You’d have to have been living under a rock to avoid hearing about OpenAI – or, at least, its products ChatGPT and DALL-E. It’s set to become the US’s second most valuable start-up (just tailing Elon Musk’s SpaceX), with a potential valuation of over $100bn. And yesterday the tech company announced its latest product: Sora, an AI model which generates photorealistic video from text input.
What is Sora?
Sora is an AI model that generates video – including photorealistic video – from text instructions. It can currently generate up to 60 seconds of material. Here’s the announcement, posted yesterday on X (formerly Twitter):
Introducing Sora, our text-to-video model.
Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions. https://t.co/7j2JN27M3W
Prompt: “Beautiful, snowy… pic.twitter.com/ruTEWn87vf
— OpenAI (@OpenAI) February 15, 2024
In the announcement, the company demonstrated Sora’s capabilities with hyper-realistic videos of mammoths navigating snowy terrain, a spaceman in a salt desert in the style of a movie trailer, and a woman walking through a neon-lit street in Tokyo.
Its capabilities remain limited, though. In one video, a cat has an extra paw; in others, parts of the image are fuzzy and low resolution, resembling compression artifacts. Users on X have suggested that the latter may indicate that OpenAI trained the model on copyrighted content.
Tech entrepreneur @ChombaBupe said:
You can see interpolation, disocclusion & compression artifacts.
Only means one thing – it’s remixing content from the training dataset. https://t.co/32kFd4ToJ0
— Chomba Bupe (@ChombaBupe) February 15, 2024
What’s the controversy?
OpenAI is typically fairly vague about the data on which it trains its models. Ed Newton-Rex, CEO of Fairly Trained – a non-profit that certifies AI companies whose training data is fairly sourced – said on X: ‘There is a very real danger that the phrase “publicly available” is used to hide copyright infringement in plain sight.’
Indeed, OpenAI is already in a legal battle with the New York Times over the use of the newspaper’s copyrighted work in the training of the models behind ChatGPT.
Those in the art, animation and VFX industries are raising concerns about the impact Sora will have on the commercial work that supports many of their livelihoods:
That’s before we get into devastating job losses, the destruction of any ability to make a living as an artist, the fact all the money will funnel into few pockets, the fact that the entire dataset is built on theft
…and worst: that it’s a huge contributor to global warming
— Max Nichols (@maxnichols) February 15, 2024
A further worry is the environmental impact of generative AI and the potential energy intensity of text-to-video generation:
Additional to these great questions, I’d say some inquiry into the amount of energy required to generate these videos is warranted – we already know that text and image gen takes absolutely devastating amounts of energy, I imagine video gen will compound that exponentially. https://t.co/HAOAB4hhB4
— • Silthulhu • 🌻🍉 (@Silthulhu) February 15, 2024
Finally, another worry is the potential for abuse and directed harm towards women and children. Recently, fake explicit images of Taylor Swift were reported to be circulating on internet forums; these images were generated with the same technology that powers DALL-E (although OpenAI denies they were created on its platform). There is a non-zero risk of Sora being abused in a similarly harmful way.
Perhaps of note, OpenAI has had no women on its board since December 2023.
What is OpenAI saying about Sora?
Of course, OpenAI has said – in the wake of this announcement – that it’s working with ‘red teamers – domain experts in areas like misinformation, hateful content, and bias – who will be adversarially testing the model’. Its usage policies currently protect against the technology being used for misinformation and harassment – though not against worker automation. These policies, however, are not governed by any external regulation and are subject to change: in January, OpenAI quietly backtracked on its own ban on using ChatGPT and similar technologies for military and defense purposes, and has since begun working with the US Department of Defense.
Featured photo from Pexels.