Chad Hetherington

When generative AI first made a splash, folks weren’t sure what to think. Would it take jobs away from creative professionals? How smart can it really get? Are we doomed?

Generating text at a breakneck pace is impressive, and images even more so. But video? That would be incredible, right? Well, that time has come. Text-to-video models are growing more capable than ever and, if they continue on their current trajectory, are set to make a seismic impact across industries.

OpenAI’s contribution to the text-to-video world, Sora AI, hasn’t been released yet, but the teasers for the tool are nothing short of fascinating. Let’s explore what Sora AI is all about — what it can and cannot do, and how text-to-video could affect marketing as we know it.

What Is Sora AI?

Sora AI is a powerful text-to-video generative AI model developed by OpenAI, the very same folks behind the now ubiquitous text-based ChatGPT model.

Simply put, Sora can create realistic and imaginative video scenes — up to one minute long — from text instructions, simulating the physical world in motion. Y’know how ChatGPT can write or tell you about things that you ask it to? Sora does the same thing with video. And it’s kind of crazy.

While it isn’t available for the public to use just yet — as OpenAI continues to work with policymakers and artists — it seems we’re not terribly far away from a public release.

How Do Text-to-Video Models Work?

I won’t even pretend to understand how these increasingly advanced AI models actually pull off their purpose. Data, numbers, algorithms … magic? Maybe a combination of those things? The technology behind these text-to-video tools in particular — called a denoising latent diffusion model — certainly goes over my head, but thankfully, artificial intelligence is smart enough to help us describe how it works in simple terms (double-checked and cross-referenced, of course):

  • Noise Initialization: The process begins with a random field of noise, which is essentially a bunch of pixels scattered about with no real structure.
  • Diffusion Process: During training, the model watches noise being added to real footage in a controlled manner and learns to predict exactly what noise was added at each step.
  • Denoising: At generation time, the model runs that process in reverse, predicting the noise present in the current image and removing it, bringing the image closer to the desired output.
  • Iteration: This denoising step is repeated many times, with the model gradually refining the image until it closely matches the text instruction.

If that still feels like a lot to unpack, here’s a fun analogy:

You’re a sculptor whose primary medium is granite. You start out with a giant rectangular block of stone, and with each precise strike of the chisel, your artwork becomes clearer. Oh, and once you’re finished, it comes to life like Frankenstein’s monster.

That’s basically how this technology works. It starts with a noisy, chaotic video, then uses a process called “denoising” to gradually refine it until it resembles the scene described in your prompt, guided by its transformer architecture and training on vast amounts of video.
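If you’re code-inclined, here’s a toy sketch of that denoising loop in Python. It’s purely illustrative and assumes nothing about Sora’s internals: the real pipeline works on latent video patches with a learned transformer denoiser, while here a hand-written function stands in for the noise predictor and a small 2D array stands in for a frame.

```python
# Toy reverse-diffusion loop. A hand-written function stands in for the
# learned denoiser; a real model predicts noise with a neural network.
import numpy as np

rng = np.random.default_rng(0)

def predict_noise(noisy):
    """Stand-in denoiser: pretend the clean target is a smooth gradient
    and call everything that isn't the target "noise"."""
    target = np.linspace(0.0, 1.0, noisy.size).reshape(noisy.shape)
    return noisy - target

def generate(shape=(8, 8), steps=50):
    x = rng.normal(size=shape)        # start from a random field of noise
    for t in range(steps):            # iterate many times
        eps = predict_noise(x)        # predict the noise present...
        x = x - eps / (steps - t)     # ...and remove a fraction of it
    return x

frame = generate()
print(frame.round(2))  # settles onto the "clean" target image
```

Each pass removes a little more noise, which is exactly the chisel-by-chisel refinement the sculptor analogy describes.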

But what can’t it do?

Sora’s Limitations According to OpenAI (and Basic Ethical Principles)

The technology isn’t perfect just yet, which comes as no surprise given that OpenAI hasn’t yet released it to the public. In fact, it has lots of quirks and kinks when it comes to things like complex, real-world physics and rendering granular details.

OpenAI says, “It [Sora AI] does not accurately model the physics of many basic interactions, like glass shattering. Other interactions, like eating food, do not always yield correct changes in object state.”

To give you an idea of the model’s limitations when it comes to depicting a person eating food, for example, a video may show someone taking a bite out of something, only to pull it away from their mouth without a piece missing. That’s just one example of how Sora struggles with finer details; it also has a hard time accurately representing things like facial expressions, hand gestures and precise object placement. Don’t let that sour your opinion, though. The technology is astonishing, and it truly can create dynamic, high-quality footage that you just have to see for yourself.

Ethical Considerations

Beyond these limitations, there are ethical considerations, too, just as with any artificial intelligence model. For many, this is the scariest part. As AI video tools become better and more capable, the potential for misuse skyrockets. Deepfakes are a huge concern right now. If you’re unaware of the term, a deepfake is defined as “a video of a person in which their face or body has been digitally altered so that they appear to be someone else, typically used maliciously or to spread false information.”

Based on that short description alone, you can probably deduce how things could get ugly quickly. In that regard, responsible development and deployment are crucial.

But back to the bright side. It’s no surprise we’re fans of responsible AI technology, and text-to-video is no different. So let’s talk about valuable marketing use cases.

How Sora AI and AI Video Generation Could Assist Marketers With Video Production

When Sora AI officially releases, whenever that may be, people are going to begin experimenting right away — marketers included. One-minute, high-quality video and no camera equipment or actors required? People will take this tech and run with it.

So, here are a few ways marketers may approach these new-found AI capabilities:

Dynamic Video Advertisements

Marketers could create personalized or highly targeted video ads by simply providing text prompts, making it easier to develop unique content for different audiences or product lines.
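To make that concrete, here’s a hypothetical sketch of per-audience prompt templating in Python. Sora has no public API as of this writing, so the `generate_video` call is a pure placeholder, and the segments, scenes and product name are all made up for illustration.

```python
# Hypothetical prompt templating per audience segment. `generate_video`
# is a placeholder: Sora has no public API yet, so this shows only how
# a marketer might structure prompts, not how to call the model.
SEGMENTS = {
    "trail_runners": "trail shoes splashing through a misty forest at sunrise",
    "city_commuters": "sleek sneakers crossing a rainy downtown intersection",
}

def build_prompt(product: str, scene: str) -> str:
    return (
        f"A 30-second ad for {product}: {scene}, "
        "cinematic lighting, upbeat mood, logo reveal at the end."
    )

for segment, scene in SEGMENTS.items():
    prompt = build_prompt("Acme running shoes", scene)
    print(f"[{segment}] {prompt}")
    # video = generate_video(prompt)  # placeholder until an API exists
```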

Social Media Content

While it does depend on the platform, videos across most social media are seldom longer than 60 seconds, so Sora will certainly stake its claim in this part of marketers’ video strategies. Aside from length, social platforms seem to be increasingly prioritizing video content over written posts. Sora AI could help marketers rapidly generate engaging videos tailored to specific social media trends to keep up with the waves.

While this will certainly be tried and tested by marketers around the world, how audiences and consumers will feel about it is another story.

According to a recent survey, only 20% of consumers are interested in engaging with various forms of AI-assisted media, while the overwhelming majority are either averse to the idea or don’t have an opinion. Specifically, 37% of respondents say they’d be less interested in engaging with images and videos on social media if they knew they were produced using AI, 31% essentially said they don’t care, and 10% felt unsure.

A/B Testing With Video Content

In theory, Sora AI could make it easier to test multiple video concepts. Since the model can work with existing videos, one production cycle may be all you need: shoot once, feed the footage to the model to generate slight variations, then deploy each version as part of your A/B testing process, as in the sketch below.
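The variant generation would happen inside Sora itself (again, no public API yet), but the testing half is ordinary engineering. Here’s a minimal sketch of deterministic variant assignment in Python, with made-up file names; hashing the user ID guarantees each viewer always sees the same version.

```python
# Minimal A/B assignment for video variants. File names are made up;
# the variants themselves would come from a text-to-video model.
import hashlib

VARIANTS = ["ad_base.mp4", "ad_warm_tone.mp4", "ad_fast_cuts.mp4"]

def assign_variant(user_id: str) -> str:
    """Hash the user ID so the same viewer always gets the same variant."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16)
    return VARIANTS[bucket % len(VARIANTS)]

for uid in ["user-001", "user-002", "user-003"]:
    print(uid, "->", assign_variant(uid))
```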

Product Demonstrations

Sora AI could be used to create virtual product demonstrations or explainer videos by inputting descriptions and key features. This would allow companies to quickly showcase new or complex products visually.

A Text-to-Video Future Is Approaching

Text-to-video technology looks promising and powerful, which means it must be handled with the utmost responsibility and care. OpenAI says it understands this, which is why Sora has yet to be set free. When it is, only time will tell how marketers prefer to use it and whether consumers choose to engage with AI-assisted content.

One thing is for certain, though: Things are about to get interesting.