
Sora Text to Video: Playing with AI Like It's 2049


February 15, 2025



I finally spent some time experimenting with Sora, and the shift from text-to-image to text-to-video feels bigger than I expected.

I have used Stable Diffusion for a long time. Image generation already changed how I think about prompts, style, references, and iteration.

Video adds another layer.

You are not just describing what something looks like anymore.

You are describing what happens.

Prompting a Moment

With image generation, a prompt describes a frame.

With video generation, a prompt describes a moment:

what is moving
how the camera behaves
what changes over time
what the atmosphere feels like
what the subject is doing before and after the obvious action

That is a different kind of writing.

It is closer to directing than captioning.

You have to think about motion, pacing, lighting, intent, and continuity. A good prompt does not just describe the object in the scene. It describes the scene becoming something.
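One way to make the list above concrete is to treat it as a checklist that every prompt has to answer. The sketch below is a hypothetical helper, not anything Sora provides: the class name, field names, and sentence template are my own assumptions, and it only assembles plain text that you would paste into whatever prompt box you are using.

```python
from dataclasses import dataclass


@dataclass
class VideoPrompt:
    """Hypothetical structure for a text-to-video prompt.

    Each field maps to one question from the list above; none of these
    names come from Sora or any real API.
    """
    subject: str      # who or what the scene is about
    before: str       # what the subject is doing just before the action
    action: str       # the obvious action, i.e. what happens
    after: str        # what the subject is doing just after it
    camera: str       # how the camera behaves
    change: str       # what changes over time (light, weather, framing)
    atmosphere: str   # what the scene feels like

    def to_text(self) -> str:
        # Order the parts roughly as a shot unfolds: setup, action,
        # aftermath, then camera, change over time, and mood.
        sentences = [
            f"{self.subject} {self.before}.",
            f"Then {self.action}.",
            f"Afterward, {self.after}.",
            f"The camera {self.camera}.",
            f"Over the course of the shot, {self.change}.",
            f"The mood is {self.atmosphere}.",
        ]
        return " ".join(sentences)


prompt = VideoPrompt(
    subject="A lighthouse keeper",
    before="climbs the spiral stairs with a lantern",
    action="she opens the door to the lamp room",
    after="she watches the beam sweep across the fog",
    camera="follows her from below, then rises to a wide shot of the coast",
    change="dusk fades into night",
    atmosphere="quiet and a little lonely",
)
print(prompt.to_text())
```

Because none of the fields have defaults, leaving out the camera or the "before" moment raises a `TypeError` instead of silently producing a prompt that only describes a frame. That is the point of the exercise: the structure forces you to describe the moment, not just the object.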

What Felt Different

Some of the results were rough.

Some were uncanny.

Some were good enough to make me stop and rethink what “drafting a visual idea” means.

That is the interesting part to me.

The first draft of a video concept used to require a lot more tooling, time, and specialized skill. Now the distance between “I can picture this” and “I can show a rough version of this” is getting much shorter.

That does not replace actual video production.

It does change the early creative loop.

Example

Here is one of the generated clips:

[Video: a Sora-generated clip]

The Practical Shift

Just as prompt-based image generation pushed people to describe style more precisely, text-to-video pushes us to describe action more precisely.

That means better language around:

motion
timing
scene transitions
camera movement
emotional tone
visual continuity

That is useful even when the generated result is imperfect.

The process forces you to explain the scene in a way another person, or another tool, can understand.

The Bottom Line

Sora is not just “image generation, but moving.”

It changes the unit of thought from a picture to a moment.

That is why it feels different.

We are not only learning how to prompt images anymore.

We are learning how to describe time.

-Rob