Stable Diffusion Explained Like I Am 5

Thu, 10 Nov 2022

Stable Diffusion is modern magic

Imagine “a world, earth, seen from space, 8k, unreal engine, detailed, photorealistic”…

Positive prompt: “a world, earth, seen from space, 8k, unreal engine, detailed, photorealistic”

Remember the common phrase, “a picture is worth a thousand words”? Well, it’s time to rethink that.

With advancements in image generation technology, we may need to start saying, “a few words are worth thousands of pictures.”

This technology is simply astonishing.

Any image. Any concept. Any quality. Any style. Near-instant results.

If a model hasn’t been trained on a style, teach it one! It only takes a few photos and some compute cycles to create magic.

What is This Technology?

Explaining Stable Diffusion simply is challenging. If you’ve ever seen a Magic Eye picture (an image that reveals a hidden three-dimensional picture when viewed in a certain way), you might have an idea of how this technology works.

Training involves adding a lot of noise to images and then learning how to restore each image to its original state, repeating the process over and over.
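
To make the noise idea a bit more concrete, here is a tiny sketch in plain Python. It is a toy, not how Stable Diffusion is actually implemented: the "image" is just four numbers, and the function names and the `keep` parameter are made up for this example. In a real model a neural network has to guess the noise; here we cheat and hand the exact noise back to show that a perfect guess restores the picture.

```python
import math
import random

def add_noise(pixels, noise, keep):
    """Blend clean pixels with noise.

    keep=1.0 leaves the image untouched; keep close to 0.0 is almost pure noise.
    (The name 'keep' is an illustrative assumption, not a library parameter.)
    """
    a = math.sqrt(keep)
    b = math.sqrt(1.0 - keep)
    return [a * p + b * n for p, n in zip(pixels, noise)]

def remove_noise(noisy, noise_guess, keep):
    """Undo add_noise, given a guess of the noise that was mixed in."""
    a = math.sqrt(keep)
    b = math.sqrt(1.0 - keep)
    return [(x - b * n) / a for x, n in zip(noisy, noise_guess)]

rng = random.Random(0)                        # fixed seed so runs are repeatable
image = [0.1, 0.5, 0.9, 0.3]                  # a pretend four-pixel image
noise = [rng.gauss(0.0, 1.0) for _ in image]  # random Gaussian noise

noisy = add_noise(image, noise, keep=0.2)     # mostly noise now
restored = remove_noise(noisy, noise, keep=0.2)
```

With the exact noise handed back, `restored` matches `image` up to floating-point rounding. The hard part, and the part the training is for, is producing a good noise guess when all you have is the noisy picture.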

Add in a sprinkle of algorithmic magic, a dash of CUDA cores, and a big helping of pre-built models trained on hundreds of millions of images. For instance, the models I used, Stable Diffusion versions 1.4 and 1.5 from HuggingFace.co, were trained on 800+ million images.

Then, using descriptive “words” (aka tokens) with numerical weights applied, you can generate remarkable images from text.
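
As a rough illustration of words with weights, the sketch below splits a prompt into (phrase, weight) pairs. The `(phrase:1.3)` notation is an assumption borrowed from a convention some community tools use, and `parse_prompt` is a hypothetical helper written for this post. Real tokens are smaller word pieces produced by a tokenizer, and how the weights are applied afterward varies by tool, so treat this as a picture of the idea only.

```python
import re

# Matches "(phrase:1.3)" style emphasis; this syntax is an illustrative assumption.
WEIGHTED = re.compile(r"\(([^():]+):([0-9]*\.?[0-9]+)\)")

def split_plain(text):
    """Split unweighted text on commas; each phrase gets the default weight 1.0."""
    return [(chunk.strip(), 1.0) for chunk in text.split(",") if chunk.strip()]

def parse_prompt(prompt):
    """Return a list of (phrase, weight) pairs in the order they appear."""
    parts = []
    pos = 0
    for match in WEIGHTED.finditer(prompt):
        parts.extend(split_plain(prompt[pos:match.start()]))
        parts.append((match.group(1).strip(), float(match.group(2))))
        pos = match.end()
    parts.extend(split_plain(prompt[pos:]))
    return parts

pairs = parse_prompt("a world, (earth:1.3), seen from space, (photorealistic:1.1)")
for phrase, weight in pairs:
    print(f"{weight:.1f}  {phrase}")
```

Phrases without parentheses keep the default weight of 1.0, while “earth” and “photorealistic” get nudged up, which is roughly what it means to tell the model to pay a little more attention to some words than others.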

I’ve never seen technology advance so rapidly.

I never thought I’d find myself immersed in learning about the world’s artists, their styles, and working to enhance my vocabulary.

The future of graphic design is now shaped by painted words.

Crazy how much difference a month can make.

-Rob

Coderrob

Hi, I'm Rob. I'm a programmer, Pluralsight author, software architect, emerging technologist, and lifelong learner.