Simon Willison’s Weblog

Subscribe
Atom feed for text-to-image Random

43 posts tagged “text-to-image”

2023

ControlNet (via) A spectacular step forward in image generation—using “conditional control” to control models like Stable Diffusion. The README here is full of examples of what this enables. Extremely finely grained control of generated images based on a sketch, or in input image—including tricks like using Canny edge detection (an algorithm from 1986) to convert any image into an outline which can then be used as input to the model.

# 22nd February 2023, 5:45 pm / ai, stable-diffusion, generative-ai, text-to-image

2022

Stable Diffusion 2.0 and the Importance of Negative Prompts for Good Results. Stable Diffusion 2.0 is out, and it’s a very different model from 1.4/1.5. It’s trained using a new text encoder (OpenCLIP, in place of OpenAI’s CLIP) which means a lot of the old tricks—notably using “Greg Rutkowski” to get high quality fantasy art—no longer work. What DOES work, incredibly well, is negative prompting—saying things like “cyberpunk forest by Salvador Dali” but negative on “trees, green”. Max Woolf explores negative prompting in depth in this article, including how to combine it with textual inversion.

# 29th November 2022, 1:22 am / max-woolf, stable-diffusion, prompt-engineering, generative-ai, text-to-image

The AI that creates any picture you want, explained. Vox made this explainer video about text-to-image generative AI models back in June, months before Stable Diffusion was released and shortly before the DALL-E preview started rolling out to a wider audience. It’s a really good video—in particular the animation that explains at a high level how diffusion models work, which starts about 5m30s in.

# 10th October 2022, 3:28 am / ai, dalle, stable-diffusion, generative-ai, text-to-image

The Illustrated Stable Diffusion (via) Jay Alammar provides a detailed, clearly explained description of how the Stable Diffusion image generation model actually works under the hood..

# 5th October 2022, 2:58 am / ai, stable-diffusion, generative-ai, text-to-image

I Resurrected “Ugly Sonic” with Stable Diffusion Textual Inversion (via) “I trained an Ugly Sonic object concept on 5 image crops from the movie trailer, with 6,000 steps [...] (on a T4 GPU, this took about 1.5 hours and cost about $0.21 on a GCP Spot instance)”

# 20th September 2022, 3:35 am / machine-learning, ai, max-woolf, stable-diffusion, generative-ai, text-to-image

The Changelog: Stable Diffusion breaks the internet. I’m on this week’s episode of The Changelog podcast, talking about Stable Diffusion, AI ethics and a little bit about prompt injection attacks too.

# 17th September 2022, 2:14 am / podcasts, ai, stable-diffusion, prompt-engineering, prompt-injection, generative-ai, llms, text-to-image, podcast-appearances

Exploring the training data behind Stable Diffusion

Visit Exploring the training data behind Stable Diffusion

Two weeks ago, the Stable Diffusion image generation model was released to the public. I wrote about this last week, in Stable Diffusion is a really big deal—a post which has since become one of the top ten results for “stable diffusion” on Google and shown up in all sorts of different places online.

[... 2,897 words]

For these reasons, I don’t think I’ll be using Midjourney or any similar tool to illustrate my newsletter going forward (an exception would be if I were writing about the technology at a later date and wanted to show examples). Even though the job wouldn’t go to a different, deserving, human artist, I think the optics are shitty, and I do worry about having any role in helping to set any kind of precedent in this direction.

Charlie Warzel

# 4th September 2022, 9:06 pm / ethics, ai, generative-ai, midjourney, text-to-image, ai-ethics

Grokking Stable Diffusion (via) Jonathan Whitaker built this interactive Jupyter notebook that walks through how to use Stable Diffusion from Python step-by-step, and then dives deep into helping understand the different components of the implementation, including how text is encoded, how the diffusion loop works and more. This is by far the most useful tool I’ve seen yet for understanding how this model actually works. You can run Jonathan’s notebook directly on Google Colab, with a GPU.

# 4th September 2022, 6:50 pm / jupyter, stable-diffusion, generative-ai, text-to-image

Run Stable Diffusion on your M1 Mac’s GPU. Ben Firshman provides detailed instructions for getting Stable Diffusion running on an M1 Mac.

# 1st September 2022, 5:41 pm / ben-firshman, machine-learning, macos, ai, stable-diffusion, generative-ai, text-to-image

Stable Diffusion is a really big deal

Visit Stable Diffusion is a really big deal

If you haven’t been paying attention to what’s going on with Stable Diffusion, you really should be.

[... 1,443 words]

Stable Diffusion Public Release (via) New AI just dropped. Stable Diffusion is similar to DALL-E, but completely open source and with a CC0 license applied to everything it generates. I have a Twitter thread (the via) link of comparisons I’ve made between its output and my previous DALL-E experiments. The announcement buries the lede somewhat: to try it out, visit beta.dreamstudio.ai—which you can use for free at the moment, but it’s unclear to me how billing is supposed to work.

# 22nd August 2022, 7:12 pm / machine-learning, dalle, stable-diffusion, generative-ai, text-to-image

First impressions of DALL-E, generating images from text

Visit First impressions of DALL-E, generating images from text

I made it off the DALL-E waiting list a few days ago and I’ve been having an enormous amount of fun experimenting with it. Here are some notes on what I’ve learned so far (and a bunch of example images too).

[... 2,102 words]