Simon Willison’s Weblog

Subscribe

My guess is that MidJourney has been doing a massive-scale reinforcement learning from human feedback ("RLHF") - possibly the largest ever for text-to-image.

When human users choose to upscale an image, it's because they prefer it over the alternatives. It'd be a huge waste not to use this as a reward signal - cheap to collect, and exactly aligned with what your user base wants.

The more users you have, the better RLHF you can do. And then the more users you gain.

Jim Fan