Cooking with Claude

23rd December 2025

I’ve been having an absurd amount of fun recently using LLMs for cooking. I started out using them for basic recipes, but as I’ve grown more confident in their culinary abilities I’ve leaned into them for more advanced tasks. Today I tried something new: having Claude vibe-code up a custom application to help with the timing for a complicated meal preparation. It worked really well!

A custom timing app for two recipes at once

We have family staying at the moment, which means cooking for four. We subscribe to a meal delivery service called Green Chef, mainly because it takes the thinking out of cooking three times a week: grab a bag from the fridge, follow the instructions, eat.

Each bag serves two portions, so cooking for four means preparing two bags at once.

I have done this a few times now and it is always a mad flurry of pans and ingredients and timers and desperately trying to figure out what should happen when and how to get both recipes finished at the same time. It’s fun but it’s also chaotic and error-prone.

This time I decided to try something different, and potentially even more chaotic and error-prone: I outsourced the planning entirely to Claude.

I took this single photo of the two recipe cards side-by-side and fed it to Claude Opus 4.5 (in the Claude iPhone app) with this prompt:

Extract both of these recipes in as much detail as possible

This is a moderately challenging vision task in that there quite a lot of small text in the photo. I wasn’t confident Opus could handle it.

I hadn’t read the recipe cards myself. The responsible thing to do here would be a thorough review or at least a spot-check—I chose to keep things chaotic and didn’t do any more than quickly eyeball the result.

I asked what pots I’d need:

Give me a full list of pots I would need if I was cooking both of them at once

Then I prompted it to build a custom application to help me with the cooking process itself:

I am going to cook them both at the same time. Build me a no react, mobile, friendly, interactive, artifact that spells out the process with exact timing on when everything needs to happen have a start setting at the top, which starts a timer and persists when I hit start in localStorage in case the page reloads. The next steps should show prominently with countdowns to when they open. The full combined timeline should be shown slow with calculated times tor when each thing should happen

I copied the result out onto my own hosting (you can try it here) because I wasn’t sure if localStorage would work inside the Claude app and I really didn’t want it to forget my times!

Then I clicked “start cooking”!

Here’s the full Claude transcript.

There was just one notable catch: our dog, Cleo, knows exactly when her dinner time is, at 6pm sharp. I forgot to mention this to Claude, which had scheduled several key steps colliding with Cleo’s meal. I got woofed at. I deserved it.

To my great surprise, it worked. I followed the recipe guide to the minute and served up both meals exactly 44 minutes after I started cooking.

A small bowl (a beautiful blue sea textured bowl, made by Natalie Downe) contains a chickpea stew. A larger black bowl has couscous, green beans and blackened cauliflower.

The best way to learn the capabilities of LLMs is to throw tasks at them that may be beyond their abilities and see what happens. In this case I fully expected that something would get forgotten or a detail would be hallucinated and I’d end up scrambling to fix things half way through the process. I was surprised and impressed that it worked so well.

Some credit for the app idea should go to my fellow hackers at /dev/fort 2 in 2009, when we rented Knockbrex Castle in Dumfries, Scotland for a week and attempted to build a cooking timer application for complex meals.

Generating recipes from scratch

Most of my other cooking experiments with LLMs have been a whole lot simpler than this: I ask for a recipe, ask for some variations and then cook one of them and see what happens.

This works remarkably well considering LLMs have no taste buds.

I’ve started to think of this as asking LLMs for the average recipe for a dish, based on all of the recipes they have hoovered up during their training. It turns out the mean version of every guacamole recipe on the internet is a decent guacamole!

Here’s an example of a recipe I tried recently that worked out really well. I was helping Natalie run her ceramic stall at the farmers market and the stall next to us sold excellent dried beans. I’ve never used dried beans before, so I took a photo of their selection and asked Claude what I could do with them:

Identify these beans

It took a guess at the beans, then I said:

Get me excited about cooking with these! If I bought two varietiew what could I make

“Get me excited” switches Claude into a sort of hype-man mode, which is kind of entertaining:

Oh, you’re about to enter the wonderful world of bean cooking! Let me get you pumped about some killer two-bean combos: [...]

Mixed bean salad with lemon, olive oil, fresh herbs, cherry tomatoes—light but satisfying [...]

I replied:

OK Bean salad has me interested—these are dried beans. Give me some salad options I can make that would last a long time in the fridge

... and after some back and forth we arrived on the recipe in this transcript, which I cooked the following day (asking plenty of follow-up questions) and thoroughly enjoyed.

I’ve done this a bunch of times with a bunch of different recipes across both Claude and ChatGPT and honestly I’ve not had a notable miss yet. Being able to say “make it vegan” or “I don’t have coriander, what can I use instead?” or just “make it tastier” is a really fun way to explore cooking.

It’s also fun to repeat “make it tastier” multiple times to see how absurd you can get.

I really want someone to turn this into a benchmark!

Cooking with LLMs is a lot of fun. There’s an opportunity here for a really neat benchmark: take a bunch of leading models, prompt them for recipes, follow those recipes and taste-test the results!

The logistics of running this are definitely too much for me to handle myself. I have enough trouble cooking two meals at once, for a solid benchmark you’d ideally have several models serving meals up at the same time to a panel of tasters.

If someone else wants to try this please let me know how it goes!

Posted 23rd December 2025 at 5:01 am · Follow me on Mastodon, Bluesky, Twitter or subscribe to my newsletter

Simon Willison’s Weblog