Simon Willison’s Weblog

Subscribe

Live blog: Claude 4 launch at Code with Claude

22nd May 2025

I’m at Anthropic’s Code with Claude event, where they are launching Claude 4. I’ll be live blogging the keynote here.

09:29 The keynote is just about to start. The schedule for the day is available on this site. Mike Krieger is on stage.

09:33 Dario Amodei is on. "I'm not one to hype things up" he says... and announces that "as of exactly this moment" they are releasing Claude 4 Opus and Claude 4 Sonnet.

09:34 "We haven't had an Opus model in a while, so as a reminder Opus is the most capable model and Sonnet is the good balance between intelligence and efficiency".

09:36 Claude 4 Opus has state of the art of SWE-bench, but "the benchmarks don't fully do justice to it". Anthropic's most senior engineers have been surprised at how much more productive it has made them. Claude 3 Sonnet is "a strict improvement on 3.7" at the same cost.

09:38 The WiFi at the event is feeling a bit shaky.

09:39 Mike is back, talking about Opus 4 specifically for advanced code. Sonnet 4 is good at code too and has great performance. Everything should be live right now in the Claude apps, Claude Code and the API.

09:40 "I know the term agents get thrown around a lot recently" - Mike has a joke about how long you can spend in a meeting at Anthropic without the word coming up, current record is 17 minutes.

09:41 When founding Instagram Mike's team had to make a bunch of very tough prioritization systems - video v.s. core platform, app improvements v.s. focusing on the apps. With AI agents, startups can follow more streams at once.

09:43 Mike was blown away by GitHub Copilot way back in 2021. "I got an even stronger feeling last summer when we launched Artifacts".

09:43 (Oops, I forgot to enable polling for the live blog - I've turned that on now.)

09:44 Mike hasn't actually defined agents. I don't think he's going to.

09:46 New code execution tool! They're finally running code on their own servers - previously their coding tool ran JavaScript in the browser, this brings it inline with ChatGPT Code Interpreter.

09:46 The new models can run autonomously for hours - they've seen Opus 4 run for seven hours without using its thread. I wonder if the context length is longer?

09:47 They talk about product-Anthropic fit: are people internally using the new models and tools on a regular basis?

09:48 Claude Code is coming to VS Code and JetBraine today.

09:49 With 2-6 engineers at Instagram supporting two different mobile platforms they would have been able to produce prototypes "in days and not weeks" with Claude Code.

09:50 Now talking about Architectural Safety: making them robust against exploitation (mention of prompt injection, though I bet they haven't completely solved that yet) and policies around transparency by default.

09:51 Dario talks about the race between model intelligence and interpretability. Mike pushed for the launch of Golden Gate Claude in his second week at Anthropic. I would love to see them release another version of that, it was fantastic.

09:52 Raw model capability alone isn't enough to unlock multi-hour workflows. Agents also need context from your environment. You can now connect MCP servers directly through their API - it looks similar to what openai announced yesterday.

09:53 "This lays the foundation of what could become the agent economy."

09:53 Second is web search: also available through the Anthropic API.

09:53 Third: the files API is now available through the API. Again, sounds similar to OpenAI's files mechanism.

09:55 Last: agents need to be able to scale. Prompt caching now has a longer TTL - it was five minutes, there's now a premium level that's a full hour. (I was hoping for automated prompt caching as seen in Gemini and OpenAI, but it sounds like you still need to explicitly configure it for Claude.)

09:56 A bunch of new things today then. MCP in the API, longer prompt caching, files in the API and (most exciting for me) a server-side code execution environment.

09:57 Mike says he loves hearing API feedback and invites people to send it to him directly.

09:58 Next up on stage is Cat Wu: PM for Claude Code.

09:58 Claude Code is now generally available as of today.

09:59 Developers are moving from describing individual features to asking for whole features. Today Claude Code is getting the new Claude 4 models.

10:00 You can now see Claude Code's proposed changes directly in your VS Code or JetBrains editors. New Claude Code SDK, with an open source demo that adds Claude Code integration to GitHub issues and PRs.

10:02 Now a demo. Claude Code is taking on this real open issue from 2022 in Excalidraw.

10:03 Claude Code in VS Code does look like a nice UI improvement from the terminal-only version.

10:03 Claude Code ran on this task for 90 minutes! I dread to think how many tokens it burned.

10:06 I hope they release the code from this demo, it's not something they've submitted as a PR to the repo yet (probably for the best, open source projects mostly aren't keen on LLM-generated PRs.) They did show a PR demo but it was against a private repository in the anthropic-experimental org on GitHub.

10:08 Run /install-github-app inside Claude Code to install this new GitHub integration.

10:08 Now up: Michael Gerstenhaber - Head of Product, API and Platform at Anthropic.

10:09 This is about "the Anthropic platform" - effectively their full suite of tools, models and APIs.

10:10 Here's the official blog post from Anthropic announcing Claude 4.

10:12 The two new "agent components" platform features today are the code execution tool and the new files API. (This is mostly a replay of the announcements from Mike earlier on.)

10:12 From Anthropic's blog post: "Both models can use tools—like web search—during extended thinking, allowing Claude to alternate between reasoning and tool use to improve responses."

10:13 That's a really big deal: o3 and o4-mini have had tool execution as part of their thinking process for a couple of months now and it's incredibly effective. Very glad to see Claude 4 catching up on that.

10:14 (Event WiFi is almost unusable now - I'm live blogging from my phone instead.)

10:14 I just missed a detail about "embeddable agents", which sounds like it relates to longer term agent memory.

10:14 Mario Rodriguez - Chief Product Officer at GitHub.

10:16 GitHub are announcing the ability to switch models in Copilot to Claude Sonnet 4 and Claude Opus 4. (Is that an inconsistency? I thought the name was Claude 4 Opus.) They launched that feature this morning.

10:17 This keynote is scheduled for a full 90 minutes! There's a fireside chat with Dario coming up.

10:18 Now talking about GitHub Copilot Coding Agent - the feature they launched last week which, notably, is not the same thing as GitHub's Copilot Workspaces despite being a very similar shape.

10:19 Sonnet 3.7 has "strong instruction following", particularly important for the new Copilot feature. They also make extensive use of prompt caching. With Sonnet 4 they've seen improvements in all of those areas.

10:20 GitHub launched an official MCP server recently and it was "the most popular repo on all of GitHub" that week.

10:23 I found the new model IDs on the model page: claude-opus-4-20250514 and claude-sonnet-4-20250514.

10:24 Now the conversation between Mike and Dario. Dario is excited about the "autonomy" - the thing where the new models can crunch for a long time. He's also keen on applications in cybersecurity and biomedical, especially for Opus 4.

10:24 Mike jokes that Dario's "Machines of loving grace" essay is actually a product roadmap for the new few years.

10:25 Model aliases are claude-opus-4-0 and claude-sonnet-4-0 - Dario just hinted that a 4.1 family is likely to follow.

10:29 The question about software careers came up. Dario liked Steve Yegge's Revenge of the junior developer blog post, and thinks it's a good reflection of where things are going.

10:30 Question about bigger models or smarter architectures? Do the scaling laws still hold? (Dario was involved in those.) Dario says the pre-training scaling laws are continuing to work but advances in post-training techniques have been complementing those a lot.

10:31 3.7 was just February! Already feels like an obsolete model now, just two and a half months later.

10:31 I just noticed from the model IDs that they really have swapped from Claude 3.7 Sonnet to Claude Sonnet 4.

10:33 I hope we get a Claude Haiku 4 some time. No mention of that at all, the word Haiku has not been uttered once.

10:38 When do you think there will be the first billion dollar with one employees? Dario: "2026".

10:40 Mike talked to one startup founder who tried to build something really hard for two years, was just about to give up and then 3.7 came out and everything started working. Dario's tile is to look for things that are "almost possible".

10:40 I had the new models "Generate an SVG of a pelican riding a bicycle". Here's claude-sonnet-4-0:

Pretty good bicycle, decent enough pelican

And here's claude-opus-4-0:

The bicycle isn't as good - no spokes. The pelican is about equal in quality.

10:42 Dario talks about the big question: what happens if the cost of software drops dramatically? He says that previously you wouldn't have built custom software unless it was for thousands of end users - the economics now are completely different, it's much more feasible to build custom things. This is my optimistic angle on this too: I think demand for custom software (and our skills as developers) goes through the roof as costs come down.

10:43 Everyone who came here to the event gets access to their highest tier Max plan for three months.

10:43 ... and that's the end of the keynote.