Simon Willison’s Weblog

Subscribe

Tuesday, 5th August 2025

A Friendly Introduction to SVG (via) This SVG tutorial by Josh Comeau is fantastic. It's filled with neat interactive illustrations - with a pleasing subtly "click" audio effect as you adjust their sliders - and provides a useful introduction to a bunch of well chosen SVG fundamentals.

I finally understand what all four numbers in the viewport="..." attribute are for!

# 5:20 am / svg, explorables, josh-comeau

I teach HS Science in the south. I can only speak for my district, but a few teacher work days in the wave of enthusiasm I'm seeing for AI tools is overwhelming. We're getting district approved ads for AI tools by email, Admin and ICs are pushing it on us, and at least half of the teaching staff seems all in at this point.

I was just in a meeting with my team and one of the older teachers brought out a powerpoint for our first lesson and almost everyone agreed to use it after a quick scan - but it was missing important tested material, repetitive, and just totally airy and meaningless. Just slide after slide of the same handful of sentences rephrased with random loosely related stock photos. When I asked him if it was AI generated, he said 'of course', like it was a strange question. [...]

We don't have a leg to stand on to teach them anything about originality, academic integrity/intellectual honesty, or the importance of doing things for themselves when they catch us indulging in it just to save time at work.

greyduet on r/teachers, Unpopular Opinion: Teacher AI use is already out of control and it's not ok

# 11:53 am / education, ai, generative-ai, llms, slop, ai-ethics

Claude Opus 4.1. Surprise new model from Anthropic today - Claude Opus 4.1, which they describe as "a drop-in replacement for Opus 4".

My favorite thing about this model is the version number - treating this as a .1 version increment looks like it's an accurate depiction of the model's capabilities.

Anthropic's own benchmarks show very small incremental gains.

Comparing Opus 4 and Opus 4.1 (I got 4.1 to extract this information from a screenshot of Anthropic's own benchmark scores, then asked it to look up the links, then verified the links myself and fixed a few):

  • Agentic coding (SWE-bench Verified): From 72.5% to 74.5%
  • Agentic terminal coding (Terminal-Bench): From 39.2% to 43.3%
  • Graduate-level reasoning (GPQA Diamond): From 79.6% to 80.9%
  • Agentic tool use (TAU-bench):
  • Retail: From 81.4% to 82.4%
  • Airline: From 59.6% to 56.0% (decreased)
  • Multilingual Q&A (MMMLU): From 88.8% to 89.5%
  • Visual reasoning (MMMU validation): From 76.5% to 77.1%
  • High school math competition (AIME 2025): From 75.5% to 78.0%

Likewise, the model card shows only tiny changes to the various safety metrics that Anthropic track.

It's priced the same as Opus 4 - $15/million for input and $75/million for output, making it one of the most expensive models on the market today.

I had it draw me this pelican riding a bicycle:

Pelican is line art, does have a good beak and feet on the pedals, bicycle is very poorly designed and not the right shape.

For comparison I got a fresh new pelican out of Opus 4 which I actually like a little more:

This one has shaded colors for the different parts of the pelican. Still a bad bicycle.

I shipped llm-anthropic 0.18 with support for the new model.

# 5:17 pm / ai, generative-ai, llms, llm, anthropic, claude, evals, llm-pricing, pelican-riding-a-bicycle, llm-release

OpenAI’s new open weight (Apache 2) models are really good

Visit OpenAI's new open weight (Apache 2) models are really good

The long promised OpenAI open weight models are here, and they are very impressive. They’re available under proper open source licenses—Apache 2.0—and come in two sizes, 120B and 20B.

[... 2,689 words]

2025 » August

MTWTFSS
    123
45678910
11121314151617
18192021222324
25262728293031