Simon Willison’s Weblog

4 posts tagged “space-invaders”

Results for my simple LLM benchmark "Write an HTML and JavaScript page implementing space invaders"

2025

Trying out Qwen3 Coder Flash using LM Studio and Open WebUI and LLM

Qwen just released their sixth model(!) of this July: Qwen3-Coder-30B-A3B-Instruct, listed as Qwen3-Coder-Flash in their chat.qwen.ai interface.

[... 1,390 words]

Qwen3-30B-A3B-Thinking-2507 (via) Yesterday was Qwen3-30B-A3B-Instruct-2507. Qwen are clearly committed to their new split between reasoning and non-reasoning models (a reversal from Qwen 3 in April), because today they released the new reasoning partner to yesterday's model: Qwen3-30B-A3B-Thinking-2507.

I'm surprised at how poorly this reasoning mode performs at "Generate an SVG of a pelican riding a bicycle" compared to its non-reasoning partner. The reasoning trace appears to carefully consider each component and how it should be positioned... and then the final result looks like this:

A line with two dots, over a rhombus, with two circles and a pelican that looks like a grey snowman. They are not arranged in a sensible layout.

I ran this using chat.qwen.ai/?model=Qwen3-30B-A3B-2507 with the "reasoning" option selected.

I also tried the "Write an HTML and JavaScript page implementing space invaders" prompt I ran against the non-reasoning model. It did a better job in that the game works:

It's not as playable as the one I got from GLM-4.5 Air though - the invaders fire their bullets infrequently enough that the game isn't very challenging.
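For context, most simple implementations give each living invader a small random chance of firing on every frame, so the challenge level comes down to a single constant. Here's a minimal sketch of that pattern - my own illustration rather than code from either model, with made-up names and numbers:

const FIRE_CHANCE_PER_FRAME = 0.002; // ~0.2% per invader per frame at 60fps (hypothetical value)

function maybeFireBullets(invaders, bullets) {
    for (const invader of invaders) {
        if (invader.alive && Math.random() < FIRE_CHANCE_PER_FRAME) {
            bullets.push({
                x: invader.x + invader.width / 2,
                y: invader.y + invader.height,
                speed: 3 // pixels per frame, moving down the canvas
            });
        }
    }
}

Raising that one constant is usually all it takes to make the invaders feel threatening.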

This model is part of a flurry of releases from Qwen over the past 9 days. Here's my coverage of each of those:

# 30th July 2025, 3:36 pm / ai, generative-ai, llms, qwen, pelican-riding-a-bicycle, llm-reasoning, llm-release, ai-in-china, space-invaders

Qwen3-30B-A3B-Instruct-2507. New model update from Qwen, improving on their previous Qwen3-30B-A3B release from late April. In their tweet they said:

Smarter, faster, and local deployment-friendly.

✨ Key Enhancements:
✅ Enhanced reasoning, coding, and math skills
✅ Broader multilingual knowledge
✅ Improved long-context understanding (up to 256K tokens)
✅ Better alignment with user intent and open-ended tasks
✅ No more <think> blocks — now operating exclusively in non-thinking mode

🔧 With 3B activated parameters, it's approaching the performance of GPT-4o and Qwen3-235B-A22B Non-Thinking

I tried the chat.qwen.ai hosted model with "Generate an SVG of a pelican riding a bicycle" and got this:

This one is cute: blue sky, green grass, the sun is shining. The bicycle is a red block with wheels that looks more like a toy car. The pelican doesn't look like a pelican and has a quirky smile printed on its beak.

I particularly enjoyed this detail from the SVG source code:

<!-- Bonus: Pelican's smile -->
<path d="M245,145 Q250,150 255,145" fill="none" stroke="#d4a037" stroke-width="2"/>

I went looking for quantized versions that could fit on my Mac and found lmstudio-community/Qwen3-30B-A3B-Instruct-2507-MLX-8bit from LM Studio. Getting that up and running was a 32.46GB download and it appears to use just over 30GB of RAM.
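If you want to script that local model rather than chat with it, LM Studio can expose it through its OpenAI-compatible local server (http://localhost:1234 by default once the server is switched on). Here's a rough sketch of sending the space invaders prompt from JavaScript - the model identifier is an assumption, use whatever name LM Studio lists for the download:

const response = await fetch("http://localhost:1234/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
        model: "qwen3-30b-a3b-instruct-2507-mlx", // hypothetical identifier - check LM Studio's model list
        messages: [{
            role: "user",
            content: "Write an HTML and JavaScript page implementing space invaders"
        }]
    })
});
const data = await response.json();
console.log(data.choices[0].message.content);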

The pelican I got from that one wasn't as good:

It looks more like a tall yellow hen chick riding a segway

I then tried that local model on the "Write an HTML and JavaScript page implementing space invaders" task that I ran against GLM-4.5 Air. The output looked promising, in particular it seemed to be putting more effort into the design of the invaders (GLM-4.5 Air just used rectangles):

// Draw enemy ship
ctx.fillStyle = this.color;

// Ship body
ctx.fillRect(this.x, this.y, this.width, this.height);

// Enemy eyes
ctx.fillStyle = '#fff';
ctx.fillRect(this.x + 6, this.y + 5, 4, 4);
ctx.fillRect(this.x + this.width - 10, this.y + 5, 4, 4);

// Enemy antennae
ctx.fillStyle = '#f00';
if (this.type === 1) {
    // Basic enemy
    ctx.fillRect(this.x + this.width / 2 - 1, this.y - 5, 2, 5);
} else if (this.type === 2) {
    // Fast enemy
    ctx.fillRect(this.x + this.width / 4 - 1, this.y - 5, 2, 5);
    ctx.fillRect(this.x + (3 * this.width) / 4 - 1, this.y - 5, 2, 5);
} else if (this.type === 3) {
    // Armored enemy
    ctx.fillRect(this.x + this.width / 2 - 1, this.y - 8, 2, 8);
    ctx.fillStyle = '#0f0';
    ctx.fillRect(this.x + this.width / 2 - 1, this.y - 6, 2, 3);
}

But the resulting code didn't actually work:

Black screen - a row of good looking space invaders advances across the screen for a moment... and then the entire screen goes blank.
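I haven't dug into why. As a general observation (not a diagnosis of this particular code), the render-for-a-moment-then-blank failure often comes from a game loop that stops scheduling frames or stops drawing once some state flag flips, which is why a defensive loop keeps rendering unconditionally:

// Generic pattern, not the model's code; update() and draw() are hypothetical
const canvas = document.getElementById("game");
const ctx = canvas.getContext("2d");

function gameLoop(timestamp) {
    update(timestamp);               // may flip a gameOver flag, spawn bullets, etc.
    ctx.clearRect(0, 0, canvas.width, canvas.height);
    draw();                          // always render, even after game over
    requestAnimationFrame(gameLoop); // never stop scheduling frames
}
requestAnimationFrame(gameLoop);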

That same prompt against the unquantized Qwen-hosted model produced different code which sadly was also unplayable - this time because everything moved too fast.
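Everything moving too fast is the classic symptom of speeds tuned in pixels per frame rather than pixels per second. The usual fix is to scale movement by the time delta that requestAnimationFrame passes to its callback - again a general sketch rather than anything from the model's output:

const PLAYER_SPEED = 200; // pixels per second - a made-up tuning value
let lastTime = 0;

function frame(timestamp) {
    const dt = lastTime ? (timestamp - lastTime) / 1000 : 0; // seconds since last frame
    lastTime = timestamp;
    player.x += player.direction * PLAYER_SPEED * dt; // hypothetical player object
    requestAnimationFrame(frame);
}
requestAnimationFrame(frame);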

This new Qwen model is a non-reasoning model, whereas GLM-4.5 and GLM-4.5 Air are both reasoners. It looks like at this scale the "reasoning" may make a material difference in terms of getting code that works out of the box.

# 29th July 2025, 6:57 pm / ai, generative-ai, llms, qwen, mlx, llm-reasoning, llm-release, lm-studio, ai-in-china, space-invaders

My 2.5 year old laptop can write Space Invaders in JavaScript now, using GLM-4.5 Air and MLX

I wrote about the new GLM-4.5 model family yesterday—new open weight (MIT licensed) models from Z.ai in China which their benchmarks claim score highly in coding even against models such as Claude Sonnet 4.

[... 685 words]