Simon Willison’s Weblog

Subscribe
Atom feed for tom-gally

2 posts tagged “tom-gally”

2025

LLM SVG Generation Benchmark (via) Here's a delightful project by Tom Gally, inspired by my pelican SVG benchmark. He asked Claude to help create more prompts of the form Generate an SVG of [A] [doing] [B] and then ran 30 creative prompts against 9 frontier models - prompts like "an octopus operating a pipe organ" or "a starfish driving a bulldozer".

Here are some for "butterfly inspecting a steam engine":

Gemini 3.0 Pro Preview drew the best steam engine with nice gradients and a butterfly hovering near the chimney. DeepSeek V3.2-Exp drew a floating brown pill with a hint of a chimney and a butterfly possibly on fire. GLM-4.6 did the second best steam engine with a butterfly nearby. Qwen3-VL-235B-A22B-Thinking did a steam engine that looks a bit like a chests on wheels and a weird purple circle.

And for "sloth steering an excavator":

Claude Sonnet 4.5 drew the best excavator with a blobby sloth driving it. Claude Opus 4.5 did quite a blocky excavator with a sloth that isn't quite recognizable as a sloth. Grok Code Fast 1 drew a green alien standing on a set of grey blocks. Gemini 2.5 Pro did a good excavator with another blobby sloth.

It's worth browsing the whole collection, which gives a really good overall indication of which models are the best at SVG art.

# 25th November 2025, 4:02 am / benchmarks, svg, ai, generative-ai, llms, evals, pelican-riding-a-bicycle, tom-gally

A professional workflow for translation using LLMs. Tom Gally is a professional translator who has been exploring the use of LLMs since the release of GPT-4. In this Hacker News comment he shares a detailed workflow for how he uses them to assist in that process.

Tom starts with the source text and custom instructions, including context for how the translation will be used. Here's an imaginary example prompt, which starts:

The text below in Japanese is a product launch presentation for Sony's new gaming console, to be delivered by the CEO at Tokyo Game Show 2025. Please translate it into English. Your translation will be used in the official press kit and live interpretation feed. When translating this presentation, please follow these guidelines to create an accurate and engaging English version that preserves both the meaning and energy of the original: [...]

It then lists some tone, style and content guidelines custom to that text.

Tom runs that prompt through several different LLMs and starts by picking sentences and paragraphs from those that form a good basis for the translation.

As he works on the full translation he uses Claude to help brainstorm alternatives for tricky sentences:

When I am unable to think of a good English version for a particular sentence, I give the Japanese and English versions of the paragraph it is contained in to an LLM (usually, these days, Claude) and ask for ten suggestions for translations of the problematic sentence. Usually one or two of the suggestions work fine; if not, I ask for ten more. (Using an LLM as a sentence-level thesaurus on steroids is particularly wonderful.)

He uses another LLM and prompt to check his translation against the original and provide further suggestions, which he occasionally acts on. Then as a final step he runs the finished document through a text-to-speech engine to try and catch any "minor awkwardnesses" in the result.

I love this as an example of an expert using LLMs as tools to help further elevate their work. I'd love to read more examples like this one from experts in other fields.

# 2nd February 2025, 4:23 am / hacker-news, translation, ai, generative-ai, llms, tom-gally