109 items tagged “ai-assisted-programming”
Using AI tools such as Large Language Models to help write code.
2024
Generating documentation from tests using files-to-prompt and LLM. I was experimenting with the wasmtime-py Python library today (for executing WebAssembly programs from inside CPython) and I found the existing API docs didn't quite show me what I wanted to know.
The project has a comprehensive test suite so I tried seeing if I could generate documentation using that:
cd /tmp
git clone https://github.com/bytecodealliance/wasmtime-py
files-to-prompt -e py wasmtime-py/tests -c | \
llm -m claude-3.5-sonnet -s \
'write detailed usage documentation including realistic examples'
More notes in my TIL. You can see the full Claude transcript here - I think this worked really well!
California Clock Change. The clocks go back in California tonight and I finally built my dream application for helping me remember if I get an hour extra of sleep or not, using a Claude Artifact. Here's the transcript.
This is one of my favorite examples yet of the kind of tiny low stakes utilities I'm building with Claude Artifacts because the friction involved in churning out a working application has dropped almost to zero.
(I added another feature: it now includes a note of what time my dog thinks it is if the clocks have recently changed.)
Claude Token Counter. Anthropic released a token counting API for Claude a few days ago.
I built this tool for running prompts, images and PDFs against that API to count the tokens in them.
The API is free (albeit rate limited), but you'll still need to provide your own API key in order to use it.
Here's the source code. I built this using two sessions with Claude - one to build the initial tool and a second to add PDF and image support. That second one is a bit of a mess - it turns out if you drop an HTML file onto a Claude conversation it converts it to Markdown for you, but I wanted it to modify the original HTML source.
The API endpoint also allows you to specify a model, but as far as I can tell from running some experiments the token count was the same for Haiku, Opus and Sonnet 3.5.
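For reference, a call to the token counting endpoint looks roughly like this - a sketch from memory, so check Anthropic's documentation for the current headers (it launched behind a beta header) and model IDs:
curl https://api.anthropic.com/v1/messages/count_tokens \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-3-5-sonnet-20241022",
    "messages": [{"role": "user", "content": "Describe a pelican riding a bicycle"}]
  }'
# The response is a small JSON object along the lines of {"input_tokens": ...}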
Cerebras Coder (via) Val Town founder Steve Krouse has been building demos on top of the Cerebras API that runs Llama3.1-70b at 2,000 tokens/second.
Having a capable LLM with that kind of performance turns out to be really interesting. Cerebras Coder is a demo that implements Claude Artifact-style on-demand JavaScript apps, and having it run at that speed means changes you request are visible within less than a second:
Steve's implementation (created with the help of Townie, the Val Town code assistant) demonstrates the simplest possible version of an iframe sandbox:
<iframe
  srcDoc={code}
  sandbox="allow-scripts allow-modals allow-forms allow-popups allow-same-origin allow-top-navigation allow-downloads allow-presentation allow-pointer-lock"
/>
Where code is populated by a setCode(...) call inside a React component.
The most interesting applications of LLMs continue to be the ones where they operate in a tight loop with a human - speed like this makes those review loops much faster and more productive.
Bringing developer choice to Copilot with Anthropic’s Claude 3.5 Sonnet, Google’s Gemini 1.5 Pro, and OpenAI’s o1-preview. The big announcement from GitHub Universe: Copilot is growing support for alternative models.
GitHub Copilot predated the release of ChatGPT by more than a year, and was the first widely used LLM-powered tool. This announcement includes a brief history lesson:
The first public version of Copilot was launched using Codex, an early version of OpenAI GPT-3, specifically fine-tuned for coding tasks. Copilot Chat was launched in 2023 with GPT-3.5 and later GPT-4. Since then, we have updated the base model versions multiple times, using a range from GPT 3.5-turbo to GPT 4o and 4o-mini models for different latency and quality requirements.
It's increasingly clear that any strategy that ties you to models from exclusively one provider is short-sighted. The best available model for a task can change every few months, and for something like AI code assistance model quality matters a lot. Getting stuck with a model that's no longer best in class could be a serious competitive disadvantage.
The other big announcement from the keynote was GitHub Spark, described like this:
Sparks are fully functional micro apps that can integrate AI features and external data sources without requiring any management of cloud resources.
I got to play with this at the event. It's effectively a cross between Claude Artifacts and GitHub Gists, with some very neat UI details. The features that really differentiate it from Artifacts are that Spark apps gain access to a server-side key/value store which they can use to persist JSON, and that they can also access an API against which they can execute their own prompts.
The prompt integration is particularly neat because prompts used by the Spark apps are extracted into a separate UI so users can view and modify them without having to dig into the (editable) React JavaScript code.
Prompt GPT-4o audio. A week and a half ago I built a tool for experimenting with OpenAI's new audio input. I just put together the other side of that, for experimenting with audio output.
Once you've provided an API key (which is saved in localStorage) you can use this to prompt the gpt-4o-audio-preview model with a system and regular prompt and select a voice for the response.
I built it with assistance from Claude: initial app, adding system prompt support.
You can preview and download the resulting wav file, and you can also copy out the raw JSON. If you save that in a Gist you can then feed its Gist ID to https://tools.simonwillison.net/gpt-4o-audio-player?gist=GIST_ID_HERE (Claude transcript) to play it back again.
You can try using that to listen to my French accented pelican description.
There's something really interesting to me here about this form of application which exists entirely as HTML and JavaScript that uses CORS to talk to various APIs. GitHub's Gist API is accessible via CORS too, so it wouldn't take much more work to add a "save" button which writes out a new Gist after prompting for a personal access token. I prototyped that a bit here.
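For anyone curious about the API itself, the request the tool makes looks roughly like this - sketched here as curl rather than the fetch() call the tool actually uses, and the voice name and exact response layout may differ from OpenAI's current documentation:
curl https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-audio-preview",
    "modalities": ["text", "audio"],
    "audio": {"voice": "alloy", "format": "wav"},
    "messages": [
      {"role": "system", "content": "Talk with a thick French accent"},
      {"role": "user", "content": "Describe a pelican"}
    ]
  }'
# The base64-encoded WAV comes back in choices[0].message.audio.data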
Run a prompt to generate and execute jq programs using llm-jq
llm-jq is a brand new plugin for LLM which lets you pipe JSON directly into the llm jq command along with a human-language description of how you’d like to manipulate that JSON, and have a jq program generated and executed for you on the fly.
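Usage looks something like this - pipe any JSON into llm jq along with a description of what you want (the GitHub API URL here is just an illustrative example):
curl -s "https://api.github.com/repos/simonw/datasette/issues" | \
  llm jq 'count issues by user login, top 3'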
Mastodon discussion about sandboxing SVG data. I asked this on Mastodon and got some really useful replies:
How hard is it to process untrusted SVG data to strip out any potentially harmful tags or attributes (like stuff that might execute JavaScript)?
The winner for me turned out to be the humble <img src=""> tag. SVG images that are rendered via an image element have all dynamic functionality - including embedded JavaScript - disabled by default, and that's something that's directly included in the spec:
2.2.6. Secure static mode
This processing mode is intended for circumstances where an SVG document is to be used as a non-animated image that is not allowed to resolve external references, and which is not intended to be used as an interactive document. This mode might be used where image support has traditionally been limited to non-animated raster images (such as JPEG and PNG.)
[...]
'image' references
An SVG embedded within an 'image' element must be processed in secure animated mode if the embedding document supports declarative animation, or in secure static mode otherwise.
The same processing modes are expected to be used for other cases where SVG is used in place of a raster image, such as an HTML 'img' element or in any CSS property that takes an <image> data type. This is consistent with HTML's requirement that image sources must reference "a non-interactive, optionally animated, image resource that is neither paged nor scripted" [HTML]
This also works for SVG data that's presented in an <img src="data:image/svg+xml;base64,..."> data URI. I had Claude help spin me up this interactive demo:
Build me an artifact - just HTML, no JavaScript - which demonstrates embedding some SVG files using img src= base64 URIs
I want three SVGs - one of the sun, one of a pelican and one that includes some tricky javascript things which I hope the img src= tag will ignore
If you right click and "open in a new tab" on the JavaScript-embedding SVG that script will execute, showing an alert. You can click the image to see another alert showing location.href and document.cookie, which should confirm that the base64 image is not treated as having the same origin as the page itself.
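If you want to construct one of those base64 data: URIs yourself, a shell one-liner along these lines works (macOS base64 flags shown, and pelican.svg is a placeholder filename):
# Wrap a local SVG file in an img tag using a base64 data: URI
echo "<img src=\"data:image/svg+xml;base64,$(base64 -i pelican.svg | tr -d '\n')\">" > demo.html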
Notes on the new Claude analysis JavaScript code execution tool
Anthropic released a new feature for their Claude.ai consumer-facing chat bot interface today which they’re calling “the analysis tool”.
[... 918 words]
Running prompts against images and PDFs with Google Gemini.
New TIL. I've been experimenting with the Google Gemini APIs for running prompts against images and PDFs (in preparation for finally adding multi-modal support to LLM) - here are my notes on how to send images or PDF files to their API using curl and the base64 -i macOS command.
I figured out the curl incantation first and then got Claude to build me a Bash script that I can execute like this:
prompt-gemini 'extract text' example-handwriting.jpg
Playing with this is really fun. The Gemini models charge less than 1/10th of a cent per image, so it's really inexpensive to try them out.
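The core of that curl incantation looks something like this - a sketch, so the model name and exact JSON shape here are illustrative rather than exactly what the TIL uses:
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-latest:generateContent?key=$GOOGLE_API_KEY" \
  -H 'Content-Type: application/json' \
  -d "{
    \"contents\": [{
      \"parts\": [
        {\"text\": \"extract text\"},
        {\"inline_data\": {
          \"mime_type\": \"image/jpeg\",
          \"data\": \"$(base64 -i example-handwriting.jpg)\"
        }}
      ]
    }]
  }"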
I've often been building single-use apps with Claude Artifacts when I'm helping my children learn. For example here's one on visualizing fractions. [...] What's more surprising is that it is far easier to create an app on-demand than searching for an app in the app store that will do what I'm looking for. Searching for kids' learning apps is typically a nails-on-chalkboard painful experience because 95% of them are addictive garbage. And even if I find something usable, it can't match the fact that I can tell Claude what I want.
Everything I built with Claude Artifacts this week
I’m a huge fan of Claude’s Artifacts feature, which lets you prompt Claude to create an interactive Single Page App (using HTML, CSS and JavaScript) and then view the result directly in the Claude interface, iterating on it further with the bot and then, if you like, copying out the resulting code.
[... 2,273 words]
Dashboard: Tools. I used Django SQL Dashboard to spin up a dashboard that shows all of the URLs to my tools.simonwillison.net site that I've shared on my blog so far. It uses this (Claude assisted) regular expression in a PostgreSQL SQL query:
select distinct on (tool_url)
  unnest(regexp_matches(
    body,
    '(https://tools\.simonwillison\.net/[^<"\s)]+)',
    'g'
  )) as tool_url,
  'https://simonwillison.net/' || left(type, 1) || '/' || id as blog_url,
  title,
  date(created) as created
from content
I've been really enjoying having a static hosting platform (it's GitHub Pages serving my simonw/tools repo) that I can use to quickly deploy little HTML+JavaScript interactive tools and demos.
The 3 AI Use Cases: Gods, Interns, and Cogs. Drew Breunig introduces an interesting new framework for categorizing use cases of modern AI:
- Gods refers to the autonomous, human replacement applications - I see that as AGI stuff that's still effectively science fiction.
- Interns are supervised copilots. This is how I get most of the value out of LLMs at the moment, delegating tasks to them that I can then review, such as AI-assisted programming.
- Cogs are the smaller, more reliable components that you can build pipelines and automations on top of without needing to review everything they do - think Whisper for transcriptions or maybe some limited LLM subtasks such as structured data extraction.
Drew also considers Toys as a subcategory of Interns: things like image generators, “defined by their usage by non-experts. Toys have a high tolerance for errors because they’re not being relied on for much beyond entertainment.”
You can use text-wrap: balance; on icons. Neat CSS experiment from Terence Eden: the new text-wrap: balance CSS property is intended to help make text like headlines display without ugly wrapped single orphan words, but Terence points out it can be used for icons too:
This inspired me to investigate if the same technique could work for text-based navigation elements. I used Claude to build this interactive prototype of a navigation bar that uses text-wrap: balance against a list of display: inline menu list items. It seems to work well!
My first attempt used display: inline-block, which worked in Safari but failed in Firefox.
Notable limitation from that MDN article:
Because counting characters and balancing them across multiple lines is computationally expensive, this value is only supported for blocks of text spanning a limited number of lines (six or less for Chromium and ten or less for Firefox)
So it's fine for these navigation concepts but isn't something you can use for body text.
Experimenting with audio input and output for the OpenAI Chat Completion API
OpenAI promised this at DevDay a few weeks ago and now it’s here: their Chat Completion API can now accept audio as input and return it as output. OpenAI still recommend their WebSocket-based Realtime API for audio tasks, but the Chat Completion API is a whole lot easier to write code against.
[... 1,555 words]
Video scraping: extracting JSON data from a 35 second screen capture for less than 1/10th of a cent
The other day I found myself needing to add up some numeric values that were scattered across twelve different emails.
[... 1,294 words]
My Jina Reader tool. I wanted to feed the Cloudflare Durable Objects SQLite documentation into Claude, but I was on my iPhone so copying and pasting was inconvenient. Jina offer a Reader API which can turn any URL into LLM-friendly Markdown and it turns out it supports CORS, so I got Claude to build me this tool (second iteration, third iteration, final source code).
Paste in a URL to get the Jina Markdown version, along with an all important "Copy to clipboard" button.
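The Reader API itself is pleasingly simple: you prefix the target URL with https://r.jina.ai/ and get Markdown back (the Cloudflare URL here is just an example):
# Fetch an LLM-friendly Markdown version of a page
curl "https://r.jina.ai/https://developers.cloudflare.com/durable-objects/"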
An LLM TDD loop (via) Super neat demo by David Winterbottom, who wrapped my LLM and files-to-prompt tools in a short Bash script that can be fed a file full of Python unit tests and an empty implementation file and will then iterate on that file in a loop until the tests pass.
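I haven't copied David's script here, but the core idea can be sketched in a few lines of Bash - everything below (file layout, model choice, prompt wording, the five-attempt limit) is illustrative rather than his actual implementation:
#!/bin/bash
# Sketch of an LLM TDD loop: keep asking the model to rewrite the
# implementation until the tests pass.
# Usage: ./tdd-loop.sh tests/test_foo.py foo.py
tests=$1
impl=$2

for attempt in 1 2 3 4 5; do
  # Run the tests; stop as soon as they pass
  if output=$(pytest "$tests" 2>&1); then
    echo "Tests pass after $attempt attempt(s)"
    exit 0
  fi
  # Feed the tests plus the current implementation to the model,
  # ask for a rewrite, then swap the new version in
  files-to-prompt "$tests" "$impl" | \
    llm -m claude-3.5-sonnet -s "The tests are failing with:
$output
Rewrite the implementation file so the tests pass. Reply with only the Python code, no explanation." \
    > "$impl.new" && mv "$impl.new" "$impl"
done

echo "Tests still failing after 5 attempts" >&2
exit 1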
lm.rs: run inference on Language Models locally on the CPU with Rust (via) Impressive new LLM inference implementation in Rust by Samuel Vitorino. I tried it just now on an M2 Mac with 64GB of RAM and got very snappy performance for this Q8 Llama 3.2 1B, with Activity Monitor reporting 980% CPU usage over 13 threads.
Here's how I compiled the library and ran the model:
cd /tmp
git clone https://github.com/samuel-vitorino/lm.rs
cd lm.rs
RUSTFLAGS="-C target-cpu=native" cargo build --release --bin chat
curl -LO 'https://huggingface.co/samuel-vitorino/Llama-3.2-1B-Instruct-Q8_0-LMRS/resolve/main/tokenizer.bin?download=true'
curl -LO 'https://huggingface.co/samuel-vitorino/Llama-3.2-1B-Instruct-Q8_0-LMRS/resolve/main/llama3.2-1b-it-q80.lmrs?download=true'
./target/release/chat --model llama3.2-1b-it-q80.lmrs --show-metrics
That --show-metrics option added this at the end of a response:
Speed: 26.41 tok/s
It looks like the performance is helped by two key dependencies: wide, which provides data types optimized for SIMD operations, and rayon, for running parallel iterators across multiple cores (used for matrix multiplication).
(I used LLM and files-to-prompt to help figure this out.)
SVG to JPG/PNG. The latest in my ongoing series of interactive HTML and JavaScript tools written almost entirely by LLMs. This one lets you paste in (or open-from-file, or drag-onto-page) some SVG and then use that to render a JPEG or PNG image of your desired width.
I built this using Claude 3.5 Sonnet, initially as an Artifact and later in a code editor since some of the features (loading an example image and downloading the result) cannot run in the sandboxed iframe Artifact environment.
Here's the full transcript of the Claude conversation I used to build the tool, plus a few commits I later made by hand to further customize it.
The code itself is mostly quite simple. The most interesting part is how it renders the SVG to an image, which (simplified) looks like this:
// First extract the viewBox to get width/height
const svgElement = new DOMParser().parseFromString(
  svgInput, 'image/svg+xml'
).documentElement;
let viewBox = svgElement.getAttribute('viewBox');
let [, , width, height] = viewBox.split(' ').map(Number);
// Figure out the width/height of the output image
const newWidth = parseInt(widthInput.value) || 800;
const aspectRatio = width / height;
const newHeight = Math.round(newWidth / aspectRatio);
// Create off-screen canvas
const canvas = document.createElement('canvas');
canvas.width = newWidth;
canvas.height = newHeight;
// Draw SVG on canvas
const svgBlob = new Blob([svgInput], {type: 'image/svg+xml;charset=utf-8'});
const svgUrl = URL.createObjectURL(svgBlob);
const img = new Image();
const ctx = canvas.getContext('2d');
img.onload = function() {
  ctx.drawImage(img, 0, 0, newWidth, newHeight);
  URL.revokeObjectURL(svgUrl);
  // Convert that to a JPEG
  const imageDataUrl = canvas.toDataURL("image/jpeg");
  const convertedImg = document.createElement('img');
  convertedImg.src = imageDataUrl;
  imageContainer.appendChild(convertedImg);
};
img.src = svgUrl;
Here's the MDN explanation of that revokeObjectURL() method, which I hadn't seen before.
Call this method when you've finished using an object URL to let the browser know not to keep the reference to the file any longer.
Building an automatically updating live blog in Django. Here's an extended write-up of how I implemented the live blog feature I used for my coverage of OpenAI DevDay yesterday. I built the first version using Claude while waiting for the keynote to start, then upgraded it during the lunch break with the help of GPT-4o to add sort options and incremental fetching of new updates.
In the future, we won't need programmers; just people who can describe to a computer precisely what they want it to do.
OpenFreeMap (via) New free map tile hosting service from Zsolt Ero:
OpenFreeMap lets you display custom maps on your website and apps for free. […] Using our public instance is completely free: there are no limits on the number of map views or requests. There’s no registration, no user database, no API keys, and no cookies. We aim to cover the running costs of our public instance through donations.
The site serves static vector tiles that work with MapLibre GL. It deliberately doesn’t offer any other services such as search or routing.
From the project README it looks like it’s hosted on two Hetzner machines. I don’t think the public server is behind a CDN.
Part of the trick to serving the tiles efficiently is the way it takes advantage of Btrfs:
Production-quality hosting of 300 million tiny files is hard. The average file size is just 450 byte. Dozens of tile servers have been written to tackle this problem, but they all have their limitations.
The original idea of this project is to avoid using tile servers altogether. Instead, the tiles are directly served from Btrfs partition images + hard links using an optimised nginx config.
The self-hosting guide describes the scripts that are provided for downloading their pre-built tiles (needing a fresh Ubuntu server with 300GB of SSD and 4GB of RAM) or building the tiles yourself using Planetiler (needs 500GB of disk and 64GB of RAM).
Getting started is delightfully straightforward:
const map = new maplibregl.Map({
  style: 'https://tiles.openfreemap.org/styles/liberty',
  center: [13.388, 52.517],
  zoom: 9.5,
  container: 'map',
})
I got Claude to help build this demo showing a thousand random markers dotted around San Francisco. The 3D tiles even include building shapes!
Zsolt built OpenFreeMap based on his experience running MapHub over the last 9 years. Here’s a 2018 interview about that project.
It’s pretty incredible that the OpenStreetMap and open geospatial stack has evolved to the point now where it’s economically feasible for an individual to offer a service like this. I hope this turns out to be sustainable. Hetzner charge just €1 per TB for bandwidth (S3 can cost $90/TB) which should help a lot.
Solving a bug with o1-preview, files-to-prompt and LLM.
I added a new feature to DJP this morning: you can now have plugins specify their middleware in terms of how it should be positioned relative to other middleware - inserted directly before or directly after django.middleware.common.CommonMiddleware, for example.
At one point I got stuck with a weird test failure, and after ten minutes of head scratching I decided to pipe the entire thing into OpenAI's o1-preview to see if it could spot the problem. I used files-to-prompt to gather the code and LLM to run the prompt:
files-to-prompt **/*.py -c | llm -m o1-preview "
The middleware test is failing showing all of these - why is MiddlewareAfter repeated so many times?
['MiddlewareAfter', 'Middleware3', 'MiddlewareAfter', 'Middleware5', 'MiddlewareAfter', 'Middleware3', 'MiddlewareAfter', 'Middleware2', 'MiddlewareAfter', 'Middleware3', 'MiddlewareAfter', 'Middleware5', 'MiddlewareAfter', 'Middleware3', 'MiddlewareAfter', 'Middleware4', 'MiddlewareAfter', 'Middleware3', 'MiddlewareAfter', 'Middleware5', 'MiddlewareAfter', 'Middleware3', 'MiddlewareAfter', 'Middleware2', 'MiddlewareAfter', 'Middleware3', 'MiddlewareAfter', 'Middleware5', 'MiddlewareAfter', 'Middleware3', 'MiddlewareAfter', 'Middleware', 'MiddlewareBefore']"
The model whirled away for a few seconds and spat out an explanation of the problem - one of my middleware classes was accidentally calling self.get_response(request) in two different places.
I did enjoy how o1 attempted to reference the relevant Django documentation and then half-repeated, half-hallucinated a quote from it.
This took 2,538 input tokens and 4,354 output tokens - by my calculations at $15/million input and $60/million output that prompt cost just under 30 cents.
The Pragmatic Engineer Podcast: AI tools for software engineers, but without the hype – with Simon Willison. Gergely Orosz has a brand new podcast, and I was the guest for the first episode. We covered a bunch of ground, but my favorite topic was an exploration of the (very legitimate) reasons that many engineers are resistant to taking advantage of AI-assisted programming tools.
XKCD 1425 (Tasks) turns ten years old today (via) One of the all-time great XKCDs. It's amazing that "check whether the photo is of a bird" has gone from PhD-level to trivially easy to solve (with a vision LLM, or CLIP, or ResNet+ImageNet among others).
The key idea still very much stands though. Understanding the difference between easy and hard challenges in software development continues to require an enormous depth of experience.
I'd argue that LLMs have made this even worse.
Understanding what kind of tasks LLMs can and cannot reliably solve remains incredibly difficult and unintuitive. They're computer systems that are terrible at maths and that can't reliably look up facts!
On top of that, the rise of AI-assisted programming tools means more people than ever are beginning to create their own custom software.
These brand new AI-assisted proto-programmers are having a crash course in this easy-versus-hard problem.
I saw someone recently complaining that they couldn't build a Claude Artifact that could analyze images, even though they knew Claude itself could do that. Understanding why that's not possible involves understanding how the CSP headers that are used to serve Artifacts prevent the generated code from making its own API calls out to an LLM!
Jiter (via) One of the challenges in dealing with LLM streaming APIs is the need to parse partial JSON - until the stream has ended you won't have a complete valid JSON object, but you may want to display components of that JSON as they become available.
I've solved this previously using the ijson streaming JSON library, see my previous TIL.
Today I found out about Jiter, a new option from the team behind Pydantic. It's written in Rust and extracted from pydantic-core, so the Python wrapper for it can be installed using:
pip install jiter
You can feed it an incomplete JSON bytes object and use partial_mode="on" to parse the valid subset:
import jiter

partial_json = b'{"name": "John", "age": 30, "city": "New Yor'

jiter.from_json(partial_json, partial_mode="on")
# {'name': 'John', 'age': 30}
Or use partial_mode="trailing-strings" to include incomplete string fields too:
jiter.from_json(partial_json, partial_mode="trailing-strings")
# {'name': 'John', 'age': 30, 'city': 'New Yor'}
The current README was a little thin, so I submitted a PR with some extra examples. I got some help from files-to-prompt and Claude 3.5 Sonnet:
cd crates/jiter-python/ && files-to-prompt -c README.md tests | llm -m claude-3.5-sonnet --system 'write a new README with comprehensive documentation'
Whether you think coding with AI works today or not doesn’t really matter.
But if you think functional AI helping to code will make humans dumber or isn’t real programming, just consider that’s been the argument against every generation of programming tools going back to Fortran.
Markdown and Math Live Renderer.
Another of my tiny Claude-assisted JavaScript tools. This one lets you enter Markdown with embedded mathematical expressions (like $ax^2 + bx + c = 0$) and live renders those on the page, with an HTML version using MathML that you can export through copy and paste.
Here's the Claude transcript. I started by asking:
Are there any client side JavaScript markdown libraries that can also handle inline math and render it?
Claude gave me several options including the combination of Marked and KaTeX, so I followed up by asking:
Build an artifact that demonstrates Marked plus KaTeX - it should include a text area I can enter markdown in (pre-populated with a good example) and live update the rendered version below. No react.
Which gave me this artifact, instantly demonstrating that what I wanted to do was possible.
I iterated on it a tiny bit to get to the final version, mainly to add that HTML export and a Copy button. The final source code is here.