Simon Willison’s Weblog

Subscribe
Atom feed for projects

458 items tagged “projects”

Posts about projects I have worked on.

2025

LLM 0.22, the annotated release notes

I released LLM 0.22 this evening. Here are the annotated release notes:

[... 1,340 words]

Run LLMs on macOS using llm-mlx and Apple’s MLX framework

Visit Run LLMs on macOS using llm-mlx and Apple's MLX framework

llm-mlx is a brand new plugin for my LLM Python Library and CLI utility which builds on top of Apple’s excellent MLX array framework library and mlx-lm package. If you’re a terminal user or Python developer with a Mac this may be the new easiest way to start exploring local Large Language Models.

[... 1,524 words]

files-to-prompt 0.5. My files-to-prompt tool (originally built using Claude 3 Opus back in April) had been accumulating a bunch of issues and PRs - I finally got around to spending some time with it and pushed a fresh release:

  • New -n/--line-numbers flag for including line numbers in the output. Thanks, Dan Clayton. #38
  • Fix for utf-8 handling on Windows. Thanks, David Jarman. #36
  • --ignore patterns are now matched against directory names as well as file names, unless you pass the new --ignore-files-only flag. Thanks, Nick Powell. #30

I use this tool myself on an almost daily basis - it's fantastic for quickly answering questions about code. Recently I've been plugging it into Gemini 2.0 with its 2 million token context length, running recipes like this one:

git clone https://github.com/bytecodealliance/componentize-py
cd componentize-py
files-to-prompt . -c | llm -m gemini-2.0-pro-exp-02-05 \
  -s 'How does this work? Does it include a python compiler or AST trick of some sort?'

I ran that question against the bytecodealliance/componentize-py repo - which provides a tool for turning Python code into compiled WASM - and got this really useful answer.

Here's another example. I decided to have o3-mini review how Datasette handles concurrent SQLite connections from async Python code - so I ran this:

git clone https://github.com/simonw/datasette
cd datasette/datasette
files-to-prompt database.py utils/__init__.py -c | \
  llm -m o3-mini -o reasoning_effort high \
  -s 'Output in markdown a detailed analysis of how this code handles the challenge of running SQLite queries from a Python asyncio application. Explain how it works in the first section, then explore the pros and cons of this design. In a final section propose alternative mechanisms that might work better.'

Here's the result. It did an extremely good job of explaining how my code works - despite being fed just the Python and none of the other documentation. Then it made some solid recommendations for potential alternatives.

I added a couple of follow-up questions (using llm -c) which resulted in a full working prototype of an alternative threadpool mechanism, plus some benchmarks.

One final example: I decided to see if there were any undocumented features in Litestream, so I checked out the repo and ran a prompt against just the .go files in that project:

git clone https://github.com/benbjohnson/litestream
cd litestream
files-to-prompt . -e go -c | llm -m o3-mini \
  -s 'Write extensive user documentation for this project in markdown'

Once again, o3-mini provided a really impressively detailed set of unofficial documentation derived purely from reading the source.

# 14th February 2025, 4:14 am / projects, llms, gemini, llm, ai-assisted-programming, generative-ai, ai, webassembly, python, async, datasette, sqlite, litestream

shot-scraper 1.6 with support for HTTP Archives. New release of my shot-scraper CLI tool for taking screenshots and scraping web pages.

The big new feature is HTTP Archive (HAR) support. The new shot-scraper har command can now create an archive of a page and all of its dependents like this:

shot-scraper har https://datasette.io/

This produces a datasette-io.har file (currently 163KB) which is JSON representing the full set of requests used to render that page. Here's a copy of that file. You can visualize that here using ericduran.github.io/chromeHAR.

The HAR viewer shows a line for each of the loaded resources, with options to view timing information

That JSON includes full copies of all of the responses, base64 encoded if they are binary files such as images.

You can add the --zip flag to instead get a datasette-io.har.zip file, containing JSON data in har.har but with the response bodies saved as separate files in that archive.

The shot-scraper multi command lets you run shot-scraper against multiple URLs in sequence, specified using a YAML file. That command now takes a --har option (or --har-zip or --har-file name-of-file), described in the documentation, which will produce a HAR at the same time as taking the screenshots.

Shots are usually defined in YAML that looks like this:

- output: example.com.png
  url: http://www.example.com/
- output: w3c.org.png
  url: https://www.w3.org/

You can now omit the output: keys and generate a HAR file without taking any screenshots at all:

- url: http://www.example.com/
- url: https://www.w3.org/

Run like this:

shot-scraper multi shots.yml --har

Which outputs:

Skipping screenshot of 'https://www.example.com/'
Skipping screenshot of 'https://www.w3.org/'
Wrote to HAR file: trace.har

shot-scraper is built on top of Playwright, and the new features use the browser.new_context(record_har_path=...) parameter.

# 13th February 2025, 9:02 pm / projects, shot-scraper, playwright, python, scraping

Using pip to install a Large Language Model that’s under 100MB

Visit Using pip to install a Large Language Model that's under 100MB

I just released llm-smollm2, a new plugin for LLM that bundles a quantized copy of the SmolLM2-135M-Instruct LLM inside of the Python package.

[... 1,553 words]

Datasette 1.0a17. New Datasette alpha, with a bunch of small changes and bug fixes accumulated over the past few months. Some (minor) highlights:

  • The register_magic_parameters(datasette) plugin hook can now register async functions. (#2441)
  • Breadcrumbs on database and table pages now include a consistent self-link for resetting query string parameters. (#2454)
  • New internal methods datasette.set_actor_cookie() and datasette.delete_actor_cookie(), described here. (#1690)
  • /-/permissions page now shows a list of all permissions registered by plugins. (#1943)
  • If a table has a single unique text column Datasette now detects that as the foreign key label for that table. (#2458)
  • The /-/permissions page now includes options for filtering or exclude permission checks recorded against the current user. (#2460)

I was incentivized to push this release by an issue I ran into in my new datasette-load plugin, which resulted in this fix:

  • Fixed a bug where replacing a database with a new one with the same name did not pick up the new database correctly. (#2465)

# 6th February 2025, 8:56 pm / projects, annotated-release-notes, datasette

OpenAI o3-mini, now available in LLM

OpenAI’s o3-mini is out today. As with other o-series models it’s a slightly difficult one to evaluate—we now need to decide if a prompt is best run using GPT-4o, o1, o3-mini or (if we have access) o1 Pro.

[... 748 words]

LLM 0.20. New release of my LLM CLI tool and Python library. A bunch of accumulated fixes and features since the start of December, most notably:

  • Support for OpenAI's o1 model - a significant upgrade from o1-preview given its 200,000 input and 100,000 output tokens (o1-preview was 128,000/32,768). #676
  • Support for the gpt-4o-audio-preview and gpt-4o-mini-audio-preview models, which can accept audio input: llm -m gpt-4o-audio-preview -a https://static.simonwillison.net/static/2024/pelican-joke-request.mp3 #677
  • A new llm -x/--extract option which extracts and returns the contents of the first fenced code block in the response. This is useful for prompts that generate code. #681
  • A new llm models -q 'search' option for searching available models - useful if you've installed a lot of plugins. Searches are case insensitive. #700

# 23rd January 2025, 4:55 am / llm, projects, generative-ai, annotated-release-notes, ai, llms, openai, o1

llm-gemini 0.9. This new release of my llm-gemini plugin adds support for two new experimental models:

  • learnlm-1.5-pro-experimental is "an experimental task-specific model that has been trained to align with learning science principles when following system instructions for teaching and learning use cases" - more here.
  • gemini-2.0-flash-thinking-exp-01-21 is a brand new version of the Gemini 2.0 Flash Thinking model released today:

    Latest version also includes code execution, a 1M token content window & a reduced likelihood of thought-answer contradictions.

The most exciting new feature though is support for Google search grounding, where some Gemini models can execute Google searches as part of answering a prompt. This feature can be enabled using the new -o google_search 1 option.

# 22nd January 2025, 4:32 am / gemini, llm, projects, generative-ai, inference-scaling, ai, llms

2024

Prompts.js

Visit Prompts.js

I’ve been putting the new o1 model from OpenAI through its paces, in particular for code. I’m very impressed—it feels like it’s giving me a similar code quality to Claude 3.5 Sonnet, at least for Python and JavaScript and Bash... but it’s returning output noticeably faster.

[... 1,119 words]

datasette-enrichments-llm. Today's new alpha release is datasette-enrichments-llm, a plugin for Datasette 1.0a+ that provides an enrichment that lets you run prompts against data from one or more column and store the result in another column.

So far it's a light re-implementation of the existing datasette-enrichments-gpt plugin, now using the new llm.get_async_models() method to allow users to select any async-enabled model that has been registered by a plugin - so currently any of the models from OpenAI, Anthropic, Gemini or Mistral via their respective plugins.

Still plenty to do on this one. Next step is to integrate it with datasette-llm-usage and use it to drive a design-complete stable version of that.

# 5th December 2024, 11:46 pm / llm, plugins, ai, llms, enrichments, releases, datasette, generative-ai, projects

First impressions of the new Amazon Nova LLMs (via a new llm-bedrock plugin)

Visit First impressions of the new Amazon Nova LLMs (via a new llm-bedrock plugin)

Amazon released three new Large Language Models yesterday at their AWS re:Invent conference. The new model family is called Amazon Nova and comes in three sizes: Micro, Lite and Pro.

[... 2,385 words]

datasette-queries. I released the first alpha of a new plugin to replace the crusty old datasette-saved-queries. This one adds a new UI element to the top of the query results page with an expandable form for saving the query as a new canned query:

Animated demo. I start on the table page, run a search, click View and edit SQL, then on the SQL query page open a Save query dialog, click a Suggest title and description button, wait for that to suggest something and click save.

It's my first plugin to depend on LLM and datasette-llm-usage - it uses GPT-4o mini to power an optional "Suggest title and description" button, labeled with the becoming-standard ✨ sparkles emoji to indicate an LLM-powered feature.

I intend to expand this to work across multiple models as I continue to iterate on llm-datasette-usage to better support those kinds of patterns.

For the moment though each suggested title and description call costs about 250 input tokens and 50 output tokens, which against GPT-4o mini adds up to 0.0067 cents.

# 3rd December 2024, 11:59 pm / projects, releases, datasette, plugins, llm, generative-ai, openai, ai, llms

datasette-llm-usage. I released the first alpha of a Datasette plugin to help track LLM usage by other plugins, with the goal of supporting token allowances - both for things like free public apps that stop working after a daily allowance, plus free previews of AI features for paid-account-based projects such as Datasette Cloud.

It's using the usage features I added in LLM 0.19.

The alpha doesn't do much yet - it will start getting interesting once I upgrade other plugins to depend on it.

Design notes so far in issue #1.

# 2nd December 2024, 9:33 pm / llm, datasette-cloud, plugins, ai, llms, datasette, generative-ai, projects, releases

LLM 0.19. I just released version 0.19 of LLM, my Python library and CLI utility for working with Large Language Models.

I released 0.18 a couple of weeks ago adding support for calling models from Python asyncio code. 0.19 improves on that, and also adds a new mechanism for models to report their token usage.

LLM can log those usage numbers to a SQLite database, or make then available to custom Python code.

My eventual goal with these features is to implement token accounting as a Datasette plugin so I can offer AI features in my SaaS platform without worrying about customers spending unlimited LLM tokens.

Those 0.19 release notes in full:

  • Tokens used by a response are now logged to new input_tokens and output_tokens integer columns and a token_details JSON string column, for the default OpenAI models and models from other plugins that implement this feature. #610
  • llm prompt now takes a -u/--usage flag to display token usage at the end of the response.
  • llm logs -u/--usage shows token usage information for logged responses.
  • llm prompt ... --async responses are now logged to the database. #641
  • llm.get_models() and llm.get_async_models() functions, documented here. #640
  • response.usage() and async response await response.usage() methods, returning a Usage(input=2, output=1, details=None) dataclass. #644
  • response.on_done(callback) and await response.on_done(callback) methods for specifying a callback to be executed when a response has completed, documented here. #653
  • Fix for bug running llm chat on Windows 11. Thanks, Sukhbinder Singh. #495

I also released three new plugin versions that add support for the new usage tracking feature: llm-gemini 0.5, llm-claude-3 0.10 and llm-mistral 0.9.

# 1st December 2024, 11:59 pm / llm, releasenotes, generative-ai, projects, ai, llms, releases

GitHub OAuth for a static site using Cloudflare Workers. Here's a TIL covering a Thanksgiving AI-assisted programming project. I wanted to add OAuth against GitHub to some of the projects on my tools.simonwillison.net site in order to implement "Save to Gist".

That site is entirely statically hosted by GitHub Pages, but OAuth has a required server-side component: there's a client_secret involved that should never be included in client-side code.

Since I serve the site from behind Cloudflare I realized that a minimal Cloudflare Workers script may be enough to plug the gap. I got Claude on my phone to build me a prototype and then pasted that (still on my phone) into a new Cloudflare Worker and it worked!

... almost. On later closer inspection of the code it was missing error handling... and then someone pointed out it was vulnerable to a login CSRF attack thanks to failure to check the state= parameter. I worked with Claude to fix those too.

Useful reminder here that pasting code AI-generated code around on a mobile phone isn't necessarily the best environment to encourage a thorough code review!

# 29th November 2024, 6:13 pm / ai, oauth, llms, security, tools, generative-ai, ai-assisted-programming, projects, cloudflare, github

Ask questions of SQLite databases and CSV/JSON files in your terminal

Visit Ask questions of SQLite databases and CSV/JSON files in your terminal

I built a new plugin for my sqlite-utils CLI tool that lets you ask human-language questions directly of SQLite databases and CSV/JSON files on your computer.

[... 723 words]

Weeknotes: asynchronous LLMs, synchronous embeddings, and I kind of started a podcast

These past few weeks I’ve been bringing Datasette and LLM together and distracting myself with a new sort-of-podcast crossed with a live streaming experiment.

[... 896 words]

llm-gguf 0.2, now with embeddings. This new release of my llm-gguf plugin - which provides support for locally hosted GGUF LLMs - adds a new feature: it now supports embedding models distributed as GGUFs as well.

This means you can use models like the bafflingly small (30.8MB in its smallest quantization) mxbai-embed-xsmall-v1 with LLM like this:

llm install llm-gguf
llm gguf download-embed-model \
  'https://huggingface.co/mixedbread-ai/mxbai-embed-xsmall-v1/resolve/main/gguf/mxbai-embed-xsmall-v1-q8_0.gguf'

Then to embed a string:

llm embed -m gguf/mxbai-embed-xsmall-v1-q8_0 -c 'hello'

The LLM docs have extensive coverage of things you can then do with this model, like embedding every row in a CSV file / file in a directory / record in a SQLite database table and running similarity and semantic search against them.

Under the hood this takes advantage of the create_embedding() method provided by the llama-cpp-python wrapper around llama.cpp.

# 21st November 2024, 7:24 am / llm, generative-ai, projects, ai, embeddings, llama-cpp

llm-gemini 0.4. New release of my llm-gemini plugin, adding support for asynchronous models (see LLM 0.18), plus the new gemini-exp-1114 model (currently at the top of the Chatbot Arena) and a -o json_object 1 option to force JSON output.

I also released llm-claude-3 0.9 which adds asynchronous support for the Claude family of models.

# 18th November 2024, 7:37 am / llm, plugins, ai, llms, async, python, generative-ai, projects, claude, gemini, anthropic, google

LLM 0.18. New release of LLM. The big new feature is asynchronous model support - you can now use supported models in async Python code like this:

import llm

model = llm.get_async_model("gpt-4o")
async for chunk in model.prompt(
    "Five surprising names for a pet pelican"
):
    print(chunk, end="", flush=True)

Also new in this release: support for sending audio attachments to OpenAI's gpt-4o-audio-preview model.

# 17th November 2024, 8:40 pm / async, llm, python, generative-ai, projects, ai, llms

QuickTime video script to capture frames and bounding boxes. An update to an older TIL. I'm working on the write-up for my DjangoCon US talk on plugins and I found myself wanting to capture individual frames from the video in two formats: a full frame capture, and another that captured just the portion of the screen shared from my laptop.

I have a script for the former, so I got Claude to update my script to add support for one or more --box options, like this:

capture-bbox.sh ../output.mp4  --box '31,17,100,87' --box '0,0,50,50'

Open output.mp4 in QuickTime Player, run that script and then every time you hit a key in the terminal app it will capture three JPEGs from the current position in QuickTime Player - one for the whole screen and one each for the specified bounding box regions.

Those bounding box regions are percentages of the width and height of the image. I also got Claude to build me this interactive tool on top of cropperjs to help figure out those boxes:

Screenshot of the tool. A frame from a video of a talk I gave at DjangoCon US is shown, with a crop region on it using drag handles for the different edges of the crop. Below that is a box showing --bbox '31,17,99,86'

# 14th November 2024, 7 pm / claude-artifacts, ai-assisted-programming, claude, tools, projects, ffmpeg, llms, ai, generative-ai

MDN Browser Support Timelines. I complained on Hacker News today that I wished the MDN browser compatibility ables - like this one for the Web Locks API - included an indication as to when each browser was released rather than just the browser numbers.

It turns out they do! If you click on each browser version in turn you can see an expanded area showing the browser release date:

Animated GIF showing the table, clicking a browser version expands a box showing when it was released

There's even an inline help tip telling you about the feature, which I've been studiously ignoring for years.

I want to see all the information at once without having to click through each browser. I had a poke around in the Firefox network tab and found https://bcd.developer.mozilla.org/bcd/api/v0/current/api.Lock.json - a JSON document containing browser support details (with release dates) for that API... and it was served using access-control-allow-origin: * which means I can hit it from my own little client-side applications.

I decided to build something with an autocomplete drop-down interface for selecting the API. That meant I'd need a list of all of the available APIs, and I used GitHub code search to find that in the mdn/browser-compat-data repository, in the api/ directory.

I needed the list of files in that directory for my autocomplete. Since there are just over 1,000 of those the regular GitHub contents API won't return them all, so I switched to the tree API instead.

Here's the finished tool - source code here:

Screenshot of browser support timeline. MDN Browser Support Timelines heading, ViewTransition search box, and api.ViewTransition section showing MDN Documentation and Specification links. Timeline shows Standard_track releases: webview_android v111 (Feb 28 2023), chrome v111 (Mar 6 2023), chrome_android v111 (Mar 6 2023), edge v111 (Mar 12 2023), opera v97 (Mar 21 2023), opera_android v75 (May 16 2023), samsunginternet_android v22.0 (Jul 13 2023), safari v18 (Sep 15 2024), safari_ios v18 (Sep 15 2024), webview_ios v18 (Sep 15 2024). Not Supported: firefox, firefox_android, ie, oculus

95% of the code was written by LLMs, but I did a whole lot of assembly and iterating to get it to the finished state. Three of the transcripts for that:

# 11th November 2024, 3:27 am / claude-3-5-sonnet, mozilla, javascript, projects, ai, github, llms, ai-assisted-programming

Visualizing local election results with Datasette, Observable and MapLibre GL

Visit Visualizing local election results with Datasette, Observable and MapLibre GL

Alex Garcia and myself hosted the first Datasette Open Office Hours on Friday—a live-streamed video session where we hacked on a project together and took questions and tips from community members on Discord.

[... 3,390 words]

Nous Hermes 3. The Nous Hermes family of fine-tuned models have a solid reputation. Their most recent release came out in August, based on Meta's Llama 3.1:

Our training data aggressively encourages the model to follow the system and instruction prompts exactly and in an adaptive manner. Hermes 3 was created by fine-tuning Llama 3.1 8B, 70B and 405B, and training on a dataset of primarily synthetically generated responses. The model boasts comparable and superior performance to Llama 3.1 while unlocking deeper capabilities in reasoning and creativity.

The model weights are on Hugging Face, including GGUF versions of the 70B and 8B models. Here's how to try the 8B model (a 4.58GB download) using the llm-gguf plugin:

llm install llm-gguf
llm gguf download-model 'https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-8B-GGUF/resolve/main/Hermes-3-Llama-3.1-8B.Q4_K_M.gguf' -a Hermes-3-Llama-3.1-8B
llm -m Hermes-3-Llama-3.1-8B 'hello in spanish'

Nous Research partnered with Lambda Labs to provide inference APIs. It turns out Lambda host quite a few models now, currently providing free inference to users with an API key.

I just released the first alpha of a llm-lambda-labs plugin. You can use that to try the larger 405b model (very hard to run on a consumer device) like this:

llm install llm-lambda-labs
llm keys set lambdalabs
# Paste key here
llm -m lambdalabs/hermes3-405b 'short poem about a pelican with a twist'

Here's the source code for the new plugin, which I based on llm-mistral. The plugin uses httpx-sse to consume the stream of tokens from the API.

# 4th November 2024, 6:20 pm / llm, generative-ai, llama, ai, edge-llms, llms, meta, projects, nous-research

California Clock Change. The clocks go back in California tonight and I finally built my dream application for helping me remember if I get an hour extra of sleep or not, using a Claude Artifact. Here's the transcript.

California Clock Change. For Pacific Time (PST/PDT) only. When you go to bed on Saturday, November 2, 2024That's tonight!, you will get an extra hour of sleep! The clocks fall back from 2:00 AM to 1:00 AM on Sunday, November 3, 2024.

This is one of my favorite examples yet of the kind of tiny low stakes utilities I'm building with Claude Artifacts because the friction involved in churning out a working application has dropped almost to zero.

(I added another feature: it now includes a note of what time my Dog thinks it is if the clocks have recently changed.)

# 3rd November 2024, 5:11 am / claude-artifacts, ai-assisted-programming, projects, ai, timezones, llms

Claude API: PDF support (beta) (via) Claude 3.5 Sonnet now accepts PDFs as attachments:

The new Claude 3.5 Sonnet (claude-3-5-sonnet-20241022) model now supports PDF input and understands both text and visual content within documents.

I just released llm-claude-3 0.7 with support for the new attachment type (attachments are a very new feature), so now you can do this:

llm install llm-claude-3 --upgrade
llm -m claude-3.5-sonnet 'extract text' -a mydoc.pdf

Visual PDF analysis can also be turned on for the Claude.ai application:

Screenshot of a feature preview interface showing experimental features. At top: Feature Preview with beaker icon. Main text explains these are upcoming enhancements that may affect Claude's behavior. Shows options for Analysis tool, LaTeX Rendering, and Visual PDFs. Right panel demonstrates Visual PDFs feature with Apollo 17 flight plan image and chat messages. Toggle switch shows feature is Off. Description states Give Claude 3.5 Sonnet the ability to view and analyze images, charts, and graphs in PDFs, in addition to text. PDFs that are less than 100 pages are supported.

Also new today: Claude now offers a free (albeit rate-limited) token counting API. This addresses a complaint I've had for a while: previously it wasn't possible to accurately estimate the cost of a prompt before sending it to be executed.

# 1st November 2024, 6:55 pm / vision-llms, claude-3-5-sonnet, llm, anthropic, claude, ai, llms, pdf, generative-ai, projects

You can now run prompts against images, audio and video in your terminal using LLM

Visit You can now run prompts against images, audio and video in your terminal using LLM

I released LLM 0.17 last night, the latest version of my combined CLI tool and Python library for interacting with hundreds of different Large Language Models such as GPT-4o, Llama, Claude and Gemini.

[... 1,399 words]

llm-whisper-api. I wanted to run an experiment through the OpenAI Whisper API this morning so I knocked up a very quick plugin for LLM that provides the following interface:

llm install llm-whisper-api
llm whisper-api myfile.mp3 > transcript.txt

It uses the API key that you previously configured using the llm keys set openai command. If you haven't configured one you can pass it as --key XXX instead.

It's a tiny plugin: the source code is here.

# 27th October 2024, 6:19 pm / llm, projects, plugins, openai, whisper, ai

Run a prompt to generate and execute jq programs using llm-jq

Visit Run a prompt to generate and execute jq programs using llm-jq

llm-jq is a brand new plugin for LLM which lets you pipe JSON directly into the llm jq command along with a human-language description of how you’d like to manipulate that JSON and have a jq program generated and executed for you on the fly.

[... 417 words]