Simon Willison’s Weblog

Series of posts

My open source process

Articles about the process I use for developing my open source projects.

  1. Documentation unit tests - July 28, 2018, 3:59 p.m.
  2. How to cheat at unit tests with pytest and Black - Feb. 11, 2020, 6:56 a.m.
  3. Open source projects: consider running office hours - Feb. 19, 2021, 9:54 p.m.
  4. How to build, test and publish an open source Python library - Nov. 4, 2021, 10:02 p.m.
  5. How I build a feature - Jan. 12, 2022, 6:10 p.m.
  6. Writing better release notes - Jan. 31, 2022, 8:13 p.m.
  7. Software engineering practices - Oct. 1, 2022, 3:56 p.m.
  8. Automating screenshots for the Datasette documentation using shot-scraper - Oct. 14, 2022, 11:44 p.m.
  9. The Perfect Commit - Oct. 29, 2022, 8:41 p.m.
  10. Coping strategies for the serial project hoarder - Nov. 26, 2022, 3:47 p.m.
  11. Things I've learned about building CLI tools in Python - Sept. 30, 2023, 12:12 a.m.
  12. Publish Python packages to PyPI with a python-lib cookiecutter template and GitHub Actions - Jan. 16, 2024, 9:59 p.m.

How I use LLMs and ChatGPT

Posts about ways I'm using LLM tools such as ChatGPT in my own work. This series starts with my experiments using GPT-3 in June 2022, so if you are looking for more recent material, be sure to scroll to the bottom!

  1. How to use the GPT-3 language model - June 5, 2022, 5:28 p.m.
  2. Using GPT-3 to explain how code works - July 9, 2022, 3:19 p.m.
  3. AI assisted learning: Learning Rust with ChatGPT, Copilot and Advent of Code - Dec. 5, 2022, 9:11 p.m.
  4. Over-engineering Secret Santa with Python cryptography and Datasette - Dec. 11, 2022, 2:03 a.m.
  5. I built a ChatGPT plugin to answer questions about data hosted in Datasette - March 24, 2023, 3:43 p.m.
  6. AI-enhanced development makes me more ambitious with my projects - March 27, 2023, 2:38 p.m.
  7. Running Python micro-benchmarks using the ChatGPT Code Interpreter alpha - April 12, 2023, 1:14 a.m.
  8. How I make annotated presentations - Aug. 6, 2023, 5:15 p.m.
  9. Now add a walrus: Prompt engineering in DALL‑E 3 - Oct. 26, 2023, 9:11 p.m.
  10. Exploring GPTs: ChatGPT in a trench coat? - Nov. 15, 2023, 3:39 p.m.
  11. Claude and ChatGPT for ad-hoc sidequests - March 22, 2024, 7:44 p.m.
  12. Building and testing C extensions for SQLite with ChatGPT Code Interpreter - March 23, 2024, 5:50 p.m.
  13. llm cmd undo last git commit - a new plugin for LLM - March 26, 2024, 3:37 p.m.
  14. Running OCR against PDFs and images directly in your browser - March 30, 2024, 5:59 p.m.
  15. Building files-to-prompt entirely using Claude 3 Opus - April 8, 2024, 8:40 p.m.
  16. AI for Data Journalism: demonstrating what we can do with this stuff right now - April 17, 2024, 9:04 p.m.
  17. Building search-based RAG using Claude, Datasette and Val Town - June 21, 2024, 8:44 p.m.
  18. django-http-debug, a new Django app mostly written by Claude - Aug. 8, 2024, 3:26 p.m.
  19. Building a tool showing how Gemini Pro can return bounding boxes for objects in images - Aug. 26, 2024, 4:55 a.m.
  20. Notes on using LLMs for code - Sept. 20, 2024, 3:10 a.m.
  21. Video scraping: extracting JSON data from a 35 second screen capture for less than 1/10th of a cent - Oct. 17, 2024, 12:32 p.m.
  22. Everything I built with Claude Artifacts this week - Oct. 21, 2024, 2:32 p.m.
  23. Run a prompt to generate and execute jq programs using llm-jq - Oct. 27, 2024, 4:26 a.m.
  24. You can now run prompts against images, audio and video in your terminal using LLM - Oct. 29, 2024, 3:09 p.m.

New features in sqlite-utils

Any time I introduce a significant new feature in a release of my sqlite-utils package I write about it here.

  1. sqlite-utils: a Python library and CLI tool for building SQLite databases - Feb. 25, 2019, 3:29 a.m.
  2. Fun with binary data and SQLite - July 30, 2020, 11:22 p.m.
  3. Executing advanced ALTER TABLE operations in SQLite - Sept. 23, 2020, 1 a.m.
  4. Refactoring databases with sqlite-utils extract - Sept. 23, 2020, 4:02 p.m.
  5. Joining CSV and JSON data with an in-memory SQLite database - June 19, 2021, 10:55 p.m.
  6. Apply conversion functions to data in SQLite columns with the sqlite-utils CLI tool - Aug. 6, 2021, 6:05 a.m.
  7. What's new in sqlite-utils 3.20 and 3.21: --lines, --text, --convert - Jan. 11, 2022, 6:19 p.m.
  8. sqlite-utils now supports plugins - July 24, 2023, 5:06 p.m.

Prompt injection

A security vulnerability in software built on top of Large Language Models such as GPT-3, GPT-4, Claude, Llama, Mistral and Gemini.

  1. Prompt injection attacks against GPT-3 - Sept. 12, 2022, 10:20 p.m.
  2. I don't know how to solve prompt injection - Sept. 16, 2022, 4:28 p.m.
  3. You can't solve AI security problems with more AI - Sept. 17, 2022, 10:57 p.m.
  4. A new AI game: Give me ideas for crimes to do - Dec. 4, 2022, 3:11 p.m.
  5. Bing: "I will not harm you unless you harm me first" - Feb. 15, 2023, 3:05 p.m.
  6. Prompt injection: What's the worst that can happen? - April 14, 2023, 5:35 p.m.
  7. The Dual LLM pattern for building AI assistants that can resist prompt injection - April 25, 2023, 7 p.m.
  8. Prompt injection explained, with video, slides, and a transcript - May 2, 2023, 8:22 p.m.
  9. Delimiters won't save you from prompt injection - May 11, 2023, 3:51 p.m.
  10. Multi-modal prompt injection image attacks against GPT-4V - Oct. 14, 2023, 2:24 a.m.
  11. Prompt injection explained, November 2023 edition - Nov. 27, 2023, 3:55 a.m.
  12. Recommendations to help mitigate prompt injection: limit the blast radius - Dec. 20, 2023, 8:34 p.m.
  13. Prompt injection and jailbreaking are not the same thing - March 5, 2024, 4:05 p.m.
  14. Accidental prompt injection against RAG applications - June 6, 2024, 2 p.m.

Misconceptions about large language models

Large Language Models can behave in very unintuitive ways!

  1. ChatGPT couldn’t access the internet, even though it really looked like it could - March 10, 2023, 1:41 p.m.
  2. Don't trust AI to talk accurately about itself: Bard wasn't trained on Gmail - March 22, 2023, 3:13 a.m.
  3. Think of language models like ChatGPT as a "calculator for words" - April 2, 2023, 4:20 p.m.
  4. We need to tell people ChatGPT will lie to them, not debate linguistics - April 7, 2023, 4:34 p.m.
  5. Lawyer cites fake cases invented by ChatGPT, judge is not amused - May 27, 2023, 7:09 p.m.
  6. ChatGPT should include inline tips - May 30, 2023, 7:23 p.m.
  7. It's infuriatingly hard to understand how closed models train on their input - June 4, 2023, 6:09 p.m.
  8. ChatGPT in "4o" mode is not running the new features yet - May 15, 2024, 6:25 p.m.
  9. Training is not the same as chatting: ChatGPT and other LLMs don't remember everything you say - May 29, 2024, 10:51 a.m.
  10. ChatGPT will happily write you a thinly disguised horoscope - Oct. 15, 2024, 3:24 a.m.

CSS ain't rocket science

A CSS tutorial I wrote as a series of posts in 2003.

  1. Defending Structural Markup - May 4, 2003, 2:20 p.m.
  2. Delay to the start of my CSS tutorial series - May 6, 2003, 2:26 p.m.
  3. The anatomy of a stylesheet - May 18, 2003, 11:56 p.m.
  4. Scripting.com, with added CSS - May 19, 2003, 11:58 p.m.
  5. Defeating IE5 CSS bugs with the help of jwz - May 20, 2003, 11:58 p.m.
  6. Quick tip: Styling blockquotes with CSS - May 21, 2003, 11:54 p.m.
  7. CSS Tutorial: feedback so far - May 23, 2003, 11:59 p.m.
  8. Understanding the Box Model - May 26, 2003, 11:58 p.m.
  9. Fun with links - May 27, 2003, 11:58 p.m.

LLMs on personal devices

Large language models that can run on our own devices open up exciting new ways in which these tools can be used.

  1. Large language models are having their Stable Diffusion moment - March 11, 2023, 7:15 p.m.
  2. Stanford Alpaca, and the acceleration of on-device large language model development - March 13, 2023, 7:19 p.m.
  3. Could you train a ChatGPT-beating model for $85,000 and run it in a browser? - March 17, 2023, 3:43 p.m.
  4. Thoughts on AI safety in this era of increasingly powerful open source LLMs - April 10, 2023, 6:41 p.m.
  5. Web LLM runs the vicuna-7b Large Language Model entirely in your browser, and it's very impressive - April 16, 2023, 3:10 p.m.
  6. Let's be bear or bunny - May 1, 2023, 6:37 p.m.
  7. Leaked Google document: "We Have No Moat, And Neither Does OpenAI" - May 4, 2023, 4:05 p.m.
  8. My LLM CLI tool now supports self-hosted language models via plugins - July 12, 2023, 2:24 p.m.
  9. Run Llama 2 on your own Mac using LLM and Homebrew - Aug. 1, 2023, 6:56 p.m.
  10. llamafile is the new best way to run a LLM on your own computer - Nov. 29, 2023, 8:54 p.m.
  11. Many options for running Mistral models in your terminal using LLM - Dec. 18, 2023, 6:18 p.m.
  12. Qwen2.5-Coder-32B is an LLM that can code well that runs on my Mac - Nov. 12, 2024, 11:37 p.m.

How it's trained

Investigating the training data behind different machine learning models.

  1. Exploring the training data behind Stable Diffusion - Sept. 5, 2022, 12:18 a.m.
  2. Exploring 10m scraped Shutterstock videos used to train Meta's Make-A-Video text-to-video model - Sept. 29, 2022, 7:31 p.m.
  3. Exploring MusicCaps, the evaluation data released to accompany Google's MusicLM text-to-music model - Jan. 27, 2023, 9:34 p.m.
  4. What's in the RedPajama-Data-1T LLM training set - April 17, 2023, 6:57 p.m.

Datasette Lite

A distribution of Datasette that runs entirely in the browser, using WebAssembly and Pyodide.

  1. Datasette Lite: a server-side Python web application running in a browser - May 4, 2022, 3:16 p.m.
  2. Joining CSV files in your browser using Datasette Lite - June 20, 2022, 9:20 p.m.
  3. Plugin support for Datasette Lite - Aug. 17, 2022, 6:20 p.m.
  4. Analyzing ScotRail audio announcements with Datasette - from prototype to production - Aug. 21, 2022, 2:04 a.m.
  5. Weeknotes: Datasette Lite, s3-credentials, shot-scraper, datasette-edit-templates and more - Sept. 16, 2022, 2:55 a.m.

VaccinateCA internal blog

I maintained an internal blog between February and April 2021 during my time at VaccinateCA / Vaccinate The States.

  1. Getting started - Feb. 22, 2021, 5 p.m.
  2. Spinning up a new Django app to act as a backend for VaccinateCA - Feb. 23, 2021, 5 p.m.
  3. Importing data from Airtable into Django, plus a search engine for all our code - Feb. 24, 2021, 5 p.m.
  4. Django admin customization, JSON in our PostgreSQL - Feb. 25, 2021, 5 p.m.
  5. Drawing the rest of the owl - March 1, 2021, 5 p.m.
  6. API ready for testing, first video status update - March 2, 2021, 5 p.m.
  7. Replaying logs to exercise the new API - March 3, 2021, 5 p.m.
  8. The simplest possible call queue - March 6, 2021, 5 p.m.
  9. New call queue ready to test. Also geography. - March 7, 2021, 5 p.m.
  10. APIs for importing locations - March 9, 2021, 5 p.m.
  11. VIAL is now live, plus django-sql-dashboard - March 15, 2021, 5 p.m.
  12. The Airtable formulas at the heart of everything - March 23, 2021, 5 p.m.
  13. VIAL: Preparing for some collaborative testing - April 1, 2021, 5 p.m.
  14. A CSV export, JSON import workflow for bulk updating our data - April 28, 2021, 5 p.m.

Git scraping

A technique for scraping content into a Git repository to track changes to it over time.

  1. Scraping hurricane Irma - Sept. 10, 2017, 6:21 a.m.
  2. Changelogs to help understand the fires in the North Bay - Oct. 10, 2017, 6:48 a.m.
  3. Generating a commit log for San Francisco's official list of trees - March 13, 2019, 2:49 p.m.
  4. Tracking PG&E outages by scraping to a git repo - Oct. 10, 2019, 11:32 p.m.
  5. Git scraping: track changes over time by scraping to a Git repository - Oct. 9, 2020, 6:27 p.m.
  6. Git scraping, the five minute lightning talk - March 5, 2021, 12:44 a.m.
  7. git-history: a tool for analyzing scraped data collected using Git and SQLite - Dec. 7, 2021, 10:32 p.m.
  8. Help scraping: track changes to CLI tools by recording their --help using Git - Feb. 2, 2022, 11:46 p.m.
  9. shot-scraper: automated screenshots for documentation, built on Playwright - March 10, 2022, 12:13 a.m.
  10. Scraping web pages from the command line with shot-scraper - March 14, 2022, 1:29 a.m.
  11. Automatically opening issues when tracked file content changes - April 28, 2022, 5:18 p.m.
  12. Measuring traffic during the Half Moon Bay Pumpkin Festival - Oct. 19, 2022, 3:41 p.m.
  13. Tracking Mastodon user numbers over time with a bucket of tricks - Nov. 20, 2022, 7 a.m.

Datasette: The annotated release notes

I like to accompany significant releases of my Datasette project with an annotated version of the release notes, providing extra background context on new features in the release.

  1. Datasette 0.44: The annotated release notes - June 12, 2020, 3:11 a.m.
  2. Datasette 0.45: The annotated release notes - July 1, 2020, 10:33 p.m.
  3. Datasette 0.49: The annotated release notes - Sept. 15, 2020, 11:45 p.m.
  4. Datasette 0.50: The annotated release notes - Oct. 9, 2020, 8:23 p.m.
  5. Datasette 0.54: The annotated release notes - Jan. 25, 2021, 5:31 p.m.
  6. Datasette 0.58: The annotated release notes - July 16, 2021, 2:21 a.m.
  7. Datasette Desktop 0.2.0: The annotated release notes - Sept. 13, 2021, 11:30 p.m.
  8. Datasette 0.59: The annotated release notes - Oct. 19, 2021, 4:59 a.m.
  9. Datasette 0.60: The annotated release notes - Jan. 14, 2022, 2:30 a.m.
  10. Datasette 0.61: The annotated release notes - March 24, 2022, 1:53 a.m.
  11. Datasette 0.63: The annotated release notes - Oct. 27, 2022, 10:13 p.m.
  12. Datasette's new JSON write API: The first alpha of Datasette 1.0 - Dec. 2, 2022, 11:15 p.m.
  13. Datasette 1.0a2: Upserts and finely grained permissions - Dec. 15, 2022, 5:58 p.m.
  14. Datasette 0.64, with a warning about SpatiaLite - Jan. 9, 2023, 9:22 p.m.
  15. Datasette 1.0a4 and 1.0a5, plus weeknotes - Aug. 30, 2023, 2:33 p.m.
  16. Datasette 1.0a8: JavaScript plugins, new plugin hooks and plugin configuration in datasette.yaml - Feb. 7, 2024, 4:37 p.m.
  17. Datasette 1.0a14: The annotated release notes - Aug. 5, 2024, 11:20 p.m.