Series of posts
My open source process
Articles about the process I use for developing my open source projects.
- Documentation unit tests - July 28, 2018, 3:59 p.m.
- How to cheat at unit tests with pytest and Black - Feb. 11, 2020, 6:56 a.m.
- Open source projects: consider running office hours - Feb. 19, 2021, 9:54 p.m.
- How to build, test and publish an open source Python library - Nov. 4, 2021, 10:02 p.m.
- How I build a feature - Jan. 12, 2022, 6:10 p.m.
- Writing better release notes - Jan. 31, 2022, 8:13 p.m.
- Software engineering practices - Oct. 1, 2022, 3:56 p.m.
- Automating screenshots for the Datasette documentation using shot-scraper - Oct. 14, 2022, 11:44 p.m.
- The Perfect Commit - Oct. 29, 2022, 8:41 p.m.
- Coping strategies for the serial project hoarder - Nov. 26, 2022, 3:47 p.m.
- Things I've learned about building CLI tools in Python - Sept. 30, 2023, 12:12 a.m.
- Publish Python packages to PyPI with a python-lib cookiecutter template and GitHub Actions - Jan. 16, 2024, 9:59 p.m.
How I use LLMs and ChatGPT
Posts about ways I'm using LLM tools such as ChatGPT in my own work. This series starts with my experiments using GPT-3 in June 2022, so if you are looking for more recent material, be sure to scroll to the bottom!
- How to use the GPT-3 language model - June 5, 2022, 5:28 p.m.
- Using GPT-3 to explain how code works - July 9, 2022, 3:19 p.m.
- AI assisted learning: Learning Rust with ChatGPT, Copilot and Advent of Code - Dec. 5, 2022, 9:11 p.m.
- Over-engineering Secret Santa with Python cryptography and Datasette - Dec. 11, 2022, 2:03 a.m.
- I built a ChatGPT plugin to answer questions about data hosted in Datasette - March 24, 2023, 3:43 p.m.
- AI-enhanced development makes me more ambitious with my projects - March 27, 2023, 2:38 p.m.
- Running Python micro-benchmarks using the ChatGPT Code Interpreter alpha - April 12, 2023, 1:14 a.m.
- How I make annotated presentations - Aug. 6, 2023, 5:15 p.m.
- Now add a walrus: Prompt engineering in DALL‑E 3 - Oct. 26, 2023, 9:11 p.m.
- Exploring GPTs: ChatGPT in a trench coat? - Nov. 15, 2023, 3:39 p.m.
- Claude and ChatGPT for ad-hoc sidequests - March 22, 2024, 7:44 p.m.
- Building and testing C extensions for SQLite with ChatGPT Code Interpreter - March 23, 2024, 5:50 p.m.
- llm cmd undo last git commit - a new plugin for LLM - March 26, 2024, 3:37 p.m.
- Running OCR against PDFs and images directly in your browser - March 30, 2024, 5:59 p.m.
- Building files-to-prompt entirely using Claude 3 Opus - April 8, 2024, 8:40 p.m.
- AI for Data Journalism: demonstrating what we can do with this stuff right now - April 17, 2024, 9:04 p.m.
- Building search-based RAG using Claude, Datasette and Val Town - June 21, 2024, 8:44 p.m.
- django-http-debug, a new Django app mostly written by Claude - Aug. 8, 2024, 3:26 p.m.
- Building a tool showing how Gemini Pro can return bounding boxes for objects in images - Aug. 26, 2024, 4:55 a.m.
- Notes on using LLMs for code - Sept. 20, 2024, 3:10 a.m.
- Video scraping: extracting JSON data from a 35 second screen capture for less than 1/10th of a cent - Oct. 17, 2024, 12:32 p.m.
- Everything I built with Claude Artifacts this week - Oct. 21, 2024, 2:32 p.m.
- Run a prompt to generate and execute jq programs using llm-jq - Oct. 27, 2024, 4:26 a.m.
- You can now run prompts against images, audio and video in your terminal using LLM - Oct. 29, 2024, 3:09 p.m.
New features in sqlite-utils
Any time I introduce a significant new feature in a release of my sqlite-utils package, I write about it here.
- sqlite-utils: a Python library and CLI tool for building SQLite databases - Feb. 25, 2019, 3:29 a.m.
- Fun with binary data and SQLite - July 30, 2020, 11:22 p.m.
- Executing advanced ALTER TABLE operations in SQLite - Sept. 23, 2020, 1 a.m.
- Refactoring databases with sqlite-utils extract - Sept. 23, 2020, 4:02 p.m.
- Joining CSV and JSON data with an in-memory SQLite database - June 19, 2021, 10:55 p.m.
- Apply conversion functions to data in SQLite columns with the sqlite-utils CLI tool - Aug. 6, 2021, 6:05 a.m.
- What's new in sqlite-utils 3.20 and 3.21: --lines, --text, --convert - Jan. 11, 2022, 6:19 p.m.
- sqlite-utils now supports plugins - July 24, 2023, 5:06 p.m.
Prompt injection
A security vulnerability in software built on top of Large Language Models such as GPT-3, GPT-4, Claude, Llama, Mistral and Gemini.
- Prompt injection attacks against GPT-3 - Sept. 12, 2022, 10:20 p.m.
- I don't know how to solve prompt injection - Sept. 16, 2022, 4:28 p.m.
- You can't solve AI security problems with more AI - Sept. 17, 2022, 10:57 p.m.
- A new AI game: Give me ideas for crimes to do - Dec. 4, 2022, 3:11 p.m.
- Bing: "I will not harm you unless you harm me first" - Feb. 15, 2023, 3:05 p.m.
- Prompt injection: What's the worst that can happen? - April 14, 2023, 5:35 p.m.
- The Dual LLM pattern for building AI assistants that can resist prompt injection - April 25, 2023, 7 p.m.
- Prompt injection explained, with video, slides, and a transcript - May 2, 2023, 8:22 p.m.
- Delimiters won't save you from prompt injection - May 11, 2023, 3:51 p.m.
- Multi-modal prompt injection image attacks against GPT-4V - Oct. 14, 2023, 2:24 a.m.
- Prompt injection explained, November 2023 edition - Nov. 27, 2023, 3:55 a.m.
- Recommendations to help mitigate prompt injection: limit the blast radius - Dec. 20, 2023, 8:34 p.m.
- Prompt injection and jailbreaking are not the same thing - March 5, 2024, 4:05 p.m.
- Accidental prompt injection against RAG applications - June 6, 2024, 2 p.m.
Misconceptions about large language models
Large Language Models can behave in very unintuitive ways!
- ChatGPT couldn’t access the internet, even though it really looked like it could - March 10, 2023, 1:41 p.m.
- Don't trust AI to talk accurately about itself: Bard wasn't trained on Gmail - March 22, 2023, 3:13 a.m.
- Think of language models like ChatGPT as a "calculator for words" - April 2, 2023, 4:20 p.m.
- We need to tell people ChatGPT will lie to them, not debate linguistics - April 7, 2023, 4:34 p.m.
- Lawyer cites fake cases invented by ChatGPT, judge is not amused - May 27, 2023, 7:09 p.m.
- ChatGPT should include inline tips - May 30, 2023, 7:23 p.m.
- It's infuriatingly hard to understand how closed models train on their input - June 4, 2023, 6:09 p.m.
- ChatGPT in "4o" mode is not running the new features yet - May 15, 2024, 6:25 p.m.
- Training is not the same as chatting: ChatGPT and other LLMs don't remember everything you say - May 29, 2024, 10:51 a.m.
- ChatGPT will happily write you a thinly disguised horoscope - Oct. 15, 2024, 3:24 a.m.
CSS ain't rocket science
A CSS tutorial I wrote as a series of posts in 2003.
- Defending Structural Markup - May 4, 2003, 2:20 p.m.
- Delay to the start of my CSS tutorial series - May 6, 2003, 2:26 p.m.
- The anatomy of a stylesheet - May 18, 2003, 11:56 p.m.
- Scripting.com, with added CSS - May 19, 2003, 11:58 p.m.
- Defeating IE5 CSS bugs with the help of jwz - May 20, 2003, 11:58 p.m.
- Quick tip: Styling blockquotes with CSS - May 21, 2003, 11:54 p.m.
- CSS Tutorial: feedback so far - May 23, 2003, 11:59 p.m.
- Understanding the Box Model - May 26, 2003, 11:58 p.m.
- Fun with links - May 27, 2003, 11:58 p.m.
LLMs on personal devices
Large language models that can run on our own devices open up exciting new ways in which these tools can be used.
- Large language models are having their Stable Diffusion moment - March 11, 2023, 7:15 p.m.
- Stanford Alpaca, and the acceleration of on-device large language model development - March 13, 2023, 7:19 p.m.
- Could you train a ChatGPT-beating model for $85,000 and run it in a browser? - March 17, 2023, 3:43 p.m.
- Thoughts on AI safety in this era of increasingly powerful open source LLMs - April 10, 2023, 6:41 p.m.
- Web LLM runs the vicuna-7b Large Language Model entirely in your browser, and it's very impressive - April 16, 2023, 3:10 p.m.
- Let's be bear or bunny - May 1, 2023, 6:37 p.m.
- Leaked Google document: "We Have No Moat, And Neither Does OpenAI" - May 4, 2023, 4:05 p.m.
- My LLM CLI tool now supports self-hosted language models via plugins - July 12, 2023, 2:24 p.m.
- Run Llama 2 on your own Mac using LLM and Homebrew - Aug. 1, 2023, 6:56 p.m.
- llamafile is the new best way to run a LLM on your own computer - Nov. 29, 2023, 8:54 p.m.
- Many options for running Mistral models in your terminal using LLM - Dec. 18, 2023, 6:18 p.m.
- Qwen2.5-Coder-32B is an LLM that can code well that runs on my Mac - Nov. 12, 2024, 11:37 p.m.
How it's trained
Investigating the training data behind different machine learning models.
- Exploring the training data behind Stable Diffusion - Sept. 5, 2022, 12:18 a.m.
- Exploring 10m scraped Shutterstock videos used to train Meta's Make-A-Video text-to-video model - Sept. 29, 2022, 7:31 p.m.
- Exploring MusicCaps, the evaluation data released to accompany Google's MusicLM text-to-music model - Jan. 27, 2023, 9:34 p.m.
- What's in the RedPajama-Data-1T LLM training set - April 17, 2023, 6:57 p.m.
Datasette Lite
A distribution of Datasette that runs entirely in the browser, using WebAssembly and Pyodide.
- Datasette Lite: a server-side Python web application running in a browser - May 4, 2022, 3:16 p.m.
- Joining CSV files in your browser using Datasette Lite - June 20, 2022, 9:20 p.m.
- Plugin support for Datasette Lite - Aug. 17, 2022, 6:20 p.m.
- Analyzing ScotRail audio announcements with Datasette - from prototype to production - Aug. 21, 2022, 2:04 a.m.
- Weeknotes: Datasette Lite, s3-credentials, shot-scraper, datasette-edit-templates and more - Sept. 16, 2022, 2:55 a.m.
VaccinateCA internal blog
I maintained an internal blog between February and April 2021 during my time at VaccinateCA / Vaccinate The States.
- Getting started - Feb. 22, 2021, 5 p.m.
- Spinning up a new Django app to act as a backend for VaccinateCA - Feb. 23, 2021, 5 p.m.
- Importing data from Airtable into Django, plus a search engine for all our code - Feb. 24, 2021, 5 p.m.
- Django admin customization, JSON in our PostgreSQL - Feb. 25, 2021, 5 p.m.
- Drawing the rest of the owl - March 1, 2021, 5 p.m.
- API ready for testing, first video status update - March 2, 2021, 5 p.m.
- Replaying logs to exercise the new API - March 3, 2021, 5 p.m.
- The simplest possible call queue - March 6, 2021, 5 p.m.
- New call queue ready to test. Also geography. - March 7, 2021, 5 p.m.
- APIs for importing locations - March 9, 2021, 5 p.m.
- VIAL is now live, plus django-sql-dashboard - March 15, 2021, 5 p.m.
- The Airtable formulas at the heart of everything - March 23, 2021, 5 p.m.
- VIAL: Preparing for some collaborative testing - April 1, 2021, 5 p.m.
- A CSV export, JSON import workflow for bulk updating our data - April 28, 2021, 5 p.m.
Git scraping
A technique for scraping content into a Git repository to track changes to it over time.
- Scraping hurricane Irma - Sept. 10, 2017, 6:21 a.m.
- Changelogs to help understand the fires in the North Bay - Oct. 10, 2017, 6:48 a.m.
- Generating a commit log for San Francisco's official list of trees - March 13, 2019, 2:49 p.m.
- Tracking PG&E outages by scraping to a git repo - Oct. 10, 2019, 11:32 p.m.
- Git scraping: track changes over time by scraping to a Git repository - Oct. 9, 2020, 6:27 p.m.
- Git scraping, the five minute lightning talk - March 5, 2021, 12:44 a.m.
- git-history: a tool for analyzing scraped data collected using Git and SQLite - Dec. 7, 2021, 10:32 p.m.
- Help scraping: track changes to CLI tools by recording their --help using Git - Feb. 2, 2022, 11:46 p.m.
- shot-scraper: automated screenshots for documentation, built on Playwright - March 10, 2022, 12:13 a.m.
- Scraping web pages from the command line with shot-scraper - March 14, 2022, 1:29 a.m.
- Automatically opening issues when tracked file content changes - April 28, 2022, 5:18 p.m.
- Measuring traffic during the Half Moon Bay Pumpkin Festival - Oct. 19, 2022, 3:41 p.m.
- Tracking Mastodon user numbers over time with a bucket of tricks - Nov. 20, 2022, 7 a.m.
Datasette: The annotated release notes
I like to accompany significant releases of my Datasette project with an annotated version of the release notes, providing extra background context on new features in the release.
- Datasette 0.44: The annotated release notes - June 12, 2020, 3:11 a.m.
- Datasette 0.45: The annotated release notes - July 1, 2020, 10:33 p.m.
- Datasette 0.49: The annotated release notes - Sept. 15, 2020, 11:45 p.m.
- Datasette 0.50: The annotated release notes - Oct. 9, 2020, 8:23 p.m.
- Datasette 0.54: The annotated release notes - Jan. 25, 2021, 5:31 p.m.
- Datasette 0.58: The annotated release notes - July 16, 2021, 2:21 a.m.
- Datasette Desktop 0.2.0: The annotated release notes - Sept. 13, 2021, 11:30 p.m.
- Datasette 0.59: The annotated release notes - Oct. 19, 2021, 4:59 a.m.
- Datasette 0.60: The annotated release notes - Jan. 14, 2022, 2:30 a.m.
- Datasette 0.61: The annotated release notes - March 24, 2022, 1:53 a.m.
- Datasette 0.63: The annotated release notes - Oct. 27, 2022, 10:13 p.m.
- Datasette's new JSON write API: The first alpha of Datasette 1.0 - Dec. 2, 2022, 11:15 p.m.
- Datasette 1.0a2: Upserts and finely grained permissions - Dec. 15, 2022, 5:58 p.m.
- Datasette 0.64, with a warning about SpatiaLite - Jan. 9, 2023, 9:22 p.m.
- Datasette 1.0a4 and 1.0a5, plus weeknotes - Aug. 30, 2023, 2:33 p.m.
- Datasette 1.0a8: JavaScript plugins, new plugin hooks and plugin configuration in datasette.yaml - Feb. 7, 2024, 4:37 p.m.
- Datasette 1.0a14: The annotated release notes - Aug. 5, 2024, 11:20 p.m.