Weeknotes: DjangoCon, SQLite in Django, datasette-gunicorn
23rd October 2022
I spent most of this week at DjangoCon in San Diego—my first outside-of-the-Bay-Area conference since the before-times.
It was a most excellent event. I spent a lot of time in the corridor track—actually the sitting-outside-in-the-sunshine track, catching up with people I haven’t seen in several years.
I gave a talk titled "Massively increase your productivity on personal projects with comprehensive documentation and automated tests", with the alternative title "Coping strategies for the serial project hoarder". I’ll do a full write-up of this once the video is made available in a few weeks time, but in the meantime the talk materials can be found here:
- Supporting notes and links
- Slides on Speaker Deck
- Video for paying DjangoCon attendees (public video coming soon)
I also gave a lightning talk about AI and magic, which was effectively the five minute oral version of my recent blog post Is the AI spell-casting metaphor harmful or helpful?
Benchmarking SQLite in Django
I also hung around for the first day of the DjangoCon sprints.
For over a decade, the Django documentation has warned against using SQLite in production—recommending PostgreSQL or MySQL instead.
I asked Django Fellow Carlton Gibson what it would take to update that advice for 2022. He suggested that what we really needed was a solid idea for how well modern SQLite performs with Django, against a variety of different settings.
So I spent some time running benchmarks, using my new django_sqlite_benchmark repository.
You can follow the full details of my experiments in these issues:
- #2: Locust test to exercise /counter/xxx endpoint which runs benchmarks against a simple Django view that increments a counter stored in a SQLite table
- #3: Load test for larger writes runs a benchmark using a script that inserts larger JSON objects into a database table. I also tried this against PostgreSQL, getting very similar numbers to SQLite.
- #4: Benchmark endpoint that doesn’t interact with database benchmarks a simple “hello world” view that doesn’t use SQLite at all—as a baseline for comparison
I used Locust for all of these tests, and wrote up a TIL about using it as well.
Here’s the TLDR version of the results: SQLite in its default “journal” mode starts returning “database locked” errors pretty quickly as the write load increases. But... if you switch to “wal” mode (here’s how) those errors straight up vanish!
I was expecting WAL mode to improve things, but I thought I’d still be able to hit errors even with it enabled. No—it turns out that, at least for the amount of traffic I could generate on may laptop, WAL mode proved easily capable of handling the load.
Even without WAL mode, bumping the SQLite “timeout” option up to 20s solved most of the errors.
Even more interestingly: I tried using Gunicorn (and Uvicorn) to run multiple Django workers at once. I was certain this would lead to problems, as SQLite isn’t designed to handle writes from multiple processes at once... or so I thought. It turned out SQLite’s use of file locking meant everything worked far better than I expected—and upping the number of worker processes from 1 to 4 resulted in approximately a 4x increase in throughput.
I shouldn’t be surprised by this, if only because every time I’ve tried to push SQLite in a new direction it’s impressed me with how much more capable it is than I expected.
But still, these results are very exciting. This problem still needs more thorough testing and more eyes than just mine, but I think this indicates that SQLite should absolutely be considered a viable option for running Django in production in 2022.
datasette-gunicorn
Datasette has always run as a single process. It uses Uvicorn to serve requests, but it hard-codes Uvicorn to a single worker (here).
Based on my experiments with SQLite and Django—in particular how running multiple worker processes gave me an increase in how much traffic I could handle—I decided to try the same thing with Datasette itself.
Gunicorn remains one of the most well regarded options for deploying Python web applications. It acts as a process monitor, balancing requests between different workers and restarting anything that fails with an error.
I decided to experiment with this through the medium of a Datasette plugin. So I built datasette-gunicorn, a plugin that adds an extra command to Datasette that lets you start it like this:
datasette gunicorn my.db --workers 4
It takes most of the same arguments as Datasette’s regular datasette serve
command, plus that new -w/--workers
option for setting the number of workers.
Initial benchmarks were very positive: 21 requests a second with a single worker, increasing to 75 requests/second with four! Not bad for an initial experiment. I also tested it serving a static page through Datasette and got up to over 500 requests a second with a warning that Locust needed to be moved to a separate machine for a full load test.
In writing the plugin I had to figure out how to build a new command that mostly copied parameters from the existing datasette serve
Click command—I wrote a TIL about how I ended up doing that.
shot-scraper 1.0
Also this week: I released shot-scraper 1.0.
Despite the exciting version number this actually only has two small new features. Here’s the full changelog:
- New
shot-scraper html URL
command (documented here) for outputting the final HTML of a page, after JavaScript has been executed. #96shot-scraper javascript
has a new-r/--raw
option for outputting the result of the JavaScript expression as a raw string rather than JSON encoded (shot-scraper javascript documentation). #95- Tutorial: Automating screenshots for the Datasette documentation using shot-scraper.
I bumped it to 1.0 because shot-scraper
is mature enough now that I’m ready to commit to not breaking existing features (at least without shipping a 2.0, which I hope to avoid for as long as possible).
I’m always trying to get more brave when it comes to stamping a 1.0 release on my main projects.
(I really, really need to get Datasette 1.0 shipped soon.)
Releases this week
-
datasette-gunicorn: 0.1—2022-10-22
Plugin for running Datasette using Gunicorn -
shot-scraper: 1.0—(23 releases total)—2022-10-15
A command-line utility for taking automated screenshots of websites -
asgi-gzip: 0.2—(2 releases total)—2022-10-13
gzip middleware for ASGI applications, extracted from Starlette
TIL this week
More recent articles
- Qwen2.5-Coder-32B is an LLM that can code well that runs on my Mac - 12th November 2024
- Visualizing local election results with Datasette, Observable and MapLibre GL - 9th November 2024
- Project: VERDAD - tracking misinformation in radio broadcasts using Gemini 1.5 - 7th November 2024