Datasette 1.0a14: The annotated release notes
5th August 2024
Released today: Datasette 1.0a14. This alpha includes significant contributions from Alex Garcia, including some backwards-incompatible changes in the run-up to the 1.0 release.
- Metadata now lives in a database
- datasette-remote-metadata 0.2a0
- SQLite isolation_level=“IMMEDIATE”
- Updating the URLs
- Everything else
- Tricks to help construct the release notes
Metadata now lives in a database
The biggest change in the alpha concerns how Datasette’s metadata system works.
Datasette can record and serve metadata about the databases, tables and columns that it is serving. This includes things like the source of the data, the license it is made available under and descriptions of the tables and columns.
Historically this has been powered by a metadata.json
file. Over time, this file grew to include all sorts of things that weren’t strictly metadata—things like plugin configuration. Cleaning this up is a major breaking change for Datasette 1.0, and Alex has been working on this across several alphas.
The latest alpha adds a new upgrade guide describing changes plugin authors will need to make to support the new metadata system.
The big change in 1.0a14 is that metadata now lives in Datasette’s hidden _internal
SQLite database, in four new tables called metadata_instance
, metadata_databases
, metadata_resources
and metadata_columns
. The schema for these is now included in the documentation (updated using this Cog code), but rather than accessing those tables directly plugins are encouraged to use the new set_*_metadata() and get_*_metadata() methods on the Datasette
class.
I plan to use these new tables to build a new performant, paginated homepage that shows all of the databases and tables that Datasette is serving, complete with their metadata—without needing to make potentially hundreds of calls to the now-removed get_metadata()
plugin hook.
datasette-remote-metadata 0.2a0
When introducing new plugin internals like this it’s always good to accompany them with a plugin that exercises them. datasette-remote-metadata is a few years old now, and provides a mechanism for hosting the metadata for a Datasette instance at a separate URL. This means you can deploy a stateless Datasette instance with a large database and later update the attached metadata without having to re-deploy the whole thing.
I released a new alpha of that plugin which switches over to the new metadata mechanism. The core code ended up looking like this, imitating code Alex wrote for Datasette Core:
async def apply_metadata(datasette, metadata_dict): for key in metadata_dict or {}: if key == "databases": continue await datasette.set_instance_metadata(key, metadata_dict[key]) # database-level for dbname, db in metadata_dict.get("databases", {}).items(): for key, value in db.items(): if key == "tables": continue await datasette.set_database_metadata(dbname, key, value) # table-level for tablename, table in db.get("tables", {}).items(): for key, value in table.items(): if key == "columns": continue await datasette.set_resource_metadata(dbname, tablename, key, value) # column-level for columnname, column_description in table.get("columns", {}).items(): await datasette.set_column_metadata( dbname, tablename, columnname, "description", column_description )
SQLite isolation_level=“IMMEDIATE”
Sylvain Kerkour wrote about the benefits of IMMEDIATE transactions back in February. The key issue here is that SQLite defaults to starting transactions in DEFERRED
mode, which can lead to SQLITE_BUSY
errors if a transaction is upgraded to a write transaction mid-flight. Starting in IMMEDIATE
mode for Datasette’s dedicated write connection should help avoid this.
Frustratingly I failed to replicate the underlying problem in my own tests, despite having anecdotally seen it happen in the past.
After spending more time than I had budgeted for on this, I decided to ship it as an alpha to get it properly exercised before the 1.0 stable release.
Updating the URLs
Here’s another change that was important to get out before 1.0.
Datasette’s URL design had a subtle blemish. The following page had two potential meanings:
-
/databasename
—list all of the tables in the specified database -
/databasename?sql=
—execute an arbitrary SQL query against that database
This also meant that the JSON structure returned by /database.json
v.s. /database.json?sql=
was different.
Alex and I decided to fix that. Alex laid out the new design in issue #2360—there are quite a few other changes, but the big one is that we are splitting out the SQL query interface to a new URL: /databasename/-/query?sql=
—or /databasename/-/query.json?sql=
for the JSON API.
We’ve added redirects from the old URLs to the new ones, so existing links should continue to work.
Everything else
- Fix for a bug where canned queries with named parameters could fail against SQLite 3.46. (#2353)
This reflects a bug fix that went out in Datasette 0.64.7.
- Datasette now serves
E-Tag
headers for static files. Thanks, Agustin Bacigalup. (#2306)
There’s still more to be done making Datasette play well with caches, but this is a great, low-risk start.
- Dropdown menus now use a
z-index
that should avoid them being hidden by plugins. (#2311)
A cosmetic bug that showed up on Datasette Cloud when using the datasette-cluster-map plugin.
- Incorrect table and row names are no longer reflected back on the resulting 404 page. (#2359)
This was reported as a potential security issue. The table names were correctly escaped, so this wasn’t an XSS, but there was still potential for confusion if an attacker constructed a URL along the lines of /database-does-not-exist-visit-www.attacker.com-for-more-info
. A similar fix went out in Datasette 0.64.8.
- Improved documentation for async usage of the track_event(datasette, event) hook. (#2319)
- Fixed some HTTPX deprecation warnings. (#2307)
- Datasette now serves a
<html lange="en">
attribute. Thanks, Charles Nepote. (#2348)- Datasette’s automated tests now run against the maximum and minimum supported versions of SQLite: 3.25 (from September 2018) and 3.46 (from May 2024). Thanks, Alex Garcia. (#2352)
- Fixed an issue where clicking twice on the URL output by
datasette --root
produced a confusing error. (#2375)
Tricks to help construct the release notes
I still write the Datasette release notes entirely by hand (aside from a few words auto-completed by GitHub Copilot)—I find the process of writing them to be really useful as a way to construct a final review of everything before it goes out.
I used a couple of tricks to help this time. I always start my longer release notes with an issue. The GitHub diff view is useful for seeing what’s changed since the last release, but I took it a step further this time with the following shell command:
git log --pretty=format:"- %ad: %s %h" --date=short --reverse 1.0a13...81b68a14
This outputs a summary of each commit in the range, looking like this (truncated):
- 2024-03-12: Added two things I left out of the 1.0a13 release notes 8b6f155b
- 2024-03-15: Fix httpx warning about app=self.app, refs #2307 5af68377
- 2024-03-15: Fixed cookies= httpx warning, refs #2307 54f5604c
...
Crucially, the syntax of this output is in GitHub Flavored Markdown—and pasting it into an issue comment causes both the issue references and the commit hashes to be expanded into links that look like this:
It’s a neat way to get a quick review of what’s changed, and also means that those issues will automatically link back to the new issue where I’m constructing the release notes.
I wrote this up in a TIL here, along with another trick I used where I used LLM to get Claude 3.5 Sonnet to summarize my changes for me:
curl 'https://github.com/simonw/datasette/compare/1.0a13...2ad51baa3.diff' \
| llm -m claude-3.5-sonnet --system \
'generate a short summary of these changes, then a bullet point list of detailed release notes'
More recent articles
- Qwen2.5-Coder-32B is an LLM that can code well that runs on my Mac - 12th November 2024
- Visualizing local election results with Datasette, Observable and MapLibre GL - 9th November 2024
- Project: VERDAD - tracking misinformation in radio broadcasts using Gemini 1.5 - 7th November 2024