Weeknotes: datasette-remote-metadata, sqlite-transform --multi
29th July 2021
I mentioned Project Pelican (still a codename until the end of the embargo) last week. This week it inspired a new plugin, datasette-remote-metadata.
I also published an article describing the Baked Data architectural pattern and shared the video and transcript of a talk I gave at PyGotham last year.
datasette-remote-metadata
Datasette’s metadata system is one of the key ways of customizing a Datasette instance: it lets you add extra information about the tables hosted by the instance, lets you configure canned queries and also (due to some scope creep that I’d love to clean up) lets you configure options for various Datasette plugins.
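As a refresher, that metadata is normally provided as a metadata.json (or metadata.yml) file passed to Datasette on startup. Here's an illustrative sketch; the database, table and query names are invented:

{
    "title": "My example instance",
    "databases": {
        "mydb": {
            "tables": {
                "mytable": {
                    "description": "Extra information about this table"
                }
            },
            "queries": {
                "first_ten": {
                    "sql": "select * from mytable limit 10",
                    "title": "First ten rows"
                }
            }
        }
    }
}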
There’s just one catch: when you update the metadata for an instance, you need to re-deploy the whole thing. I increasingly find myself using Datasette and Cloud Run to host databases that are over 1GB in size, so re-deploying just to tweak some text on the homepage is less than ideal.
The datasette-remote-metadata plugin uses the new get_metadata() plugin hook from Datasette 0.58 to work around this problem.
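The hook lets a plugin return extra metadata to be merged into the instance's own. Here's a minimal sketch of a plugin using that hook—it is not the actual datasette-remote-metadata implementation, which also handles fetching and caching:

from datasette import hookimpl

@hookimpl
def get_metadata(datasette, key, database, table):
    # Return a dictionary that Datasette will merge into
    # its existing metadata for this instance
    return {"title": "Title provided by a plugin"}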
Essentially, it lets you tell Datasette “load your metadata from this external URL”. You provide it with a URL to a JSON or YAML file hosted elsewhere and the plugin will load that file on startup and periodically (by default every 30 seconds) check to see if it has changed.
Those checks for updates only happen while Datasette is receiving traffic. When a request comes in and the metadata is due for a refresh, Datasette gives the fetch up to 200ms to complete; if it takes longer than that, the fetch continues in the background and the request is served using the now-stale metadata rather than blocking any further.
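Configuring the plugin looks roughly like this. The URL here is a placeholder, and I believe the check interval can be tuned with a ttl setting; consult the plugin README for the authoritative list of options:

{
    "plugins": {
        "datasette-remote-metadata": {
            "url": "https://example.com/remote-metadata.yml",
            "ttl": 30
        }
    }
}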
I’m trying this out for Project Pelican and it seems to be working great—which means we can iterate on the metadata (including the description_html block displayed on the Datasette homepage) without needing to constantly re-deploy the 700+ MB SQLite database file to Cloud Run.
sqlite-transform --multi
sqlite-transform is my command-line tool for running transformations against columns of data in a SQLite database: parsing dates, splitting comma-separated lists into JSON arrays, or applying arbitrary transformations using custom Python code.
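A couple of quick examples of those existing subcommands, run against a hypothetical my.db database:

# Convert string values in the "created" column into ISO format dates
sqlite-transform parsedate my.db mytable created

# Turn comma-separated values in "tags" into a JSON array
sqlite-transform jsonsplit my.db mytable tags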
The latest release adds support for splitting one input column into multiple output columns. This is best explained with an example: consider a database table with a location column that contains latitude,longitude points which you would like to split out into separate latitude and longitude columns.
sqlite-transform lambda demo.db places location \
  --code 'return {
    "latitude": float(value.split(",")[0]),
    "longitude": float(value.split(",")[1]),
  }' --multi --drop
The new --multi option indicates that the --code expression will return a Python dictionary; the tool then adds a new column to the specified table for each key in that dictionary and populates it with the corresponding values.
The new --drop option tells the tool to drop the original location column once the transformation has completed.
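Assuming the demo.db example above, the sqlite3 shell can confirm what happened: the location column should be gone and the two new columns populated.

sqlite3 demo.db '.schema places'
sqlite3 demo.db 'select latitude, longitude from places limit 1'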
Releases this week
- datasette-remote-metadata: 0.1—(2 releases total)—2021-07-29
  Periodically refresh Datasette metadata from a remote URL
- datasette-haversine: 0.2—2021-07-28
  Datasette plugin that adds a custom SQL function for haversine distances
- datasette-publish-vercel: 0.10—(16 releases total)—2021-07-25
  Datasette plugin for publishing data using Vercel
- sqlite-transform: 1.2—(9 releases total)—2021-07-25
  Tool for running transformations on columns in a SQLite database
- sqlite-utils: 3.13—(81 releases total)—2021-07-24
  Python CLI utility and library for manipulating SQLite databases