Simon Willison’s Weblog

Subscribe

Weeknotes: datasette-enrichments, datasette-comments, sqlite-chronicle

8th December 2023

I’ve mainly been working on Datasette Enrichments and continuing to explore the possibilities enabled by sqlite-chronicle.

Enrichments

This is the biggest new Datasette feature to arrive in quite a while, and it’s entirely implemented as a plugin.

I described these in detail in Datasette Enrichments: a new plugin framework for augmenting your data (with an accompanying YouTube video demo). The short version: you can now install plugins that can “enrich” data by running transformations (or data fetches) against selected rows—geocoding addresses, or executing a GPT prompt, or applying a regular expression.

The datasette-enrichments plugin provides the mechanism for running these enrichments. Other plugins can then depend on it and define all manner of interesting options for enriching and transforming data.

I’ve built four of these so far, and I wrote some extensive documentation to help people build more. I’m excited to see how people use and build further on this initial foundation.

Datasette Comments

Alex Garcia released the first version of datasette-comments as part of our continuing collaboration to build out Datasette Cloud.

He wrote about that on the Datasette Cloud blog: Annotate and explore your data with datasette-comments.

Three comment threads demonstrating features of Datasette Comments - replies, reaction emoji, hashtags and the ability to mention other users.

This is another capability I’ve been looking forward to for years: the plugin lets you leave comments on individual rows within a Datasette instance, in order to collaborate with others on finding stories in data.

sqlite-chronicle and datasette-chronicle

I first wrote about sqlite-chronicle in weeknotes back in September. This week, inspired by my work on embeddings, I spent a bit more time on it and shipped a 0.2 release.

sqlite-chronicle is a Python library that implements a SQL pattern where a table can have a _chronicle_tablename companion table created, which is then updated using triggers against the main table.

The chronicle table has a shadow row for every row in the main table, duplicating its primary keys and then storing millisecond timestamp columns for added_ms and updated_ms, an integer version column and a deleted boolean indicator.

The goal is to record when a row was last inserted or updated, with an atomically incrementing version ID representing the version of the entire table.

This can then enable all sorts of interesting potential use-cases:

  • Identify which rows have been updated or inserted since a previously recorded version
  • Synchronize a table with another table, only updating/inserting/deleting rows that have changed since last time
  • Run scheduled tasks that only consider rows that have changed in some way

The relevance to enrichments is that I’d like to implement a form of “persistent” enrichment—an enrichment which is configured to run repeatedly against new or updated rows, geocoding new addresses for example.

To do that, I need a mechanism to identify which rows have already been enriched and which need to be enriched again. sqlite-chronicle is my current plan to provide that mechanism.

It’s still pretty experimental. I recently found that INSERT OR REPLACE INTO queries don’t behave how I would expect them to, see issue #7.

I also started a new plugin to accompany the feature: datasette-chronicle, which adds two features to Datasette:

  • “enable/disable chronicle tracking” table actions for users with the correct permissions, which can be used in the Datasette UI to turn chronicle tracking on and off for a specific table
  • For tables that have chronicle enabled, a ?_since=VERSION querystring parameter which can be used to filter the table to only rows that have changed since the specified version

I’m running the plugin against the documents table on demos.datasette.cloud—see _chronicle_documents there for the result. That table is populated via GitHub scheduled actions and the Datasette API, as described in Getting started with the Datasette Cloud API—it’s also where I first spotted the INSERT OR REPLACE INTO issue I described earlier.

Newsroom Robots

I recorded an episode of the Newsroom Robots AI in journalism podcast with Nikita Roy a couple of weeks ago.

She split our conversation into two episodes:

sqlite-utils 3.36

Quoting the release notes.

  • Support for creating tables in SQLite STRICT mode. Thanks, Taj Khattra. (#344)
    • CLI commands create-table, insert and upsert all now accept a --strict option.
    • Python methods that can create a table—table.create() and insert/upsert/insert_all/upsert_all all now accept an optional strict=True parameter.
    • The transform command and table.transform() method preserve strict mode when transforming a table.
  • The sqlite-utils create-table command now accepts str, int and bytes as aliases for text, integer and blob respectively. (#606)

Taj Khattra’s contribution of the --strict and strict=True options is a beautiful example of my ideal pull request: a clean implementation, comprehensive tests and thoughtful updates to the documentation all bundled together in one go.

Releases

TILs