Weeknotes: git-history, bug magnets and s3-credentials --public
8th December 2021
I’ve stopped considering my projects “shipped” until I’ve written a proper blog entry about them, so yesterday I finally shipped git-history, coinciding with the release of version 0.6—a full 27 days after the first 0.1.
It took way more work than I was expecting to get to this point!
I wrote the first version of git-history
in an afternoon, as a tool for a workshop I was presenting on Git scraping and Datasette.
Before promoting it more widely, I wanted to make some improvements to the schema. In particular, I wanted to record only the updated values in the item_version
table—which otherwise could end up duplicating a full copy of each item in the database hundreds or even thousands of times.
Getting this right took a lot of work, and I kept on getting stumped by weird bugs and edge-cases. This bug in particular added a couple of days to the project.
The whole project turned out to be something of a bug magnet, partly because of a design decision I made concerning column names.
git-history
creates tables with columns that correspond to the underlying data. Since it also needs its own columns for tracking things like commits and incremental versions, I decided to use underscore prefixes for reserved columns such as _item
and _version
Datasette uses underscore prefixes for its own purposes—special table arguments such as ?_facet=column-name
. It’s supposed to work with existing columns that use underscores by converting query string arguments like ?_item=3
into ?_item__exact=3
—but git-history
was the first of my projects to really exercise this, and I kept on finding bugs. Datasette 0.59.2 and 0.59.4 both have related bug fixes, and there’s a re-opened bug that I have yet to resolve.
Building the ca-fires demo also revealed a bug in datasette-cluster-map which I fixed in version 0.17.2.
s3-credentials --public
The git-history
live demos are built and deployed by this GitHub Actions workflow. The workflow works by checking out three separate repos and running git-history
against them. It takes advantage of that tool’s ability to add just new commits to an existing database to run faster, so it needs to persist database files in between runs.
Since these files can be several hundred MBs, I decided to persist them in an S3 bucket.
My s3-credentials tool provides the ability to create a new S3 bucket along with restricted read-write credentials just for that bucket, ideal for use in a GitHub Actions workflow.
I decided to make the bucket public such that anyone can download files from it, since there was no reason to keep it private. I’ve been wanting to add this ability to s3-credentials
for a while now, so this was the impetus I needed to finally ship that feature.
It’s surprisingly hard to figure out how to make an S3 bucket public these days! It turned out the magic recipe was adding a JSON bucket policy document to the bucket granting s3:GetObject
permission to principal *
—here’s that policy in full.
I released s3-credentials 0.8 with a new --public
option for creating public buckets—here are the release notes in full:
s3-credentials create my-bucket --public
option for creating public buckets, which allow anyone with knowledge of a filename to download that file. This works by attaching this public bucket policy to the bucket after it is created. #42s3-credentials put-object
now sets theContent-Type
header on the uploaded object. The type is detected based on the filename, or can be specified using the new--content-type
option. #43s3-credentials policy my-bucket --public-bucket
outputs the public bucket policy that would be attached to a bucket of that name. #44
I wrote up this TIL which doubles as a mini-tutorial on using s3-credentials
: Storing files in an S3 bucket between GitHub Actions runs.
datasette-hovercards
This was a quick experiment which turned into a prototype Datasette plugin. I really like how GitHub show hover card previews of links to issues in their interface:
I decided to see if I could build something similar for links within Datasette, specifically the links that show up when a column is a foreign key to another record.
Here’s what I’ve got so far:
There’s an interactive demo running on this table page.
It still needs a bunch of work—in particular I need to think harder about when the card is shown, where it displays relative to the mouse pointer, what causes it to be hidden again and how it should handle different page widths. Ideally I’d like to figure out a useful mobile / touch-screen variant, but I’m not sure how that could work.
The prototype plugin is called datasette-hovercards—I’d like to eventually merge this back into Datasette core once I’m happy with how it works.
Releases this week
-
git-history: 0.6.1—(9 releases total)—2021-12-08
Tools for analyzing Git history using SQLite -
datasette-cluster-map: 0.17.2—(20 releases total)—2021-12-07
Datasette plugin that shows a map for any data with latitude/longitude columns -
s3-credentials: 0.8—(8 releases total)—2021-12-07
A tool for creating credentials for accessing S3 buckets -
asyncinject: 0.2a1—(3 releases total)—2021-12-03
Run async workflows using pytest-fixtures-style dependency injection -
datasette-hovercards: 0.1a0—2021-12-02
Add preview hovercards to links in Datasette -
github-to-sqlite: 2.8.3—(22 releases total)—2021-12-01
Save data from GitHub to a SQLite database
TIL this week
More recent articles
- Qwen2.5-Coder-32B is an LLM that can code well that runs on my Mac - 12th November 2024
- Visualizing local election results with Datasette, Observable and MapLibre GL - 9th November 2024
- Project: VERDAD - tracking misinformation in radio broadcasts using Gemini 1.5 - 7th November 2024