Cross-database queries in SQLite (and weeknotes)
21st February 2021
I released Datasette 0.55 and sqlite-utils 3.6 this week with a common theme across both releases: supporting cross-database joins.
Cross-database queries in Datasette
SQLite databases are single files on disk. I really love this characteristic—it makes them easy to create, copy and move around. All you need is a disk volume and you can create as many SQLite databases as you like.
A lesser known feature of SQLite is that you can run queries, including joins, across tables from more than one database. The secret sauce is the ATTACH DATABASE command. Run the following SQL:
ATTACH 'other.db' AS other;
And now you can reference tables in that database as other.tablename
. You can then join against them, combine them with UNION
and generally treat them as if they were another table in your first connected database.
I’ve wanted to add support for cross-database queries to Datasette since May 2018. It took me quite a while to settle on a design—SQLite defaults to only allowing ten databases to be attached together, and I needed to figure out how multiple connected databases would fit with the design of the rest of Datasette.
In the end, I decided on the simplest option that would unlock the feature. Run Datasette with the new --crossdb
option and the first ten databases passed to Datasette will be ATTACHed to an in-memory database available at the /_memory
URL.
The latest.datasette.io
demo now exposes two databases using this feature. Here’s an illustrative example query that performs a UNION across the sqlite_master
metadata table in two databases:
select
'fixtures' as database, *
from
[fixtures].sqlite_master
union
select
'extra_database' as database, *
from
[extra_database].sqlite_master
Cross-database queries in sqlite-utils
sqlite-utils offers both a Python library and a command-line utility in one package. I’ve added ATTACH
support to both.
The Python library support looks like this:
db = Database("first.db") db.attach("second", "second.db") # Now you can run queries like this: cursor = db.execute(""" select * from table_in_first union all select * from second.table_in_second """) print(cursor.fetchall())
The command-line tool now has a new --attach option which lets you attach a database using an alias. The equivalent query to the above would look like this:
$ sqlite-utils first.db --attach second second.db '
select * from table_in_first
union all
select * from second.table_in_second'
This defaults to returning results as a JSON array, but you can add --csv
or --tsv
or other options to get the results back in different output formats.
A cosmetic upgrade to tags on my blog
I noticed that Will Larson’s blog shows little numbers next to the tags indicating how many times they have been used. I really liked that, so I’ve implemented it here as well.
Each entry (and quotation and link) now gets a block in the sidebar that looks like this:
As a long-time fan of faceted search interfaces I really like this upgrade—it helps indicate at a glance the kind of content I have stashed away in my blog’s archive.
Releases this week
-
datasette-json-preview: 0.3—2021-02-20
Preview of new JSON default format for Datasette -
sqlite-utils: 3.6—2021-02-19
Python CLI utility and library for manipulating SQLite databases -
datasette: 0.55—2021-02-19
An open source multi-tool for exploring and publishing data -
datasette-graphql: 1.4—2021-02-18
Datasette plugin providing an automatic GraphQL API for your SQLite databases -
higher-lower: 0.1—2021-02-16
Functions for finding numbers using higher/lower -
download-tiles: 0.4.1—2021-02-16
Download map tiles and store them in an MBTiles database
TIL this week
More recent articles
- Qwen2.5-Coder-32B is an LLM that can code well that runs on my Mac - 12th November 2024
- Visualizing local election results with Datasette, Observable and MapLibre GL - 9th November 2024
- Project: VERDAD - tracking misinformation in radio broadcasts using Gemini 1.5 - 7th November 2024