February 2021
Feb. 1, 2021
JMeter Result Analysis using Datasette (via) NaveenKumar Namachivayam wrote a detailed tutorial on using Datasette (on Windows) and csvs-to-sqlite to analyze the results of JMeter performance test runs and then publish them online using Vercel.
Feb. 3, 2021
Cleaning Up Your Postgres Database (via) Craig Kerstiens provides some invaluable tips on running an initial check of the health of a PostgreSQL database, by using queries against the pg_statio_user_indexes table to find the memory cache hit ratio and the pg_stat_user_tables table to see what percentage of queries to your tables are using an index.
Feb. 4, 2021
Serving map tiles from SQLite with MBTiles and datasette-tiles
Working on datasette-leaflet last week re-kindled my interest in using Datasette as a GIS (Geographic Information System) platform. SQLite already has strong GIS functionality in the form of SpatiaLite and datasette-cluster-map is currently the most downloaded plugin. Most importantly, maps are fun!
[... 1,334 words]Feb. 7, 2021
Weeknotes: Mostly messing around with map tiles
Most of what I worked on this week was covered in Serving map tiles from SQLite with MBTiles and datasette-tiles. I built two new plugins: datasette-tiles for serving map tiles, and datasette-basemap which bundles map tiles for zoom levels 0-6 of OpenStreetMap. I also released download-tiles for downloading tiles and bundling them into an MBTiles database.
[... 350 words]Video introduction to Datasette and sqlite-utils
I put together a 17 minute video introduction to Datasette and sqlite-utils for FOSDEM 2021, showing how you can use Datasette to explore data, and demonstrating using the sqlite-utils
command-line tool to convert a CSV file into a SQLite database, and then publish it using datasette publish
. Here’s the video, plus annotated screen captures with further links and commentary.
Feb. 8, 2021
Finally, remember that whatever choice is made, you’re going to need to get behind it! You should be able to make a compelling positive case for any of the options you present. If there’s an option you can’t support, don’t present it.
Feb. 10, 2021
Dependency Confusion: How I Hacked Into Apple, Microsoft and Dozens of Other Companies (via) Alex Birsan describes a new category of security vulnerability he discovered in the npm, pip and gem packaging ecosystems: if a company uses a private repository with internal package names, uploading a package with the same name to the public repository can often result in an attacker being able to execute their own code inside the networks of their target. Alex scored over $130,000 in bug bounties from this one, from a number of name-brand companies. Of particular note for Python developers: the --extra-index-url argument to pip will consult both public and private registries and install the package with the highest version number!
Feb. 11, 2021
Why I Built Litestream. Litestream is a really exciting new piece of technology by Ben Johnson, who previously built BoltDB, the key-value store written in Go that is used by etcd. It adds replication to SQLite by running a process that converts the SQLite WAL log into a stream that can be saved to another folder or pushed to S3. The S3 option is particularly exciting—Ben estimates that keeping a full point-in-time recovery log of a high write SQLite database should cost in the order of a few dollars a month. I think this could greatly expand the set of use-cases for which SQLite is sensible choice.
trustme (via) This looks incredibly useful. Run “python -m trustme” and it will create three files for you: server.pem, server.key and a client.pem client certificate, providing a certificate for “localhost” (or another host you spefict) using a fake certificate authority. Looks like it should be the easiest way to test TLS locally.
Litestream runs continuously on a test server with generated load and streams backups to S3. It uses physical replication so it'll actually restore the data from S3 periodically and compare the checksum byte-for-byte with the current database.
Feb. 14, 2021
Weeknotes: Finally, an intro video for Datasette
My big project this week was this Video introduction to Datasette and sqlite-utils. I recorded the video a few weeks ago in advance of FOSDEM, but this week I put together the annotated version. I’m really happy with it, and I’ve added it to the datasette.io homepage as a starting point for helping people understand the project.
[... 690 words]Feb. 19, 2021
One of the hardest things I’ve had to learn is that humans aren’t pure functions: an input that works one day and gets one result, then again another day and get an entirely different result.
Open source projects: consider running office hours
Back in December I decided to try something new for my Datasette open source project: Datasette Office Hours. The idea is simple: anyone can book a 25 minute conversation with me on a Friday to talk about the project. I’m interested in talking to people who are using Datasette, or who are considering using it, or who just want to have a chat.
[... 786 words]Feb. 21, 2021
Cross-database queries in SQLite (and weeknotes)
I released Datasette 0.55 and sqlite-utils 3.6 this week with a common theme across both releases: supporting cross-database joins.
[... 720 words]Feb. 22, 2021
Getting started
Here we go then... I’ve signed up to work on this project full-time, four days a week!
[... 609 words]People, processes, priorities. Twitter thread from Adrienne Porter Felt outlining her model for thinking about engineering management. I like this trifecta of “people, processes, priorities” a lot.
Blazing fast CI with pytest-split and GitHub Actions (via) pytest-split is a neat looking variant on the pattern of splitting up a test suite to run different parts of it in parallel on different machines. It involves maintaining a periodically updated JSON file in the repo recording the average runtime of different tests, to enable them to be more fairly divided among test runners. Includes a recipe for running as a matrix in GitHub Actions.
Business rules engines are li’l Conway’s Law devices: a manifestation of the distrust between stakeholders, client and contractor. We require BREs so that separate business units need not talk to each other to solve problems. They are communication and organizational dysfunction made silicon.
Fuzzy Name Matching in Postgres. Paul Ramsey describes how to implement fuzzy name matching in PostgreSQL using the fuzzystrmatch extension and its levenshtein() and soundex() functions, plus functional indexes to query against indexed soundex first and then apply slower Levenshtein. The same tricks should also work against SQLite using the datasette-jellyfish plugin.
Feb. 23, 2021
I strongly suspect that the single most impactful thing I did during my 5+ years at Linden Lab was shortly before I left: set up a weekly meeting between a couple of leads from Support and Engineering to go over the top 10 support issues.
Spinning up a new Django app to act as a backend for VaccinateCA
My goal by the end of this week is to have a working proof of concept for a Django + PostgreSQL app that can replace Airtable as the principle backend for the https://www.vaccinateca.com/ site. This proof of concept will allow us to make a go or no-go decision and figure out what else needs to be implemented before we can start using it to track calls.
[... 762 words]When building a tool, it’s easy to forget how much you’ve internalized: how much knowledge and context you’ve assumed. Your tool can feel familiar or even obvious to you while being utterly foreign to everyone else. If your goal is for other people to use the darn thing — meaning you’re not just building for yourself, or tinkering for its own sake (which are totally valid reasons) — you gotta help people use it! It doesn’t matter what’s possible or what you intended; all that matters is whether people actually succeed in practice.
Feb. 24, 2021
Importing data from Airtable into Django, plus a search engine for all our code
I made a bunch of progress on the Django backend prototype-that-soon-won’t-be-a-prototype today.
[... 935 words]Feb. 25, 2021
Django admin customization, JSON in our PostgreSQL
My progress slowed a bit today as I started digging into some things I’m less familiar with—but I’ve found some tricks that I think will help us out a lot.
[... 1,089 words]Feb. 26, 2021
How to secure an Ubuntu server using Tailscale and UFW. This is the Tailscale tutorial I’ve always wanted: it explains in detail how you can run an Ubuntu server (from any cloud provider) such that only devices on your personal Tailscale network can access it.
Feb. 27, 2021
cosmopolitan libc (via) “Cosmopolitan makes C a build-once run-anywhere language, similar to Java, except it doesn’t require interpreters or virtual machines be installed beforehand. [...] Instead, it reconfigures stock GCC to output a POSIX-approved polyglot format that runs natively on Linux + Mac + Windows + FreeBSD + OpenBSD + BIOS with the best possible performance and the tiniest footprint imaginable.” This is a spectacular piece of engineering.
unasync (via) Today I started wondering out loud if one could write code that takes an asyncio Python library and transforms it into the synchronous equivalent by using some regular expressions to strip out the “await ...” keywords and suchlike. Turns out that can indeed work, and Ratan Kulshreshtha built it! unasync uses the standard library tokenize module to run some transformations against an async library and spit out the sync version automatically. I’m now considering using this for sqlite-utils.
Feb. 28, 2021
Trying to end the pandemic a little earlier with VaccinateCA
This week I got involved with the VaccinateCA effort. We are trying to end the pandemic a little earlier, by building the most accurate database possible of vaccination locations and availability in California.
[... 1,154 words]