November 2021
Nov. 3, 2021
s3-credentials: a tool for creating credentials for S3 buckets
I’ve built a command-line tool called s3-credentials to solve a problem that’s been frustrating me for ages: how to quickly and easily create AWS credentials (an access key and secret key) that have permission to read or write from just a single S3 bucket.
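To make "permission to read or write from just a single S3 bucket" concrete, here is a minimal sketch of the kind of IAM policy document that scopes access to one bucket. This is an illustration only, not necessarily the policy that s3-credentials generates: the function name `single_bucket_policy`, the `read_only` flag and the bucket name are all hypothetical.

```python
import json


def single_bucket_policy(bucket: str, read_only: bool = False) -> dict:
    """Build an IAM policy document limited to one S3 bucket.

    Hypothetical helper for illustration; not taken from s3-credentials.
    """
    # Object-level actions apply to keys inside the bucket.
    object_actions = ["s3:GetObject"]
    if not read_only:
        object_actions += ["s3:PutObject", "s3:DeleteObject"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is a bucket-level action, so it targets the bucket ARN.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Reads and writes target the objects, hence the "/*" suffix.
                "Effect": "Allow",
                "Action": object_actions,
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


# "my-example-bucket" is a placeholder name.
print(json.dumps(single_bucket_policy("my-example-bucket"), indent=2))
```

Passing `read_only=True` drops the write and delete actions, which is the distinction between read-only and read-write credentials.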
[... 1,618 words]
Nov. 4, 2021
How to build, test and publish an open source Python library
At PyGotham this year I presented a ten minute workshop on how to package up a new open source Python library and publish it to the Python Package Index. Here is the video and accompanying notes, which should make sense even without watching the talk.
[... 2,055 words]
Nov. 5, 2021
Weeknotes: datasette-jupyterlite, s3-credentials and a Python packaging talk
My big project this week was s3-credentials, described yesterday, but I also put together a fun experimental Datasette plugin bundling JupyterLite and wrote up my PyGotham talk on Python packaging.
[... 476 words]
An oral history of Bank Python (via) Fascinating description of a very custom Python environment inside a large investment bank—where all of the code lives inside the Python environment itself, everything can be imported into the same process and a directed acyclic graph engine implements Excel-style reactive dependencies. Plenty of extra flavour from people who’ve worked with this (and related) Python systems in the Hacker News comments.
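As a rough illustration of what "Excel-style reactive dependencies" on a directed acyclic graph can look like, here is a toy sketch. It is not how the bank's system works. The `Graph` class and its method names are invented for this example, and the cache invalidation is deliberately naive.

```python
from typing import Any, Callable, Dict, Sequence, Tuple


class Graph:
    """Toy dependency graph: input cells hold values, formula cells derive them."""

    def __init__(self) -> None:
        self._values: Dict[str, Any] = {}
        self._formulas: Dict[str, Tuple[Callable[..., Any], Tuple[str, ...]]] = {}
        self._cache: Dict[str, Any] = {}

    def set_value(self, name: str, value: Any) -> None:
        """Set an input cell and drop every cached formula result."""
        self._values[name] = value
        self._cache.clear()  # naive: a real engine would only invalidate dependents

    def define(self, name: str, func: Callable[..., Any], deps: Sequence[str]) -> None:
        """Register a formula cell computed from the named dependencies."""
        self._formulas[name] = (func, tuple(deps))
        self._cache.clear()

    def get(self, name: str) -> Any:
        """Return a cell's value, computing and caching formulas on demand."""
        if name in self._values:
            return self._values[name]
        if name not in self._cache:
            func, deps = self._formulas[name]
            self._cache[name] = func(*(self.get(dep) for dep in deps))
        return self._cache[name]


graph = Graph()
graph.set_value("price", 10)
graph.set_value("quantity", 3)
graph.define("total", lambda price, quantity: price * quantity, ["price", "quantity"])

print(graph.get("total"))  # 30
graph.set_value("price", 12)
print(graph.get("total"))  # 36, recomputed because an input changed
```

Like a spreadsheet, nothing asks `total` to update: changing an input is enough for the next read to see the new value.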
A half-hour to learn Rust. I haven’t tried to write any Rust yet but I occasionally find myself wanting to read it, and I find some of the syntax really difficult to get my head around. This article helped a lot: it provides a quick but thorough introduction to most of Rust’s syntax, with clearly explained snippet examples for each one.
Nov. 6, 2021
AWS IAM definitions in Datasette (via) As part of my ongoing quest to conquer IAM permissions, I built myself a Datasette instance that lets me run queries against all 10,441 permissions across 280 AWS services. It’s deployed by a build script running in GitHub Actions which downloads an 8.9MB JSON file from the Salesforce policy_sentry repository; policy_sentry itself creates that JSON file by running an HTML scraper against the official AWS documentation!
Nov. 7, 2021
Deno Deploy Beta 3 (via) I missed Deno Deploy when it first came out back in June: it’s a really interesting new hosting environment for scripts written in Deno, Node.js creator Ryan Dahl’s re-imagining of Node.js. Deno Deploy runs your code using V8 isolates running in 28 regions worldwide, with a clever BroadcastChannel mechanism (inspired by the browser API of the same name) that allows instances of the server-side code running in different regions to send each other messages. See the “via” link for my annotated version of a demo by Ondřej Žára that got me excited about what it can do.
Nov. 10, 2021
One could never price a thirty year mortgage in bitcoin because its volatility makes it completely unpredictable and no sensible bank could calculate the risk of covering that debt. A world in which Elon Musk can tweet two emojis and your home depreciates 80% in value is a dystopia.
Nov. 13, 2021
Datasette is four years old today. I marked the occasion with a short Twitter thread about the project so far.
Nov. 15, 2021
Weeknotes: git-history, created for a Git scraping workshop
My main project this week was a 90 minute workshop I delivered about Git scraping at Coda.Br 2021, a Brazilian data journalism conference, on Friday. This inspired the creation of a brand new tool, git-history, plus smaller improvements to a range of other projects.
[... 1,239 words]
Nov. 18, 2021
Cookiecutter Data Science (via) Some really solid thinking in this documentation for the DrivenData cookiecutter template. They emphasize designing data science projects for repeatability, such that just the src/ and data/ folders can be used to recreate all of the other analysis from scratch. I like the suggestion to give each project a dedicated S3 bucket for keeping immutable copies of the original raw data that might be too large for GitHub.
Many Web3 boosters see themselves as disruptors, but “tokenize all the things” is nothing if not an obedient continuation of “market-ize all the things”, the campaign started in the 1970s, hugely successful, ongoing. I think the World Wide Web was the real rupture — “Where … is the money?”—which Web 2.0 smoothed over and Web3 now attempts to seal totally.
Nov. 22, 2021
Hurl (via) Hurl is “a command line tool that runs HTTP requests defined in a simple plain text format”—written in Rust on top of curl, it lets you run HTTP requests and then execute assertions against the response, defined using JSONPath or XPath for HTML. It can even assert that responses were returned within a specified duration.
Weeknotes: Apache proxies in Docker containers, refactoring Datasette
Updates to six major projects this week, plus finally some concrete progress towards Datasette 1.0.
[... 1,630 words]
Introduction to heredocs in Dockerfiles (via) This is a fantastic upgrade to Dockerfile syntax, enabled by BuildKit and a new frontend for executing the Dockerfile that can be specified with a #syntax= directive. I often like to create a standalone Dockerfile that works without needing other files from a directory, so being able to use <<EOF syntax to populate configuration files from inline blocks of code is really handy.
htmlspecialchars was a very early function. Back when PHP had less than 100 functions and the function hashing mechanism was strlen(). In order to get a nice hash distribution of function names across the various function name lengths names were picked specifically to make them fit into a specific length bucket. This was circa late 1994 when PHP was a tool just for my own personal use and I wasn't too worried about not being able to remember the few function names.
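A quick sketch of what using strlen() as the hashing mechanism implies: every function name lands in the bucket matching its length, so two names of the same length collide. The function names below are illustrative picks from modern PHP, not a claim about which functions existed in 1994, and `strlen_hash` is a hypothetical stand-in for the original mechanism.

```python
from collections import defaultdict
from typing import DefaultDict, List


def strlen_hash(name: str) -> int:
    """Hash a function name by its length, as the quote describes."""
    return len(name)


# Illustrative names only; not a list of PHP's 1994 functions.
names = ["echo", "strlen", "implode", "explode", "htmlspecialchars"]

buckets: DefaultDict[int, List[str]] = defaultdict(list)
for name in names:
    buckets[strlen_hash(name)].append(name)

for length in sorted(buckets):
    print(length, buckets[length])
```

Here `implode` and `explode` share bucket 7 while `htmlspecialchars` has bucket 16 to itself, which shows why a long, unusual name would spread the lookups out.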
Nov. 25, 2021
PHP 8.1 release notes (via) PHP is gaining “Fibers” for lightweight cooperative concurrency—very similar to Python asyncio. Interestingly you don’t need to use separate syntax like “await fn()” to call them—calls to non-blocking functions are visually indistinguishable from calls to blocking functions. Considering how much additional library complexity has emerged in Python world from having these two different colours of functions it’s noteworthy that PHP has chosen to go in a different direction here.
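For comparison, here is a minimal sketch of the two colours of functions on the Python side. The function names are made up for this example. The point is only that the non-blocking version has to be declared with `async def` and called with `await`, while the blocking one does not.

```python
import asyncio
import time
from typing import Tuple


def blocking_fetch() -> str:
    """Ordinary function: callable from anywhere, but blocks the thread."""
    time.sleep(0.01)
    return "blocking result"


async def non_blocking_fetch() -> str:
    """Coroutine function: yields to the event loop while it waits."""
    await asyncio.sleep(0.01)
    return "non-blocking result"


async def main() -> Tuple[str, str]:
    first = blocking_fetch()  # plain call
    second = await non_blocking_fetch()  # needs await, so main() must be async too
    return first, second


results = asyncio.run(main())
print(results)
```

The `await` requirement spreads upwards: anything that calls `non_blocking_fetch()` must itself be `async`, which is where the parallel sync and async versions of Python libraries come from.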