Simon Willison’s Weblog

May 2022

May 2, 2022

sqlite-utils 3.26.1 (via) I released sqlite-utils 3.26.1 with one tiny but exciting feature: I fixed its one dependency that wasn’t published as a pure Python wheel, which means it can now be used with Pyodide: Python compiled to WebAssembly running in your browser!

# 6:43 pm / sqlite-utils, webassembly, python, pyodide

May 3, 2022

Web Scraping via Javascript Runtime Heap Snapshots (via) This is an absolutely brilliant scraping trick. Adrian Cooney figured out a way to use Puppeteer and the Chrome DevTools protocol to take a heap snapshot of all of the JavaScript running on a web page, then recursively crawl through the heap looking for any JavaScript objects that have a specified selection of properties. This allows him to scrape data from arbitrarily complex client-side web applications. He built a JavaScript library and command line tool that implements the pattern.
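
The core of the pattern, recursively walking a graph of objects and keeping the ones that carry a chosen set of properties, can be sketched in a few lines. This is a hedged illustration only: it walks plain nested Python dicts and lists rather than a real Chrome heap snapshot, and `find_objects` and the `page_state` sample data are hypothetical names invented for the example, not part of Adrian Cooney's library.

```python
from typing import Any, Iterator, Optional, Set


def find_objects(
    node: Any, required: Set[str], _seen: Optional[Set[int]] = None
) -> Iterator[dict]:
    """Yield every dict reachable from node that has all the required keys."""
    if _seen is None:
        _seen = set()
    # Track visited containers by id() so cyclic references do not loop forever.
    if id(node) in _seen:
        return
    if isinstance(node, dict):
        _seen.add(id(node))
        if required.issubset(node.keys()):
            yield node
        for value in node.values():
            yield from find_objects(value, required, _seen)
    elif isinstance(node, (list, tuple)):
        _seen.add(id(node))
        for item in node:
            yield from find_objects(item, required, _seen)


# Hypothetical stand-in for the state of a client-side application.
page_state = {
    "route": "/products",
    "store": {
        "items": [
            {"id": 1, "title": "Kettle", "price": 25},
            {"id": 2, "title": "Teapot", "price": 18},
        ],
        "banner": {"title": "Sale"},
    },
}

matches = list(find_objects(page_state, {"id", "title", "price"}))
print([m["title"] for m in matches])  # ['Kettle', 'Teapot']
```

The banner object is skipped because it only has one of the three requested properties, which is what makes the property set work as a selector.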

# 12:51 am / scraping, javascript

Simple declarative schema migration for SQLite (via) This is an interesting, clearly explained approach to the database migration problem. Create a new in-memory database and apply the current schema, then run some code to compare that with the previous schema—which tables are new, and which tables have had columns added. Then apply those changes.

I’d normally be cautious of running something like this because I can think of ways it could go wrong—but SQLite backups are so quick and cheap (just copy the file) that I could see this being a relatively risk-free way to apply migrations.
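
A minimal sketch of that comparison, using only Python's standard `sqlite3` module. It is not the code from the linked article: `table_columns` and `migrate` are hypothetical helpers, and the sketch only handles new tables and added columns (carrying over the declared type but no constraints or defaults), so anything else would need extra work.

```python
import sqlite3


def table_columns(conn):
    """Map each table name to its ordered list of (column name, declared type)."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )
    ]
    return {
        t: [(r[1], r[2]) for r in conn.execute(f'PRAGMA table_info("{t}")')]
        for t in tables
    }


def migrate(db, schema_sql):
    """Add the tables and columns that schema_sql declares but db lacks."""
    # Build the desired schema in a throwaway in-memory database.
    pristine = sqlite3.connect(":memory:")
    pristine.executescript(schema_sql)
    wanted = table_columns(pristine)
    current = table_columns(db)
    changes = []
    for table, columns in wanted.items():
        if table not in current:
            # New table: reuse the CREATE statement SQLite stored for it.
            (create_sql,) = pristine.execute(
                "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
                (table,),
            ).fetchone()
            changes.append(create_sql)
        else:
            existing = {name for name, _ in current[table]}
            for name, col_type in columns:
                if name not in existing:
                    # Simplification: only the declared type is carried over.
                    changes.append(
                        f'ALTER TABLE "{table}" ADD COLUMN "{name}" {col_type}'
                    )
    pristine.close()
    for statement in changes:
        db.execute(statement)
    db.commit()
    return changes


# An existing database that is one column and one table behind the schema.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT)")

schema = """
CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT, body TEXT);
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT);
"""

for statement in migrate(db, schema):
    print(statement)
```

Running `migrate` a second time with the same schema should find nothing to do, which is the property that makes the declarative approach appealing.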

# 6:07 pm / migrations, sqlite

May 4, 2022

Datasette Lite: a server-side Python web application running in a browser

Datasette Lite is a new way to run Datasette: entirely in a browser, taking advantage of the incredible Pyodide project which provides Python compiled to WebAssembly plus a whole suite of useful extras.

[... 4,800 words]

SIARD: Software Independent Archiving of Relational Databases (via) I hadn’t heard of this before but it looks really interesting: the Federal Archives of Switzerland developed a standard for archiving any relational database as a zip file full of XML which “is used in over 50 countries around the globe”.

# 10:40 pm / databases, xml, archives

May 6, 2022

Weeknotes: Datasette Lite, nogil Python, HYTRADBOI

My big project this week was Datasette Lite, a new way to run Datasette directly in a browser, powered by WebAssembly and Pyodide. I also continued my research into running SQL queries in parallel, described last week. Plus I spoke at HYTRADBOI.

[... 1,434 words]

May 13, 2022

sqlite-utils: a nice way to import data into SQLite for analysis (via) Julia Evans on my sqlite-utils Python library and CLI tool.

# 6:17 pm / sqlite-utils, julia-evans, sqlite

May 15, 2022

Why Rust’s postfix await syntax is good (via) C J Silverio explains postfix await in Rust—where you can write a line like this, with the ? causing any errors to be caught and turned into an error return from your function:

let count = fetch_all_animals().await?.filter_for_hedgehogs().len();

# 2:27 pm / async, rust

How Materialize and other databases optimize SQL subqueries. Jamie Brandon offers a survey of the state-of-the-art in optimizing correlated subqueries, across a number of different database engines.

# 8:24 pm / sql

May 16, 2022

Heroku: Core Impact (via) Ex-Heroku engineer Brandur Leach pulls together some of the background information circulating about the Heroku security incident, now more than a month long, and provides some ex-insider commentary on what went right and what went wrong with a platform that left a huge, if somewhat underappreciated, impact on the technology industry at large.

# 4:24 am / brandur-leach, heroku

Weeknotes: Camping, a road trip and two new museums

Natalie and I took a week-long road trip and camping holiday. The plan was to camp on Santa Rosa Island in the California Channel Islands, but the boat to the island was cancelled due to bad weather. We treated ourselves to a Central Californian road trip instead.

[... 872 words]

Supercharging GitHub Actions with Job Summaries (via) GitHub Actions workflows can now generate a rendered Markdown summary of, well, anything that you can think to generate as part of the workflow execution. I particularly like the way this is designed: they provide a filename in a $GITHUB_STEP_SUMMARY environment variable which you can then append data to from each of your steps.
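
A small sketch of what appending to that file could look like from a Python step. The `append_summary` helper is a hypothetical name, and the fallback to a temporary file is an assumption added so that the example also runs outside GitHub Actions.

```python
import os
import tempfile


def append_summary(markdown: str) -> None:
    """Append Markdown to the file named by $GITHUB_STEP_SUMMARY."""
    path = os.environ.get("GITHUB_STEP_SUMMARY")
    if not path:
        # Not running inside GitHub Actions: there is nowhere to write.
        return
    # Open in append mode so earlier output in the same file is preserved.
    with open(path, "a", encoding="utf-8") as summary:
        summary.write(markdown + "\n")


# Outside GitHub Actions, point the variable at a temporary file
# so this sketch still runs locally.
if "GITHUB_STEP_SUMMARY" not in os.environ:
    os.environ["GITHUB_STEP_SUMMARY"] = os.path.join(
        tempfile.mkdtemp(), "summary.md"
    )

append_summary("### Test results")
append_summary("| Suite | Passed |\n| --- | --- |\n| unit | 42 |")
```

Because the interface is just a file path, any language or shell command that can append to a file can contribute to the summary.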

# 11:02 pm / github-actions

May 17, 2022

simonw/datasette-screenshots (via) I started a new GitHub repository to automate taking screenshots of Datasette for marketing purposes, using my shot-scraper browser automation tool.

# 5:56 pm / projects, shot-scraper, github-actions, datasette

May 18, 2022

Comby (via) Describes itself as “Structural search and replace for any language”. Lets you execute search and replace patterns that look a little bit like simplified regular expressions, but with some deep OCaml-powered magic that makes them aware of comment, string and nested parenthesis rules for different languages. This means you can use it to construct scripts that automate common refactoring or code upgrade tasks.

# 5:47 am / refactoring, parsing

May 21, 2022

GOV.UK Guidance: Documenting APIs (via) Characteristically excellent guide from GOV.UK on writing great API documentation. “Task-based guidance helps users complete the most common integration tasks, based on the user needs from your research.”

# 11:31 pm / documentation, gov-uk

May 22, 2022

The balance has shifted away from SPAs (via) “There’s a feeling in the air. A zeitgeist. SPAs are no longer the cool kids they once were 10 years ago.” Nolan Lawson offers some opinions on why the pendulum seems to be swinging back in favour of server-side rendering over rendering every page entirely on the client. He argues that paint holding, back-forward caching and service workers have made the benefits of SPAs over MPAs much less apparent. I’m inclined to agree.

# 2:47 am / frontend, javascript

Paint Holding—reducing the flash of white on same-origin navigations. I missed this when it happened back in 2019: Chrome (and apparently Safari too—not sure about Firefox) implemented a feature where rather than showing a blank screen in between page navigations Chrome “waits briefly before starting to paint, especially if the page is fast enough”. As a result, fast loading multi-page applications become almost indistinguishable from SPAs (single-page apps). It’s a really neat feature, and now that I know how it works I realize that it explains why page navigations have felt a lot snappier to me over the past few years.

# 2:50 am / browsers, chrome

May 23, 2022

Bundling binary tools in Python wheels

I spotted a new (to me) pattern which I think is pretty interesting: projects are bundling compiled binary applications as part of their Python packaging wheels. I think it’s really neat.

[... 903 words]

May 26, 2022

Benjamin “Zags” Zagorsky: Handling Timezones in Python. The talks from PyCon US have started appearing on YouTube. I found this one really useful for shoring up my Python timezone knowledge. It’s a reminder that if your code calls datetime.now(), datetime.utcnow() or date.today(), you have timezone bugs: you’ve been working with ambiguous representations of instants in time that could span a 26 hour interval from UTC-12 to UTC+14. date.today() represents a 24 hour period and hence is prone to timezone surprises as well. My code has a lot of timezone bugs!
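
To make the 26 hour figure concrete, here is a short sketch using only the standard library. It is my own illustration rather than code from the talk, and it uses fixed UTC offsets instead of named time zones so that it does not depend on a time zone database being installed.

```python
from datetime import datetime, timedelta, timezone

# A naive "now" carries no offset, so it does not identify a single instant.
naive = datetime.now()
print(naive.tzinfo)  # None

# An aware "now" records its UTC offset and names exactly one instant.
now_utc = datetime.now(timezone.utc)

# The same instant as seen at the two extreme offsets, UTC-12 and UTC+14.
earliest = now_utc.astimezone(timezone(timedelta(hours=-12)))
latest = now_utc.astimezone(timezone(timedelta(hours=14)))

# Aware datetimes compare by instant, not by wall-clock reading.
print(earliest == latest == now_utc)  # True

# The wall clocks are 26 hours apart ...
print(latest.utcoffset() - earliest.utcoffset())  # 1 day, 2:00:00

# ... so the two places never agree on what "today" is.
print(earliest.date(), latest.date())
```

Passing an explicit `tz` argument to `datetime.now()` is the smallest change that removes the ambiguity.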

# 3:40 am / pycon, timezones, python

upptime (via) “Open-source uptime monitor and status page, powered entirely by GitHub Actions, Issues, and Pages.” This is a very creative (ab)use of GitHub Actions: it runs a scheduled action to check the availability of sites that you specify, records the results in a YAML file (with the commit history tracking them over time) and can automatically open a GitHub issue for you if it detects a new incident.

# 3:53 am / github-actions

Weeknotes: Building Datasette Cloud on Fly Machines, Furo for documentation

Hosting provider Fly released Fly Machines this week. I got an early preview and I’ve been working with it for a few days—it’s a fascinating new piece of technology. I’m using it to get my hosting service for Datasette ready for wider release.

[... 1,005 words]

May 27, 2022

Architecture Notes: Datasette (via) I was interviewed for the first edition of Architecture Notes, a new publication (website and newsletter) about software architecture created by Mahdi Yusuf. We covered a bunch of topics in detail: ASGI, SQLite and asyncio, Baked Data, plugin hook design, Python in WebAssembly, Python in an Electron app and more. Mahdi also turned my scrappy diagrams into beautiful illustrations for the piece.

# 3:20 pm / architecture, datasette

May 30, 2022

Dragonfly: A modern replacement for Redis and Memcached (via) I was initially pretty skeptical of the tagline: does Redis really need a “modern” replacement? But the Background section of the README makes this look like a genuinely interesting project. It re-imagines Redis to have its keyspace partitioned across multiple threads, and uses the VLL lock manager described in a 2014 paper to “compose atomic multi-key operations without using mutexes or spinlocks”. The initial benchmarks show up to a 25x increase in throughput compared to Redis. It’s written in C++.
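
As a rough illustration of what partitioning a keyspace means, here is a toy sketch in which every key maps to exactly one shard. It is not Dragonfly's implementation: `ShardedStore` is a hypothetical class, it runs on a single thread, and it makes no attempt at the VLL-style multi-key coordination the README describes. In a multi-threaded design each shard would be owned by one thread, so single-key operations would need no locking.

```python
import zlib


class ShardedStore:
    """Toy key-value store whose keyspace is split into independent shards."""

    def __init__(self, shard_count: int) -> None:
        # One plain dict per shard; nothing is shared between shards.
        self.shards = [{} for _ in range(shard_count)]

    def shard_for(self, key: str) -> int:
        # crc32 is stable across runs, unlike Python's built-in hash() for str.
        return zlib.crc32(key.encode("utf-8")) % len(self.shards)

    def set(self, key: str, value) -> None:
        self.shards[self.shard_for(key)][key] = value

    def get(self, key: str, default=None):
        return self.shards[self.shard_for(key)].get(key, default)


store = ShardedStore(shard_count=4)
for n in range(10):
    store.set(f"user:{n}", n)

print(store.get("user:7"))  # 7
print([len(shard) for shard in store.shards])  # keys per shard
```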

# 10:02 pm / redis, c-plus-plus

May 31, 2022

Lesser Known Features of ClickHouse (via) I keep hearing positive noises about ClickHouse. I learned about a whole bunch of capabilities from this article—including that ClickHouse can directly query tables that are stored in SQLite or PostgreSQL.

# 7:48 pm / postgresql, clickhouse, sqlite

A Datasette tutorial written by GPT-3

I’ve been playing around with OpenAI’s GPT-3 language model playground for a few months now. It’s a fascinating piece of software. You can sign up here—apparently there’s no longer a waiting list.

[... 1,244 words]

Compiling Black with mypyc (via) Richard Si is a Black contributor who recently obtained a 2x performance boost by compiling Black using the mypyc tool from the mypy project, which uses Python type annotations to generate a compiled C version of the Python logic. He wrote up this fantastic three-part series describing in detail how he achieved this, including plenty of tips on Python profiling and clever optimization tricks.

# 11:24 pm / performance, mypy, python, black
