Simon Willison’s Weblog


Saturday, 11th October 2025

An MVCC-like columnar table on S3 with constant-time deletes (via) S3's support for conditional writes (previously) makes it an interesting, scalable and often inexpensive platform for all kinds of database patterns.

Shayon Mukherjee presents an ingenious design for a Parquet-backed database in S3 which accepts concurrent writes, presents a single atomic view for readers and even supports reliable row deletion despite Parquet requiring a complete file rewrite in order to remove data.

The key to the design is a _latest_manifest JSON file at the top of the bucket, containing an integer version number. Clients use compare-and-swap to increment that version - only one client can succeed at this, so the incremented version they get back is guaranteed unique to them.
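That compare-and-swap maps directly onto S3's conditional PUT support. Here's a minimal sketch of the version-reservation step, assuming a recent boto3 with conditional-write (IfMatch) support; the bucket name and helper function are mine, not from Shayon's post:

```python
import json

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
BUCKET = "my-table-bucket"  # hypothetical bucket name


def reserve_version() -> int:
    """Atomically increment _latest_manifest and return the new version."""
    while True:
        current = s3.get_object(Bucket=BUCKET, Key="_latest_manifest")
        etag = current["ETag"]
        version = json.loads(current["Body"].read())["version"] + 1
        try:
            # This PUT only succeeds if nobody replaced the object since
            # we read it - the compare-and-swap that makes the version
            # number unique to this client.
            s3.put_object(
                Bucket=BUCKET,
                Key="_latest_manifest",
                Body=json.dumps({"version": version}),
                IfMatch=etag,
            )
            return version
        except ClientError as e:
            if e.response["Error"]["Code"] != "PreconditionFailed":
                raise
            # Another writer won the race; re-read and retry.
```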

Having reserved a version number the client can write a unique manifest file for that version - manifest/v00000123.json - with a more complex data structure referencing the current versions of every persisted file, including the one they just uploaded.
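Continuing the sketch: because the reserved version number is unique, writing the per-version manifest needs no conditional logic at all. The manifest fields here are illustrative guesses at the shape, not Shayon's exact format:

```python
def write_manifest(version: int, data_files: list, tombstones: list) -> None:
    # e.g. version 123 -> manifest/v00000123.json
    key = f"manifest/v{version:08d}.json"
    s3.put_object(
        Bucket=BUCKET,
        Key=key,
        Body=json.dumps({
            "version": version,
            "data_files": data_files,   # current versions of every Parquet file
            "tombstones": tombstones,   # active tombstone files
        }),
    )
```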

Deleted rows are written to tombstone files as either a list of primary keys or a list of ranges. Clients consult these when executing reads, filtering out deleted rows as part of resolving a query.
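Read-time filtering then looks something like this, assuming tombstone files that hold either explicit primary keys or inclusive [lo, hi] ranges (the field names and the scan_parquet helper are hypothetical):

```python
def load_tombstones(tombstone_keys: list) -> tuple[set, list]:
    """Gather deleted primary keys and key ranges from tombstone files."""
    deleted_keys, deleted_ranges = set(), []
    for key in tombstone_keys:
        doc = json.loads(s3.get_object(Bucket=BUCKET, Key=key)["Body"].read())
        deleted_keys.update(doc.get("keys", []))
        deleted_ranges.extend(doc.get("ranges", []))  # [lo, hi] pairs
    return deleted_keys, deleted_ranges


def is_live(pk, deleted_keys, deleted_ranges) -> bool:
    if pk in deleted_keys:
        return False
    return not any(lo <= pk <= hi for lo, hi in deleted_ranges)

# Applied while resolving a query, e.g.:
# rows = (r for r in scan_parquet(files) if is_live(r["id"], keys, ranges))
```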

The pricing estimates are especially noteworthy:

For a workload ingesting 6 TB/day with 2 TB of deletes and 50K queries/day:

  • PUT requests: ~380K/day (≈4 req/s) = $1.88/day
  • GET requests: highly variable, depends on partitioning effectiveness
    • Best case (good time-based partitioning): ~100K-200K/day = $0.04-$0.08/day
    • Worst case (poor partitioning, scanning many files): ~2M/day = $0.80/day

~$3/day for ingesting 6TB of data is pretty fantastic!
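Those request numbers line up with S3's standard us-east-1 pricing to within rounding, which is easy to sanity-check (list prices at the time of writing; check the current pricing page):

```python
put_cost = 380_000 / 1_000 * 0.005       # $0.005 per 1K PUTs -> ~$1.90/day
get_worst = 2_000_000 / 1_000 * 0.0004   # $0.0004 per 1K GETs -> $0.80/day
print(put_cost + get_worst)              # ~$2.70/day - call it ~$3 with headroom
```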

Watch out for storage costs though - each new TB of data at $0.023/GB/month adds $23.55 to the ongoing monthly bill.

# 3:47 am / aws, databases, s3, parquet

I'm beginning to suspect that a key skill in working effectively with coding agents is developing an intuition for when you don't need to closely review every line of code they produce. This feels deeply uncomfortable!

# 12:31 pm / vibe-coding, coding-agents, ai-assisted-programming, generative-ai, ai, llms

Vibing a Non-Trivial Ghostty Feature (via) Mitchell Hashimoto provides a comprehensive answer to the frequent demand for a detailed description of shipping a non-trivial production feature to an existing project using AI assistance. In this case it's a slick unobtrusive auto-update UI for his Ghostty terminal emulator, written in Swift.

Mitchell shares full transcripts of the 16 coding sessions he carried out using Amp Code across 2 days and around 8 hours of computer time, at a token cost of $15.98.

Amp has the nicest shared transcript feature of any of the coding agent tools, as seen in this example. I'd love to see Claude Code and Codex CLI and Gemini CLI and friends imitate this.

There are plenty of useful tips in here. I like this note about the importance of a cleanup step:

The cleanup step is really important. To clean up effectively you have to have a pretty good understanding of the code, so this forces me to not blindly accept AI-written code. Subsequently, better organized and documented code helps future agentic sessions perform better.

I sometimes tongue-in-cheek refer to this as the "anti-slop session".

And this on how sometimes you can write manual code in a way that puts the agent on the right track:

I spent some time manually restructuring the view model. This involved switching to a tagged union rather than the struct with a bunch of optionals. I renamed some types, moved stuff around.

I knew from experience that this small bit of manual work in the middle would set the agents up for success in future sessions for both the frontend and backend. After completing it, I continued with a marathon set of cleanup sessions.
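For anyone who hasn't met that refactor before, here's its general shape sketched in Python rather than Mitchell's Swift (all names are illustrative, not Ghostty's): a struct full of optionals lets invalid field combinations exist, while a tagged union makes each state carry exactly its own data.

```python
from dataclasses import dataclass

# Before: one struct, many optionals - every consumer has to guess
# which combination of fields is meaningful.
@dataclass
class UpdateState:
    checking: bool = False
    progress: float | None = None
    error: str | None = None

# After: a tagged union - each state is its own type with its own data.
@dataclass
class Idle: ...

@dataclass
class Checking: ...

@dataclass
class Downloading:
    progress: float

@dataclass
class Failed:
    error: str

UpdateStatus = Idle | Checking | Downloading | Failed

def describe(status: UpdateStatus) -> str:
    match status:
        case Downloading(progress=p):
            return f"Downloading: {p:.0%}"
        case Failed(error=e):
            return f"Update failed: {e}"
        case Checking():
            return "Checking for updates..."
        case _:
            return "Up to date"
```

The refactoring prompt below leans directly on those typed cases.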

Here's one of those refactoring prompts:

Turn each @macos/Sources/Features/Update/UpdatePopoverView.swift case into a dedicated fileprivate Swift view that takes the typed value as its parameter so that we can remove the guards.

Mitchell advises ending every session with a prompt like this one, asking the agent about any obvious omissions:

Are there any other improvements you can see to be made with the @macos/Sources/Features/Update feature? Don't write any code. Consult the oracle. Consider parts of the code that can also get more unit tests added.

("Consult the oracle" is an Amp-specific pattern for running a task through a more expensive, more capable model.)

Is this all worthwhile? Mitchell thinks so:

Many people on the internet argue whether AI enables you to work faster or not. In this case, I think I shipped this faster than I would have if I had done it all myself, in particular because iterating on minor SwiftUI styling is so tedious and time consuming for me personally and AI does it so well.

I think the faster/slower argument for me personally is missing the thing I like the most: the AI can work for me while I step away to do other things.

Here's the resulting PR, which touches 21 files.

# 4:33 pm / ai, generative-ai, llms, ai-assisted-programming, mitchell-hashimoto, coding-agents

Claude Code includes the ability to run sub-agents, where a separate agent loop with a fresh token context is dispatched to achieve a goal and report back when it's done. I wrote a bit about how these work in June when I traced Claude Code's activity by intercepting its API calls.

I recently learned that you can deliberately trigger Claude Code to use sub-agents just by telling it to do so!

Here's an example. I ran this using Claude Code against a fresh checkout of simonw/datasette (commit ec38ad37689f3c14d307770977a17aed5efc5cb9), with no additional custom instructions:

Write markdown documentation for the context passed to each of the templates in this project. Use sub-agents.

Here's a snippet of the output:

⏺ I'll help you write markdown documentation for the context passed to each template in this project. Let me start by exploring the codebase to understand the template structure and then use sub-agents to document them comprehensively.

⏺ Search(pattern: "**/*.html")
  ⎿  Found 47 files (ctrl+o to expand)

⏺ Search(pattern: "render.*template|render_template")
  ⎿  Found 16 files (ctrl+o to expand)

[...]

⏺ Now let me launch sub-agents to analyze different template contexts in parallel:

⏺ Task(Document index template context)
  ⎿  Done (12 tool uses · 86.4k tokens · 1m 13s)

⏺ Task(Document database template context)
  ⎿  Done (15 tool uses · 55.4k tokens · 1m 41s)

⏺ Task(Document table template context)
  ⎿  Done (14 tool uses · 85.2k tokens · 1m 44s)

⏺ Task(Document query/row template contexts)
  ⎿  Done (17 tool uses · 78.7k tokens · 1m 55s)

⏺ Task(Document special template contexts)
  ⎿  Done (26 tool uses · 116.1k tokens · 1m 50s)

It churned away for a couple of minutes (those sub-agents execute in parallel, another benefit of this pattern) and made me this template_contexts.md markdown file. It's very comprehensive.

# 9:29 pm / sub-agents, coding-agents, ai-assisted-programming, claude-code, generative-ai, ai, llms
