Simon Willison’s Weblog

Subscribe

Friday, 27th February 2026

Release dclient 0.5a2 — A client CLI utility for Datasette instances
Research Unicode Explorer — Binary Search Over HTTP — By leveraging HTTP Range requests and fixed-width binary records, Unicode Explorer demonstrates efficient binary search for Unicode data directly from a static file with zero backend or dependencies. The client fetches only one 256-byte record per step, using signposts from `meta.json` to optimize initial narrowing, then performs real-time network-driven binary search, visualized in an interactive log.

Unicode Explorer using binary search over fetch() HTTP range requests. Here's a little prototype I built this morning from my phone as an experiment in HTTP range requests, and a general example of using LLMs to satisfy curiosity.

I've been collecting HTTP range tricks for a while now, and I decided it would be fun to build something with them myself that used binary search against a large file to do something useful.

So I brainstormed with Claude. The challenge was coming up with a use case for binary search where the data could be naturally sorted in a way that would benefit from binary search.

One of Claude's suggestions was looking up information about unicode codepoints, which means searching through many MBs of metadata.

I had Claude write me a spec to feed to Claude Code - visible here - then kicked off an asynchronous research project with Claude Code for web against my simonw/research repo to turn that into working code.

Here's the resulting report and code. One interesting thing I learned is that Range request tricks aren't compatible with HTTP compression because they mess with the byte offset calculations. I added 'Accept-Encoding': 'identity' to the fetch() calls but this isn't actually necessary because Cloudflare and other CDNs automatically skip compression if a content-range header is present.

I deployed the result to my tools.simonwillison.net site, after first tweaking it to query the data via range requests against a CORS-enabled 76.6MB file in an S3 bucket fronted by Cloudflare.

The demo is fun to play with - type in a single character like ø or a hexadecimal codepoint indicator like 1F99C and it will binary search its way through the large file and show you the steps it takes along the way:

Animated demo of a web tool called Unicode Explore. I enter the ampersand character and hit Search. A box below shows a sequence of HTTP binary search requests made, finding in 17 steps with 3,864 bytes transferred and telling me that ampersand is U+0026 in Punctuation other, Basic Latin

# 5:50 pm / algorithms, http, research, tools, unicode, ai, generative-ai, llms, ai-assisted-programming, vibe-coding, http-range-requests

Free Claude Max for (large project) open source maintainers (via) Anthropic are now offering their $200/month Claude Max 20x plan for free to open source maintainers... for six months... and you have to meet the following criteria:

  • Maintainers: You're a primary maintainer or core team member of a public repo with 5,000+ GitHub stars or 1M+ monthly NPM downloads. You've made commits, releases, or PR reviews within the last 3 months.
  • Don't quite fit the criteria If you maintain something the ecosystem quietly depends on, apply anyway and tell us about it.

Also in the small print: "Applications are reviewed on a rolling basis. We accept up to 10,000 contributors".

# 6:08 pm / open-source, ai, generative-ai, llms, anthropic, claude

Research Rust Word Cloud CLI — Leveraging Rust’s performance and safety, this CLI tool generates PNG word clouds directly from text input using a custom spiral layout algorithm and efficient grid-based collision detection. It supports flexible options for image size, font scaling, color schemes, and background colors, with all core features—such as stopword filtering, spatial indexing, and layout—implemented from scratch without any external word cloud library.

An AI agent coding skeptic tries AI agent coding, in excessive detail. Another in the genre of "OK, coding agents got good in November" posts, this one is by Max Woolf and is very much worth your time. He describes a sequence of coding agent projects, each more ambitious than the last - starting with simple YouTube metadata scrapers and eventually evolving to this:

It would be arrogant to port Python's scikit-learn — the gold standard of data science and machine learning libraries — to Rust with all the features that implies.

But that's unironically a good idea so I decided to try and do it anyways. With the use of agents, I am now developing rustlearn (extreme placeholder name), a Rust crate that implements not only the fast implementations of the standard machine learning algorithms such as logistic regression and k-means clustering, but also includes the fast implementations of the algorithms above: the same three step pipeline I describe above still works even with the more simple algorithms to beat scikit-learn's implementations.

Max also captures the frustration of trying to explain how good the models have got to an existing skeptical audience:

The real annoying thing about Opus 4.6/Codex 5.3 is that it’s impossible to publicly say “Opus 4.5 (and the models that came after it) are an order of magnitude better than coding LLMs released just months before it” without sounding like an AI hype booster clickbaiting, but it’s the counterintuitive truth to my personal frustration. I have been trying to break this damn model by giving it complex tasks that would take me months to do by myself despite my coding pedigree but Opus and Codex keep doing them correctly.

A throwaway remark in this post inspired me to ask Claude Code to build a Rust word cloud CLI tool, which it happily did.

# 8:43 pm / python, ai, rust, max-woolf, generative-ai, llms, ai-assisted-programming, coding-agents, agentic-engineering, november-2025-inflection

Please, please, please stop using passkeys for encrypting user data (via) Because users lose their passkeys all the time, and may not understand that their data has been irreversibly encrypted using them and can no longer be recovered.

Tim Cappalli:

To the wider identity industry: please stop promoting and using passkeys to encrypt user data. I’m begging you. Let them be great, phishing-resistant authentication credentials.

# 10:49 pm / security, usability, passkeys

Thursday, 26th February 2026