Simon Willison’s Weblog

Subscribe

March 2024

149 posts: 8 entries, 74 links, 12 quotes, 55 beats

March 7, 2024

Release datasette-saved-queries 0.2.2 — Datasette plugin that lets users save and execute queries
Release datasette-extract 0.1a0 — Import unstructured data (text and images) into structured tables
Release llm-claude-3 0.2 — LLM plugin for interacting with the Claude 3 family of models
Release datasette-extract 0.1a1 — Import unstructured data (text and images) into structured tables
Release datasette-extract 0.1a2 — Import unstructured data (text and images) into structured tables

March 8, 2024

American Community Survey Data via FTP. I got talking to some people from the US Census at NICAR today and asked them if there was a way to download their data in bulk (in addition to their various APIs)... and there was!

I had heard of the American Community Survey but I hadn’t realized that it’s gathered on a yearly basis, as a 5% sample compared to the full every-ten-years census. It’s only been running for ten years, and there’s around a year long lead time on the survey becoming available.

# 12:25 am / census, data-journalism, surveys, nicar

Inflection-2.5: meet the world’s best personal AI (via) I’ve not been paying much attention to Inflection’s Pi since it released last year, but yesterday they released a new version that they claim is competitive with GPT-4.

“Inflection-2.5 approaches GPT-4’s performance, but used only 40% of the amount of compute for training.”

(I wasn’t aware that the compute used to train GPT-4 was public knowledge.)

If this holds true, that means that the GPT-4 barrier has been well and truly smashed: we now have Claude 3 Opus, Gemini 1.5, Mistral Large and Inflection-2.5 in the same class as GPT-4, up from zero contenders just a month ago.

# 12:51 am / ai, generative-ai, gpt-4, llms, llm-release

Eloquent JavaScript, 4th edition (2024) (via) Marijn Haverbeke is the creator of both the CodeMirror JavaScript code editor library (used by Datasette and many other projects) and the ProseMirror rich-text editor. Eloquent JavaScript is his Creative Commons licensed book on JavaScript, first released in 2007 and now in its 4th edition.

I’ve only dipped into it myself but it has an excellent reputation.

# 4:07 am / javascript

Become a Wikipedian in 30 minutes (via) A characteristically informative and thoughtful guide to getting started with Wikipedia editing by Molly White - video accompanied by a full transcript.

I found the explanation of Reliable Sources particularly helpful, including why Wikipedia prefers secondary to primary sources.

The way we determine reliability is typically based on the reputation for editorial oversight, and for factchecking and corrections. For example, if you have a reference book that is published by a reputable publisher that has an editorial board and that has edited the book for accuracy, if you know of a newspaper that has, again, an editorial team that is reviewing articles and issuing corrections if there are any errors, those are probably reliable sources.

# 9:47 am / wikipedia, molly-white

You can now train a 70b language model at home (via) Jeremy Howard and team: “Today, we’re releasing Answer.AI’s first project: a fully open source system that, for the first time, can efficiently train a 70b large language model on a regular desktop computer with two or more standard gaming GPUs (RTX 3090 or 4090).”

This is about fine-tuning an existing model, not necessarily training one from scratch.

There are two tricks at play here. The first is QLoRA, which can be used to train quantized models despite the reduced precision usually preventing gradient descent from working correctly.

QLoRA can bring the memory requirements for a 70b model down to 35GB, but gaming GPUs aren’t quite that big. The second trick is Meta’s Fully Sharded Data Parallel or FSDP library, which can shard a model across GPUs. Two consumer 24GB GPUs can then handle the 70b training run.

# 10:47 am / ai, generative-ai, llms, jeremy-howard, gpus

Release dclient 0.4 — A client CLI utility for Datasette instances

The GPT-4 barrier has finally been broken

Four weeks ago, GPT-4 remained the undisputed champion: consistently at the top of every key benchmark, but more importantly the clear winner in terms of “vibes”. Almost everyone investing serious time exploring LLMs agreed that it was the most capable default model for the majority of tasks—and had been for more than a year.

[... 717 words]

March 9, 2024

Coroutines and web components (via) I like using generators in Python but I rarely knowingly use them in JavaScript—I’m probably most exposed to them by Observable, which uses then extensively under the hood as a mostly hidden implementation detail.

Laurent Renard here shows some absolutely ingenious tricks with them as a way of building stateful Web Components.

# 3:38 am / javascript, observable, web-components

In every group I speak to, from business executives to scientists, including a group of very accomplished people in Silicon Valley last night, much less than 20% of the crowd has even tried a GPT-4 class model.

Less than 5% has spent the required 10 hours to know how they tick.

Ethan Mollick

# 3:55 am / ai, generative-ai, gpt-4, llms, ethan-mollick

Release datasette-enrichments-quickjs 0.1a1 — Enrich data with a custom JavaScript function

March 10, 2024

datasette/studio. I’m trying a new way to make Datasette available for small personal data manipulation projects, using GitHub Codespaces.

This repository is designed to be opened directly in Codespaces—detailed instructions in the README.

When the container starts it installs the datasette-studio family of plugins—including CSV upload, some enrichments and a few other useful feature—then starts the server running and provides a big green button to click to access the server via GitHub’s port forwarding mechanism.

# 3:03 am / projects, datasette, github-codespaces

S3 is files, but not a filesystem (via) Cal Paterson helps some concepts click into place for me: S3 imitates a file system but has a number of critical missing features, the most important of which is the lack of partial updates. Any time you want to modify even a few bytes in a file you have to upload and overwrite the entire thing. Almost every database system is dependent on partial updates to function, which is why there are so few databases that can use S3 directly as a backend storage mechanism.

# 11:47 am / aws, databases, s3

March 11, 2024

NICAR 2024 Tipsheets & Audio. The NICAR data journalism conference was outstanding this year: ~1100 attendees, and every slot on the schedule had at least 2 sessions that I wanted to attend (and usually a lot more).

If you’re interested in the intersection of data analysis and journalism it really should be a permanent fixture on your calendar, it’s fantastic.

Here’s the official collection of handouts (NICAR calls them tipsheets) and audio recordings from this year’s event.

# 1:14 am / conferences, data-journalism, nicar

March 12, 2024

Speedometer 3.0: The Best Way Yet to Measure Browser Performance. The new browser performance testing suite, released as a collaboration between Blink, Gecko, and WebKit. It’s fun to run this in your browser and watch it rattle through 580 tests written using a wide variety of modern JavaScript frameworks and visualization libraries.

# 4:26 am / benchmarks, javascript, web-performance

gh-116167: Allow disabling the GIL with PYTHON_GIL=0 or -X gil=0. Merged into python:main 14 hours ago. Looks like the first phase of Sam Gross’s phenomenal effort to provide a GIL free Python (here via an explicit opt-in) will ship in Python 3.13.

# 5:40 am / gil, python

Astro DB. A new scale-to-zero hosted SQLite offering, described as “A fully-managed SQL database designed exclusively for Astro”. It’s built on top of LibSQL, the SQLite fork maintained by the Turso database team.

Astro DB encourages defining your tables with TypeScript, and querying them via the Drizzle ORM.

Running Astro locally uses a local SQLite database. Deployed to Astro Cloud switches to their DB product, where the free tier currently includes 1GB of storage, one billion row reads per month and one million row writes per month.

Astro itself is a “web framework for content-driven websites”—so hosted SQLite is a bit of an unexpected product from them, though it does broadly fit the ecosystem they are building.

This approach reminds me of how Deno K/V works—another local SQLite storage solution that offers a proprietary cloud hosted option for deployment.

# 6:02 pm / javascript, sqlite, typescript, deno

March 13, 2024

Release datasette 1.0a13 — An open source multi-tool for exploring and publishing data

The Bing Cache thinks GPT-4.5 is coming. I was able to replicate this myself earlier today: searching Bing (or apparently Duck Duck Go) for “openai announces gpt-4.5 turbo” would return a link to a 404 page at openai.com/blog/gpt-4-5-turbo with a search result page snippet that announced 256,000 tokens and knowledge cut-off of June 2024

I thought the knowledge cut-off must have been a hallucination, but someone got a screenshot of it showing up in the search engine snippet which would suggest that it was real text that got captured in a cache somehow.

I guess this means we might see GPT 4.5 in June then? I have trouble believing that OpenAI would release a model in June with a June knowledge cut-off, given how much time they usually spend red-teaming their models before release.

Or maybe it was one of those glitches like when a newspaper accidentally publishes a pre-written obituary for someone who hasn’t died yet—OpenAI may have had a draft post describing a model that doesn’t exist yet and it accidentally got exposed to search crawlers.

# 2:29 am / bing, ai, openai, llms, hallucinations

TIL Generating URLs to a Gmail compose window — I wanted to send out a small batch of follow-up emails for workshop attendees today, and I realized that since I have their emails in a database table I might be able to semi-automate the process.

pywebview 5 (via) pywebview is a library for building desktop (and now Android) applications using Python, based on the idea of displaying windows that use the system default browser to display an interface to the user—styled such that the fact they run on HTML, CSS and JavaScript is mostly hidden from the end-user.

It’s a bit like a much simpler version of Electron. Unlike Electron it doesn’t bundle a full browser engine (Electron bundles Chromium), which reduces the size of the dependency a lot but does mean that cross-browser differences (quite rare these days) do come back into play.

I tried out their getting started example and it’s very pleasant to use—import webview, create a window and then start the application loop running to display it.

You can register JavaScript functions that call back to Python, and you can execute JavaScript in a window from your Python code.

# 2:15 pm / javascript, python, electron

The talk track I've been using is that LLMs are easy to take to market, but hard to keep in the market long-term. All the hard stuff comes when you move past the demo and get exposure to real users.

And that's where you find that all the nice little things you got neatly working fall apart. And you need to prompt differently, do different retrieval, consider fine-tuning, redesign interaction, etc. People will treat this stuff differently from "normal" products, creating unique challenges.

Phillip Carter

# 3:02 pm / ai, generative-ai, llms

Release datasette-extract 0.1a3 — Import unstructured data (text and images) into structured tables

Berkeley Function-Calling Leaderboard. The team behind Berkeley’s Gorilla OpenFunctions model—an Apache 2 licensed LLM trained to provide OpenAI-style structured JSON functions—also maintain a leaderboard of different function-calling models. Their own Gorilla model is the only non-proprietary model in the top ten.

# 5:26 pm / ai, generative-ai, llms

Release llm-claude-3 0.3 — LLM plugin for interacting with the Claude 3 family of models

llm-claude-3 0.3. Anthropic released Claude 3 Haiku today, their least expensive model: $0.25/million tokens of input, $1.25/million of output (GPT-3.5 Turbo is $0.50/$1.50). Unlike GPT-3.5 Haiku also supports image inputs.

I just released a minor update to my llm-claude-3 LLM plugin adding support for the new model.

# 9:18 pm / projects, ai, generative-ai, llms, llm, anthropic, claude, llm-release