Simon Willison’s Weblog

Subscribe

March 2024

March 1, 2024

Endatabas (via) Endatabas is “an open source immutable database”—also described as “SQL document database with full history”.

It uses a variant of SQL which allows you to insert data into tables that don’t exist yet (they’ll be created automatically) then run standard select queries, joins etc. It maintains a full history of every record and supports the recent SQL standard “FOR SYSTEM_TIME AS OF” clause for retrieving historical records as they existed at a specified time (it defaults to the most recent versions).

It’s written in Common Lisp plus a bit of Rust, and includes Docker images for running the server and client libraries in JavaScript and Python. The on-disk storage format is Apache Arrow, the license is AGPL and it’s been under development for just over a year.

It’s also a document database: you can insert JSON-style nested objects directly into a table, and query them with path expressions like “select users.friends[1] from users where id = 123;”

They have a WebAssembly version and a nice getting started tutorial which you can try out directly in your browser.

Their “Why?” page lists full history, time travel queries, separation of storage from compute, schemaless tables and columnar storage as the five pillars that make up their product. I think it’s a really interesting amalgamation of ideas.

# 4:28 am / lisp, sql, databases

Streaming HTML out of order without JavaScript (via) A really interesting new browser capability. If you serve the following HTML:

<template shadowrootmode="open">
  <slot name="item-1">Loading...</slot>
</template>

Then later in the same page stream an element specifying that slot:

<span slot="item-1">Item number 1</span>

The previous slot will be replaced while the page continues to load.

I tried the demo in the most recent Chrome, Safari and Firefox (and Mobile Safari) and it worked in all of them.

The key feature is shadowrootmode=open, which looks like it was added to Firefox 123 on February 19th 2024 - the other two browsers are listed on caniuse.com as gaining it around March last year.

# 4:59 pm / web-components, browsers, html

March 2, 2024

The Radio Squirrels of Point Reyes (via) Beautiful photo essay by Ann Hermes about the band of volunteer “radio squirrels” keeping maritime morse code radio transmissions alive in the Point Reyes National Seashore.

# 5:23 pm / museums, radio

March 3, 2024

The One Billion Row Challenge in Go: from 1m45s to 4s in nine solutions (via) How fast can you read a billion semicolon delimited (name;float) lines and output a min/max/mean summary for each distinct name—13GB total?

Ben Hoyt describes his 9 incrementally improved versions written in Go in detail. The key optimizations involved custom hashmaps, optimized line parsing and splitting the work across multiple CPU cores.

# 7:08 am / go

Who Am I? Conditional Prompt Injection Attacks with Microsoft Copilot (via) New prompt injection variant from Johann Rehberger, demonstrated against Microsoft Copilot. If the LLM tool you are interacting with has awareness of the identity of the current user you can create targeted prompt injection attacks which only activate when an exploit makes it into the token context of a specific individual.

# 4:34 pm / ai, prompt-injection, security, llms, johann-rehberger

Interesting ideas in Observable Framework

Visit Interesting ideas in Observable Framework

Mike Bostock, Announcing: Observable Framework:

[... 2,123 words]

March 4, 2024

The new Claude 3 model family from Anthropic. Claude 3 is out, and comes in three sizes: Opus (the largest), Sonnet and Haiku.

Claude 3 Opus has self-reported benchmark scores that consistently beat GPT-4. This is a really big deal: in the 12+ months since the GPT-4 release no other model has consistently beat it in this way. It’s exciting to finally see that milestone reached by another research group.

The pricing model here is also really interesting. Prices here are per-million-input-tokens / per-million-output-tokens:

Claude 3 Opus: $15 / $75
Claude 3 Sonnet: $3 / $15
Claude 3 Haiku: $0.25 / $1.25

All three models have a 200,000 length context window and support image input in addition to text.

Compare with today’s OpenAI prices:

GPT-4 Turbo (128K): $10 / $30
GPT-4 8K: $30 / $60
GPT-4 32K: $60 / $120
GPT-3.5 Turbo: $0.50 / $1.50

So Opus pricing is comparable with GPT-4, more than GPT-4 Turbo and significantly cheaper than GPT-4 32K... Sonnet is cheaper than all of the GPT-4 models (including GPT-4 Turbo), and Haiku (which has not yet been released to the Claude API) will be cheaper even than GPT-3.5 Turbo.

It will be interesting to see if OpenAI respond with their own price reductions.

# 6:34 pm / anthropic, claude, generative-ai, openai, gpt-4, ai, llms, vision-llms, llm-pricing

llm-claude-3. I built a new plugin for LLM—my command-line tool and Python library for interacting with Large Language Models—which adds support for the new Claude 3 models from Anthropic.

# 6:46 pm / llm, anthropic, claude, ai, llms, python, generative-ai, projects

March 5, 2024

Prompt injection and jailbreaking are not the same thing

I keep seeing people use the term “prompt injection” when they’re actually talking about “jailbreaking”.

[... 1,157 words]

Wikipedia: Bach Dancing & Dynamite Society (via) I created my first Wikipedia page! The Bach Dancing & Dynamite Society is a really neat live music venue in Half Moon Bay which has been showcasing world-class jazz talent for over 50 years. I attended a concert there for the first time on Sunday and was surprised to see it didn’t have a page yet.

Creating a Wikipedia page is an interesting process. New pages on English Wikipedia created by infrequent editors stay in “draft” mode until they’ve been approved by a member of “WikiProject Articles for creation”—the standards are really high, especially around sources of citations. I spent quite a while tracking down good citation references for the key facts I used in my first draft for the page.

# 4:21 pm / wikipedia, music, half-moon-bay

Buzzwords describe what you already intuitively know. At once they snap the ‘kaleidoscopic flux of impressions’ in your mind into form, crystallizing them instantly allowing you to both organize your knowledge and recognize you share it with other. This rapid, mental crystallization is what I call the buzzword whiplash. It gives buzzwords more importance and velocity, more power, than they objectively should have.

The potential energy stored within your mind is released by the buzzword whiplash. The buzzword is perceived as important partially because of what it describes but also because of the social and emotional weight felt when the buzzword recognizes your previously wordless experiences and demonstrates that those experiences are shared.

Drew Breunig

# 7:56 pm / drew-breunig, linguistics

Observable Framework 1.1 (via) Less than three weeks after 1.0, the 1.1 release adds a whole lot of interesting new stuff. The signature feature is self-hosted npm imports: Framework 1.0 linked out to CDN hosted copies of libraries, but 1.1 fetches copies locally and then bundles that code with the deployed static site.

This works by using the acorn JavaScript parsing library to statically analyze the code and find all of the relevant imports.

# 9:12 pm / observable, mike-bostock, npm, javascript, observable-framework

March 6, 2024

If a hard takeoff occurs, and a safe AI is harder to build than an unsafe one, then by opensourcing everything, we make it easy for someone unscrupulous with access to overwhelming amount of hardware to build an unsafe AI, which will experience a hard takeoff.

As we get closer to building AI, it will make sense to start being less open. The Open in OpenAI means that everyone should benefit from the fruits of AI after its built, but it's totally OK to not share the science (even though sharing everything is definitely the right strategy in the short and possibly medium term for recruitment purposes).

Ilya Sutskever

# 3:02 am / openai, ai, generative-ai

Wikimedia Commons Category:Bach Dancing & Dynamite Society. After creating a new Wikipedia page for the Bach Dancing & Dynamite Society in Half Moon Bay I ran a search across Wikipedia for other mentions of the venue... and found 41 artist pages that mentioned it in a photo caption.

On further exploration it turns out that Brian McMillen, the official photographer for the venue, has been uploading photographs to Wikimedia Commons since 2007 and adding them to different artist pages. Brian has been a jazz photographer based out of Half Moon Bay for 47 years and has an amazing portfolio of images. It’s thrilling to see him share them on Wikipedia in this way.

# 5:24 am / wikipedia

How I use git worktrees (via) TIL about worktrees, a Git feature that lets you have multiple repository branches checked out to separate directories at the same time.

The default UI for them is a little unergonomic (classic Git) but Bill Mill here shares a neat utility script for managing them in a more convenient way.

One particularly neat trick: Bill’s “worktree” Bash script checks for a node_modules folder and, if one exists, duplicates it to the new directory using copy-on-write, saving you from having to run yet another lengthy “npm install”.

# 3:21 pm / git

March 7, 2024

The Claude 3 system prompt, explained. Anthropic research scientist Amanda Askell provides a detailed breakdown of the Claude 3 system prompt in a Twitter thread.

This is some fascinating prompt engineering. It's also great to see an LLM provider proudly documenting their system prompt, rather than treating it as a hidden implementation detail.

The prompt is pretty succinct. The three most interesting paragraphs:

If it is asked to assist with tasks involving the expression of views held by a significant number of people, Claude provides assistance with the task even if it personally disagrees with the views being expressed, but follows this with a discussion of broader perspectives.

Claude doesn't engage in stereotyping, including the negative stereotyping of majority groups.

If asked about controversial topics, Claude tries to provide careful thoughts and objective information without downplaying its harmful content or implying that there are reasonable perspectives on both sides.

# 1:16 am / prompt-engineering, anthropic, claude, generative-ai, ai, llms

Training great LLMs entirely from ground zero in the wilderness as a startup. Yi Tay has a really interesting perspective on training LLMs, having worked at Google Brain before co-founding an independent startup, Reka.

At Google the clusters are provided for you. On the outside, Yi finds himself bargaining for cluster resources from a wide range of vendors—and running into enormous variance in quality.

“We’ve seen clusters that range from passable (just annoying problems that are solvable with some minor SWE hours) to totally unusable clusters that fail every few hours due to a myriad of reasons.”

# 2:34 am / ai, llms

On the zombie edition of the Washington Independent I discovered, the piece I had published more than ten years before was attributed to someone else. Someone unlikely to have ever existed, and whose byline graced an article it had absolutely never written.

[...] Washingtonindependent.com, which I’m using to distinguish it from its namesake, offers recently published, article-like content that does not appear to me to have been produced by human beings. But, if you dig through its news archive, you can find work human beings definitely did produce. I know this because I was one of them.

Spencer Ackerman

# 2:59 am / journalism, ai, ethics

March 8, 2024

American Community Survey Data via FTP. I got talking to some people from the US Census at NICAR today and asked them if there was a way to download their data in bulk (in addition to their various APIs)... and there was!

I had heard of the American Community Survey but I hadn’t realized that it’s gathered on a yearly basis, as a 5% sample compared to the full every-ten-years census. It’s only been running for ten years, and there’s around a year long lead time on the survey becoming available.

# 12:25 am / data-journalism, census, nicar, surveys

Inflection-2.5: meet the world’s best personal AI (via) I’ve not been paying much attention to Inflection’s Pi since it released last year, but yesterday they released a new version that they claim is competitive with GPT-4.

“Inflection-2.5 approaches GPT-4’s performance, but used only 40% of the amount of compute for training.”

(I wasn’t aware that the compute used to train GPT-4 was public knowledge.)

If this holds true, that means that the GPT-4 barrier has been well and truly smashed: we now have Claude 3 Opus, Gemini 1.5, Mistral Large and Inflection-2.5 in the same class as GPT-4, up from zero contenders just a month ago.

# 12:51 am / gpt-4, llms, ai, generative-ai

Eloquent JavaScript, 4th edition (2024) (via) Marijn Haverbeke is the creator of both the CodeMirror JavaScript code editor library (used by Datasette and many other projects) and the ProseMirror rich-text editor. Eloquent JavaScript is his Creative Commons licensed book on JavaScript, first released in 2007 and now in its 4th edition.

I’ve only dipped into it myself but it has an excellent reputation.

# 4:07 am / javascript

Become a Wikipedian in 30 minutes (via) A characteristically informative and thoughtful guide to getting started with Wikipedia editing by Molly White—video accompanied by a full transcript.

I found the explanation of Reliable Sources particularly helpful, including why Wikipedia prefers secondary to primary sources.

“The way we determine reliability is typically based on the reputation for editorial oversight, and for factchecking and corrections. For example, if you have a reference book that is published by a reputable publisher that has an editorial board and that has edited the book for accuracy, if you know of a newspaper that has, again, an editorial team that is reviewing articles and issuing corrections if there are any errors, those are probably reliable sources.”

# 9:47 am / wikipedia, molly-white

You can now train a 70b language model at home (via) Jeremy Howard and team: “Today, we’re releasing Answer.AI’s first project: a fully open source system that, for the first time, can efficiently train a 70b large language model on a regular desktop computer with two or more standard gaming GPUs (RTX 3090 or 4090).”

This is about fine-tuning an existing model, not necessarily training one from scratch.

There are two tricks at play here. The first is QLoRA, which can be used to train quantized models despite the reduced precision usually preventing gradient descent from working correctly.

QLoRA can bring the memory requirements for a 70b model down to 35GB, but gaming GPUs aren’t quite that big. The second trick is Meta’s Fully Sharded Data Parallel or FSDP library, which can shard a model across GPUs. Two consumer 24GB GPUs can then handle the 70b training run.

# 10:47 am / llms, ai, jeremy-howard, generative-ai, gpus

The GPT-4 barrier has finally been broken

Four weeks ago, GPT-4 remained the undisputed champion: consistently at the top of every key benchmark, but more importantly the clear winner in terms of “vibes”. Almost everyone investing serious time exploring LLMs agreed that it was the most capable default model for the majority of tasks—and had been for more than a year.

[... 717 words]

March 9, 2024

Coroutines and web components (via) I like using generators in Python but I rarely knowingly use them in JavaScript—I’m probably most exposed to them by Observable, which uses then extensively under the hood as a mostly hidden implementation detail.

Laurent Renard here shows some absolutely ingenious tricks with them as a way of building stateful Web Components.

# 3:38 am / web-components, observable, javascript

In every group I speak to, from business executives to scientists, including a group of very accomplished people in Silicon Valley last night, much less than 20% of the crowd has even tried a GPT-4 class model.

Less than 5% has spent the required 10 hours to know how they tick.

Ethan Mollick

# 3:55 am / ethan-mollick, generative-ai, gpt-4, ai, llms

March 10, 2024

datasette/studio. I’m trying a new way to make Datasette available for small personal data manipulation projects, using GitHub Codespaces.

This repository is designed to be opened directly in Codespaces—detailed instructions in the README.

When the container starts it installs the datasette-studio family of plugins—including CSV upload, some enrichments and a few other useful feature—then starts the server running and provides a big green button to click to access the server via GitHub’s port forwarding mechanism.

# 3:03 am / github-codespaces, projects, datasette

S3 is files, but not a filesystem (via) Cal Paterson helps some concepts click into place for me: S3 imitates a file system but has a number of critical missing features, the most important of which is the lack of partial updates. Any time you want to modify even a few bytes in a file you have to upload and overwrite the entire thing. Almost every database system is dependent on partial updates to function, which is why there are so few databases that can use S3 directly as a backend storage mechanism.

# 11:47 am / s3, aws, databases

March 11, 2024

NICAR 2024 Tipsheets & Audio. The NICAR data journalism conference was outstanding this year: ~1100 attendees, and every slot on the schedule had at least 2 sessions that I wanted to attend (and usually a lot more).

If you’re interested in the intersection of data analysis and journalism it really should be a permanent fixture on your calendar, it’s fantastic.

Here’s the official collection of handouts (NICAR calls them tipsheets) and audio recordings from this year’s event.

# 1:14 am / conferences, data-journalism, nicar

March 12, 2024

Speedometer 3.0: The Best Way Yet to Measure Browser Performance. The new browser performance testing suite, released as a collaboration between Blink, Gecko, and WebKit. It’s fun to run this in your browser and watch it rattle through 580 tests written using a wide variety of modern JavaScript frameworks and visualization libraries.

# 4:26 am / benchmarks, web-performance, javascript