Simon Willison’s Weblog

Friday, 8th March 2024

The GPT-4 barrier has finally been broken

Four weeks ago, GPT-4 remained the undisputed champion: consistently at the top of every key benchmark, but more importantly the clear winner in terms of “vibes”. Almost everyone investing serious time exploring LLMs agreed that it was the most capable default model for the majority of tasks—and had been for more than a year.

[... 697 words]

You can now train a 70b language model at home (via) Jeremy Howard and team: “Today, we’re releasing Answer.AI’s first project: a fully open source system that, for the first time, can efficiently train a 70b large language model on a regular desktop computer with two or more standard gaming GPUs (RTX 3090 or 4090).”

This is about fine-tuning an existing model, not training one from scratch.

There are two tricks at play here. The first is QLoRA, which makes it possible to fine-tune a quantized model: the base weights stay frozen in 4-bit form while training happens through small low-rank adapter layers, sidestepping the problem that reduced precision normally prevents gradient descent from working correctly.

QLoRA can bring the memory requirements for a 70b model down to 35GB (70 billion parameters at 4 bits each), but gaming GPUs top out at 24GB. The second trick is Meta’s Fully Sharded Data Parallel (FSDP) library, which can shard a model across GPUs. Two consumer 24GB GPUs provide 48GB between them, enough to handle the 70b training run. # 10:47 am
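To make the QLoRA half of this concrete, here’s a minimal sketch using the Hugging Face transformers and peft libraries. This is not Answer.AI’s actual code, the choice of base model is illustrative, and the FSDP sharding that makes the two-GPU setup work is omitted:

```python
# A sketch of a QLoRA fine-tuning setup: a quantized, frozen base model
# plus small trainable LoRA adapters. Assumes the transformers, peft and
# bitsandbytes libraries are installed.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Load the base model with its weights quantized to 4-bit NF4.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf",  # illustrative 70b model, not prescribed
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach low-rank adapters; only these small higher-precision matrices
# receive gradients, so the 4-bit base weights never need to be updated.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # a tiny fraction of the 70b total
```

The hard part Answer.AI solved is getting this quantized setup to cooperate with FSDP, which previously didn’t know how to shard quantized weights across GPUs.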

Become a Wikipedian in 30 minutes (via) A characteristically informative and thoughtful guide to getting started with Wikipedia editing by Molly White—video accompanied by a full transcript.

I found the explanation of Reliable Sources particularly helpful, including why Wikipedia prefers secondary to primary sources.

“The way we determine reliability is typically based on the reputation for editorial oversight, and for factchecking and corrections. For example, if you have a reference book that is published by a reputable publisher that has an editorial board and that has edited the book for accuracy, if you know of a newspaper that has, again, an editorial team that is reviewing articles and issuing corrections if there are any errors, those are probably reliable sources.” # 9:47 am

Eloquent JavaScript, 4th edition (2024) (via) Marijn Haverbeke is the creator of both the CodeMirror JavaScript code editor library (used by Datasette and many other projects) and the ProseMirror rich-text editor. Eloquent JavaScript is his Creative Commons licensed book on JavaScript, first released in 2007 and now in its 4th edition.

I’ve only dipped into it myself but it has an excellent reputation. # 4:07 am

Inflection-2.5: meet the world’s best personal AI (via) I’ve not been paying much attention to Inflection’s Pi since its release last year, but yesterday they announced a new version that they claim is competitive with GPT-4.

“Inflection-2.5 approaches GPT-4’s performance, but used only 40% of the amount of compute for training.”

(I wasn’t aware that the compute used to train GPT-4 was public knowledge.)

If this holds true, that means that the GPT-4 barrier has been well and truly smashed: we now have Claude 3 Opus, Gemini 1.5, Mistral Large and Inflection-2.5 in the same class as GPT-4, up from zero contenders just a month ago. # 12:51 am

American Community Survey Data via FTP. I got talking to some people from the US Census at NICAR today and asked them if there was a way to download their data in bulk (in addition to their various APIs)... and there was!

I had heard of the American Community Survey but I hadn’t realized that it’s gathered on a yearly basis, as a 5% sample compared to the full every-ten-years census. It’s only been running for ten years, and there’s around a year-long lead time before survey results become available. # 12:25 am
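For anyone who wants to poke around, the bureau’s public file server at www2.census.gov serves plain directory listings over HTTPS. A quick sketch using only Python’s standard library; the specific ACS subdirectory is my assumption about how the bulk files are organized, so inspect the listing and adjust:

```python
# Fetch a directory listing from the Census Bureau's public file server.
# The www2.census.gov host is real; the ACS path below is an assumption
# about the directory layout, so browse the listing to find actual files.
import urllib.request

url = "https://www2.census.gov/programs-surveys/acs/"
with urllib.request.urlopen(url) as resp:
    listing = resp.read().decode("utf-8", errors="replace")

print(listing[:2000])  # raw HTML listing of the available subdirectories
```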