18th April 2024 - Link Blog
Andrej Karpathy's Llama 3 review. The most interesting coverage I’ve seen so far of Meta’s Llama 3 models (8b and 70b so far, 400b promised later).
Andrej notes that Llama 3 was trained on 15 trillion tokens, up from 2 trillion for Llama 2, and that they used that many even for the smaller 8b model: 75x more than the Chinchilla scaling laws would suggest.
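The 75x figure can be sanity-checked with some quick arithmetic. This sketch works backwards from the two numbers above to the Chinchilla-optimal token budget they imply. It assumes the 8b model has exactly 8 billion parameters, which is a round-number simplification.

```python
# Back-of-envelope check of the "75x beyond Chinchilla" claim.
# Assumption: exactly 8 billion parameters for the 8b model (rounded).
PARAMS = 8e9
TRAINING_TOKENS = 15e12   # 15 trillion tokens, per the post
OVERTRAINING_FACTOR = 75  # "75x more than Chinchilla would suggest"

# Implied Chinchilla-optimal budget: 15T / 75
implied_optimal_tokens = TRAINING_TOKENS / OVERTRAINING_FACTOR
# Implied optimal tokens-per-parameter ratio
implied_tokens_per_param = implied_optimal_tokens / PARAMS
# Tokens-per-parameter ratio actually used for the 8b model
actual_tokens_per_param = TRAINING_TOKENS / PARAMS

print(f"{implied_optimal_tokens / 1e9:.0f}B tokens")   # 200B tokens
print(f"{implied_tokens_per_param:.0f} tokens/param")  # 25 tokens/param
print(f"{actual_tokens_per_param:.0f} tokens/param")   # 1875 tokens/param
```

An implied optimum of around 25 tokens per parameter is in the same ballpark as the roughly 20 tokens per parameter often quoted as the Chinchilla rule of thumb, while the 8b model actually saw about 1,875.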
The tokenizer has also changed: its vocabulary is now about 128,000 tokens, up from 32,000. The larger vocabulary means roughly 15% fewer tokens are needed to represent the same string of text.
The one disappointment is the context length: just 8,192 tokens, 2x that of Llama 2 and 4x that of LLaMA 1, but still pretty small by today’s standards.
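The tokenizer change softens the context limit a little, because each token now covers more text. This sketch assumes the 15% reduction applies uniformly (new token count = 0.85 × old token count), which real text will only approximate. Under that assumption it converts the 8,192-token window into old-tokenizer terms. The function name `old_tokenizer_equivalent` is hypothetical and exists only for illustration.

```python
# Context windows quoted in the post, in tokens.
CONTEXT = {"LLaMA 1": 2048, "Llama 2": 4096, "Llama 3": 8192}

# Assumption: the new tokenizer uses 15% fewer tokens for the same text, uniformly.
TOKEN_REDUCTION = 0.15


def old_tokenizer_equivalent(new_tokens: int, reduction: float = TOKEN_REDUCTION) -> float:
    """Hypothetical helper: convert new-tokenizer tokens to old-tokenizer tokens."""
    return new_tokens / (1 - reduction)


# Raw window ratios quoted in the post
print(CONTEXT["Llama 3"] / CONTEXT["Llama 2"])  # 2.0
print(CONTEXT["Llama 3"] / CONTEXT["LLaMA 1"])  # 4.0
# Roughly how many old-tokenizer tokens of text fit in the new window
print(round(old_tokenizer_equivalent(CONTEXT["Llama 3"])))  # 9638
```

So under this assumption the window holds about as much text as a 9,600-token window would have with the old tokenizer: a modest gain that doesn't change the conclusion that 8,192 is small.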
If early indications hold, the 400b model could be the first genuinely GPT-4 class openly licensed model. We’ll have to wait and see.