18th April 2024 - Link Blog
Andrej Karpathy's Llama 3 review. The most interesting coverage I’ve seen so far of Meta’s Llama 3 models (8b and 70b so far, 400b promised later).
Andrej notes that Llama 3 trained on 15 trillion tokens—up from 2 trillion for Llama 2—and they used that many even for the smaller 8b model, 75x more than the chinchilla scaling laws would suggest.
The tokenizer has also changed—they now use 128,000 tokens, up from 32,000. This results in a 15% drop in the tokens needed to represent a string of text.
The one disappointment is the context length—just 8,192, 2x that of Llama 2 and 4x LLaMA 1 but still pretty small by today’s standards.
If early indications hold, the 400b model could be the first genuinely GPT-4 class openly licensed model. We’ll have to wait and see.
Recent articles
- Claude Opus 4.8: "a modest but tangible improvement" - 28th May 2026
- I think Anthropic and OpenAI have found product-market fit - 27th May 2026
- Notes on Pope Leo XIV's encyclical on AI - 25th May 2026