19th December 2025
In 2025, Reinforcement Learning from Verifiable Rewards (RLVR) emerged as the de facto new major stage to add to this mix. By training LLMs against automatically verifiable rewards across a number of environments (e.g. think math/code puzzles), the LLMs spontaneously develop strategies that look like "reasoning" to humans - they learn to break down problem solving into intermediate calculations and they learn a number of problem solving strategies for going back and forth to figure things out (see DeepSeek R1 paper for examples).
— Andrej Karpathy, 2025 LLM Year in Review
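To make "automatically verifiable rewards" concrete, here is a minimal sketch of what such a reward might look like for a math puzzle. It is an illustration only, not the method from the quote or from the DeepSeek R1 paper: the function names `extract_final_answer` and `verifiable_reward` are hypothetical, and the assumption that the final answer is the last integer in the model's output is a simplification.

```python
import re
from typing import Optional


def extract_final_answer(completion: str) -> Optional[int]:
    """Return the last integer that appears in the completion, or None.

    Assumption: the model's final answer is the last integer it writes.
    """
    matches = re.findall(r"-?\d+", completion)
    return int(matches[-1]) if matches else None


def verifiable_reward(completion: str, expected: int) -> float:
    """Score a completion with no human in the loop.

    Returns 1.0 if the extracted answer equals the known solution, else 0.0.
    The intermediate text is never graded, only the checkable final answer.
    """
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0
    return 1.0 if answer == expected else 0.0


if __name__ == "__main__":
    # A completion that works through intermediate calculations.
    worked = "First 12 * 3 = 36, then 36 + 6 = 42"
    print(verifiable_reward(worked, 42))
    print(verifiable_reward("I think it is 41", 42))
```

Because only the final answer is checked, nothing in the reward tells the model how to get there, which is the sense in which intermediate "reasoning" can emerge from training rather than being specified.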