Andrej Karpathy's initial impressions of Grok 3. Andrej has the most detailed analysis I've seen so far of xAI's Grok 3 release from last night. He runs through a bunch of interesting test prompts, and concludes:
As far as a quick vibe check over ~2 hours this morning, Grok 3 + Thinking feels somewhere around the state of the art territory of OpenAI's strongest models (o1-pro, $200/month), and slightly better than DeepSeek-R1 and Gemini 2.0 Flash Thinking. Which is quite incredible considering that the team started from scratch ~1 year ago, this timescale to state of the art territory is unprecedented.
I was delighted to see him include my Generate an SVG of a pelican riding a bicycle benchmark in his tests:

Grok 3 is currently sat at the top of the LLM Chatbot Arena (across all of their categories) so it's doing very well based on vibes for the voters there.
Recent articles
- Introducing Showboat and Rodney, so agents can demo what they’ve built - 10th February 2026
- How StrongDM's AI team build serious software without even looking at the code - 7th February 2026
- Running Pydantic's Monty Rust sandboxed Python subset in WebAssembly - 6th February 2026