What We Learned from a Year of Building with LLMs (Part I). Accumulated wisdom from six experienced LLM hackers. Lots of useful tips in here. On providing examples in a prompt:
If n is too low, the model may over-anchor on those specific examples, hurting its ability to generalize. As a rule of thumb, aim for n ≥ 5. Don’t be afraid to go as high as a few dozen.
There's a recommendation not to overlook keyword search when implementing RAG - tricks with embeddings can miss results for things like names or acronyms, and keyword search is much easier to debug.
Plus this tip on using the LLM-as-judge pattern for implementing automated evals:
Instead of asking the LLM to score a single output on a Likert scale, present it with two options and ask it to select the better one. This tends to lead to more stable results.
Recent articles
- What happens if AI labs train for pelicans riding bicycles? - 13th November 2025
- Reverse engineering Codex CLI to get GPT-5-Codex-Mini to draw me a pelican - 9th November 2025
- Video + notes on upgrading a Datasette plugin for the latest 1.0 alpha, with help from uv and OpenAI Codex CLI - 6th November 2025