Simon Willison’s Weblog

23rd January 2024 - Link Blog

Prompt Lookup Decoding. Really neat LLM optimization trick by Apoorv Saxena, who observed that it’s common for sequences of tokens in LLM input to be repeated in the output: snippets quoted in a summarization, for example.

Apoorv’s code performs a simple search of the prompt for earlier occurrences of the most recently generated tokens, then uses the tokens that followed each match to populate a set of suggested candidate token IDs during LLM generation.
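Here is a minimal sketch of that kind of lookup in plain Python. It is an illustration, not Apoorv’s actual code: the function name `find_candidate_tokens` and its parameters `max_ngram_size` and `num_candidates` are hypothetical, and it assumes the model then checks the returned candidates in a speculative-decoding-style verification step, keeping only the ones it agrees with.

```python
from typing import List, Sequence


def find_candidate_tokens(
    token_ids: Sequence[int],
    max_ngram_size: int = 3,    # hypothetical parameter: longest suffix to match
    num_candidates: int = 10,   # hypothetical parameter: max tokens to suggest
) -> List[int]:
    """Suggest tokens that followed an earlier copy of the current suffix.

    token_ids is the prompt plus everything generated so far.
    Returns an empty list when no earlier match exists.
    """
    n_tokens = len(token_ids)
    # Try the longest suffix first, since a longer match is a stronger hint.
    for ngram_size in range(min(max_ngram_size, n_tokens - 1), 0, -1):
        suffix = list(token_ids[n_tokens - ngram_size:])
        # Scan earlier windows, most recent first, skipping the suffix itself.
        for start in range(n_tokens - ngram_size - 1, -1, -1):
            if list(token_ids[start:start + ngram_size]) == suffix:
                follow = start + ngram_size
                return list(token_ids[follow:follow + num_candidates])
    return []


# The sequence ends in [2, 3], which also appears earlier followed by 4, 5, 6.
tokens = [1, 2, 3, 4, 5, 6, 7, 2, 3]
suggested = find_candidate_tokens(tokens, max_ngram_size=3, num_candidates=3)
```

In this example `suggested` is `[4, 5, 6]`. The lookup is cheap because it is only a scan over token IDs that are already in memory. Any speed-up would come from the model accepting several of the suggested tokens in one forward pass rather than generating them one at a time.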

The result appears to provide around a 2.4x speed-up in generating outputs!

Posted 23rd January 2024 at 2:14 am


This is a link post by Simon Willison, posted on 23rd January 2024.

Tags: ai, generative-ai, llms
