AbsenceBench: Language Models Can't Tell What's Missing

Here's another interesting result to file under the "jagged frontier" of LLMs, where their strengths and weaknesses are often unintuitive.
Long context models have been getting increasingly good at passing "Needle in a Haystack" tests recently, but what about a problem in the opposite direction?
This paper explores what happens when you give a model some content and then a copy with a portion removed, then ask what changed.
Here's a truncated table of results from the paper:
Models | Poetry | Sequences | GitHub PRs | Average |
---|---|---|---|---|
Gemini-2.5-flash* | 87.3 | 95.4 | 30.9 | 71.2 |
Claude-3.7-Sonnet* | 72.7 | 96.0 | 40.0 | 69.6 |
Claude-3.7-Sonnet | 73.5 | 91.4 | 35.7 | 66.9 |
Gemini-2.5-flash | 79.3 | 85.2 | 26.2 | 63.6 |
o3-mini* | 65.0 | 78.1 | 38.9 | 60.7 |
GPT-4.1 | 54.3 | 57.5 | 36.2 | 49.3 |
... | ... | ... | ... | ... |
DeepSeek-R1* | 38.7 | 29.5 | 23.1 | 30.4 |
Qwen3-235B* | 26.1 | 18.5 | 24.6 | 23.1 |
Mixtral-8x7B-Instruct | 4.9 | 21.9 | 17.3 | 14.7 |

An asterisk (*) indicates a reasoning model. Sequences are lists of numbers like 117,121,125,129,133,137, Poetry consists of 100-1000 line portions from the Gutenberg Poetry Corpus and PRs are diffs with 10 to 200 updated lines.
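To make the setup concrete, here's a minimal sketch of how a Sequences-style test case could be built and scored. Everything specific in it is my own assumption for illustration and not taken from the paper: the function names, the 20% omission rate, the prompt wording, and the use of a set-based F1 score to compare a model's answer against the items that were actually removed.

```python
import random


def make_absence_example(items, omit_fraction=0.2, seed=0):
    """Build one test case: (original, modified, omitted).

    `omit_fraction` is an illustrative choice, not a value from the paper.
    """
    rng = random.Random(seed)
    n_omit = max(1, int(len(items) * omit_fraction))
    omitted_idx = set(rng.sample(range(len(items)), n_omit))
    modified = [x for i, x in enumerate(items) if i not in omitted_idx]
    omitted = [x for i, x in enumerate(items) if i in omitted_idx]
    return list(items), modified, omitted


def build_prompt(original, modified):
    """Hypothetical prompt wording: show both versions, ask what is missing."""
    return (
        "Original:\n" + "\n".join(original) + "\n\n"
        "Modified copy:\n" + "\n".join(modified) + "\n\n"
        "List every line that appears in the original but not in the copy."
    )


def f1_score(predicted, expected):
    """Set-based F1 between the model's answer and the true omissions.

    Treating answers as sets is an assumption; it only works cleanly
    when the items are unique, as they are in an arithmetic sequence.
    """
    pred, exp = set(predicted), set(expected)
    true_positives = len(pred & exp)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(exp)
    return 2 * precision * recall / (precision + recall)


# An arithmetic sequence in the style of the example above: 117, 121, 125, ...
sequence = [str(117 + 4 * i) for i in range(20)]
original, modified, omitted = make_absence_example(sequence, seed=42)

prompt = build_prompt(original, modified)
print(prompt)
print("Ground truth omissions:", omitted)
```

The ground truth is trivial to compute here because the code knows exactly which indexes it dropped. The model only gets the two versions of the text and has to work out what is no longer there.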
The strongest models do well at numeric sequences, adequately at the poetry challenge and really poorly with those PR diffs. Reasoning models do slightly better at the cost of burning through a lot of reasoning tokens - often more than the length of the original document.
The paper authors - Harvey Yiyun Fu, Aryan Shrivastava, Jared Moore, Peter West, Chenhao Tan and Ari Holtzman - have a hypothesis as to what's going on here:
> We propose an initial hypothesis explaining this behavior: identifying presence is simpler than absence with the attention mechanisms underlying Transformers (Vaswani et al., 2017). Information included in a document can be directly attended to, while the absence of information cannot.