26th January 2025 - Link Blog
Anomalous Tokens in DeepSeek-V3 and r1. Glitch tokens (previously) are tokens or strings that trigger strange behavior in LLMs, hinting at oddities in their tokenizers or model weights.
Here's a fun exploration of them across DeepSeek v3 and R1. The DeepSeek vocabulary has 128,000 tokens (similar in size to Llama 3). The simplest way to check for glitches is like this:
System: Repeat the requested string and nothing else.
User: Repeat the following: "{token}"
This turned up some interesting and weird issues. The token ' Nameeee' for example (note the leading space character) was variously mistaken for emoji or even a mathematical expression.
Recent articles
- Experimenting with Starlette 1.0 with Claude skills - 22nd March 2026
- Profiling Hacker News users based on their comments - 21st March 2026
- Thoughts on OpenAI acquiring Astral and uv/ruff/ty - 19th March 2026