DeepSeek API introduces Context Caching on Disk (via) I wrote about Claude prompt caching this morning. It turns out Chinese LLM lab DeepSeek released their own implementation of context caching a couple of weeks ago, with the simplest possible pricing model: it's just turned on by default for all users.
When duplicate inputs are detected, the repeated parts are retrieved from the cache, bypassing the need for recomputation. This not only reduces service latency but also significantly cuts down on overall usage costs.
For cache hits, DeepSeek charges $0.014 per million tokens, slashing API costs by up to 90%.
[...]
The disk caching service is now available for all users, requiring no code or interface changes. The cache service runs automatically, and billing is based on actual cache hits.
DeepSeek currently offer two frontier models, DeepSeek-V2 and DeepSeek-Coder-V2, both of which can be run as open weights models or accessed via their API.
Recent articles
- AI assisted search-based research actually works now - 21st April 2025
- Maybe Meta's Llama claims to be open source because of the EU AI act - 19th April 2025
- Image segmentation using Gemini 2.5 - 18th April 2025