30th August 2024
We have recently trained our first 100M token context model: LTM-2-mini. 100M tokens equals ~10 million lines of code or ~750 novels.
For each decoded token, LTM-2-mini's sequence-dimension algorithm is roughly 1000x cheaper than the attention mechanism in Llama 3.1 405B for a 100M token context window.
The contrast in memory requirements is even larger -- running Llama 3.1 405B with a 100M token context requires 638 H100s per user just to store a single 100M token KV cache. In contrast, LTM requires a small fraction of a single H100's HBM per user for the same context.
— Magic AI
Recent articles
- I vibe coded my dream macOS presentation app - 25th February 2026
- Writing about Agentic Engineering Patterns - 23rd February 2026
- Adding TILs, releases, museums, tools and research to my blog - 20th February 2026