The most surprising part of DeepSeek-R1 is that it only takes ~800k samples of 'good' RL reasoning to convert other models into RL-reasoners. Now that DeepSeek-R1 is available people will be able to refine samples out of it to convert any other model into an RL reasoner.
Recent articles
- What's new in the world of LLMs, for NICAR 2025 - 8th March 2025
- I built an automaton called Squadron - 4th March 2025
- Notes from my Accessibility and Gen AI podcast appearance - 2nd March 2025