When I was curating my generated tweets, I estimated 30-40% of the tweets were usable comedically, a massive improvement over the 5-10% usability from my GPT-2 tweet generation. However, a 30-40% success rate implies a 60-70% failure rate, which is patently unsuitable for a production application.
Recent articles
- Designing agentic loops - 30th September 2025
- Claude Sonnet 4.5 is probably the "best coding model in the world" (at least for now) - 29th September 2025
- I think "agent" may finally have a widely enough agreed upon definition to be useful jargon now - 18th September 2025