Lessons after a half-billion GPT tokens (via) Ken Kantzer presents some hard-won experience from shipping real features on top of OpenAI’s models.
They ended up settling on a very basic abstraction over the chat API—mainly to handle automatic retries on a 500 error. No complex wrappers, not even JSON mode or function calling or system prompts.
Rather than counting tokens they estimate tokens as 3 times the length in characters, which works well enough.
One challenge they highlight for structured data extraction (one of my favourite use-cases for LLMs): “GPT really cannot give back more than 10 items. Trying to have it give you back 15 items? Maybe it does it 15% of the time.”
(Several commenters on Hacker News report success in getting more items back by using numbered keys or sequence IDs in the returned JSON to help the model keep count.)
Recent articles
- Weeknotes: Llama 3, AI for Data Journalism, llm-evals and datasette-secrets - 23rd April 2024
- Options for accessing Llama 3 from the terminal using LLM - 22nd April 2024
- AI for Data Journalism: demonstrating what we can do with this stuff right now - 17th April 2024
- Three major LLM releases in 24 hours (plus weeknotes) - 10th April 2024
- Building files-to-prompt entirely using Claude 3 Opus - 8th April 2024
- Running OCR against PDFs and images directly in your browser - 30th March 2024