gpt-4-turbo over the API produces (statistically significant) shorter completions when it "thinks" its December vs. when it thinks its May (as determined by the date in the system prompt).
I took the same exact prompt over the API (a code completion task asking to implement a machine learning task without libraries).
I created two system prompts, one that told the API it was May and another that it was December and then compared the distributions.
For the May system prompt, mean = 4298 For the December system prompt, mean = 4086
N = 477 completions in each sample from May and December
t-test p < 2.28e-07
Recent articles
- My review of Claude's new Code Interpreter, released under a very confusing name - 9th September 2025
- Recreating the Apollo AI adoption rate chart with GPT-5, Python and Pyodide - 9th September 2025
- GPT-5 Thinking in ChatGPT (aka Research Goblin) is shockingly good at search - 6th September 2025