Simon Willison’s Weblog

Subscribe

Tuesday, 14th May 2024

Why your voice assistant might be sexist (via) Given OpenAI's demo yesterday of a vocal chat assistant with a flirty, giggly female voice - and the new ability to be interrupted! - it's worth revisiting this piece by Chris Baraniuk from June 2022 about gender dynamics in voice assistants. Includes a link to this example of a synthesized non-binary voice.

# 4:16 pm / ai, openai, chatgpt

How developers are using Gemini 1.5 Pro’s 1 million token context window. I got to be a talking head for a few seconds in an intro video for today's Google I/O keynote, talking about how I used Gemini Pro 1.5 to index my bookshelf (and with a cameo from my squirrel nutcracker). I'm at 1m25s.

(Or at 10m6s in the full video of the keynote)

# 8:27 pm / google, video, google-io, ai, generative-ai, llms, gemini

llm-gemini 0.1a4. A new release of my llm-gemini plugin adding support for the Gemini 1.5 Flash model that was revealed this morning at Google I/O.

I'm excited about this new model because of its low price. Flash is $0.35 per 1 million tokens for prompts up to 128K token and $0.70 per 1 million tokens for longer prompts - up to a million tokens now and potentially two million at some point in the future. That's 1/10th of the price of Gemini Pro 1.5, cheaper than GPT 3.5 ($0.50/million) and only a little more expensive than Claude 3 Haiku ($0.25/million).

# 8:32 pm / projects, google-io, ai, generative-ai, llms, llm, gemini

Context caching for Google Gemini (via) Another new Gemini feature announced today. Long context models enable answering questions against large chunks of text, but the price of those long prompts can be prohibitive - $3.50/million for Gemini Pro 1.5 up to 128,000 tokens and $7/million beyond that.

Context caching offers a price optimization, where the long prefix prompt can be reused between requests, halving the cost per prompt but at an additional cost of $4.50 / 1 million tokens per hour to keep that context cache warm.

Given that hourly extra charge this isn't a default optimization for all cases, but certain high traffic applications might be able to save quite a bit on their longer prompt systems.

It will be interesting to see if other vendors such as OpenAI and Anthropic offer a similar optimization in the future.

Update 14th August 2024: Anthropic's Claude now has its own version of prompt caching.

# 8:42 pm / google, ai, prompt-engineering, generative-ai, llms, gemini, llm-pricing