Simon Willison’s Weblog


599 posts tagged “llm”

LLM is my command-line tool for running prompts against Large Language Models.

2025

Introducing OpenAI o3 and o4-mini. OpenAI are really emphasizing tool use with these:

For the first time, our reasoning models can agentically use and combine every tool within ChatGPT—this includes searching the web, analyzing uploaded files and other data with Python, reasoning deeply about visual inputs, and even generating images. Critically, these models are trained to reason about when and how to use tools to produce detailed and thoughtful answers in the right output formats, typically in under a minute, to solve more complex problems.

I released llm-openai-plugin 0.3 adding support for the two new models:

llm install -U llm-openai-plugin
llm -m openai/o3 "say hi in five languages"
llm -m openai/o4-mini "say hi in five languages"

Here are the pelicans riding bicycles (prompt: Generate an SVG of a pelican riding a bicycle).

o3:

Described by o3: The illustration shows a playful, stylized bicycle whose frame is drawn to resemble a duck.

  • The duck’s rounded body forms the bicycle’s seat area, and a small wing is sketched on its side.
  • Its long neck stretches forward to become the top tube, ending in a simple head with a black eye and an orange, open beak that points ahead like handlebars.
  • Two large black‑outlined wheels with thin, evenly spaced spokes complete the bike, while thin blue lines depict the rest of the frame, pedals, and chain.
  • A dashed grey ground line runs beneath the wheels, giving the impression the duck‑bike is rolling along.

o4-mini:

Described by o4-mini: The image is a very simple, minimalistic line drawing of a bird riding a bicycle, rendered against a plain white background.

Key elements:

  • Two large, identical circles for the bike’s wheels, outlined in thick black.
  • A horizontal bar links the rear wheel to the bird’s body; a slanted bar links the front wheel to the bird.
  • The front wheel has a single straight spoke ending in an arrowhead, suggesting the forward direction of travel.
  • The bird itself is drawn with basic shapes:
      – A small circle for the head, with a single dot for the eye and a short yellow triangle for the beak.
      – An oval for the body.
      – Thin lines for the neck, legs, and the bike’s pedals and handlebars.
  • The bird appears perched on the saddle, its legs extending down to the pedals, and its tiny wings resting near the handlebars.

Here are the full OpenAI model listings: o3 is $10/million input and $40/million for output, with a 75% discount on cached input tokens, 200,000 token context window, 100,000 max output tokens and a May 31st 2024 training cut-off (same as the GPT-4.1 models). It's a bit cheaper than o1 ($15/$60) and a lot cheaper than o1-pro ($150/$600).

o4-mini is priced the same as o3-mini: $1.10/million for input and $4.40/million for output, also with a 75% input caching discount. The size limits and training cut-off are the same as o3.

You can compare these prices with other models using the table on my updated LLM pricing calculator.
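The per-million prices and the 75% cached-input discount quoted above reduce to a small formula. Here is a minimal sketch of it; the `PRICES` table and the `estimate_cost()` helper are names invented for this illustration, not part of any OpenAI or LLM API, and the numbers are just the ones listed above:

```python
# Per-million-token prices in USD, copied from the figures quoted above.
PRICES = {
    "o3": {"input": 10.00, "output": 40.00},
    "o4-mini": {"input": 1.10, "output": 4.40},
}
# A 75% discount means cached input tokens cost 25% of the normal input price.
CACHED_INPUT_DISCOUNT = 0.75


def estimate_cost(model, input_tokens, output_tokens, cached_input_tokens=0):
    """Return the estimated cost in USD for a single request.

    cached_input_tokens is the portion of input_tokens served from the cache.
    """
    if cached_input_tokens > input_tokens:
        raise ValueError("cached_input_tokens cannot exceed input_tokens")
    price = PRICES[model]
    uncached = input_tokens - cached_input_tokens
    cached_rate = price["input"] * (1 - CACHED_INPUT_DISCOUNT)
    total = (
        uncached * price["input"]
        + cached_input_tokens * cached_rate
        + output_tokens * price["output"]
    )
    return total / 1_000_000


print(f"${estimate_cost('o3', 10_000, 2_000):.4f}")
print(f"${estimate_cost('o3', 10_000, 2_000, cached_input_tokens=8_000):.4f}")
print(f"${estimate_cost('o4-mini', 10_000, 2_000):.4f}")
```

With 10,000 input and 2,000 output tokens, o3 comes to about 18 cents uncached, 12 cents if 8,000 of the input tokens are cached, and o4-mini to just under 2 cents.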

A new capability released today is that the OpenAI API can now optionally return reasoning summary text. I've been exploring that in this issue. I believe you have to verify your organization (which may involve a photo ID) in order to use this option. Once you have access, the easiest way to see the new tokens is using curl like this:

curl https://api.openai.com/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $(llm keys get openai)" \
  -d '{
    "model": "o3",
    "input": "why is the sky blue?",
    "reasoning": {"summary": "auto"},
    "stream": true
  }'

This produces a stream of events that includes this new event type:

event: response.reasoning_summary_text.delta
data: {"type": "response.reasoning_summary_text.delta","item_id": "rs_68004320496081918e1e75ddb550d56e0e9a94ce520f0206","output_index": 0,"summary_index": 0,"delta": "**Expl"}
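Each of those events carries a small `delta` fragment, so the full summary has to be reassembled by concatenating them. Here is a rough sketch of that, using only the standard library. The `collect_summary_text()` helper and the sample lines are made up for illustration (the `item_id` is shortened and the second and third events are my own invention in the same shape), and it assumes each event's JSON arrives on a single `data:` line as in the example above:

```python
import json

SUMMARY_DELTA = "response.reasoning_summary_text.delta"


def collect_summary_text(sse_lines):
    """Concatenate reasoning summary deltas from an iterable of SSE lines."""
    parts = []
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip "event:" lines and blank separators
        payload = line[len("data:"):].strip()
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # ignore anything that is not a JSON payload
        if isinstance(event, dict) and event.get("type") == SUMMARY_DELTA:
            parts.append(event.get("delta", ""))
    return "".join(parts)


# Illustrative sample stream, not real API output.
sample = [
    "event: response.reasoning_summary_text.delta",
    'data: {"type": "response.reasoning_summary_text.delta", "item_id": "rs_example", "output_index": 0, "summary_index": 0, "delta": "**Expl"}',
    "",
    "event: response.reasoning_summary_text.delta",
    'data: {"type": "response.reasoning_summary_text.delta", "item_id": "rs_example", "output_index": 0, "summary_index": 0, "delta": "aining the blue sky**"}',
    "",
    "event: response.output_text.delta",
    'data: {"type": "response.output_text.delta", "delta": "The short answer"}',
]
print(collect_summary_text(sample))
```

Events of any other type are skipped, so the same loop can sit alongside whatever handles the regular output text.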

Omit the "stream": true and the response is easier to read and contains this:

{
  "output": [
    {
      "id": "rs_68004edd2150819183789a867a9de671069bc0c439268c95",
      "type": "reasoning",
      "summary": [
        {
          "type": "summary_text",
          "text": "**Explaining the blue sky**\n\nThe user asks a classic question about why the sky is blue. I'll talk about Rayleigh scattering, where shorter wavelengths of light scatter more than longer ones. This explains how we see blue light spread across the sky! I wonder if the user wants a more scientific or simpler everyday explanation. I'll aim for a straightforward response while keeping it engaging and informative. So, let's break it down!"
        }
      ]
    },
    {
      "id": "msg_68004edf9f5c819188a71a2c40fb9265069bc0c439268c95",
      "type": "message",
      "status": "completed",
      "content": [
        {
          "type": "output_text",
          "annotations": [],
          "text": "The short answer ..."
        }
      ]
    }
  ]
}
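Pulling the summary back out of that structure is a matter of walking the `output` list and keeping the `reasoning` items. A minimal sketch, assuming the response has already been parsed into a Python dict shaped like the JSON above; the helper names and the shortened sample values are mine:

```python
def reasoning_summaries(response):
    """Return every summary_text string from the 'reasoning' output items."""
    texts = []
    for item in response.get("output", []):
        if item.get("type") != "reasoning":
            continue
        for part in item.get("summary", []):
            if part.get("type") == "summary_text":
                texts.append(part["text"])
    return texts


def message_text(response):
    """Return the concatenated output_text from the 'message' output items."""
    chunks = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                chunks.append(part["text"])
    return "".join(chunks)


# Shortened stand-in for the JSON shown above (IDs and text abbreviated).
example = {
    "output": [
        {
            "id": "rs_example",
            "type": "reasoning",
            "summary": [
                {"type": "summary_text", "text": "**Explaining the blue sky**"}
            ],
        },
        {
            "id": "msg_example",
            "type": "message",
            "status": "completed",
            "content": [
                {"type": "output_text", "annotations": [], "text": "The short answer ..."}
            ],
        },
    ]
}

print(reasoning_summaries(example))
print(message_text(example))
```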

# 16th April 2025, 5:46 pm / llm, openai, llm-tool-use, llm-pricing, ai, llms, llm-release, generative-ai, llm-reasoning

Release llm-openai-plugin 0.3 — OpenAI plugin for LLM
Release llm-openai-plugin 0.2.1 — OpenAI plugin for LLM

GPT-4.1: Three new million token input models from OpenAI, including their cheapest model yet

OpenAI introduced three new models this morning: GPT-4.1, GPT-4.1 mini and GPT-4.1 nano. These are API-only models right now, not available through the ChatGPT interface (though you can try them out in OpenAI’s API playground). All three models can handle 1,047,576 tokens of input and 32,768 tokens of output, and all three have a May 31, 2024 cut-off date (their previous models were mostly September 2023).

[... 1,124 words]

Release llm-openai-plugin 0.2 — OpenAI plugin for LLM
Release llm-docs 0.2 — LLM plugin for asking questions of LLM's own documentation, and related packages

llm-fragments-rust (via) Inspired by Filippo Valsorda's llm-fragments-go, Francois Garillot created llm-fragments-rust, an LLM fragments plugin that lets you pull documentation for any Rust crate directly into a prompt to LLM.

I really like this example, which uses two fragments to load documentation for two crates at once:

llm -f rust:rand@0.8.5 -f rust:tokio "How do I generate random numbers asynchronously?"

The code uses some neat tricks: it creates a new Rust project in a temporary directory (similar to how llm-fragments-go works), adds the crates and uses cargo doc --no-deps --document-private-items to generate documentation. Then it runs cargo tree --edges features to add dependency information, and cargo metadata --format-version=1 to include additional metadata about the crate.

# 11th April 2025, 5:36 pm / llm, rust, ai-assisted-programming, plugins, generative-ai, ai, llms

Release llm 0.25a0 — Access large language models from the command-line

llm-docsmith (via) Matheus Pedroni released this neat plugin for LLM for adding docstrings to existing Python code. You can run it like this:

llm install llm-docsmith
llm docsmith ./scripts/main.py -o

The -o option previews the changes that will be made - without -o it edits the files directly.

It also accepts a -m claude-3.7-sonnet parameter to use a model other than the default (GPT-4o mini).

The implementation uses the Python libcst "Concrete Syntax Tree" package to manipulate the code, which means there's no chance of it making edits to anything other than the docstrings.

Here's the full system prompt it uses.

One neat trick comes at the end of the system prompt, which says:

You will receive a JSON template. Fill the slots marked with <SLOT> with the appropriate description. Return as JSON.

That template is actually JSON generated from these Pydantic classes:

from typing import Literal

from pydantic import BaseModel


class Argument(BaseModel):
    name: str
    description: str
    annotation: str | None = None
    default: str | None = None

class Return(BaseModel):
    description: str
    annotation: str | None

class Docstring(BaseModel):
    node_type: Literal["class", "function"]
    name: str
    docstring: str
    args: list[Argument] | None = None
    ret: Return | None = None

class Documentation(BaseModel):
    entries: list[Docstring]

The code adds <SLOT> notes to that in various places, so the template included in the prompt ends up looking like this:

{
  "entries": [
    {
      "node_type": "function",
      "name": "create_docstring_node",
      "docstring": "<SLOT>",
      "args": [
        {
          "name": "docstring_text",
          "description": "<SLOT>",
          "annotation": "str",
          "default": null
        },
        {
          "name": "indent",
          "description": "<SLOT>",
          "annotation": "str",
          "default": null
        }
      ],
      "ret": {
        "description": "<SLOT>",
        "annotation": "cst.BaseStatement"
      }
    }
  ]
}
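One thing this design makes easy is checking the model's reply: any `<SLOT>` marker left in the returned JSON means a description was not filled in. The following is a sketch of that check using only the standard library. It is not taken from llm-docsmith; `slot_paths()` and the abbreviated template are hypothetical:

```python
import json

SLOT = "<SLOT>"


def slot_paths(node, path=()):
    """Yield the path (keys and list indexes) to every value equal to <SLOT>."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from slot_paths(value, path + (key,))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from slot_paths(value, path + (index,))
    elif node == SLOT:
        yield path


# Abbreviated version of the template shown above.
template = json.loads("""
{
  "entries": [
    {
      "node_type": "function",
      "name": "create_docstring_node",
      "docstring": "<SLOT>",
      "args": [
        {"name": "indent", "description": "<SLOT>", "annotation": "str", "default": null}
      ],
      "ret": {"description": "<SLOT>", "annotation": "cst.BaseStatement"}
    }
  ]
}
""")

for path in slot_paths(template):
    print(path)
```

Running the same function over the model's response should yield nothing if every slot was filled.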

# 10th April 2025, 6:09 pm / prompt-engineering, llm, python, plugins, generative-ai, ai, pydantic

llm-fragments-go (via) Filippo Valsorda released the first plugin by someone other than me that uses LLM's new register_fragment_loaders() plugin hook I announced the other day.

Install with llm install llm-fragments-go and then:

You can feed the docs of a Go package into LLM using the go: fragment with the package name, optionally followed by a version suffix.

llm -f go:golang.org/x/mod/sumdb/note@v0.23.0 "Write a single file command that generates a key, prints the verifier key, signs an example message, and prints the signed note."

The implementation is just 33 lines of Python and works by running these commands in a temporary directory:

go mod init llm_fragments_go
go get golang.org/x/mod/sumdb/note@v0.23.0
go doc -all golang.org/x/mod/sumdb/note
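Those three commands are straightforward to drive from Python. This is my own rough sketch of the approach described above, not the plugin's actual source; `go_doc_commands()` and `load_go_docs()` are invented names, and `load_go_docs()` assumes the go toolchain is installed and on the PATH:

```python
import subprocess
import tempfile


def go_doc_commands(package_spec):
    """Return the three go commands for a spec such as 'pkg/path@v1.2.3'."""
    # go doc wants the bare package path, without the optional @version suffix.
    package = package_spec.partition("@")[0]
    return [
        ["go", "mod", "init", "llm_fragments_go"],
        ["go", "get", package_spec],
        ["go", "doc", "-all", package],
    ]


def load_go_docs(package_spec):
    """Run the commands in a temporary directory and return the go doc output."""
    with tempfile.TemporaryDirectory() as tmp:
        output = ""
        for command in go_doc_commands(package_spec):
            result = subprocess.run(
                command, cwd=tmp, check=True, capture_output=True, text=True
            )
            output = result.stdout  # the last command is go doc
        return output


for command in go_doc_commands("golang.org/x/mod/sumdb/note@v0.23.0"):
    print(" ".join(command))
```

Keeping the command construction separate from the subprocess calls means the interesting part can be inspected without Go installed.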

# 10th April 2025, 3:19 pm / generative-ai, llm, plugins, go, ai, llms, filippo-valsorda

An LLM Query Understanding Service (via) Doug Turnbull recently wrote about how all search is structured now:

Many times, even a small open source LLM will be able to turn a search query into reasonable structure at relatively low cost.

In this follow-up tutorial he demonstrates Qwen 2-7B running in a GPU-enabled Google Kubernetes Engine container to turn user search queries like "red loveseat" into structured filters like {"item_type": "loveseat", "color": "red"}.

Here's the prompt he uses.

Respond with a single line of JSON:

  {"item_type": "sofa", "material": "wood", "color": "red"}

Omit any other information. Do not include any
other text in your response. Omit a value if the
user did not specify it. For example, if the user
said "red sofa", you would respond with:

  {"item_type": "sofa", "color": "red"}

Here is the search query: blue armchair

Out of curiosity, I tried running his prompt against some other models using LLM:

  • gemini-1.5-flash-8b, the cheapest of the Gemini models, handled it well and cost $0.000011 - or 0.0011 cents.
  • llama3.2:3b worked too - that's a very small 2GB model which I ran using Ollama.
  • deepseek-r1:1.5b - a tiny 1.1GB model, again via Ollama, amusingly failed by interpreting "red loveseat" as {"item_type": "sofa", "material": null, "color": "red"} after thinking very hard about the problem!
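Small models drifting from the requested format is a good reason to validate the response before using it as a filter. Here is a minimal sketch of such a check. `parse_structured_query()` is a hypothetical helper, and treating `item_type`, `material` and `color` as the only allowed keys is my assumption based on the example in the prompt:

```python
import json

# Assumed set of fields, taken from the example JSON in the prompt.
ALLOWED_KEYS = {"item_type", "material", "color"}


def parse_structured_query(response_text):
    """Parse a model response into a filter dict, or raise ValueError."""
    lines = [line for line in response_text.strip().splitlines() if line.strip()]
    if len(lines) != 1:
        raise ValueError("expected a single line of JSON")
    try:
        data = json.loads(lines[0])
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    unknown = set(data) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unexpected keys: {sorted(unknown)}")
    # The prompt asks the model to omit unspecified values, so drop nulls.
    return {key: value for key, value in data.items() if value is not None}


print(parse_structured_query('{"item_type": "loveseat", "color": "red"}'))
print(parse_structured_query('{"item_type": "sofa", "material": null, "color": "red"}'))
```

This tidies up the stray null from the deepseek-r1:1.5b response, but it cannot tell that "sofa" is the wrong item type. Structural validation only catches formatting failures, not misreadings of the query.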

# 9th April 2025, 8:47 pm / prompt-engineering, llm, generative-ai, search, ai, llms, gemini, ollama, qwen, ai-assisted-search, local-llms, ai-in-china

Release llm 0.24.2 — Access large language models from the command-line

Mistral Small 3.1 on Ollama. Mistral Small 3.1 (previously) is now available through Ollama, providing an easy way to run this multi-modal (vision) model on a Mac (and other platforms, though I haven't tried those myself).

I had to upgrade Ollama to the most recent version to get it to work - prior to that I got an Error: unable to load model message. Upgrades can be accessed through the Ollama macOS system tray icon.

I fetched the 15GB model by running:

ollama pull mistral-small3.1

Then used llm-ollama to run prompts through it, including one to describe this image:

llm install llm-ollama
llm -m mistral-small3.1 'describe this image' -a https://static.simonwillison.net/static/2025/Mpaboundrycdfw-1.png

Here's the output. It's good, though not quite as impressive as the description I got from the slightly larger Qwen2.5-VL-32B.

I also tried it on a scanned (private) PDF of hand-written text with very good results, though it did misread one of the hand-written numbers.

# 8th April 2025, 10:07 pm / vision-llms, mistral, llm, ollama, generative-ai, ai, llms, local-llms

Release llm 0.24.1 — Access large language models from the command-line
Release llm-templates-fabric 0.2 — Load LLM templates from Fabric

llm-hacker-news. I built this new plugin to exercise the new register_fragment_loaders() plugin hook I added to LLM 0.24. It's the plugin equivalent of the Bash script I've been using to summarize Hacker News conversations for the past 18 months.

You can use it like this:

llm install llm-hacker-news
llm -f hn:43615912 'summary with illustrative direct quotes'

You can see the output in this issue.

The plugin registers a hn: prefix - combine that with the ID of a Hacker News conversation to pull that conversation into the context.

It uses the Algolia Hacker News API, which returns JSON like this. Rather than feeding the JSON directly to the LLM, it converts it to a hopefully more LLM-friendly format that looks like this example from the plugin's test:

[1] BeakMaster: Fish Spotting Techniques

[1.1] CoastalFlyer: The dive technique works best when hunting in shallow waters.

[1.1.1] PouchBill: Agreed. Have you tried the hover method near the pier?

[1.1.2] WingSpan22: My bill gets too wet with that approach.

[1.1.2.1] CoastalFlyer: Try tilting at a 40° angle like our Australian cousins.

[1.2] BrownFeathers: Anyone spotted those "silver fish" near the rocks?

[1.2.1] GulfGlider: Yes! They're best caught at dawn.
Just remember: swoop > grab > lift

That format was suggested by Claude, which then wrote most of the plugin implementation for me. Here's that Claude transcript.
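The numbering scheme is a natural fit for a short recursive function. This is a sketch of the idea rather than the plugin's actual code. It assumes each item is a dict with `author`, `children` and either `title` (for the story) or `text` (for a comment), which is my reading of the Algolia response shape, and it skips details such as stripping HTML from comment text:

```python
def format_thread(item, prefix="1"):
    """Return one line per item, numbered 1, 1.1, 1.1.1 and so on."""
    body = item.get("title") or item.get("text") or ""
    lines = [f"[{prefix}] {item.get('author')}: {body}"]
    for index, child in enumerate(item.get("children", []), start=1):
        lines.extend(format_thread(child, f"{prefix}.{index}"))
    return lines


# Hand-built sample mirroring the first few lines of the example above.
thread = {
    "author": "BeakMaster",
    "title": "Fish Spotting Techniques",
    "children": [
        {
            "author": "CoastalFlyer",
            "text": "The dive technique works best when hunting in shallow waters.",
            "children": [
                {
                    "author": "PouchBill",
                    "text": "Agreed. Have you tried the hover method near the pier?",
                    "children": [],
                },
                {
                    "author": "WingSpan22",
                    "text": "My bill gets too wet with that approach.",
                    "children": [],
                },
            ],
        },
    ],
}

print("\n\n".join(format_thread(thread)))
```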

# 8th April 2025, 12:11 am / llm, plugins, hacker-news, ai, llms, ai-assisted-programming, generative-ai, projects, anthropic, claude

Release llm-hacker-news 0.1 — LLM plugin for pulling content from Hacker News

Long context support in LLM 0.24 using fragments and template plugins

LLM 0.24 is now available with new features to help take advantage of the increasingly long input context supported by modern LLMs.

[... 1,896 words]

Release llm-fragments-github 0.1 — Load GitHub repository contents as LLM fragments
Release llm-templates-fabric 0.1 — Load LLM templates from Fabric
Release llm-templates-github 0.1 — Research prototype for new register_template_loaders LLM plugin hook
Release llm-docs 0.1 — LLM plugin for asking questions of LLM's own documentation, and related packages
Release llm 0.24 — Access large language models from the command-line
Release llm-docs 0.1a0 — LLM plugin for asking questions of LLM's own documentation, and related packages
Release llm 0.24a1 — Access large language models from the command-line

Initial impressions of Llama 4

Dropping a model release as significant as Llama 4 on a weekend is plain unfair! So far the best place to learn about the new model family is this post on the Meta AI blog. They’ve released two new models today: Llama 4 Maverick is a 400B model (128 experts, 17B active parameters), text and image input with a 1 million token context length. Llama 4 Scout is 109B total parameters (16 experts, 17B active), also multi-modal and with a claimed 10 million token context length—an industry first.

[... 1,468 words]

Gemini 2.5 Pro Preview pricing (via) Google's Gemini 2.5 Pro is currently the top model on LM Arena and, from my own testing, a superb model for OCR, audio transcription and long-context coding.

You can now pay for it!

The new gemini-2.5-pro-preview-03-25 model ID is priced like this:

  • Prompts less than 200,000 tokens: $1.25/million tokens for input, $10/million for output
  • Prompts more than 200,000 tokens (up to the 1,048,576 max): $2.50/million for input, $15/million for output

This is priced at around the same level as Gemini 1.5 Pro ($1.25/$5 for input/output below 128,000 tokens, $2.50/$10 above 128,000 tokens), is cheaper than GPT-4o for shorter prompts ($2.50/$10) and is cheaper than Claude 3.7 Sonnet ($3/$15).

Gemini 2.5 Pro is a reasoning model, and invisible reasoning tokens are included in the output token count. I just tried prompting "hi" and it charged me 2 tokens for input and 623 for output, of which 613 were "thinking" tokens. That still adds up to just 0.6232 cents (less than a cent) using my LLM pricing calculator which I updated to support the new model just now.
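The arithmetic behind that figure is short enough to check by hand. A small sketch, using only the prices listed above; `gemini_25_pro_cost()` is a name I made up, and billing a prompt of exactly 200,000 tokens at the lower rate is an assumption, since the price list only says "less than" and "more than":

```python
def gemini_25_pro_cost(input_tokens, output_tokens):
    """Return the cost in USD; output_tokens includes the thinking tokens."""
    # Assumption: exactly 200,000 prompt tokens falls in the cheaper tier.
    if input_tokens <= 200_000:
        input_price, output_price = 1.25, 10.00
    else:
        input_price, output_price = 2.50, 15.00
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


cost = gemini_25_pro_cost(2, 623)
print(f"{cost * 100:.2f} cents")  # prints "0.62 cents"
```

The 2 input tokens contribute almost nothing; nearly all of the cost is the 623 output tokens at $10/million.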

I released llm-gemini 0.17 this morning adding support for the new model:

llm install -U llm-gemini
llm -m gemini-2.5-pro-preview-03-25 hi

Note that the model continues to be available for free under the previous gemini-2.5-pro-exp-03-25 model ID:

llm -m gemini-2.5-pro-exp-03-25 hi

The free tier is "used to improve our products", the paid tier is not.

Rate limits for the paid model vary by tier: 150/minute and 1,000/day for Tier 1 (billing configured), 1,000/minute and 50,000/day for Tier 2 ($250 total spend) and 2,000/minute and unlimited/day for Tier 3 ($1,000 total spend). Meanwhile the free tier continues to limit you to 5 requests per minute and 25 per day.

Google are retiring the Gemini 2.0 Pro preview entirely in favour of 2.5.

# 4th April 2025, 5:22 pm / gemini, llm, generative-ai, llm-pricing, ai, llms, llm-reasoning, google, chatbot-arena

Release llm-gemini 0.17 — LLM plugin to access Google's Gemini family of models

smartfunc. Vincent D. Warmerdam built this ingenious wrapper around my LLM Python library which lets you build LLM wrapper functions using a decorator and a docstring:

from smartfunc import backend

@backend("gpt-4o")
def generate_summary(text: str):
    """Generate a summary of the following text: {{ text }}"""
    pass

summary = generate_summary(long_text)

It works with LLM plugins so the same pattern should work against Gemini, Claude and hundreds of others, including local models.

It integrates with more recent LLM features too, including async support and schemas, by introspecting the function signature:

from pydantic import BaseModel
from smartfunc import async_backend

class Summary(BaseModel):
    summary: str
    pros: list[str]
    cons: list[str]

@async_backend("gpt-4o-mini")
async def generate_poke_desc(text: str) -> Summary:
    "Describe the following pokemon: {{ text }}"
    pass

pokemon = await generate_poke_desc("pikachu")

Vincent also recorded a 12 minute video walking through the implementation and showing how it uses Pydantic, Python's inspect module and typing.get_type_hints() function.
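To get a feel for the pattern, here is a stripped-down sketch of a docstring-as-prompt decorator built on `inspect` and `typing.get_type_hints()`. It is not smartfunc's implementation: `prompt_function()` and `echo_backend()` are invented for this example, the template handling is a simple regular expression rather than a real template engine, and the "backend" just echoes what it was given instead of calling a model:

```python
import inspect
import re
from typing import get_type_hints


def prompt_function(backend):
    """Decorator factory: backend is any callable taking (prompt, return_type)."""

    def decorator(func):
        template = inspect.getdoc(func) or ""
        signature = inspect.signature(func)
        # The return annotation tells the backend what shape of result is wanted.
        return_type = get_type_hints(func).get("return", str)

        def wrapper(*args, **kwargs):
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()
            # Replace each {{ name }} placeholder with the matching argument.
            prompt = re.sub(
                r"\{\{\s*(\w+)\s*\}\}",
                lambda match: str(bound.arguments[match.group(1)]),
                template,
            )
            return backend(prompt, return_type)

        return wrapper

    return decorator


def echo_backend(prompt, return_type):
    """Stand-in for a real model call: report the prompt and requested type."""
    return f"[{return_type.__name__}] {prompt}"


@prompt_function(echo_backend)
def generate_summary(text: str) -> str:
    """Generate a summary of the following text: {{ text }}"""


print(generate_summary("LLM 0.24 adds fragments"))
```

The function body is never executed. The decorator only reads the signature, the type hints and the docstring, which is why the examples above can get away with `pass`.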

# 3rd April 2025, 2:57 pm / llm, python, generative-ai, ai, llms, vincent-d-warmerdam

Release llm-command-r 0.3.1 — Access the Cohere Command R family of models