Archive for Thursday, 2nd July 2026

Thursday, 2nd July 2026

I saw Geoffrey Litt speak at AIE yesterday, and one framing he used particularly resonated with me:

Understand to participate

Geoffrey was talking about the challenge of collaborating with coding agents as they construct increasingly large and sophisticated changes, and the need to avoid taking on cognitive debt as your understanding drifts from how the code actually works.

His argument is that you need to understand the code to a depth that enables you to participate further with the model:

You can learn what the agent is doing to make sure you can be an active participant in the creative process. [...]

You need a rich set of concepts in your mind to think creatively and fluently about how to move something forward. If you're lacking that fluency, your ability to participate in the project is meaningfully limited.

The AIE talks are all recorded - all 300+ of them! - and should be trickling out over the next three weeks. Geoffrey's is one that I recommend catching on YouTube.

Update 10th July: here's Geoffrey's talk on YouTube.

Geoffrey also published a thread version of his talk on Twitter.

# 5:07 pm / ai, generative-ai, llms, geoffrey-litt, coding-agents, cognitive-debt

Research Using DSPy to evaluate and improve Datasette Agent's SQL system prompts

One of this morning's AIE keynotes covered dspy, which reminded me I've been meaning to see if it could help me improve the system prompt used by Datasette Agent - so I fired off an asynchronous research task in Claude Code for web using Claude Fable 5:

Pip install the latest Datasette alpha and datasette-agent and dspy - then figure out how to use dspy to evaluate and improve the main system prompts used by Datasette Agent for the feature where it can execute read only SQL queries to answer user questions about data.

Fable chose to test using GPT 4.1 mini and nano, and identified several promising looking directions for improvements. I particularly like this one:

The schema listing gives only table names; the "don't call describe_table if you already have the information" advice caused column-name guessing (page_count, o.order_id, first_name) and error-retry loops in baseline traces. Either include column names in the prompt's schema listing or soften that advice.

2nd Jul 2026, 6:25 pm · ai, datasette, generative-ai, llms, evals, dspy, datasette-agent, claude-mythos-fable

Release llm-coding-agent 0.1a0

Another Fable 5 experiment. Now that my LLM library has evolved into more of an agent framework it's time to see what a simple coding agent would look like built on it.

I started a new Python library using my python-lib-template-repository GitHub template repository, then ran these two prompts (here's the Claude Code for web transcript):

Write a spec.md for this project - it will depend on the latest “llm” alpha from PyPI and implement a Claude code style coding agent complete with tools for reading and editing files and executing commands

Then:

Commit the spec, then build it using red/green TDD in a series of sensible commits (each with passing tests and updated docs) - occasionally manually test it using the OpenAI API key in your environment

Here's the spec, the resulting README file, and the sequence of commits.

I've shipped a slop-alpha to PyPI, so you can run the new agent like this:

uvx --prerelease=allow --with llm-coding-agent llm code

It's pretty good for a first attempt! Here's the (Fable-authored) README, which lists recipes like llm code --yolo and llm code --allow "pytest*" --allow "git diff*".

It also presents a Python API based around a CodingAgent(model="gpt-5.5", root="/path", approve=True).run("Fix the failing test in tests/test_parser.py") class which I didn't ask for but I'm delighted to see implemented.

Here's the suite of tools it implemented, listed using uvx ... llm tools:

CodingTools_edit_file(path: str, old_string: str, new_string: str, replace_all: bool = False) -> str

Replace an exact string in a file.

old_string must match the file contents exactly (including whitespace) and must identify a unique location unless replace_all is true. Returns a diff of the change so it can be verified.

CodingTools_execute_command(command: str, timeout: int = 120) -> str

Run a shell command in the session root directory.

Returns combined stdout and stderr followed by an Exit code line. timeout is in seconds (maximum 600); on timeout the whole process tree is killed.

CodingTools_list_files(pattern: str = '**/*', path: str = '.') -> str

List files matching a glob pattern, newest first.

Skips hidden directories, node_modules, __pycache__ and (in a git repository) anything covered by .gitignore. Returns at most 200 paths relative to the searched directory.

CodingTools_read_file(path: str, offset: int = 0, limit: int = 2000) -> str

Read a text file, returning numbered lines like cat -n.

Paths are relative to the session root. Use offset (0-based first line) and limit (max lines) to page through files too large to read in one call.

CodingTools_search_files(pattern: str, path: str = '.', glob: str = None, max_results: int = 100) -> str

Search file contents for a regular expression.

Returns matches as path:line_number:line, capped at max_results. Use glob (e.g. "*.py") to restrict which files are searched.

CodingTools_write_file(path: str, content: str) -> str

Create or overwrite a file with the given content.

Parent directories are created as needed. Prefer edit_file for modifying existing files.

I tried it out by running llm code --yolo and then prompting:

mkdir /tmp/demo and then in that folder create a simple swiftui CLI app for telling the time in ascii art

Here's the transcript, in which GPT-5.5 reasoning notes that "SwiftUI isn't suitable for a true CLI" and then builds an app that outputs this on swift run AsciiTime:

      █    █████         ████     █             █     ███   
     ██    █        █        █   ██      █     ██    █   █  
      █    ████           ███     █             █       █   
      █        █    █        █    █      █      █      █    
     ███   ████          ████    ███           ███   █████

2nd Jul 2026, 7:33 pm · projects, ai, generative-ai, llm, llm-tool-use, coding-agents, claude-code, claude-mythos-fable

← Wednesday, 1st July 2026

Friday, 3rd July 2026 →

M	T	W	T	F	S	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31

Simon Willison’s Weblog

Thursday, 2nd July 2026