Simon Willison’s Weblog


August 2023

102 posts: 7 entries, 28 links, 10 quotes, 57 beats

Aug. 21, 2023

Release llm-openrouter 0.1 — LLM plugin for models hosted by OpenRouter
TIL Calculating the size of a SQLite database file using SQL — I learned this trick today while [browsing the code](https://github.com/tersesystems/blacklite/blob/main/blacklite-core/src/main/resources/com/tersesystems/blacklite/resources.properties) of [Blacklite](https://tersesystems.com/blog/2020/11/26/queryable-logging-with-blacklite/), a neat Java library for writing diagnostic logs to a SQLite database.
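SQLite exposes `PRAGMA page_count` and `PRAGMA page_size` as table-valued functions, so one way to get the size in a single query is to multiply them. The sketch below uses Python's standard library `sqlite3` module. It may not match the TIL's exact query, and the `database_size_bytes()` helper name is my own.

```python
import os
import sqlite3
import tempfile


def database_size_bytes(conn: sqlite3.Connection) -> int:
    """Return page_count * page_size for the connection's main database."""
    # pragma_page_count() and pragma_page_size() are the table-valued forms of
    # the PRAGMA statements, so they can be joined inside an ordinary SELECT.
    (size,) = conn.execute(
        "SELECT page_count * page_size FROM pragma_page_count(), pragma_page_size()"
    ).fetchone()
    return size


# Usage: build a small on-disk database, then compare with the real file size.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.db")
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, message TEXT)")
    conn.executemany(
        "INSERT INTO logs (message) VALUES (?)",
        [("event %d" % i,) for i in range(1000)],
    )
    conn.commit()
    sql_size = database_size_bytes(conn)
    file_size = os.path.getsize(path)
    conn.close()

print(sql_size, file_size)
```

In the default rollback-journal mode the two numbers should match. In WAL mode, uncheckpointed pages live in the separate `-wal` file, so the query result and the main file's size on disk can diverge.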

Queryable Logging with Blacklite (via) Will Sargent describes how he built Blacklite, a Java library for diagnostic logging that writes log events (as zstd compressed JSON objects) to a SQLite database and maintains 5,000 entries in a “live” database while entries beyond that range are cycled out to an archive.db file, which is cycled to archive.timestamp.db when it reaches 500,000 items.

Lots of interesting notes here on using SQLite for high performance logging.

“SQLite databases are also better log files in general. Queries are faster than parsing through flat files, with all the power of SQL. A vacuumed SQLite database is only barely larger than flat file logs. They are as easy to store and transport as flat file logs, but work much better when merging out of order or interleaved data between two logs.”

# 6:13 pm / java, logging, sqlite, zstd

Release datasette-publish-fly 1.3.1 — Datasette plugin for publishing data using Fly

If you visit (often NSFW, beware!) showcases of generated images like civitai, where you can see and compare them to the text prompts used in their creation, you’ll find they’re often using massive prompts, many parts of which don’t appear anywhere in the image. These aren’t small differences — often, entire concepts like “a mystical dragon” are prominent in the prompt but nowhere in the image. These users are playing a gacha game, a picture-making slot machine. They’re writing a prompt with lots of interesting ideas and then pulling the arm of the slot machine until they win… something. A compelling image, but not really the image they were asking for.

Sam Bleckley

# 7:38 pm / stable-diffusion, ai, generative-ai

When many business people talk about “AI” today, they treat it as a continuum with past capabilities of the CNN/RNN/GAN world. In reality it is a step function in new capabilities and products enabled, and marks the dawn of a new era of tech.

It is almost like cars existed, and someone invented an airplane and said “an airplane is just another kind of car - but with wings” - instead of mentioning all the new use cases and impact to travel, logistics, defense, and other areas. The era of aviation would have kicked off, not the “era of even faster cars”.

Elad Gil

# 8:32 pm / llms, ai, generative-ai

Release datasette-ripgrep 0.8.1 — Web interface for searching your code using ripgrep, built as a Datasette plugin
Release datasette-auth-tokens 0.4a1 — Datasette plugin for authenticating access using API tokens

Aug. 22, 2023

TIL Configuring Django SQL Dashboard for Fly PostgreSQL — I have a Fly application that uses their PostgreSQL service. I wanted to run Django SQL Dashboard with a read-only user against that database.
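The TIL covers the Fly-specific details. As a generic illustration only, and not necessarily what the TIL does, these are standard PostgreSQL statements for a login role that can read but not write. The Python helper just produces them for pasting into `psql`. The `read_only_role_sql()` name and the example role, password and database names are hypothetical.

```python
from typing import List


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Quote a string literal (assumes standard_conforming_strings = on)."""
    return "'" + value.replace("'", "''") + "'"


def read_only_role_sql(role: str, password: str, database: str,
                       schema: str = "public") -> List[str]:
    r, db, s = quote_ident(role), quote_ident(database), quote_ident(schema)
    return [
        f"CREATE ROLE {r} WITH LOGIN PASSWORD {quote_literal(password)};",
        f"GRANT CONNECT ON DATABASE {db} TO {r};",
        f"GRANT USAGE ON SCHEMA {s} TO {r};",
        # Covers tables that exist right now...
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {s} TO {r};",
        # ...and tables created later by the role that runs this statement.
        f"ALTER DEFAULT PRIVILEGES IN SCHEMA {s} GRANT SELECT ON TABLES TO {r};",
    ]


for statement in read_only_role_sql("dashboard_readonly", "change-me", "myapp"):
    print(statement)
```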
Release datasette 1.0a4 — An open source multi-tool for exploring and publishing data

Datasette 1.0 alpha series leaks names of databases and tables to unauthenticated users. I found and fixed a security vulnerability in the Datasette 1.0 alpha series, described in this GitHub security advisory.

The vulnerability allowed unauthenticated users to see the names of the databases and tables in an otherwise private Datasette instance—though not the actual table contents.

The fix is now shipped in Datasette 1.0a4.

The vulnerability affected Datasette Cloud as well, but thankfully I was able to analyze the access logs and confirm that no unauthenticated requests had been made against any of the affected endpoints.

# 5:44 pm / releases, security, datasette

TIL Compile and run a new SQLite version with the existing sqlite3 Python library on macOS — I've been trying to figure this out for years. Previous notes include [Using LD_PRELOAD to run any version of SQLite with Python](https://til.simonwillison.net/sqlite/ld-preload) (Linux only), and [Building a specific version of SQLite with pysqlite on macOS/Linux](https://til.simonwillison.net/sqlite/build-specific-sqlite-pysqlite-macos) and [Using pysqlite3 on macOS](https://til.simonwillison.net/sqlite/pysqlite3-on-macos) (both using the `pysqlite3` package).
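Whichever build trick you use, it's worth confirming which SQLite library the `sqlite3` module actually loaded. Here's a minimal check that uses only the standard library:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Version of the SQLite library linked into this process at runtime.
runtime_version = conn.execute("SELECT sqlite_version()").fetchone()[0]
# Compile-time options of that same library, handy for spotting a custom build.
compile_options = [row[0] for row in conn.execute("PRAGMA compile_options")]
conn.close()

print("sqlite3.sqlite_version: ", sqlite3.sqlite_version)
print("SELECT sqlite_version():", runtime_version)
print("compile options:", len(compile_options))
```

If the custom build took effect, both version strings should show the newly compiled version rather than the one bundled with macOS or Python.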

Datasette Cloud and the Datasette 1.0 alphas. I sent out the Datasette Newsletter for the first time in quite a while, with updates on Datasette Cloud, the Datasette 1.0 alphas, a note about the security vulnerability in those alphas and a summary of some of my research into combining LLMs with Datasette.

# 7:56 pm / projects, datasette, datasette-cloud, llms

Aug. 23, 2023

PostgreSQL Lock Conflicts (via) I absolutely love how extremely specific and niche this documentation site is. It details every single lock that PostgreSQL implements, and shows exactly which commands acquire that lock. That’s everything. I can imagine this becoming absurdly useful at extremely infrequent intervals for advanced PostgreSQL work.
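As a tiny taste of what that site catalogues, here's the table-level lock conflict matrix from the official PostgreSQL "Explicit Locking" documentation, encoded in Python. It also lists a handful of commands with the table lock each one acquires. This covers only table-level locks, whereas the linked site goes much further. The `blocks()` helper is my own illustration.

```python
MODES = [
    "ACCESS SHARE", "ROW SHARE", "ROW EXCLUSIVE", "SHARE UPDATE EXCLUSIVE",
    "SHARE", "SHARE ROW EXCLUSIVE", "EXCLUSIVE", "ACCESS EXCLUSIVE",
]

# For each table-level lock mode, the set of modes it conflicts with.
CONFLICTS = {
    "ACCESS SHARE": {"ACCESS EXCLUSIVE"},
    "ROW SHARE": {"EXCLUSIVE", "ACCESS EXCLUSIVE"},
    "ROW EXCLUSIVE": {"SHARE", "SHARE ROW EXCLUSIVE", "EXCLUSIVE", "ACCESS EXCLUSIVE"},
    "SHARE UPDATE EXCLUSIVE": {"SHARE UPDATE EXCLUSIVE", "SHARE", "SHARE ROW EXCLUSIVE",
                               "EXCLUSIVE", "ACCESS EXCLUSIVE"},
    "SHARE": {"ROW EXCLUSIVE", "SHARE UPDATE EXCLUSIVE", "SHARE ROW EXCLUSIVE",
              "EXCLUSIVE", "ACCESS EXCLUSIVE"},
    "SHARE ROW EXCLUSIVE": {"ROW EXCLUSIVE", "SHARE UPDATE EXCLUSIVE", "SHARE",
                            "SHARE ROW EXCLUSIVE", "EXCLUSIVE", "ACCESS EXCLUSIVE"},
    "EXCLUSIVE": set(MODES) - {"ACCESS SHARE"},
    "ACCESS EXCLUSIVE": set(MODES),
}

# A few commands and the table-level lock each one takes on its target table.
COMMAND_LOCKS = {
    "SELECT": "ACCESS SHARE",
    "SELECT ... FOR UPDATE": "ROW SHARE",
    "INSERT": "ROW EXCLUSIVE",
    "UPDATE": "ROW EXCLUSIVE",
    "DELETE": "ROW EXCLUSIVE",
    "VACUUM (without FULL)": "SHARE UPDATE EXCLUSIVE",
    "CREATE INDEX CONCURRENTLY": "SHARE UPDATE EXCLUSIVE",
    "CREATE INDEX": "SHARE",
    "DROP TABLE": "ACCESS EXCLUSIVE",
}


def blocks(running_command: str, new_command: str) -> bool:
    """True if new_command must wait for running_command's lock on the same table."""
    held = COMMAND_LOCKS[running_command]
    wanted = COMMAND_LOCKS[new_command]
    return wanted in CONFLICTS[held]


print(blocks("CREATE INDEX", "INSERT"))               # True
print(blocks("CREATE INDEX CONCURRENTLY", "INSERT"))  # False
print(blocks("VACUUM (without FULL)", "UPDATE"))      # False
```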

# 3:08 am / documentation, postgresql

llm-tracker. Leonard Lin’s constantly updated encyclopedia of all things Large Language Model: lists of models, opinions on which ones are the most useful, details for running Speech-to-Text models, code assistants and much more.

# 4:11 am / leonard-lin, ai, generative-ai, llms

Here's the thing: if nearly all of the time the machine does the right thing, the human "supervisor" who oversees it becomes incapable of spotting its error. The job of "review every machine decision and press the green button if it's correct" inevitably becomes "just press the green button," assuming that the machine is usually right.

Cory Doctorow

# 2:26 pm / cory-doctorow, ai, ethics, ai-ethics

Release datasette-configure-fts 1.1.2 — Datasette plugin for enabling full-text search against selected table columns
Release llm-anyscale-endpoints 0.1 — LLM plugin for models hosted by Anyscale Endpoints
Release datasette-debug-permissions 0.1 — A Datasette plugin that outputs debug information about permission checks

Aug. 24, 2023

Release datasette-debug-permissions 0.2 — A Datasette plugin that outputs debug information about permission checks

And the notion that security updates, for every user in the world, would need the approval of the U.K. Home Office just to make sure the patches weren’t closing vulnerabilities that the government itself is exploiting — it boggles the mind. Even if the U.K. were the only country in the world to pass such a law, it would be madness, but what happens when other countries follow?

John Gruber

# 6:16 am / uklaw, cryptography, uk, john-gruber, law

Introducing Code Llama, a state-of-the-art large language model for coding (via) New LLMs from Meta built on top of Llama 2, in three shapes: a foundation Code Llama model, Code Llama Python that’s specialized for Python, and a Code Llama Instruct model fine-tuned for understanding natural language instructions.

# 5:54 pm / ai, generative-ai, llama, llms, fine-tuning, meta

Release datasette-jellyfish 1.0.2 — Datasette plugin adding SQL functions for fuzzy text matching powered by Jellyfish
Release datasette-jellyfish 2.0 — Datasette plugin adding SQL functions for fuzzy text matching powered by Jellyfish
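The plugin gets its fuzzy matching from the Jellyfish library. The sketch below shows the underlying mechanism using only the standard library, and it is not datasette-jellyfish's actual code. It registers a Levenshtein distance function as a custom SQL function with Python's `sqlite3`. Datasette plugins can register functions like this through the `prepare_connection()` plugin hook. The `levenshtein` SQL name and the sample data are hypothetical.

```python
import sqlite3
from typing import Optional


def levenshtein(a: Optional[str], b: Optional[str]) -> Optional[int]:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    if a is None or b is None:
        return None  # NULL in, NULL out, like built-in SQL functions
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


conn = sqlite3.connect(":memory:")
conn.create_function("levenshtein", 2, levenshtein)
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO people (name) VALUES (?)",
                 [("Simon",), ("Simone",), ("Simeon",), ("Salmon",)])
matches = [row[0] for row in conn.execute(
    "SELECT name FROM people WHERE levenshtein(name, ?) <= 1 ORDER BY id", ("Simon",)
)]
print(matches)  # ['Simon', 'Simone', 'Simeon']
```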

airoboros LMoE. airoboros provides a system for fine-tuning Large Language Models. The latest release adds support for LMoE—LoRA Mixture of Experts. GPT-4 is strongly rumoured to work as a mixture of experts—several (maybe 8?) 220B models each with a different specialty working together to produce the best result. This is the first open source (Apache 2) implementation of that pattern that I’ve seen.
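For anyone unfamiliar with the pattern, here's a toy, dependency-free Python sketch of generic mixture-of-experts routing. A gate scores every expert for a given input, only the top-k experts run, and their outputs are blended using renormalised gate weights. The sketch rests on my own assumptions: a linear gate, and plain Python functions standing in for experts. It is not how airoboros implements LMoE, which works with LoRA adapters, and it says nothing about how GPT-4 might do it.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Sequence[float]], Vector]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: Sequence[float], experts: Sequence[Expert],
                gate_weights: Sequence[Sequence[float]],
                top_k: int = 2) -> Tuple[Vector, List[int]]:
    """Route x to the top_k experts picked by a linear gate and blend their outputs."""
    # Gate: one score per expert, the dot product of x with that expert's gate row.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    probs = softmax(logits)
    # Sparse routing: only the top_k highest-probability experts run at all.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    out = [0.0] * len(x)  # assumes every expert returns a vector the same length as x
    for i in chosen:
        weight = probs[i] / norm  # renormalise over the chosen experts only
        out = [o + weight * y for o, y in zip(out, experts[i](x))]
    return out, chosen


experts = [
    lambda v: [2.0 * a for a in v],  # hypothetical expert 0
    lambda v: [-a for a in v],       # hypothetical expert 1
    lambda v: [a + 1.0 for a in v],  # hypothetical expert 2
]
gate = [[1.0, 1.0], [0.0, 0.0], [-1.0, -1.0]]

out1, chosen1 = moe_forward([1.0, 2.0], experts, gate, top_k=1)
out2, chosen2 = moe_forward([1.0, 2.0], experts, gate, top_k=2)
print(chosen1, out1)  # [0] [2.0, 4.0]
print(chosen2, out2)
```

With `top_k=2`, the first two experts both contribute, weighted roughly 95/5 by the gate.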

# 10:31 pm / opensearch, ai, generative-ai, gpt-4, llms, fine-tuning

Aug. 25, 2023

Release llm-anyscale-endpoints 0.2 — LLM plugin for models hosted by Anyscale Endpoints

Would I forbid the teaching (if that is the word) of my stories to computers? Not even if I could. I might as well be King Canute, forbidding the tide to come in. Or a Luddite trying to stop industrial progress by hammering a steam loom to pieces.

Stephen King

# 6:31 pm / llms, ai, ethics, generative-ai, ai-ethics

Aug. 26, 2023

Understanding Immortal Objects in Python 3.12. Abhinav Upadhyay provides a clear and detailed explanation of immortal objects coming in Python 3.12, which ensure Python no longer updates reference counts for immutable objects such as True, False, None and small integers. The trick (which maintains ABI compatibility) is pretty simple: a reference count value of 4294967295 now means an object is immortal, and the existing Py_INCREF and Py_DECREF macros have been updated to take that into account.
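You can see the change for yourself by counting how many new references show up when the same object is stored a thousand times. This small illustration uses `sys.getrefcount()`, and the `refcount_delta()` helper is my own. The exact refcount reported for immortal objects can differ between Python versions and builds, so the code compares before and after rather than checking for 4294967295.

```python
import sys


def refcount_delta(obj: object, copies: int = 1000) -> int:
    """How much obj's reported refcount grows while `copies` extra references exist."""
    before = sys.getrefcount(obj)
    holders = [obj for _ in range(copies)]  # each list slot holds a new reference
    after = sys.getrefcount(obj)
    del holders
    return after - before


none_delta = refcount_delta(None)        # None is immortal from 3.12 onwards
object_delta = refcount_delta(object())  # an ordinary, mortal object
print(sys.version_info[:2], none_delta, object_delta)
```

On Python 3.12 or later, `None` should show a delta of 0 while the freshly created `object()` shows 1000. On earlier versions both counts grow.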

# 12:08 pm / python

TIL Downloading partial YouTube videos with ffmpeg — I spoke at WordCamp US 2023, and wanted to grab a copy of the video of my talk. I always try to keep my own copies of these, because I've seen conferences take their videos offline in the past.
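The general shape of the ffmpeg trick is to seek before the input and copy the streams without re-encoding. This Python sketch only builds the command line, and it may differ from the exact command in the TIL. It assumes you already have a direct media URL. One way to get one is `yt-dlp -g`, which prints the underlying stream URLs, often one for video and one for audio. The helper name and example URL are hypothetical.

```python
import shlex
from typing import List


def ffmpeg_clip_command(media_url: str, start: str, duration: str,
                        output_path: str) -> List[str]:
    """Build (but don't run) an ffmpeg command that saves one segment of a remote video."""
    return [
        "ffmpeg",
        "-ss", start,     # before -i: seek the input rather than decoding from the start
        "-i", media_url,  # assumed to be a direct media URL, not a YouTube page URL
        "-t", duration,   # how much to keep after the start point
        "-c", "copy",     # copy the streams as-is, no re-encoding
        output_path,
    ]


cmd = ffmpeg_clip_command("https://example.com/talk.mp4", "00:10:00", "00:30:00", "talk.mp4")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

With `-c copy`, ffmpeg can only cut on keyframes, so the clip may not start exactly at the requested time. Re-encoding gives a more precise cut at the cost of speed.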

Aug. 27, 2023

Making Large Language Models work for you


I gave an invited keynote at WordCamp US 2023 in National Harbor, Maryland on Friday.

[... 14,189 words]