Simon Willison’s Weblog

Subscribe

February 2024

111 posts: 4 entries, 62 links, 13 quotes, 32 beats

Feb. 24, 2024

Upside down table trick with CSS (via) I was complaining how hard it is to build a horizontally scrollable table with a scrollbar at the top rather than the bottom and RGBCube on Lobste.rs suggested rotating the container 180 degrees and then the table contents and headers 180 back again... and it totally works! Demo in this CodePen.

# 9 pm / css

Feb. 25, 2024

Release dclient 0.3 — A client CLI utility for Datasette instances

dclient 0.3. dclient is my CLI utility for working with remote Datasette instances—in particular for authenticating with them and then running both read-only SQL queries and inserting data using the new Datasette write JSON API. I just picked up work on the project again after a six month gap—the insert command can now be used to constantly stream data directly to hosted Datasette instances such as Datasette Cloud.

# 8:06 pm / cli, projects, datasette, datasette-cloud

Feb. 26, 2024

Release llm-mistral 0.3 — LLM plugin providing access to Mistral models using the Mistral API

Mistral Large. Mistral Medium only came out two months ago, and now it's followed by Mistral Large. Like Medium, this new model is currently only available via their API. It scores well on benchmarks (though not quite as well as GPT-4) but the really exciting feature is function support, clearly based on OpenAI's own function design.

Functions are now supported via the Mistral API for both Mistral Large and the new Mistral Small, described as follows:

Mistral Small, optimised for latency and cost. Mistral Small outperforms Mixtral 8x7B and has lower latency, which makes it a refined intermediary solution between our open-weight offering and our flagship model.

# 11:23 pm / ai, generative-ai, llms, mistral, llm-tool-use, llm-release

Feb. 27, 2024

TIL Tracking SQLite table history using a JSON audit log — I continue to collect ways of tracking the history of a table of data stored in SQLite - see [sqlite-history](https://simonwillison.net/2023/Apr/15/sqlite-history/) for previous experiments.

Weeknotes: Getting ready for NICAR

Next week is NICAR 2024 in Baltimore—the annual data journalism conference hosted by Investigative Reporters and Editors. I’m running a workshop on Datasette, and I plan to spend most of my time in the hallway track talking to people about Datasette, Datasette Cloud and how the Datasette ecosystem can best help support their work.

[... 1,390 words]

Release datasette-write 0.3 — Datasette plugin providing a UI for executing SQL writes against the database

All you need is Wide Events, not “Metrics, Logs and Traces” (via) I’ve heard great things about Meta’s internal observability platform Scuba, here’s an explanation from ex-Meta engineer Ivan Burmistrov describing the value it provides and comparing it to the widely used OpenTelemetry stack.

# 10:57 pm / facebook, observability

The Zen of Python, Unix, and LLMs with Simon Willison (via) I’m participating in a live online fireside chat with Hugo Bowne-Anderson tomorrow afternoon (3pm Pacific / 6pm Eastern / 11pm GMT) talking about LLMs, Datasette, my open source process, applying the Unix pipes philosophy to LLMs and a whole lot more. It’s free to register.

# 11:11 pm / speaking

Feb. 28, 2024

Testcontainers (via) Not sure how I missed this: Testcontainers is a family of testing libraries (for Python, Go, JavaScript, Ruby, Rust and a bunch more) that make it trivial to spin up a service such as PostgreSQL or Redis in a container for the duration of your tests and then spin it back down again.

The Python example code is delightful:

redis = DockerContainer("redis:5.0.3-alpine").with_exposed_ports(6379)
redis.start()
wait_for_logs(redis, "Ready to accept connections")

I much prefer integration-style tests over unit tests, and I like to make sure any of my projects that depend on PostgreSQL or similar can run their tests against a real running instance. I've invested heavily in spinning up Varnish or Elasticsearch ephemeral instances in the past - Testcontainers look like they could save me a lot of time.

The open source project started in 2015, span off a company called AtomicJar in 2021 and was acquired by Docker in December 2023.

# 2:41 am / redis, testing, docker

For the last few years, Meta has had a team of attorneys dedicated to policing unauthorized forms of scraping and data collection on Meta platforms. The decision not to further pursue these claims seems as close to waving the white flag as you can get against these kinds of companies. But why? [...]

In short, I think Meta cares more about access to large volumes of data and AI than it does about outsiders scraping their public data now. My hunch is that they know that any success in anti-scraping cases can be thrown back at them in their own attempts to build AI training databases and LLMs. And they care more about the latter than the former.

Kieran McCarthy

# 3:15 pm / facebook, scraping, ai, llms, training-data

Release datasette-explain 0.2 — Explain and validate SQL queries as you type them into Datasette
Release datasette-explain 0.2.1 — Explain and validate SQL queries as you type them into Datasette

Feb. 29, 2024

Release datasette-scale-to-zero 0.3 — Quit Datasette if it has not received traffic for a specified time period

The Zen of Python, Unix, and LLMs. Here’s the YouTube recording of my 1.5 hour conversation with Hugo Bowne-Anderson yesterday.

I fed a Whisper transcript to Google Gemini Pro 1.5 and asked it for the themes from our conversation, and it said we talked about “Python’s success and versatility, the rise and potential of LLMs, data sharing and ethics in the age of LLMs, Unix philosophy and its influence on software development and the future of programming and human-computer interaction”.

# 9:04 pm / python, speaking, my-talks, ai, whisper, llms, gemini

Release datasette-scale-to-zero 0.3.1 — Quit Datasette if it has not received traffic for a specified time period

GGUF, the long way around (via) Vicki Boykis dives deep into the GGUF format used by llama.cpp, after starting with a detailed description of how PyTorch models work and how they are traditionally persisted using Python pickle.

Pickle lead to safetensors, a format that avoided the security problems with downloading and running untrusted pickle files.

Llama.cpp introduced GGML, which popularized 16-bit (as opposed to 32-bit) quantization and bundled metadata and tensor data in a single file.

GGUF fixed some design flaws in GGML and is the default format used by Llama.cpp today.

# 9:39 pm / ai, pytorch, generative-ai, llama, llms, vicki-boykis, llama-cpp

Release datasette 1.0a12 — An open source multi-tool for exploring and publishing data
Release datasette-studio 0.1a1 — Datasette pre-configured with useful plugins. Experimental alpha.

Datasette 1.0a12. Another alpha release, this time with a new query_actions() plugin hook, a new design for the table, database and query actions menus, a “does not contain” table filter and a fix for a minor bug with the JavaScript makeColumnActions() plugin mechanism.

# 11:56 pm / projects, datasette