Simon Willison’s Weblog

1,521 posts tagged “datasette”

Datasette is an open source tool for exploring and publishing data.

2024

Release datasette-python 0.1 — Run a Python interpreter in the Datasette virtual environment

UK Parliament election results, now with Datasette. The House of Commons Library maintains a website of UK parliamentary election results data, currently listing 2010 through 2019 and with 2024 results coming soon.

The site itself is a Rails and PostgreSQL app, but I was delighted to learn today that they're also running a Datasette instance with the election results data, linked to from their homepage!

[Screenshot, Mobile Safari on electionresults.parliament.uk: "The data this website uses is available to query as a Datasette endpoint. The database schema is published for reference."]

The raw data is also available as CSV files in their GitHub repository. Here's their Datasette configuration, which includes a copy of their SQLite database.
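Since the data is exposed through Datasette, it can also be queried as JSON. Here is a minimal sketch of building a URL for Datasette's SQL JSON API. The base URL and database name are placeholders, not the real address of the Parliament instance, and `datasette_sql_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def datasette_sql_url(base_url: str, database: str, sql: str, **params: str) -> str:
    """Build a URL that runs a SQL query through Datasette's JSON API.

    base_url and database are placeholders supplied by the caller; this
    helper is illustrative and is not part of Datasette itself.
    """
    # _shape=array asks Datasette for a plain JSON list of row objects.
    # Extra keyword arguments are sent as values for :named parameters.
    query = {"sql": sql, "_shape": "array", **params}
    return f"{base_url.rstrip('/')}/{database}.json?{urlencode(query)}"
```

Calling `datasette_sql_url("https://example.com", "elections", "select 1")` produces a URL you could fetch with any HTTP client, provided the instance allows arbitrary SQL queries.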

# 5th July 2024, 11:36 pm / elections, sqlite, datasette

Weeknotes: a livestream, a surprise keynote and progress on Datasette Cloud billing

My first YouTube livestream with Val Town, a keynote at the AI Engineer World’s Fair and some work integrating Stripe with Datasette Cloud. Plus a bunch of upgrades to my blog.

[... 1,124 words]

Datasette 0.64.8. A very small Datasette release, fixing a minor potential security issue where the name of missing databases or tables was reflected on the 404 page in a way that could allow an attacker to present arbitrary text to a user who followed a link. Not an XSS attack (no code could be executed) but still a potential vector for confusing messages.

# 21st June 2024, 11:48 pm / projects, releases, security, datasette

Release datasette 0.64.8 — An open source multi-tool for exploring and publishing data

Building search-based RAG using Claude, Datasette and Val Town

Retrieval Augmented Generation (RAG) is a technique for adding extra “knowledge” to systems built on LLMs, allowing them to answer questions against custom information not included in their training data. A common way to implement this is to take a question from a user, translate that into a set of search queries, run those against a search engine and then feed the results back into the LLM to generate an answer.
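The flow described above can be sketched in a few lines. This is an illustration under stated assumptions, not the code from the post: `answer_with_rag` is a hypothetical name, and the three callables stand in for whatever LLM and search engine you actually use.

```python
from typing import Callable


def answer_with_rag(
    question: str,
    to_queries: Callable[[str], list[str]],
    search: Callable[[str], list[str]],
    generate: Callable[[str], str],
    max_results: int = 5,
) -> str:
    """Search-based RAG: question -> search queries -> results -> answer.

    to_queries, search and generate are hypothetical stand-ins for an LLM
    call, a search engine and a second LLM call respectively.
    """
    context: list[str] = []
    # Step 1: turn the question into one or more search queries.
    for query in to_queries(question):
        # Step 2: run each query, collecting results without duplicates.
        for doc in search(query):
            if doc not in context:
                context.append(doc)
    # Step 3: feed the top results back to the model along with the question.
    prompt = (
        "Context:\n"
        + "\n---\n".join(context[:max_results])
        + f"\n\nQuestion: {question}"
    )
    return generate(prompt)
```

Passing the pieces in as functions keeps the sketch independent of any particular model or search backend, so each step can be swapped out or tested with simple fakes.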

[... 3,372 words]

Civic Band. Exciting new civic tech project from Philip James: 30 (and counting) Datasette instances serving full-text search enabled collections of OCRd meeting minutes for different civic governments. Includes 20,000 pages for Alameda, 17,000 for Pittsburgh, 3,567 for Baltimore and an enormous 117,000 for Maui County.

Philip includes some notes on how they're doing it. They gather PDF minute notes from anywhere that provides API access to them, then run local Tesseract for OCR (the cost of cloud-based OCR proving prohibitive given the volume of data). The collection is then deployed to a single VPS running multiple instances of Datasette via Caddy, one instance for each of the covered regions.
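To illustrate the full-text search part, here is a minimal sketch that loads already-OCRd page text into a SQLite FTS5 table using Python's standard library. It assumes your SQLite build includes FTS5, the table and column names are invented for the example, and it leaves out the Tesseract and Caddy steps entirely. It is not Philip's actual pipeline.

```python
import sqlite3


def build_index(pages):
    """Create an in-memory FTS5 index from (meeting, page, text) tuples.

    The schema is hypothetical; it assumes SQLite was compiled with FTS5.
    """
    conn = sqlite3.connect(":memory:")
    # Only the OCR text is indexed; meeting and page are stored as-is.
    conn.execute(
        "create virtual table pages using fts5("
        "meeting unindexed, page unindexed, text)"
    )
    conn.executemany(
        "insert into pages (meeting, page, text) values (?, ?, ?)", pages
    )
    return conn


def search(conn, query):
    """Return (meeting, page) for matching pages, best match first."""
    return conn.execute(
        "select meeting, page from pages where pages match ? order by rank",
        (query,),
    ).fetchall()
```

A SQLite file built this way could then be served by Datasette, which is roughly the shape of one instance per region.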

# 19th June 2024, 9:30 pm / data-journalism, ocr, tesseract, datasette

Weeknotes: Datasette Studio and a whole lot of blogging

I’m still spinning back up after my trip back to the UK, so actual time spent building things has been less than I’d like. I presented an hour long workshop on command-line LLM usage, wrote five full blog entries (since my last weeknotes) and I’ve also been leaning more into short-form link blogging—a lot more prominent on this site now since my homepage redesign last week.

[... 736 words]

Release datasette-faiss 0.2.1 — Maintain a FAISS index for specified Datasette tables

Language models on the command-line

I gave a talk about accessing Large Language Models from the command-line last week as part of the six-week online conference Mastering LLMs: A Conference For Developers & Data Scientists. The talk focused on my LLM Python command-line utility and ways you can use it (and its plugins) to explore LLMs and use them for useful tasks.

[... 4,992 words]

Release datasette-cluster-map 0.18.2 — Datasette plugin that shows a map for any data with latitude/longitude columns
Release datasette 0.64.7 — An open source multi-tool for exploring and publishing data

Datasette 0.64.7. A very minor dot-fix release for Datasette stable, addressing this bug where Datasette running against the latest version of SQLite - 3.46.0 - threw an error on canned queries that included :named parameters in their SQL.

The root cause was Datasette using a now invalid clever trick I came up with against the undocumented and unstable opcodes returned by a SQLite EXPLAIN query.

I asked on the SQLite forum and learned that the feature I was using was removed in this commit to SQLite. D. Richard Hipp explains:

The P4 parameter to OP_Variable was not being used for anything. By omitting it, we make the prepared statement slightly smaller, reduce the size of the SQLite library by a few bytes, and help sqlite3_prepare() and similar run slightly faster.
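One way to find `:named` parameters without depending on EXPLAIN output is a regular expression. The following is a rough sketch of that idea, not necessarily the fix that shipped in Datasette: `named_parameters` is a hypothetical helper and it only handles single-quoted string literals, not comments or quoted identifiers.

```python
import re

# Hypothetical helper for illustration; not Datasette's actual implementation.
_STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")
# A colon followed by an identifier, not preceded by another colon or a word
# character (so '12:00'-style text outside of strings is less likely to match).
_NAMED_PARAM = re.compile(r"(?<![:\w]):([A-Za-z_]\w*)")


def named_parameters(sql: str) -> list[str]:
    """Return the distinct :named parameters in sql, in order of first use."""
    # Blank out single-quoted string literals so that colons inside them
    # are not mistaken for parameters.
    without_strings = _STRING_LITERAL.sub("''", sql)
    seen: list[str] = []
    for name in _NAMED_PARAM.findall(without_strings):
        if name not in seen:
            seen.append(name)
    return seen
```

The trade-off is that a regular expression only approximates SQL's grammar, whereas the EXPLAIN trick asked SQLite itself but relied on opcode details that were never guaranteed to stay stable.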

# 12th June 2024, 10:55 pm / projects, sqlite, datasette, annotated-release-notes, d-richard-hipp

Release datasette-studio 0.1a4 — Datasette pre-configured with useful plugins. Experimental alpha.
Release datasette-permissions-metadata 0.1 — Configure permissions for Datasette 0.x in metadata.json
Release datasette-enrichments-gpt 0.5 — Datasette enrichment for analyzing row data using OpenAI's GPT models
Release datasette-extract 0.1a7 — Import unstructured data (text and images) into structured tables

Ham radio general exam question pool as JSON. I scraped a pass on my Ham radio general exam this morning. One of the tools I used to help me pass was a Datasette instance with all 429 questions from the official question pool. I've published that raw data as JSON on GitHub, which I converted from the official question pool document using an Observable notebook.

Relevant TIL: How I studied for my Ham radio general exam.

# 11th May 2024, 7:16 pm / json, projects, radio, datasette, observable, ham-radio

datasette-pins — a new Datasette plugin for pinning tables and queries. Alex Garcia built this plugin for Datasette Cloud, and as with almost every Datasette Cloud feature, we're releasing it as an open source package as well.

datasette-pins allows users with the right permission to "pin" tables, databases and queries to their homepage. It's a lightweight way to customize that homepage, especially useful as your Datasette instance grows to host dozens or even hundreds of tables.

# 9th May 2024, 6:29 pm / plugins, datasette, datasette-cloud, alex-garcia

Weeknotes: more datasette-secrets, plus a mystery video project

I introduced datasette-secrets two weeks ago. The core idea is to provide a way for end-users to store secrets such as API keys in Datasette, allowing other plugins to access them.

[... 982 words]

Release datasette-upload-dbs 0.3.2 — Upload SQLite database files to Datasette
Release datasette-enrichments 0.4.2 — Tools for running enrichments against data stored in Datasette
Release datasette-enrichments 0.4.1 — Tools for running enrichments against data stored in Datasette
Release datasette-enrichments 0.4 — Tools for running enrichments against data stored in Datasette
Release datasette-secrets 0.2 — Manage secrets such as API keys for use with other Datasette plugins
Release datasette-test 0.3.2 — Utilities to help write tests for Datasette plugins and applications
Release datasette-test 0.3.1 — Utilities to help write tests for Datasette plugins and applications
Release datasette-test 0.3 - release yanked — Utilities to help write tests for Datasette plugins and applications

Food Delivery Leak Unmasks Russian Security Agents. This story is from April 2022 but I realize now I never linked to it.

Yandex Food, a popular food delivery service in Russia, suffered a major data leak.

The data included an order history with names, addresses and phone numbers of people who had placed food orders through that service.

Bellingcat were able to cross-reference this leak with addresses of Russian security service buildings, including those linked to the GRU and FSB. This allowed them to identify the names and phone numbers of people working for those organizations, and then combine that information with further leaked data as part of their other investigations.

If you look closely at the screenshots in this story they may look familiar: Bellingcat were using Datasette internally as a tool for exploring this data!

# 26th April 2024, 1:59 am / data-journalism, datasette, bellingcat