Simon Willison’s Weblog

Subscribe
Atom feed for datasette Random

1,520 posts tagged “datasette”

Datasette is an open source tool for exploring and publishing data.

2023

MMS Language Coverage in Datasette Lite. I converted the HTML table of 4,021 languages supported by Meta’s new Massively Multilingual Speech models to newline-delimited JSON and loaded it into Datasette Lite. Faceting by Language Family is particularly interesting—the top five families represented are Niger-Congo with 1,019, Austronesian with 609, Sino-Tibetan with 288, Indo-European with 278 and Afro-Asiatic with 222.

# 22nd May 2023, 8:01 pm / facebook, ai, datasette, datasette-lite

Big Opportunities in Small Data

Visit Big Opportunities in Small Data

I gave an invited keynote at Citus Con 2023, the PostgreSQL conference. Below is the abstract, video, slides and links from the presentation.

[... 385 words]

Release datasette 0.64.3 — An open source multi-tool for exploring and publishing data
Release datasette-explain 0.1a2 — Explain and validate SQL queries as you type them into Datasette

Data analysis with SQLite and Python for PyCon 2023

Visit Data analysis with SQLite and Python for PyCon 2023

I’m at PyCon 2023 in Salt Lake City this week.

[... 347 words]

What’s in the RedPajama-Data-1T LLM training set

Visit What's in the RedPajama-Data-1T LLM training set

RedPajama is “a project to create leading open-source models, starts by reproducing LLaMA training dataset of over 1.2 trillion tokens”. It’s a collaboration between Together, Ontocord.ai, ETH DS3Lab, Stanford CRFM, Hazy Research, and MILA Québec AI Institute.

[... 1,077 words]

sqlite-history: tracking changes to SQLite tables using triggers (also weeknotes)

Visit sqlite-history: tracking changes to SQLite tables using triggers (also weeknotes)

In between blogging about ChatGPT rhetoric, micro-benchmarking with ChatGPT Code Interpreter and Why prompt injection is an even bigger problem now I managed to ship the beginnings of a new project: sqlite-history.

[... 1,680 words]

GitHub Accelerator: our first cohort. I’m participating in the first cohort of GitHub’s new open source accelerator program, with Datasette (and related projects). It’s a 10 week program with 20 projects working together “with an end goal of building durable streams of funding for their work”.

# 13th April 2023, 5:28 pm / github, open-source, datasette, personal-news

Weeknotes: A new llm CLI tool, plus automating my weeknotes and newsletter

Visit Weeknotes: A new llm CLI tool, plus automating my weeknotes and newsletter

I started publishing weeknotes in 2019 partly as a way to hold myself accountable but mainly as a way to encourage myself to write more.

[... 830 words]

Release datasette-explain 0.1a1 — Explain and validate SQL queries as you type them into Datasette

Semi-automating a Substack newsletter with an Observable notebook

Visit Semi-automating a Substack newsletter with an Observable notebook

I recently started sending out a weekly-ish email newsletter consisting of content from my blog. I’ve mostly automated that, using an Observable Notebook to generate the HTML. Here’s how that system works.

[... 2,520 words]

Release datasette-no-truncate 0.1 — Tiny Datasette plugin to disable text truncation in table displays

I built a ChatGPT plugin to answer questions about data hosted in Datasette

Visit I built a ChatGPT plugin to answer questions about data hosted in Datasette

Yesterday OpenAI announced support for ChatGPT plugins. It’s now possible to teach ChatGPT how to make calls out to external APIs and use the responses to help generate further answers in the current conversation.

[... 1,801 words]

Release datasette-chatgpt-plugin 0.1 — A Datasette plugin that turns a Datasette instance into a ChatGPT plugin
Release datasette-graphql 2.2 — Datasette plugin providing an automatic GraphQL API for your SQLite databases

Weeknotes: AI won’t slow down, a new newsletter and a huge Datasette refactor

I’m a few weeks behind on my weeknotes, but it’s not through lack of attention to my blog. AI just keeps getting weirder and more interesting.

[... 1,255 words]

Datasette: Gather feedback on new ?_extra= design. I just landed the single biggest backwards-incompatible change to Datasette ever, in preparation for the 1.0 release. It’s a change to the default JSON format from the Datasette API—the new format is much slimmer, and can be expanded using a new ?_extra= query string parameter. I’m desperately keen on getting feedback on this change! This issues has more details and a call for feedback.

# 22nd March 2023, 11:14 pm / json, datasette

Release datasette-atom 0.9 — Datasette plugin that adds a .atom output format
Release datasette-simple-html 0.2 — Datasette SQL functions for very simple HTML operations
Release datasette 0.64.2 — An open source multi-tool for exploring and publishing data
Release datasette-simple-html 0.1 — Datasette SQL functions for very simple HTML operations
Release datasette-app 0.2.3 — The Datasette macOS application

Using Datasette in GitHub Codespaces. A new Datasette tutorial showing how it can be run inside GitHub Codespaces—GitHub’s browser-based development environments—in order to explore and analyze data. I’ve been using Codespaces to run tutorials recently and it’s absolutely fantastic, because it puts every tutorial attendee on a level playing field with respect to their development environments.

# 24th February 2023, 12:40 am / github, tutorials, datasette, github-codespaces

Release datasette-codespaces 0.1.1 — Conveniences for running Datasette on GitHub Codespaces
Release datasette-codespaces 0.1 — Conveniences for running Datasette on GitHub Codespaces

Analytics: Hacker News v.s. a tweet from Elon Musk

My post Bing: “I will not harm you unless you harm me first” really took off.

[... 817 words]

Release datasette-app-support 0.11.8 — Part of https://github.com/simonw/datasette-app
Release datasette-app-support 0.11.7 — Part of https://github.com/simonw/datasette-app

Introducing sqlite-vss: A SQLite Extension for Vector Search (via) This latest SQLite extension from Alex Garcia is possibly his best yet: it adds FAISS-powered vector similarity search directly to SQLite, enabling fast KNN similarity lookups against a virtual table that feels a lot like SQLite’s own built-in full text search feature. This write-up includes interactive demos using Datasette called from an Observable notebook, running similarity searches against an index of 200,000 news headlines and summaries in less than 50ms.

# 10th February 2023, 10:53 pm / sqlite, datasette, observable, alex-garcia, vector-search

Weeknotes: A bunch of things I learned this week, plus datasette-explain

Visit Weeknotes: A bunch of things I learned this week, plus datasette-explain

The Datasette table view refactor, JSON redesign and ?_extra= continues this week, mainly in this ongoing pull request and this tracking issue.

[... 1,528 words]