1,520 posts tagged “datasette”
Datasette is an open source tool for exploring and publishing data.
2023
MMS Language Coverage in Datasette Lite. I converted the HTML table of 4,021 languages supported by Meta’s new Massively Multilingual Speech models to newline-delimited JSON and loaded it into Datasette Lite. Faceting by Language Family is particularly interesting—the top five families represented are Niger-Congo with 1,019, Austronesian with 609, Sino-Tibetan with 288, Indo-European with 278 and Afro-Asiatic with 222.
Big Opportunities in Small Data
I gave an invited keynote at Citus Con 2023, the PostgreSQL conference. Below is the abstract, video, slides and links from the presentation.
[... 385 words]Data analysis with SQLite and Python for PyCon 2023
I’m at PyCon 2023 in Salt Lake City this week.
[... 347 words]What’s in the RedPajama-Data-1T LLM training set
RedPajama is “a project to create leading open-source models, starts by reproducing LLaMA training dataset of over 1.2 trillion tokens”. It’s a collaboration between Together, Ontocord.ai, ETH DS3Lab, Stanford CRFM, Hazy Research, and MILA Québec AI Institute.
[... 1,077 words]sqlite-history: tracking changes to SQLite tables using triggers (also weeknotes)
In between blogging about ChatGPT rhetoric, micro-benchmarking with ChatGPT Code Interpreter and Why prompt injection is an even bigger problem now I managed to ship the beginnings of a new project: sqlite-history.
[... 1,680 words]GitHub Accelerator: our first cohort. I’m participating in the first cohort of GitHub’s new open source accelerator program, with Datasette (and related projects). It’s a 10 week program with 20 projects working together “with an end goal of building durable streams of funding for their work”.
Weeknotes: A new llm CLI tool, plus automating my weeknotes and newsletter
I started publishing weeknotes in 2019 partly as a way to hold myself accountable but mainly as a way to encourage myself to write more.
[... 830 words]Semi-automating a Substack newsletter with an Observable notebook
I recently started sending out a weekly-ish email newsletter consisting of content from my blog. I’ve mostly automated that, using an Observable Notebook to generate the HTML. Here’s how that system works.
[... 2,520 words]I built a ChatGPT plugin to answer questions about data hosted in Datasette
Yesterday OpenAI announced support for ChatGPT plugins. It’s now possible to teach ChatGPT how to make calls out to external APIs and use the responses to help generate further answers in the current conversation.
[... 1,801 words]Weeknotes: AI won’t slow down, a new newsletter and a huge Datasette refactor
I’m a few weeks behind on my weeknotes, but it’s not through lack of attention to my blog. AI just keeps getting weirder and more interesting.
[... 1,255 words]Datasette: Gather feedback on new ?_extra= design. I just landed the single biggest backwards-incompatible change to Datasette ever, in preparation for the 1.0 release. It’s a change to the default JSON format from the Datasette API—the new format is much slimmer, and can be expanded using a new ?_extra= query string parameter. I’m desperately keen on getting feedback on this change! This issues has more details and a call for feedback.
Using Datasette in GitHub Codespaces. A new Datasette tutorial showing how it can be run inside GitHub Codespaces—GitHub’s browser-based development environments—in order to explore and analyze data. I’ve been using Codespaces to run tutorials recently and it’s absolutely fantastic, because it puts every tutorial attendee on a level playing field with respect to their development environments.
Analytics: Hacker News v.s. a tweet from Elon Musk
My post Bing: “I will not harm you unless you harm me first” really took off.
[... 817 words]Introducing sqlite-vss: A SQLite Extension for Vector Search (via) This latest SQLite extension from Alex Garcia is possibly his best yet: it adds FAISS-powered vector similarity search directly to SQLite, enabling fast KNN similarity lookups against a virtual table that feels a lot like SQLite’s own built-in full text search feature. This write-up includes interactive demos using Datasette called from an Observable notebook, running similarity searches against an index of 200,000 news headlines and summaries in less than 50ms.
Weeknotes: A bunch of things I learned this week, plus datasette-explain
The Datasette table view refactor, JSON redesign and ?_extra= continues this week, mainly in this ongoing pull request and this tracking issue.







