Simon Willison’s Weblog

Weeknotes: Datasette Studio and a whole lot of blogging

19th June 2024

I’m still spinning back up after my trip back to the UK, so actual time spent building things has been less than I’d like. I presented an hour-long workshop on command-line LLM usage, wrote five full blog entries (since my last weeknotes) and I’ve also been leaning more into short-form link blogging, which is a lot more prominent on this site now since my homepage redesign last week.

Datasette Studio

I ran a workshop for a data journalism class recently which included having students try running structured data extraction using datasette-extract. I didn’t want to talk them through installing Python etc on their own machines, so I instead took advantage of a project I’ve been tinkering with for a little while called Datasette Studio.

Datasette Studio is actually two things. The first is a distribution of Datasette which bundles the core application along with a selection of plugins that greatly increase its capabilities as a tool for cleaning and analyzing data. You can install that like this:

pipx install datasette-studio

Then run datasette-studio to start the server or datasette-studio install xyz to install additional plugins.

Datasette Studio runs the latest Datasette 1.0 alpha, and will upgrade to 1.0 stable as soon as that is released.

Quoting the pyproject.toml file, the current list of plugins is this:

I plan to grow this list over time. A neat thing about datasette-studio is that the entire application is defined by a single pyproject.toml that lists those dependencies and sets up the datasette-studio CLI console script, which is then published to PyPI.

The second part of Datasette Studio is a GitHub repository that’s designed to help run it in GitHub Codespaces, with a very pleasing URL:

https://github.com/datasette/studio

Visit that page, click the green “Code” button and click “Create codespace on main” to launch a virtual machine running in GitHub’s Azure environment, preconfigured to launch a private instance of Datasette as soon as the Codespace has started running.

Screenshot of the GitHub Codespaces UI running Datasette Studio

You can then start using it directly: uploading CSVs or JSON data, or even setting your own OpenAI key (using the “Manage secrets” menu item) to enable OpenAI features such as GPT enrichments and structured data extraction.

I’m still fleshing out the idea, but I really like this as a starting point for a completely free Datasette trial environment that’s entirely hosted (and paid for) by Microsoft/GitHub!

More blog improvements

In addition to the redesign of the homepage—moving my linkblog and quotations out of the sidebar and into the main content, at least on desktop—I’ve made a couple of other tweaks.

Blog entries

Releases

TILs