Simon Willison’s Weblog


Introducing Datasette for Newsrooms

We're introducing a new product suite today called Datasette for Newsrooms - a bundled collection of Datasette Cloud features built specifically for investigative journalists and data teams. We're describing it as an all-in-one data store, search engine, and collaboration platform designed to make working with data in a newsroom easier, faster, and more transparent.

If your newsroom could benefit from a managed version of Datasette we would love to hear from you. We're offering it to nonprofit newsrooms for free for the first year (they can pay us in feedback), and we have a two-month trial for everyone else.

Get in touch at hello@datasette.cloud if you'd like to try it out.

One crucial detail: we will help you get started - we'll load data into your instance for you (you get some free data engineering!) and walk you through how to use it, and we will eagerly consume any feedback you have for us and prioritize shipping anything that helps you use the tool. Our unofficial goal: we want someone to win a Pulitzer for investigative reporting where our tool played a tiny part in their reporting process.

Here's an animated GIF demo (taken from our new Newsrooms landing page) of my favorite recent feature: the ability to extract structured data into a table starting from an unstructured PDF, using the latest version of the datasette-extract plugin.

Animated demo. Starts with a PDF file from the San Francisco Planning Commission, which includes a table listing its members and their term ending dates. Switches to a Datasette Cloud instance with an interface for creating a table: the table is called planning_commission and has Seat Number (integer), Appointing Authority, Seat Holder and Term Ending columns, where Term Ending has a hint of YYYY-MM-DD. The PDF is dropped onto the interface and the Extract button is clicked; a loading spinner shows while the rows are extracted one by one as JSON, then the page refreshes to a table view showing the imported structured data.
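The demo follows a simple flow: declare the columns you want, with types and optional hints, have a model return each row as a JSON object, then insert those rows into a table. Here is a minimal Python sketch of that flow, using only the standard library. It makes some assumptions. The `(name, type, hint)` column tuples and the helpers `build_json_schema`, `create_table` and `insert_rows` are hypothetical. The call to the model itself is left out. This is not how datasette-extract is actually implemented.

```python
import json
import sqlite3

# Hypothetical column spec mirroring the demo's planning_commission table.
# The (name, type, hint) tuple layout is an assumption for this sketch,
# not the datasette-extract plugin's internal format.
COLUMNS = [
    ("Seat Number", "integer", ""),
    ("Appointing Authority", "text", ""),
    ("Seat Holder", "text", ""),
    ("Term Ending", "text", "YYYY-MM-DD"),
]

SQL_TYPES = {"integer": "INTEGER", "text": "TEXT"}
JSON_TYPES = {"integer": "integer", "text": "string"}


def build_json_schema(columns):
    """Build a JSON Schema describing one extracted row."""
    properties = {}
    for name, col_type, hint in columns:
        prop = {"type": JSON_TYPES[col_type]}
        if hint:
            # Pass the hint along as a description the model can follow.
            prop["description"] = hint
        properties[name] = prop
    return {
        "type": "object",
        "properties": properties,
        "required": [name for name, _, _ in columns],
    }


def create_table(conn, table, columns):
    """Create the target table; column names are quoted because they contain spaces."""
    cols = ", ".join(f'"{name}" {SQL_TYPES[t]}' for name, t, _ in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')


def insert_rows(conn, table, columns, json_lines):
    """Insert rows given as JSON strings (one object each); return the count."""
    names = [name for name, _, _ in columns]
    quoted = ", ".join(f'"{n}"' for n in names)
    placeholders = ", ".join("?" for _ in names)
    sql = f'INSERT INTO "{table}" ({quoted}) VALUES ({placeholders})'
    count = 0
    for line in json_lines:
        row = json.loads(line)
        # Missing keys become NULL rather than raising an error.
        conn.execute(sql, [row.get(n) for n in names])
        count += 1
    conn.commit()
    return count


if __name__ == "__main__":
    # Print the schema that would accompany the PDF in a model request.
    print(json.dumps(build_json_schema(COLUMNS), indent=2))
```

Processing rows one JSON object at a time echoes the demo, where rows appear one by one. In this sketch that also means a partial model response still yields whatever rows finished parsing.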