Weeknotes: Datasette Cloud preview invitations
30th September 2022
This week I finally started sending out invitations for people to try out the preview of the new Datasette Cloud, my SaaS offering for Datasette.
The preview release includes the following features:
- Create a new private instance of Datasette, in the geographic region of your choice
- Add data to your instance by uploading CSV files, importing CSV data by URL or importing data from a Socrata open government data portal
- Build a search engine against selected columns from your data
- Invite members of your team to collaborate on your instance
- Visualize your data on a map, or as bar charts or line charts
You can request preview access here—or come talk to me about it on the Datasette Discord.
I’m certain I haven’t built the right product yet, so feedback is incredibly valuable to me right now!
The two most important upcoming features are API access (with API keys) and the ability to publish data—right now the tool is entirely private, but publishing structured data is a big part of Datasette’s core DNA and something I’m certain people will want to be able to do with the hosted version.
Other projects
I’ve already written a lot about my other projects this week.
I came second place in the Bellingcat Hackathon with Action Transcription, A tool to run caption extraction against online videos using Whisper and GitHub Issues/Actions.
Meta AI released a new paper describing Make-A-Video, a text-to-video model. I dug into the training data using Datasette—see Exploring 10m scraped Shutterstock videos used to train Meta’s Make-A-Video text-to-video model—and found that one of the main academic datasets behind the model was entirely scraped from Shutterstock.
Andy Baio noted that this was another example of a commercial AI research team building on a dataset gathered in academia. He calls that “AI Data Laundering”, and wrote about it in AI Data Laundering: How Academic and Nonprofit Researchers Shield Tech Companies from Accountability.
I’ve continued to think about Prompt Injection, the security attack against software built on large language models that starts with “Ignore previous instructions and ...”. I wrote two more pieces about that:
- I don’t know how to solve prompt injection talks about how this is a security vulnerability which I don’t know of any good mitigations for!
- You can’t solve AI security problems with more AI puts forward my argument that attempting to solve an AI security problem by layering on even more AI feels doomed to fail, because black-box unpredictable AI models do not offer the certainty and guarantees that I want from a security solution.
I also pushed out a new Datasette alpha with some small changes that have accumulated over the past month.
Releases this week
-
datasette-publish-fly: 1.2—(8 releases total)—2022-09-29
Datasette plugin for publishing data using Fly -
datasette-pretty-json: 0.2.2—(2 releases total)—2022-09-28
Datasette plugin that pretty-prints any column values that are valid JSON objects or arrays -
datasette: 0.63a0—(114 releases total)—2022-09-26
An open source multi-tool for exploring and publishing data -
ttml-to-json: 0.2—(2 releases total)—2022-09-25
Convert TTML to JSON -
webvtt-to-json: 0.2—(2 releases total)—2022-09-25
Convert WebVTT to JSON, optionally removing duplicate lines -
image-diff: 0.2.2—(4 releases total)—2022-09-19
CLI tool for comparing images -
datasette-sandstorm-support: 0.2—(2 releases total)—2022-09-16
Authentication and permissions for Datasette on Sandstorm
TIL this week
- Returning related rows in a single SQL query using JSON
- Using DuckDB in Python to access Parquet data
- Deploying Python web apps as AWS Lambda functions
- Whisky sour
- Ensure labels exist in a GitHub repository
- Athena error: The specified key does not exist
- HTML video that loads when the user clicks play
- GraphQL fragments
More recent articles
- Qwen2.5-Coder-32B is an LLM that can code well that runs on my Mac - 12th November 2024
- Visualizing local election results with Datasette, Observable and MapLibre GL - 9th November 2024
- Project: VERDAD - tracking misinformation in radio broadcasts using Gemini 1.5 - 7th November 2024