Simon Willison’s Weblog

492 posts tagged “projects”

Posts about projects I have worked on.

2020

xml-analyser. In building evernote-to-sqlite I dusted off an ancient (2009) project I built that scans through an XML file and provides a summary of what elements are present in the document and how they relate to each other. I’ve now packaged it up as a CLI app and published it on PyPI.

# 12th October 2020, 12:41 am / projects, xml, cli
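
A minimal sketch of the kind of summary described above, using only the Python standard library. This is not the xml-analyser code itself; the function name `summarize_xml` and the sample document are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def summarize_xml(xml_text):
    """Count each element and record which child tags appear under which parent."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    children = defaultdict(Counter)
    # Walk the tree iteratively, visiting every element exactly once
    stack = [root]
    while stack:
        element = stack.pop()
        counts[element.tag] += 1
        for child in element:
            children[element.tag][child.tag] += 1
            stack.append(child)
    return counts, children


# Hypothetical sample document
sample = (
    "<notes><note><title>A</title></note>"
    "<note><title>B</title><tag>x</tag></note></notes>"
)
counts, children = summarize_xml(sample)
print(dict(counts))
print({parent: dict(kids) for parent, kids in children.items()})
```

The two mappings answer the two questions in the description: which elements are present, and how they nest inside each other.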

evernote-to-sqlite (via) The latest tool in my Dogsheep series of utilities for personal analytics: evernote-to-sqlite takes Evernote note exports in their ENEX XML format and loads them into a SQLite database. Embedded images are loaded into a BLOB column and the output of their cloud-based OCR system is added to a full-text search index. Notes have a latitude and longitude which means you can visualize your notes on a map using Datasette and datasette-cluster-map.

# 12th October 2020, 12:38 am / dogsheep, projects, datasette, sqlite

Datasette Weekly: Datasette 0.50, git scraping, extracting columns (via) The first edition of the new Datasette Weekly newsletter—covering Datasette 0.50, Git scraping, extracting columns with sqlite-utils and featuring datasette-graphql as the first “plugin of the week”

# 10th October 2020, 9 pm / sqlite, datasette, projects, graphql, email, sqlite-utils, git-scraping

Datasette Weekly (via) I’m trying something new: I’ve decided to start an email newsletter called the Datasette Weekly (I’m already worried I’ll regret that weekly promise) which will share news about Datasette and the Datasette ecosystem, plus tips and tricks for getting the most out of Datasette and SQLite.

# 10th October 2020, 7:05 pm / projects, datasette, email

Datasette 0.50: The annotated release notes

I released Datasette 0.50 this morning, with a new user-facing column actions menu feature and a way for plugins to make internal HTTP requests to consume the JSON API of their parent Datasette instance.

[... 792 words]

Git scraping: track changes over time by scraping to a Git repository

Git scraping is the name I’ve given a scraping technique that I’ve been experimenting with for a few years now. It’s really effective, and more people should use it.

[... 963 words]

Weeknotes: Datasette column actions, plus three new plugins

A renewed emphasis on building out Datasette Cloud has produced three new plugins this week: datasette-dateutil, datasette-import-table and datasette-edit-schema, plus a major improvement to Datasette’s default interface for browsing tables.

[... 1,093 words]

datasette-dateutil (via) New Datasette plugin exposing date/time parsing custom SQL functions powered by the classic dateutil Python library.

# 28th September 2020, 12:33 am / dateutil, projects, datasette, plugins
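
The general mechanism of exposing a Python function to SQL can be sketched with the standard library's `sqlite3.create_function`. This is only an illustration: the SQL function name `parse_date` is hypothetical, and a short list of fixed formats stands in for dateutil's much more flexible parser.

```python
import sqlite3
from datetime import datetime

# Hypothetical stand-in for dateutil: try a few fixed formats in turn
FORMATS = ("%Y-%m-%d", "%d %B %Y", "%B %d, %Y")


def parse_date(text):
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except (ValueError, TypeError):
            continue
    return None  # becomes SQL NULL when nothing matches


conn = sqlite3.connect(":memory:")
# Register the Python function as a one-argument SQL function
conn.create_function("parse_date", 1, parse_date)
row = conn.execute(
    "SELECT parse_date('12 October 2020'), parse_date('nope')"
).fetchone()
print(row)
```

Once registered, the function can be called from any SQL query run on that connection.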

Weeknotes: software carpentry, compiling modules for SQLite

This week I completed the Software Carpentry instructor training course, added two foundational features to sqlite-utils and learned how to compile modules for SQLite.

[... 805 words]

Refactoring databases with sqlite-utils extract

Yesterday I described the new sqlite-utils transform mechanism for applying SQLite table transformations that go beyond those supported by ALTER TABLE. The other new feature in sqlite-utils 2.20 builds on that capability to allow you to refactor a database table by extracting columns into separate tables. I’ve called it sqlite-utils extract.

[... 1,345 words]
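
What extracting a column into a separate table means can be sketched in plain SQL run through Python's `sqlite3` module. This is an illustration of the refactoring, not the sqlite-utils implementation; the `trees` and `species` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trees (id INTEGER PRIMARY KEY, name TEXT, species TEXT);
    INSERT INTO trees (name, species) VALUES
        ('Tree 1', 'Oak'), ('Tree 2', 'Palm'), ('Tree 3', 'Oak');

    -- 1. Lookup table holding each distinct value once
    CREATE TABLE species (id INTEGER PRIMARY KEY, species TEXT UNIQUE);
    INSERT INTO species (species) SELECT DISTINCT species FROM trees;

    -- 2. New table that points at the lookup table instead of repeating the text
    CREATE TABLE trees_new (
        id INTEGER PRIMARY KEY,
        name TEXT,
        species_id INTEGER REFERENCES species(id)
    );
    INSERT INTO trees_new (id, name, species_id)
        SELECT trees.id, trees.name, species.id
        FROM trees JOIN species ON trees.species = species.species;

    -- 3. Swap the new table into place
    DROP TABLE trees;
    ALTER TABLE trees_new RENAME TO trees;
""")

rows = conn.execute("""
    SELECT trees.name, species.species
    FROM trees JOIN species ON trees.species_id = species.id
    ORDER BY trees.id
""").fetchall()
print(rows)
```

The repeated text values now live once in `species`, and `trees` stores only an integer reference to them.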

Executing advanced ALTER TABLE operations in SQLite

SQLite’s ALTER TABLE has some significant limitations: it can’t drop columns (UPDATE: that was fixed in SQLite 3.35.0 in March 2021), it can’t alter NOT NULL status, it can’t change column types. Since I spend a lot of time with SQLite these days I’ve written some code to fix this—both from Python and as a command-line utility.

[... 689 words]
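
The usual workaround for these limitations is to create a new table with the desired schema, copy the rows across, drop the old table and rename the new one. The sketch below shows that general pattern with the standard library; it is an assumption that this matches the approach taken in the post, and the `places` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE places (id INTEGER PRIMARY KEY, name TEXT, population TEXT, notes TEXT);
    INSERT INTO places (name, population, notes) VALUES ('Springfield', '30720', 'draft');
""")

# Assumed goal: make population an INTEGER, make name NOT NULL, drop notes
conn.executescript("""
    BEGIN;
    CREATE TABLE places_new (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        population INTEGER
    );
    INSERT INTO places_new (id, name, population)
        SELECT id, name, CAST(population AS INTEGER) FROM places;
    DROP TABLE places;
    ALTER TABLE places_new RENAME TO places;
    COMMIT;
""")

# PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk)
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(places)")]
value = conn.execute("SELECT population FROM places").fetchone()[0]
print(columns, value)
```

Wrapping the steps in a single transaction means a failure partway through leaves the original table untouched.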

Weeknotes: datasette-seaborn, fivethirtyeight-polls

This week I released Datasette 0.49 and tinkered with datasette-seaborn, dogsheep-beta and polling data from FiveThirtyEight.

[... 951 words]

Datasette 0.49: The annotated release notes

Datasette 0.49 is out. Some notes on what’s new.

[... 1,234 words]

Weeknotes: airtable-export, generating screenshots in GitHub Actions, Dogsheep!

This week I figured out how to populate Datasette from Airtable, wrote code to generate social media preview card page screenshots using Puppeteer, and made a big breakthrough with my Dogsheep project.

[... 1,461 words]

Render Markdown tool (via) I wrote a quick JavaScript tool for rendering Markdown via the GitHub Markdown API—which includes all of their clever extensions like tables and syntax highlighting—and then stripping out some extraneous HTML to give me back the format I like using for my blog posts.

# 3rd September 2020, 12:08 am / projects, markdown, javascript, github

airtable-export. I wrote a command-line utility for exporting data from Airtable and dumping it to disk as YAML, JSON or newline-delimited JSON files. This means you can back up an Airtable database from a GitHub Action and get a commit history of changes made to your data.

# 29th August 2020, 9:48 pm / projects, json, airtable, yaml

Weeknotes: California Protected Areas in Datasette

This week I built a geospatial search engine for protected areas in California, shipped datasette-graphql 1.0 and started working towards the next milestone for Datasette Cloud.

[... 1,099 words]

California Protected Areas Database in Datasette (via) I built this yesterday: it’s a Datasette interface on top of the CPAD 2020 GIS database of protected areas in California maintained by GreenInfo Network. This was a useful excuse to build a GitHub Actions flow that builds a SpatiaLite database using my shapefile-to-sqlite tool, and I fixed a few bugs in my datasette-leaflet-geojson plugin as well.

# 21st August 2020, 11:15 pm / shapefiles, github-actions, datasette, projects, california, spatialite, gis

Weeknotes: Rocky Beaches, Datasette 0.48, a commit history of my database

This week I helped Natalie launch Rocky Beaches, shipped Datasette 0.48 and several releases of datasette-graphql, upgraded the CSRF protection for datasette-upload-csvs and figured out how to get a commit log of changes to my blog by backing up its database to a GitHub repository.

[... 1,294 words]

Weeknotes: Installing Datasette with Homebrew, more GraphQL, WAL in SQLite

This week I’ve been working on making Datasette easier to install, plus wide-ranging improvements to the Datasette GraphQL plugin.

[... 1,009 words]

Datasette 0.46 (via) I just released Datasette 0.46 with a security fix for an issue involving CSRF tokens on canned query pages, plus a new debugging tool, improved file downloads and a bunch of other smaller improvements.

# 9th August 2020, 4:57 pm / projects, security, datasette

GraphQL in Datasette with the new datasette-graphql plugin

This week I’ve mostly been building datasette-graphql, a plugin that adds GraphQL query support to Datasette.

[... 1,249 words]

sqlite-utils 2.14 (via) I finally figured out porter stemming with SQLite full-text search today: it turns out it’s as easy as adding tokenize='porter' to the CREATE VIRTUAL TABLE statement. So I just shipped sqlite-utils 2.14 with a tokenize= option (plus the ability to insert binary file data from stdin).

# 1st August 2020, 9:19 pm / projects, search, sqlite, full-text-search, sqlite-utils
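
A small sketch of the effect of that option, using Python's built-in `sqlite3` module rather than sqlite-utils. It assumes the local SQLite build includes the FTS5 extension; the `docs` table and its contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumes this SQLite build was compiled with FTS5 support
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(body, tokenize='porter')")
conn.executemany(
    "INSERT INTO docs (body) VALUES (?)",
    [("She was running late",), ("A quiet afternoon",)],
)
# With porter stemming, a search for "run" also matches "running"
matches = [
    row[0]
    for row in conn.execute("SELECT body FROM docs WHERE docs MATCH ?", ("run",))
]
print(matches)
```

Without the porter tokenizer the same query would only match the exact token "run".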

Fun with binary data and SQLite

This week I’ve been mainly experimenting with binary data storage in SQLite. sqlite-utils can now insert data from binary files, and datasette-media can serve content over HTTP that originated as binary BLOBs in a database file.

[... 957 words]

datasette-media 0.4. datasette-media is my Datasette plugin for serving media (e.g. images) directly from Datasette. The first version used file paths saved in a column and served the data from disk—this new version adds the ability to serve content from BLOB columns, such as those created by the new “sqlite-utils insert-files” command. It also adds configurable support for resizing images based on querystring parameters like ?w=100.

# 28th July 2020, 2:22 am / projects, datasette, plugins, images

sqlite-utils 2.12 (via) I’ve been experimenting with ways of improving BLOB support in Datasette and sqlite-utils. This new version of sqlite-utils includes a “sqlite-utils insert-files” command, which can recursively crawl directories for files and add their contents to SQLite with configurable columns containing their metadata.

I was inspired by Paul Ford who has been creating multi-GB SQLite databases of images and PDFs. It turns out that when disk space is cheap this is a pretty effective way of working with interesting corpuses of documents and images.

# 27th July 2020, 7:36 am / projects, sqlite, sqlite-utils
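
The idea of crawling a directory and storing file contents as BLOBs alongside their metadata can be sketched with the standard library. This is not the sqlite-utils implementation; the function name `insert_files`, the `files` table and its columns are assumptions chosen for the example.

```python
import sqlite3
import tempfile
from pathlib import Path


def insert_files(conn, directory, table="files"):
    """Store every file under directory as a BLOB row with basic metadata."""
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS [{table}] "
        "(path TEXT PRIMARY KEY, size INTEGER, content BLOB)"
    )
    root = Path(directory)
    # rglob("*") walks the directory tree recursively
    for path in sorted(root.rglob("*")):
        if path.is_file():
            data = path.read_bytes()
            conn.execute(
                f"INSERT OR REPLACE INTO [{table}] (path, size, content) "
                "VALUES (?, ?, ?)",
                (path.relative_to(root).as_posix(), len(data), data),
            )
    conn.commit()


# Demo against a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sub").mkdir()
    (Path(tmp) / "a.bin").write_bytes(b"\x00\x01\x02")
    (Path(tmp) / "sub" / "b.txt").write_bytes(b"hello")
    conn = sqlite3.connect(":memory:")
    insert_files(conn, tmp)
    rows = conn.execute("SELECT path, size FROM files ORDER BY path").fetchall()
print(rows)
```

Passing `bytes` as a query parameter is what makes `sqlite3` store the value as a BLOB rather than text.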

pypi-rename. I wanted to rename a PyPI package (renaming datasette-insert-api to datasette-insert as it’s about to grow some non-API features). PyPI recommend uploading a final release under the old name which points to (and depends on) the new name. I’ve built a cookiecutter template to codify that pattern.

# 25th July 2020, 11:07 pm / cookiecutter, projects, pypi

Weeknotes: datasette-copyable, datasette-insert-api

Two new Datasette plugins this week: datasette-copyable, helping users copy-and-paste data from Datasette into other places, and datasette-insert-api, providing a JSON API for inserting and updating data and creating tables.

[... 953 words]

Weeknotes: datasette-auth-passwords, a Datasette logo and a whole lot more

All sorts of project updates this week.

[... 913 words]

datasette-auth-passwords. My latest plugin: datasette-auth-passwords provides a mechanism for signing into Datasette using a username and password (which is verified in order to set a ds_actor authentication cookie). So far it only supports passwords that are hard-coded into Datasette’s configuration via environment variables, but I plan to add database-backed user accounts in the future.

# 13th July 2020, 11:39 pm / passwords, datasette, plugins, projects, authentication