Simon Willison on sqlite

298 posts tagged “sqlite”

SQLite is the world's most widely deployed database engine.

2020

Things I learned about shapefiles building shapefile-to-sqlite

The latest in my series of x-to-sqlite tools is shapefile-to-sqlite. I learned a whole bunch of things about the ESRI shapefile format while building it.

[... 1,073 words]

5:25 am / 19th February 2020 / shapefiles, sqlite, geojson, projects, spatialite, gis, weeknotes, tom-macwright

geojson-to-sqlite (via) I just put out the first release of geojson-to-sqlite—a CLI tool that can convert GeoJSON files (consisting of a Feature or a set of features in a FeatureCollection) into a table in a SQLite database. If you use the --spatialite option it will initalize the table with SpatiaLite and store the geometries in a spacially indexed geometry field—without that option it stores them as GeoJSON.

# 31st January 2020, 6:40 am / geojson, sqlite, spatialite, projects, gis, cli

Generated Columns in SQLite (via) SQLite 3.31.0 released today, and generated columns are the single most notable new feature. PostgreSQL 12 added these in October 2019, and MySQL has had them since 5.7 in October 2015. MySQL and SQLite both offer either “stored” or “virtual” generated columns, with virtual columns being calculated at runtime. PostgreSQL currently only supports stored columns.

# 24th January 2020, 4:20 am / mysql, sql, postgresql, sqlite

Serving 100µs reads with 100% availability (via) Fascinating use-case for SQLite from Segment: they needed a massively replicated configuration database across all of their instances that process streaming data. They chose to make the configuration available as a ~50GB SQLite database file mirrored to every instance, meaning lookups against that data could complete in microseconds. Changes to the central MySQL configuration store are pulled every 2-3 seconds, resulting in a trade-off of consistency for availability which fits their use-case just fine.

# 10th January 2020, 5:15 am / scaling, sqlite, segment

2019

sqlite-utils 2.0: real upserts

I just released version 2.0 of my sqlite-utils library/CLI tool to PyPI.

[... 1,140 words]

6:48 am / 30th December 2019 / sql, sqlite, museums, upsert, projects, weeknotes, sqlite-utils

athena-sqlite (via) Amazon Athena is the AWS tool for querying data stored in S3—as CSV, JSON or Apache Parquet files—using SQL. It’s an interesting way of buliding a very cheap data warehouse on top of S3 without having to run any additional services. Athena recently added a query federation SDK which lets you define additional custom data sources using Lambda functions. Damon Cortesi used this to write a custom connector for SQLite, which lets you run queries against data stored in SQLite files that you have uploaded to S3. You can then run joins between that data and other Athena sources.

# 18th December 2019, 9:05 am / sqlite, sql, aws, athena, s3

Logging to SQLite using ASGI middleware

I had some fun playing around with ASGI middleware and logging during our flight back to England for the holidays.

[... 2,535 words]

10:30 pm / 16th December 2019 / sqlite, asgi, django, datasette, projects, logging, weeknotes, sqlite-utils

sqlite-transform. I released a new CLI tool today: sqlite-transform, which lets you run “transformations” against a SQLite database. I built it out of frustration of constantly running into CSV files that use horrible American date formatting—the “sqlite-transform parsedatetime my.db mytable col1” command runs dateutil’s parser against those columns and replaces them with a nice, sortable ISO formatted timestamp. I’ve also added a “sqlite-transform lambda” command that lets you specify Python code directly on the command-line that should be used to transform every value in a specified column.

# 4th November 2019, 2:41 am / projects, sqlite, cli

goodreads-to-sqlite (via) This is so cool! Tobias Kunze built a Python CLI tool to import your Goodreads data into a SQLite database, inspired by github-to-sqlite and my various other Dogsheep tools. It’s the first Dogsheep style tool I’ve seen that wasn’t built by me—and Tobias’ write-up includes some neat examples of queries you can run against your Goodreads data. I’ve now started using Goodreads and I’m importing my books into my own private Dogsheep Datasette instance.

# 14th October 2019, 4:07 am / dogsheep, books, datasette, sqlite, cli

Tracking PG&E outages by scraping to a git repo

PG&E have cut off power to several million people in northern California, supposedly as a precaution against wildfires.

[... 868 words]

11:32 pm / 10th October 2019 / projects, scraping, data-journalism, sqlite, datasette, git-scraping, digitalocean, sqlite-utils

SQL Murder Mystery in Datasette (via) “A crime has taken place and the detective needs your help. The detective gave you the crime scene report, but you somehow lost it. You vaguely remember that the crime was a murder that occurred sometime on Jan.15, 2018 and that it took place in SQL City. Start by retrieving the corresponding crime scene report from the police department’s database.”—Really fun game to help exercise your skills with SQL by the NU Knight Lab. I loaded their SQLite database into Datasette so you can play in your browser.

# 7th October 2019, 11:37 pm / datasette, projects, sql, sqlite

Weeknotes: Dogsheep

Having figured out my Stanford schedule, this week I started getting back into the habit of writing some code.

[... 1,367 words]

7:33 pm / 7th October 2019 / projects, dogsheep, sqlite, weeknotes

Weeknotes: Design thinking for journalists, genome-to-sqlite, datasette-atom

I haven’t had much time for code this week: we’ve had a full five day workshop at JSK with Tran Ha (a JSK alumni) learning how to apply Design Thinking to our fellowship projects and generally to challenges facing journalism.

[... 870 words]

8:13 pm / 20th September 2019 / design-thinking, sqlite, reading, genetics, datasette, projects, jsk, weeknotes

genome-to-sqlite. I just found out 23andMe let you export your genome as a zipped TSV file, so I wrote a little Python command-line tool to import it into a SQLite database.

# 19th September 2019, 3:58 pm / projects, datasette, genetics, sqlite

Weeknotes: ONA19, twitter-to-sqlite, datasette-rure

I’ve decided to start writing weeknotes for the duration of my JSK fellowship. Here goes!

[... 919 words]

9:35 pm / 13th September 2019 / regex, sqlite, twitter, datasette, projects, weeknotes

sqlite-utils 1.11. Amjith Ramanujam contributed an excellent new feature to sqlite-utils, which I’ve now released as part of version 1.11. Previously you could enable SQLite full-text-search on a table using the .enable_fts() method (or the “sqlite-utils enable-fts” CLI command) but it wouldn’t reflect future changes to the table—you had to use populate_fts() any time you inserted new records. Thanks to Amjith you can now pass create_triggers=True (or --create-triggers) to cause sqlite-utils to automatically add triggers that keeps the FTS index up-to-date any time a row is inserted, updated or deleted from the table.

# 3rd September 2019, 1:05 am / projects, sqlite, full-text-search, sqlite-utils, cli

Working with many-to-many relationships in sqlite-utils (via) I just released sqlite-utils 1.9 with syntactic sugar support for creating many-to-many relationships for records stored in SQLite databases.

# 4th August 2019, 3:57 am / projects, sqlite, sqlite-utils

healthkit-to-sqlite. Ever since I got an Apple Watch I’ve been itching to get my hands on the step tracking and health data that it’s been collecting for me. I know it’s there in a SQLite database on my wrist, but I couldn’t figure out how to get it! A few days ago I stumbled across the “Export Health Data” button in the iOS Health app, and it turns out it creates a zip file containing XML with a full dump of the data collected by Apple Health. healthkit-to-sqlite is the tool I’ve built that can read that export and use it to create a SQLite database ready to be queried and explored with Datasette. It’s a pretty basic implementation but it’s already giving me access to over 3 million rows of data. Lots of potential here for interesting work with personal analytics.

# 22nd July 2019, 3:34 am / dogsheep, health, datasette, sqlite

db-to-sqlite 1.0 release. I’ve released version 1.0 of my db-to-sqlite tool, which lets you create a SQLite database copy of any database supported by SQLAlchemy (I’ve tested it against MySQL and PostgreSQL). The tool has a bunch of new features: you can use --redact to redact specific columns, specify --table multiple times to copy a subset of tables, and the --all option now efficiently adds all foreign keys at the end of the import. The project now has unit tests which run against MySQL and PostgreSQL in Travis CI. Also included in the README: a shell one-liner for creating a local SQLite copy of a remote Heroku Postgres database based on extracting the connection string from a Heroku config environment variable.

# 1st July 2019, 1:35 am / projects, datasette, sqlite, mysql, postgresql, heroku

Convert Locations.kml (pulled from an iPhone backup) to SQLite. I’ve been playing around with data from my iPhone using the iPhone Backup Extractor app and one of the things it exports for you is a Locations.kml file full of location history data. I wrote a tiny script using Python’s ElementTree XMLPullParser to efficiently iterate through the Placemarks and yield them as dictionaries, which I then batch-inserted into sqlite-utils to create a SQLite database.

# 14th June 2019, 12:45 am / kml, projects, sqlite, sqlite-utils, xml

sqlite-utils 1.0. I just released sqlite-utils 1.0, with a couple of handy new features over 0.14: it can now automatically add columns to a database table if you attempt to insert data which doesn’t quite fit (using alter=True in the Python API or the --alter option to the “sqlite-utils insert” command). It also has the ability to output nested JSON column values on the command-line using the new --json-cols option. This is the first project I’ve marked as a 1.0 release in a very long time—I’ll be sticking to semver for this project from now on, bumping the major version only in the case of a backwards incompatible change.

# 25th May 2019, 1:20 am / projects, versioning, sqlite, sqlite-utils, semver

VisiData (via) Intriguing tool by Saul Pwanson: VisiData is a command-line "textpunk utility" for browsing and manipulating tabular data. pip3 install visidata and then vd myfile.csv (or .json or .xls or SQLite or others) and get an interactive terminal UI for quickly searching through the data, conducting frequency analysis of columns, manipulating it and much more besides. Two tips for if you start playing with it: hit gq to exit, and hit Ctrl+H to view the help screen.

# 18th March 2019, 3:45 am / csv, data-journalism, python, sqlite

huey. Charles Leifer’s “little task queue for Python”. Similar to Celery, but it’s designed to work with Redis, SQLite or in the parent process using background greenlets. Worth checking out for the really neat design. The project is new to me, but it’s been under active development since 2011 and has a very healthy looking rate of releases.

# 25th February 2019, 7:49 pm / sqlite, charles-leifer, python, queues, redis

sqlite-utils: a Python library and CLI tool for building SQLite databases

sqlite-utils is a combination Python library and command-line tool I’ve been building over the past six months which aims to make creating new SQLite databases as quick and easy as possible.

[... 1,237 words]

3:29 am / 25th February 2019 / projects, open-source, datasette, sqlite, sqlite-utils, python, cli

socrata2sql (via) Phenomenal new open source tool released by Andrew Chavez at the Dallas Morning News. Socrata is the open data portal software used by huge numbers of local governments worldwide. socrata2sql is a tool that interacts with the standard Socrata API and can use it to suck down a dataset and save it as a SQLite, PostgreSQL, MySQL or other SQLAlchemy-supported database. I just tried this and it took a single command to create a SQLite database of every police arrest in Dallas in the past five years.

# 8th February 2019, 3:27 pm / datasette, data-journalism, sqlite

db-to-sqlite (via) I just released version 0.2 of a tiny CLI utility I’ve been working on. It builds on top of SQLAlchemy and lets you connect to any SQLAlchemy-supported database and convert the data from it to a local SQLite database file. The new --all option will mirror all available tables (including foreign key relationships), or you can use --sql to save the results of custom SQL queries.

# 8th February 2019, 6:08 am / projects, sqlite, cli

The Datasette Ecosystem. I’ve written a page of documentation that introduces the wider Datasette Ecosystem: csvs-to-sqlite, sqlite-utils, db-to-sqlite, dbf-to-sqlite, markdown-to-sqlite and a full collection of Datasette plugins.

# 1st February 2019, 4:41 am / datasette, sqlite, sqlite-utils

SQLite in 2018: A state of the art SQL dialect (via) In 2018 SQLite gained boolean literals, window functions, filter clauses, upserts and the ability to rename a column. If you want to try it out the latest official datasetteproject/datasette Docker image now bundles SQLite 3.26.

# 15th January 2019, 4:21 pm / datasette, sqlite

Usable Data (via) A Paul Ford essay from February 2016 in which he advocates for SQLite as the ideal format for sharing interesting data. I don’t know how I missed this one—it predates Datasette, but it perfectly captures the benefits that I’m trying to expose with the project. “In my dream universe, there would be a massive searchable torrent site filled with open, explorable data sets, in SQLite format, some with full text search indexes already in place.”

# 11th January 2019, 6:33 pm / sqlite, datasette, paul-ford, open-data

Exploring search relevance algorithms with SQLite

SQLite isn’t just a fast, high quality embedded database: it also incorporates a powerful full-text search engine in the form of the FTS4 and FTS5 extensions. You’ve probably used these a bunch of times already: many iOS, Android and desktop applications use SQLite under-the-hood and use it to implement their built-in search.

[... 1,390 words]

3:29 am / 7th January 2019 / sqlite, search, datasette, projects, full-text-search

«« first « previous page 8 / 10 next » last »»