286 items tagged “sqlite”
SQLite is the world's most widely deployed database engine.
2020
Weeknotes: software carpentry, compiling modules for SQLite
This week I completed the Software Carpentry instructor training course, added two foundational features to sqlite-utils
and learned how to compile modules for SQLite.
Refactoring databases with sqlite-utils extract
Yesterday I described the new sqlite-utils transform mechanism for applying SQLite table transformations that go beyond those supported by ALTER TABLE
. The other new feature in sqlite-utils 2.20 builds on that capability to allow you to refactor a database table by extracting columns into separate tables. I’ve called it sqlite-utils extract.
Executing advanced ALTER TABLE operations in SQLite
SQLite’s ALTER TABLE has some significant limitations: it can’t drop columns (UPDATE: that was fixed in SQLite 3.35.0 in March 2021), it can’t alter NOT NULL status, it can’t change column types. Since I spend a lot of time with SQLite these days I’ve written some code to fix this—both from Python and as a command-line utility.
[... 689 words]DuckDB (via) This is a really interesting, relatively new database. It’s kind of a weird hybrid between SQLite and PostgreSQL: it uses the PostgreSQL parser but models itself after SQLite in that databases are a single file and the code is designed for use as an embedded library, distributed in a single amalgamation C++ file (SQLite uses a C amalgamation). It features a “columnar-vectorized query execution engine” inspired by MonetDB (also by the DuckDB authors) and is hence designed to run analytical queries really quickly. You can install it using “pip install duckdb”—the resulting module feels similar to Python’s sqlite3, and follows roughly the same DBAPI pattern.
Weeknotes: datasette-dump, sqlite-backup, talks
I spent some time this week digging into Python’s sqlite3 internals. I also gave two talks and recorded a third, due to air at PyGotham in October.
[... 928 words]Weeknotes: Installing Datasette with Homebrew, more GraphQL, WAL in SQLite
This week I’ve been working on making Datasette easier to install, plus wide-ranging improvements to the Datasette GraphQL plugin.
[... 1,009 words]GraphQL in Datasette with the new datasette-graphql plugin
This week I’ve mostly been building datasette-graphql, a plugin that adds GraphQL query support to Datasette.
[... 1,249 words]sqlite-utils 2.14 (via) I finally figured out porter stemming with SQLite full-text search today—it turns out it’s as easy as adding tokenize=’porter’ to the CREATE VIRTUAL TABLE statement. So I just shipped sqlite-utils 2.14 with a tokenize= option (plus the ability to insert binary file data from stdin).
Fun with binary data and SQLite
This week I’ve been mainly experimenting with binary data storage in SQLite. sqlite-utils can now insert data from binary files, and datasette-media can serve content over HTTP that originated as binary BLOBs in a database file.
[... 957 words]sqlite-utils 2.12 (via) I’ve been experimenting with ways of improving BLOB support in Datasette and sqlite-utils. This new version of sqlite-utils includes a “sqlite-utils insert-files” command, which can recursively crawl directories for files and add their contents to SQLite with configurable columns containing their metadata.
I was inspired by Paul Ford who has been creating multi-GB SQLite databases of images and PDFs. It turns out that when disk space is cheap this is a pretty effective way of working with interesting corpuses of documents and images.
Weeknotes: cookiecutter templates, better plugin documentation, sqlite-generate
I spent this week spreading myself between a bunch of smaller projects, and finally getting familiar with cookiecutter. I wrote about my datasette-plugin cookiecutter template earlier in the week; here’s what else I’ve been working on.
[... 703 words]sqlite-generate (via) I wrote this tool today to generate arbitrarily large SQLite databases, for testing purposes. You tell it how many tables, columns and rows you want and it will use the Faker Python library to generate random data and populate the tables with it.
Using SQL to Look Through All of Your iMessage Text Messages (via) Dan Kelch shows how to access the iMessage SQLite database at ~/Library/Messages/chat.db—it’s protected under macOS Catalina so you have to enable Full Disk Access in the privacy settings first. I usually use the macOS terminal app but I installed iTerm for this because I’d rather enable full disk access to a separate terminal program than let anything I’m running in my regular terminal take advantage of it. It worked! Now I can run “datasette ~/Library/Messages/chat.db” to browse my messages.
Using SQL to find my best photo of a pelican according to Apple Photos
According to the Apple Photos internal SQLite database, this is the most aesthetically pleasing photograph I have ever taken of a pelican:
[... 1,937 words]Weeknotes: Datasette 0.41, photos breakthroughs
Shorter weeknotes this week, because my main project for the week warrants a detailed write-up on its own (coming soon... update 21st May here it is).
[... 867 words]Weeknotes: Datasette 0.39 and many other projects
This week’s theme: Well, I’m not going anywhere. So a ton of progress to report on various projects.
[... 806 words]hacker-news-to-sqlite (via) The latest in my Dogsheep series of tools: hacker-news-to-sqlite uses the Hacker News API to fetch your comments and submissions from Hacker News and save them to a SQLite database.
Weeknotes: Datasette Writes
As discussed previously, the biggest hole in Datasette’s feature set at the moment involves writing to the database.
[... 604 words]Things I learned about shapefiles building shapefile-to-sqlite
The latest in my series of x-to-sqlite tools is shapefile-to-sqlite. I learned a whole bunch of things about the ESRI shapefile format while building it.
[... 1,073 words]geojson-to-sqlite (via) I just put out the first release of geojson-to-sqlite—a CLI tool that can convert GeoJSON files (consisting of a Feature or a set of features in a FeatureCollection) into a table in a SQLite database. If you use the --spatialite option it will initalize the table with SpatiaLite and store the geometries in a spacially indexed geometry field—without that option it stores them as GeoJSON.
Generated Columns in SQLite (via) SQLite 3.31.0 released today, and generated columns are the single most notable new feature. PostgreSQL 12 added these in October 2019, and MySQL has had them since 5.7 in October 2015. MySQL and SQLite both offer either “stored” or “virtual” generated columns, with virtual columns being calculated at runtime. PostgreSQL currently only supports stored columns.
Serving 100µs reads with 100% availability (via) Fascinating use-case for SQLite from Segment: they needed a massively replicated configuration database across all of their instances that process streaming data. They chose to make the configuration available as a ~50GB SQLite database file mirrored to every instance, meaning lookups against that data could complete in microseconds. Changes to the central MySQL configuration store are pulled every 2-3 seconds, resulting in a trade-off of consistency for availability which fits their use-case just fine.
2019
sqlite-utils 2.0: real upserts
I just released version 2.0 of my sqlite-utils library/CLI tool to PyPI.
[... 1,140 words]athena-sqlite (via) Amazon Athena is the AWS tool for querying data stored in S3—as CSV, JSON or Apache Parquet files—using SQL. It’s an interesting way of buliding a very cheap data warehouse on top of S3 without having to run any additional services. Athena recently added a query federation SDK which lets you define additional custom data sources using Lambda functions. Damon Cortesi used this to write a custom connector for SQLite, which lets you run queries against data stored in SQLite files that you have uploaded to S3. You can then run joins between that data and other Athena sources.
Logging to SQLite using ASGI middleware
I had some fun playing around with ASGI middleware and logging during our flight back to England for the holidays.
[... 2,535 words]sqlite-transform. I released a new CLI tool today: sqlite-transform, which lets you run “transformations” against a SQLite database. I built it out of frustration of constantly running into CSV files that use horrible American date formatting—the “sqlite-transform parsedatetime my.db mytable col1” command runs dateutil’s parser against those columns and replaces them with a nice, sortable ISO formatted timestamp. I’ve also added a “sqlite-transform lambda” command that lets you specify Python code directly on the command-line that should be used to transform every value in a specified column.
goodreads-to-sqlite (via) This is so cool! Tobias Kunze built a Python CLI tool to import your Goodreads data into a SQLite database, inspired by github-to-sqlite and my various other Dogsheep tools. It’s the first Dogsheep style tool I’ve seen that wasn’t built by me—and Tobias’ write-up includes some neat examples of queries you can run against your Goodreads data. I’ve now started using Goodreads and I’m importing my books into my own private Dogsheep Datasette instance.
Tracking PG&E outages by scraping to a git repo
PG&E have cut off power to several million people in northern California, supposedly as a precaution against wildfires.
[... 868 words]SQL Murder Mystery in Datasette (via) “A crime has taken place and the detective needs your help. The detective gave you the crime scene report, but you somehow lost it. You vaguely remember that the crime was a murder that occurred sometime on Jan.15, 2018 and that it took place in SQL City. Start by retrieving the corresponding crime scene report from the police department’s database.”—Really fun game to help exercise your skills with SQL by the NU Knight Lab. I loaded their SQLite database into Datasette so you can play in your browser.
Weeknotes: Dogsheep
Having figured out my Stanford schedule, this week I started getting back into the habit of writing some code.
[... 1,367 words]