What’s new in sqlite-utils 3.20 and 3.21: --lines, --text, --convert
11th January 2022
sqlite-utils is my combined CLI tool and Python library for manipulating SQLite databases. Consider this the annotated release notes for sqlite-utils 3.20 and 3.21, both released in the past week.
sqlite-utils insert --convert with --lines and --text
The sqlite-utils insert command inserts rows into a SQLite database from a JSON, CSV or TSV file, creating a table with the necessary columns if one does not exist already.
It gained three new options in v3.20:
sqlite-utils insert ... --linesto insert the lines from a file into a table with a singlelinecolumn, see Inserting unstructured data with --lines and --text.sqlite-utils insert ... --textto insert the contents of the file into a table with a singletextcolumn and a single row.sqlite-utils insert ... --convertallows a Python function to be provided that will be used to convert each row that is being inserted into the database. See Applying conversions while inserting data, including details on special behavior when combined with--linesand--text. (#356)
These features all evolved from an idea I had while re-reading my blog entry from last year, Apply conversion functions to data in SQLite columns with the sqlite-utils CLI tool. That blog entry introduced the sqlite-utils convert comand, which can run a custom Python function against a column in a table to convert that data in some way.
Given a log file log.txt that looks something like this:
2021-08-05T17:58:28.880469+00:00 app[web.1]: measure#nginx.service=4.212 request="GET /search/?type=blogmark&page=2&tag=highavailability HTTP/1.1" status_code=404 request_id=25eb296e-e970-4072-b75a-606e11e1db5b remote_addr="10.1.92.174" forwarded_for="114.119.136.88, 172.70.142.28" forwarded_proto="http" via="1.1 vegur" body_bytes_sent=179 referer="-" user_agent="Mozilla/5.0 (Linux; Android 7.0;) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; PetalBot;+https://webmaster.petalsearch.com/site/petalbot)" request_time="4.212" upstream_response_time="4.212" upstream_connect_time="0.000" upstream_header_time="4.212";
I provided this example code to insert lines from a log file into a table with a single line column:
cat log.txt | \
jq --raw-input '{line: .}' --compact-output | \
sqlite-utils insert logs.db log - --nl
Since sqlite-utils insert requires JSON, this example first used jq to convert the lines into {"line": "..."} JSON objects.
My first idea was to improve this with the new --lines option, which lets you replace the above with this:
sqlite-utils insert logs.db log log.txt --lines
Using --lines will create a table with a single lines column and import every line from the file as a row in that table.
In the article, I then demonstrated how --convert could be used to convert those imported lines into structured rows using a regular expression:
sqlite-utils convert logs.db log line --import re --multi "$(cat <<EOD
r = re.compile(r'([^\s=]+)=(?:"(.*?)"|(\S+))')
pairs = {}
for key, value1, value2 in r.findall(value):
pairs[key] = value1 or value2
return pairs
EOD
)"
The new --convert option to sqlite-utils means you can now achieve the same thing using:
sqlite-utils insert logs.db log log.txt --lines \
--import re --convert "$(cat <<EOD
r = re.compile(r'([^\s=]+)=(?:"(.*?)"|(\S+))')
for key, value1, value2 in r.findall(line):
pairs[key] = value1 or value2
return pairs
EOD
)"
Since the --lines option allows you to consume mostly unstructured files split by newlines, I decided to also add an option to consume an entire unstructured file as a single record. I originally called that --all but found the code got messy because it conflicted with Python’s all() built-in, so I renamed it to --text.
Used on its own, --text creates a table with a single column called text:
% sqlite-utils insert logs.db fulllog log.txt --text
% sqlite-utils schema logs.db
CREATE TABLE [fulllog] (
[text] TEXT
);
But with --convert you can pass a snippet of Python code which can take that text value and convert it into a list of dictionaries, which will then be used to populate the table.
Here’s a fun example. The following one-liner uses the classic feedparser library to parse the Atom feed for my blog and load it into a database table:
curl 'https://simonwillison.net/atom/everything/' | \
sqlite-utils insert feed.db entries --text --convert '
feed = feedparser.parse(text)
return feed.entries' - --import feedparser
The resulting database looks like this:
% sqlite-utils tables feed.db --counts -t
table count
------- -------
feed 30
% sqlite-utils schema feed.db
CREATE TABLE [feed] (
[title] TEXT,
[title_detail] TEXT,
[links] TEXT,
[link] TEXT,
[published] TEXT,
[published_parsed] TEXT,
[updated] TEXT,
[updated_parsed] TEXT,
[id] TEXT,
[guidislink] INTEGER,
[summary] TEXT,
[summary_detail] TEXT,
[tags] TEXT
);
Not bad for a one-liner!
This example uses the --import option to import that feedparser library. This means you’ll need to have that library installed in the same virtual environment as sqlite-utils.
If you run into problems here (maybe due to having installed sqlite-utils via Homebrew) one way to do this is to use the following:
python3 -m pip install feedparser sqlite-utils
Then use python3 -m sqlite_utils in place of sqlite-utils—this will ensure you are running the command from the same virtual environment where you installed the library.
Update 13th December 2022: sqlite-utils 3.30 introduced a new sqlite-utils install command for installing PyPI packages directly into the same virtual environment as sqlite-utils itself.
--convert for regular rows
The above examples combine --convert with the --lines and --text options to parse unstructured text into database tables.
But --convert works with the existing sqlite-utils insert options as well.
To review, those are the following:
-
sqlite-utils insertby default expects a JSON file that’s a list of objects,[{"id": 1, "text": "Like"}, {"id": 2, "text": "This"}]. -
sqlite-utils insert --nlaccepts newline-delimited JSON,{"id": 1, "text": "Like"}\n{"id": 2, "text": "This"}. -
sqlite-utils insert --csvand--tsvaccepts CSV/TSV—with--delimiterand--encodingand--quotecharand--no-headersoptions for customizing that import, and a--sniffoption for automatically detecting those settings.
You can now use --convert to define a Python function that accepts a row dictionary representing each row from the import and modifies that dictionary or returns a fresh one with changes.
Here’s a simple example that produces just the capitalized name, the latitude and the longitude from the WRI’s global power plants CSV file:
curl https://raw.githubusercontent.com/wri/global-power-plant-database/master/output_database/global_power_plant_database.csv | \
sqlite-utils insert plants.db plants - --csv --convert '
return {
"name": row["name"].upper(),
"latitude": float(row["latitude"]),
"longitude": float(row["longitude"]),
}'
The resulting database looks like this:
% sqlite-utils schema plants.db
CREATE TABLE [plants] (
[name] TEXT,
[latitude] FLOAT,
[longitude] FLOAT
);
~ % sqlite-utils rows plants.db plants | head -n 3
[{"name": "KAJAKI HYDROELECTRIC POWER PLANT AFGHANISTAN", "latitude": 32.322, "longitude": 65.119},
{"name": "KANDAHAR DOG", "latitude": 31.67, "longitude": 65.795},
{"name": "KANDAHAR JOL", "latitude": 31.623, "longitude": 65.792},
sqlite-utils bulk
- New sqlite-utils bulk command which can import records in the same way as
sqlite-utils insert(from JSON, CSV or TSV) and use them to bulk execute a parametrized SQL query. (#375)
With the addition of --lines, --text, --convert and --import the sqlite-utils insert command is now a powerful tool for turning anything into a list of Python dictionaries, which can then in turn be inserted into a SQLite database table.
Which gave me an idea... what if you could use the same mechanisms to execute SQL statements in bulk instead?
Python’s SQLite library supports named parameters in SQL queries, which look like this:
insert into plants (id, name) values (:id, :name)Those :id and :name parameters can be populated from a Python dictionary. And the .executemany() method can efficiently apply the same SQL query to a big list (or iterator or generator) of dictionaries in one go:
cursor = db.cursor() cursor.executemany( "insert into plants (id, name) values (:id, :name)", [{"id": 1, "name": "One"}, {"id": 2, "name": "Two"}] )
So I implemented the sqlite-utils bulk command, which takes the same import options as sqlite-utils but instead of creating and populating the specified table requires a SQL argument with a query that will be executed using the imported rows as arguments.
% sqlite-utils bulk demo.db \
'insert into plants (id, name) values (:id, :name)' \
plants.csv --csv
This feels like a powerful new feature, which was very simple to implement because the hard work of importing the data had already been done by the insert command.
Running ANALYZE
- New Python methods for running
ANALYZEagainst a database, table or index:db.analyze()andtable.analyze(), see Optimizing index usage with ANALYZE. (#366)- New sqlite-utils analyze command for running
ANALYZEusing the CLI. (#379)- The
create-index,insertandupsertcommands now have a new--analyzeoption for runningANALYZEafter the command has completed. (#379)
This idea came from Forest Gregg, who initially suggested running ANALYZE automatically as part of the sqlite-utils create-index command.
I have to confess: in all of my years of using SQLite, I’d never actually explored the ANALYZE command.
When run, it builds a new table called sqlite_stats1 containing statistics about each of the indexes on the table—indicating how “selective” each index is—effectively how many rows on average you are likely to filter down to if you use the index.
The SQLite query planner can then use this to decide which index to consult. For example, given the following query:
select * from ny_times_us_counties
where state = 'Missouri' and county = 'Greene'If there are indexes on both columns, should the query planner use the state column or the county column?
In this case the state column will filter down to 75,209 rows, while the county column filters to 9,186—so county is clearly the better query plan.
Impressively, SQLite seems to make this kind of decision perfectly well without the sqlite_stat1 table being populated: explain query plan select * from ny_times_us_counties where “county” = ’Greene’ and “state” = ’Missouri’ returns the following:
SEARCH TABLE ny_times_us_counties USING INDEX idx_ny_times_us_counties_county (county=?)
I’ve not actually found a good example of a query where the sqlite_stat1 table makes a difference yet, but I’m confident such queries exist!
Using SQL, you can run ANALYZE against an entire database by executing ANALYZE;, or against all of the indexes for a specific table with ANALYZE tablename;, or against a specific index by name using ANALYZE indexname;.
There’s one catch with ANALYZE: since running it populates a static sqlite_stat1 table, the data in that table can get out of date. If you insert another million rows into a table for example your analyzye statistics might no longer reflect ground truth to the point that the query planner starts to make bad decisions.
For sqlite-utils I decided to make ANALYZE an explicit operation. In the Python library you can now run the following:
db.analyze() # Analyze every index in the database db.analyze("indexname") # Analyze a specific index db.analyze("tablename") # Analyze every index for that table # Or the same thing using a table object: db["tablename"].analyze()
I also added an optional analyze=True parameter to several methods, which you can use to trigger an ANALZYE once that operation completes:
db["tablename"].create_index(["column"], analyze=True) db["tablename"].insert_rows(rows, analyze=True) db["tablename"].delete_where(analyze=True)
The sqlite-utils CLI command has equivalent functionality:
# Analyze every index in a database:
% sqlite-utils analyze database.db
# Analyze a specific index:
% sqlite-utils analyze database.db indexname
# Analyze all indexes for a table:
% sqlite-utils analyze database.db tablename
And an --analyze option for various commands:
% sqlite-utils create-index ... --analyze
% sqlite-utils insert ... --analyze
% sqlite-utils upsert ... --analyze
Other smaller changes
- New
sqlite-utils create-databasecommand for creating new empty database files. (#348)
Most sqlite-utils commands such as insert or create-table create the database file for you if it doesn’t already exist, but I decided it would be neat to have an explicit create-database command for deliberately creating an empty database.
Update 13th January 2022: I wrote a detailed description of my process building this command in How I build a feature.
- The CLI tool can now also be run using
python -m sqlite_utils. (#368)
I initially added this to help write a unit test that exercised the tool through a subprocess (see TIL Testing a Click app with streaming input) but it’s a neat pattern in general. datasette gained this through a contribution from Abdussamet Koçak a few years ago.
- Using
--fmtnow implies--table, so you don’t need to pass both options. (#374)
A nice tiny usability enhancement. You can now run sqlite-utils rows my.db mytable --fmt rst to get back a reStructuredText table—previously you also needed to add --table.
- The insert-files command supports two new columns:
stemandsuffix. (#372)
I sometimes re-read the documentation for older features to remind me what they do, and occasionally an idea for a feature jumps out from that. Implementing these was a very small change.
That --nl improvement came from tinkering around trying to fix the bug.
The bug itself was interesting: I initially thought that my entire mechanism for comitting on every --batch-size chunk was broken, but it turned out I was unnecessarily buffering data from standard input in order to support the --sniff option for detecting the shape of incoming CSV data.
db.supports_strictproperty showing if the database connection supports SQLite strict tables.table.strictproperty (see .strict) indicating if the table uses strict mode. (#344)
See previous weeknotes: this is the first part of my ongoing support for the new STRICT tables in SQLite.
I’m currently blocked on implementing more due to the need to get a robust mechanism up and running for executing sqlite-utils tests in CI against specific SQLite versions, see issue #346.
Releases this week
-
sqlite-utils: 3.21—(92 releases total)—2022-01-11
Python CLI utility and library for manipulating SQLite databases -
sqlite-utils: 3.20—2022-01-05
-
stream-delay: 0.1—2022-01-08
Stream a file or stdin one line at a time with a delay
TILs this week
More recent articles
- Video: Building a tool to copy-paste share terminal sessions using Claude Code for web - 23rd October 2025
- Dane Stuckey (OpenAI CISO) on prompt injection risks for ChatGPT Atlas - 22nd October 2025
- Living dangerously with Claude - 22nd October 2025