Simon Willison’s Weblog


November 2019

Nov. 2, 2019

Why you should use `python -m pip` (via) Brett Cannon explains why he prefers “python -m pip install...” to “pip install...”—it ensures you always know exactly which Python interpreter environment you are installing packages for. He also makes the case for always installing into a virtual environment, created using “python -m venv”.
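To make Brett's point concrete, here is a minimal standard-library sketch. The `create_env`, `prefix_of` and `pip_install_command` helpers and the `sqlite-utils` package name are illustrative assumptions, not anything from his post. The sketch creates a virtual environment with the `venv` module, asks each interpreter which environment (`sys.prefix`) it belongs to, and builds a `python -m pip install` command pinned to the environment's own interpreter. It skips installing pip and never runs the install, so it works offline.

```python
import subprocess
import sys
import tempfile
import venv
from pathlib import Path


def create_env(path):
    """Create a virtual environment (without pip, to stay offline) and return its interpreter."""
    # Symlinks mirror what `python -m venv` does by default on non-Windows systems.
    venv.create(path, with_pip=False, symlinks=(sys.platform != "win32"))
    if sys.platform == "win32":
        return Path(path) / "Scripts" / "python.exe"
    return Path(path) / "bin" / "python"


def prefix_of(python):
    """Ask a specific interpreter which environment (sys.prefix) it belongs to."""
    result = subprocess.run(
        [str(python), "-c", "import sys; print(sys.prefix)"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def pip_install_command(python, *packages):
    """Build a `python -m pip install` command tied to exactly one interpreter."""
    return [str(python), "-m", "pip", "install", *packages]


env_dir = tempfile.mkdtemp()
env_python = create_env(env_dir)
env_prefix = prefix_of(env_python)        # the new environment's directory
system_prefix = prefix_of(sys.executable)  # whichever environment runs this script
command = pip_install_command(env_python, "sqlite-utils")
```

A bare `pip` on your `PATH` could belong to either environment. Spelling out the interpreter removes that ambiguity, which is the whole argument for `python -m pip`.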

# 4:41 pm / packaging, python

Nov. 4, 2019

sqlite-transform. I released a new CLI tool today: sqlite-transform, which lets you run “transformations” against a SQLite database. I built it out of frustration at constantly running into CSV files that use horrible American date formatting: the “sqlite-transform parsedatetime my.db mytable col1” command runs dateutil’s parser against those columns and replaces them with a nice, sortable ISO formatted timestamp. I’ve also added a “sqlite-transform lambda” command that lets you specify Python code directly on the command line that should be used to transform every value in a specified column.
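Here is a rough sketch of the idea behind both commands, using only the standard library. It is not sqlite-transform's actual code. `transform_column` and `compile_lambda` are hypothetical helpers. The date conversion swaps dateutil for `datetime.strptime` with an assumed month/day/year format, so it accepts far fewer inputs than the real tool.

```python
import sqlite3
from datetime import datetime


def transform_column(conn, table, column, fn):
    """Apply fn to every value in table.column and write the results back in place."""
    # Identifiers can't be bound as parameters, so they are double-quoted instead;
    # this sketch assumes trusted table and column names.
    rows = conn.execute(f'SELECT rowid, "{column}" FROM "{table}"').fetchall()
    with conn:  # commit all the updates as a single transaction
        conn.executemany(
            f'UPDATE "{table}" SET "{column}" = ? WHERE rowid = ?',
            [(fn(value), rowid) for rowid, value in rows],
        )


def american_date_to_iso(value):
    """Convert "11/04/2019"-style strings (assumed month/day/year) to ISO 8601."""
    if value is None:
        return None
    return datetime.strptime(value, "%m/%d/%Y").isoformat()


def compile_lambda(code):
    """Hypothetical: wrap a code string as the body of a function of `value`."""
    body = "\n".join("    " + line for line in code.splitlines())
    namespace = {}
    exec(f"def fn(value):\n{body}", namespace)
    return namespace["fn"]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [("11/04/2019", "museum"), ("01/31/2020", "library")],
)
transform_column(conn, "mytable", "col1", american_date_to_iso)
transform_column(conn, "mytable", "col2", compile_lambda("return value.upper()"))
print(conn.execute("SELECT col1, col2 FROM mytable ORDER BY col1").fetchall())
# [('2019-11-04T00:00:00', 'MUSEUM'), ('2020-01-31T00:00:00', 'LIBRARY')]
```

The ISO output sorts correctly as plain text, which is exactly what the American format fails to do.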

# 2:41 am / projects, sqlite

Cloud Run Button: Click-to-deploy your git repos to Google Cloud (via) Google Cloud Run now has its own version of the Heroku deploy button: you can add a button to a GitHub repository which, when clicked, will provide an interface for deploying your repo to the user’s own Google Cloud account using Cloud Run.

# 4:57 am / cloudrun, google, github

selenium-demoscraper (via) Really useful minimal example of a Binder project. Click the button to launch a Jupyter notebook in Binder that can take screenshots of URLs using Selenium-controlled headless Firefox. The binder/ folder uses an apt.txt file to install Firefox, requirements.txt to get some Python dependencies and a postBuild Python script to download the Gecko Selenium driver.

# 3:05 pm / jupyter, tony-hirst, selenium, firefox

Weeknotes: More releases, more museums

Lots of small releases this week.

[... 538 words]

Nov. 6, 2019

Automate the Boring Stuff with Python: Working with PDF and Word Documents. I stumbled across this while trying to extract some data from a PDF file (the kind of file with actual text in it as opposed to dodgy scanned images) and it worked perfectly: PyPDF2.PdfFileReader(open("file.pdf", "rb")).getPage(0).extractText()

# 4:17 pm / pdf, python

The first ever commit to Sentry (via) This is fascinating: the first 70 lines of code that started the Sentry error tracking project. It’s a straightforward Django process_exception() middleware method that collects the traceback and the exception class and saves them to a database. The trick of using the md5 hash of the traceback message to de-dupe errors has been there from the start, and remains one of my favourite things about the design of Sentry.
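Here is a small sketch of that de-duplication trick, assuming in-memory storage in place of a database. It is not Sentry's original code; `record_exception`, `occurrences` and `first_seen` are illustrative names. Two errors with identical tracebacks hash to the same checksum, so they are counted as one problem rather than stored twice.

```python
import hashlib
import traceback
from collections import Counter

occurrences = Counter()  # checksum -> how many times this error was seen
first_seen = {}          # checksum -> full traceback text, stored only once


def record_exception(exc):
    """Store an exception, de-duplicated by the md5 of its formatted traceback."""
    tb_text = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    checksum = hashlib.md5(tb_text.encode("utf-8")).hexdigest()
    occurrences[checksum] += 1
    first_seen.setdefault(checksum, tb_text)
    return checksum


def divide(a, b):
    return a / b


checksums = []
for a, b in [(1, 0), (1, 0), (1, "x")]:
    try:
        divide(a, b)
    except (ZeroDivisionError, TypeError) as exc:
        checksums.append(record_exception(exc))
```

The two `ZeroDivisionError`s share a checksum, and the `TypeError` gets its own. One trade-off of hashing the raw text is that any change to line numbers or messages produces a new group.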

# 11:08 pm / sentry, django

Nov. 7, 2019

pinboard-to-sqlite (via) Jacob Kaplan-Moss just released the second Dogsheep tool that wasn’t written by me (after goodreads-to-sqlite by Tobias Kunze): this one imports your Pinboard bookmarks. The repo includes a really clean minimal example of how to use GitHub Actions to run tests and release packages to PyPI.

# 8:46 pm / pinboard, dogsheep, pypi, github, jacob-kaplan-moss

Nov. 11, 2019

Weeknotes: Python 3.7 on Glitch, datasette-render-markdown

Streaks is really working well for me. I’m at 12 days of commits to Datasette, 16 posting a daily Niche Museum, 19 of actually reviewing my email inbox and 14 of guitar practice. I rewarded myself for that last one by purchasing an actual classical (as opposed to acoustic) guitar.

[... 1,141 words]

Nov. 12, 2019

My Python Development Environment, 2020 Edition (via) Jacob Kaplan-Moss shares what works for him as a Python environment coming into 2020: pyenv, poetry, and pipx. I’m not a frequent user of any of those tools—it definitely looks like I should be.

# 1:30 am / jacob-kaplan-moss, python

Datasette 0.31. Released today: this version adds compatibility with Python 3.8 and breaks compatibility with Python 3.5. Since Glitch supports Python 3.7.3 now I decided I could finally give up on 3.5. This means Datasette can use f-strings now, but more importantly it opens up the opportunity to start taking advantage of Starlette, which makes all kinds of interesting new ASGI-based plugins much easier to build.
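For readers new to ASGI, here is a minimal sketch of the interface these plugins build on. An ASGI application is an async callable that takes `scope`, `receive` and `send`. The `add_header` wrapper is a hypothetical example of how middleware can wrap an existing app. It is not a Datasette or Starlette API, and the sketch uses no third-party code.

```python
import asyncio


async def app(scope, receive, send):
    """A minimal ASGI application: an async callable taking scope, receive and send."""
    assert scope["type"] == "http"
    body = f"Hello from {scope['path']}".encode("utf-8")  # f-strings need Python 3.6+
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain; charset=utf-8")],
    })
    await send({"type": "http.response.body", "body": body})


def add_header(inner, name, value):
    """Hypothetical middleware: wraps an ASGI app and adds one response header."""
    async def wrapped(scope, receive, send):
        async def send_with_header(message):
            if message["type"] == "http.response.start":
                headers = [*message.get("headers", []), (name, value)]
                message = {**message, "headers": headers}
            await send(message)
        await inner(scope, receive, send_with_header)
    return wrapped


async def call(asgi_app, path):
    """Drive an ASGI app directly, without a server, and collect what it sends."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": "GET", "path": path, "headers": []}
    await asgi_app(scope, receive, send)
    return sent


messages = asyncio.run(call(add_header(app, b"x-demo", b"1"), "/museums"))
```

Because every layer shares the same three-argument signature, wrappers like this compose freely. That property is what makes ASGI a convenient foundation for plugins.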

# 6:11 am / glitch, asgi, datasette, python, projects

Nov. 14, 2019

I have sometimes wondered how I would fare with a problem where the solution really isn’t in sight. I decided that I should give it a try before I get too old.

I’m going to work on artificial general intelligence (AGI).

I think it is possible, enormously valuable, and that I have a non-negligible chance of making a difference there, so by a Pascal’s Mugging sort of logic, I should be working on it.

John Carmack

# 1:18 am / ai

Nov. 15, 2019

datasette-template-sql (via) New Datasette plugin, celebrating the new ability in Datasette 0.32 to have asynchronous custom template functions in Jinja (which was previously blocked by the need to support Python 3.5). The plugin adds a sql() function which can be used to execute SQL queries that are embedded directly in custom templates.
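Here is a rough standard-library sketch of what an awaitable `sql()` helper could look like. It is an assumption for illustration, not the plugin's actual signature or implementation. The query runs in a worker thread so the event loop is never blocked. Jinja's async rendering is not shown: a plain coroutine, `render_museum_list`, stands in for the template, and the table contents are placeholder data.

```python
import asyncio
import sqlite3

# Hypothetical in-memory database standing in for a database served by Datasette.
# check_same_thread=False lets worker threads use the connection; this sketch
# only ever runs one query at a time.
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE museums (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO museums VALUES (?, ?)",
    [("Example Museum B", "City B"), ("Example Museum A", "City A")],
)


async def sql(query, params=None):
    """Hypothetical template function: run a query off the event loop, return dicts."""
    def run():
        return [dict(row) for row in conn.execute(query, params or {})]
    return await asyncio.get_running_loop().run_in_executor(None, run)


async def render_museum_list():
    """Stand-in for a custom template that calls sql() and awaits the result."""
    rows = await sql("SELECT name, city FROM museums ORDER BY name")
    return "\n".join(f"{row['name']} ({row['city']})" for row in rows)


output = asyncio.run(render_museum_list())
```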

# 12:59 am / projects, sql, templates, datasette, jinja

Nov. 18, 2019

Weeknotes: datasette-template-sql

Last week I talked about wanting to take on a larger Datasette project, and listed some candidates. I ended up pushing a big project that I hadn’t listed there: the upgrade of Datasette to Python 3.8, which meant dropping support for Python 3.5 (thanks to incompatible dependencies).

[... 521 words]

Nov. 21, 2019

How Do You Remove Unused CSS From a Site? (via) Chris Coyier takes an exhaustive look at the current set of tools for automatically removing unused CSS, and finds that there’s no magic bullet but you can get OK results if you use them carefully.
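Here is a deliberately naive sketch of how these tools work at their core, and why they need careful handling. It collects class selectors from a stylesheet, collects class attributes from HTML pages, and reports the difference. The regex and the helper names are illustrative assumptions. Real tools also parse selectors properly and crawl rendered pages.

```python
import re
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collect every class name that appears in class="..." attributes."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def css_class_selectors(css):
    """Very rough: find .class-name tokens (ignores comments, strings, url(...))."""
    return set(re.findall(r"\.(-?[_a-zA-Z][_a-zA-Z0-9-]*)", css))


def unused_classes(css, html_pages):
    """Return class selectors defined in the CSS but never used in the given pages."""
    used = set()
    for page in html_pages:
        collector = ClassCollector()
        collector.feed(page)
        used |= collector.classes
    return css_class_selectors(css) - used


css = (".nav { color: red } .nav-item { padding: 0.5em } "
       ".js-open { display: block } .legacy { margin: 0 }")
pages = ['<ul class="nav"><li class="nav-item">Home</li></ul>']
print(sorted(unused_classes(css, pages)))  # ['js-open', 'legacy']
```

This shows the trap Chris describes. `legacy` may really be dead, but `js-open` could be a class that JavaScript adds at runtime, so deleting everything the tool reports would break the site.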

# 4:41 am / css

With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody.

Hyrum's Law

# 10:45 pm / api-design

Nov. 25, 2019

niche-museums.com, powered by Datasette

I just released a major upgrade to my www.niche-museums.com website (launched last month).

[... 1,154 words]

Nov. 28, 2019

In general, reviewers should favor approving a CL [code review] once it is in a state where it definitely improves the overall code health of the system being worked on, even if the CL isn’t perfect.

Google Standard of Code Review

# 5:40 am / codereview, google
