1,085 items tagged “python”
The Python programming language.
2018
Compiling SQLite for use with Python Applications (via) Charles Leifer’s recent tutorial on how to compile and build the latest SQLite (with window function support) for use from Python via his pysqlite3 library.
coleifer/pysqlite3. Now that the pysqlite package is bundled as part of the Python standard library the original open source project is no longer actively maintained, and has not been upgraded for Python 3. Charles Leifer has been working on pysqlite3, a stand-alone package of the module. Crucially, this should enable compiling the latest version of SQLite (via the amalgamation package) without needing to upgrade the version that ships with the operating system.
Faust: Python Stream Processing (via) A new open source stream processing system released by Robinhood, created by Vineet Goel and Celery creator Ask Solem. The API looks delightful, making very smart use of Python decorators and async/await. The initial release requires Kafka but they plan to support multiple backends, hopefully including Redis Streams.
Datasette unit tests: monkeytype_call_traces (via) Faceted browse against every function call that occurs during the execution of Datasette’s test suite. I used Instagram’s MonkeyType tool to generate this, which can run Python code and generates a SQLite database of all of the traced calls. It’s intended to be used to automatically add mypy annotations to your code, but since it produces a SQLite database as a by-product I’ve started exploring the intermediary format using Datasette. Generating this was as easy as running “monkeytype run `which pytest`” in the Datasette root directory.
XARs: An efficient system for self-contained executables (via) Really interesting new open source project from Facebook: a XAR is a new way of packaging up a Python executable complete with its dependencies and resources such that it can be distributed and executed elsewhere as a single file. It’s kind of like a Docker container without Docker—it uses the SquashFS compressed read-only filesystem. I can’t wait to try this out with Datasette.
future-fstrings (via) Clever module that backports fstrings to versions of Python earlier than 3.6, by registering itself as a codec and abusing Python’s # -*- coding: future_fstrings -*- feature. Via a conversation on Twitter that pointed out that the JavaScript community have been using transpilation to successfully experiment with new language features for years now.
scrapely. Neat twist on a screen scraping library: this one lets you “train it” by feeding it examples of URLs paired with a dictionary of the data you would like to have extracted from that URL, then uses an instance based learning earning algorithm to run against new URLs. Slightly confusing name since it’s maintained by the scrapy team but is a totally independent project from the scrapy web crawling framework.
mycli. Really neat auto-complete enabled MySQL terminal client, built using the excellent python-prompt-toolkit. Has a sister-project for PostgreSQL called pgcli.
At Harvard we've built out an infrastructure to allow us to deploy JupyterHub to courses with authentication managed by Canvas. It has allowed us to easily deploy complex set-ups to students so they can do really cool stuff without having to spend hours walking them through setup. Instructors are writing their lectures as IPython notebooks, and distributing them to students, who then work through them in their JupyterHub environment. Our most ambitious so far has been setting up each student in the course with a p2.xlarge machine with cuda and TensorFlow so they could do deep learning work for their final projects. We supported 15 courses last year, and got deployment time for an implementation down to only 2-3 hours.
Hynek Schlawack: Testing & Packaging (via) “How to ensure that your tests run code that you think they are running, and how to measure your coverage over multiple tox runs (in parallel!)”—Hynek makes a convincing argument for putting your packaged Python code in a src/ directory for ease of testing and coverage.
Pyre: Fast Type Checking for Python (via) Facebook’s alternative to mypy. “Pyre is designed to be highly parallel, optimizing for near-instant responses so that you get immediate feedback, even in a large codebase”. Like their Hack type checker for PHP, Pyre is implemented in OCaml.
Black Onlline Demo (via) Black is “the uncompromising Python code formatter” by Łukasz Langa—it reformats Python code to a very carefully thought out styleguide, and provides almost no options for how code should be formatted. It’s reminiscent of gofmt. José Padilla built a handy online tool for trying it out in your browser.
dateparser: python parser for human readable dates (via) I’ve used dateutil.parser for this in the past, but dateparser is a major upgrade: it knows how to parse dates in 200 different language locales, can interpret different timezone representations and handles relative dates (“3 months, 1 week and 1 day ago”) as well.
How to rewrite your SQL queries in Pandas, and more (via) I still haven’t fully internalized the idioms needed to manipulate DataFrames in pandas. This tutorial helps a great deal—it shows the Pandas equivalents for a host of common SQL queries.
Intro to Threads and Processes in Python (via) I really like the diagrams in this article which compares the performance of Python threads and processes for different types of task via the excellent concurrent.futures library.
How to Use Static Type Checking in Python 3.6 (via) Useful introduction to optional static typing in Python 3.6, including how to use mypy, PyCharm and the Atom mypy plugin.
The best of Python: a collection of my favorite articles from 2017 and 2018 (so far). Gergely Szerovay has brought together an outstandingly interesting selection of Python articles from the last couple of years of activity of the Python community on Medium. A whole load of gems in here that I hadn’t seen before.
Creating Simple Interactive Forms Using Python + Markdown Using ScriptedForms + Jupyter (via) ScriptedForms is a fascinating Jupyter hack that lets you construct dynamic documents defined using markdown that provide form fields and evaluate Python code instantly as you interact with them.
Parsing CSV using ANTLR and Python 3. I’ve been trying to figure out how to use ANTLR grammars from Python—this is the first example I’ve found that really clicked for me.
rubber-docker/linux.c. rubber-docker is a workshop that talks through building a simply Docker clone from scratch in Python. I particularly liked this detail: linux.c is a Python extension written in C that exposes a small collection of Linux syscalls that are needed for the project—clone, mount, pivot_root, setns, umount, umount2 and unshare. Just reading through this module gives a really nice overview of how some of Docker’s underlying magic actually work.
import-pypi. A devious Python 3 hack which abuses importlib.machinery to add a hook such that any time you type “import modulename” it checks to see if the module is installed and runs “pip install modulename” first if it isn’t. Intended as a joke, but if you habitually fire up temporary virtual environments for exploratory programming like I do this could actually be a neat little time-saver.
How to use HDF5 files in Python (via) HDF5: the missing manual. A detailed explanation of the HDF5 file format and how to work with it using the h5py module. HDF5 allows you to efficiently store multiple datasets (plus metatdata about them) in a single file and then load data from them without pulling the entire file into memory—kind of like SQLite but without the SQL support and more optimized for working with arrays.
Trio Tutorial. Trio is a really nice async library for Python—a simpler alternative to asyncio, with some very clean API design. Best of all, the tutorial is fantastic—it provides a very clear explanation of async/await without diving into the intricacies of coroutines.
r1chardj0n3s/parse: Parse strings using a specification based on the Python format() syntax. (via) Really neat API design: parse() behaves almost exactly in the opposite way to Python’s built-in format(), so you can use format strings as an alternative to regular expressions for extracting specific data from a string.
kennethreitz/requests-html: HTML Parsing for Humans™ (via) Neat and tiny wrapper around requests, lxml and html2text that provides a Kenneth Reitz grade API design for intuitively fetching and scraping web pages. The inclusion of html2text means you can use a CSS selector to select a specific HTML element and then convert that to the equivalent markdown in a one-liner.
s3monkey: A Python library that allows you to interact with Amazon S3 Buckets as if they are your local filesystem. (via) A particularly devious hack by Kenneth Reitz—provides a context manager within which various Python filesystem APIs such as open() and os.listdir() are monkeypatched to operate against an S3 bucket instead. Kenneth built it to make it easier to work with files from apps running on Heroku. Under the hood it uses pyfakefs, a filesystem mocking library originally released by Google.
A Promenade of PyTorch. Useful overview of the PyTorch machine learning library from Facebook AI Research described as “a Python library enabling GPU-accelerated tensor computation”. Similar to TensorFlow, but where TensorFlow requires you to explicitly construct an execution graph PyTorch instead lets you write regular Python code (if statements, for loops etc) which PyTorch then uses to construct the execution graph for you.
Moving a large and old codebase to Python3 (via) Really interesting case study full of good ideas. The codebase in this case was 240,000 lines of Python and Django written over the course of 15 years. The team used Python-Modernize to aid their transition to a six-compatible codebase first.
Python & Async Simplified. Andrew Godwin: “Python’s async framework is actually relatively simple when you treat it at face value, but a lot of tutorials and documentation discuss it in minute implementation detail, so I wanted to make a higher-level overview that deliberately ignores some of the small facts and focuses on the practicalities of writing projects that mix both kinds of code.” This is really useful: clearly explains the two separate worlds of Python (sync and async functions) and describes Andrew’s clever sync_to_async and async_to_sync decorators as well.
Channels 2.0. Andrew just shipped Channels 2.0—a major rewrite and redesign of the Channels project he started back in 2014. Channels brings async to Django, providing a logical, standardized way of supporting things like WebSockets and asynchronous execution on top of a Django application. Previously it required you to run a separate Twisted server and redis/RabbitMQ queue, but thanks to Python 3 async everything can now be deployed as a single process. And the new ASGI spec means its turtles all the way down! Everything from URL routing to view functions to middleware can be composed together using the same ASGI interface.