1,085 items tagged “python”
The Python programming language.
2018
Using setup.py in Your (Django) Project. Includes this neat trick: if you list manage.py in the setup(scripts=) argument you can call it from e.g. cron using the full path to manage.py within your virtual environment and it will execute in the correct context without needing to explicitly activate the environment first.
Datasette Demo (video) from the SF Python Meetup
I gave a short talk about Datasette last month at the SF Python Meetup Holiday Party. They’ve just posted the video, so here it is:
[... 63 words]
Generating polygon representing a rough 100km circle around latitude/longitude point using Python. A question I posted to the GIS Stack Exchange: I found my own answer using a Python library called geog, then someone else posted a better solution using pyproj.
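For readers who want to see the idea without installing anything, here is a minimal standard-library sketch. It is not the geog or pyproj solution from the Stack Exchange thread: it swaps in the spherical destination-point formula, which treats the Earth as a perfect sphere and is therefore only a rough approximation. The function names, the 36-point default and the example coordinates are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; assumes a spherical Earth


def destination_point(lat, lon, bearing_deg, distance_km):
    """Return the (lat, lon) reached by travelling distance_km along bearing_deg."""
    lat1 = math.radians(lat)
    lon1 = math.radians(lon)
    bearing = math.radians(bearing_deg)
    angular = distance_km / EARTH_RADIUS_KM  # angular distance in radians

    lat2 = math.asin(
        math.sin(lat1) * math.cos(angular)
        + math.cos(lat1) * math.sin(angular) * math.cos(bearing)
    )
    lon2 = lon1 + math.atan2(
        math.sin(bearing) * math.sin(angular) * math.cos(lat1),
        math.cos(angular) - math.sin(lat1) * math.sin(lat2),
    )
    # Normalise longitude into the range [-180, 180)
    lon2_deg = (math.degrees(lon2) + 180.0) % 360.0 - 180.0
    return math.degrees(lat2), lon2_deg


def circle_polygon(lat, lon, radius_km=100.0, num_points=36):
    """Return a closed ring of (lat, lon) points roughly radius_km from the centre."""
    ring = [
        destination_point(lat, lon, 360.0 * i / num_points, radius_km)
        for i in range(num_points)
    ]
    ring.append(ring[0])  # repeat the first point to close the polygon
    return ring


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance on the same sphere, used here as a sanity check."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


ring = circle_polygon(37.7749, -122.4194)  # example centre point
```

The ring holds (latitude, longitude) pairs. Formats such as GeoJSON expect (longitude, latitude), so swap the order before exporting. For anything that needs real geodesic accuracy, an ellipsoid-aware library such as pyproj is the better choice.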
Notes on Kafka in Python. Useful review by Matthew Rocklin of the three main open source Python Kafka client libraries as of October 2017.
2017
Let your code type-hint itself: introducing open source MonkeyType. Instagram have open sourced their tool for automatically adding type annotations to your Python 3 code via runtime tracing. By default it logs the types it sees to a SQLite database, which means you can browse them with Datasette!
Python 3 Readiness (via) 345 of the 360 most popular Python packages are now compatible with Python 3. I’d love to see a version of this graph over time.
Object models (via) Extremely comprehensive and readable discussion of the object models of Python, JavaScript, Lua and Perl 5. I learned something new about every one of those languages.
pillow-simd (via) A “friendly fork” of the Python Pillow image library that takes advantage of SIMD operations on certain CPUs to obtain massive speed-ups—they claim 16 to 40 times faster than ImageMagick.
Exploring Line Lengths in Python Packages. Interesting exploration of the impact of the 79 character length limit rule of thumb on various Python packages, plus a thoroughly useful guide to histogram plotting in Jupyter, pandas and matplotlib.
dhash (via) Python library to calculate the perceptual difference hash for an image. Delightfully simple algorithm that’s fully explained in the README: it works by scaling the image to 8x8 grayscale and then creating a bitmap representing whether each pixel is lighter or darker than the previous one.
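The comparison step is small enough to sketch in plain Python. This is an illustration of the idea only, not the dhash library's code or API: it assumes the image has already been resized and converted to rows of grayscale values (that step needs an imaging library and is left out), and the function names and the choice of comparing each pixel with its left-hand neighbour are assumptions.

```python
def difference_hash(rows):
    """Return a list of bits from rows of grayscale values (0-255).

    Each bit is 1 when a pixel is lighter than the pixel to its left,
    otherwise 0. A row of n pixels therefore yields n - 1 bits.
    """
    bits = []
    for row in rows:
        for left, right in zip(row, row[1:]):
            bits.append(1 if right > left else 0)
    return bits


def hamming_distance(bits_a, bits_b):
    """Count the positions where two equal-length hashes differ."""
    return sum(a != b for a, b in zip(bits_a, bits_b))


original = [[1, 2, 3], [3, 2, 1]]
brightened = [[11, 12, 13], [13, 12, 11]]  # same gradients, lighter overall

hash_original = difference_hash(original)
hash_brightened = difference_hash(brightened)
```

Because only the direction of change between neighbouring pixels is recorded, uniformly brightening an image leaves the hash unchanged, and the Hamming distance between two hashes gives a rough measure of how different two images look.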
Eager Execution: An imperative, define-by-run interface to TensorFlow. Lets you evaluate TensorFlow expressions interactively in Python without needing to constantly run tf.Session().run(variable).
TensorFlow 101. Concise, readable introduction to TensorFlow, with Python examples you can execute (and visualize) in Jupyter.
spaCy. “Industrial-strength Natural Language Processing in Python”. Exciting alternative to nltk—spaCy is mostly written in Cython, makes bold performance claims and ships with a range of pre-built statistical models covering multiple different languages. The API design is clean and intuitive and spaCy even includes an SVG visualizer that works with Jupyter.
Pull request #4120 · python/cpython. I just had my first ever change merged into Python! It was a one sentence documentation improvement (on how to cancel SQLite operations) but it was fascinating seeing how Python’s GitHub flow is set up—clever use of labels, plus a bot that automatically checks that you have signed a copy of their CLA.
walrus. Fascinating collection of Python utilities for working with Redis, by Charles Leifer. There are a ton of interesting ideas in here. It starts with Python object wrappers for Redis so you can interact with lists, sets, sorted sets and Redis hashes using Python-like objects. Then it gets really interesting: walrus ships with implementations of autocomplete, rate limiting, a graph engine (using a sorted set hexastore) and an ORM-style models mechanism which manages secondary indexes and even implements basic full-text search.
Try hosting on PyPy by simonw. I had a go at hosting my blog on PyPy. Thanks to the combination of Travis CI, Sentry and Heroku it was pretty easy to give it a go—I had to swap psycopg2 for psycopg2cffi and switch to the currently undocumented pypy3-5.8.0 Heroku runtime (pypy3-5.5.0 is only compatible with Python 3.3, which Django 2.0 does not support). I ran it in production for a few minutes and didn’t get any Sentry errors but did end up using more Heroku dyno memory than I’m comfortable with—see the graph I posted in a comment. I’m going to stick with CPython 3.6 for the moment. Amusingly I did almost all of the work on this on my phone! Travis CI means it’s easy to create and test a branch through GitHub’s web UI, and deploying a tested branch to Heroku is then just a button click.
Super Fast String Matching in Python (via) Interesting technique for calculating string similarity at scale in Python, with much better performance than Levenshtein distances. The trick here uses TF/IDF against N-Grams, plus a CSR (Compressed Sparse Row) scipy matrix to run the calculations. Includes clear explanations of each of these concepts.
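A rough standard-library sketch of the core idea follows: split each string into character n-grams, weight them with TF/IDF, and compare the normalised vectors with a dot product. It deliberately uses plain dictionaries as sparse vectors rather than the scipy CSR matrix described in the article, so it shows the technique but not the performance trick. The function names, the trigram default, the smoothed IDF formula and the sample company names are all assumptions.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Split text into overlapping, lower-cased character n-grams."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf_vectors(strings, n=3):
    """Return one L2-normalised {ngram: weight} dict per input string."""
    grams_per_string = [Counter(char_ngrams(s, n)) for s in strings]

    # Document frequency: in how many strings does each n-gram appear?
    doc_freq = Counter()
    for grams in grams_per_string:
        doc_freq.update(grams.keys())

    total = len(strings)
    vectors = []
    for grams in grams_per_string:
        # Smoothed IDF so that n-grams present in every string keep some weight
        vec = {
            gram: count * (math.log((1 + total) / (1 + doc_freq[gram])) + 1.0)
            for gram, count in grams.items()
        }
        norm = math.sqrt(sum(weight * weight for weight in vec.values()))
        vectors.append({g: w / norm for g, w in vec.items()} if norm else {})
    return vectors


def cosine_similarity(vec_a, vec_b):
    """Dot product of two normalised sparse vectors."""
    if len(vec_a) > len(vec_b):
        vec_a, vec_b = vec_b, vec_a  # iterate over the smaller vector
    return sum(weight * vec_b.get(gram, 0.0) for gram, weight in vec_a.items())


names = ["McDonalds Corporation", "McDonalds Corp", "Burger King"]
vectors = tfidf_vectors(names)
close_match = cosine_similarity(vectors[0], vectors[1])
poor_match = cosine_similarity(vectors[0], vectors[2])
```

With the vectors normalised up front, every pairwise similarity reduces to a sparse dot product, which is why a sparse matrix multiplication can compute all of them at once at scale.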
Connecting to Google Sheets with Python. Useful guide to interacting with Google Sheets via the gspread Python library, including how to work with Google’s unintuitive “service account keys”.
How Adversarial Attacks Work. Adversarial attacks against machine learning classifiers involve constructing an input that deliberately produces the wrong classification. This article shows how these can be constructed, and includes examples generated using PyTorch which produce a sports car that gets identified as a toaster and a photo of Sylvester Stallone that gets classified as Keanu Reeves.
A Minimalist Guide to SQLite. Pretty comprehensive actually—covers the sqlite3 command line app, importing CSVs, integrating with Python, Pandas and Jupyter notebooks, visualization and more.
Exploring United States Policing Data Using Python. Outstanding introduction to data analysis with Jupyter and Pandas.
Fast GeoSpatial Analysis in Python. Some clever advanced performance tricks with Cython and Dask, but it also introduced me to GeoPandas.
profiling. “An interactive continuous Python profiler”. This is really neat: simply run “profiling myscript.py” to get an interactive, navigable console-based profile inspector at the end of your script... or run “profiling live-profile mywebserver.py” to see a live, updating profile of a long-running process. Has options for statistical profiling as well, which has a much lower overhead in exchange for a less accurate view of what is going on.
Contributors to python/cpython, Aug 5, 1990—Oct 26, 2017. I love how the graphs on this page summarize the history of the last 27 years of Python development, showing exactly when each core contributor was most active.
hupper (via) Handy Python module for adding “live reload” development support to just about anything. I’m using it with Sanic—I run “hupper -m app” and it starts up my code in app.py and automatically reloads it any time any of the corresponding files changes on disk.
Parse shell one-liners with pyparsing. Neat introduction to the pyparsing library, both for parsing tokens into labeled sections and constructing an AST from them.
Getting the Most out of Sqlite3 with Python. A couple of neat tricks I didn’t know: you can skip cursors entirely by calling .execute and .executemany directly on the connection object, and you can use the connection object as a context manager to execute transactions using a “with” block.
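Both tricks fit in a few lines using only the standard library sqlite3 module. This is a small sketch against an in-memory database; the entries table and its contents are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# No cursor needed: execute() and executemany() are available on the connection
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, title TEXT UNIQUE)")

# As a context manager, the connection commits if the block succeeds
with conn:
    conn.executemany(
        "INSERT INTO entries (title) VALUES (?)",
        [("first",), ("second",)],
    )

# ...and rolls back the whole transaction if the block raises
try:
    with conn:
        conn.execute("INSERT INTO entries (title) VALUES (?)", ("third",))
        conn.execute("INSERT INTO entries (title) VALUES (?)", ("first",))  # violates UNIQUE
except sqlite3.IntegrityError:
    pass  # "third" was rolled back together with the failed insert

titles = [row[0] for row in conn.execute("SELECT title FROM entries ORDER BY id")]
```

Note that the “with” block only manages the transaction. It does not close the connection, so call conn.close() separately when you are finished.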
Porting my blog to Python 3
This blog is now running on Python 3! Admittedly this is nearly nine years after the first release of Python 3.0, but it’s the first Python 3 project I’ve deployed myself so I’m pretty excited about it.
[... 883 words]
Deploying an asynchronous Python microservice with Sanic and Zeit Now
Back in 2008 Natalie Downe and I deployed what today we would call a microservice: json-head, a tiny Google App Engine app that allowed you to make an HTTP HEAD request against a URL and get back the HTTP headers as JSON. One of our initial use cases for this was Natalie’s addSizes.js, an unobtrusive jQuery script that could annotate links to PDFs and other large files with their corresponding file size pulled from the Content-Length header. Another potential use case is detecting broken links, since the API can be used to spot 404 status codes (as in this example).
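The core of such a service can be sketched with the standard library alone. This is not the json-head code, and it leaves out Sanic and the web-serving layer entirely: it only shows making a HEAD request and serialising the result. The function names and the shape of the JSON output are assumptions rather than the real API's format.

```python
import json
import urllib.error
import urllib.request


def headers_to_json(status, headers):
    """Serialise a status code and (name, value) header pairs as JSON."""
    return json.dumps(
        {"status_code": status, "headers": dict(headers)},
        indent=2,
        sort_keys=True,
    )


def head_as_json(url, timeout=10):
    """Make an HTTP HEAD request and return the status and headers as JSON."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return headers_to_json(response.status, response.headers.items())
    except urllib.error.HTTPError as error:
        # 4xx and 5xx responses raise HTTPError but still carry status and headers,
        # which is what makes broken-link detection possible
        return headers_to_json(error.code, error.headers.items())


# Example with canned values, so nothing here touches the network
sample = headers_to_json(404, [("Content-Length", "0")])
```

Calling head_as_json() with a real URL performs a network request. A client could then read the Content-Length header from the result to label a link with its file size, or check the status code for a 404.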
Sanic. “Sanic is a Flask-like Python 3.5+ web server that’s written to go fast [...] On top of being Flask-like, Sanic supports async request handlers. This means you can use the new shiny async/await syntax from Python 3.5, making your code non-blocking and speedy”.