1,085 items tagged “python”
The Python programming language.
2024
snoop. Neat Python debugging utility by Alex Hall: snoop lets you “import snoop” and then add “@snoop” as a decorator to any function, which causes that function’s source code to be output directly to the console with details of any variable state changes that occur while it’s running.
I didn’t know you could make a Python module callable like that—turns out it’s running “sys.modules[’snoop’] = snoop” in the __init__.py module!
Beej’s Guide to Networking Concepts (via) Beej’s Guide to Network Programming is a legendary tutorial on network programming in C, continually authored and updated by Brian “Beej” Hall since 1995.
This is NOT that. Beej’s Guide to Networking Concepts is brand new—started in March 2023—and illustrates a whole bunch of networking concepts using Python instead of C.
From the forward: “Is it Beej’s Guide to Network Programming in Python? Well, kinda, actually. The C book is more about how C’s (well, Unix’s) network API works. And this book is more about the concepts underlying it, using Python as a vehicle.”
urllib3 2.2.0. Highlighted feature: “urllib3 now works in the browser”—the core urllib3 library now includes code that can integrate with Pyodide, using the browser’s fetch() or XMLHttpRequest APIs to make HTTP requests (to CORS-enabled endpoints).
Getting Started With CUDA for Python Programmers (via) if, like me, you’ve avoided CUDA programming (writing efficient code that runs on NVIGIA GPUs) in the past, Jeremy Howard has a new 1hr17m video tutorial that demystifies the basics. The code is all run using PyTorch in notebooks running on Google Colab, and it starts with a very clear demonstration of how to convert a RGB image to black and white.
Find a level of abstraction that works for what you need to do. When you have trouble there, look beneath that abstraction. You won’t be seeing how things really work, you’ll be seeing a lower-level abstraction that could be helpful. Sometimes what you need will be an abstraction one level up. Is your Python loop too slow? Perhaps you need a C loop. Or perhaps you need numpy array operations.
You (probably) don’t need to learn C.
Python packaging must be getting better—a datapoint (via) Luke Plant reports on a recent project he developed on Linux using a requirements.txt file and some complex binary dependencies—Qt5 and VTK—and when he tried to run it on Windows... it worked! No modifications required.
I think Python’s packaging system has never been more effective... provided you know how to use it. The learning curve is still too high, which I think accounts for the bulk of complaints about it today.
Publish Python packages to PyPI with a python-lib cookiecutter template and GitHub Actions
I use cookiecutter to start almost all of my Python projects. It helps me quickly generate a skeleton of a project with my preferred directory structure and configured tools.
[... 686 words]Marimo (via) This is a really interesting new twist on Python notebooks.
The most powerful feature is that these notebooks are reactive: if you change the value or code in a cell (or change the value in an input widget) every other cell that depends on that value will update automatically. It’s the same pattern implemented by Observable JavaScript notebooks, but now it works for Python.
There are a bunch of other nice touches too. The notebook file format is a regular Python file, and those files can be run as “applications” in addition to being edited in the notebook interface. The interface is very nicely built, especially for such a young project—they even have GitHub Copilot integration for their CodeMirror cell editors.
The Random Transformer (via) “Understand how transformers work by demystifying all the math behind them”—Omar Sanseviero from Hugging Face meticulously implements the transformer architecture behind LLMs from scratch using Python and numpy. There’s a lot to take in here but it’s all very clearly explained.
Python 3.13 gets a JIT. “In late December 2023 (Christmas Day to be precise), CPython core developer Brandt Bucher submitted a little pull-request to the Python 3.13 branch adding a JIT compiler.”
Anthony Shaw does a deep dive into this new experimental JIT, explaining how it differs from other JITs. It’s an implementation of a copy-and-patch JIT, an idea that only emerged in 2021. This makes it architecturally much simpler than a traditional JIT, allowing it to compile faster and take advantage of existing LLVM tools on different architectures.
So far it’s providing a 2-9% performance improvement, but the real impact will be from the many future optimizations it enables.
Fastest Way to Read Excel in Python (via) Haki Benita produced a meticulously researched and written exploration of the options for reading a large Excel spreadsheet into Python. He explored Pandas, Tablib, Openpyxl, shelling out to LibreOffice, DuckDB and python-calamine (a Python wrapper of a Rust library). Calamine was the winner, taking 3.58s to read 500,00 rows—compared to Pandas in last place at 32.98s.
2023
iSH: The Linux shell for iOS (via) Installing this iOS app gives you a full Linux shell environment running on your phone, using a “usermode x86 emulator”. You can even install packages: “apk add python3” gave me a working Python 3.9 interpreter, installed from the apk.ish.app repository.
I didn’t think this kind of thing was allowed by the App Store, but that’s not been the case for a few years now: Section 4.5.2 of the App Store guidelines clarifies that “Educational apps designed to teach, develop, or allow students to test executable code may, in limited circumstances, download code provided that such code is not used for other purposes.”
ast-grep (via) There are a lot of interesting things about this year-old project.
sg (an alias for ast-grep) is a CLI tool for running AST-based searches against code, built in Rust on top of the Tree-sitter parsing library. You can run commands like this:
sg -p ’await await_me_maybe($ARG)’ datasette --lang python
To search the datasette directory for code that matches the search pattern, in a syntax-aware way.
It works across 19 different languages, and can handle search-and-replace too, so it can work as a powerful syntax-aware refactoring tool.
My favourite detail is how it’s packaged. You can install the CLI utility using Homebrew, Cargo, npm or pip/pipx—each of which will give you a CLI tool you can start running. On top of that it provides API bindings for Rust, JavaScript and Python!
It’s Time For A Change: datetime.utcnow() Is Now Deprecated (via) Miguel Grinberg explains the deprecation of datetime.utcnow() and utcfromtimestamp() in Python 3.12, since they return naive datetime objects which cause all sorts of follow-on problems.
The replacement idiom is datetime.datetime.now(datetime.timezone.utc)
Fleet Context. This project took the source code and documentation for 1221 popular Python libraries and ran them through the OpenAI text-embedding-ada-002 embedding model, then made those pre-calculated embedding vectors available as Parquet files for download from S3 or via a custom Python CLI tool.
I haven’t seen many projects release pre-calculated embeddings like this, it’s an interesting initiative.
See the History of a Method with git log -L (via) Neat Git trick from Caleb Hearth that I hadn’t seen before, and it works for Python out of the box:
git log -L :path_with_format:__init__.py
That command displays a log (with diffs) of just the portion of commits that changed the path_with_format function in the __init__.py file.
My User Experience Porting Off setup.py (via) PyOxidizer maintainer Gregory Szorc provides a detailed account of his experience trying to figure out how to switch from setup.py to pyproject.toml for his zstandard Python package.
This kind of detailed usability feedback is incredibly valuable for project maintainers, especially when the user encountered this many different frustrations along the way. It’s like the written version of a detailed usability testing session.
chDB (via) This is a really interesting development: chDB offers “an embedded SQL OLAP Engine” as a Python package, which you can install using “pip install chdb”. What you’re actually getting is a wrapper around ClickHouse—it’s almost like ClickHouse has been repackaged into an embedded database similar to SQLite.
I’m banned for life from advertising on Meta. Because I teach Python. (via) If accurate, this describes a nightmare scenario of automated decision making.
Reuven recently found he had a permanent ban from advertising on Facebook. They won’t tell him exactly why, and have marked this as a final decision that can never be reviewed.
His best theory (impossible for him to confirm) is that it’s because he tried advertising a course on Python and Pandas a few years ago which was blocked because a dumb algorithm thought he was trading exotic animals!
The worst part? An appeal is no longer possible because relevant data is only retained for 180 days and so all of the related evidence has now been deleted.
Various comments on Hacker News from people familiar with these systems confirm that this story likely holds up.
Bottleneck T5 Text Autoencoder (via) Colab notebook by Linus Lee demonstrating his Contra Bottleneck T5 embedding model, which can take up to 512 tokens of text, convert that into a 1024 floating point number embedding vector... and then then reconstruct the original text (or a close imitation) from the embedding again.
This allows for some fascinating tricks, where you can do things like generate embeddings for two completely different sentences and then reconstruct a new sentence that combines the weights from both.
New sqlite3 CLI tool in Python 3.12. The newly released Python 3.12 includes a SQLite shell, which you can open using “python -m sqlite3”—handy for when you’re using a machine that has Python installed but no sqlite3 binary.
I installed Python 3.12 for macOS using the official installer from Python.org and now “/usr/local/bin/python3 -m sqlite3” gives me a SQLite 3.41.1 shell—a pleasantly recent version from March 2023 (the latest SQLite is 3.43.1, released in September).
[On Python 3.12 subinterpreters] there's massive advantages for mixed C(++) and Python: I can now have multiple sub interpreters running concurrently and accessing the same shared state in a thread-safe C++ library.
Previously this required rewriting the whole C++ library to support either pickling (multiplying the total memory consumption by the number of cores), or support allocating everything in shared memory (which means normal C++ types like
std::string
are unusable, need to switch e.g. toboost::interprocess
).Now is sufficient to pickle a pointer to a C++ object as an integer, and it'll still be a valid pointer in the other subinterpreter.
— ynik
Get Your Mac Python From Python.org. Glyph recommends the official Python installer from python.org as the best way to get started with a Python environment on macOS—with require-virtualenv = true in your ~/.pip/pip.conf to help avoid accidentally installing global packages.
Things I’ve learned about building CLI tools in Python
I build a lot of command-line tools in Python. It’s become my favorite way of quickly turning a piece of code into something I can use myself and package up for other people to use too.
[... 1,235 words]A Hackers’ Guide to Language Models. Jeremy Howard’s new 1.5 hour YouTube introduction to language models looks like a really useful place to catch up if you’re an experienced Python programmer looking to start experimenting with LLMs. He covers what they are and how they work, then shows how to build against the OpenAI API, build a Code Interpreter clone using OpenAI functions, run models from Hugging Face on your own machine (with NVIDIA cards or on a Mac) and finishes with a demo of fine-tuning a Llama 2 model to perform text-to-SQL using an open dataset.
How CPython Implements and Uses Bloom Filters for String Processing. Fascinating dive into Python string internals by Abhinav Upadhyay. It turns out CPython uses very simple bloom filters in several parts of the core string methods, to solve problems like splitting on newlines where there are actually eight codepoints that could represent a newline, and a tiny bloom filter can help filter a character in a single operation before performing all eight comparisons only if that first check failed.
bpy—Blender on PyPI (via) TIL you can “pip install” Blender!
bpy “provides Blender as a Python module”—it’s part of the official Blender project, and ships with binary wheels ranging in size from 168MB to 319MB depending on your platform.
It only supports the version of Python used by the current Blender release though—right now that’s Python 3.10.
Understanding Immortal Objects in Python 3.12. Abhinav Upadhyay provides a clear and detailed explanation of immortal objects coming in Python 3.12, which ensure Python no longer updates reference counts for immutable objects such as True, False, None and low-values integers. The trick (which maintains ABI compatibility) is pretty simple: a reference count value of 4294967295 now means an object is immortal, and the existing Py_INCREF and Py_DECREF macros have been updated to take that into account.
Lark parsing library JSON tutorial (via) A very convincing tutorial for a new-to-me parsing library for Python called Lark.
The tutorial covers building a full JSON parser from scratch, which ends up being just 19 lines of grammar definition code and 15 lines for the transformer to turn that tree into the final JSON.
It then gets into the details of optimization—the default Earley algorithm is quite slow, but swapping that out for a LALR parser (a one-line change) provides a 5x speedup for this particular example.
deno_python (via) A wildly impressive hack: deno_python uses Deno’s FFI interface to load your system’s Python framework (.dll/.dylib/.so) and sets up JavaScript proxy objects for imported Python objects—so you can run JavaScript code that instantiates objects from Python libraries and uses them to process data in different ways.
The latest release added pip support, so things like ’const np = await pip.import(“numpy”)’ now work.