Fastest way to turn HTML into text in Python. A light benchmark of the selectolax Python library (new to me) shows it performing extremely well at tasks such as extracting just the text from an HTML string after first manipulating the DOM. selectolax is a Python binding over the Modest and Lexbor HTML parsing engines, both written in C with no external dependencies.
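To make "manipulate the DOM, then extract the text" concrete, here is a minimal sketch of that pattern. It is not the benchmark's code. The `html_to_text` function, the `SKIP_TAGS` tuple and the `_TextCollector` class are names invented for this illustration, and the choice to drop `script` and `style` elements is an assumption about what a typical cleanup step looks like. The selectolax calls used (`HTMLParser`, `css`, `decompose`, `text`) come from its Modest-backed `selectolax.parser` module. The standard-library fallback exists only so the snippet still runs when selectolax is not installed; it is not part of selectolax and will be much slower.

```python
from html.parser import HTMLParser as StdlibHTMLParser

try:
    # Third-party dependency: pip install selectolax
    from selectolax.parser import HTMLParser
except ImportError:
    # Assumption for this sketch: fall back to the stdlib parser if missing.
    HTMLParser = None

# Hypothetical choice: elements whose contents should not count as text.
SKIP_TAGS = ("script", "style")


class _TextCollector(StdlibHTMLParser):
    """Stdlib fallback: collect text that is not inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML string, whitespace-normalized."""
    if HTMLParser is not None:
        tree = HTMLParser(html)
        # Manipulate the DOM first: remove unwanted elements entirely.
        for tag in SKIP_TAGS:
            for node in tree.css(tag):
                node.decompose()
        root = tree.body
        if root is None:
            return ""
        raw = root.text(separator=" ", strip=True)
    else:
        collector = _TextCollector()
        collector.feed(html)
        collector.close()
        raw = " ".join(collector.parts)
    # Collapse runs of whitespace so both code paths give the same result.
    return " ".join(raw.split())


sample = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><h1>Hello</h1><script>alert(1)</script><p>World</p></body></html>"
)
print(html_to_text(sample))
```

The final whitespace normalization is a design choice for the sketch rather than something selectolax requires: it keeps the output stable regardless of how the source HTML is indented.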