24th October 2025
Research
Blog Tag Prediction with Scikit-Learn
This project automatically assigns meaningful tags to historic, untagged blog posts. It uses the Simon Willison blog database and scikit-learn to train and compare multi-label text classification models. Four approaches (TF-IDF + Logistic Regression, Multinomial Naive Bayes, Random Forest, and LinearSVC) were tested on each post's title and body text, using the 158 most frequently used tags.
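To make the setup concrete, here is a minimal sketch of the first approach, TF-IDF features feeding a one-vs-rest Logistic Regression. It is an illustration under stated assumptions, not the project's actual code: the toy `posts` list, the `suggest_tags` helper, and the 0.5 threshold are all hypothetical, and the real project reads posts from the blog database and restricts labels to the 158 most frequent tags.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy data standing in for (title, body, tags) rows from the blog database.
posts = [
    ("Datasette plugins", "Writing a plugin that queries SQLite tables", ["datasette", "sqlite"]),
    ("SQLite full-text search", "Using FTS5 indexes for fast search in SQLite", ["sqlite"]),
    ("Datasette on the web", "Publishing a Datasette instance with Python", ["datasette", "python"]),
    ("Python packaging notes", "Building wheels and publishing a Python package", ["python"]),
    ("Python and SQLite", "Using the sqlite3 module from Python scripts", ["python", "sqlite"]),
    ("Datasette release", "A new Datasette release with small fixes", ["datasette"]),
]

# Combine title and body into one text field per post.
texts = [f"{title} {body}" for title, body, _ in posts]

# Multi-label targets: one binary column per tag.
binarizer = MultiLabelBinarizer()
Y = binarizer.fit_transform([tags for _, _, tags in posts])

# TF-IDF features feeding one independent Logistic Regression per tag.
model = make_pipeline(
    TfidfVectorizer(),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
model.fit(texts, Y)


def suggest_tags(title, body, threshold=0.5):
    """Return every tag whose predicted probability meets the (assumed) threshold."""
    probabilities = model.predict_proba([f"{title} {body}"])[0]
    return [tag for tag, p in zip(binarizer.classes_, probabilities) if p >= threshold]


print(suggest_tags("Exploring SQLite with Datasette", "Querying SQLite tables in Datasette"))
```

The other three classifiers could be swapped in at the `OneVsRestClassifier(...)` step. Note that `LinearSVC` does not provide `predict_proba`, so a version using it would need to threshold `decision_function` scores or call `predict` instead.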