Cookiecutter Data Science (via) Some really solid thinking in this documentation for the DrivenData cookiecutter template. They emphasize designing data science projects for repeatability, such that just the src/ and data/ folders can be used to recreate all of the other analysis from scratch. I like the suggestion to give each project a dedicated S3 bucket for keeping immutable copies of the original raw data that might be too large for GitHub.
Recent articles
- The last six months in LLMs, illustrated by pelicans on bicycles - 6th June 2025
- Tips on prompting ChatGPT for UK technology secretary Peter Kyle - 3rd June 2025
- How often do LLMs snitch? Recreating Theo's SnitchBench with LLM - 31st May 2025