Wikidata is a Giant Crosswalk File. Drew Breunig shows how to take the 140GB Wikidata JSON export, use sed 's/,$//'
to convert it to newline-delimited JSON, then use DuckDB to run queries and extract external identifiers, including a query that pulls out 500MB of latitude and longitude points.
Recent articles
- Building software on top of Large Language Models - 15th May 2025
- Trying out llama.cpp's new vision support - 10th May 2025
- Saying "hi" to Microsoft's Phi-4-reasoning - 6th May 2025