December 2019
56 posts: 5 entries, 14 links, 2 quotes, 35 beats
Dec. 18, 2019
athena-sqlite (via) Amazon Athena is the AWS tool for querying data stored in S3 (as CSV, JSON or Apache Parquet files) using SQL. It’s an interesting way of building a very cheap data warehouse on top of S3 without having to run any additional services. Athena recently added a query federation SDK which lets you define additional custom data sources using Lambda functions. Damon Cortesi used this to write a custom connector for SQLite, which lets you run queries against data stored in SQLite files that you have uploaded to S3. You can then run joins between that data and other Athena sources.
Dec. 19, 2019
Dec. 20, 2019
Building tools to bring data-driven reporting to more newsrooms. I wrote about my fellowship project so far and my goals for the next few months for the JSK Medium publication. My next priority: an invite-only hosted version for newsrooms so that figuring out how to install and manage the software isn’t the biggest barrier to entry.
Dec. 21, 2019
Dec. 22, 2019
Dec. 23, 2019
The Guardian’s nifty old-article trick is a reminder of how news organizations can use metadata to limit misinformation (via) The Guardian displays prominent banners on news stories from more than a year ago warning that it is an older article to help prevent accidental or intentional spread of misinformation using their content as ammunition. Impressively, they also display the year prominently on the card images they serve as social media previews for older articles.
Weeknotes: Datasette 0.33
I released Datasette 0.33 yesterday. The release represents an accumulation of small changes and features since Datasette 0.32 back in November. Duplicating the release notes:
[... 678 words]
Dec. 24, 2019
Dec. 25, 2019
Dec. 26, 2019
free-for.dev (via) It’s pretty amazing how much you can build on free tiers these days—perfect for experimenting with side-projects. free-for.dev collects free SaaS tools for developers via pull request, and has had contributions from over 500 people.
For creative work, you can't cheat. My belief is that there are 5 creative hours in everyone's day. All I ask of people at Shopify is that 4 of those are channeled into the company.
Dec. 27, 2019
Dec. 28, 2019
Dec. 29, 2019
Dec. 30, 2019
sqlite-utils 2.0: real upserts
I just released version 2.0 of my sqlite-utils library/CLI tool to PyPI.
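As a rough illustration of what "real upserts" in the title could mean, here is a minimal sketch using only Python's standard sqlite3 module. It does not use the sqlite-utils API, and it is not a description of how sqlite-utils implements the feature. The dogs table, its columns, and the demo function are hypothetical names chosen for the example. It contrasts a replace-style write, which swaps out the whole row, with an upsert-style write that only touches the columns supplied.

```python
import sqlite3


def demo():
    """Contrast replace-style writes with upsert-style writes (hypothetical table)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dogs (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)"
    )
    conn.execute("INSERT INTO dogs (id, name, age) VALUES (1, 'Cleo', 4)")

    # Replace-style: the existing row is deleted and a new one inserted,
    # so the column we did not supply (age) ends up NULL.
    conn.execute("INSERT OR REPLACE INTO dogs (id, name) VALUES (1, 'Cleopaws')")
    replaced = conn.execute(
        "SELECT id, name, age FROM dogs WHERE id = 1"
    ).fetchone()

    # Put the row back to its starting state for the second experiment.
    conn.execute("UPDATE dogs SET name = 'Cleo', age = 4 WHERE id = 1")

    # Upsert-style: make sure the row exists, then update only the
    # columns we were given. The existing age value is left alone.
    conn.execute("INSERT OR IGNORE INTO dogs (id) VALUES (1)")
    conn.execute("UPDATE dogs SET name = ? WHERE id = ?", ("Cleopaws", 1))
    upserted = conn.execute(
        "SELECT id, name, age FROM dogs WHERE id = 1"
    ).fetchone()

    conn.close()
    return replaced, upserted


if __name__ == "__main__":
    replaced, upserted = demo()
    print("replace-style:", replaced)
    print("upsert-style: ", upserted)
```

The INSERT OR IGNORE plus UPDATE pairing is used here because it works on older SQLite builds; on SQLite 3.24 or later, a single INSERT ... ON CONFLICT DO UPDATE statement would be another option.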
[... 1,140 words]
Machine Learning on Mobile and at the Edge: 2019 industry year-in-review (via) This is a fantastic detailed overview of advances made in the field of machine learning on the edge (primarily on mobile devices) over 2019. I’m really excited about this trend: I love the improved privacy implications of running models on my phone without uploading data to a server, and it’s great to see techniques like Federated Learning (from Google Labs) which enable devices to privately train models in a distributed way without having to upload their training data.
Guide To Using Reverse Image Search For Investigations (via) Detailed guide from Bellingcat’s Aric Toler on using reverse image search for investigative reporting. Surprisingly, Google Image Search isn’t the state of the art: Russian search engine Yandex offers a much more powerful solution, mainly because it’s the largest public-facing image search engine to integrate scary levels of face recognition.
Scaling React Server-Side Rendering (via) Outstanding, detailed essay from 2017 on challenges and solutions for scaling React server-side rendering at Kijiji, Canada’s largest classified site (owned by eBay). There’s a lot of great stuff in here, including a detailed discussion of different approaches to load balancing, load shedding, component caching, client-side rendering fallbacks and more.