Simon Willison’s Weblog

Friday, 13th August 2021

Re-assessing the automatic charset decoding policy in HTTPX (via) Tom Christie ran an analysis of the top 1,000 most accessed websites (according to an older extract from Google’s Ad Planner service) and found that a full 5% of them both omitted a charset parameter and failed to decode as UTF-8. As a result, HTTPX will be depending on the charset-normalizer Python library to handle those cases.
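The fallback order described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not HTTPX's actual implementation: the function names, the `FALLBACK_ENCODINGS` tuple, and the choice of cp1252 as a second guess are all assumptions made for the example. In HTTPX the guessing step is handled by charset-normalizer, which inspects the bytes rather than walking a fixed list.

```python
import re
from typing import Optional

# Hypothetical stand-in for real charset detection: a fixed list of guesses.
# charset-normalizer analyses the byte content instead of trying a list.
FALLBACK_ENCODINGS = ("utf-8", "cp1252")


def charset_from_content_type(content_type: Optional[str]) -> Optional[str]:
    """Return the charset parameter of a Content-Type header, if present."""
    if not content_type:
        return None
    match = re.search(r'charset\s*=\s*"?([^\s;"]+)', content_type, re.IGNORECASE)
    return match.group(1) if match else None


def decode_body(body: bytes, content_type: Optional[str] = None) -> str:
    """Decode a response body, preferring the declared charset."""
    declared = charset_from_content_type(content_type)
    if declared is not None:
        try:
            # Trust the server's declaration, replacing undecodable bytes.
            return body.decode(declared, errors="replace")
        except LookupError:
            pass  # Unknown encoding name: fall through to guessing.

    # No usable charset was declared: try each guess strictly, in order.
    for encoding in FALLBACK_ENCODINGS:
        try:
            return body.decode(encoding)
        except UnicodeDecodeError:
            continue

    # Last resort: never raise, at the cost of replacement characters.
    return body.decode("utf-8", errors="replace")
```

The sites in the 5% group are the ones that reach the guessing loop and fail its first step: they declare no charset and their bytes are not valid UTF-8.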

# 10:07 pm / unicode, tom-christie, httpx