Super Fast String Matching in Python

Interesting technique for calculating string similarity at scale in Python, with much better performance than computing Levenshtein distances. The trick is to apply TF-IDF to character n-grams and run the calculations on a scipy CSR (Compressed Sparse Row) matrix. Includes clear explanations of each of these concepts.
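
To make the idea concrete, here is a minimal standard-library sketch of TF-IDF over character trigrams with cosine similarity. It is not the linked article's code: the function names (`ngrams`, `tfidf_rows`, `cosine`), the smoothed IDF formula and the sample names are all my own assumptions, and plain dictionaries stand in for the rows of the scipy CSR matrix.

```python
import math
from collections import Counter


def ngrams(text, n=3):
    """Return the overlapping character n-grams of a lowercased string."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf_rows(names, n=3):
    """Build one L2-normalised sparse TF-IDF row (a dict) per name.

    Hypothetical helper: each dict plays the role of one sparse matrix row.
    """
    counts = [Counter(ngrams(name, n)) for name in names]
    # Document frequency: in how many names does each n-gram appear?
    doc_freq = Counter(gram for row in counts for gram in row)
    total = len(names)
    rows = []
    for row in counts:
        # Smoothed IDF (an assumption), so n-grams shared by every name keep some weight.
        weighted = {
            gram: tf * (math.log((1 + total) / (1 + doc_freq[gram])) + 1)
            for gram, tf in row.items()
        }
        norm = math.sqrt(sum(w * w for w in weighted.values())) or 1.0
        rows.append({gram: w / norm for gram, w in weighted.items()})
    return rows


def cosine(a, b):
    """Dot product of two normalised sparse rows, which is their cosine similarity."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter row
    return sum(w * b.get(gram, 0.0) for gram, w in a.items())


# Made-up sample data for illustration only.
names = ["McDonalds Corporation", "Mc Donalds Corp", "Burger King Holdings"]
rows = tfidf_rows(names)
print(cosine(rows[0], rows[1]))  # similar names share many trigrams
print(cosine(rows[0], rows[2]))  # these two share no trigrams
```

Because every row is normalised up front, comparing two strings is just a sparse dot product. That is why the whole comparison can be expressed as a sparse matrix multiplication, and it is presumably where most of the speed-up over pairwise edit distances comes from.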

Posted 5th November 2017 at 3:26 pm

Simon Willison’s Weblog
