Super Fast String Matching in Python (via) Interesting technique for calculating string similarity at scale in Python, with much better performance than Levenshtein distances. The trick here uses TF/IDF against N-Grams, plus a CSR (Compressed Sparse Row) scipy matrix to run the calculations. Includes clear explanations of each of these concepts.
Recent articles
- I built an automaton called Squadron - 4th March 2025
- Notes from my Accessibility and Gen AI podcast appearance - 2nd March 2025
- Hallucinations in code are the least dangerous form of LLM mistakes - 2nd March 2025