Simon Willison’s Weblog


Tuesday, 10th July 2018

scrapely. Neat twist on a screen scraping library: this one lets you “train it” by feeding it examples of URLs paired with a dictionary of the data you would like to have extracted from each URL, then uses an instance-based learning algorithm to extract the same fields from new URLs. Slightly confusing name, since it’s maintained by the scrapy team but is a totally independent project from the scrapy web crawling framework.
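
To make the "train on an example, then run against new pages" idea concrete, here is a rough toy sketch in plain Python. It is not scrapely's algorithm or its API: `ToyScraper`, its `context` parameter and the sample HTML strings are all hypothetical names invented for illustration. The sketch works on HTML strings rather than URLs, and it simply remembers the markup surrounding each example value, then looks for the same surroundings in a new page.

```python
import re


class ToyScraper:
    """Toy illustration only: NOT scrapely's actual algorithm or API."""

    def __init__(self, context=20):
        self.context = context  # characters of surrounding markup to remember
        self.templates = []     # one {field: (prefix, suffix)} dict per example

    def train(self, html, data):
        """Record the markup around each example value found in the page."""
        template = {}
        for field, value in data.items():
            start = html.find(value)
            if start == -1:
                continue  # value does not appear in this example page
            end = start + len(value)
            prefix = html[max(0, start - self.context):start]
            suffix = html[end:end + self.context]
            template[field] = (prefix, suffix)
        self.templates.append(template)

    def scrape(self, html):
        """Extract fields from a new page using the remembered surroundings."""
        result = {}
        for template in self.templates:
            for field, (prefix, suffix) in template.items():
                if field in result:
                    continue  # an earlier example already matched this field
                pattern = re.escape(prefix) + r"(.*?)" + re.escape(suffix)
                match = re.search(pattern, html, re.DOTALL)
                if match:
                    result[field] = match.group(1)
        return result


# Hypothetical sample pages sharing one layout.
train_html = ('<html><body><h1 class="title">Blue Widget</h1>'
              '<span class="price">$10</span></body></html>')
new_html = ('<html><body><h1 class="title">Red Gadget</h1>'
            '<span class="price">$25</span></body></html>')

scraper = ToyScraper()
scraper.train(train_html, {"name": "Blue Widget", "price": "$10"})
extracted = scraper.scrape(new_html)
```

A real library has to cope with pages whose markup differs between examples, which is where a proper learning algorithm earns its keep; exact-match context like this breaks as soon as the surrounding HTML changes.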

# 8:25 pm / python, scraping
