3 items tagged “crawling”
2018
Googlebot’s Javascript random() function is deterministic. random() as executed by Googlebot returns the same predicable sequence. More interestingly, Googlebot runs a much faster timer for setTimeout and setInterval—as Tom Anthony points out, “Why actually wait 5 seconds when you are a bot?”
2009
Official Google Webmaster Blog: A proposal for making AJAX crawlable.
It's horrible! The Google crawler would map url#!state
to url?_escaped_fragment_=state
, then expect your site to provide rendered HTML that reflects that state (they even go as far as to suggest running a headless browser within your web server to do this). Just stick to progressive enhancement instead, it's far less hideous. It looks like the proposal may have originated with the GWT team.
2007
robots.txt Adventure. Interesting notes from crawling 4.6 million robots.txt, including 69 different ways in which the word “disallow” can be mis-spelled.