This came as a complete shock to me: due to the Google Mayday update, I have been monitoring how the bot accesses our sites in a futile search for clues as to the page rankings changes. Very surprised to discover googlebot fetching URLs that are _strictly_ available through Ajax only.
Here is a sample request from earlier today that fetched a small worker thread that produces additional product images:
66.249.65.25 - - [03/Jun/2010:11:25:43 +0100] "GET /angles.php?id=102257&version=Brown HTTP/1.1" 200 801 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
We have all spoken of Google’s increased capability of following and deciphering javascript logic but it goes beyond the scope of reasonable effort, in my opinion. Is the scraping done based upon URLs detected in the javascript blocks or does it actually evaluate, run and monitor the calls? If the scraping works based upon regex that catches URLs, would obfuscation / base64 encoding help? Does googlebot support XHR via POST or only via GET? Too many questions without an answer at this point, I will update this post as more info becomes available to me.
Such fetches can often be undesirable–certain AJAX calls we do are tied in with sessions and can cause errors / notifications when being requested w/o due cause, session or event. I am currently trying to find if there is a plausible way to apply a “nofollow” to such calls at all that does not involve a lot of editing and blocking of googlebot from any scripts that it has no place reading. If it comes to it, I will revert all ajax handling to a /xhr/ folder and disallow it in robots.txt, wonder what the best practice is…
As for the Mayday Update, the less said about it, the better. I have never seen Google SERPs get things so very wrong – for instance, ranking disabled products with 0 links anywhere on the net that redirect their traffic instead of what used to be the landing pages. Longtail relevance? With 2k organic in-links to the landing page, PR4 and quality content built over 2 years, this is not bizarre enough…
[…] This post was mentioned on Twitter by Webtogs and Gareth, Dimitar Christoff. Dimitar Christoff said: has #googlebot been reading your #mootools #ajax lately? looking for some advice on how to nofollow XHRs http://bit.ly/cOdMtg […]
[…] beware: googlebot understands javascript, mootools and ajax now … […]
[…] Pełny artykuł na: beware: googlebot understands javascript, mootools and ajax now … […]
[…] Follow this link: beware: googlebot understands javascript, mootools and ajax now … […]
[…] Follow this link: beware: googlebot understands javascript, mootools and ajax now … […]