Follow me: @D_mitar

Most read posts recently



Jun 3rd 2010 × beware: googlebot understands javascript, mootools and ajax now

This came as a complete shock to me. Because of the Google Mayday update, I have been monitoring how the bot accesses our sites, in a futile search for clues about the page ranking changes. I was very surprised to discover googlebot fetching URLs that are _strictly_ available through Ajax only.

Here is a sample request from earlier today that fetched a small worker thread that produces additional product images:

66.249.65.25 - - [03/Jun/2010:11:25:43 +0100] "GET /angles.php?id=102257&version=Brown HTTP/1.1" 200 801 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
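If you want to look for similar hits in your own logs, here is a minimal sketch in Python. It assumes the Apache combined log format seen in the line above; the AJAX_PATHS set and the function name are made up for illustration, so fill in your own Ajax-only endpoints. Bear in mind that the user agent string can be spoofed, so a match only means "claims to be Googlebot".

```python
import re

# Apache combined log format, as in the sample line above.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Assumption: paths that are only ever meant to be requested via Ajax.
AJAX_PATHS = {"/angles.php"}


def googlebot_ajax_hit(line):
    """Return the parsed fields if the line is a Googlebot request
    for one of the Ajax-only paths, otherwise None."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    path = match.group("url").split("?", 1)[0]  # drop the query string
    if "Googlebot" in match.group("agent") and path in AJAX_PATHS:
        return match.groupdict()
    return None
```

Feeding it each line of the access log and keeping the results that are not None gives a list of suspect fetches, with the IP, time and full URL of each.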

We have all spoken of Google’s increased capability of following and deciphering javascript logic, but this goes beyond the scope of reasonable effort, in my opinion. Is the scraping done based upon URLs detected in the javascript blocks or does it actually evaluate, run and monitor the calls? If the scraping works based upon a regex that catches URLs, would obfuscation / base64 encoding help? Does googlebot support XHR via POST or only via GET? Too many questions without an answer at this point. I will update this post as more info becomes available to me.

Such fetches can often be undesirable: certain AJAX calls we do are tied in with sessions and can cause errors / notifications when requested w/o due cause, session or event. I am currently trying to find out whether there is a plausible way to apply a “nofollow” to such calls at all, one that does not involve a lot of editing and blocking of googlebot from any scripts that it has no place reading. If it comes to it, I will move all ajax handling to a /xhr/ folder and disallow it in robots.txt. I wonder what the best practice is…
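To sanity-check the /xhr/ idea before moving anything, Python's standard urllib.robotparser can tell you how a given rule set treats a given path. This is only a sketch: the robots.txt content and the /xhr/angles.php path are hypothetical (the real script currently lives at /angles.php), and it shows how a well-behaved crawler should read the rules, not what googlebot will actually do.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, assuming every Ajax handler now lives under /xhr/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /xhr/
"""


def is_crawlable(path, user_agent="Googlebot"):
    """Return True if the rules above allow user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    for path in ("/xhr/angles.php?id=102257&version=Brown", "/index.php"):
        print(path, is_crawlable(path))
```

Keeping every Ajax endpoint under one prefix means a single Disallow line covers them all, which is the main appeal over blocking scripts one by one.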

As for the Mayday Update, the less said about it, the better. I have never seen Google SERPs get things so very wrong – for instance, ranking disabled products, which have 0 links anywhere on the net and redirect their traffic, instead of what used to be the landing pages. Longtail relevance? With 2k organic in-links to the landing page, PR4 and quality content built over 2 years, “bizarre” is not a strong enough word…


Aug 4th 2009 × Don’t copy and paste my javascript blindly…

Just a quick request – if you go and use snippets of javascript code from this site, great, it’s what it’s there for.

Just please, please, don’t copy and paste my google analytics javascript into your pages as well, it kind of messes up my stats. Thanks in advance.


Mar 19th 2009 × Google Street View goes live in the UK

It’s what it says, really. Now go and find your street and house. No, really, go…


Oct 21st 2008 × An interesting concept: CSS frameworks and SEO source ordered content shuffling

It makes sense to standardise your CSS layout and styles development just as you come to rely upon frameworks for javascript and PHP, I guess. I came upon Elements V2, a nice and tight CSS framework that does not try to do too much but does it well.

If you are in a hurry to produce a new site, it can help you get started on the right path by literally laying down the framework for your content. There is always the danger that such projects try to be too helpful at times: the included Lightbox/prototype is overkill. But the mass CSS reset seems to standardise and level the playing field for all browsers, so it’s worth grabbing just for that feature alone. Naturally, not a first: css reset.

And of course, from an SEO standpoint – you can then take Elements V2’s layout, move the main content div to the top of the source code and the header div to the end of the file. You then reposition via CSS and some nested layers… The end result? The first bit of text that Google sees on the page is the specific body content – H1/2, titles and so forth. Bits that are repetitive, like header and side menus, simply do not need to be up there at all. This technique is also known as Source Ordered Content or content before navigation. Here is an example SEO content shuffle on a search engine optimisation project I am tinkering with.

A snippet of this non-linear approach to source ordered content shuffling (it takes a while to get your head around the idea):

...
</head>
<body>
    <!--Main Container - Centers Everything-->
    <div id="mainContainer">
        <!--Main Content-->
        <div id="mainContent">
            <div class="content" id="main">
                <h1>Welcome to <strong>ASAP Cleaners</strong>...
                ...
                ...
            </div>
            <!--Footer-->
            <div id="footer">
                ...
                ...
            </div>

        </div>
        <!--Header-->
        <div id="header">
            ... links here
            ... more links
        </div>
...

and here is what it looks like in reality:

after SEO shuffle: reversed layer order

I am not going to go into details and bore you with tedious explanations on how to float elements. Just visit the source ordered content example and view the source.
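If you would rather not eyeball the source every time, a small script can confirm the order for you. A rough sketch in Python, assuming the ids from the snippet above (mainContent and header); the class and function names are made up for illustration. It only checks the order in which ids appear in the markup, not how the page ends up looking once the CSS has repositioned everything.

```python
from html.parser import HTMLParser


class IdOrderParser(HTMLParser):
    """Collects every id attribute in the order it appears in the source."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.append(value)


def content_before_navigation(html, content_id="mainContent", nav_id="header"):
    """True if content_id shows up earlier in the source than nav_id."""
    parser = IdOrderParser()
    parser.feed(html)
    parser.close()
    ids = parser.ids
    if content_id not in ids or nav_id not in ids:
        return False
    return ids.index(content_id) < ids.index(nav_id)
```

Pass it the raw HTML of a page; a False result means the navigation still comes first in the source, or that one of the two ids is missing.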

To an extent, I already do this on the http://fragged.org blog itself – what you see as left side menu actually follows behind all the posts in the source code. I cannot really measure how this works in comparison to a normal linear source. I guess some experimentation will be required but that’s for another post :)

Word to the wise: if your pages’ source code gets to be quite big in size, you may consider not using the “content before navigation” method, as your page links may be disregarded by some crawlers (with a 100k or so limit for source size on each page). Also, there is an accessibility consideration: some say source ordered content can be harsh on people with visual impairments who use screen reader software (allegedly, it cannot find the menus), so be careful.
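A quick way to see whether a page is at risk is to measure how far into the source the navigation starts. A minimal sketch in Python: the 100k figure is only the rough number quoted above, not a documented crawler limit, and the marker string assumes the header div from the earlier snippet.

```python
# Rough figure from the paragraph above; an assumption, not a documented limit.
CRAWL_LIMIT_BYTES = 100 * 1024


def navigation_offset(html, nav_marker='<div id="header"'):
    """Byte offset at which the navigation block starts, or -1 if absent."""
    return html.encode("utf-8").find(nav_marker.encode("utf-8"))


def navigation_within_limit(html, limit=CRAWL_LIMIT_BYTES):
    """True if the navigation block begins before the assumed size limit."""
    offset = navigation_offset(html)
    return offset != -1 and offset < limit
```

This only checks where the navigation begins; if the menu itself is long, its last links could still fall past the limit.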