Verifying search bots with forward and reverse DNS in PHP

The following function returns true if the current visitor is one of the bots we consider “good.” (Not that they are better than other legitimate bots; they are simply the ones we provide a special service to.)

This code can be used to allow search engines to index content that is normally behind a registration form. For a good user experience though, it should be used in conjunction with a “first click free” type implementation (which I’m still working on).

In a local WordPress install you can put this in your blog's functions.php file and then call it in the loop as part of the decision to output the_excerpt() or the_content().
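
As a rough sketch of that decision, the choice between the two template tags could be isolated in a small helper. The name choose_post_output() is hypothetical (it is not a WordPress function and not part of the original code); is_user_logged_in(), the_content() and the_excerpt() are standard WordPress functions, and is_good_bot() is the function listed below.

```php
<?php
// Hypothetical helper (not a WordPress function): returns the name of
// the template tag to call for the current visitor.
function choose_post_output($is_good_bot, $is_logged_in)
{
    // Verified bots and registered users get the full text;
    // everyone else gets the excerpt (preview).
    return ($is_good_bot || $is_logged_in) ? 'the_content' : 'the_excerpt';
}

// Inside the loop in a theme template, usage might look like:
//
//   $tag = choose_post_output(is_good_bot(), is_user_logged_in());
//   call_user_func($tag);
```

Keeping the decision in a plain function like this means it can be checked without a running WordPress install or any DNS lookups.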


# Is the remote client (the browser or crawler requesting the page that
# calls this function) one of the search bots that we would like to
# serve alternate content to? (E.g. should they get the full text
# version of content pages, or should we show them the preview +
# registration form?)
function is_good_bot()
{
    
    # to avoid unnecessary lookups, only check if the UA matches one of
    # the bots we like (the header may be missing entirely)
    $ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
    if(
        preg_match("/Yahoo! Slurp/i", $ua) ||
        preg_match("/googlebot/i", $ua) ||
        # for testing purposes, put something from your current user
        # agent string in below
        # preg_match("/2009042315/", $ua) ||
        preg_match("/msnbot/i", $ua)
        )
    {
    
        # user agent contains one of the magic phrases, now do a
        # forward and reverse DNS check.
        # each of the search providers that we use asserts that their
        # bot hostnames will always end in the strings in the
        # preg_match calls below.
        # checking both forward and reverse makes IP address / hostname
        # spoofing very hard.
        $ip = $_SERVER['REMOTE_ADDR'];
        # on lookup failure gethostbyaddr() returns the IP unchanged,
        # which the hostname suffix checks below will reject
        $hostname = gethostbyaddr($ip);
        # note: gethostbyname() resolves IPv4 addresses only
        $ip_by_hostname = gethostbyname($hostname);
        if ($ip_by_hostname === $ip) {
            if(
                preg_match("/\.googlebot\.com$/", $hostname) ||
                preg_match("/\.search\.msn\.com$/", $hostname) ||
                # testing: enter your hostname here
                # preg_match("/\.example\.com$/", $hostname) ||
                preg_match("/\.crawl\.yahoo\.net$/", $hostname)
                )
            {
                # good bot. 
                return true;
            } else {
                # bad bot, and possible bad person all around.
                return false;
            }
        } else {
            # bad bot, and possible bad person all around.
            return false;
        }

    } else {
        # If the UA of a preferred bot isn't present, just skip the 2x
        # DNS checks
        return false;
    }     
}
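
The forward and reverse check in the function above can also be sketched as a standalone function with injectable resolvers, so the logic can be exercised without real DNS traffic. This is a variation, not the original method: the name verify_bot_host(), its parameters and its defaults are illustrative assumptions, and it swaps gethostbyname() for gethostbynamel(), which returns every IPv4 address for a hostname rather than just one. Suffixes are assumed to be lowercase with a leading dot.

```php
<?php
// Sketch of a forward-confirmed reverse DNS check. The function name,
// parameters, and defaults are assumptions made for illustration.
function verify_bot_host($ip, array $suffixes, $reverse = 'gethostbyaddr', $forward = 'gethostbynamel')
{
    // Reverse lookup: IP -> hostname. gethostbyaddr() returns the
    // unmodified IP on failure and false on malformed input.
    $hostname = call_user_func($reverse, $ip);
    if (!is_string($hostname) || $hostname === $ip) {
        return false;
    }

    // The hostname must end in one of the trusted suffixes
    // (expected in lowercase, e.g. ".googlebot.com").
    $trusted = false;
    foreach ($suffixes as $suffix) {
        if (substr(strtolower($hostname), -strlen($suffix)) === $suffix) {
            $trusted = true;
            break;
        }
    }
    if (!$trusted) {
        return false;
    }

    // Forward lookup: hostname -> list of IPv4 addresses. The original
    // IP must be among them, otherwise the reverse record was spoofed.
    $ips = call_user_func($forward, $hostname);
    return is_array($ips) && in_array($ip, $ips, true);
}
```

With the default resolvers, a call such as verify_bot_host($_SERVER['REMOTE_ADDR'], array('.googlebot.com', '.search.msn.com', '.crawl.yahoo.net')) would perform the same two lookups as is_good_bot(), minus the user agent pre-check.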