“Stay hungry. Stay foolish.”

Image credit: ING Direct via Facebook

“The spark that ignited their partnership was provided by Mr. Wozniak’s mother. Mr. Wozniak had graduated from high school and enrolled at the University of California, Berkeley, when she sent him an article from the October 1971 issue of Esquire magazine. The article, “Secrets of the Little Blue Box,” by Ron Rosenbaum, detailed an underground hobbyist culture of young men known as phone phreaks who were illicitly exploring the nation’s phone system.” – John Markoff, The New York Times (Last accessed October 6, 2011)

The History of Phone Phreaking Blog has a scanned copy of the October 1971 Esquire magazine article “Secrets of the Little Blue Box.” (PDF 9.8 MB)

There’s an HTML / text version of “Secrets of the Little Blue Box” at lospadres.info.

Verifying search bots with forward and reverse DNS in PHP

The following function returns true if the current visitor is one of the bots we consider “good.” (Not that these bots are better than other legitimate bots; they are simply the ones we provide a special service to.)

This code can be used to allow search engines to index content that is normally behind a registration form. For a good user experience though, it should be used in conjunction with a “first click free” type implementation (which I’m still working on).

In a local WordPress install you can put this in your theme’s functions.php file and then call it in The Loop as part of the decision to output the_excerpt() or the_content().


# Is the remote client (the browser or bot requesting the page that
# calls this function) one of the search bots that we would like to
# serve alternate content to? (e.g. should they get the full text
# version of content pages, or should we show them the preview +
# registration form?)
function is_good_bot()
{
    
    # to avoid unnecessary lookups, only check if the UA matches one of
    # the bots we like (some clients send no User-Agent header at all)
    $ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
    if(
        preg_match("/Yahoo! Slurp/i", $ua) ||
        preg_match("/googlebot/i", $ua) ||
        # for testing purposes, put something from your current user
        # agent string in below
        # preg_match("/2009042315/", $ua) ||
        preg_match("/msnbot/i", $ua)
        )
    {
    
        # user agent contains one of the magic phrases, now do a
        # forward and reverse DNS check
        # each of the search providers that we use asserts that their
        # bot domains will always 
        # end in the strings in the below preg_match(es)
        # check forward/reverse to make IP address / hostname spoofing
        # very hard.
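        $ip=$_SERVER['REMOTE_ADDR'];
        # gethostbyaddr() returns false on malformed input and the
        # unmodified IP when the lookup fails; neither can match the
        # hostname patterns below.
        $hostname=gethostbyaddr($ip);
        # note: gethostbyname() only resolves IPv4 addresses
        $ip_by_hostname=gethostbyname($hostname);
        if ($ip_by_hostname === $ip) {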
            if(
                preg_match("/\.googlebot\.com$/", $hostname) ||
                preg_match("/\.search\.msn\.com$/", $hostname) ||
                # testing: enter your hostname here
                # preg_match("/example\.com$/", $hostname) ||
                preg_match("/\.crawl\.yahoo\.net$/", $hostname)
                )
            {
                # good bot. 
                return true;
            } else {
                # bad bot, and possible bad person all around.
                return false;
            }
        } else {
            # bad bot, and possible bad person all around.
            return false;
        }

    } else {
        # If the UA of a preferred bot isn't present, just skip the 2x
        # DNS checks
        return false;
    }     
}
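
The forward and reverse check at the heart of the function is easier to test when the DNS lookups can be swapped out. Below is a minimal sketch of that idea, not a drop-in replacement. The helper names hostname_has_suffix() and is_verified_crawler_ip() are hypothetical, and so are the optional $reverse and $forward parameters, which default to PHP's own gethostbyaddr() and gethostbyname(). It matches domain suffixes on a label boundary instead of using a regular expression.

```php
<?php
// Hypothetical helpers for illustration; the names are not from the
// original function above.

// True when $hostname ends in one of the allowed domain suffixes,
// matched on a label boundary: "crawl-1.googlebot.com" passes,
// "evilgooglebot.com" does not.
function hostname_has_suffix($hostname, array $suffixes)
{
    $hostname = strtolower(rtrim($hostname, '.'));
    foreach ($suffixes as $suffix) {
        $needle = '.' . strtolower($suffix);
        if (substr($hostname, -strlen($needle)) === $needle) {
            return true;
        }
    }
    return false;
}

// Forward-confirmed reverse DNS: the IP must resolve to a hostname in
// an allowed domain, and that hostname must resolve back to the same IP.
// $reverse and $forward default to PHP's resolver functions but can be
// replaced with fakes in tests.
function is_verified_crawler_ip($ip, array $suffixes, $reverse = 'gethostbyaddr', $forward = 'gethostbyname')
{
    $hostname = call_user_func($reverse, $ip);
    if (!is_string($hostname) || $hostname === '' || $hostname === $ip) {
        return false; // reverse lookup failed
    }
    if (!hostname_has_suffix($hostname, $suffixes)) {
        return false; // hostname is not in a crawler domain
    }
    // forward lookup must round-trip to the original IP
    return call_user_func($forward, $hostname) === $ip;
}
```

With the defaults, a call such as is_verified_crawler_ip($_SERVER['REMOTE_ADDR'], array('googlebot.com', 'search.msn.com', 'crawl.yahoo.net')) performs the same two lookups as is_good_bot(). Passing closures instead lets you exercise the logic without touching DNS.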

Processing WordPress’ Post Content

If you find yourself directly manipulating a WordPress post (say outside of “The Loop” by using query_posts, e.g. query_posts(array('category__and' => array(1,3)));) you may need to do some extra work to get the pretty formatting that WP does for you in things like the_content() and the_excerpt().

Say you have a single post in a local variable $my_post. If you want to output its content with the same formatting that WP applies when you call the_content() within The Loop, you can call the filter functions directly:

<?php echo wpautop(wptexturize($my_post->post_content)); ?>

References: How WordPress Processes Post Content, Function Reference/wptexturize, Function Reference/wpautop, formatting.php source code