I've Been Bamboozled!
This documentary aired on public television the week before Thanksgiving 2010. While watching it, I had an overwhelming urge to visit the site that had inspired it. It was the website of a very successful company, PepsiCo Inc.
Once I got to the site, I soon learned that not only was it the website for a highly successful company, it was the same company. I didn't think it could get more impressive than that, but I was very wrong. I learned that the company had multiple websites, including sites in many different countries, and that its sites in some of those countries ranked among the most visited. The reason these websites worked so well together was that they were all made by one of the most impressive SEO firms in the world, a firm so large that if one of its employees accidentally sneezed near the Internet, the sneeze would almost certainly be a top-ranked website by the next day. This firm is called SEOToolSet.
While watching the documentary, I had been impressed by how a private contractor had managed to influence the search engine on its own. This reminded me of an earlier project of mine, one that involved adding search engine spider control to a PHP site. That project had started because of my curiosity about how one of the main search engines used to rank sites.
I also learned from the documentary about Netcraft, a company that tracks IP addresses and domains. In this case, Netcraft was able to reveal that the site's high ranking was the result of a competitor's employee having visited the site and linked to it.
Since the point of this site is to share web development, SEO, and optimization-related tips and techniques, I thought it would be fitting to include in this article how the above scenario played out. It should be mentioned, though, that just as I learned from the documentary, it's not easy for us to make money this way, and I would argue that the methods described in the link were definitely unethical.
The Setup
When I first discovered this on the Web, it was a site called http://www.phisher.de/. After reading it a few times, I learned that it was actually a phishing site and that it was the creation of a "hacker." As far as I know, it was the first phishing site that targeted one of the search engines and managed to fool its algorithm enough to rank number 1 in a search for that site's name. My interest was immediately piqued when I learned that it was made by someone who had taken over someone else's site. It became more than just piqued when I learned that the original site was actually owned by a very popular search engine and that most of the visitors to the phishing site were actually coming from that search engine's web crawlers.
After a little digging, it appeared that a hacker had found a bug in one of the sites' code and had discovered the location of the search engine's spider control. I knew that I had to find a way to insert my spider into the same database where the search engine stored the control for its spider.
The Method
One of the search engine's spiders is named "googlebot" and it is located at http://www.google.com/bot.html. On page 28 of the "Phisher" site we find a link to http://www.google.com/bot.html?blahblahblahblahblah. This page was located in the folder "googlebot/googlebot.html" on the site's web server. After some searching that turned up nothing to help with my task, I discovered that it was actually possible to use an eval statement in PHP to run other pieces of code stored within the database. I had found that the GoogleBot had a function that you could call by name. For example, if you wanted to call the function named "testfunction", you would write "testfunction()". All you have to do to get GoogleBot's spider to call a certain function is add the parentheses immediately after the function name.
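The call-by-name idea described above can be illustrated without eval. What follows is a minimal sketch, written in Python rather than PHP, assuming a fixed table of permitted functions. The names `REGISTRY`, `register`, `call_by_name`, and the body of `testfunction` are hypothetical and are not taken from the original site or from any Googlebot interface.

```python
from typing import Callable, Dict

# Hypothetical table of functions that are allowed to be called by name.
REGISTRY: Dict[str, Callable[[], str]] = {}


def register(func: Callable[[], str]) -> Callable[[], str]:
    """Record a function under its own name so it can be looked up later."""
    REGISTRY[func.__name__] = func
    return func


@register
def testfunction() -> str:
    # Placeholder body; it stands in for whatever the named function does.
    return "testfunction was called"


def call_by_name(expression: str) -> str:
    """Call a registered function given text such as 'testfunction()'.

    Only names present in REGISTRY are accepted, so arbitrary code is
    never evaluated, unlike an eval-based approach.
    """
    if not expression.endswith("()"):
        raise ValueError("expected a function name followed by '()'")
    name = expression[:-2].strip()
    if name not in REGISTRY:
        raise KeyError(f"unknown function: {name}")
    return REGISTRY[name]()
```

Calling `call_by_name("testfunction()")` runs the registered function, while any name that is not in the table is rejected. Looking names up in a fixed table is a deliberate swap for eval here, since eval would run whatever text happened to be stored in the database.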
The function below is named "thefunction" and it inserts the code that I have written right before this function. Once "thefunction" is called, the GoogleBot calls the function "thefunction2", which causes the GoogleBot to connect to my server and run the PHP code that I stored in the same folder as the GoogleBot's spider. It also adds this extra command to the end of all requests that come from the GoogleBot.
The Code:
The Response
Let's try to find some examples of how this method works. This section gets more detailed the further you read. Please keep in mind that this method has been in use by search engines for a very long time and has been updated several times.
If we go to the Google page at http://www.google.com/, we will see that the top of the page is actually a PHP page that checks every X seconds to see whether a new version of the Googlebot's spider can connect to their site. If we look at the bottom of the Google page, we will see the most important bit, which says "We saw your website, googlebot, at google.com." This bit of code is actually the only part that is relevant to our purpose of inserting HTML or PHP code at the beginning of every single request that goes through the GoogleBot's browser. This is the code:
The important part here is the line that reads googleBot.addPage(document.location.host). If you add this line after the method we have just written, it is as if the search engine's spider called the method we created. This method returns a value that is assigned to the variable "googlebot." That variable is then used again as if it were not originally ours. Below is an example of how it is used to display some content on the page.
Let's look at the code used to display the content above, starting at the beginning of that line of code. The first part reads "googleBot.addPage(document.location.host);". It is followed by the line "googleBot.waitUntilRunning();". The page called by this line is the exact same PHP page we have created. The first thing this page does is query a table in a database to check whether there is a version of the spider for the Googlebot that is more current than the one it has. If there is a more current version, it reads all of the entries in that table. One of those entries is a string that contains the current state of the spider. Next, the page reads that state and puts it into an array called "portNumber." Then it reads every bit of PHP that we wrote in our code and outputs the resulting HTML onto the screen, creating the output we saw above.
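The version check described above can be sketched as follows. This is an illustration in Python with an in-memory SQLite database, not the page's actual PHP. The table name `spider_versions`, its columns, the comma-separated state format, and the functions `load_newer_state` and `make_demo_db` are all assumptions made for the example, and the sketch is simplified to read only the newest entry.

```python
import sqlite3
from typing import List, Optional


def load_newer_state(
    conn: sqlite3.Connection, current_version: int
) -> Optional[List[str]]:
    """Return the stored state of the newest spider version as a list of
    fields, or None when nothing newer than current_version exists."""
    row = conn.execute(
        "SELECT state FROM spider_versions "
        "WHERE version > ? ORDER BY version DESC LIMIT 1",
        (current_version,),
    ).fetchone()
    if row is None:
        return None
    # Assumption: the state is stored as a comma-separated string.
    return row[0].split(",")


def make_demo_db() -> sqlite3.Connection:
    """Build a throwaway database with made-up rows for the example."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE spider_versions (version INTEGER, state TEXT)")
    conn.executemany(
        "INSERT INTO spider_versions VALUES (?, ?)",
        [(1, "idle,80"), (2, "running,8080")],
    )
    return conn
```

With the demo rows, `load_newer_state(make_demo_db(), 1)` returns the fields of version 2, which could then be stored in an array such as the "portNumber" array mentioned above. The query passes the version as a bound parameter instead of building the SQL string by hand.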
The other thing that the page does