Search Engines and Their Algorithms (2010 – 2012)

Search engines (SEs) are designed to search for information on the web. Search results consist of web pages, images, and other kinds of files. SEs are regarded as a silent public relations firm that works quietly in the background and helps visitors find what they are looking for. According to Search Engine Watch, about 625 million searches are performed every day, and between 40% and 80% of visitors find the information they need by using SEs. More than 100 SEs have been developed and launched; some have closed, while others still provide their services. The three best-known search engines are Google, Yahoo, and Bing, and of these Google holds the dominant market share [4]. The latest research survey indicated that Google is responsible for generating 91% of traffic, ahead of Yahoo, with 3% coming from Bing and 2% from all other SEs. For this reason, all of the empirical evaluation is performed on Google. In order to rank both experimental websites, it is important to review the algorithm updates that Google launched up to 2012. The major Google algorithm updates are discussed in the following subsections.

Penguin Updates

Google Penguin is the code name of a Google algorithm update that was first announced on April 24, 2012, affecting “3.1% of queries”. The aim of this update was to lower the search engine rankings of websites that violate Google’s webmaster guidelines by using black hat SEO techniques such as keyword stuffing, cloaking, link exchange, creation of duplicate content, and a number of other spam factors [36] [37] [38] [39]. This algorithm update is associated with the “over-optimization penalty” [40].
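
Keyword stuffing, one of the techniques named above, can be illustrated with a simple keyword-density check. The following is a minimal sketch under stated assumptions, not Google's method: the function names and the 10% threshold are hypothetical and chosen only for illustration, since Google does not publish such a figure.

```python
import re
from collections import Counter


def keyword_density(text: str, keyword: str) -> float:
    """Return the fraction of words in `text` that equal `keyword` (case-insensitive)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return Counter(words)[keyword.lower()] / len(words)


# Hypothetical threshold, for illustration only; not a published Google value.
STUFFING_THRESHOLD = 0.10


def looks_stuffed(text: str, keyword: str, threshold: float = STUFFING_THRESHOLD) -> bool:
    """Flag text whose density for `keyword` exceeds the illustrative threshold."""
    return keyword_density(text, keyword) > threshold


stuffed = "cheap shoes cheap shoes buy cheap shoes online cheap"
natural = "We sell a range of running shoes at prices that suit most budgets."

print(looks_stuffed(stuffed, "cheap"))
print(looks_stuffed(natural, "cheap"))
```

A real spam classifier would combine many signals; density alone is shown here only because it makes the idea of "stuffing" concrete.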

Google claims that using white hat SEO techniques improves the usability of a site, helps in creating great content, and makes the site faster, which benefits both users and search engines. Moreover, Google intends its algorithm changes to help searchers find sites that provide a good user experience and fulfill their information needs [41].

Google pushed the second, minor Penguin update on May 25, 2012. In other words, Google refreshed the first Penguin update. It affected less than 0.1% of English searches. It has been observed that all web directories were removed from the search results [42]. After releasing this minor Penguin data update, Google released the third major Penguin update on Oct 05, 2012. This update affected “0.3% of English queries” and “0.4% of non-English queries” [43].

Panda Update

Google’s first Panda/Farmer update was released on Feb 23, 2011 and affected up to 12% of search results. This update hit sites with high ad-to-content ratios, content farms, and sites with a number of other quality issues [44]. Search engineers at Google acknowledged that some companies (Mahalo, Suite101) suffered dramatic losses, while some established sites known for high-quality information gained [45]. Until Nov 2011, Google rolled out several minor updates/versions of the Panda algorithm. Importantly, Google followed up each launch with periodic updates and gathered user responses through research and through techniques such as letting searchers block websites directly from the SERPs [44]. Google claimed that it received many positive responses from searchers.

Google researchers argued that, before the launch of the Panda algorithm, the ranking algorithm was not able to go deep enough into the “long tail” of low-quality websites to return high-quality results [46]. With the Panda updates, Google aimed to improve the user experience by displaying the most relevant pages on the web and to help the broader web ecosystem [47]. Moreover, researchers argued that Google Panda is more a ranking factor than an algorithm update. The core ranking algorithm changed considerably in 2011.

With the Panda update released on August 12, 2011, Google rolled Panda out internationally, both for English-language queries globally and for non-English queries except Chinese, Japanese, and Korean. Google reported that this affected 6-9% of queries in the affected countries [44] [48]. In 2012, 14 minor updates were released, and in 2013 two Panda updates have been released so far. Researchers argued that Panda relates to the concept of PageRank, which is a value that feeds into the overall Google algorithm. On this view, every site can be thought of as receiving a Panda Rank score. Sites with a low Panda score pass through smoothly, whereas the ranking algorithm affects websites that have a high Panda score [49].

Caffeine (Rollout)

According to Google [50], after several months of testing, Google finished rolling out the Caffeine infrastructure. Caffeine not only boosted Google’s raw speed but also integrated crawling and indexing more tightly, resulting in a 50% fresher index.

Web Page Authority vs. Human Authority

This is how authority works when it comes to web pages. Some pages are viewed as more trustworthy than others. If those pages link to other pages, the pages they point to gain reputation in Google’s and Bing’s ranking systems. Similarly, a highly authoritative profile on Twitter passes credit to the URLs tweeted from that profile, which helps them rank [51].
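
The idea that trusted pages pass reputation to the pages they link to can be sketched with a simplified PageRank-style power iteration. This is an illustrative sketch only: the production ranking systems of Google and Bing are not public, and the link graph, the page names, and the damping factor of 0.85 used here are assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank: `links` maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with the baseline "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page link graph.
graph = {
    "hub": ["a", "b"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "a"],
}

ranks = pagerank(graph)
for page, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(f"{page}: {score:.3f}")
```

In this toy graph, "hub" ends up with the highest score because every other page links to it, and "a" outranks "b" because it receives one additional inbound link.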

Google’s real-time search, a feature of Google Search, provided results that sometimes included real-time information from sources such as Twitter, Facebook, blogs, and news websites, whereas Bing incorporated social content into its searches. Moreover, Google now gives more importance to Author Rank and Social Rank than to PageRank [52]. Social ranking and the number of shares of an article clearly affect rankings in the Google News algorithm and in Bing. Author quality also matters for ranking highly in both Google and Bing.

Overstock.com Penalty

Overstock.com is an American online retailer that was founded in 1999 and is known for its discount merchandise. It has had 10 million unique visitors since December 2010. It offered discounts of 10% on some merchandise to students and faculty. In exchange, it asked college and university websites to embed links for certain keywords like “bunk beds” or “gift baskets” to Overstock product pages. The links to Overstock pages were among the top three results for such words in Google search results. By Tuesday afternoon, links to Overstock for those same searches had dropped to No. 40 and No. 70 in the rankings.

Google’s guidelines forbid websites from paying other sites to embed certain links on their pages. Many schemes intended to trick Google’s search algorithm rely mainly on .edu links. In summary, it was a penalty for placing site-wide links to non-relevant websites [53].

Freshness Update

Google completed the Caffeine web indexing system in the previous year (2010). It allowed Google to crawl and index fresh content quickly on an enormous scale. At the launch of this algorithm update, Google claimed that “We are making a significant improvement to our ranking algorithm and this update would impact about 35% of the queries”. After that update, time-sensitive websites had to ensure the freshness of their content in order to rank on the first page [54]. Recent information can be from the last week, the last day, or even the last few minutes, as with breaking news or a TV show. The need for fresh content can be illustrated with an analogy: like warm cookies right out of the oven or refreshing fruit in summer, search results are best when they are fresh. After the launch of Caffeine, Google provided fresh results even when the user does not ask for them in the search query [54] [55].
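
One way to picture how freshness could influence ranking is to weight a relevance score by an exponential time decay. The sketch below is an assumption made for illustration, not Google's freshness algorithm: the half-life of 24 hours, the relevance values, the document titles, and the timestamps are all hypothetical.

```python
from datetime import datetime, timedelta


def freshness_weight(published, now, half_life_hours=24.0):
    """Return a weight in (0, 1] that halves every `half_life_hours` of document age."""
    age_hours = max((now - published).total_seconds() / 3600.0, 0.0)
    return 0.5 ** (age_hours / half_life_hours)


def rank_by_freshness(docs, now, half_life_hours=24.0):
    """Order (title, relevance, published) tuples by relevance times freshness weight."""
    scored = [
        (relevance * freshness_weight(published, now, half_life_hours), title)
        for title, relevance, published in docs
    ]
    return [title for _, title in sorted(scored, reverse=True)]


# Hypothetical query time and documents.
now = datetime(2011, 11, 3, 12, 0)
docs = [
    ("last week's recap", 0.9, now - timedelta(days=7)),
    ("breaking story", 0.7, now - timedelta(minutes=30)),
    ("yesterday's report", 0.8, now - timedelta(hours=24)),
]

print(rank_by_freshness(docs, now))
```

With these assumed values the half-hour-old story ranks first even though its relevance score is the lowest, which mirrors the preference for recent content in time-sensitive queries.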