
The Google Goal Of Indexing 100 Billion Web Pages

Google's Goal of Quality Search

Sergey Brin and Lawrence Page's paper 'The Anatomy of a Large-Scale Hypertextual Web Search Engine' makes it evident that Google's goal has always been to be one of the best search engines in terms of the quality of its results. Brin and Page knew, however, that to achieve this, Google needed to store information efficiently and cost-effectively and to have excellent crawling, indexing, and sorting techniques. Google aimed not only to return quality results but also to return them as fast as possible. Google started as a high-quality search engine and is still widely regarded as the best search engine today. It has stayed true to its original intent: a search engine that not only crawls and indexes the web efficiently but also produces more satisfying results than other existing search engines.

To stay true to their goal of providing the best search results, Google knew from the start that the search engine had to be designed to keep up with the web's growth. According to Brin and Page, "In designing Google we have considered both the rate of growth of the Web and technological changes. Google is designed to scale well to extremely large data sets. It makes efficient use of storage space to store the index". They knew they would need a great deal of space to store an ever-growing index.

Google's index, which started out at 24 million web pages, was large for its time and has grown to around 25 billion web pages, still keeping Google ahead of its competitors. However, Google is not a company that settles for just beating its competitors. It aims to give its users the best service possible, and for a search engine that means giving users access to all, or at least most, of the quality information available on the web.

Google's New System for Indexing More Pages

As mentioned earlier, Google aims to give access to even more information and has devoted considerable time and effort to this goal. The new patent entitled 'Multiple Index Based Information Retrieval System', filed by Google employee Anna Patterson, might be the answer to the problem. Filed back in January 2005 and published in May 2006, the patent suggests that Google may be aiming to expand its index to 100 billion web pages or even more.

According to the patent, conventional information retrieval systems, more commonly known as search engines, can index only a small part of the documents available on the Internet. By some estimates, the Internet contained around 200 billion web pages as of last year; however, Patterson claimed that even the best search engine (that is, Google) indexed only 6 to 8 billion of them. This disparity between the number of indexed pages and existing pages clearly signaled the need for a new kind of information retrieval system. Conventional systems simply were not capable of the job: they could not index enough web pages to give users access to a meaningful share of the information available on the web.

The Multiple Index Based Information Retrieval System, however, is up to the challenge and is Google's answer to the problem. Two characteristics make the new system stand out from conventional systems. The first is its "capability to index an extremely large number of documents, on the order of a hundred billion or more". The second is its capability to "index multiple versions or instances of documents for archiving...enabling a user to search for documents within a specific range of dates, and allowing date or version related relevance information to be used in evaluating documents in response to a search query and in organizing search results." With the system developed by Patterson, Google gains the ability to expand its index to enormous proportions, as well as to improve document analysis and processing, document annotation, and ranking based on the phrases a document contains and on anchor phrases.
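The idea of indexing multiple dated versions of a document, so that a search can be limited to a range of dates, can be illustrated with a toy example. The sketch below is a hypothetical, in-memory illustration and not the design described in Patterson's patent: the class `VersionedIndex`, its methods, and the `example.com` URL are invented for this example. It keeps every crawled version of a page together with its crawl date and maps each term to (URL, date) pairs, so a query can be restricted to versions crawled within a given period.

```python
from collections import defaultdict
from datetime import date


class VersionedIndex:
    """Toy inverted index that keeps every crawled version of a document.

    Hypothetical illustration only; not the design described in the patent.
    """

    def __init__(self):
        # term -> set of (url, crawl_date) pairs whose text contains the term
        self.postings = defaultdict(set)

    def add_version(self, url, crawl_date, text):
        # Index one version of a page; older versions are kept, not replaced.
        for term in text.lower().split():
            self.postings[term].add((url, crawl_date))

    def search(self, term, start, end):
        # Return (url, crawl_date) hits whose version was crawled within
        # [start, end], newest version first.
        hits = [
            (url, crawled)
            for url, crawled in self.postings.get(term.lower(), set())
            if start <= crawled <= end
        ]
        return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    index = VersionedIndex()
    index.add_version("example.com/news", date(2004, 6, 1), "index size eight billion")
    index.add_version("example.com/news", date(2005, 9, 27), "index size 1000 times larger")
    # Only the 2005 version contains "larger", so only it matches in 2005.
    print(index.search("larger", date(2005, 1, 1), date(2005, 12, 31)))
```

A real system at the scale of a hundred billion documents would partition postings across many machines and indexes; this toy only shows how keeping dated versions makes date-range queries possible.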

History of Google's Index Size

Google started out with an index of around 24 million web pages in 1996. By August 2000, Google had grown its index more than fortyfold, to approximately one billion web pages. In September 2003, Google's front page boasted an index of 3.3 billion web pages. Microdoc, however, revealed that Google had actually indexed more than five billion web pages by then. In its article 'Google Understates the Size of Its Database', Microdoc argued that Google specialized not only in simplicity but also in understating its power and complexity. Google was still staying ahead of its competitors and continued to surprise everyone with what it had up its sleeve.

As Google's index continued to grow, the number on its front page grew impressively large as well, before it plateaued at eight billion web pages. This was around the time that Patterson filed the new patent. Then in 2005, amid growing controversy over index sizes, Google stopped publishing a count and simply claimed that its index was three times larger than its nearest competitor's. Google also maintained that what mattered was not just the number of indexed pages but how relevant its results were. In September 2005, as part of Google's 7th anniversary, Anna Patterson, the same software engineer who filed the patent on the Multiple Index Based Information Retrieval System, posted an entry on Google's official blog stating that the index was now 1,000 times larger than the original index. This puts the index at around 24 billion web pages, about a quarter of Google's goal of indexing 100 billion web pages. It seems, then, that Google started using the new system in mid-2005. With the new system in place, we can only wait and see how quickly Google reaches its goal of 100 billion web pages. Most likely, though, once Google reaches that goal it will set an even higher one in order to keep providing quality service.
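The index-size figures quoted in this history can be checked with simple arithmetic. This short sketch uses only the approximate page counts mentioned in the article (24 million pages originally, about one billion by 2000, the "1,000 times larger" claim of September 2005, and the 100-billion-page goal); the variable names are illustrative.

```python
# Approximate page counts as quoted in the article.
ORIGINAL_INDEX = 24_000_000        # original index, ~24 million pages
INDEX_2000 = 1_000_000_000         # ~1 billion pages by 2000
GROWTH_2005 = 1_000                # "1,000 times larger than the original" (Sept. 2005)
GOAL = 100_000_000_000             # the 100-billion-page goal

# Index size implied by the 2005 blog claim.
index_2005 = ORIGINAL_INDEX * GROWTH_2005

print(f"Growth from 24 million to 1 billion: {INDEX_2000 / ORIGINAL_INDEX:.1f}x")
print(f"Index implied by the 2005 claim: {index_2005:,} pages")
print(f"Share of the 100-billion goal: {index_2005 / GOAL:.0%}")
```

Run as written, it reports roughly 41.7-fold growth between the original index and 2000, an implied 2005 index of 24,000,000,000 pages, and 24% of the 100-billion goal, which matches the "about a quarter" estimate.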

Use of this web site constitutes acceptance of the Terms Of Use and Privacy Policy | User published content is licensed under a Creative Commons License.
Copyright © 2005-2008 Free Articles by ArticlesBase.com, All rights reserved.