The World Wide Web (or the “Web”) is a huge repository of information stored on a vast number of Web Servers across the Internet. This information is available in various forms, such as websites, images, sound, videos, databases and much more. The hypertext documents (or pages) on the Web are interlinked in such a way that the structure resembles a spider’s web. The term Internet is sometimes mistakenly used as a synonym for the WWW. In fact, the Web is a service that runs on top of the Internet, although it is the driving force behind the Internet.
The Web has grown from a few thousand pages in the early 1990s to more than two billion pages at present, and searching for information is a primary activity on the Web. Google is one of the most popular Search Engines used to locate and retrieve information from the Web. To view the retrieved information, a Web Browser is used: a software application that allows the user to view and interact with the text, images and other information present on a webpage. Some popular Web Browsers are Internet Explorer, Mozilla Firefox, Google Chrome and Netscape Navigator. The Web Browser that makes the request for a Web resource is referred to as the Web Client, and the destination server that serves the requested resource (such as an HTML file or an image) is referred to as the Web Server.
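The Web Client and Web Server roles can be illustrated with Python's standard library. The sketch below is a minimal, hypothetical example rather than how any particular browser or server works: `HelloHandler` plays the Web Server and answers every GET request with a small HTML page, while `fetch` plays the Web Client, the part of a browser that requests a resource. The server listens only on 127.0.0.1 on a port chosen by the operating system, so no real website is contacted.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Web Server role: answers every GET request with a small HTML page."""

    def do_GET(self):
        body = f"<html><body>You requested {self.path}</body></html>".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging for this demo


def fetch(url):
    """Web Client role: send an HTTP GET and return (status, content type, body)."""
    # An empty ProxyHandler makes the request go directly, bypassing any proxy settings.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url) as resp:
        return resp.status, resp.headers.get_content_type(), resp.read().decode("utf-8")


def demo():
    # Port 0 asks the OS for any free port; the real port is read back afterwards.
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch(f"http://127.0.0.1:{port}/index.html")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

Here `demo()` returns a tuple whose first two items are the status code `200` and the content type `text/html`; a real browser would additionally render the returned HTML for the user.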
Present-day General Purpose Search Engines cover only a small portion of the Web, called the Publicly Indexable Web (PIW) or Surface Web. This is the set of web pages reachable purely by following hyperlinks, ignoring search forms and pages that require authorization or prior registration. When a Search Engine's crawler visits a Web Server, it gathers information from the web pages it can reach through such links, but it cannot penetrate deeper into the server to access its databases or use the web services it offers. The “Invisible Web” or “Hidden Web” is the part of the Web that cannot be crawled and indexed by traditional Search Engines. The Hidden Web contains high-quality information hidden behind search interfaces (such as HTML forms), which can be retrieved by a Hidden Web Crawler. This information is well suited to querying and can yield high-quality responses to user queries. It has been estimated that the Hidden Web is about 500 times the size of the Surface Web (PIW).
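To make the Surface Web / Hidden Web distinction concrete, here is a toy Python sketch over a hypothetical in-memory site; `SITE`, `BOOK_DB`, `serve` and the keyword list are all invented for illustration. A link-only crawler reaches the static pages but never the search-result pages, which exist only as responses to a form submission. A simplified Hidden Web step then fills the form's text field with candidate keywords to retrieve them. Real Hidden Web Crawlers must also decide which forms to fill and which values to submit; this sketch reduces that problem to a fixed keyword list.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical site: URL path -> HTML. The result pages are never linked;
# they exist only as responses to submissions of the form on /books.
SITE = {
    "/": '<a href="/about">About</a> <a href="/books">Books</a>',
    "/about": '<p>About us</p> <a href="/">Home</a>',
    "/books": '<form action="/search" method="get">'
              '<input type="text" name="q"></form>',
}
# Hypothetical back-end database that only the search form can query.
BOOK_DB = {"python": ["Learning Python"], "web": ["Weaving the Web"]}


def serve(url):
    """Toy Web Server: static pages from SITE, dynamic result pages for /search."""
    parts = urlsplit(url)
    if parts.path == "/search":
        q = parse_qs(parts.query).get("q", [""])[0]
        items = "".join(f"<li>{t}</li>" for t in BOOK_DB.get(q, []))
        return f"<ul>{items}</ul>"
    return SITE.get(url)


class PageParser(HTMLParser):
    """Collects hyperlink targets and simple form descriptions from one page."""

    def __init__(self):
        super().__init__()
        self.links, self.forms = [], []
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])
        elif tag == "form":
            self.forms.append({"action": attrs.get("action", ""), "fields": []})
            self._in_form = True
        elif tag == "input" and self._in_form and "name" in attrs:
            self.forms[-1]["fields"].append(attrs["name"])

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


def parse(html):
    parser = PageParser()
    parser.feed(html)
    return parser


def surface_crawl(start="/"):
    """Breadth-first crawl that follows <a href> links only and ignores forms."""
    seen, queue = {start}, deque([start])
    while queue:
        for link in parse(serve(queue.popleft())).links:
            if link not in seen and serve(link) is not None:
                seen.add(link)
                queue.append(link)
    return seen


def hidden_crawl(pages, keywords):
    """Toy Hidden Web step: submit each form once per keyword, keep the results."""
    results = {}
    for url in sorted(pages):
        for form in parse(serve(url)).forms:
            for kw in keywords:
                query = urlencode({name: kw for name in form["fields"]})
                target = f"{form['action']}?{query}"
                results[target] = serve(target)
    return results


if __name__ == "__main__":
    surface = surface_crawl()
    print(sorted(surface))
    print(sorted(hidden_crawl(surface, ["python", "web"])))
```

Run as a script, it prints `['/', '/about', '/books']` for the link-following crawl and `['/search?q=python', '/search?q=web']` for the form-filling step: the book titles in `BOOK_DB` are reachable only through the second.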