TakingITGlobal - Panorama - How do search...

Published on: Feb 19, 2008

Topic:

Type: Opinions

https://www.tigweb.org/express/panorama/article.html?ContentID=18555

Web search engines provide an interface to search for information on the World Wide Web.

Information may consist of web pages, images and other types of files.

Some search engines also examine data available in newsgroups, databases, or open directories.

Unlike Web directories, which are maintained by human editors, search engines operate algorithmically or by a mixture of algorithmic and human input.

Users specify criteria about an object of interest and have the engine find the matching items.

These criteria are referred to as a search query.

A search query is expressed as a set of words that identify the desired concept, which one or hundreds of documents may contain. Some engines apply improvements to search queries, through query expansion, to increase the likelihood of returning relevant results.
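
As a rough illustration of query expansion, the sketch below adds synonyms to each query word before any matching takes place. The synonym table and the function name expand_query are invented for this example; real engines use much richer methods.

```python
# Hypothetical synonym table, invented for this illustration only.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}


def expand_query(query):
    """Return the query words plus any known synonyms, without duplicates."""
    expanded = []
    for word in query.lower().split():
        for term in [word] + SYNONYMS.get(word, []):
            if term not in expanded:
                expanded.append(term)
    return expanded


print(expand_query("cheap car rental"))
```

A document that only mentions "automobile" can now match a query for "car", which is the kind of improvement the paragraph above refers to.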

Index-based search engines :

The list of results that meet the criteria specified by the query is typically sorted, or ranked, in some regard so as to place the most relevant results first. Ranking results by relevance (from highest to lowest) reduces the time required to find the desired information.

Probabilistic search engines :

They rank the results based on measures of similarity and sometimes popularity or authority.
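
One common measure of similarity is the cosine similarity between word counts of the query and of each document. The sketch below ranks three made-up documents this way. It is only an illustration of ranking by similarity, not the method of any particular engine, and it ignores popularity and authority.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, documents):
    """Return (score, doc_id) pairs sorted from most to least similar."""
    q = Counter(query.lower().split())
    scored = [
        (cosine_similarity(q, Counter(text.lower().split())), doc_id)
        for doc_id, text in documents.items()
    ]
    return sorted(scored, reverse=True)


# Made-up documents for the example.
DOCS = {
    "a": "search engines rank web pages",
    "b": "cooking pasta at home",
    "c": "web search and web crawling",
}

for score, doc_id in rank("web search", DOCS):
    print(doc_id, round(score, 3))
```

Document "c" comes first because it repeats a query word, "a" follows, and "b" shares no words with the query and scores zero.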

Boolean search engines :

Return only items which match the query exactly, without regard to order. To return matches quickly, search engines collect metadata about the group of items under consideration beforehand, through a process referred to as indexing. The index requires a smaller amount of computer storage than the items themselves and provides a basis for the engine to calculate result relevance. The search engine may also store a copy of each result in a cache, so that users can see the state of the result at the time it was indexed, for archive purposes, or to make repetitive processes work more efficiently and quickly.
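
A minimal sketch of indexing and Boolean matching, assuming a tiny in-memory collection: an inverted index maps each word to the documents that contain it, and an AND query intersects those sets. The documents and the function names are invented for the example.

```python
from collections import defaultdict


def build_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def boolean_and(index, query):
    """Return ids of documents containing every query word, in any order."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Made-up documents for the example.
DOCS = {
    1: "google ranks pages by links",
    2: "links between pages form a graph",
    3: "a spider follows links",
}

INDEX = build_index(DOCS)
print(sorted(boolean_and(INDEX, "pages links")))
```

Because the index is built once, beforehand, each query only has to look up a few sets instead of rereading every document.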

Crawler, or spider type, search engines may collect and assess results at the time of the search query. Meta search engines simply reuse the index or results of one or more other search engines.
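
One simple way a meta search engine could combine the results of other engines is to give each URL a score based on its position in every list and then sort by the total. The sketch below does this with made-up result lists. The scoring rule is an assumption chosen for illustration, not a description of any real meta search engine.

```python
from collections import defaultdict


def merge_results(result_lists):
    """Combine ranked lists; each URL scores 1/position in every list."""
    scores = defaultdict(float)
    for results in result_lists:
        for position, url in enumerate(results, start=1):
            scores[url] += 1.0 / position
    # Highest combined score first; ties broken alphabetically.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Made-up result lists standing in for two underlying engines.
ENGINE_A = ["example.org/a", "example.org/b", "example.org/c"]
ENGINE_B = ["example.org/b", "example.org/d", "example.org/a"]

print(merge_results([ENGINE_A, ENGINE_B]))
```

A URL that both engines place near the top ends up ahead of one that only a single engine returns.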

Here is a short description of the most used search engine :
www.Google.com :-

Around 2001, the Google search engine rose to prominence. Its success was based in part on the concept of link popularity and PageRank, which is reported on a scale from 0 to 10. PageRank takes into consideration the number of other websites and webpages that link to a given page. Google's minimalist user interface is very popular with users, and has since spawned a number of imitators.
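
The idea behind link popularity can be sketched with a simplified version of the published PageRank calculation, run on a small made-up link graph. The damping value of 0.85 and the number of iterations are assumptions, the 0 to 10 scale is not reproduced, and this is not Google's production algorithm.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated updates on a small link graph.

    links maps each page to the list of pages it links to; every
    linked page must itself appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Made-up graph: every other page links to "home".
LINKS = {
    "home": ["about"],
    "about": ["home"],
    "blog": ["home", "about"],
    "contact": ["home"],
}

RANKS = pagerank(LINKS)
for page in sorted(RANKS, key=RANKS.get, reverse=True):
    print(page, round(RANKS[page], 3))
```

The page with the most incoming links, "home", ends up with the highest score, and a link from a highly ranked page counts for more than a link from an obscure one.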

The quality of the content presented on a page is very important. Google matches the content against the page's meta tags. If the tags are relevant to the content, the page is treated as high quality and would have a good rank.

Google retrieves pages by a Web crawler (known as a spider) — an automated Web browser which follows every link it sees. Exclusions can be made by the use of robots.txt. The contents of each page are then analyzed to determine how it should be indexed (for example, words are extracted from the titles, headings, or special fields called meta tags). Data about web pages are stored in an index database for use in later queries.
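
The steps above can be sketched with Python's standard library: words are pulled from the title, headings, and meta tags of a page, and links are kept only if robots.txt allows them. The page, the robots.txt content, the crawler name "ExampleBot", and the class FieldExtractor are all made up for this example, and nothing is downloaded.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content; a real crawler would download it from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())


class FieldExtractor(HTMLParser):
    """Collect words from the title, headings, and meta tags, plus links."""

    def __init__(self):
        super().__init__()
        self.fields = {"title": [], "heading": [], "meta": []}
        self.links = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._current = "title"
        elif tag in ("h1", "h2", "h3"):
            self._current = "heading"
        elif tag == "meta" and attrs.get("name") in ("keywords", "description"):
            content = (attrs.get("content") or "").replace(",", " ")
            self.fields["meta"].extend(content.lower().split())
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag in ("title", "h1", "h2", "h3"):
            self._current = None

    def handle_data(self, data):
        if self._current:
            self.fields[self._current].extend(data.lower().split())


# Made-up page for the example.
PAGE = """<html><head><title>Search Basics</title>
<meta name="keywords" content="search, index">
</head><body><h1>How Crawlers Work</h1>
<a href="/public/page.html">public</a>
<a href="/private/notes.html">private</a>
</body></html>"""

extractor = FieldExtractor()
extractor.feed(PAGE)

# Keep only the links that robots.txt allows this hypothetical crawler to follow.
allowed = [link for link in extractor.links if robots.can_fetch("ExampleBot", link)]

print(extractor.fields)
print(allowed)
```

A real spider would repeat this for every allowed link it finds and store the extracted words in the index database.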

Some search engines, such as Google.com and Yahoo.com, store all or part of the source page (referred to as a cache) as well as information about the web pages, whereas others, such as www.AltaVista.com, store every word of every page they find.

Some search websites use Google's API for searching. For example, http://pakistan.sc uses it to answer its users' queries through Google. It also provides thumbnails of pages for a quick glance.

Google utilizes not only PageRank but more than 150 criteria to determine relevancy. The algorithm "remembers" where it has been, indexes the number of cross-links, and relates these into groupings. PageRank is based on citation analysis.
The Google algorithm is today's most hidden truth!

by haider