The term "Search Engine" is often used generically to describe both crawler-based search engines and human-powered directories. These two types of search engines gather their listings in radically different ways.
Crawler-Based Search Engines
Crawler-based search engines, such as Google, create their listings automatically. They "crawl" or "spider" the web, and people then search through what they have found.
If you change your web pages, crawler-based search engines eventually find these changes, and that can affect how you are listed. Page titles, body copy and other elements all play a role.
Human-Powered Directories
A human-powered directory, such as the Open Directory, depends on humans for its listings. You submit a short description to the directory for your entire site, or editors write one for sites they review. A search looks for matches only in the descriptions submitted.
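To make the difference concrete, here is a minimal sketch of how a directory search might work. The entries, the URLs and the `search_directory` function are all made up for illustration, and real directories are certainly more sophisticated. The point is only that the search looks at the human-written descriptions, never at the pages themselves.

```python
# Hypothetical directory entries: site URL -> human-written description.
# These names and descriptions are invented for illustration only.
DIRECTORY = {
    "example.com/cats": "Care guides and photos for cat owners",
    "example.com/cars": "Reviews of new and used cars",
    "example.com/cooking": "Simple recipes for home cooks",
}

def search_directory(query, directory=DIRECTORY):
    """Return the sites whose description contains every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    matches = []
    for site, description in directory.items():
        description_words = set(description.lower().split())
        # Only the submitted description is checked, not the page content.
        if all(word in description_words for word in words):
            matches.append(site)
    return sorted(matches)

print(search_directory("cat owners"))
```

Because only the description is consulted, editing the pages of a listed site would not change what this search returns.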
Changing your web pages has no effect on your listing. Things that are useful for improving a listing with a search engine have nothing to do with improving a listing in a directory. The only exception is that a good site with good content might be more likely to get reviewed for free than a poor one.
Hybrid Search Engines
In the web's early days, a search engine presented either crawler-based results or human-powered listings. Today, it is extremely common for both types of results to be presented. Usually, a hybrid search engine will favor one type of listing over another. For example, MSN Search is more likely to present human-powered listings from LookSmart. However, it also presents crawler-based results (as provided by Inktomi), especially for more obscure queries.
Crawler-based search engines have three major elements:
1. Spider (also called a bot, robot or crawler) - The spider visits a web page, reads it, and then follows links to other pages within the site. This is what it means when someone refers to a site being "spidered" or "crawled". The spider returns to the site on a regular basis to look for new content and changes.
2. Index - Everything the spider finds goes into the second part of the search engine, the index. The index, sometimes called the catalog, is like a giant book containing a copy of every web page that the spider finds. If a web page changes, this book is updated with the new information. Sometimes it can take a while for new pages or changes that the spider finds to be added to the index. Thus a web page may have been "spidered" but not yet indexed. Until it is indexed, that is, added to the index, it is not available to those searching with the search engine.
3. Search engine software - This is the third part of a search engine. It is the program that sifts through the millions of pages recorded in the index to find matches to a search and rank them in order of what it believes is most relevant.
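The three elements can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any real search engine is built: `SITE` is a made-up in-memory site standing in for the web, the index here is a simple word-to-pages map rather than a full copy of each page, and ranking by word count is just one simple choice for "most relevant".

```python
from collections import deque

# Hypothetical in-memory "site": URL -> (page text, outgoing links).
SITE = {
    "/": ("welcome to our search engine guide", ["/crawlers", "/directories"]),
    "/crawlers": ("crawlers follow links and read pages", ["/index"]),
    "/directories": ("directories rely on human editors", []),
    "/index": ("the index stores a copy of pages the crawlers read", ["/"]),
}

def crawl(start, site=SITE):
    """Element 1, the spider: visit a page, read it, follow its links."""
    seen, queue, pages = {start}, deque([start]), {}
    while queue:
        url = queue.popleft()
        text, links = site[url]
        pages[url] = text
        for link in links:
            # Skip links we have already queued or that lead off the site.
            if link in site and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages

def build_index(pages):
    """Element 2, the index: map each word to the pages containing it."""
    index = {}
    for url, text in pages.items():
        for word in text.lower().split():
            counts = index.setdefault(word, {})
            counts[url] = counts.get(url, 0) + 1
    return index

def search(index, query):
    """Element 3, the search software: find matches and rank them."""
    scores = {}
    for word in query.lower().split():
        for url, count in index.get(word, {}).items():
            scores[url] = scores.get(url, 0) + count
    # Highest score first; ties are broken alphabetically by URL.
    return sorted(scores, key=lambda url: (-scores[url], url))

pages = crawl("/")
index = build_index(pages)
print(search(index, "crawlers pages"))
```

Note that `search` only ever looks at the index. A page that `crawl` has fetched after `build_index` last ran would be "spidered" but not yet searchable until the index is rebuilt.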