ez2search.com


Overview

The engine has four distinct parts:

  1. The Crawler, which uses a bot (ez2Bot) to gather web pages
  2. The Storer, which saves the URLs and text found in pages
  3. The Indexer, which organizes and mines the data based on variable rules
  4. The Searcher, which returns relevant results for a user's query

The Crawler

The Crawler is divided into several distinct parts.

The Job Manager builds a "job list" containing the next site to visit, the robot rules already known for it, and the pages to grab.
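As an illustration only, the "job list" could be modelled as a small record. The names `JobList`, `build_job_list`, and the shape of the queue entries are assumptions made for this sketch, not the engine's actual data structures.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class JobList:
    """Hypothetical shape of the "job list" handed to the Bot."""
    site: str                                              # next site to visit
    robot_rules: list[str] = field(default_factory=list)   # known disallowed path prefixes
    pages: list[str] = field(default_factory=list)         # known pages to grab


def build_job_list(queue: list[dict]) -> Optional[JobList]:
    """Take the next site off the queue and package what is known about it."""
    if not queue:
        return None  # nothing left to crawl
    entry = queue.pop(0)
    return JobList(
        site=entry["site"],
        robot_rules=list(entry.get("robot_rules", [])),
        pages=list(entry.get("pages", ["/"])),  # assumed default: start at the root page
    )


# Usage with a made-up queue entry.
queue = [{"site": "example.com", "robot_rules": ["/private/"], "pages": ["/", "/about.html"]}]
job = build_job_list(queue)
print(job)
```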

The Bot receives the "job list" and tries to fetch the site's "robots.txt". This provides two important pieces of information about the site being visited: (1) whether the site is online and available, and (2) any crawl restrictions imposed by "robots.txt". If the site is available, the Bot then grabs the known pages and passes them to the Extractor.
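The Bot's two checks might look roughly like the following sketch, which uses Python's standard `urllib.robotparser` and assumes the file has its conventional name, robots.txt. The function names, the use of plain HTTP, and the treatment of an HTTP error as "online, no restrictions" are assumptions for illustration. The sample rules are parsed from a string so the example runs without network access.

```python
from typing import Optional
from urllib.error import HTTPError, URLError
from urllib.request import urlopen
from urllib.robotparser import RobotFileParser


def fetch_robots(site: str, timeout: float = 10.0) -> Optional[str]:
    """Return the robots.txt body, "" if the site answers without one, None if unreachable."""
    try:
        with urlopen(f"http://{site}/robots.txt", timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except HTTPError:
        return ""    # assumption: site is online, treat as no restrictions
    except (URLError, OSError):
        return None  # site is offline or unavailable


def allowed_pages(robots_txt: str, site: str, pages: list[str], agent: str = "ez2Bot") -> list[str]:
    """Keep only the pages that the robots.txt rules let this agent fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in pages if parser.can_fetch(agent, f"http://{site}{p}")]


# Offline usage with made-up rules and pages.
sample = "User-agent: *\nDisallow: /private/\n"
print(allowed_pages(sample, "example.com", ["/", "/about.html", "/private/data.html"]))
```

In a real crawl, `fetch_robots` would be called first, and a `None` result would mean the site is skipped for now.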

The Extractor performs two major tasks: (1) it builds a list of the links found in the page, and (2) it extracts the text.
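A minimal sketch of both tasks, using Python's standard `html.parser`. The class name `PageExtractor`, the decision to skip script and style content, and the resolution of relative links against the page URL are assumptions for illustration, not a description of the engine's actual Extractor.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageExtractor(HTMLParser):
    """Collects absolute link URLs and visible text from one HTML page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []
        self.text_parts: list[str] = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())

    def page_text(self) -> str:
        return " ".join(self.text_parts)


# Usage with a made-up page.
html = ('<html><head><style>p{}</style></head><body>'
        '<p>Hello <a href="/about.html">about us</a></p>'
        '<script>var x;</script></body></html>')
extractor = PageExtractor("http://example.com/index.html")
extractor.feed(html)
print(extractor.links)
print(extractor.page_text())
```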

...

The Storer

Using the "job list", the Storer updates the status of the current site, stores any new links, and stores the text extracted from the page.
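The three storage steps could be sketched with Python's standard `sqlite3` module as below. The table layout, the function names, and the choice of SQLite itself are assumptions made for illustration; the page does not say how the Storer actually persists its data.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a store with hypothetical tables for sites, links, and page text."""
    db = sqlite3.connect(path)
    db.executescript("""
        CREATE TABLE IF NOT EXISTS sites (site TEXT PRIMARY KEY, status TEXT);
        CREATE TABLE IF NOT EXISTS links (url TEXT PRIMARY KEY);
        CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, body TEXT);
    """)
    return db


def store_results(db, site, status, page_url, links, text):
    """Update the site status, store unseen links, and store the page text."""
    with db:  # commit all three steps as one transaction
        db.execute("INSERT OR REPLACE INTO sites (site, status) VALUES (?, ?)", (site, status))
        # OR IGNORE keeps links that were already stored from being duplicated.
        db.executemany("INSERT OR IGNORE INTO links (url) VALUES (?)", [(u,) for u in links])
        db.execute("INSERT OR REPLACE INTO pages (url, body) VALUES (?, ?)", (page_url, text))


# Usage with made-up crawl results; the second call repeats one link.
db = open_store()
store_results(db, "example.com", "online", "http://example.com/",
              ["http://example.com/about.html"], "Hello about us")
store_results(db, "example.com", "online", "http://example.com/about.html",
              ["http://example.com/about.html", "http://example.com/contact.html"], "About us")
print(db.execute("SELECT COUNT(*) FROM links").fetchone()[0])
```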

...

The Indexer

...

The Searcher

...

 


Please contact us if you would like to talk about the possibilities of this development.

©2006