
What is a web crawler (spider, search bot)?

A web crawler (also called a spider or search bot) is a computer program that automatically searches the World Wide Web and analyzes web pages. Web crawlers are used mainly by search engines. Other applications include collecting RSS feeds, e-mail addresses, or other information.

Web crawlers are a special kind of bot, that is, a computer program that carries out repetitive tasks largely autonomously. Much like a person surfing the Internet, a web crawler moves from one page to another by following hyperlinks. It stores every address it discovers and visits them in turn, and newly found links are added to the list of all known URLs. In this way, in theory, every reachable page of the WWW can be found.
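The discover, store, and visit loop described above can be sketched as a breadth-first traversal. This is a minimal sketch, not a production crawler: `get_links` is a hypothetical callback standing in for "download the page and extract its hyperlinks", `max_pages` is an assumed stopping limit, and a small in-memory dictionary replaces real HTTP requests.

```python
from collections import deque
from typing import Callable, Iterable


def crawl(seed: str, get_links: Callable[[str], Iterable[str]], max_pages: int = 100) -> list[str]:
    """Visit pages breadth-first, starting from `seed`.

    `get_links` is a hypothetical stand-in for downloading a page and
    extracting its hyperlinks; it returns the URLs the page links to.
    """
    frontier = deque([seed])   # URLs discovered but not yet visited
    seen = {seed}              # every URL ever discovered, to avoid revisits
    visited = []               # URLs in the order they were visited

    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:   # newly found link: add it to the list
                seen.add(link)
                frontier.append(link)
    return visited


# Usage with a tiny in-memory "web" instead of real HTTP requests.
web = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["a"],
    "d": [],
}
print(crawl("a", lambda url: web.get(url, [])))  # ['a', 'b', 'c', 'd']
```

The `seen` set ensures that each URL enters the list only once, so a cycle such as `c` linking back to `a` does not cause endless revisits; `max_pages` reflects the practical point that a crawl is eventually stopped.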

In practice, however, a selection is usually made, and at some point the process is stopped and restarted from the beginning. Depending on the crawler's task, the content of the pages it finds is then evaluated, indexed, and stored so that the collected data can be searched later. To combat unwanted web crawlers, there are also special websites, called tar pits, that feed crawlers false information and slow them down considerably.
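To make the indexing step more concrete: one common way to make crawled pages searchable is an inverted index, which maps each word to the pages that contain it. The following is a minimal sketch under that assumption; the tokenizer, the AND-style search, and the `example.com` pages are illustrative choices, not a description of how any particular search engine works.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of URLs whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the URLs that contain every word of the query (AND search)."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))  # copy, so the index is not modified
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical crawled pages: URL -> extracted text.
pages = {
    "https://example.com/a": "Web crawlers follow hyperlinks",
    "https://example.com/b": "Search engines index crawled pages",
    "https://example.com/c": "Focused crawlers follow relevant hyperlinks",
}
index = build_index(pages)
print(sorted(search(index, "follow hyperlinks")))
# ['https://example.com/a', 'https://example.com/c']
```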

Problems

A large part of the Internet is not covered by web crawlers, and therefore not by public search engines, because much of its content cannot be reached through simple links but only, for example, through search forms or access-restricted portals. In addition, the constant changes to the Web and the manipulation of content (cloaking) pose problems.
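To illustrate why content behind search forms stays hidden: a simple crawler discovers new pages only by following hyperlinks, so pages that are reachable only by submitting a form never enter its list of URLs. This minimal sketch uses Python's standard `html.parser` module; the page markup and the `/about` and `/search` paths are hypothetical.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href targets of <a> tags, as a simple crawler would."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


page = """
<a href="/about">About us</a>
<form action="/search" method="get">
  <input name="q"> <button>Search the catalogue</button>
</form>
"""

parser = LinkExtractor()
parser.feed(page)
print(parser.links)  # ['/about']
```

The results behind `/search?q=...` are never discovered: the form is not a link, and even a crawler that did submit forms would have to guess which queries to enter.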

Types

Web crawlers that concentrate on a particular topic are called focused crawlers or focused web crawlers. The search is focused by classifying each web page itself and by classifying its individual hyperlinks. This allows a focused crawler to find the best path through the Web and to index only the areas of the Web that are relevant to a given subject or domain.
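Classifying both pages and hyperlinks can be sketched as a best-first crawl: each page is classified before it is indexed, and each hyperlink is scored (here by its anchor text) to decide which link to follow next. This is a minimal sketch under several assumptions: `keyword_relevance` is a hypothetical stand-in for a trained classifier, `get_links` and `get_text` are hypothetical callbacks, the threshold of 0.5 is arbitrary, and links found on off-topic pages are simply not followed, which is only one possible design choice.

```python
import heapq
from typing import Callable, Iterable


def focused_crawl(
    seed: str,
    get_links: Callable[[str], Iterable[tuple[str, str]]],
    get_text: Callable[[str], str],
    relevance: Callable[[str], float],
    threshold: float = 0.5,
    max_pages: int = 50,
) -> list[str]:
    """Best-first crawl that indexes only pages classified as relevant.

    get_links(url) yields (target_url, anchor_text) pairs;
    relevance(text) returns a score in [0, 1].
    """
    frontier = [(-1.0, seed)]   # heapq is a min-heap, so scores are negated
    seen = {seed}
    indexed = []
    visits = 0
    while frontier and visits < max_pages:
        _, url = heapq.heappop(frontier)
        visits += 1
        if relevance(get_text(url)) < threshold:
            continue            # page classified as off-topic: skip it
        indexed.append(url)
        for target, anchor in get_links(url):
            if target not in seen:
                seen.add(target)
                # Classify the hyperlink itself, here by its anchor text.
                heapq.heappush(frontier, (-relevance(anchor), target))
    return indexed


TOPIC = {"crawler", "spider", "index"}  # hypothetical topic vocabulary


def keyword_relevance(text: str) -> float:
    # Stand-in classifier: 1.0 if any topic word occurs, else 0.0.
    return 1.0 if set(text.lower().split()) & TOPIC else 0.0


texts = {
    "seed": "all about web crawler software",
    "p1": "how a spider follows links",
    "p2": "cooking recipes",
    "p3": "building a search index",
    "p4": "latest crawler news",
}
links = {
    "seed": [("p2", "recipes"), ("p1", "spider basics")],
    "p1": [("p3", "index design")],
    "p2": [("p4", "crawler news")],
}
result = focused_crawl(
    "seed",
    lambda u: links.get(u, []),
    lambda u: texts[u],
    keyword_relevance,
)
print(result)  # ['seed', 'p1', 'p3']
```

In the toy data, `p4` is on-topic but is linked only from the off-topic page `p2`, so this crawler never reaches it.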

The main hurdles in implementing such crawlers in practice are parts of the Web that are not linked from elsewhere, and the training of the classifiers. Web crawlers are also used for data mining and for studying the Internet (webometrics), and they need not be limited to the WWW.

A special form of data mining is the harvester. This term refers to software that scans the Internet (the Web, Usenet, etc.) for e-mail addresses and “harvests” them. In this way, e-mail addresses that websites publish so that visitors can contact them by mail are collected and can then be marketed.

A popular countermeasure is to embed the e-mail address in an image. The address then does not appear as a string in the page's source code, so it is not available to the bot as text.
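Harvesting, and why an address embedded in an image escapes it, can be sketched with a regular expression applied to the raw page source. This is a minimal sketch: the pattern is deliberately simplified (real address syntax is more permissive), and `info@example.com` and `contact.png` are hypothetical. It also assumes the image's `alt` text does not spell out the address.

```python
import re

# Deliberately simple e-mail pattern; real-world address syntax is more complex.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def harvest(html: str) -> set[str]:
    """Return every string in the page source that looks like an e-mail address."""
    return set(EMAIL_RE.findall(html))


text_version = '<a href="mailto:info@example.com">info@example.com</a>'
image_version = '<img src="contact.png" alt="Our e-mail address">'

print(harvest(text_version))   # {'info@example.com'}
print(harvest(image_version))  # set()
```

The text version exposes the address twice (in the `mailto:` link and the link text), while the image version contains nothing the pattern can match.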

The disadvantage for users is that they cannot send an e-mail simply by clicking the address but have to copy it by hand. Web crawlers are also used to detect maps on the Internet.
