Posted to dev@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2010/08/10 15:22:06 UTC

[Nutch Wiki] Trivial Update of "NutchTutorial" by AndreRicardo

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The "NutchTutorial" page has been changed by AndreRicardo.
http://wiki.apache.org/nutch/NutchTutorial?action=diff&rev1=27&rev2=28

--------------------------------------------------

  
  Typically one starts testing one's configuration by crawling at shallow depths, sharply limiting the number of pages fetched at each level (-topN), and watching the output to check that desired pages are fetched and undesirable pages are not. Once one is confident of the configuration, then an appropriate depth for a full crawl is around 10. The number of pages per level (-topN) for a full crawl can be from tens of thousands to millions, depending on your resources.
  
- Once crawling has completed, one can skip to the [[NutchTutorial#Searching|Searching section]] below.
+ Once crawling has completed, one can skip to the [[NutchTutorial#Command Line Searching|Searching section]] below.
  
  == Step-by-Step or Whole-web Crawling ==