Posted to dev@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2011/06/28 04:14:56 UTC

[Nutch Wiki] Trivial Update of "bin/nutch_crawl" by LewisJohnMcgibbney

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The "bin/nutch_crawl" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/bin/nutch_crawl?action=diff&rev1=11&rev2=12

Comment:
Formatting for easier reading

  
  Usage: 
  {{{
- bin/nutch org.apache.nutch.crawl.!Crawl (-local | -ndfs <nameserver:port>) <dir_with_url_files> [-threads n] [-depth i] [-showThreadID]
+ bin/nutch org.apache.nutch.crawl.Crawl (-local | -ndfs <nameserver:port>) <dir_with_url_files> [-threads n] [-depth i] [-showThreadID]
  }}}
  
- <dir_with_url_files>: Contains text files with URL lists. This must be an existing directory, for example ${NUTCH_HOME}/urls
+ '''<dir_with_url_files>''': Contains text files with URL lists. This must be an existing directory, for example ${NUTCH_HOME}/urls
  
- [-threads n]: This parameter enables you to choose how many threads Nutch should use when crawling.
+ '''[-threads n]''': This parameter enables you to choose how many threads Nutch should use when crawling.
  
- [-depth i]: You can tell Nutch how deep it should crawl. If you do not specify a value, Nutch uses 5 as its default.
+ '''[-depth i]''': You can tell Nutch how deep it should crawl. If you do not specify a value, Nutch uses 5 as its default.
  For example, if you pass -depth 1 as the parameter, Nutch will only index the first level. If you pass -depth 2 (or more), Nutch will follow outlinks to that number of levels.
  
- [-showThreadID]: 
+ '''[-showThreadID]''': 
  
- -local
+ '''-local''':
  
- -ndfs <nameserver:port>
+ '''-ndfs <nameserver:port>''':
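
  The usage line and options above can be combined as in the following minimal sketch. It assumes a local (non-NDFS) crawl; the install path /opt/nutch, the thread count, and the depth are illustrative values, not taken from the page. The script only assembles and prints the command, so it can be inspected before being run from inside a Nutch installation.

```shell
# Illustrative values only: adjust to your own installation.
NUTCH_HOME="${NUTCH_HOME:-/opt/nutch}"   # assumed install location
URL_DIR="${NUTCH_HOME}/urls"             # must be an existing directory of URL list files
THREADS=10                               # hypothetical thread count
DEPTH=2                                  # the page gives 5 as the default when omitted

# Assemble the command following the documented usage line.
CMD="bin/nutch org.apache.nutch.crawl.Crawl -local ${URL_DIR} -threads ${THREADS} -depth ${DEPTH}"

# Print the command instead of running it, so it can be checked first.
echo "${CMD}"
```

  Once the printed command looks right, it would be run from ${NUTCH_HOME}, since bin/nutch is given as a relative path.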
  
  
  CommandLineOptions