Posted to commits@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2006/03/05 02:12:25 UTC

[Nutch Wiki] Update of "nutch-0.8-dev/bin/nutch generate" by JeffRitchie

The following page has been changed by JeffRitchie:
http://wiki.apache.org/nutch/nutch-0%2e8-dev/bin/nutch_generate

New page:
= "generate" is an alias for "org.apache.nutch.crawl.Generator" =

== Generates a new Fetcher Segment from the Crawl Database ==

=== Usage ===
 nutch-0.8-dev/bin/nutch org.apache.nutch.crawl.Generator <crawldb> <segments_dir> [-topN <num>] [-numFetchers <fetchers>] [-adddays <days>]

  '''<crawldb>:''' Path to the crawldb directory.[[BR]]
  '''<segments_dir>:''' Path to the directory where the Fetcher Segments are created.[[BR]]
  '''[-topN <num>]:''' Selects the ''<num>'' highest-ranking URLs for this segment. Default: ''Long.MAX_VALUE''[[BR]]
  '''[-numFetchers <fetchers>]:''' The number of fetch partitions. Default: the value of the configuration key ''mapred.map.tasks'' (''1'')[[BR]]
  '''[-adddays <days>]:''' Adds <days> to the current time, so that URLs already fetched can be crawled again sooner than ''db.default.fetch.interval'' would otherwise allow. Default: ''0''[[BR]]
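
 The effect of ''-topN'' and ''-adddays'' described above can be illustrated with a small Python sketch. This is not the Generator's actual code: the function name, the tuple layout, the example URLs, and the scores are all assumptions made for illustration. It only shows the arithmetic of shifting the reference time forward by whole days and then keeping the highest-scoring URLs that are due.

```python
import heapq

DAY_MS = 24 * 60 * 60 * 1000  # milliseconds in one day


def select_for_segment(entries, now_ms, top_n=None, add_days=0):
    """Return the URLs that would be due, highest score first.

    entries: iterable of (url, score, next_fetch_ms) tuples.
    This layout is hypothetical, not the crawldb format.
    """
    # -adddays moves the reference time forward by whole days.
    cutoff_ms = now_ms + add_days * DAY_MS
    # A URL is due when its next fetch time is not after the cutoff.
    due = [e for e in entries if e[2] <= cutoff_ms]
    if top_n is None:
        # No -topN given: keep every due URL, ordered by score.
        ranked = sorted(due, key=lambda e: e[1], reverse=True)
    else:
        # -topN given: keep only the top_n highest-scoring due URLs.
        ranked = heapq.nlargest(top_n, due, key=lambda e: e[1])
    return [url for url, _score, _next_fetch in ranked]


now = 0  # arbitrary reference time for the example
entries = [
    ("http://example.org/a", 0.9, now - 1 * DAY_MS),   # already due
    ("http://example.org/b", 0.5, now + 5 * DAY_MS),   # due in 5 days
    ("http://example.org/c", 0.7, now + 20 * DAY_MS),  # due in 20 days
]

print(select_for_segment(entries, now))
print(select_for_segment(entries, now, add_days=7))
print(select_for_segment(entries, now, top_n=1, add_days=7))
```

 With no options only the first URL is selected. With ''add_days=7'' the second URL also becomes due, and adding ''top_n=1'' trims the result back to the single highest-scoring URL.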

=== Configuration Files ===
 hadoop-default.xml[[BR]]
 hadoop-site.xml[[BR]]
 nutch-default.xml[[BR]]
 nutch-site.xml[[BR]]

=== Other Files ===
 None.

=== Caveats and Notes ===
 None.

DevelopmentCommandLineOptions