Posted to commits@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2006/03/05 02:12:25 UTC
[Nutch Wiki] Update of "nutch-0.8-dev/bin/nutch generate" by JeffRitchie
The following page has been changed by JeffRitchie:
http://wiki.apache.org/nutch/nutch-0%2e8-dev/bin/nutch_generate
New page:
= "generate" is an alias for "org.apache.nutch.crawl.Generator" =
== Generates a new Fetcher Segment from the Crawl Database ==
=== Usage ===
nutch-0.8-dev/bin/nutch org.apache.nutch.crawl.Generator <crawldb> <segments_dir> [-topN <num>] [-numFetchers <fetchers>] [-adddays <days>]
'''<crawldb>:''' Path to the crawldb directory.[[BR]]
'''<segments_dir>:''' Path to the directory where the Fetcher Segments are created.[[BR]]
'''[-topN <num>]:''' Selects the top ''<num>'' ranking URLs for this segment. Default: ''Long.MAX_VALUE''[[BR]]
'''[-numFetchers <fetchers>]:''' The number of fetch partitions. Default: ''the value of the configuration key mapred.map.tasks (1)''[[BR]]
'''[-adddays <days>]:''' Adds <days> to the current time, so that URLs already fetched become eligible for crawling sooner than ''db.default.fetch.interval'' would allow. Default: ''0''[[BR]]
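As a rough illustration of how the options above combine, the following sketch assembles a "generate" call using the alias instead of the full class name. The paths ''crawl/crawldb'' and ''crawl/segments'' and the option values are assumptions chosen for the example, not defaults or recommendations.

```shell
#!/bin/sh
# Hypothetical locations; adjust them to your own installation and crawl layout.
NUTCH_BIN="nutch-0.8-dev/bin/nutch"
CRAWLDB="crawl/crawldb"
SEGMENTS_DIR="crawl/segments"

# Assemble the command: select the 1000 top-ranking URLs, split them into
# 4 fetch partitions, and add 7 days to the current time when selecting.
GENERATE_CMD="$NUTCH_BIN generate $CRAWLDB $SEGMENTS_DIR -topN 1000 -numFetchers 4 -adddays 7"

# Always print the command for review; run it only if the nutch script is present.
echo "$GENERATE_CMD"
if [ -x "$NUTCH_BIN" ]; then
  $GENERATE_CMD
fi
```

Omitting all three options falls back to the defaults listed above.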
=== Configuration Files ===
hadoop-default.xml[[BR]]
hadoop-site.xml[[BR]]
nutch-default.xml[[BR]]
nutch-site.xml[[BR]]
=== Other Files ===
None.
=== Caveats and Notes ===
None.
DevelopmentCommandLineOptions