Posted to commits@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2005/06/30 10:28:16 UTC

[Nutch Wiki] Update of "bin/nutch generate" by JuhoMäkinen

The following page has been changed by JuhoMäkinen:
http://wiki.apache.org/nutch/bin/nutch_generate

New page:

The generate command creates a new fetchlist from the webdb. The fetchlist contains the URLs that can then be fetched with the fetch tool.

Usage: FetchListTool (-local | -ndfs <namenode:port>) <db> <segment_dir> [-refetchonly] [-topN N] [-cutoff cutoffscore] [-numFetchers numFetchers] [-adddays numDays]

Command line parameters:

'''-topN N''' where N is the maximum number of pages to put in the fetchlist.

Normally, the "generate" command prepares a fetchlist from
all unfetched pages and from pages whose fetch interval has already expired.
If you use -topN, you get only the N URLs with the highest score
instead of all of them. These are potentially the most interesting pages,
so they should be prioritized in fetching.
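
The selection described above can be illustrated with a small sketch. This is not the FetchListTool implementation. The Page record, its fields and the generate_fetchlist function are hypothetical names made up for this example. It only assumes that each page has a score and that a page is due when it is unfetched or its fetch interval has expired.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical stand-in for a webdb entry; not Nutch's actual page class."""
    url: str
    score: float
    fetched: bool
    next_fetch_day: int  # assumed: day on which the fetch interval expires


def generate_fetchlist(pages, today, top_n=None):
    """Return URLs due for fetching, optionally limited to the top_n by score."""
    # A page is due if it was never fetched or its fetch interval has expired.
    due = [p for p in pages if not p.fetched or p.next_fetch_day <= today]
    if top_n is None:
        # Without -topN: every due page goes into the fetchlist.
        return [p.url for p in due]
    # With -topN: keep only the top_n highest-scoring due pages.
    best = heapq.nlargest(top_n, due, key=lambda p: p.score)
    return [p.url for p in best]


pages = [
    Page("http://example.org/a", 1.0, False, 0),
    Page("http://example.org/b", 3.5, False, 0),
    Page("http://example.org/c", 2.0, True, 5),   # interval expired by day 10
    Page("http://example.org/d", 9.0, True, 30),  # not yet due on day 10
]

print(generate_fetchlist(pages, today=10))
print(generate_fetchlist(pages, today=10, top_n=2))
```

In this sketch page d has the highest score but is left out in both cases, because it is not yet due. With top_n=2 only b and c remain, ordered by score.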


 - Juho Mäkinen