Posted to commits@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2005/06/30 10:28:16 UTC
[Nutch Wiki] Update of "bin/nutch generate" by JuhoMäkinen
Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The following page has been changed by JuhoMäkinen:
http://wiki.apache.org/nutch/bin/nutch_generate
New page:
The generate command creates a new fetchlist from the webdb. The fetchlist contains the URLs that can then be fetched using the fetch tool.
Usage: FetchListTool (-local | -ndfs <namenode:port>) <db> <segment_dir> [-refetchonly] [-topN N] [-cutoff cutoffscore] [-numFetchers numFetchers] [-adddays numDays]
Command line parameters:
'''-topN N''' where N is the maximum number of pages to put in the fetchlist.
Normally, the "generate" command prepares a fetchlist from
all unfetched pages and the pages whose fetch interval has already expired.
If you use -topN, you get only the N URLs with the highest scores
instead of all of those URLs. These are potentially the most interesting
pages, which should be prioritized in fetching.
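
The selection described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the FetchListTool code: the Page record, its "due" flag (standing for "unfetched, or fetch interval expired") and the generate_fetchlist function are hypothetical names invented for this example.

```python
import heapq
from typing import List, NamedTuple, Optional


class Page(NamedTuple):
    """Hypothetical stand-in for a webdb page entry."""
    url: str
    score: float
    due: bool  # True if unfetched or if the fetch interval has expired


def generate_fetchlist(pages: List[Page], top_n: Optional[int] = None) -> List[str]:
    """Return the URLs to fetch, optionally limited to the top_n best scores."""
    # Only pages that are due for fetching are candidates at all.
    candidates = [p for p in pages if p.due]
    if top_n is None:
        # Without a limit, every due page goes into the fetchlist.
        return [p.url for p in candidates]
    # With a limit, keep the top_n candidates, highest score first.
    best = heapq.nlargest(top_n, candidates, key=lambda p: p.score)
    return [p.url for p in best]


pages = [
    Page("http://a.example/", 0.5, True),
    Page("http://b.example/", 2.0, True),
    Page("http://c.example/", 1.0, False),  # not due, never selected
    Page("http://d.example/", 1.5, True),
]
print(generate_fetchlist(pages))           # all due pages
print(generate_fetchlist(pages, top_n=2))  # only the two best-scored due pages
```

In this sketch the page that is not due is skipped regardless of its score, and the limit is applied only after that filter.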
- Juho Mäkinen