Posted to dev@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2014/09/05 11:33:10 UTC

[Nutch Wiki] Update of "bin/nutch generate" by JulienNioche

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The "bin/nutch generate" page has been changed by JulienNioche:
https://wiki.apache.org/nutch/bin/nutch%20generate?action=diff&rev1=3&rev2=4

  Generate is an alias for org.apache.nutch.crawl.Generator
  
  This class generates a subset of a crawl db to fetch. This version allows us to generate fetchlists for several segments in one go. Unlike the initial version (FetchListTool), it performs IP resolution ONLY on the entries that have been selected for fetching. Within a segment, the URLs are partitioned by IP, domain or host. We can separately choose how to count the URLs (i.e. by domain or by host) when limiting the entries.
+ 
+ Both versions return 0 if one or more segments have been generated, -1 on error, and 1 if there are no URLs to put in a segment.
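These return codes matter mainly to scripts that run generate/fetch rounds in a loop. The Java sketch below is a minimal illustration and is not part of Nutch: the class GenerateStatus, the Outcome enum and the methods classify and asPosixExitCode are hypothetical names. It assumes that the value returned by the Generator is passed on as the process exit status. On POSIX systems only the low 8 bits of an exit status reach the parent process, so a return value of -1 would be observed by a shell or by ProcessBuilder as 255.

```java
// Hypothetical helper (not part of Nutch): interprets the status of "bin/nutch generate".
// Assumption: the Generator's return value is used as the process exit status.
public final class GenerateStatus {

    /** The three outcomes documented for the Generator. */
    public enum Outcome { SEGMENTS_GENERATED, NO_URLS_TO_FETCH, ERROR }

    private GenerateStatus() {}

    /**
     * Maps a raw status (the int returned in-process, or the exit code
     * seen by a shell / ProcessBuilder) to a documented outcome.
     */
    public static Outcome classify(int status) {
        switch (status) {
            case 0:
                return Outcome.SEGMENTS_GENERATED;
            case 1:
                return Outcome.NO_URLS_TO_FETCH;
            default:
                // -1 in-process, usually 255 as a POSIX exit code;
                // any other unexpected value is also treated as an error.
                return Outcome.ERROR;
        }
    }

    /** Shows what a returned int looks like once truncated to a POSIX exit status. */
    public static int asPosixExitCode(int returned) {
        return returned & 0xFF; // keep only the low 8 bits, so -1 becomes 255
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, asPosixExitCode(-1)};
        for (int s : samples) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

A crawl loop would typically treat NO_URLS_TO_FETCH as a signal to stop iterating rather than as a failure, and abort only on ERROR.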
  
  === Nutch 1.x ===
  {{{