Posted to dev@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2016/05/19 13:28:28 UTC

[Nutch Wiki] Trivial Update of "bin/nutch generate" by SebastianNagel

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The "bin/nutch generate" page has been changed by SebastianNagel:
https://wiki.apache.org/nutch/bin/nutch%20generate?action=diff&rev1=6&rev2=7

Comment:
fix format of last edit

   nutch-site.xml<<BR>>
  
  ==== Configuration Values ====
-  The following properties directly affect how the Generator generates fetch segments.<<BR>><<BR>>
+ The following properties directly affect how the Generator generates fetch segments:<<BR>>
   * generate.max.count: The maximum number of URLs in a single fetchlist, or -1 for unlimited. URLs are counted according to the value of the property generate.count.mode.
   
   * generate.count.mode: Determines how the URLs are counted for generate.max.count. The default value is 'host'; 'domain' is also accepted. Note that we do not count per IP in the new version of the Generator.
  
   * partition.url.mode: Determines how URLs are distributed over fetch partitions: "byHost" (default), "byDomain", or "byIP". Cf. the corresponding property "fetcher.queue.mode" in Fetcher used to guarantee delays between successive fetch requests to the same host/domain/IP.
  
+  <<BR>>
-  Indirectly, the behavior of Generator is influenced by:<<BR>><<BR>>
+ Indirectly, the behavior of Generator is influenced by:<<BR>>
   * mapreduce.job.reduces: In a distributed environment (Hadoop) with multiple reducers, the maximum total number of URLs (-topN) is applied per reduce task as topN/numReduceTasks. If URLs are not evenly spread over hosts (domains or IPs, see partition.url.mode), or all belong to a single host/domain/IP, some partitions may be smaller than expected or even empty. The total number of generated URLs is then lower than topN.
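
The effect of mapreduce.job.reduces on -topN can be illustrated with a small sketch. This is not Generator code: the function name generated_total and the partition sizes are hypothetical, and integer division for topN/numReduceTasks is an assumption. Each reduce task is simply capped at its share of topN, so an uneven spread of URLs over partitions lowers the total.

```python
def generated_total(top_n, partition_sizes):
    """Return the total number of URLs selected across all reduce tasks.

    partition_sizes holds the number of candidate URLs that landed in each
    reduce partition (hypothetical input, one entry per reducer).
    """
    # Assumption: the per-reducer limit is topN divided by the reducer count.
    limit = top_n // len(partition_sizes)
    # Each reducer emits at most `limit` URLs, or fewer if its partition is small.
    return sum(min(limit, size) for size in partition_sizes)


# URLs evenly spread over 4 partitions: the full topN is generated.
print(generated_total(1000, [500, 500, 500, 500]))  # 1000

# All URLs belong to a single host and land in one partition:
# only that reducer's share of topN is generated.
print(generated_total(1000, [2000, 0, 0, 0]))  # 250
```

In the second case three partitions are empty, so only a quarter of topN is generated even though 2000 candidate URLs were available.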
    
  ==== Examples ====