Posted to dev@nutch.apache.org by Chris Schneider <Sc...@TransPac.com> on 2006/05/07 22:13:10 UTC

generate.max.per.host is per reduce task

Gang,

I just noticed that the generate.max.per.host property is only 
enforced on a "per reduce task" basis during the first generate job 
(see Generator.Selector.reduce for details). At a minimum, it should 
probably be documented this way in nutch-default.xml.template.

Thoughts?

- Chris
-- 
------------------------
Chris Schneider
TransPac Software, Inc.
Schmed@TransPac.com
------------------------

Re: generate.max.per.host is per reduce task

Posted by Doug Cutting <cu...@apache.org>.
Chris Schneider wrote:
> I just noticed that the generate.max.per.host property is only enforced 
> on a "per reduce task" basis during the first generate job (see 
> Generator.Selector.reduce for details). At a minimum, it should probably 
> be documented this way in nutch-default.xml.template.

Yes, but all URLs with the same host go to a single reduce task, since 
the generator is producing host-disjoint fetch lists.

Doug
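
The point above can be illustrated with a small, self-contained sketch. It is not the Nutch Generator code: the class HostCapSketch, the methods partitionFor and select, and the hash-based partitioning are all assumptions made for illustration. It simulates reduce tasks that each keep their own host counters, and shows that when URLs are partitioned by host, a per-task limit behaves like a global per-host limit.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative simulation only; names and logic are hypothetical, not Nutch's. */
class HostCapSketch {

    /** Assumed partitioner: every URL of a given host maps to the same reduce task. */
    static int partitionFor(String host, int numReduceTasks) {
        return (host.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    /** Applies maxPerHost independently inside each simulated reduce task. */
    static Map<Integer, List<String>> select(List<String> urls, int numReduceTasks, int maxPerHost) {
        // One host counter map per task, as if each reduce task only saw its own state.
        Map<Integer, Map<String, Integer>> countsPerTask = new HashMap<>();
        Map<Integer, List<String>> fetchLists = new TreeMap<>();
        for (String url : urls) {
            String host = URI.create(url).getHost();
            int task = partitionFor(host, numReduceTasks);
            Map<String, Integer> counts = countsPerTask.computeIfAbsent(task, t -> new HashMap<>());
            int seen = counts.getOrDefault(host, 0);
            if (seen >= maxPerHost) {
                continue; // this task has already taken its quota for the host
            }
            counts.put(host, seen + 1);
            fetchLists.computeIfAbsent(task, t -> new ArrayList<>()).add(url);
        }
        return fetchLists;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList(
            "http://a.example.com/1", "http://a.example.com/2",
            "http://a.example.com/3", "http://b.example.org/1");
        // With a cap of 2, only two a.example.com URLs survive, all in one fetch list.
        System.out.println(select(urls, 4, 2));
    }
}
```

If the partitioning were by full URL rather than by host, the same host could land in several tasks and the limit would only hold per task, which is the situation the original message was worried about.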