Posted to dev@nutch.apache.org by Chris Schneider <Sc...@TransPac.com> on 2006/05/07 22:13:10 UTC
generate.max.per.host is per reduce task
Gang,
I just noticed that the generate.max.per.host property is only
enforced on a "per reduce task" basis during the first generate job
(see Generator.Selector.reduce for details). At a minimum, it should
probably be documented this way in nutch-default.xml.template.
Thoughts?
- Chris
--
------------------------
Chris Schneider
TransPac Software, Inc.
Schmed@TransPac.com
------------------------
Re: generate.max.per.host is per reduce task
Posted by Doug Cutting <cu...@apache.org>.
Chris Schneider wrote:
> I just noticed that the generate.max.per.host property is only enforced
> on a "per reduce task" basis during the first generate job (see
> Generator.Selector.reduce for details). At a minimum, it should probably
> be documented this way in nutch-default.xml.template.
Yes, but all URLs with the same host go to a single reduce task, since
the job generates host-disjoint fetch lists.
Doug
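
To illustrate the point above: if URLs are partitioned by host, a limit enforced per reduce task is in effect a limit per host across the whole job. The following is only a minimal sketch of that idea, not the actual Generator.Selector code. The class name HostPartitionSketch, the methods partitionFor and generate, and the hash-modulo partitioning are all hypothetical stand-ins chosen for illustration, and the URLs are assumed to be well-formed with a non-null host.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class HostPartitionSketch {

    // Hypothetical partitioner: every URL with the same host maps to the
    // same reduce task, which is what makes fetch lists host-disjoint.
    static int partitionFor(String url, int numReduceTasks) {
        String host = URI.create(url).getHost();
        return (host.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Simulates the job: partition the URLs by host, then apply the
    // max-per-host limit independently inside each simulated reduce task.
    static List<List<String>> generate(List<String> urls,
                                       int numReduceTasks,
                                       int maxPerHost) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            partitions.add(new ArrayList<>());
        }
        for (String url : urls) {
            partitions.get(partitionFor(url, numReduceTasks)).add(url);
        }

        List<List<String>> fetchLists = new ArrayList<>();
        for (List<String> partition : partitions) {
            // Counts are local to this reduce task only (assumption: no
            // state is shared between tasks).
            Map<String, Integer> hostCounts = new HashMap<>();
            List<String> fetchList = new ArrayList<>();
            for (String url : partition) {
                String host = URI.create(url).getHost();
                int seen = hostCounts.merge(host, 1, Integer::sum);
                if (seen <= maxPerHost) {
                    fetchList.add(url);
                }
            }
            fetchLists.add(fetchList);
        }
        return fetchLists;
    }

    public static void main(String[] args) {
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            urls.add("http://a.example.com/" + i);
        }
        for (int i = 0; i < 2; i++) {
            urls.add("http://b.example.org/" + i);
        }
        List<List<String>> fetchLists = generate(urls, 3, 2);
        for (int i = 0; i < fetchLists.size(); i++) {
            System.out.println("task " + i + ": " + fetchLists.get(i));
        }
    }
}
```

Because a host never appears in more than one partition in this sketch, the per-task counter sees every URL for that host, so the total selected for a host never exceeds the limit. If the partitioning were by URL rather than by host, the same per-task counter could admit up to the limit in each task.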