Posted to dev@nutch.apache.org by Rod Taylor <rb...@sitesell.com> on 2005/12/09 05:57:35 UTC

Should nutch try to reduce first?

When you run multiple commands within Nutch, it seems to process the
pending tasks in the order that they were added to the queue.  In some
cases this means you may be 50% through many jobs (map complete but
reduce still pending) while Nutch processes maps for yet more jobs.

I think Nutch should prioritize a pending reduce before a pending map, as
that keeps jobs moving through to completion (other processes may depend
on the results) and allows temporary disk space to be freed.
-- 
Rod Taylor <rb...@sitesell.com>


Re: Should nutch try to reduce first?

Posted by Doug Cutting <cu...@nutch.org>.
Rod,

Mike is in the middle of a major revision of the job-tracker that I
believe addresses this issue, as well as a lot of others.  Tasks will be
prioritized by job.

Thanks,

Doug

Rod Taylor wrote:
> When you run multiple commands within Nutch, it seems to process the
> pending tasks in the order that they were added to the queue.  In some
> cases this means you may be 50% through many jobs (map complete but
> reduce still pending) while Nutch processes maps for yet more jobs.
> 
> I think Nutch should prioritize a pending reduce before a pending map, as
> that keeps jobs moving through to completion (other processes may depend
> on the results) and allows temporary disk space to be freed.