Posted to user@nutch.apache.org by Nguyen Manh Tien <ti...@gmail.com> on 2007/08/02 05:14:02 UTC

Slow reduce>copy

I set up Nutch with Hadoop running on several PCs.
When it runs, I find that the reduce task is very slow, copying at only
0.01 MB/s ("reduce > copy (9 of 10 at 0.01 MB/s)").
Can anyone help me?

Re: Slow reduce>copy

Posted by Mathijs Homminga <ma...@gmail.com>.
I had the same problem.
In my case it was caused by the 5-second delay in Hadoop's 
ReduceTaskRunner:

private static final long MIN_POLL_INTERVAL = 5000;

This is the time the ReduceTaskRunner sleeps between successive polls 
for new map outputs. (Another question: why does the prepare() method 
keep polling for new map outputs even when all outputs are known?)
When your map output files are relatively small, this 5-second delay 
becomes significant.
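
To see why small outputs hurt, here is a rough back-of-the-envelope sketch. It is not Hadoop code. It assumes, purely for illustration, that one map output is fetched per poll cycle and that the network can move 10 MB/s; the class name CopyRateSketch and the method effectiveRateMBps are made up for this example.

```java
// Simplified model of the reduce-side copy rate. NOT Hadoop code:
// the class, method, and all numbers except the 5000 ms interval
// are illustrative assumptions.
final class CopyRateSketch {

    /**
     * Effective copy rate in MB/s if each poll cycle sleeps for
     * pollIntervalMs and then transfers one map output of outputSizeMB
     * over a link that sustains linkMBps (assumed model).
     */
    static double effectiveRateMBps(double outputSizeMB, long pollIntervalMs, double linkMBps) {
        double sleepSeconds = pollIntervalMs / 1000.0;
        double transferSeconds = outputSizeMB / linkMBps;
        return outputSizeMB / (sleepSeconds + transferSeconds);
    }

    public static void main(String[] args) {
        long pollIntervalMs = 5000;   // MIN_POLL_INTERVAL quoted above
        double linkMBps = 10.0;       // assumed network throughput
        // Hypothetical map output sizes, from tiny to large.
        for (double sizeMB : new double[] {0.05, 5.0, 50.0}) {
            System.out.printf("%.2f MB per output -> %.3f MB/s%n",
                    sizeMB, effectiveRateMBps(sizeMB, pollIntervalMs, linkMBps));
        }
    }
}
```

Under these assumptions, an output of about 0.05 MB gives a rate close to the 0.01 MB/s reported in the original question, because almost all of each cycle is spent sleeping. With larger outputs the transfer time dominates and the sleep matters much less.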

Proposed solution: make your map output files larger.
Alternatively, you can modify Hadoop and decrease this delay. But be 
careful: if you set it too low, the polling overhead might become too large.

Good luck,
Mathijs

Nguyen Manh Tien wrote:
> I setup nutch with hadoop run on several PC.
> when it run, i find that the reduce task run very slow at the speed of
> 0.01MB/s ("reduce > copy (9 of 10 at 0.01 MB/s)")
> Any one help me.