Posted to user@nutch.apache.org by Nguyen Manh Tien <ti...@gmail.com> on 2007/08/02 05:14:02 UTC
Slow reduce>copy
I set up Nutch with Hadoop running on several PCs.
When it runs, I find that the reduce task is very slow, copying at a speed of
0.01 MB/s ("reduce > copy (9 of 10 at 0.01 MB/s)").
Can anyone help me?
Re: Slow reduce>copy
Posted by Mathijs Homminga <ma...@gmail.com>.
I have had the same problem.
In my case it was caused by the 5-second delay in Hadoop's
ReduceTaskRunner:
private static final long MIN_POLL_INTERVAL = 5000;
This is the time the ReduceTaskRunner sleeps between successive polls
for new map outputs. (Another question: why does the prepare() method
keep polling for new map outputs, even when all outputs are known?)
When your map output files are relatively small, this 5-second delay
becomes significant compared with the time spent actually copying data.
Proposed solution: make your map output files larger.
Alternatively, you can modify Hadoop to decrease this delay. But be careful:
if you set it too low, the polling overhead might become too large.
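
To show why small map outputs make the delay dominate, here is a rough
back-of-the-envelope sketch. It is not Hadoop code. It assumes a simplified
model in which the reducer copies one map output and then sleeps for one full
poll interval. The class name, the method effectiveRateMBps, and the 10 MB/s
link speed are made up for illustration; only the 5000 ms value comes from
the constant above.

```java
public class CopyRateSketch {
    // Poll interval quoted in this thread, in milliseconds.
    static final long MIN_POLL_INTERVAL = 5000;

    /**
     * Effective copy rate in MB/s under the simplifying assumption that
     * each map output is copied and then followed by one full poll sleep.
     * This is a hypothetical model, not the real ReduceTaskRunner logic.
     */
    static double effectiveRateMBps(double outputMB, double networkMBps, long pollMillis) {
        double transferSeconds = outputMB / networkMBps; // time spent moving data
        double waitSeconds = pollMillis / 1000.0;        // time spent sleeping
        return outputMB / (transferSeconds + waitSeconds);
    }

    public static void main(String[] args) {
        double networkMBps = 10.0; // assumed link speed, purely illustrative
        for (double outputMB : new double[] {0.05, 1.0, 100.0}) {
            System.out.printf("%.2f MB output -> %.3f MB/s%n", outputMB,
                    effectiveRateMBps(outputMB, networkMBps, MIN_POLL_INTERVAL));
        }
    }
}
```

Under these assumptions, a 0.05 MB map output spends about 0.005 s on the
wire and 5 s waiting, so the effective rate is roughly 0.01 MB/s, while a
100 MB output is limited mostly by the link. The same function can be used
to estimate the effect of a shorter poll interval.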
Good luck,
Mathijs
Nguyen Manh Tien wrote:
> I setup nutch with hadoop run on several PC.
> when it run, i find that the reduce task run very slow at the speed of
> 0.01MB/s ("reduce > copy (9 of 10 at 0.01 MB/s)")
> Any one help me.