Posted to common-user@hadoop.apache.org by William <wt...@gmail.com> on 2010/11/23 21:58:42 UTC

Config

We are currently modifying the configuration of our Hadoop grid (250
machines). The machines are homogeneous, with the following specs:

dual quad-core CPUs, 18 GB RAM, 8 x 1 TB drives

Currently we have set this up as follows:

8 reduce slots at 800 MB
8 map slots at 800 MB

We have also raised io.sort.mb to 256 MB.

We see a lot of spilling on both maps and reduces, and I am wondering what
other configs I should be looking into.

Thanks

Re: Config

Posted by Yu Li <ca...@gmail.com>.

Hi William,

I think the most appropriate parameter to try is io.sort.factor, which sets
how many spill files are merged at once and so affects the number of merge
passes to disk on both the map and reduce sides. Its default value is 10;
try raising it to 100 or more.
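
As a rough illustration of why a larger io.sort.factor helps, here is a
minimal sketch that counts merge rounds under a simplified model: each round
merges groups of up to `factor` files into one file, until a single file
remains. This is an assumption made for illustration, not Hadoop's actual
merge scheduling, and the figure of 50 spill files is hypothetical.

```python
import math


def merge_passes(num_spills: int, factor: int) -> int:
    """Count merge rounds in a simplified model (not Hadoop's real scheduler).

    Each round merges groups of up to `factor` files into one file per group.
    """
    if num_spills < 1:
        raise ValueError("num_spills must be at least 1")
    if factor < 2:
        raise ValueError("factor must be at least 2")

    passes = 0
    files = num_spills
    while files > 1:
        # Every group of up to `factor` files collapses into a single file.
        files = math.ceil(files / factor)
        passes += 1
    return passes


if __name__ == "__main__":
    hypothetical_spills = 50  # illustrative value, not taken from the thread
    for factor in (10, 100):
        rounds = merge_passes(hypothetical_spills, factor)
        print(f"io.sort.factor={factor}: {rounds} merge round(s)")
```

Under this model, every extra round rereads and rewrites the intermediate
data, which is why fewer rounds should mean less disk I/O.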

If spilling on the reduce side is still frequent, you could try raising
mapred.job.shuffle.input.buffer.percent along with the heap size in
mapred.child.java.opts, which may reduce the number of disk spills in the
shuffle phase. The default value of mapred.job.shuffle.input.buffer.percent
is 0.7, and mapred.child.java.opts defaults to -Xmx200m.
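
To make the interaction between these two settings concrete, here is a small
sketch that estimates the in-memory shuffle buffer as the task heap multiplied
by mapred.job.shuffle.input.buffer.percent. It is only an approximation: the
heap a JVM can actually use is typically a little below the -Xmx value, and
the helper name `shuffle_buffer_mb` is made up for this example.

```python
def shuffle_buffer_mb(heap_mb: float, buffer_percent: float = 0.7) -> float:
    """Approximate shuffle buffer size in MB (hypothetical helper).

    heap_mb        -- the -Xmx value from mapred.child.java.opts, in MB
    buffer_percent -- mapred.job.shuffle.input.buffer.percent (a fraction, 0..1)
    """
    if not 0.0 <= buffer_percent <= 1.0:
        raise ValueError("buffer_percent must be between 0 and 1")
    return heap_mb * buffer_percent


if __name__ == "__main__":
    # 200 MB is the default heap; 800 MB assumes the slot size from the
    # original post is the per-task heap.
    for heap in (200, 800):
        print(f"-Xmx{heap}m -> about {shuffle_buffer_mb(heap):.0f} MB of shuffle buffer")
```

The point of the sketch is that the percentage alone does not say much; the
same 0.7 gives a much larger buffer once the child heap is raised.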

Note that increasing these values also increases memory consumption, so we
need to make sure memory does not become the system bottleneck.
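
A quick way to check this is to add up the task heaps per node and see what
is left. The sketch below assumes that "800 MB" in the original post is the
maximum heap of each task JVM, and that all 16 slots can be busy at once. The
function name is hypothetical. A JVM's real footprint is larger than its heap,
and the remainder also has to cover the DataNode, the TaskTracker, and the
operating system, so treat the result as an optimistic estimate.

```python
def node_memory_budget(ram_mb: int, map_slots: int, reduce_slots: int,
                       heap_mb: int) -> tuple[int, int]:
    """Return (total task heap, remaining RAM) in MB for one node.

    Hypothetical helper: counts heap only, not JVM overhead or daemons.
    """
    task_heap_mb = (map_slots + reduce_slots) * heap_mb
    return task_heap_mb, ram_mb - task_heap_mb


if __name__ == "__main__":
    ram_mb = 18 * 1024  # 18 GB per machine, from the original post
    for heap_mb in (800, 1024):
        used, left = node_memory_budget(ram_mb, map_slots=8, reduce_slots=8,
                                        heap_mb=heap_mb)
        print(f"heap {heap_mb} MB x 16 slots = {used} MB, leaving {left} MB")
```

If the remainder gets small, reducing the number of slots per node may be a
safer way to give each task more heap than raising the heap alone.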

Hope this helps.

-- 
Best Regards,
Li Yu