Posted to common-user@hadoop.apache.org by William <wt...@gmail.com> on 2010/11/23 21:58:42 UTC
Config
We are currently modifying the configuration of our Hadoop grid (250
machines). The machines are homogeneous, with the following specs:
dual quad-core CPU, 18 GB RAM, 8 x 1 TB drives
Currently we have set this up as follows:
8 reduce slots at 800 MB
8 map slots at 800 MB
io.sort.mb raised to 256 MB
We see a lot of spilling on both maps and reduces, and I am wondering what
other configs I should be looking into.
Thanks
Re: Config
Posted by Yu Li <ca...@gmail.com>.
Hi William,
I think the first config parameter to try is io.sort.factor, which sets how
many spill files are merged at once and so affects the number of on-disk
merge passes on both the map and reduce side. The default value of this
parameter is 10; try raising it to 100 or more.
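To see why a larger io.sort.factor helps, here is a rough Python sketch that
counts merge rounds for a given number of spill files. It assumes a
simplified model in which every round merges up to io.sort.factor files into
one; Hadoop's actual merger is more nuanced, and the name merge_rounds is
made up for illustration.

```python
def merge_rounds(num_spills, sort_factor):
    """Count merge rounds under a simplified model (an assumption, not
    Hadoop's exact algorithm): each round merges groups of up to
    sort_factor files into one file, until a single file remains."""
    if num_spills < 0:
        raise ValueError("num_spills must be non-negative")
    if sort_factor < 2:
        raise ValueError("sort_factor must be at least 2")
    rounds = 0
    remaining = num_spills
    while remaining > 1:
        # Ceiling division: number of files left after this round.
        remaining = -(-remaining // sort_factor)
        rounds += 1
    return rounds


if __name__ == "__main__":
    for factor in (10, 100):
        print("io.sort.factor =", factor, "rounds =", merge_rounds(50, factor))
```

Under this model, 50 spill files need two rounds at the default factor of 10
and a single round at 100, so every record is rewritten to disk one time
fewer.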
If spilling on the reduce side is still frequent, you could try raising
mapred.job.shuffle.input.buffer.percent along with the heap size set in
mapred.child.java.opts, which may reduce the number of disk spills in the
shuffle phase. The default value of mapred.job.shuffle.input.buffer.percent
is 0.7, and mapred.child.java.opts defaults to -Xmx200m.
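As a back-of-the-envelope illustration of how these two settings interact,
the sketch below multiplies the child heap by the buffer percentage. It
assumes the usable heap equals the -Xmx value (the JVM typically reports
slightly less), and the function name shuffle_buffer_mb is hypothetical.

```python
def shuffle_buffer_mb(heap_mb, input_buffer_percent=0.70):
    """Approximate in-memory shuffle buffer: the fraction of the child
    heap given by mapred.job.shuffle.input.buffer.percent.
    Assumes the whole -Xmx value is available as heap."""
    if not 0.0 <= input_buffer_percent <= 1.0:
        raise ValueError("input_buffer_percent must be between 0 and 1")
    return heap_mb * input_buffer_percent


if __name__ == "__main__":
    for heap_mb in (200, 800):
        print("heap", heap_mb, "MB -> shuffle buffer about",
              round(shuffle_buffer_mb(heap_mb)), "MB")
```

With the default -Xmx200m this comes to roughly 140 MB of shuffle buffer per
reduce task; with an 800 MB heap it is roughly 560 MB at the same percentage.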
Note that increasing these values will also increase memory usage, so we
need to make sure memory doesn't become the system bottleneck.
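A quick way to check this is to add up the task heaps per node. The sketch
below assumes that "800mb" in the original post is the per-slot child heap,
that 18 GB means 18 x 1024 MB, and that all slots can be busy at once; the
function name node_memory_budget is made up, and JVM overhead beyond the heap
is ignored.

```python
def node_memory_budget(ram_mb, map_slots, reduce_slots, child_heap_mb):
    """Return (committed_mb, headroom_mb) for one node, assuming every
    map and reduce slot runs a child JVM with the same maximum heap.
    Headroom is what remains for the DataNode and TaskTracker daemons,
    the OS, and the page cache."""
    committed_mb = (map_slots + reduce_slots) * child_heap_mb
    headroom_mb = ram_mb - committed_mb
    return committed_mb, headroom_mb


if __name__ == "__main__":
    committed, headroom = node_memory_budget(
        ram_mb=18 * 1024, map_slots=8, reduce_slots=8, child_heap_mb=800)
    print("committed to task heaps:", committed, "MB")
    print("left for everything else:", headroom, "MB")
```

For the cluster described above this gives 12800 MB committed to task heaps
and 5632 MB left over, so any increase in the child heap should be weighed
against that remaining headroom.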
Hope this helps.
--
Best Regards,
Li Yu