Posted to common-user@hadoop.apache.org by Mark <st...@gmail.com> on 2011/03/09 18:47:51 UTC

Hadoop heap and other memory settings

How should I be configuring the heap and memory settings for my cluster?

Currently the only settings we use are:

HADOOP_HEAPSIZE=8192 (in hadoop-env.sh)
mapred.child.java.opts=8192 (in mapred-site.xml)

I have a feeling these settings are completely off. The only reason we 
increased them this high is because the mappers were throwing heap space 
exceptions when running our FPGrowth job.

Should all the configuration go in hadoop-env.sh?

Thanks for any pointers


Re: Hadoop heap and other memory settings

Posted by Harsh J <qw...@gmail.com>.
Hello,

On Wed, Mar 9, 2011 at 11:17 PM, Mark <st...@gmail.com> wrote:
> HADOOP_HEAPSIZE=8192 (in hadoop-env.sh)

HADOOP_HEAPSIZE affects _all_ the Hadoop daemons, and all of them
would be started with the specified value as their heap size. Your
JT/NN/TT/DN/SNN would all be started with 8192M as their heap size
(which might not be what you're really looking for).

It does not affect the launched Task JVMs, which is what you're
looking to change.
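
For illustration only, here is a sketch of a hadoop-env.sh that keeps the
global daemon heap modest and raises it just for the daemons that need
more. The numbers are made up; size them to your hardware. Whether a
per-daemon -Xmx overrides HADOOP_HEAPSIZE depends on your version's
start-up scripts, though it does in the stock 0.20.x/1.x ones:

  # hadoop-env.sh (illustrative values only)
  # Default heap, in MB, for every Hadoop daemon started on this box
  export HADOOP_HEAPSIZE=1024
  # Give only the NameNode and JobTracker a larger heap
  export HADOOP_NAMENODE_OPTS="-Xmx4096m $HADOOP_NAMENODE_OPTS"
  export HADOOP_JOBTRACKER_OPTS="-Xmx2048m $HADOOP_JOBTRACKER_OPTS"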

> mapred.child.java.opts=8192 (in mapred-site.xml)

This affects all launched Task JVMs (Mappers and Reducers together),
and is a per-job specified setting (i.e. set by the JobConf/Job or -D
parameters in streaming, etc.). If you want your mapred-site.xml to be
the final authority on that property, set it as <final>true</final> so
nothing else can override it. By default, it is set to -Xmx200m.
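
As a sketch, a mapred-site.xml entry that pins the task heap could look
like the following (the 1 GB figure is only an example). Note that this
property takes JVM options such as -Xmx..., not a bare number of megabytes:

  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1024m</value>
    <final>true</final>
  </property>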

Also, know that it is applied _per_ launched JVM, so set it sensibly
(based on the total configured slots per tasktracker/total memory
available for use in the machine/etc.).
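
As a rough, purely illustrative budget, assume a tasktracker node with
32 GB of RAM, 8 map slots and 4 reduce slots:

  12 task slots x -Xmx2048m         = 24 GB for task JVMs (all slots busy)
  TaskTracker + DataNode heaps      =  2 GB (say, 1 GB each)
  OS, page cache, other processes   =  6 GB headroom

On a node like that, an 8 GB per-task heap would only be safe if you
also cut the slot counts down accordingly.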

-- 
Harsh J
www.harshj.com