Posted to common-user@hadoop.apache.org by Brendan Weickert <bw...@palantir.com> on 2011/03/01 16:26:23 UTC

tuning memory for map-reduce jobs--help appreciated

I haven't been able to answer this question in the documentation...
I want to up the memory allocation for the launched reduce tasks.  I can see no way to do this in code (via a jobConf setting, for instance).  So instead I've gone to each node in my 10-node cluster and edited the mapred-site.xml file to add the property mapreduce.reduce.java.opts with the value -Xmx2048M, then stopped and restarted the cluster.
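For reference, the edit on each node looks like this in mapred-site.xml:

  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx2048M</value>
  </property>
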
But I have a few questions about this:

 1.  Now when I run a job, the Job Configuration lists two properties side by side: mapreduce.reduce.java.opts and mapred.child.java.opts, with values -Xmx2048M and -Xmx200m respectively.  Which of these takes priority?  Is the first overriding the second for reduce tasks (I hope)?
 2.  Is the child JVM allocation being carved out of HADOOP_HEAPSIZE?  Or is it separate?
 3.  All the documentation says that these child memory settings can be set "at the job level" rather than at the cluster level.  But I haven't been able to see how.  As I mentioned above, the only way I've seen to do it is to edit ten different mapred-site.xml files, stop, and then restart my cluster, which certainly doesn't seem like job-level configuration.  Am I missing a way to do this via jobConf?
Thanks very much for any help.

Re: tuning memory for map-reduce jobs--help appreciated

Posted by Harsh J <qw...@gmail.com>.
Hello,

On Tue, Mar 1, 2011 at 8:56 PM, Brendan Weickert <bw...@palantir.com> wrote:
> I haven't been able to answer this question in the documentation...
> I want to up the memory allocation for the launched reduce tasks.  I can see no way to do this in code (via a jobConf setting, for instance).  So instead I've gone to each node in my 10-node cluster and edited the mapred-site.xml file to add the property mapreduce.reduce.java.opts with the value -Xmx2048M, then stopped and restarted the cluster.
> But I have a few questions about this:
>
>  1.  Now when I run a job, the Job Configuration lists two properties side by side: mapreduce.reduce.java.opts and mapred.child.java.opts, with values -Xmx2048M and -Xmx200m respectively.  Which of these takes priority?  Is the first overriding the second for reduce tasks (I hope)?

This FAQ ought to solve some concerns regarding (1):
http://wiki.apache.org/hadoop/FAQ#How_do_I_get_my_MapReduce_Java_Program_to_read_the_Cluster.27s_set_configuration_and_not_just_defaults.3F
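
In short: a JobConf constructed on the client only reflects the
cluster's mapred-site.xml if the cluster's conf directory is on the
client classpath; otherwise you just see the shipped defaults. A quick
sanity check (a minimal sketch, not from the FAQ itself; the class name
is a placeholder):

  import org.apache.hadoop.mapred.JobConf;

  public class PrintChildOpts {
    public static void main(String[] args) {
      // new JobConf() loads mapred-default.xml and mapred-site.xml from
      // the classpath; without the cluster's conf dir there, this prints
      // the shipped default (-Xmx200m).
      JobConf conf = new JobConf();
      System.out.println(conf.get("mapred.child.java.opts"));
    }
  }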

>  2.  Is the child JVM allocation being carved out of HADOOP_HEAPSIZE?  Or is it separate?

It is separate. Child JVM properties are set by
'mapred.child.java.opts', a property supplied with each job submission.
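
For context (my wording, not official docs): HADOOP_HEAPSIZE, set in
conf/hadoop-env.sh, sizes the Hadoop daemon JVMs themselves (NameNode,
DataNode, JobTracker, TaskTracker), while each task attempt runs in its
own child JVM sized by the java.opts properties. A sketch of the two
knobs:

  # conf/hadoop-env.sh: heap for the Hadoop daemon JVMs, in MB
  export HADOOP_HEAPSIZE=1000

  # mapred-site.xml / JobConf: heap for each spawned task JVM, e.g.
  #   mapred.child.java.opts = -Xmx200m   (the shipped default)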

>  3.  All the documentation says that these child memory settings can be set "at the job level" rather than at the cluster level.  But I haven't been able to see how.  As I mentioned above, the only way I've seen to do it is to edit ten different mapred-site.xml files, stop, and then restart my cluster, which certainly doesn't seem like job-level configuration.  Am I missing a way to do this via jobConf?

Yes, I think the JobConf does lack a dedicated API method for setting
the child JVM opts. In the meantime, you can set it directly:

jobConf.set("mapred.child.java.opts", "-Xmx2048M");
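
In a driver, that might look like this (a minimal sketch using the old
mapred API; the class and job names are placeholders, not from your
setup):

  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;

  public class MyJobDriver {
    public static void main(String[] args) throws Exception {
      JobConf jobConf = new JobConf(MyJobDriver.class);
      jobConf.setJobName("memory-tuned-job");

      // Applies to this job's map and reduce child JVMs alike, unless
      // the cluster has marked the property final in mapred-site.xml.
      jobConf.set("mapred.child.java.opts", "-Xmx2048M");

      // ... set input/output paths, mapper and reducer classes, etc. ...

      JobClient.runJob(jobConf);
    }
  }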

mapred.child.java.opts is indeed a job-specific property; adding it to
mapred-site.xml is only useful when you want to 'finalize' it to a
fixed value, no matter what a submitted job provides.
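
For completeness, finalizing looks like this in the cluster's
mapred-site.xml (a sketch; with <final>true</final>, the cluster's
value wins over anything a submitted job sets):

  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx2048M</value>
    <final>true</final>
  </property>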

Hope this clears up some confusion you're having!

-- 
Harsh J
www.harshj.com