Posted to hdfs-user@hadoop.apache.org by jeremy p <at...@gmail.com> on 2013/03/11 23:15:59 UTC

How to change the number of mappers per node for a given job

Hello all,

So, I have two jobs, Job A and Job B.  For Job A, I would like to have a
maximum of 6 mappers per node.  However, Job B is a little different.  For
Job B, I can only run one mapper per node.  The reason for this isn't
important -- let's just say this requirement is non-negotiable.  I would
like to tell Hadoop, "For Job A, schedule a maximum of 6 mappers per node.
 But for Job B, schedule a maximum of 1 mapper per node."  Is this possible
at all?

The only solution I can think of is:

1) Have two folders off the main Hadoop folder, conf.JobA and conf.JobB.
Each folder has its own copy of mapred-site.xml:
conf.JobA/mapred-site.xml sets mapred.tasktracker.map.tasks.maximum to 6,
and conf.JobB/mapred-site.xml sets it to 1.
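For illustration, the Job B variant would look roughly like this (the property name is the standard Hadoop 1.x TaskTracker setting; the rest of the file is a minimal sketch, and a real mapred-site.xml would carry your other cluster properties too):

```xml
<?xml version="1.0"?>
<!-- conf.JobB/mapred-site.xml: allow only one map slot per TaskTracker -->
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>1</value>
  </property>
</configuration>
```

The conf.JobA copy would be identical except with a value of 6.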
2) Before I run Job A :
2a) Shut down my tasktrackers
2b) Copy conf.JobA/mapred-site.xml into Hadoop's conf folder, replacing the
mapred-site.xml that was already in there
2c) Restart my tasktrackers
2d) Wait for the tasktrackers to finish starting
3) Run Job A

and then do a similar thing when I need to run Job B.
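The swap in steps 2a-2d could be scripted roughly as follows. This is only a sketch of the procedure described above: the directory layout and the hadoop-daemon.sh location assume a typical Hadoop 1.x install, and the restart is guarded so the copy logic can be exercised on its own.

```shell
# swap_mapred_conf: copy a job-specific mapred-site.xml into the live conf
# directory, then bounce the TaskTracker so it picks up the new slot limit.
# Paths are illustrative; adjust for your own install.
swap_mapred_conf() {
  hadoop_home="$1"    # e.g. /usr/local/hadoop
  job_conf="$2"       # e.g. conf.JobA or conf.JobB

  # Replace the active mapred-site.xml with the job-specific copy.
  cp "$hadoop_home/$job_conf/mapred-site.xml" \
     "$hadoop_home/conf/mapred-site.xml"

  # Restart the TaskTracker only if the daemon script is actually present,
  # so the copy step can be tested without a running cluster.
  if [ -x "$hadoop_home/bin/hadoop-daemon.sh" ]; then
    "$hadoop_home/bin/hadoop-daemon.sh" stop tasktracker
    "$hadoop_home/bin/hadoop-daemon.sh" start tasktracker
  fi
}
```

You would call `swap_mapred_conf /usr/local/hadoop conf.JobA` before Job A and `swap_mapred_conf /usr/local/hadoop conf.JobB` before Job B, waiting for the TaskTrackers to finish coming back up in between.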

I really don't like this solution; it seems kludgey and failure-prone.  Is
there a better way to do what I need to do?

--Jeremy