Posted to common-user@hadoop.apache.org by Jason Venner <ja...@attributor.com> on 2007/12/03 04:33:56 UTC

Controlling the number of simultaneous jobs per machine - 0.15.0

We have jobs that require different resources and as such saturate our 
machines at different levels of parallelization.
What we want to do in the driver is set the number of simultaneous jobs 
per node.

        JobClient client = new JobClient();
        Configuration configuration = new Configuration();
        configuration.setInt("mapred.tasktracker.tasks.maximum", 7);
        JobConf conf = new JobConf(configuration, MergeNewSeenDriver.class);

        System.err.println("configured maximum tasks is "
            + conf.get("mapred.tasktracker.tasks.maximum"));

But this doesn't seem to work. The only success we have had is with the 
multithreaded map runner, but then we can't run multiple reduces 
at a time on each machine.

Any suggestions?

Re: Controlling the number of simultaneous jobs per machine - 0.15.0

Posted by Owen O'Malley <oo...@yahoo-inc.com>.
On Dec 2, 2007, at 11:53 PM, Espen Amble Kolstad wrote:

> AFAIK 0.15.x does not support a different
> mapred.tasktracker.tasks.maximum per node. It's a per-cluster setting,
> so whatever is in your hadoop-site.xml is what will be used.
>
> I think this is something coming in 0.16.x though.

This change was committed as HADOOP-1245. Furthermore, HADOOP-1274  
allows you to change the number of map slots independently from the  
number of reduce slots, so that you can run more maps without  
clobbering your cluster with reduces.

-- Owen
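
For readers landing on this thread later, a sketch of what the HADOOP-1274 split would look like once it ships in 0.16 (property names assumed from that issue, not from a released hadoop-default.xml), set in hadoop-site.xml on each tasktracker:

```xml
<!-- Assumed 0.16 property names from HADOOP-1274; not available in 0.15.x. -->
<!-- Caps map and reduce slots independently, per tasktracker. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>7</value>
  <description>Maximum number of map tasks this tasktracker runs at once.</description>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
  <description>Maximum number of reduce tasks this tasktracker runs at once.</description>
</property>
```

The tasktracker reads its configuration at startup, so it would need a restart after the change.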

Re: Controlling the number of simultaneous jobs per machine - 0.15.0

Posted by Espen Amble Kolstad <es...@trank.no>.
AFAIK 0.15.x does not support a different
mapred.tasktracker.tasks.maximum per node. It's a per-cluster setting,
so whatever is in your hadoop-site.xml is what will be used.

I think this is something coming in 0.16.x though.

Espen
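
A minimal sketch of the setting Espen describes, as it would appear in hadoop-site.xml on a 0.15.x tasktracker (the value 7 is just the number from the original post):

```xml
<!-- hadoop-site.xml on the tasktracker; overrides hadoop-default.xml -->
<property>
  <name>mapred.tasktracker.tasks.maximum</name>
  <value>7</value>
  <description>Maximum number of tasks a single tasktracker runs at once.</description>
</property>
```

Because the tasktracker reads this at startup, setting it from the job driver (as in the code quoted below) has no effect; it has to be in the tasktracker's own configuration.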


On 12/3/07, Jason Venner <ja...@attributor.com> wrote:
> We have jobs that require different resources and as such saturate our
> machines at different levels of parallelization.
> What we want to do in the driver is set the number of simultaneous jobs
> per node.
>
>         JobClient client = new JobClient();
>         Configuration configuration = new Configuration();
>         configuration.setInt("mapred.tasktracker.tasks.maximum", 7);
>         JobConf conf = new JobConf(configuration, MergeNewSeenDriver.class);
>
>         System.err.println("configured maximum tasks is "
>             + conf.get("mapred.tasktracker.tasks.maximum"));
>
> But this doesn't seem to work. The only success we have had is using
> multithreaded map runner, but then we don't get to run multiple reduces
> at a time on the machines.
>
> Any suggestions?
>

Re: Controlling the number of simultaneous jobs per machine - 0.15.0

Posted by Michael Bieniosek <mi...@powerset.com>.
You also might want to look at HADOOP-2300


On 12/2/07 7:33 PM, "Jason Venner" <ja...@attributor.com> wrote:

We have jobs that require different resources and as such saturate our
machines at different levels of parallelization.
What we want to do in the driver is set the number of simultaneous jobs
per node.

        JobClient client = new JobClient();
        Configuration configuration = new Configuration();
        configuration.setInt("mapred.tasktracker.tasks.maximum", 7);
        JobConf conf = new JobConf(configuration, MergeNewSeenDriver.class);

        System.err.println("configured maximum tasks is "
            + conf.get("mapred.tasktracker.tasks.maximum"));

But this doesn't seem to work. The only success we have had is using
multithreaded map runner, but then we don't get to run multiple reduces
at a time on the machines.

Any suggestions?