Posted to common-user@hadoop.apache.org by Billy <sa...@pearsonwholesale.com> on 2008/01/02 20:38:06 UTC

mapred.tasktracker.map.tasks.maximum

If I add this to a command line as a -jobconf should it be enforced?

Say I have a job that I want to run only 1 map at a time per server

I have tried this and looked in the job.xml file, and it's set correctly but not 
enforced.

Billy




Re: mapred.tasktracker.map.tasks.maximum

Posted by Jason Venner <ja...@attributor.com>.
I believe you get this ability in about 0.16.0.
As of 0.15.1, this is a per-cluster value set at start time.

Billy wrote:
> If I add this to a command line as a -jobconf should it be enforced?
>
> Say I have a job that I want to run only 1 map at a time per server
>
> I have tried this and looked in the job.xml file, and it's set correctly but not 
> enforced.
>
> Billy
>
>
>
>   

Re: mapred.tasktracker.map.tasks.maximum

Posted by Billy <sa...@pearsonwholesale.com>.
Some of the tasks I have will overrun the servers if I run, say, 2 of them per 
node, but I have other tasks that I can run 4 of on a server, so I was looking to 
configure this on the command line to better spread the work the way we want.

Billy




"Arun C Murthy" <ar...@yahoo-inc.com> wrote in 
message news:20080102195305.GA47862@yahoo-inc.com...
> Billy,
>
>
> On Wed, Jan 02, 2008 at 01:38:06PM -0600, Billy wrote:
>>If I add this to a command line as a -jobconf should it be enforced?
>>
>
> This is a property of the TaskTracker and hence cannot be set on a per-job 
> basis...
>
>>Say I have a job that I want to run only 1 map at a time per server
>>
>
> Could you describe your reasons?
>
> Arun
>
>>I have tried this and looked in the job.xml file, and it's set correctly but 
>>not enforced.
>>
>
>
>>Billy
>>
>>
>>
> 




Re: mapred.tasktracker.map.tasks.maximum

Posted by Billy <sa...@pearsonwholesale.com>.
I think the best option would be to be able to set the max per node in its config 
file.
I think someone is working or has worked on this; I saw something in JIRA.

For the new option, I would think a job override could work something like 
this:

1) Check the node config: if the job override is lower than the node max, use the 
job override, but if the local node max is lower, use it.
This way we could reduce the number of tasks for a job, if needed, on nodes 
that have excess capacity.

or

2) The jobconf would override all nodes' local settings for this.

Billy
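
The two override policies proposed in this message can be compared with a small sketch. It is only an illustration in Python, not Hadoop code: the function names and the idea of passing the limits in as plain integers are assumptions made for the example.

```python
from typing import Optional


def effective_max_option1(node_max: int, job_override: Optional[int]) -> int:
    """Option 1: the lower of the node's limit and the job's override wins."""
    if job_override is None:
        return node_max
    return min(node_max, job_override)


def effective_max_option2(node_max: int, job_override: Optional[int]) -> int:
    """Option 2: a job override, when present, replaces the node's limit."""
    if job_override is None:
        return node_max
    return job_override


# A node configured for 4 concurrent maps, and a heavy job asking for 1.
print(effective_max_option1(4, 1))  # 1
print(effective_max_option2(4, 1))  # 1

# A node configured for 2 concurrent maps, and a light job asking for 4.
print(effective_max_option1(2, 4))  # 2: the node limit still protects the node
print(effective_max_option2(2, 4))  # 4: the job overrides the node
```

The two options only differ when a job asks for more than a node allows, which is the case where option 2 could overload a small node.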



"Arun C Murthy" <ar...@yahoo-inc.com> wrote in 
message news:20080103045251.GB72608@yahoo-inc.com...
> On Thu, Jan 03, 2008 at 10:12:04AM +0530, Arun C Murthy wrote:
>>On Wed, Jan 02, 2008 at 12:08:53PM -0800, Jason Venner wrote:
>>>In our case, we have specific jobs that due to resource constraints can
>>>only be run serially (ie: 1 instance per machine).
>>
>>I see, at this point there isn't anything in Hadoop which can help you out 
>>here...
>>
>
> Given that, please file a jira for this enhancement anyway... Thanks!
>
> I'd imagine we should consider features such as:
> a) Max simultaneous tasks per node per job (current ask).
> b) Max concurrent tasks per job cluster-wide (i.e. don't run more than 25, 
> or an absolute number, of maps of a given job simultaneously on the 
> cluster) - this should help jobs which need to respect SLAs of external 
> services regardless of cluster sizes - don't open more than 150 
> simultaneous db-connections.
>
> Arun
>
>>Having said that, could you consider running another Map-Reduce cluster 
>>with mapred.tasktracker.map.tasks.maximum set to 1 for these special jobs?
>>Run this cluster on the same machines simultaneously with your 
>>_regular_ cluster; just pick different ports etc.
>>
>>hth,
>>Arun
>>
>>>Most of our jobs are more normal and can be run in parallel on the 
>>>machines.
>>>
>>>Arun C Murthy wrote:
>>>>Billy,
>>>>
>>>>
>>>>On Wed, Jan 02, 2008 at 01:38:06PM -0600, Billy wrote:
>>>>
>>>>>If I add this to a command line as a -jobconf should it be enforced?
>>>>>
>>>>>
>>>>
>>>>This is a property of the TaskTracker and hence cannot be set on a 
>>>>per-job
>>>>basis...
>>>>
>>>>
>>>>>Say I have a job that I want to run only 1 map at a time per server
>>>>>
>>>>>
>>>>
>>>>Could you describe your reasons?
>>>>
>>>>Arun
>>>>
>>>>
>>>>>I have tried this and looked in the job.xml file, and it's set correctly 
>>>>>but not enforced.
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>>Billy
>>>>>
>>>>>
>>>>>
>>>>>
> 




Re: mapred.tasktracker.map.tasks.maximum

Posted by Arun C Murthy <ar...@yahoo-inc.com>.
On Thu, Jan 03, 2008 at 10:12:04AM +0530, Arun C Murthy wrote:
>On Wed, Jan 02, 2008 at 12:08:53PM -0800, Jason Venner wrote:
>>In our case, we have specific jobs that due to resource constraints can 
>>only be run serially (ie: 1 instance per machine).
>
>I see, at this point there isn't anything in Hadoop which can help you out here...
>

Given that, please file a jira for this enhancement anyway... Thanks! 

I'd imagine we should consider features such as:
a) Max simultaneous tasks per node per job (current ask).
b) Max concurrent tasks per job cluster-wide (i.e. don't run more than 25, or an absolute number, of maps of a given job simultaneously on the cluster) - this should help jobs which need to respect SLAs of external services regardless of cluster sizes - don't open more than 150 simultaneous db-connections.

Arun
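
Features (a) and (b) above could both be expressed as a check made before a task is handed out. The following is a rough Python sketch of that idea, not an existing Hadoop API: `JobLimits`, `can_launch`, and the node names are all hypothetical.

```python
from collections import Counter
from typing import Optional


class JobLimits:
    """Hypothetical per-job caps; None means 'no cap'."""

    def __init__(self, per_node: Optional[int] = None,
                 cluster_wide: Optional[int] = None):
        self.per_node = per_node
        self.cluster_wide = cluster_wide


def can_launch(limits: JobLimits, running_on_node: Counter, node: str) -> bool:
    """Return True if one more task of this job may start on `node`.

    `running_on_node` maps a node name to the number of this job's tasks
    currently running there.
    """
    if limits.per_node is not None and running_on_node[node] >= limits.per_node:
        return False  # (a) per-node, per-job cap reached
    total = sum(running_on_node.values())
    if limits.cluster_wide is not None and total >= limits.cluster_wide:
        return False  # (b) cluster-wide, per-job cap reached
    return True


# (a) A job that may only run 1 task per machine.
serial_job = JobLimits(per_node=1)
running = Counter({"node1": 1, "node2": 0})
print(can_launch(serial_job, running, "node1"))  # False
print(can_launch(serial_job, running, "node2"))  # True

# (b) A job that must not open more than 150 db-connections cluster-wide.
db_job = JobLimits(cluster_wide=150)
print(can_launch(db_job, Counter({"node1": 100, "node2": 50}), "node1"))  # False
```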

>Having said that, could you consider running another Map-Reduce cluster with mapred.tasktracker.map.tasks.maximum set to 1 for these special jobs?
>Run this cluster on the same machines simultaneously with your _regular_ cluster; just pick different ports etc.
>
>hth,
>Arun
>
>>Most of our jobs are more normal and can be run in parallel on the machines.
>>
>>Arun C Murthy wrote:
>>>Billy,
>>>
>>>
>>>On Wed, Jan 02, 2008 at 01:38:06PM -0600, Billy wrote:
>>>  
>>>>If I add this to a command line as a -jobconf should it be enforced?
>>>>
>>>>    
>>>
>>>This is a property of the TaskTracker and hence cannot be set on a per-job 
>>>basis...
>>>
>>>  
>>>>Say I have a job that I want to run only 1 map at a time per server
>>>>
>>>>    
>>>
>>>Could you describe your reasons?
>>>
>>>Arun
>>>
>>>  
>>>>I have tried this and looked in the job.xml file, and it's set correctly but 
>>>>not enforced.
>>>>
>>>>    
>>>
>>>
>>>  
>>>>Billy
>>>>
>>>>
>>>>
>>>>    

Re: mapred.tasktracker.map.tasks.maximum

Posted by Arun C Murthy <ar...@yahoo-inc.com>.
On Wed, Jan 02, 2008 at 12:08:53PM -0800, Jason Venner wrote:
>In our case, we have specific jobs that due to resource constraints can 
>only be run serially (ie: 1 instance per machine).

I see, at this point there isn't anything in Hadoop which can help you out here...

Having said that, could you consider running another Map-Reduce cluster with mapred.tasktracker.map.tasks.maximum set to 1 for these special jobs?
Run this cluster on the same machines simultaneously with your _regular_ cluster; just pick different ports etc.

hth,
Arun
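
As a rough illustration of the second-cluster workaround, the Python sketch below renders the overrides such a cluster might carry in its site configuration file. The helper `render_site_config`, the host name, and the port number are assumptions for the example; only mapred.tasktracker.map.tasks.maximum comes from this thread. A real setup would also need the other daemon ports to differ from the regular cluster's, and their property names vary between Hadoop versions, so they are not shown.

```python
import xml.etree.ElementTree as ET


def render_site_config(properties: dict) -> str:
    """Render name/value pairs in the <configuration><property> layout."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Overrides for the second, "serial" cluster.
serial_cluster = {
    # One map slot per TaskTracker, as suggested in the message above.
    "mapred.tasktracker.map.tasks.maximum": 1,
    # Hypothetical host and port; it must differ from the regular
    # cluster's JobTracker address.
    "mapred.job.tracker": "jobtracker.example.com:9101",
}

print(render_site_config(serial_cluster))
```

Jobs that must run serially would then be submitted to this cluster's JobTracker, while everything else keeps going to the regular one.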

>Most of our jobs are more normal and can be run in parallel on the machines.
>
>Arun C Murthy wrote:
>>Billy,
>>
>>
>>On Wed, Jan 02, 2008 at 01:38:06PM -0600, Billy wrote:
>>  
>>>If I add this to a command line as a -jobconf should it be enforced?
>>>
>>>    
>>
>>This is a property of the TaskTracker and hence cannot be set on a per-job 
>>basis...
>>
>>  
>>>Say I have a job that I want to run only 1 map at a time per server
>>>
>>>    
>>
>>Could you describe your reasons?
>>
>>Arun
>>
>>  
>>>I have tried this and looked in the job.xml file, and it's set correctly but 
>>>not enforced.
>>>
>>>    
>>
>>
>>  
>>>Billy
>>>
>>>
>>>
>>>    

Re: mapred.tasktracker.map.tasks.maximum

Posted by Jason Venner <ja...@attributor.com>.
In our case, we have specific jobs that, due to resource constraints, can 
only be run serially (i.e., 1 instance per machine).
Most of our jobs are more normal and can be run in parallel on the machines.

Arun C Murthy wrote:
> Billy,
>
>
> On Wed, Jan 02, 2008 at 01:38:06PM -0600, Billy wrote:
>   
>> If I add this to a command line as a -jobconf should it be enforced?
>>
>>     
>
> This is a property of the TaskTracker and hence cannot be set on a per-job basis...
>
>   
>> Say I have a job that I want to run only 1 map at a time per server
>>
>>     
>
> Could you describe your reasons?
>
> Arun
>
>   
>> I have tried this and looked in the job.xml file, and it's set correctly but not 
>> enforced.
>>
>>     
>
>
>   
>> Billy
>>
>>
>>
>>     

Re: mapred.tasktracker.map.tasks.maximum

Posted by Arun C Murthy <ar...@yahoo-inc.com>.
Billy,


On Wed, Jan 02, 2008 at 01:38:06PM -0600, Billy wrote:
>If I add this to a command line as a -jobconf should it be enforced?
>

This is a property of the TaskTracker and hence cannot be set on a per-job basis...
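
A toy model may help show why the value appears in job.xml yet is not enforced. The Python sketch below is not the real TaskTracker: `ToyTaskTracker`, its fallback of 2 slots, and the dictionaries standing in for configuration are all assumptions made for illustration.

```python
MAP_SLOTS_KEY = "mapred.tasktracker.map.tasks.maximum"


class ToyTaskTracker:
    """Toy stand-in for a TaskTracker; not the real Hadoop class."""

    def __init__(self, daemon_conf: dict):
        # The slot count is fixed once, from the daemon's own configuration.
        # The fallback of 2 is an assumption for this toy model.
        self.map_slots = int(daemon_conf.get(MAP_SLOTS_KEY, 2))
        self.running = 0

    def try_launch(self, job_conf: dict) -> bool:
        # job_conf is deliberately not consulted for the slot limit.
        if self.running >= self.map_slots:
            return False
        self.running += 1
        return True


tracker = ToyTaskTracker({MAP_SLOTS_KEY: "4"})
job_conf = {MAP_SLOTS_KEY: "1"}  # shows up in job.xml, but has no effect here

launched = [tracker.try_launch(job_conf) for _ in range(5)]
print(launched)  # [True, True, True, True, False]
```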

>Say I have a job that I want to run only 1 map at a time per server
>

Could you describe your reasons?

Arun

>I have tried this and looked in the job.xml file, and it's set correctly but not 
>enforced.
>


>Billy
>
>
>