Posted to common-user@hadoop.apache.org by Han JU <ju...@gmail.com> on 2013/04/30 12:00:22 UTC

Set reducer capacity for a specific M/R job

Hi,

I want to change the cluster's capacity of reduce slots on a per-job basis.
Originally I have 8 reduce slots per tasktracker.
I did:

conf.set("mapred.tasktracker.reduce.tasks.maximum", "4");
...
Job job = new Job(conf, ...)


And in the web UI I can see that for this job, the max reduce tasks is
exactly 4, as I set. However Hadoop still launches 8 reducers per
datanode ... why is this?

How could I achieve this?
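
For reference, a fuller sketch of such a driver; the class name, job name and
paths are illustrative placeholders, not from the original code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyJobDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // This lands in the job's configuration, which is why the web UI
        // shows 4; each tasktracker, however, keeps the slot count it read
        // from its own config at daemon startup.
        conf.set("mapred.tasktracker.reduce.tasks.maximum", "4");

        Job job = new Job(conf, "reduce-capacity-test");
        job.setJarByClass(MyJobDriver.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}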
-- 
JU Han

Software Engineer Intern @ KXEN Inc.
UTC - Université de Technologie de Compiègne
GI06 - Fouille de Données et Décisionnel

+33 0619608888

Re: Set reducer capacity for a specific M/R job

Posted by Nitin Pawar <ni...@gmail.com>.
I don't think you can control how many reducers run in parallel via the
framework.

Another way to do this is to increase the memory given to each individual
reducer, so that the tasktracker is limited by memory from launching more
reducers at the same time and they queue up.

You can try setting mapred.job.reduce.memory.mb to a higher value and see
if that works.
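
For example, a sketch; the value is illustrative, and in classic MapReduce
this per-job property only takes effect on clusters where the scheduler's
memory-based slot accounting is configured (e.g. mapred.cluster.reduce.memory.mb
is set on the tasktrackers):

// Hypothetical driver snippet: ask for ~4 GB per reduce task. With 2 GB
// slots, each reducer would then occupy two slots, halving how many can
// run concurrently on a tasktracker.
conf.set("mapred.job.reduce.memory.mb", "4096");
Job job = new Job(conf, "my-job");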


On Tue, Apr 30, 2013 at 4:08 PM, Han JU <ju...@gmail.com> wrote:

> Yes.. In the conf file of my cluster, mapred.tasktracker.reduce.tasks.maximum
> is 8.
> And for this job, I want it to be 4.
> I set it through conf and build the job with this conf, then submit it.
> But Hadoop launches 8 reducers per datanode...



-- 
Nitin Pawar

Re: Set reducer capacity for a specific M/R job

Posted by Han JU <ju...@gmail.com>.
Yes.. In the conf file of my cluster, mapred.tasktracker.reduce.tasks.maximum
is 8.
And for this job, I want it to be 4.
I set it through conf and build the job with this conf, then submit it. But
Hadoop launches 8 reducers per datanode...


2013/4/30 Nitin Pawar <ni...@gmail.com>

> So basically, if I understand correctly:
>
> You want to limit the number of reducers executing in parallel, only for this job?



-- 
JU Han

Software Engineer Intern @ KXEN Inc.
UTC - Université de Technologie de Compiègne
GI06 - Fouille de Données et Décisionnel

+33 0619608888

Re: Set reducer capacity for a specific M/R job

Posted by Nitin Pawar <ni...@gmail.com>.
So basically, if I understand correctly:

You want to limit the number of reducers executing in parallel, only for this job?



On Tue, Apr 30, 2013 at 4:02 PM, Han JU <ju...@gmail.com> wrote:

> Thanks.
>
> In fact I don't want to set reducer or mapper numbers; they are fine.
> I want to set the reduce slot capacity of my cluster while it executes my
> specific job. Say I have 100 reduce tasks for this job: I want my cluster
> to execute 4 of them at the same time, not 8 at the same time, only
> for this specific job.
> So I set mapred.tasktracker.reduce.tasks.maximum to 4 and submit the job.
> This conf is well received by the job, but ignored by Hadoop ...
>
> Any idea why this is?



-- 
Nitin Pawar

Re: Set reducer capacity for a specific M/R job

Posted by Han JU <ju...@gmail.com>.
Thanks.

In fact I don't want to set reducer or mapper numbers; they are fine.
I want to set the reduce slot capacity of my cluster while it executes my
specific job. Say I have 100 reduce tasks for this job: I want my cluster
to execute 4 of them at the same time, not 8 at the same time, only
for this specific job.
So I set mapred.tasktracker.reduce.tasks.maximum to 4 and submit the job.
This conf is well received by the job, but ignored by Hadoop ...

Any idea why this is?
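
For reference, the reason, per the earliest reply (quoted below): each
TaskTracker daemon reads this property from its own configuration file at
startup, so a job-side override has no effect. The cluster-side change would
look roughly like this, a sketch assuming classic Hadoop 1.x mapred-site.xml;
it affects every job, not just this one, and requires restarting the
tasktrackers:

<!-- inside <configuration> in mapred-site.xml on each tasktracker node -->
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
</property>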


2013/4/30 Nitin Pawar <ni...@gmail.com>

> The mapred.tasktracker.reduce.tasks.maximum parameter sets the
> maximum number of reduce tasks that may be run by an individual TaskTracker
> server at one time. This is not a per-job configuration.
>
> The number of map tasks for a given job is driven by the number of input
> splits and not by the mapred.map.tasks parameter. For each input split a
> map task is spawned. So, over the lifetime of a mapreduce job the number of
> map tasks is equal to the number of input splits. mapred.map.tasks is just
> a hint to the InputFormat for the number of maps.
>
> If you want to set the max number of maps or reducers per job then you can
> set the hints by using the job object you created:
> job.setNumMapTasks()
>
> Note this is just a hint and again the number will be decided by the
> input split size.
>
>
> On Tue, Apr 30, 2013 at 3:39 PM, Han JU <ju...@gmail.com> wrote:
>
>> Thanks Nitin.
>>
>> What I need is to set slot only for a specific job, not for the whole
>> cluster conf.
>> But what I did does NOT work ... Have I done something wrong?
>>
>>
>> 2013/4/30 Nitin Pawar <ni...@gmail.com>
>>
>>> The config you are setting is for the job only.
>>>
>>> But if you want to reduce the slots on tasktrackers then you will need
>>> to edit the tasktracker conf and restart the tasktrackers.



-- 
JU Han

Software Engineer Intern @ KXEN Inc.
UTC - Université de Technologie de Compiègne
GI06 - Fouille de Données et Décisionnel

+33 0619608888

Re: Set reducer capacity for a specific M/R job

Posted by Nitin Pawar <ni...@gmail.com>.
Forgot to add: there is a similar method for the reducer as well:

job.setNumReduceTasks(0);
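
A usage sketch (values are hypothetical; note this sets the total number of
reduce tasks for the job, which is a different knob from how many reduce
slots a tasktracker runs concurrently):

Job job = new Job(conf, "my-job");
// Total reduce tasks for the whole job:
job.setNumReduceTasks(100);
// job.setNumReduceTasks(0) would skip the reduce phase entirely and write
// the map output straight to the job's output directory.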




On Tue, Apr 30, 2013 at 3:56 PM, Nitin Pawar <ni...@gmail.com> wrote:

> The mapred.tasktracker.reduce.tasks.maximum parameter sets the
> maximum number of reduce tasks that may be run by an individual TaskTracker
> server at one time. This is not a per-job configuration.
>
> The number of map tasks for a given job is driven by the number of input
> splits and not by the mapred.map.tasks parameter. For each input split a
> map task is spawned. So, over the lifetime of a mapreduce job the number of
> map tasks is equal to the number of input splits. mapred.map.tasks is just
> a hint to the InputFormat for the number of maps.
>
> If you want to set the max number of maps or reducers per job then you can
> set the hints by using the job object you created:
> job.setNumMapTasks()
>
> Note this is just a hint and again the number will be decided by the
> input split size.



-- 
Nitin Pawar

Re: Set reducer capacity for a specific M/R job

Posted by Nitin Pawar <ni...@gmail.com>.
forgot to add there is similar method for reducer as well

job.setNumReduceTasks(0);




On Tue, Apr 30, 2013 at 3:56 PM, Nitin Pawar <ni...@gmail.com>wrote:

> The *mapred*.*tasktracker*.*reduce*.*tasks*.*maximum* parameter sets the
> maximum number of reduce tasks that may be run by an individual TaskTracker
> server at one time. This is not per job configuration.
>
> he number of map tasks for a given job is driven by the number of input
> splits and not by the mapred.map.tasks parameter. For each input split a
> map task is spawned. So, over the lifetime of a mapreduce job the number of
> map tasks is equal to the number of input splits. mapred.map.tasks is just
> a hint to the InputFormat for the number of maps
>
> If you want to set max number of maps or reducers per job then you can set
> the hints by using the job object you created
> job.setNumMapTasks()
>
> Note this is just a hint and again the number will be decided by the input
> split size.
>
>
> On Tue, Apr 30, 2013 at 3:39 PM, Han JU <ju...@gmail.com> wrote:
>
>> Thanks Nitin.
>>
>> What I need is to set slot only for a specific job, not for the whole
>> cluster conf.
>> But what I did does NOT work ... Have I done something wrong?
>>
>>
>> 2013/4/30 Nitin Pawar <ni...@gmail.com>
>>
>>> The config you are setting is for job only
>>>
>>> But if you want to reduce the slota on tasktrackers then you will need
>>> to edit tasktracker conf and restart tasktracker
>>> On Apr 30, 2013 3:30 PM, "Han JU" <ju...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I want to change the cluster's capacity of reduce slots on a per job
>>>> basis. Originally I have 8 reduce slots for a tasktracker.
>>>> I did:
>>>>
>>>> conf.set("mapred.tasktracker.reduce.tasks.maximum", "4");
>>>> ...
>>>> Job job = new Job(conf, ...)
>>>>
>>>>
>>>> And in the web UI I can see that for this job, the max reduce tasks is
>>>> exactly at 4, like I set. However hadoop still launches 8 reducer per
>>>> datanode ... why is this?
>>>>
>>>> How could I achieve this?
>>>> --
>>>> *JU Han*
>>>>
>>>> Software Engineer Intern @ KXEN Inc.
>>>> UTC   -  Université de Technologie de Compiègne
>>>> *     **GI06 - Fouille de Données et Décisionnel*
>>>>
>>>> +33 0619608888
>>>>
>>>
>>
>>
>> --
>> *JU Han*
>>
>> Software Engineer Intern @ KXEN Inc.
>> UTC   -  Université de Technologie de Compiègne
>> *     **GI06 - Fouille de Données et Décisionnel*
>>
>> +33 0619608888
>>
>
>
>
> --
> Nitin Pawar
>



-- 
Nitin Pawar

Re: Set reducer capacity for a specific M/R job

Posted by Han JU <ju...@gmail.com>.
Thanks.

In fact I don't want to set the number of mappers or reducers; those are fine.
I want to set the reduce slot capacity of my cluster while it executes my
specific job. Say I have 100 reduce tasks for this job: I want my cluster
to run 4 of them at a time instead of 8, but only for this specific job.
So I set mapred.tasktracker.reduce.tasks.maximum to 4 and submit the job.
The setting is picked up by the job's configuration, but ignored by Hadoop ...

Any idea why this is?


2013/4/30 Nitin Pawar <ni...@gmail.com>

> The *mapred*.*tasktracker*.*reduce*.*tasks*.*maximum* parameter sets the
> maximum number of reduce tasks that may be run by an individual TaskTracker
> server at one time. This is not per job configuration.
>
> he number of map tasks for a given job is driven by the number of input
> splits and not by the mapred.map.tasks parameter. For each input split a
> map task is spawned. So, over the lifetime of a mapreduce job the number of
> map tasks is equal to the number of input splits. mapred.map.tasks is just
> a hint to the InputFormat for the number of maps
>
> If you want to set max number of maps or reducers per job then you can set
> the hints by using the job object you created
> job.setNumMapTasks()
>
> Note this is just a hint and again the number will be decided by the input
> split size.
>
>
> On Tue, Apr 30, 2013 at 3:39 PM, Han JU <ju...@gmail.com> wrote:
>
>> Thanks Nitin.
>>
>> What I need is to set slot only for a specific job, not for the whole
>> cluster conf.
>> But what I did does NOT work ... Have I done something wrong?
>>
>>
>> 2013/4/30 Nitin Pawar <ni...@gmail.com>
>>
>>> The config you are setting is for job only
>>>
>>> But if you want to reduce the slota on tasktrackers then you will need
>>> to edit tasktracker conf and restart tasktracker
>>> On Apr 30, 2013 3:30 PM, "Han JU" <ju...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I want to change the cluster's capacity of reduce slots on a per job
>>>> basis. Originally I have 8 reduce slots for a tasktracker.
>>>> I did:
>>>>
>>>> conf.set("mapred.tasktracker.reduce.tasks.maximum", "4");
>>>> ...
>>>> Job job = new Job(conf, ...)
>>>>
>>>>
>>>> And in the web UI I can see that for this job, the max reduce tasks is
>>>> exactly at 4, like I set. However hadoop still launches 8 reducer per
>>>> datanode ... why is this?
>>>>
>>>> How could I achieve this?
>>>> --
>>>> *JU Han*
>>>>
>>>> Software Engineer Intern @ KXEN Inc.
>>>> UTC   -  Université de Technologie de Compiègne
>>>> *     **GI06 - Fouille de Données et Décisionnel*
>>>>
>>>> +33 0619608888
>>>>
>>>
>>
>>
>> --
>> *JU Han*
>>
>> Software Engineer Intern @ KXEN Inc.
>> UTC   -  Université de Technologie de Compiègne
>> *     **GI06 - Fouille de Données et Décisionnel*
>>
>> +33 0619608888
>>
>
>
>
> --
> Nitin Pawar
>



-- 
*JU Han*

Software Engineer Intern @ KXEN Inc.
UTC   -  Université de Technologie de Compiègne
*     **GI06 - Fouille de Données et Décisionnel*

+33 0619608888

Re: Set reducer capacity for a specific M/R job

Posted by Nitin Pawar <ni...@gmail.com>.
The *mapred.tasktracker.reduce.tasks.maximum* parameter sets the maximum
number of reduce tasks that may be run by an individual TaskTracker server
at one time. It is not a per-job configuration.

The number of map tasks for a given job is driven by the number of input
splits, not by the mapred.map.tasks parameter. For each input split a map
task is spawned, so over the lifetime of a MapReduce job the number of map
tasks equals the number of input splits. mapred.map.tasks is just a hint to
the InputFormat for the number of maps.

If you want to set the number of maps or reducers for a job, you can set
the hints on the job configuration you created, e.g.
jobConf.setNumMapTasks()

Note this is just a hint; the actual number is still decided by the input
splits.
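
A minimal sketch of those per-job hints, assuming the old
org.apache.hadoop.mapred API, where both setters exist (the newer Job class
only exposes setNumReduceTasks):

import org.apache.hadoop.mapred.JobConf;

JobConf jobConf = new JobConf();
// Only a hint: the actual map count follows the number of input splits.
jobConf.setNumMapTasks(10);
// The reduce count, by contrast, is taken as given.
jobConf.setNumReduceTasks(4);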


On Tue, Apr 30, 2013 at 3:39 PM, Han JU <ju...@gmail.com> wrote:

> Thanks Nitin.
>
> What I need is to set slot only for a specific job, not for the whole
> cluster conf.
> But what I did does NOT work ... Have I done something wrong?
>
>
> 2013/4/30 Nitin Pawar <ni...@gmail.com>
>
>> The config you are setting is for job only
>>
>> But if you want to reduce the slota on tasktrackers then you will need to
>> edit tasktracker conf and restart tasktracker
>> On Apr 30, 2013 3:30 PM, "Han JU" <ju...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I want to change the cluster's capacity of reduce slots on a per job
>>> basis. Originally I have 8 reduce slots for a tasktracker.
>>> I did:
>>>
>>> conf.set("mapred.tasktracker.reduce.tasks.maximum", "4");
>>> ...
>>> Job job = new Job(conf, ...)
>>>
>>>
>>> And in the web UI I can see that for this job, the max reduce tasks is
>>> exactly at 4, like I set. However hadoop still launches 8 reducer per
>>> datanode ... why is this?
>>>
>>> How could I achieve this?
>>> --
>>> *JU Han*
>>>
>>> Software Engineer Intern @ KXEN Inc.
>>> UTC   -  Université de Technologie de Compiègne
>>> *     **GI06 - Fouille de Données et Décisionnel*
>>>
>>> +33 0619608888
>>>
>>
>
>
> --
> *JU Han*
>
> Software Engineer Intern @ KXEN Inc.
> UTC   -  Université de Technologie de Compiègne
> *     **GI06 - Fouille de Données et Décisionnel*
>
> +33 0619608888
>



-- 
Nitin Pawar

Re: Set reducer capacity for a specific M/R job

Posted by Han JU <ju...@gmail.com>.
Thanks Nitin.

What I need is to set the slot capacity only for a specific job, not in the
whole cluster conf.
But what I did does NOT work ... Have I done something wrong?


2013/4/30 Nitin Pawar <ni...@gmail.com>

> The config you are setting is for job only
>
> But if you want to reduce the slota on tasktrackers then you will need to
> edit tasktracker conf and restart tasktracker
> On Apr 30, 2013 3:30 PM, "Han JU" <ju...@gmail.com> wrote:
>
>> Hi,
>>
>> I want to change the cluster's capacity of reduce slots on a per job
>> basis. Originally I have 8 reduce slots for a tasktracker.
>> I did:
>>
>> conf.set("mapred.tasktracker.reduce.tasks.maximum", "4");
>> ...
>> Job job = new Job(conf, ...)
>>
>>
>> And in the web UI I can see that for this job, the max reduce tasks is
>> exactly at 4, like I set. However hadoop still launches 8 reducer per
>> datanode ... why is this?
>>
>> How could I achieve this?
>> --
>> *JU Han*
>>
>> Software Engineer Intern @ KXEN Inc.
>> UTC   -  Université de Technologie de Compiègne
>> *     **GI06 - Fouille de Données et Décisionnel*
>>
>> +33 0619608888
>>
>


-- 
*JU Han*

Software Engineer Intern @ KXEN Inc.
UTC   -  Université de Technologie de Compiègne
*     **GI06 - Fouille de Données et Décisionnel*

+33 0619608888

Re: Set reducer capacity for a specific M/R job

Posted by Nitin Pawar <ni...@gmail.com>.
The config you are setting applies to the job only.

But if you want to reduce the slots on the tasktrackers, you will need to
edit the tasktracker conf and restart the tasktrackers.
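
For reference, a sketch of the corresponding mapred-site.xml entry on the
tasktrackers (the value is illustrative); it is cluster-wide and only takes
effect after a tasktracker restart:

<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
</property>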
On Apr 30, 2013 3:30 PM, "Han JU" <ju...@gmail.com> wrote:

> Hi,
>
> I want to change the cluster's capacity of reduce slots on a per job
> basis. Originally I have 8 reduce slots for a tasktracker.
> I did:
>
> conf.set("mapred.tasktracker.reduce.tasks.maximum", "4");
> ...
> Job job = new Job(conf, ...)
>
>
> And in the web UI I can see that for this job, the max reduce tasks is
> exactly at 4, like I set. However hadoop still launches 8 reducer per
> datanode ... why is this?
>
> How could I achieve this?
> --
> *JU Han*
>
> Software Engineer Intern @ KXEN Inc.
> UTC   -  Université de Technologie de Compiègne
> *     **GI06 - Fouille de Données et Décisionnel*
>
> +33 0619608888
>
