Posted to user@hadoop.apache.org by Himanshu Vijay <hi...@gmail.com> on 2013/09/30 21:39:41 UTC

Cluster config: Mapper:Reducer Task Capacity

Hi,

Our Hadoop cluster is running 0.20.203. The cluster currently has a 'Map
Task Capacity' of 8900+ and a 'Reduce Task Capacity' of 3300+, resulting in
a ratio of about 2.7. We run a wide variety of jobs and want to increase
throughput.

My manual observation is that we hit the map capacity, so many jobs have to
wait even though there is a lot of room left in the reduce capacity. I mined
the jobtracker logs for completed jobs and saw that, on both an hourly and a
daily basis, the mapper:reducer ratio was 4-5.

To increase throughput, I am thinking of experimenting with the Map and
Reduce Task Capacity so that the ratio increases from 2.7 to ~4.

Does this sound like a correct approach? Is this something I can control,
or is it determined automatically by Hadoop?

Have any of you done this kind of exercise? If so, could you please point
me to how to go about changing this ratio? I am not finding much literature
on it.

Note: Map and Reduce Task Capacity are the maximum total number of
mappers/reducers that can run on the cluster at any one time.

Regards,
-Himanshu Vijay

Re: Cluster config: Mapper:Reducer Task Capacity

Posted by Himanshu Vijay <hi...@gmail.com>.
What is the downside of increasing both
mapred.tasktracker.map.tasks.maximum
and mapred.tasktracker.reduce.tasks.maximum to the same value?

I read on this link <http://developer.yahoo.com/hadoop/tutorial/module7.html> that:

  mapred.tasktracker.map.tasks.maximum
      1/2 * (cores/node) to 2 * (cores/node)
      Number of map tasks to deploy on each machine.

  mapred.tasktracker.reduce.tasks.maximum
      1/2 * (cores/node) to 2 * (cores/node)
      Number of reduce tasks to deploy on each machine.

Each node has 8 cores, so according to the guidance above I should set each
of the two configs somewhere between 4 and 16. The ratio of mappers to
reducers doesn't really matter as far as these two properties are concerned.
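
The arithmetic behind that guideline can be sketched in Python as follows.
This is only an illustration: the function names slot_range and
tasks_per_core are hypothetical, and the 1/2x to 2x bounds come from the
tutorial quoted above rather than from Hadoop itself. The second function
hints at one possible downside of raising both settings to the top of the
range: when every slot is busy, a node runs its map slots plus its reduce
slots at the same time.

```python
from typing import Tuple


def slot_range(cores_per_node: int) -> Tuple[int, int]:
    """Return the (low, high) slot count suggested by the quoted guideline."""
    # 1/2 * (cores/node) up to 2 * (cores/node)
    return cores_per_node // 2, cores_per_node * 2


def tasks_per_core(map_slots: int, reduce_slots: int, cores_per_node: int) -> float:
    """Concurrent tasks per core when every map and reduce slot is in use."""
    return (map_slots + reduce_slots) / cores_per_node


low, high = slot_range(8)
print(low, high)                      # 4 16
print(tasks_per_core(high, high, 8))  # 4.0
```

With both settings at 16 on an 8-core node, up to 32 tasks could compete
for 8 cores. Whether that helps or hurts probably depends on how CPU-,
memory- or I/O-bound the jobs are, so it seems worth measuring rather than
assuming.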


On Mon, Sep 30, 2013 at 12:52 PM, Sandy Ryza <sa...@cloudera.com> wrote:

> Hi Himanshu,
>
> Changing the ratio is definitely a reasonable thing to do.  The capacities
> come from the mapred.tasktracker.map.tasks.maximum
> and mapred.tasktracker.reduce.tasks.maximum tasktracker configurations.
>  You can tweak these on your nodes to get your desired ratio.
>
> -Sandy
>
>
> On Mon, Sep 30, 2013 at 12:39 PM, Himanshu Vijay <hi...@gmail.com> wrote:
>
>> Hi,
>>
>> Our Hadoop cluster is running 0.20.203. The cluster currently has 'Map
>> Task Capacity' of 8900+ 'Reduce Task Capacity' of 3300+ resulting in a
>> ratio of 2.7. We have a lot of variety of jobs running and we want to
>> increase the throughput.
>>
>> My manual observation was that we hit the Mapper capacity and hence many
>> jobs have to wait even though lot of room left in Reduce capacity. I mined
>> the jobtracker logs for the jobs that completed and saw that on a hourly
>> basis as well as daily basis the mapper:reducer ratio was 4-5.
>>
>> To increase the throughput I was thinking that I experiment changing the
>> Map and Reducer Task Capacity such that the ratio is increased from 2.7 to
>> ~4.
>>
>> Does this sound like a correct approach ? Is this something that I can
>> control or it's determined automatically by Hadoop ?
>>
>> Have any of you done this kind of exercise ? If yes can you please direct
>> how to go about changing this ratio. I am not finding much literature on
>> it.
>>
>> Note: Mapper and ReducerTask Capacity is the max total no. of
>> mappers/reducers you can run on the cluster at any point.
>>
>> Regards,
>> -Himanshu Vijay
>>
>
>


-- 
-Himanshu Vijay


Re: Cluster config: Mapper:Reducer Task Capacity

Posted by Sandy Ryza <sa...@cloudera.com>.
Hi Himanshu,

Changing the ratio is definitely a reasonable thing to do. The capacities
come from the mapred.tasktracker.map.tasks.maximum
and mapred.tasktracker.reduce.tasks.maximum tasktracker configurations.
You can tweak these on your nodes to get your desired ratio.

-Sandy
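
As a rough illustration of that tuning, the Python sketch below splits a
fixed per-node slot budget to approximate a target map:reduce ratio. It
assumes every tasktracker uses the same settings, so the cluster-wide ratio
equals the per-node ratio. The 11-slot budget and the helper names
split_slots and to_properties are hypothetical, chosen only because 8 map +
3 reduce slots gives roughly the 2.7 ratio mentioned in the thread.

```python
from typing import Dict, Tuple


def split_slots(total_slots_per_node: int, target_ratio: float) -> Tuple[int, int]:
    """Split a per-node slot budget into (map_slots, reduce_slots).

    target_ratio is the desired map:reduce ratio; the result is the
    nearest whole-slot split, so the achieved ratio is approximate.
    """
    # One reduce slot for every target_ratio map slots, at least one reducer.
    reduce_slots = max(1, round(total_slots_per_node / (target_ratio + 1)))
    map_slots = total_slots_per_node - reduce_slots
    return map_slots, reduce_slots


def to_properties(map_slots: int, reduce_slots: int) -> Dict[str, int]:
    """Map the split onto the two tasktracker properties named in the thread."""
    return {
        "mapred.tasktracker.map.tasks.maximum": map_slots,
        "mapred.tasktracker.reduce.tasks.maximum": reduce_slots,
    }


# Hypothetical budget of 11 slots per node.
current = split_slots(11, 2.7)   # (8, 3), ratio about 2.7
proposed = split_slots(11, 4.0)  # (9, 2), ratio 4.5 after rounding
for name, value in to_properties(*proposed).items():
    print(f"{name} = {value}")
```

Because slots are whole numbers, a small per-node budget cannot hit every
ratio exactly. The resulting values are tasktracker-side settings in
mapred-site.xml, and a tasktracker restart is typically needed for them to
take effect.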


On Mon, Sep 30, 2013 at 12:39 PM, Himanshu Vijay <hi...@gmail.com> wrote:

> Hi,
>
> Our Hadoop cluster is running 0.20.203. The cluster currently has 'Map
> Task Capacity' of 8900+ 'Reduce Task Capacity' of 3300+ resulting in a
> ratio of 2.7. We have a lot of variety of jobs running and we want to
> increase the throughput.
>
> My manual observation was that we hit the Mapper capacity and hence many
> jobs have to wait even though lot of room left in Reduce capacity. I mined
> the jobtracker logs for the jobs that completed and saw that on a hourly
> basis as well as daily basis the mapper:reducer ratio was 4-5.
>
> To increase the throughput I was thinking that I experiment changing the
> Map and Reducer Task Capacity such that the ratio is increased from 2.7 to
> ~4.
>
> Does this sound like a correct approach ? Is this something that I can
> control or it's determined automatically by Hadoop ?
>
> Have any of you done this kind of exercise ? If yes can you please direct
> how to go about changing this ratio. I am not finding much literature on
> it.
>
> Note: Mapper and ReducerTask Capacity is the max total no. of
> mappers/reducers you can run on the cluster at any point.
>
> Regards,
> -Himanshu Vijay
>
