Posted to user@hadoop.apache.org by john smith <js...@gmail.com> on 2012/11/01 10:01:31 UTC

Low shuffle transfer speeds

Hi list,

I have jobs that generate a huge amount of intermediate data. For example,
one of my jobs generates almost 12 GB of map output. I have 8 datanodes/TTs
and 1 master.

My reduce progress shows a copy speed in the range of 0.55-1 MB/s, but
normal file transfers between my datanodes generally reach 40-50 MB/s.
Why is my shuffle speed so slow?

Also, how is that number calculated, and what exactly does it signify? (Is
it the average transfer speed from all mappers to that particular reducer,
or something else?) Any suggestions?

Thanks

Re: Low shuffle transfer speeds

Posted by Harsh J <ha...@cloudera.com>.
Hi,

The reducer copies map outputs progressively (as and when they
complete) unless configured otherwise. It is hence normal for the
overall average (that's what the displayed number currently is,
unfortunately) to show up lower than the actual transfer rate, since
there are periods where the reducer sits idle waiting for further map
task waves to complete.
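
To illustrate with made-up numbers: if a reducer ultimately fetches
1.5 GB of map output but its copy phase spans 30 minutes of wall-clock
time (most of it spent waiting on later map waves), the displayed
average works out to roughly 1536 MB / 1800 s = ~0.85 MB/s, even if
each individual fetch ran at 40 MB/s.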

You can tune mapred.reduce.slowstart.completed.maps (0.05, i.e. 5%, by
default) to control the threshold of overall map completion at which
the reducers begin copying outputs. An increased value, such as 0.8
(80%), will let your reducers copy more data continuously once they
start, since they do not have to wait as much between map waves.
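
As a sketch of how you might set it (the property name here is the
Hadoop 1.x one used in this thread; on 2.x/YARN the equivalent key is
mapreduce.job.reduce.slowstart.completedmaps), cluster-wide in
mapred-site.xml:

  <!-- mapred-site.xml: reducers start copying only after 80% of maps finish -->
  <property>
    <name>mapred.reduce.slowstart.completed.maps</name>
    <value>0.80</value>
  </property>

Or per job on the command line, which works when the job is driven
through ToolRunner/GenericOptionsParser (the jar and class names below
are placeholders):

  hadoop jar myjob.jar MyJob \
    -Dmapred.reduce.slowstart.completed.maps=0.80 <input> <output>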

On Thu, Nov 1, 2012 at 2:31 PM, john smith <js...@gmail.com> wrote:
> Hi list,
>
> I have jobs that generate a huge amount of intermediate data. For example,
> one of my jobs generates almost 12 GB of map output. I have 8 datanodes/TTs
> and 1 master.
>
> My reduce progress shows a copy speed in the range of 0.55-1 MB/s, but
> normal file transfers between my datanodes generally reach 40-50 MB/s.
> Why is my shuffle speed so slow?
>
> Also, how is that number calculated, and what exactly does it signify? (Is
> it the average transfer speed from all mappers to that particular reducer,
> or something else?) Any suggestions?
>
> Thanks



-- 
Harsh J
