Posted to hdfs-user@hadoop.apache.org by Huanchen Zhang <ia...@gmail.com> on 2012/10/03 03:04:16 UTC

A small portion of map tasks slows down the job

Hello,

I have a small portion of map tasks whose output is much larger than that of the others (more spills). So the reducer is mainly waiting for these few map tasks. Is there a good solution for this problem?

Thank you.

Best,
Huanchen

Re: A small portion of map tasks slows down the job

Posted by Hemanth Yamijala <yh...@thoughtworks.com>.
Hi,

Would reducing the output from the map tasks solve the problem? That is, are
the reducers slowing down because a lot of data is being shuffled?

If that's the case, you could check whether the map output size can be reduced
by using the framework's combiner or an in-mapper combining technique.
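
To illustrate the in-mapper combining idea, here is a minimal sketch in plain Java. It does not use the Hadoop API, and the class and method names are made up for illustration. It assumes a word-count style job where the reduce step is a sum, which is what makes it safe to aggregate before emitting. In a real Hadoop mapper the aggregated map would typically be emitted from the mapper's cleanup() method, and the alternative is to register a combiner on the job with setCombinerClass().

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a mapper; not a Hadoop class.
class InMapperCombining {

    // Naive mapper: emits one (token, 1) record per token,
    // so the number of output records grows with the input size.
    static int naiveOutputRecords(List<String> lines) {
        int emitted = 0;
        for (String line : lines) {
            for (String token : line.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    emitted++;
                }
            }
        }
        return emitted;
    }

    // In-mapper combining: keep partial sums in memory and emit
    // one (token, count) record per distinct token at the end.
    // Only valid because summing is associative and commutative.
    static Map<String, Integer> combinedOutput(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            for (String token : line.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> split = Arrays.asList("a b a", "b a c");
        System.out.println("naive records:    " + naiveOutputRecords(split));
        System.out.println("combined records: " + combinedOutput(split).size());
        System.out.println("combined output:  " + combinedOutput(split));
    }
}
```

The trade-off is memory: the in-memory map grows with the number of distinct keys seen by one map task, so a real implementation would need to flush it once it gets too large.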

Thanks
Hemanth

On Wed, Oct 3, 2012 at 6:34 AM, Huanchen Zhang <ia...@gmail.com> wrote:

> Hello,
>
> I have a small portion of map tasks whose output is much larger than
> others (more spills). So the reducer is mainly waiting for these a few map
> tasks. Is there a good solution for this problem ?
>
> Thank you.
>
> Best,
> Huanchen

Re: A small portion of map tasks slows down the job

Posted by JAX <ja...@gmail.com>.
This is to be expected if there is any kind of trend in the ordering of your data or in the computation done in the mappers.

You can use a smaller input split to reduce the load on each individual mapper, so that large blocks of records that take a long time to process are less likely to clog one mapper.
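
As a rough sketch of what a smaller input split does, the plain Java below mirrors the rule that the newer org.apache.hadoop.mapreduce FileInputFormat uses to pick a split size: max(minSize, min(maxSize, blockSize)). The class and method names are hypothetical, the file and block sizes are made-up numbers, and the split count is an approximation that ignores Hadoop's handling of a short final split. In a real job the cap would be set through FileInputFormat.setMaxInputSplitSize() or the corresponding property (mapred.max.split.size in older releases, mapreduce.input.fileinputformat.split.maxsize in newer ones), so check which one applies to your version.

```java
// Hypothetical helper for reasoning about split sizes; not a Hadoop class.
class SplitSizeSketch {

    // Same rule as FileInputFormat in the newer MapReduce API.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Approximate number of splits (and so map tasks) for one file,
    // using a plain ceiling division.
    static long approxSplits(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileSize = 1024 * mb;  // assumed 1 GB input file
        long blockSize = 64 * mb;   // assumed 64 MB HDFS block size

        // No cap on the maximum split size: one split per block.
        long defaultSplit = computeSplitSize(blockSize, 1L, Long.MAX_VALUE);
        // Maximum split size capped at 16 MB: four times as many, smaller map tasks.
        long cappedSplit = computeSplitSize(blockSize, 1L, 16 * mb);

        System.out.println("default: " + approxSplits(fileSize, defaultSplit) + " splits");
        System.out.println("capped:  " + approxSplits(fileSize, cappedSplit) + " splits");
    }
}
```

Smaller splits spread a run of expensive records over more map tasks, at the cost of more task start-up overhead, so the cap is something to tune rather than minimize.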

Jay Vyas 
MMSB
UCHC

On Oct 2, 2012, at 9:04 PM, Huanchen Zhang <ia...@gmail.com> wrote:

> Hello,
> 
> I have a small portion of map tasks whose output is much larger than others (more spills). So the reducer is mainly waiting for these a few map tasks. Is there a good solution for this problem ?
> 
> Thank you.
> 
> Best,
> Huanchen
