Posted to mapreduce-user@hadoop.apache.org by Zheng Shao <zs...@facebook.com> on 2009/08/16 04:00:41 UTC

merging multiple mapper's outputs

Does Hadoop have the capability to merge the outputs of multiple mappers (on the same node) into a single one, to speed up the shuffle phase? Is there a JIRA where I can find more information about it?

Zheng
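
To make the question concrete, here is a minimal Python sketch of what merging the outputs of several map tasks on one node could look like. It is not Hadoop code: the data layout (each map output as a dict from partition number to a key-sorted list of records) and the name merge_node_outputs are assumptions made purely for illustration.

```python
import heapq
from collections import defaultdict


def merge_node_outputs(map_outputs):
    """Merge per-partition sorted runs from several map tasks on one node.

    map_outputs: list of dicts, one per map task (hypothetical layout),
    each mapping partition id -> list of (key, value) sorted by key.
    Returns one dict mapping partition id -> a single key-sorted list.
    """
    runs_by_partition = defaultdict(list)
    for output in map_outputs:
        for partition, records in output.items():
            runs_by_partition[partition].append(records)

    # heapq.merge performs a k-way merge of inputs that are already sorted.
    return {
        partition: list(heapq.merge(*runs, key=lambda kv: kv[0]))
        for partition, runs in runs_by_partition.items()
    }


# Two hypothetical map outputs on the same node, two partitions each.
map_a = {0: [("apple", 1), ("cat", 1)], 1: [("dog", 1)]}
map_b = {0: [("bat", 1), ("cat", 1)], 1: [("eel", 1)]}
merged = merge_node_outputs([map_a, map_b])
```

With this layout a reducer would fetch one merged run per node for its partition instead of one run per map task.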


Re: merging multiple mapper's outputs

Posted by Hong Tang <ht...@yahoo-inc.com>.
A combiner may reduce the total amount of data transferred across the
network. Even if it does not, the I/O sizes would be larger, which lets
the disks perform better. Yes, we do need to be concerned about the
possible increase in latency, but we can either let the user control it
or use indicators such as cluster load, the progress of other maps
(local or remote), and the duration of map tasks to decide whether such
a delay would be beneficial overall.
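
The point about I/O size can be illustrated with simple arithmetic. The sketch below assumes that every reducer fetches exactly one segment per map output (or one per node when outputs are merged) and that no combiner shrinks the data. The function name and the job numbers are made up for illustration and do not come from any measurement.

```python
def shuffle_fetch_stats(total_bytes, num_maps, num_reduces, num_nodes, merged):
    """Return (number of fetches, average bytes per fetch) for the shuffle.

    Simplified model: each reducer fetches one segment from every source,
    where a source is a map output, or a node if outputs are merged.
    """
    sources = num_nodes if merged else num_maps
    fetches = sources * num_reduces
    return fetches, total_bytes / fetches


# Hypothetical job: 100 GiB of map output, 4000 maps, 100 reduces, 100 nodes.
total = 100 * 1024**3
per_map = shuffle_fetch_stats(total, 4000, 100, 100, merged=False)
per_node = shuffle_fetch_stats(total, 4000, 100, 100, merged=True)

# Same total bytes either way; merging only changes how they are chunked.
growth = per_node[1] / per_map[1]
```

In this model the total volume is unchanged, but the average fetch grows by the number of maps per node (40 here), which is the "bigger I/O" effect.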

On Aug 17, 2009, at 5:35 AM, Amogh Vasekar wrote:

> The same amount of data will have to be read and transferred over the
> network, whether it is in one file or multiple files. If you do merge
> to a single file, the shuffle and sort (S&S) phase actually can't
> start until all mappers have finished, as opposed to fetching outputs
> from individual mapper tasks, which can begin as soon as each one has
> finished.
> Just my two cents.
>
> Amogh


RE: merging multiple mapper's outputs

Posted by Amogh Vasekar <am...@yahoo-inc.com>.
The same amount of data will have to be read and transferred over the network, whether it is in one file or multiple files. If you do merge to a single file, the shuffle and sort (S&S) phase actually can't start until all mappers have finished, as opposed to fetching outputs from individual mapper tasks, which can begin as soon as each one has finished.
Just my two cents.

Amogh
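
The latency concern above can be sketched with a toy timeline. This is a simplified model, not Hadoop behavior: it assumes a merged file becomes fetchable only once the last map on the node has finished, and it ignores the time the merge itself takes. The function name and finish times are hypothetical.

```python
def output_available_times(finish_times, merged):
    """Time at which each map output on a node becomes fetchable.

    finish_times: finish time (in seconds) of each map task on the node.
    If merged, nothing is fetchable until the last map has finished.
    """
    if merged:
        ready = max(finish_times)
        return [ready] * len(finish_times)
    return list(finish_times)


finishes = [30, 45, 120]  # hypothetical map finish times on one node
separate_ready = output_available_times(finishes, merged=False)
merged_ready = output_available_times(finishes, merged=True)

# Extra time each output waits before reducers can start fetching it.
added_wait = [m - s for m, s in zip(merged_ready, separate_ready)]
```

Under these assumptions the early finishers are held back until the slowest map on the node completes, which is the delay being described.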


RE: merging multiple mapper's outputs

Posted by Zheng Shao <zs...@facebook.com>.
Multiple mapper tasks.

I think the combiner is independent of this functionality. A combiner merges rows with the same key. It can work on a single mapper's output or on multiple mappers' outputs together.

Zheng
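
As a rough illustration of the distinction drawn above, the Python sketch below applies the same combine step first to each mapper's output separately and then to both outputs together. Using a sum as the combine function (word-count style) is an assumption for the example, as are the function name and the sample records.

```python
from collections import defaultdict


def combine(records):
    """Sum the values of records that share a key (word-count-style combiner)."""
    totals = defaultdict(int)
    for key, value in records:
        totals[key] += value
    return sorted(totals.items())


# Hypothetical outputs of two map tasks on the same node.
map_a = [("cat", 1), ("dog", 1), ("cat", 1)]
map_b = [("cat", 1), ("eel", 1)]

# Combiner applied to each mapper's output separately.
per_mapper = combine(map_a) + combine(map_b)
# Combiner applied to both mappers' outputs together.
across_mappers = combine(map_a + map_b)
```

Both paths lead to the same final totals once reduced, but combining across mappers leaves fewer records to ship (3 instead of 4 here) because the key "cat" appears in both outputs.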

Re: merging multiple mapper's outputs

Posted by Zhong Wang <wa...@gmail.com>.
On Sun, Aug 16, 2009 at 10:00 AM, Zheng Shao <zs...@facebook.com> wrote:
> Does Hadoop have the capability to merge the outputs of multiple mappers
> (on the same node) into a single one, to speed up the shuffle phase? Is
> there a JIRA where I can find more information about it?

Do you mean outputs from multiple mapper tasks or multiple mapper
functions? Could Combiner help?



-- 
Zhong Wang