Posted to mapreduce-user@hadoop.apache.org by Zheng Shao <zs...@facebook.com> on 2009/08/16 04:00:41 UTC
merging multiple mapper's outputs
Does Hadoop have the capability to merge the outputs of multiple mappers (on the same node) into a single output, to speed up the shuffle phase? Is there a JIRA where I can find more information about it?
Zheng
Re: merging multiple mapper's outputs
Posted by Hong Tang <ht...@yahoo-inc.com>.
A combiner may reduce the total amount of data transferred across the
network. Even if it does not, the IO sizes would be larger, which lets
the disks perform better. Yes, we do need to be concerned about the
possible increase in latency, but we can either let the user control
it or use indicators such as cluster load, the progress of other maps
(local or remote), and the duration of map tasks to decide whether
such a delay would be beneficial overall.
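
One way to picture such a decision is the sketch below. It is plain Python, not Hadoop code, and everything in it is an assumption made for illustration: the function name, the parameters, the 0.8 load threshold, and the policy itself (wait only if the remaining local maps are expected to finish within a user-set delay and the cluster is busy enough that the delay is unlikely to hold up reducers).

```python
def should_delay_for_merge(remaining_secs_of_local_maps, max_delay_secs,
                           cluster_load, load_threshold=0.8):
    """Return True if waiting to merge local map outputs looks worthwhile.

    All inputs are hypothetical indicators:
      remaining_secs_of_local_maps: estimated seconds left for each map task
          still running on this node
      max_delay_secs: the longest delay the user is willing to accept
      cluster_load: fraction of cluster capacity in use, from 0.0 to 1.0
    """
    if not remaining_secs_of_local_maps:
        return False  # no other local map output to merge with
    # The merge can only happen once the slowest remaining map finishes.
    expected_wait = max(remaining_secs_of_local_maps)
    if expected_wait > max_delay_secs:
        return False  # waiting would exceed the user's latency budget
    # Assumed policy: on a lightly loaded cluster, ship the output right away.
    return cluster_load >= load_threshold
```

For example, `should_delay_for_merge([2.0, 4.0], 5.0, 0.9)` returns `True`, while raising one estimate to 40 seconds or lowering the load to 0.1 returns `False`.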
On Aug 17, 2009, at 5:35 AM, Amogh Vasekar wrote:
> The same amount of data will have to be read and transferred over the
> network, whether it is in one file or multiple files. If you do merge
> to a single file, the shuffle and sort (S&S) phase actually can't start
> until all mappers have finished, as opposed to fetching the output of
> each individual mapper task as soon as it has finished.
> Just my two cents.
>
> Amogh
RE: merging multiple mapper's outputs
Posted by Amogh Vasekar <am...@yahoo-inc.com>.
The same amount of data will have to be read and transferred over the network, whether it is in one file or multiple files. If you do merge to a single file, the shuffle and sort (S&S) phase actually can't start until all mappers have finished, as opposed to fetching the output of each individual mapper task as soon as it has finished.
Just my two cents.
Amogh
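
The latency argument above can be illustrated with a small timing model. This is a sketch in plain Python, not Hadoop code; the finish times, the per-output fetch time, and the assumption that a single reducer fetches outputs one at a time over one link are all made up for illustration.

```python
def shuffle_done_individual(finish_times, fetch_time):
    """Time at which all outputs are fetched when each mapper's output
    can be fetched as soon as that mapper finishes (one fetch at a time)."""
    t = 0.0
    for finish in sorted(finish_times):
        # A fetch starts once the link is free and the output exists.
        t = max(t, finish) + fetch_time
    return t


def shuffle_done_merged(finish_times, fetch_time):
    """Time at which the merged output is fetched: nothing can be sent
    until the slowest mapper finishes, and the same total amount of data
    still has to be transferred."""
    return max(finish_times) + fetch_time * len(finish_times)


finish_times = [10.0, 20.0, 60.0]   # hypothetical mapper finish times (s)
fetch_time = 5.0                    # hypothetical time to fetch one output (s)

individual = shuffle_done_individual(finish_times, fetch_time)
merged = shuffle_done_merged(finish_times, fetch_time)
```

Under these assumptions the per-mapper fetches overlap with the slow mapper, so `individual` is 65.0 seconds while `merged` is 75.0 seconds. The model ignores any gain from larger IO sizes or from a combiner shrinking the merged data.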
-----Original Message-----
From: Zheng Shao [mailto:zshao@facebook.com]
Sent: Monday, August 17, 2009 3:36 AM
To: mapreduce-user@hadoop.apache.org
Subject: RE: merging multiple mapper's outputs
Multiple mapper tasks.
The combiner is independent of this functionality, I think. A combiner merges rows with the same key. It can work on a single mapper's output as well as on multiple mappers' outputs together.
Zheng
RE: merging multiple mapper's outputs
Posted by Zheng Shao <zs...@facebook.com>.
Multiple mapper tasks.
The combiner is independent of this functionality, I think. A combiner merges rows with the same key. It can work on a single mapper's output as well as on multiple mappers' outputs together.
Zheng
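
To illustrate the point about combiners, here is a minimal sketch in plain Python rather than the Hadoop API. It assumes a word-count style job where combining means summing the values for each key; the `combine` function and the sample records are hypothetical.

```python
from collections import defaultdict


def combine(*mapper_outputs):
    """Sum the values for each key across one or more mapper outputs."""
    totals = defaultdict(int)
    for output in mapper_outputs:
        for key, value in output:
            totals[key] += value
    return sorted(totals.items())


# Hypothetical outputs of two mapper tasks on the same node.
map_a = [("apple", 1), ("pear", 1), ("apple", 1)]
map_b = [("apple", 1), ("plum", 1)]

# Combining each mapper's output separately.
per_mapper = [combine(map_a), combine(map_b)]

# Combining both outputs together after merging them.
merged = combine(map_a, map_b)
```

In this example the two mappers emit 5 records in total. Combining each output separately leaves 4 records to shuffle, and combining them together leaves 3, because the `apple` rows from both mappers collapse into one.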
-----Original Message-----
From: Zhong Wang [mailto:wangzhong.neu@gmail.com]
Sent: Sunday, August 16, 2009 8:42 AM
To: mapreduce-user@hadoop.apache.org
Subject: Re: merging multiple mapper's outputs
On Sun, Aug 16, 2009 at 10:00 AM, Zheng Shao<zs...@facebook.com> wrote:
> Does Hadoop have the capability to merge the outputs of multiple mappers
> (on the same node) into a single output, to speed up the shuffle phase?
> Is there a JIRA where I can find more information about it?
Do you mean outputs from multiple mapper tasks or multiple mapper
functions? Could Combiner help?
--
Zhong Wang
Re: merging multiple mapper's outputs
Posted by Zhong Wang <wa...@gmail.com>.
On Sun, Aug 16, 2009 at 10:00 AM, Zheng Shao<zs...@facebook.com> wrote:
> Does Hadoop have the capability to merge the outputs of multiple mappers
> (on the same node) into a single output, to speed up the shuffle phase?
> Is there a JIRA where I can find more information about it?
Do you mean outputs from multiple mapper tasks or multiple mapper
functions? Could Combiner help?
--
Zhong Wang