Posted to common-user@hadoop.apache.org by Jeff Zhang <zj...@gmail.com> on 2009/10/27 02:05:10 UTC
Does the map task push map output to reduce task or reduce task pull
it from map task
Hi all,
I'd like to know whether the map task pushes its output to the reduce task,
or the reduce task pulls it from the map task. Which way does Hadoop work?
Thank you very much.
Jeff zhang
Re: Does the map task push map output to reduce task or reduce task
pull it from map task
Posted by Jothi Padmanabhan <jo...@yahoo-inc.com>.
> Don't
> know what the equivalent would be in the mapreduce package
> in 0.20.x.
>
> dave bayer
>
The framework code to do with fetching of map outputs is the same for
both the mapred and mapreduce based reducers.
Re: Does the map task push map output to reduce task or reduce task pull it from map task
Posted by dave bayer <da...@cloudfactory.org>.
On Oct 26, 2009, at 6:05 PM, Jeff Zhang wrote:
> I'd like to know whether the map task pushes its output to the reduce task,
> or the reduce task pulls it from the map task. Which way does Hadoop work?
In 0.19, it appears to be a pull. Look at the run() method in
mapred/org/apache/hadoop/mapred/ReduceTask.java. I don't know what the
equivalent would be in the mapreduce package in 0.20.x.
dave bayer
Re: Does the map task push map output to reduce task or reduce task
pull it from map task
Posted by Prabhu Hari Dhanapal <dr...@gmail.com>.
Well, I'm not sure, but I think it might be a pull, because physically
the mappers and the reducers run on the same nodes. If the mappers had to
push, it might be the case that all nodes are busy mapping and there are no
reducers running to accept the output. Maybe for this reason the reducers
do not start reducing anything at all until all of the mapper tasks have
finished.
There is also the sort/shuffle layer between mapping and reducing, which
clearly demarcates the phases. That seems to suggest that it's a pull
rather than a push.
You might think of this as a performance bottleneck, but in practice it
seems it isn't.
By the way, wait for some expert to answer; I'm a beginner too!
On Mon, Oct 26, 2009 at 9:05 PM, Jeff Zhang <zj...@gmail.com> wrote:
> Hi all,
>
> I'd like to know whether the map task pushes its output to the reduce task,
> or the reduce task pulls it from the map task. Which way does Hadoop work?
>
> Thank you very much.
>
>
> Jeff zhang
>
--
Hari
Re: Does the map task push map output to reduce task or reduce task
pull it from map task
Posted by Amogh Vasekar <am...@yahoo-inc.com>.
Hi,
The reduce task looks at the map tasks for the partition it requires and pulls it (the number of parallel copies is controlled by mapred.reduce.parallel.copies). As partitions are taken in by the reduce task, it performs a merge sort; this forms your shuffle and sort (S&S) phase. Typically your mappers and reducers are O(n) while S&S is O(n log n), so if the amount of intermediate data is huge you will see a relative drop in performance.
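To make the pull model concrete, here is a minimal Python sketch of the idea, not Hadoop's actual implementation. The function names (run_map_task, fetch_partition, run_reduce_shuffle) and the parallel_copies parameter are hypothetical and only stand in for the behaviour described above: each map task leaves sorted, partitioned output behind, and each reducer fetches its own partition from every map output with a bounded number of concurrent copies, then merges the sorted segments.

```python
import heapq
import zlib
from concurrent.futures import ThreadPoolExecutor


def run_map_task(records, num_reducers):
    """Hypothetical map side: partition (key, value) pairs by key, sort each partition."""
    partitions = [[] for _ in range(num_reducers)]
    for key, value in records:
        # Deterministic stand-in for a hash partitioner.
        index = zlib.crc32(key.encode("utf-8")) % num_reducers
        partitions[index].append((key, value))
    return [sorted(partition) for partition in partitions]


def fetch_partition(map_output, reducer_id):
    """Reducer-initiated copy of one partition; a real system would fetch it over the network."""
    return list(map_output[reducer_id])


def run_reduce_shuffle(map_outputs, reducer_id, parallel_copies=5):
    """Pull this reducer's partition from every map output, then merge the sorted segments."""
    # parallel_copies is an assumed knob that bounds how many fetches run at once.
    with ThreadPoolExecutor(max_workers=parallel_copies) as pool:
        segments = list(
            pool.map(lambda output: fetch_partition(output, reducer_id), map_outputs)
        )
    # Each segment is already sorted, so a k-way merge yields one sorted stream.
    return list(heapq.merge(*segments))


if __name__ == "__main__":
    map_inputs = [[("b", 1), ("a", 2)], [("a", 3), ("c", 4)], [("b", 5)]]
    map_outputs = [run_map_task(records, num_reducers=2) for records in map_inputs]
    for reducer_id in range(2):
        print(reducer_id, run_reduce_shuffle(map_outputs, reducer_id, parallel_copies=2))
```

Note that nothing on the map side knows about the reducers beyond the partition count; the reducer drives every copy, which is what makes this a pull.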
Amogh
On 10/27/09 6:35 AM, "Jeff Zhang" <zj...@gmail.com> wrote:
Hi all,
I'd like to know whether the map task pushes its output to the reduce task,
or the reduce task pulls it from the map task. Which way does Hadoop work?
Thank you very much.
Jeff zhang