Posted to common-user@hadoop.apache.org by Jeff Zhang <zj...@gmail.com> on 2009/10/27 02:05:10 UTC

Does the map task push map output to reduce task or reduce task pull it from map task

Hi all,

I'd like to know whether the map task pushes its output to the reduce task,
or the reduce task pulls it from the map task. Which way does Hadoop actually
work?

Thank you very much.


Jeff zhang

Re: Does the map task push map output to reduce task or reduce task pull it from map task

Posted by Jothi Padmanabhan <jo...@yahoo-inc.com>.
> Don't
> know what the equivalent would be in the mapreduce package
> in 0.20.x.
>
> dave bayer
>   
The framework code that fetches map outputs is the same for both the
mapred- and mapreduce-based reducers.


Re: Does the map task push map output to reduce task or reduce task pull it from map task

Posted by dave bayer <da...@cloudfactory.org>.
On Oct 26, 2009, at 6:05 PM, Jeff Zhang wrote:

> I'd like to know does the map task push map output to reduce task or  
> reduce
> task pull it from map task ? Which way is real in hadoop ?

In 0.19, it appears to be a pull. Look at the run() method in
mapred/org/apache/hadoop/mapred/ReduceTask.java. I don't know what the
equivalent would be in the mapreduce package in 0.20.x.
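
To make the pull direction concrete, here is a toy Python sketch. It is not
Hadoop code and does not mirror ReduceTask.java; every name in it
(partition_for, run_map_task, MapOutputStore, pull_partition) is made up for
illustration, and the partitioner is a simplified stand-in for a hash
partitioner. The point is only that a map task leaves its partitioned output
where it ran, and the reducer is the one that goes and asks for its partition.

```python
from collections import defaultdict


def partition_for(key, num_reducers):
    """Hypothetical deterministic partitioner: byte sum of the key mod reducers."""
    return sum(key.encode("utf-8")) % num_reducers


def run_map_task(records, num_reducers):
    """Split one map task's (key, value) output by reducer and sort each part."""
    partitions = defaultdict(list)
    for key, value in records:
        partitions[partition_for(key, num_reducers)].append((key, value))
    return {p: sorted(pairs) for p, pairs in partitions.items()}


class MapOutputStore:
    """Stand-in for the node-local storage where finished maps keep their output."""

    def __init__(self):
        self._outputs = {}

    def publish(self, map_id, partitioned_output):
        # The map task only records its output locally; it contacts no reducer.
        self._outputs[map_id] = partitioned_output

    def completed_maps(self):
        return sorted(self._outputs)

    def fetch(self, map_id, partition):
        # Called by the reducer: this is the "pull".
        return self._outputs[map_id].get(partition, [])


def pull_partition(store, partition):
    """Reducer side: ask every completed map for this reducer's partition."""
    return [store.fetch(map_id, partition) for map_id in store.completed_maps()]
```

For example, after publishing two map outputs into a MapOutputStore, calling
pull_partition(store, 1) returns one sorted segment per completed map, and a
map that produced nothing for that partition contributes an empty segment.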

dave bayer

Re: Does the map task push map output to reduce task or reduce task pull it from map task

Posted by Prabhu Hari Dhanapal <dr...@gmail.com>.
Well, I'm not sure, but I think it might be a pull, because physically the
mappers and the reducers run on the same nodes. If the mappers had to push,
it might be the case that all nodes are busy mapping and there are no
reducers to accept the output. Maybe for this reason the reducers might not
want to start reducing anything at all until all of the mapper tasks have
finished.

There is also the sort/shuffle layer between mapping and reducing. It clearly
demarcates the phases, which seems to suggest that it's a pull rather than a
push.

You might think of this as a performance bottleneck, but in reality it seems
it isn't.

By the way, wait for an expert to answer; I'm a beginner too!

On Mon, Oct 26, 2009 at 9:05 PM, Jeff Zhang <zj...@gmail.com> wrote:

> Hi all,
>
> I'd like to know does the map task push map output to reduce task or reduce
> task pull it from map task ? Which way is real in hadoop ?
>
> Thank you very much.
>
>
> Jeff zhang
>



-- 
Hari

Re: Does the map task push map output to reduce task or reduce task pull it from map task

Posted by Amogh Vasekar <am...@yahoo-inc.com>.
Hi,
Each reduce task looks to the map tasks for the partition it requires and
pulls it (the number of parallel copies is controlled by
mapred.reduce.parallel.copies). As partitions are taken in by the reduce
task, it performs a merge sort; this forms your shuffle and sort (S&S) phase.
Typically your mappers and reducers are O(n) while S&S is O(n log n), so if
the amount of intermediate data is huge you will see a relative drop in
performance.
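
A rough Python sketch of that reduce-side behaviour, under stated
assumptions: it is a toy model, not Hadoop's implementation. The names
fetch_segment, shuffle_and_merge and group_by_key are hypothetical, the
thread pool size merely plays the role of the parallel copies setting (the
default of 5 is an assumption of this sketch), and the "network copy" is just
a list copy. It shows bounded parallel fetching followed by a k-way merge of
already sorted segments.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def fetch_segment(segment):
    """Stand-in for copying one map's sorted partition to the reducer."""
    return list(segment)


def shuffle_and_merge(sorted_segments, parallel_copies=5):
    """Fetch segments with bounded parallelism, then merge them into one sorted run."""
    # At most `parallel_copies` fetches are in flight at any time.
    with ThreadPoolExecutor(max_workers=parallel_copies) as pool:
        fetched = list(pool.map(fetch_segment, sorted_segments))
    # k-way merge: each input is already sorted, so no full re-sort is needed.
    return list(heapq.merge(*fetched))


def group_by_key(merged):
    """Collect the values of each key from a key-sorted list of (key, value) pairs."""
    groups = []
    for key, value in merged:
        if groups and groups[-1][0] == key:
            groups[-1][1].append(value)
        else:
            groups.append((key, [value]))
    return groups
```

Merging k sorted segments holding n records in total costs on the order of
n log k comparisons, while each map task has already paid for sorting its own
output. That sorting and merging work is where the O(n log n) term for S&S
comes from.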

Amogh


On 10/27/09 6:35 AM, "Jeff Zhang" <zj...@gmail.com> wrote:

> Hi all,
>
> I'd like to know does the map task push map output to reduce task or reduce
> task pull it from map task ? Which way is real in hadoop ?
>
> Thank you very much.
>
>
> Jeff zhang