Posted to common-user@hadoop.apache.org by Mithila Nagendra <mn...@asu.edu> on 2009/03/07 21:03:49 UTC
Does reduce start only after the map is completed?
Hey all
I'm using Hadoop version 0.18.3, and was wondering if the reduce phase
starts only after the mapping is completed? Is it required that the map
phase is 100% done, or can it be programmed in such a way that the reduce
starts earlier?
Thanks!
Mithila Nagendra
Arizona State University
Re: Does reduce start only after the map is completed?
Posted by Tim Wintle <ti...@teamrubber.com>.
On Sat, 2009-03-07 at 23:03 +0300, Mithila Nagendra wrote:
> Hey all
> I'm using Hadoop version 0.18.3, and was wondering if the reduce phase
> starts only after the mapping is completed? Is it required that the map
> phase is 100% done, or can it be programmed in such a way that the reduce
> starts earlier?
As I understand it, the reducers have three phases:
1) Copy Data from the mappers ("Shuffle")
2) Sort the data on the reducer (by key)
3) Actually run the data through the function you've defined.
<http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/Reducer.html>
The reducer tasks/processes start as soon as they are able to (I
believe), and the copying and sorting of data happen while there may
still be mappers running.
Stage (3) cannot run until stage (2) is complete, which obviously cannot
happen until all the mappers have finished.
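To make the three stages concrete, here is a minimal Python sketch of a single reducer. It is a toy simulation, not Hadoop code: `run_reducer`, `finished_mappers`, and the sample data are hypothetical names invented for illustration.

```python
from collections import defaultdict

def run_reducer(map_outputs, reduce_fn):
    """Simulate one reducer. map_outputs yields one list of (key, value)
    pairs per mapper, in the order the mappers happen to finish."""
    copied = []
    # Stage 1 (copy/shuffle): fetch each mapper's output as soon as that
    # mapper finishes; this overlaps with mappers that are still running.
    for output in map_outputs:
        copied.extend(output)
    # Stage 2 (sort): the final ordering by key is only known once the
    # last mapper's output has arrived.
    copied.sort(key=lambda kv: kv[0])
    # Stage 3 (reduce): group values by key and call the user function.
    grouped = defaultdict(list)
    for key, value in copied:
        grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}

def finished_mappers():
    # Hypothetical outputs of three mappers, yielded as each one completes.
    yield [("apple", 1), ("pear", 1)]
    yield [("pear", 1)]
    yield [("apple", 1), ("apple", 1)]

totals = run_reducer(finished_mappers(), lambda key, values: sum(values))
print(totals)  # {'apple': 3, 'pear': 2}
```

Note that "apple" shows up in the output of both the first and the last mapper, so the reduce call for "apple" cannot be made correctly until the last mapper has finished, even though the copying started much earlier.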
In my experience, I haven't found this a major issue (especially if
there are many times more mappers than machines), since the shuffle and
sort stages take significant time and effort anyway.
Tim Wintle
Re: Does reduce start only after the map is completed?
Posted by pa...@gmail.com.
On Sat, 07 Mar 2009 20:03:49 -0000, Mithila Nagendra <mn...@asu.edu>
wrote:
> Hey all
> I'm using Hadoop version 0.18.3, and was wondering if the reduce phase
> starts only after the mapping is completed? Is it required that the map
> phase is 100% done, or can it be programmed in such a way that the reduce
> starts earlier?
>
> Thanks!
> Mithila Nagendra
> Arizona State University
As I imagine it, the reduce phase starts as soon as the job starts and
waits for data from the mappers. Say you configured the system to run 2
reducers and 5 mappers.
When the job starts, the 2 reducers start as well, and each of them waits
for its share of the results from the 5 maps. While the various mappers
start and stop, the 2 reducers stay alive and collect data from them.
Once all 5 mappers have "eaten" all the input data, the reducers process
what they have collected and terminate...
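A note on which reducer waits on which mapper: with Hadoop's default hash partitioning, map output is assigned to reducers by key rather than by mapper, so each reducer normally ends up fetching from every mapper that produced one of its keys. The Python sketch below illustrates this with 2 reducers and 5 mappers. The `partition` function is a simplified stand-in for a hash partitioner, and the mapper outputs are made up.

```python
def partition(key, num_reducers):
    # Stand-in for a hash partitioner: a stable hash of the key, modulo
    # the number of reducers. (Python's built-in hash() is salted per
    # process for strings, so a simple byte sum is used instead.)
    return sum(key.encode("utf-8")) % num_reducers

NUM_REDUCERS = 2
# Hypothetical output keys of 5 mappers.
mapper_keys = [
    ["a", "b"],
    ["b", "c"],
    ["a", "d"],
    ["c", "d"],
    ["a", "b"],
]

# For each reducer, record which mappers it has to fetch data from.
sources = {r: set() for r in range(NUM_REDUCERS)}
for mapper_id, keys in enumerate(mapper_keys):
    for key in keys:
        sources[partition(key, NUM_REDUCERS)].add(mapper_id)

for reducer_id, mappers in sorted(sources.items()):
    print(f"reducer {reducer_id} fetches from mappers {sorted(mappers)}")
```

With this sample data both reducers fetch from all five mappers, which is why a reducer cannot finish its sort until every mapper is done.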