Posted to common-user@hadoop.apache.org by parth <pa...@yahoo.com> on 2010/02/27 12:06:29 UTC

Availability of values in a key in Reduce stage

Hi,

I am confused about one aspect of the reducer. Can anyone help me with it?

When the mappers start generating key-value pairs, will they all be available in
the reducer only after all mappers have exited? I mean, for a key K, will all
values be grouped and available in the reducer? Or will the reducer run on a
single key-value pair as soon as it becomes available?
The second option seems highly unrealistic.

Thanks,
Parth
-- 
View this message in context: http://old.nabble.com/Availability-of-values-in-a-key-in-Reduce-stage-tp27727136p27727136.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.

Re: Availability of values in a key in Reduce stage

Posted by Amar Kamat <am...@yahoo-inc.com>.

Parth,
The reducer process has two distinct steps:

 1.  Shuffle
 2.  Reduce

In the shuffle phase, reducer 'r' does the following:

 1.  Copies the data generated by all the mappers for reducer 'r'
 2.  Sorts it

The reduce phase starts after the shuffle phase. In this phase, the reducer invokes the reduce() function once for each [k,<v1,v2...>] pair produced by the shuffle phase, so for a key K all of its values are grouped and available in a single reduce() call.
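
The two phases can be illustrated with a small, self-contained Python sketch. This is not Hadoop code and does not use any Hadoop API; it only simulates the copy, sort, and reduce steps in memory. The names partition, shuffle, reduce_phase, and run_reducer, the toy partitioning rule, and the word-count data are all assumptions made up for illustration.

```python
from itertools import groupby
from operator import itemgetter


def partition(key, num_reducers):
    # Hypothetical, deterministic partitioner for string keys
    # (an assumption for this sketch, not Hadoop's partitioner).
    return sum(key.encode("utf-8")) % num_reducers


def shuffle(map_outputs, reducer_id, num_reducers):
    # Step 1: copy the pairs destined for this reducer from every mapper.
    copied = [
        (key, value)
        for mapper_output in map_outputs
        for key, value in mapper_output
        if partition(key, num_reducers) == reducer_id
    ]
    # Step 2: sort by key so equal keys end up next to each other.
    copied.sort(key=itemgetter(0))
    return copied


def reduce_phase(sorted_pairs, reduce_fn):
    # Invoke reduce_fn once per key with all of that key's values.
    results = []
    for key, group in groupby(sorted_pairs, key=itemgetter(0)):
        values = [value for _, value in group]
        results.append(reduce_fn(key, values))
    return results


def run_reducer(map_outputs, reducer_id, num_reducers, reduce_fn):
    # The reduce phase only starts once the shuffle has finished.
    return reduce_phase(shuffle(map_outputs, reducer_id, num_reducers), reduce_fn)


def count_words(key, values):
    # Example reduce function: sum the counts for one word.
    return (key, sum(values))


# Made-up output of two mappers in a word-count job.
map_outputs = [
    [("apple", 1), ("pear", 1), ("apple", 1)],
    [("pear", 1), ("apple", 1)],
]

if __name__ == "__main__":
    print(run_reducer(map_outputs, 0, 1, count_words))
```

In this sketch count_words is called exactly once for "apple" and once for "pear", each time with every value that any mapper emitted for that key, which mirrors the grouping described above.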

Amar

