Posted to common-user@hadoop.apache.org by Panayotis Antonopoulos <an...@hotmail.com> on 2011/04/29 20:48:54 UTC

ChainMapper and ChainReducer: Are the key/value pairs distributed to the nodes of the cluster before each Map phase?

Hello,
Let's say we have an MR job that uses ChainMapper and ChainReducer as in the following diagram:
Input->Map1->Map2->Reduce->Map3->Output

The input is split and distributed to the nodes of the cluster before being processed by the Map1 phase.
Likewise, before the Reduce phase the key/value pairs are distributed to the Reducers according to the partitions made by the Partitioner.

I expected that the same thing (distribution of the keys) would happen before the Map2 and Map3 phases, but after reading the book "Pro Hadoop" I strongly doubt it.

I would like to ask whether the key/value pairs emitted by the Map1 phase (or those emitted by the Reduce phase) are distributed to the nodes of the cluster before being processed by the next Map phase,
or whether the output of the Map1 phase (or Reduce phase) is immediately inserted into the Map2 phase (or Map3 phase) within the same node, without any distribution.

Thank you in advance!
Panagiotis Antonopoulos

Re: ChainMapper and ChainReducer: Are the key/value pairs distributed to the nodes of the cluster before each Map phase?

Posted by Rahul Jain <rj...@gmail.com>.

Your latter statement is correct:

> if the output of the Map1 phase (or Reduce phase) is immediately inserted
> to Map2 phase (or Map3 Phase) within the same node, without any
> distribution.

ChainMapper and ChainReducer are just convenience classes that allow mapper
code to be reused, whether it executes as part of a chain or standalone.
They do not force the system to do any additional distribution, grouping,
sorting, etc.

-Rahul
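
To make the answer above concrete, here is a minimal sketch in plain Java of
the Input->Map1->Map2->Reduce->Map3->Output pipeline from the question. It is
a toy in-memory model, not Hadoop code and not how ChainMapper is implemented:
the Stage interface, the chain() helper, and the three sample mappers are all
hypothetical names invented for illustration. The point it tries to show is
that Map1 hands each record to Map2 by a direct call inside the same task,
that the only regrouping by key happens once, before Reduce, and that the
Reduce output goes straight into Map3.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

/** Toy in-memory model of Input->Map1->Map2->Reduce->Map3->Output. Not Hadoop code. */
final class ChainSketch {

    /** Hypothetical stand-in for a mapper: one record in, any number of records out. */
    interface Stage {
        void map(String key, int value, BiConsumer<String, Integer> out);
    }

    /** Chaining is a direct call: each record 'first' emits goes straight into 'second'. */
    static Stage chain(Stage first, Stage second) {
        return (k, v, out) -> first.map(k, v, (k2, v2) -> second.map(k2, v2, out));
    }

    // Map1: the key carries a line of text; emit (word, 1) for every word.
    static final Stage MAP1 = (line, unused, out) -> {
        for (String w : line.trim().split("\\s+")) {
            if (!w.isEmpty()) {
                out.accept(w, 1);
            }
        }
    };

    // Map2: lower-case the key.
    static final Stage MAP2 = (k, v, out) -> out.accept(k.toLowerCase(Locale.ROOT), v);

    // Map3: applied to reducer output; keep only words seen at least twice.
    static final Stage MAP3 = (k, v, out) -> {
        if (v >= 2) {
            out.accept(k, v);
        }
    };

    static Map<String, Integer> runJob(List<List<String>> splits) {
        Stage mapChain = chain(MAP1, MAP2);
        // Stand-in for the shuffle: the single point where records are grouped by key.
        Map<String, List<Integer>> shuffled = new TreeMap<>();
        for (List<String> split : splits) {          // one iteration models one map task
            for (String line : split) {
                // Map1 and Map2 both run here, inside the same task;
                // only the chain's final output enters the shuffle.
                mapChain.map(line, 0, (k, v) ->
                        shuffled.computeIfAbsent(k, x -> new ArrayList<>()).add(v));
            }
        }
        Map<String, Integer> output = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffled.entrySet()) {  // reduce side
            int sum = 0;
            for (int c : e.getValue()) {
                sum += c;
            }
            // Reduce output is passed directly to Map3; there is no second shuffle.
            MAP3.map(e.getKey(), sum, output::put);
        }
        return output;
    }

    public static void main(String[] args) {
        List<List<String>> splits = new ArrayList<>();
        splits.add(Arrays.asList("Hadoop chain", "hadoop"));
        splits.add(Arrays.asList("CHAIN mapper"));
        System.out.println(runJob(splits)); // prints {chain=2, hadoop=2}
    }
}
```

In a real job the chain would be configured through the ChainMapper and
ChainReducer classes rather than a helper like chain(); the sketch only
mirrors the data flow described in the reply.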
