You are viewing a plain text version of this content. The canonical link for it is here.
Posted to mapreduce-dev@hadoop.apache.org by Ahmad Shahzad <as...@gmail.com> on 2010/04/21 13:07:51 UTC

communication protocols in hadoop mapreduce

Hey everyone,
                     I wanted to know that which communication protocols
hadoop mapreduce uses under the hood to provide communication if any. For
example for the shuffle process it uses http to shuffle the values to the
reducers.
So, job tracker has to talk to task trackers, and task trackers have to
report back to job trackers, and what about if the data  is not available on
the same node and the slave node has to fetch the data from other node. In
all of the cases which communication mechanisms are used to achieve the
communication, is it http only??

I would really appreciate if someone can tell me regarding this thing or if
someone has some link that can help me regarding this issue.

Regards,
Ahmad Shahzad

Re: communication protocols in hadoop mapreduce

Posted by Rekha Joshi <re...@yahoo-inc.com>.
A quick answer would be - it is heartbeat communication mechanism, poll-like flow between JT/TT's.
Also for communication underneath its RPC, and not the default Java serialization but a hadoop specific serialization implementation to have some performance gains.

AVRO is in strong contention to be used in hadoop for serialization.You might like to also look up into Thrift, Google Protocol Buffers.

Cheers,
/

On 4/21/10 4:37 PM, "Ahmad Shahzad" <as...@gmail.com> wrote:

Hey everyone,
                     I wanted to know that which communication protocols
hadoop mapreduce uses under the hood to provide communication if any. For
example for the shuffle process it uses http to shuffle the values to the
reducers.
So, job tracker has to talk to task trackers, and task trackers have to
report back to job trackers, and what about if the data  is not available on
the same node and the slave node has to fetch the data from other node. In
all of the cases which communication mechanisms are used to achieve the
communication, is it http only??

I would really appreciate if someone can tell me regarding this thing or if
someone has some link that can help me regarding this issue.

Regards,
Ahmad Shahzad