You are viewing a plain text version of this content. The canonical link for it is here.

Posted to hdfs-user@hadoop.apache.org by John Lilley <jo...@redpoint.net> on 2013/05/29 00:43:16 UTC

NM/AM interaction

I was reading from the HortonWorks blog:
"How MapReduce shuffle takes advantage of NM's Auxiliary-services
The Shuffle functionality required to run a MapReduce (MR) application is implemented as an Auxiliary Service. This service starts up a Netty Web Server, and knows how to handle MR specific shuffle requests from Reduce tasks. The MR AM specifies the service id for the shuffle service, along with security tokens that may be required. The NM provides the AM with the port on which the shuffle service is running which is passed onto the Reduce tasks."
How does the AM get the service ID and the port?
Thanks,
John

Re: NM/AM interaction

Posted by Harsh J <ha...@cloudera.com>.

All AuxiliaryServices are configured in yarn-site.xml with a service
ID and a class is associated to the defined service ID. For MR2, one
generally adds the two below properties, first defining the name
(Service ID), second defining the class to launch for the defined
name:

<name>yarn.nodemanager.aux-services</name>
<value>mapreduce.shuffle</value>

<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>

As part of the interface, all AuxiliaryServices may submit back some
metadata that they would require clients to be aware of. For the
ShuffleHandler, the port is rather important, so it serializes it via
the getMeta() interface. [1]

As part of any startContainer(…) response from the NodeManager's
ContainerManager service, all metadata of all available auxiliary
services are shipped back as part of a successful response to a
container start request. This is a mapping based on (configured
service ID -> metadata) for every Aux service configured and currently
running. [2]

A client, such as MR2, receives this batch of metadata and
deserializes whatever it is looking for [3] using the service ID name
string it is aware of [4].

[1] - https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/AuxServices.java#L195
and https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java#L328
[2] - https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java#L502
[3] - https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java#L165
[4] - https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java#L148

On Wed, May 29, 2013 at 4:13 AM, John Lilley <jo...@redpoint.net> wrote:
> I was reading from the HortonWorks blog:
>
> “How MapReduce shuffle takes advantage of NM’s Auxiliary-services
>
> The Shuffle functionality required to run a MapReduce (MR) application is
> implemented as an Auxiliary Service. This service starts up a Netty Web
> Server, and knows how to handle MR specific shuffle requests from Reduce
> tasks. The MR AM specifies the service id for the shuffle service, along
> with security tokens that may be required. The NM provides the AM with the
> port on which the shuffle service is running which is passed onto the Reduce
> tasks.”
>
> How does the AM get the service ID and the port?
>
> Thanks,
>
> John
>
>

--
Harsh J

Re: NM/AM interaction

Posted by Harsh J <ha...@cloudera.com>.

All AuxiliaryServices are configured in yarn-site.xml with a service
ID and a class is associated to the defined service ID. For MR2, one
generally adds the two below properties, first defining the name
(Service ID), second defining the class to launch for the defined
name:

<name>yarn.nodemanager.aux-services</name>
<value>mapreduce.shuffle</value>

<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>

As part of the interface, all AuxiliaryServices may submit back some
metadata that they would require clients to be aware of. For the
ShuffleHandler, the port is rather important, so it serializes it via
the getMeta() interface. [1]

As part of any startContainer(…) response from the NodeManager's
ContainerManager service, all metadata of all available auxiliary
services are shipped back as part of a successful response to a
container start request. This is a mapping based on (configured
service ID -> metadata) for every Aux service configured and currently
running. [2]

A client, such as MR2, receives this batch of metadata and
deserializes whatever it is looking for [3] using the service ID name
string it is aware of [4].

[1] - https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/AuxServices.java#L195
and https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java#L328
[2] - https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java#L502
[3] - https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java#L165
[4] - https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java#L148

On Wed, May 29, 2013 at 4:13 AM, John Lilley <jo...@redpoint.net> wrote:
> I was reading from the HortonWorks blog:
>
> “How MapReduce shuffle takes advantage of NM’s Auxiliary-services
>
> The Shuffle functionality required to run a MapReduce (MR) application is
> implemented as an Auxiliary Service. This service starts up a Netty Web
> Server, and knows how to handle MR specific shuffle requests from Reduce
> tasks. The MR AM specifies the service id for the shuffle service, along
> with security tokens that may be required. The NM provides the AM with the
> port on which the shuffle service is running which is passed onto the Reduce
> tasks.”
>
> How does the AM get the service ID and the port?
>
> Thanks,
>
> John
>
>

--
Harsh J

Re: NM/AM interaction

Posted by Harsh J <ha...@cloudera.com>.

All AuxiliaryServices are configured in yarn-site.xml with a service
ID and a class is associated to the defined service ID. For MR2, one
generally adds the two below properties, first defining the name
(Service ID), second defining the class to launch for the defined
name:

<name>yarn.nodemanager.aux-services</name>
<value>mapreduce.shuffle</value>

<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>

As part of the interface, all AuxiliaryServices may submit back some
metadata that they would require clients to be aware of. For the
ShuffleHandler, the port is rather important, so it serializes it via
the getMeta() interface. [1]

As part of any startContainer(…) response from the NodeManager's
ContainerManager service, all metadata of all available auxiliary
services are shipped back as part of a successful response to a
container start request. This is a mapping based on (configured
service ID -> metadata) for every Aux service configured and currently
running. [2]

A client, such as MR2, receives this batch of metadata and
deserializes whatever it is looking for [3] using the service ID name
string it is aware of [4].

[1] - https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/AuxServices.java#L195
and https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java#L328
[2] - https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java#L502
[3] - https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java#L165
[4] - https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java#L148

On Wed, May 29, 2013 at 4:13 AM, John Lilley <jo...@redpoint.net> wrote:
> I was reading from the HortonWorks blog:
>
> “How MapReduce shuffle takes advantage of NM’s Auxiliary-services
>
> The Shuffle functionality required to run a MapReduce (MR) application is
> implemented as an Auxiliary Service. This service starts up a Netty Web
> Server, and knows how to handle MR specific shuffle requests from Reduce
> tasks. The MR AM specifies the service id for the shuffle service, along
> with security tokens that may be required. The NM provides the AM with the
> port on which the shuffle service is running which is passed onto the Reduce
> tasks.”
>
> How does the AM get the service ID and the port?
>
> Thanks,
>
> John
>
>

--
Harsh J

Re: NM/AM interaction

Posted by Harsh J <ha...@cloudera.com>.

All AuxiliaryServices are configured in yarn-site.xml with a service
ID and a class is associated to the defined service ID. For MR2, one
generally adds the two below properties, first defining the name
(Service ID), second defining the class to launch for the defined
name:

<name>yarn.nodemanager.aux-services</name>
<value>mapreduce.shuffle</value>

<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>

As part of the interface, all AuxiliaryServices may submit back some
metadata that they would require clients to be aware of. For the
ShuffleHandler, the port is rather important, so it serializes it via
the getMeta() interface. [1]

As part of any startContainer(…) response from the NodeManager's
ContainerManager service, all metadata of all available auxiliary
services are shipped back as part of a successful response to a
container start request. This is a mapping based on (configured
service ID -> metadata) for every Aux service configured and currently
running. [2]

A client, such as MR2, receives this batch of metadata and
deserializes whatever it is looking for [3] using the service ID name
string it is aware of [4].

[1] - https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/AuxServices.java#L195
and https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java#L328
[2] - https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java#L502
[3] - https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java#L165
[4] - https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java#L148

On Wed, May 29, 2013 at 4:13 AM, John Lilley <jo...@redpoint.net> wrote:
> I was reading from the HortonWorks blog:
>
> “How MapReduce shuffle takes advantage of NM’s Auxiliary-services
>
> The Shuffle functionality required to run a MapReduce (MR) application is
> implemented as an Auxiliary Service. This service starts up a Netty Web
> Server, and knows how to handle MR specific shuffle requests from Reduce
> tasks. The MR AM specifies the service id for the shuffle service, along
> with security tokens that may be required. The NM provides the AM with the
> port on which the shuffle service is running which is passed onto the Reduce
> tasks.”
>
> How does the AM get the service ID and the port?
>
> Thanks,
>
> John
>
>

--
Harsh J