Posted to mapreduce-user@hadoop.apache.org by Mohit Anchlia <mo...@gmail.com> on 2013/09/19 20:33:36 UTC

YARN MapReduce 2 concepts

I am going through the concepts of the resource manager, application master
and node manager. As I understand it, the resource manager receives the job
submission and launches the application master. It also launches the node
manager to monitor the application master. My questions are:

1. Is the node manager long-lived, and does one node manager monitor all the
containers launched on the data nodes?
2. How is resource negotiation done between the application master and the
resource manager? In other words, what happens during this step? Does the
resource manager look at the active and pending tasks and the resources
consumed by those before giving containers to the application master?
3. In the old MapReduce cluster, task trackers send periodic heartbeats to
the job tracker. How does this compare to YARN? It looks like the
application master is a task tracker? I am a little confused here.
4. It looks like the client polls the application master to get the progress
of the job, but initially the client connects to the resource manager. How
does the client get a reference to the application master? Does it mean that
the client gets, from the resource manager, the ip/port of the node where
the application master was launched?

Re: YARN MapReduce 2 concepts

Posted by Sandy Ryza <sa...@cloudera.com>.
Hi Mohit,
answers inline


On Fri, Sep 20, 2013 at 1:33 AM, Mohit Anchlia <mo...@gmail.com> wrote:

> I am going through the concepts of the resource manager, application
> master and node manager. As I understand it, the resource manager receives
> the job submission and launches the application master. It also launches
> the node manager to monitor the application master. My questions are:
>
> 1. Is the node manager long-lived, and does one node manager monitor all
> the containers launched on the data nodes?
>

Correct


> 2. How is resource negotiation done between the application master and the
> resource manager? In other words, what happens during this step? Does the
> resource manager look at the active and pending tasks and the resources
> consumed by those before giving containers to the application master?
>

The ResourceManager contains a pluggable scheduler that is responsible for
deciding which applications to give resources to when they become
available.  When a NodeManager heartbeats to the ResourceManager, the
scheduler will decide whether there are any containers it should place on
that node for an application, and will let the Application Master know
about its decision on the next AM-RM heartbeat.  Here's documentation for
the two recommended schedulers:
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
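
To make the ordering of those two heartbeats concrete, here is a minimal toy
model in Python. It is only a sketch under loud assumptions: the class and
method names (ToyResourceManager, node_heartbeat, am_heartbeat) are
hypothetical and are not YARN's API, resources are reduced to a count of free
slots, and the scheduling policy is plain FIFO rather than anything the Fair
or Capacity scheduler actually does. What it tries to show is that containers
are placed when a node heartbeats, and the application master only hears
about them on its own next heartbeat.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class App:
    """Hypothetical stand-in for one application's state inside the RM."""
    app_id: str
    pending: int                                   # containers still requested
    allocated: list = field(default_factory=list)  # granted, not yet reported to the AM


class ToyResourceManager:
    """FIFO toy scheduler (an assumption; real YARN schedulers are far richer)."""

    def __init__(self):
        self.apps = deque()

    def submit(self, app_id, containers_wanted):
        app = App(app_id, containers_wanted)
        self.apps.append(app)
        return app

    def node_heartbeat(self, node, free_slots):
        """A NodeManager reports in: decide which containers to place on it."""
        for app in self.apps:
            while free_slots > 0 and app.pending > 0:
                app.pending -= 1
                free_slots -= 1
                app.allocated.append(node)
        return free_slots

    def am_heartbeat(self, app):
        """An Application Master reports in: hand over allocations made since its last call."""
        granted, app.allocated = app.allocated, []
        return granted


rm = ToyResourceManager()
job_a = rm.submit("job_a", containers_wanted=3)
job_b = rm.submit("job_b", containers_wanted=2)

rm.node_heartbeat("node1", free_slots=2)  # both slots go to job_a, first in the queue
first = rm.am_heartbeat(job_a)            # job_a's AM learns about them only now
rm.node_heartbeat("node2", free_slots=2)  # one slot finishes job_a, one goes to job_b
second = rm.am_heartbeat(job_a)
for_b = rm.am_heartbeat(job_b)
print(first, second, for_b)
```

In this toy run job_b is left with one request still pending until another
node heartbeats with free capacity.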


> 3. In the old MapReduce cluster, task trackers send periodic heartbeats to
> the job tracker. How does this compare to YARN? It looks like the
> application master is a task tracker? I am a little confused here.
>

The analog to this is the NodeManager sending periodic heartbeats to the
ResourceManager.  The Application Master also sends periodic heartbeats to
the NodeManagers that its containers are running on to check on their
status.


> 4. It looks like the client polls the application master to get the
> progress of the job, but initially the client connects to the resource
> manager. How does the client get a reference to the application master?
> Does it mean that the client gets, from the resource manager, the ip/port
> of the node where the application master was launched?
>

Correct
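
The lookup described in the question can be sketched as a small toy model in
Python. Everything here is a hypothetical illustration, not the YARN client
API: ToyRM, ToyApplicationMaster, the application id, the host name and the
port are made up, and the "network" is just a dictionary keyed by host and
port. The sketch assumes the application master tells the resource manager
where it is listening once it starts, so the client can ask the resource
manager for that address and then poll the application master directly.

```python
class ToyApplicationMaster:
    """Hypothetical AM that only knows how to report job progress."""

    def __init__(self, progress):
        self.progress = progress

    def get_progress(self):
        return self.progress


class ToyRM:
    """Hypothetical RM that remembers where each application's AM is listening."""

    def __init__(self):
        self._am_addresses = {}

    def register_application_master(self, app_id, host, rpc_port):
        # Called by the AM after it has been launched in a container.
        self._am_addresses[app_id] = (host, rpc_port)

    def get_application_report(self, app_id):
        # Called by the client; returns (host, port), or None if no AM is registered yet.
        return self._am_addresses.get(app_id)


def poll_progress(rm, network, app_id):
    """Client side: ask the RM for the AM's address, then talk to the AM directly."""
    address = rm.get_application_report(app_id)
    if address is None:
        return None  # AM not registered yet; a real client would retry later
    return network[address].get_progress()


rm = ToyRM()
network = {}  # stand-in for the cluster network: (host, port) -> AM object

# The AM starts on some node and registers its address with the RM.
am = ToyApplicationMaster(progress=0.25)
network[("node7.example.com", 41000)] = am
rm.register_application_master("application_0001", "node7.example.com", 41000)

print(poll_progress(rm, network, "application_0001"))
am.progress = 0.5
print(poll_progress(rm, network, "application_0001"))
```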
