Posted to user@hadoop.apache.org by ricky l <ri...@gmail.com> on 2013/11/21 16:52:01 UTC

In YARN, how does a task tracker know the address of a job tracker?

Hi all,

I have a question about how a task tracker identifies the job tracker's
address when I submit an MR job through YARN. As far as I know, both the job
tracker and the task trackers are launched through the application master,
and I am curious about the details of the job and task tracker launch
sequence.

thanks.

RE: In YARN, how does a task tracker know the address of a job tracker?

Posted by John Lilley <jo...@redpoint.net>.
Ricky,

What you are doing sounds familiar.  We are in the process of implementing, not exactly MapReduce, but a system that has to do many of the things that MapReduce does (find data splits, define tasks, choose execution affinity, launch an app master, etc.).

There is another special thing that MapReduce under YARN does that a normal YARN app cannot easily access: "auxiliary services".  MapReduce sets up a YARN auxiliary service to serve up the mapper outputs.  I think it is based on netty or jetty and HTTP.  The point is that the MR aux service is part of the Hadoop distro, so all MR has to do is tell the NM to run it.  Regular YARN apps don't have this luxury without installing jars on each node and adding them to the Hadoop stack's CLASSPATH.  There doesn't appear to be any standard or documented way to inject extra jars into the Hadoop install.  As they say, that exercise is left to the reader.

john
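
To make the aux-service idea a bit more concrete, here is a toy sketch in Python of "serve mapper outputs over HTTP so a reducer can fetch its partition". It is not the real thing: in stock Hadoop 2.x the service is the ShuffleHandler class, enabled on each NodeManager through the yarn.nodemanager.aux-services property. Everything named below (MAP_OUTPUTS, the /<map_id>/<partition> URL layout, fetch_partition, demo) is made up for illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical map outputs held on one node, keyed by (map task, partition).
MAP_OUTPUTS = {
    ("map_0", 0): b"apple\t1\n",
    ("map_0", 1): b"zebra\t1\n",
}


class MapOutputHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # URL layout invented for this sketch: /<map_id>/<partition>
        parts = self.path.strip("/").split("/")
        key = None
        if len(parts) == 2 and parts[1].isdigit():
            key = (parts[0], int(parts[1]))
        body = MAP_OUTPUTS.get(key)
        if body is None:
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_partition(port, map_id, partition):
    """What a reducer would do: pull one partition of one map's output."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    try:
        conn.request("GET", f"/{map_id}/{partition}")
        response = conn.getresponse()
        return response.read()
    finally:
        conn.close()


def demo():
    # Port 0 lets the OS pick a free port for the toy "aux service".
    server = HTTPServer(("127.0.0.1", 0), MapOutputHandler)
    thread = threading.Thread(target=server.serve_forever)
    thread.start()
    try:
        return fetch_partition(server.server_port, "map_0", 1)
    finally:
        server.shutdown()
        thread.join()
        server.server_close()
```

The point of running this inside the NodeManager rather than inside a task container is that the outputs stay fetchable after the map container has exited, which is why a regular YARN app would need its own jars installed on every node to get the same effect.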

From: ricky l [mailto:rickylee0815@gmail.com]
Sent: Thursday, November 21, 2013 3:40 PM
To: user@hadoop.apache.org
Subject: Re: In YARN, how does a task tracker knows the address of a job tracker?

Hi John, thanks for your reply. I suspect there will be some external communication between the AM and container tasks. I am trying to implement a Hadoop-like system on YARN, and I wanted to sketch out the high-level steps before starting the work. Thanks,


On Thu, Nov 21, 2013 at 3:27 PM, John Lilley <jo...@redpoint.net> wrote:
MapReduce also communicates outside of what is directly supported by YARN.
In a YARN application, there is very little direct communication between the client and the AM, and between the AM and container tasks.
I think that an AM can report two pieces of information to the client -- "state" and "percent complete".
However, at launch time an AM can open up a protocol port and tell the client and the container tasks how to connect back.
I don't know the details, but I believe that the MapReduce AM communicates directly with all mapper and reducer tasks as well as with the client.
John


From: ricky l [mailto:rickylee0815@gmail.com]
Sent: Thursday, November 21, 2013 12:36 PM
To: user@hadoop.apache.org
Subject: Re: In YARN, how does a task tracker knows the address of a job tracker?

Thank you for the answer, Omkar.

I read the links, which were helpful. Though the concept of job tracker/task tracker does not exist in YARN MapReduce, doesn't it use the job/task tracker binaries? I thought the application master runs the job tracker binary and the containers on the nodes run the task tracker binary. Thanks.

On Thu, Nov 21, 2013 at 2:06 PM, Omkar Joshi <oj...@hortonworks.com> wrote:
Hi,

Starting with YARN, there is no notion of job tracker and task tracker. Here is a quick summary:
JobTracker :-
1) Resource management :- Now done by the Resource Manager (it does all scheduling work)
2) Application state management :- managing and launching new map/reduce tasks (done by the Application Master, which is per job rather than one single entity in the cluster for all jobs as in MRv1).
TaskTracker :- replaced by the Node Manager

I would suggest you read the YARN blog post <http://hortonworks.com/blog/resource-localization-in-yarn-deep-dive/>. This will answer most of your questions. Also read this <http://www.slideshare.net/ovjforu/yarn-way-to-share-cluster-beyond> (slide 12) for how a job actually gets executed.

Thanks,
Omkar Joshi
Hortonworks Inc.<http://www.hortonworks.com>

On Thu, Nov 21, 2013 at 7:52 AM, ricky l <ri...@gmail.com> wrote:
Hi all,

I have a question about how a task tracker identifies the job tracker's address when I submit an MR job through YARN. As far as I know, both the job tracker and the task trackers are launched through the application master, and I am curious about the details of the job and task tracker launch sequence.

thanks.





Re: In YARN, how does a task tracker know the address of a job tracker?

Posted by ricky l <ri...@gmail.com>.
Hi John, thanks for your reply. I suspect there will be some external
communication between the AM and container tasks. I am trying to implement a
Hadoop-like system on YARN, and I wanted to sketch out the high-level steps
before starting the work. Thanks,



On Thu, Nov 21, 2013 at 3:27 PM, John Lilley <jo...@redpoint.net> wrote:

>  MapReduce also communicates outside of what is directly supported by
> YARN.
>
> In a YARN application, there is very little direct communication between
> the client and the AM, and between the AM and container tasks.
>
> I think that an AM can report two pieces of information to the client --
> “state” and “percent complete”.
>
> However, at launch time an AM can open up a protocol port and tell the
> client and the container tasks how to connect back.
>
> I don’t know the details, but I believe that the MapReduce AM communicates
> directly with all mapper and reducer tasks as well as with the client.
>
> John
>
>
>
>
>
> *From:* ricky l [mailto:rickylee0815@gmail.com]
> *Sent:* Thursday, November 21, 2013 12:36 PM
> *To:* user@hadoop.apache.org
> *Subject:* Re: In YARN, how does a task tracker knows the address of a
> job tracker?
>
>
>
> Thank you for the answer, Omkar.
>
>
>
> I read the links, which were helpful. Though the concept of job tracker/task
> tracker does not exist in YARN MapReduce, doesn't it use the job/task
> tracker binaries? I thought the application master runs the job tracker
> binary and the containers on the nodes run the task tracker binary. Thanks.
>
>
>
> On Thu, Nov 21, 2013 at 2:06 PM, Omkar Joshi <oj...@hortonworks.com>
> wrote:
>
> Hi,
>
>
>
> Starting with YARN, there is no notion of job tracker and task tracker.
> Here is a quick summary:
>
> JobTracker :-
>
> 1) Resource management :- Now done by the Resource Manager (it does all
> scheduling work)
>
> 2) Application state management :- managing and launching new map/reduce
> tasks (done by the Application Master, which is per job rather than one
> single entity in the cluster for all jobs as in MRv1).
>
> TaskTracker :- replaced by the Node Manager
>
>
>
> I would suggest you read the YARN blog post <http://hortonworks.com/blog/resource-localization-in-yarn-deep-dive/>.
> This will answer most of your questions. Also read this <http://www.slideshare.net/ovjforu/yarn-way-to-share-cluster-beyond> (slide
> 12) for how a job actually gets executed.
>
>
>   Thanks,
>
> Omkar Joshi
>
> *Hortonworks Inc.* <http://www.hortonworks.com>
>
>
>
> On Thu, Nov 21, 2013 at 7:52 AM, ricky l <ri...@gmail.com> wrote:
>
> Hi all,
>
>
>
> I have a question about how a task tracker identifies the job tracker's
> address when I submit an MR job through YARN. As far as I know, both the job
> tracker and the task trackers are launched through the application master,
> and I am curious about the details of the job and task tracker launch
> sequence.
>
>
>
> thanks.
>

RE: In YARN, how does a task tracker knows the address of a job tracker?

Posted by John Lilley <jo...@redpoint.net>.
MapReduce also communicates outside of what is directly supported by YARN.
In a YARN application, there is very little direct communication between the client and the AM, or between the AM and the container tasks.
I think that an AM can report two pieces of information to the client: "state" and "percent complete".
However, at launch time an AM can open a protocol port and tell the client and the container tasks how to connect back.
I don't know the details, but I believe that the MapReduce AM communicates directly with all mapper and reducer tasks as well as with the client.
John
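
The "open a port and tell the tasks how to connect back" idea can be sketched in plain Java without any Hadoop dependency. This is only an illustration under stated assumptions, not how the MapReduce AM is actually implemented: the class name, the SKETCH_AM_HOST / SKETCH_AM_PORT variable names and the "REGISTER" message are all hypothetical, and a thread stands in for a task that YARN would really launch in a separate container. The "AM" binds an ephemeral port, puts its address into the environment map it would hand to the task at launch, and the "task" reads that map and connects back.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class AmCallbackSketch {
    // Hypothetical variable names for this sketch; YARN does not define them.
    static final String ENV_HOST = "SKETCH_AM_HOST";
    static final String ENV_PORT = "SKETCH_AM_PORT";

    /** "AM" side: build the environment that would be handed to a task at launch. */
    static Map<String, String> launchEnv(ServerSocket server) {
        Map<String, String> env = new HashMap<>();
        env.put(ENV_HOST, InetAddress.getLoopbackAddress().getHostAddress());
        env.put(ENV_PORT, Integer.toString(server.getLocalPort()));
        return env;
    }

    /** "Task" side: read the AM address from the environment and report in. */
    static void runTask(Map<String, String> env, String taskId) throws IOException {
        String host = env.get(ENV_HOST);
        int port = Integer.parseInt(env.get(ENV_PORT));
        try (Socket socket = new Socket(host, port);
             PrintWriter out = new PrintWriter(
                 new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8), true)) {
            out.println("REGISTER " + taskId);
        }
    }

    /** Runs one AM/task round trip and returns the line the "AM" received. */
    static String demo() throws Exception {
        // Port 0 asks the OS for any free port, as an AM would at launch time.
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            Map<String, String> env = launchEnv(server);
            // A thread stands in for a task running in a separate container.
            Thread task = new Thread(() -> {
                try {
                    runTask(env, "task_0001");
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            task.start();
            try (Socket conn = server.accept();
                 BufferedReader in = new BufferedReader(
                     new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
                String line = in.readLine();
                task.join();
                return line;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```

The design point is that the address travels with the launch request, so no fixed, cluster-wide "job tracker address" needs to be configured. In a real AM, the client side of this is, as far as I know, handled by passing a host and RPC port when the AM registers with the Resource Manager.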


From: ricky l [mailto:rickylee0815@gmail.com]
Sent: Thursday, November 21, 2013 12:36 PM
To: user@hadoop.apache.org
Subject: Re: In YARN, how does a task tracker knows the address of a job tracker?

Thank you for the answer, Omkar.

I read the links, which were helpful. Though the concept of job tracker/task tracker does not exist in YARN MapReduce, doesn't it use the job/task tracker binaries? I thought the application master runs the job tracker binary and the containers on the nodes run the task tracker binary. thx

On Thu, Nov 21, 2013 at 2:06 PM, Omkar Joshi <oj...@hortonworks.com> wrote:
Hi,

Starting with YARN there is no notion of job tracker and task tracker. Here is a quick summary
JobTracker :-
1) Resource management :- Now done by Resource Manager (it does all scheduling work)
2) Application state management :- managing and launching new map /reduce tasks (done by Application Master .. It is per job not one single entity in the cluster for all jobs like MRv1).
TaskTracker :- replaced by Node Manager

I would suggest you read the YARN blog post<http://hortonworks.com/blog/resource-localization-in-yarn-deep-dive/>. This will answer most of your questions. Plus read this<http://www.slideshare.net/ovjforu/yarn-way-to-share-cluster-beyond> (slide 12) for how job actually gets executed.

Thanks,
Omkar Joshi
Hortonworks Inc.<http://www.hortonworks.com>

On Thu, Nov 21, 2013 at 7:52 AM, ricky l <ri...@gmail.com> wrote:
Hi all,

I have a question of how a task tracker identifies job tracker address when I submit MR job through YARN. As far as I know, both job tracker and task trackers are launched through application master and I am curious about the details about job and task tracker launch sequence.

thanks.



Re: In YARN, how does a task tracker knows the address of a job tracker?

Posted by ricky l <ri...@gmail.com>.
Thank you for the answer, Omkar.

I read the links, which were helpful. Though the concept of job tracker/task
tracker does not exist in YARN MapReduce, doesn't it use the job/task
tracker binaries? I thought the application master runs the job tracker binary
and the containers on the nodes run the task tracker binary. thx


On Thu, Nov 21, 2013 at 2:06 PM, Omkar Joshi <oj...@hortonworks.com> wrote:

> Hi,
>
> Starting with YARN there is no notion of job tracker and task tracker.
> Here is a quick summary
> JobTracker :-
> 1) Resource management :- Now done by Resource Manager (it does all
> scheduling work)
> 2) Application state management :- managing and launching new map /reduce
> tasks (done by Application Master .. It is per job not one single entity in
> the cluster for all jobs like MRv1).
> TaskTracker :- replaced by Node Manager
>
> I would suggest you read the YARN blog post<http://hortonworks.com/blog/resource-localization-in-yarn-deep-dive/>.
> This will answer most of your questions. Plus read this<http://www.slideshare.net/ovjforu/yarn-way-to-share-cluster-beyond> (slide
> 12) for how job actually gets executed.
>
> Thanks,
> Omkar Joshi
> *Hortonworks Inc.* <http://www.hortonworks.com>
>
>
> On Thu, Nov 21, 2013 at 7:52 AM, ricky l <ri...@gmail.com> wrote:
>
>> Hi all,
>>
>> I have a question of how a task tracker identifies job tracker address
>> when I submit MR job through YARN. As far as I know, both job tracker and
>> task trackers are launched through application master and I am curious
>> about the details about job and task tracker launch sequence.
>>
>> thanks.
>>
>
>

Re: In YARN, how does a task tracker knows the address of a job tracker?

Posted by Omkar Joshi <oj...@hortonworks.com>.
Hi,

Starting with YARN, there is no notion of a job tracker or task tracker. Here
is a quick summary:
JobTracker :-
1) Resource management :- now done by the Resource Manager (it does all
scheduling work)
2) Application state management :- managing and launching new map/reduce
tasks (done by the Application Master, which is per job rather than one single
entity in the cluster for all jobs as in MRv1).
TaskTracker :- replaced by the Node Manager
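
For anyone who prefers to see the summary above as data, here is a small sketch that encodes the same mapping as a lookup table. The class and method names are made up for illustration and are not part of Hadoop; the entries only restate the list above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class Mrv1ToYarnSketch {
    /** MRv1 responsibility -> YARN component that takes it over (per the summary above). */
    static Map<String, String> responsibilityMap() {
        // LinkedHashMap keeps the entries in the order they were listed.
        Map<String, String> map = new LinkedHashMap<>();
        map.put("JobTracker: resource management", "ResourceManager");
        map.put("JobTracker: application state management", "ApplicationMaster (one per job)");
        map.put("TaskTracker", "NodeManager");
        return map;
    }

    public static void main(String[] args) {
        // Print one "old -> new" line per responsibility.
        responsibilityMap().forEach((mrv1, yarn) -> System.out.println(mrv1 + " -> " + yarn));
    }
}
```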

I would suggest you read the YARN blog post
<http://hortonworks.com/blog/resource-localization-in-yarn-deep-dive/>.
This will answer most of your questions. Also read this
<http://www.slideshare.net/ovjforu/yarn-way-to-share-cluster-beyond>
(slide 12) for how a job actually gets executed.

Thanks,
Omkar Joshi
*Hortonworks Inc.* <http://www.hortonworks.com>


On Thu, Nov 21, 2013 at 7:52 AM, ricky l <ri...@gmail.com> wrote:

> Hi all,
>
> I have a question of how a task tracker identifies job tracker address
> when I submit MR job through YARN. As far as I know, both job tracker and
> task trackers are launched through application master and I am curious
> about the details about job and task tracker launch sequence.
>
> thanks.
>

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.
