You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@hadoop.apache.org by Ioan Zeng <ze...@gmail.com> on 2013/03/12 15:31:55 UTC

YARN Features

Hello Hadoop Team,

I need some help/support in evaluating Hadoop YARN from the perspective of:

- Adding/removing test nodes dynamically without restart
 Is this possible in YARN? or is it only a MapReduce application feature?

- Failover from crashed test nodes
 Is this automatically done by the YARN framework?

- Simple prioritization of jobs
 I know there is a prioritization possibility of the tasks in the
Scheduler, right?

- Insight into queue (simple reporting, including running jobs on nodes)
 Is there any reporting possibility in YARN regarding the status of
the not executed, running, finished jobs?

- Support heterogenous OSes on nodes (or use separate masters for
homogenous grids)
 I think this is supported, right?


Another point I would like to evaluate is the Distributed Shell example usage.
Our use case is to start different scripts on a grid. Once a node has
finished a script a new script has to be started on it. A report about
the scripts execution has to be provided. in case a node has failed to
execute a script it should be re-executed on a different node. Some
scripts are Windows specific other are Unix specific and have to be
executed on a node with a specific OS.

The question is:
Would it be feasible to adapt the example "Distributed Shell"
application to have the above features?
If yes how could I run some specific scripts only on a specific OS? Is
this the ResourceManager responsability? What happens if there is no
Windows node for example in the grid but in the queue there is a
Windows script?
How to re-execute failed scripts? Does it have to be implemented by
custom code, or is it a built in feature of YARN?


Thank you in advance for your support,
Ioan Zeng

Re: YARN Features

Posted by Hitesh Shah <hi...@hortonworks.com>.

Answers inline. Will address the DistributedShell questions in a follow-up. 

-- Hitesh

On Mar 12, 2013, at 7:31 AM, Ioan Zeng wrote:

> Hello Hadoop Team,
> 
> I need some help/support in evaluating Hadoop YARN from the perspective of:
> 
> - Adding/removing test nodes dynamically without restart
> Is this possible in YARN? or is it only a MapReduce application feature?
> 

Yarn Feature. Each node has a NodeManager daemon running on it. The Resourcemanager tracks nodes' health
via heartbeats from the NodeManager. In that respect, unhealthy nodes will be removed from the available 
pool and when an application requests for resources, it will not get allocated to an unhealthy or down node.

> - Failover from crashed test nodes
> Is this automatically done by the YARN framework?
> 

When a node fails, containers running on it are also reported as failed. This feedback about failed nodes and failed containers
is passed to the application by the ResourceManager. How the application handles these failures and acts upon the failure events to 
recover is based on application code. Each application will have its own logic to recover - YARN provides all the information for an application 
to act upon but does not take any action on the application's behalf ( slight caveat that the yarn framework does support restarting ( up to a max 
retry attempt limit ) the application itself if the application died or the node on which the application was running died.

> - Simple prioritization of jobs
> I know there is a prioritization possibility of the tasks in the
> Scheduler, right?
> 

Yes and no - depends on which scheduler you configure the RM with and how you configure it. The default scheduler used is the CapacityScheduler. 
Details on that are at http://hadoop.apache.org/docs/stable/capacity_scheduler.html.

> - Insight into queue (simple reporting, including running jobs on nodes)
> Is there any reporting possibility in YARN regarding the status of
> the not executed, running, finished jobs?
> 

The YARN UI provides these details for the last N applications that were handled ( N being around 10000 ). There are also webservices to access this 
data in xml/json format. There is a plan for a generic application history server but that is still being worked on.

> - Support heterogenous OSes on nodes (or use separate masters for
> homogenous grids)
> I think this is supported, right?
> 

I don't believe that there is anything stopping support of a heterogenous cluster. However, I am not sure if we have come across anyone 
who has tried that out. A lot of the problems may arise based on how well a application is written to handle launching containers correctly on different OS types.
The scheduler today does not account for asking for containers on only certain OS types. I am guessing there might be some minor features that be needed to be
addressed to fully support it. If you do try the above use case, please let us know if there are features/issues that you would like to see addressed.

> Another point I would like to evaluate is the Distributed Shell example usage.
> Our use case is to start different scripts on a grid. Once a node has
> finished a script a new script has to be started on it. A report about
> the scripts execution has to be provided. in case a node has failed to
> execute a script it should be re-executed on a different node. Some
> scripts are Windows specific other are Unix specific and have to be
> executed on a node with a specific OS.
> 
> The question is:
> Would it be feasible to adapt the example "Distributed Shell"
> application to have the above features?
> If yes how could I run some specific scripts only on a specific OS? Is
> this the ResourceManager responsability? What happens if there is no
> Windows node for example in the grid but in the queue there is a
> Windows script?
> How to re-execute failed scripts? Does it have to be implemented by
> custom code, or is it a built in feature of YARN?
> 
> 
> Thank you in advance for your support,
> Ioan Zeng

Re: YARN Features

Posted by Arun C Murthy <ac...@hortonworks.com>.

On Mar 12, 2013, at 12:26 PM, Ioan Zeng wrote:

> Another evaluation criteria was the community support of the framework
> which I rate now as very good :)
> 
> I would like to ask other questions:
> 
> I have seen YARN or MR used only in the context of HDFS. Would it be
> possible to keep all YARN features without using it in relation with
> HDFS (with no HDFS installed)?

To be clear - yes. YARN doesn't need HDFS for anything other than log-aggregation (which is turned off by default).

This is pretty much what LinkedIn is doing (see LinkedIn's use case in the link Hitesh provided).

Arun

> 
> You mentioned the CapacityScheduler. Does this require MapReduce? or
> is it included in YARN? I understood that MRv2 is just an application
> built over the YARN framework. For our use case we don't need MR.
> 
> For a better understanding of my questions regarding the Distributed
> Shell. We intend to use YARN for a distributed automated test
> environment which will execute set of test suites for specific builds
> in parallel. Do you know about similar usages of YARN or MR, maybe
> case studies?
> 
> Thanks,
> Ioan
> 
> On Tue, Mar 12, 2013 at 8:47 PM, Hitesh Shah <hi...@hortonworks.com> wrote:
>> Answers regarding DistributedShell.
>> 
>> https://issues.apache.org/jira/secure/attachment/12486023/MapReduce_NextGen_Architecture.pdf has some details on YARN's architecture.
>> 
>> -- Hitesh
>> 
>> On Mar 12, 2013, at 7:31 AM, Ioan Zeng wrote:
>> 
>>> 
>>> Another point I would like to evaluate is the Distributed Shell example usage.
>>> Our use case is to start different scripts on a grid. Once a node has
>>> finished a script a new script has to be started on it. A report about
>>> the scripts execution has to be provided. in case a node has failed to
>>> execute a script it should be re-executed on a different node. Some
>>> scripts are Windows specific other are Unix specific and have to be
>>> executed on a node with a specific OS.
>>> 
>> 
>> The current implementation of distributed shell is effectively a piece of example code to help
>> folks write more complex applications. It simply supports launching a script on a given number
>> of containers ( without accounting for where the containers are assigned ), does not handle retries on failures
>> and simply reports a success/failure based on the no. of failures in running the script.
>> 
>> Based on your use case, it should be easy enough to build on the example code to handle the features that
>> you require.
>> 
>> The OS specific resource ask is something which will be need to be addressed in YARN. Could you file a JIRA
>> for this feature request with some details about your use-case.
>> 
>> 
>>> The question is:
>>> Would it be feasible to adapt the example "Distributed Shell"
>>> application to have the above features?
>>> If yes how could I run some specific scripts only on a specific OS? Is
>>> this the ResourceManager responsability? What happens if there is no
>>> Windows node for example in the grid but in the queue there is a
>>> Windows script?
>>> How to re-execute failed scripts? Does it have to be implemented by
>>> custom code, or is it a built in feature of YARN?
>>> 
>>> 
>> 
>> The way YARN works is slightly different from what you describe above.
>> 
>> What you would do is write some form of a controller which in YARN terminology is referred to as an ApplicationMaster.
>> It would request containers from the RM ( for example, 5 containers on WinOS, 5 on Linux with 1 GB each of RAM ). Once, the container is
>> assigned, the controller would be responsible for launching the correct script based on the container allocated. The RM would be responsible
>> for ensuring the correct set of containers are allocated to the container based on resource usage limits, priorities, etc. [ Again to clarify, OS type
>> scheduling is currently not supported ]. If a script fails, the container's exit code and completion status would be fed back to the controller which
>> would then have to handle retries ( may require asking the RM for a new container ).
>> 
>> 
>> 
>>> Thank you in advance for your support,
>>> Ioan Zeng
>> 

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/

Re: YARN Features

Posted by Arun C Murthy <ac...@hortonworks.com>.

On Mar 12, 2013, at 12:26 PM, Ioan Zeng wrote:

> Another evaluation criteria was the community support of the framework
> which I rate now as very good :)
> 
> I would like to ask other questions:
> 
> I have seen YARN or MR used only in the context of HDFS. Would it be
> possible to keep all YARN features without using it in relation with
> HDFS (with no HDFS installed)?

To be clear - yes. YARN doesn't need HDFS for anything other than log-aggregation (which is turned off by default).

This is pretty much what LinkedIn is doing (see LinkedIn's use case in the link Hitesh provided).

Arun

> 
> You mentioned the CapacityScheduler. Does this require MapReduce? or
> is it included in YARN? I understood that MRv2 is just an application
> built over the YARN framework. For our use case we don't need MR.
> 
> For a better understanding of my questions regarding the Distributed
> Shell. We intend to use YARN for a distributed automated test
> environment which will execute set of test suites for specific builds
> in parallel. Do you know about similar usages of YARN or MR, maybe
> case studies?
> 
> Thanks,
> Ioan
> 
> On Tue, Mar 12, 2013 at 8:47 PM, Hitesh Shah <hi...@hortonworks.com> wrote:
>> Answers regarding DistributedShell.
>> 
>> https://issues.apache.org/jira/secure/attachment/12486023/MapReduce_NextGen_Architecture.pdf has some details on YARN's architecture.
>> 
>> -- Hitesh
>> 
>> On Mar 12, 2013, at 7:31 AM, Ioan Zeng wrote:
>> 
>>> 
>>> Another point I would like to evaluate is the Distributed Shell example usage.
>>> Our use case is to start different scripts on a grid. Once a node has
>>> finished a script a new script has to be started on it. A report about
>>> the scripts execution has to be provided. in case a node has failed to
>>> execute a script it should be re-executed on a different node. Some
>>> scripts are Windows specific other are Unix specific and have to be
>>> executed on a node with a specific OS.
>>> 
>> 
>> The current implementation of distributed shell is effectively a piece of example code to help
>> folks write more complex applications. It simply supports launching a script on a given number
>> of containers ( without accounting for where the containers are assigned ), does not handle retries on failures
>> and simply reports a success/failure based on the no. of failures in running the script.
>> 
>> Based on your use case, it should be easy enough to build on the example code to handle the features that
>> you require.
>> 
>> The OS specific resource ask is something which will be need to be addressed in YARN. Could you file a JIRA
>> for this feature request with some details about your use-case.
>> 
>> 
>>> The question is:
>>> Would it be feasible to adapt the example "Distributed Shell"
>>> application to have the above features?
>>> If yes how could I run some specific scripts only on a specific OS? Is
>>> this the ResourceManager responsability? What happens if there is no
>>> Windows node for example in the grid but in the queue there is a
>>> Windows script?
>>> How to re-execute failed scripts? Does it have to be implemented by
>>> custom code, or is it a built in feature of YARN?
>>> 
>>> 
>> 
>> The way YARN works is slightly different from what you describe above.
>> 
>> What you would do is write some form of a controller which in YARN terminology is referred to as an ApplicationMaster.
>> It would request containers from the RM ( for example, 5 containers on WinOS, 5 on Linux with 1 GB each of RAM ). Once, the container is
>> assigned, the controller would be responsible for launching the correct script based on the container allocated. The RM would be responsible
>> for ensuring the correct set of containers are allocated to the container based on resource usage limits, priorities, etc. [ Again to clarify, OS type
>> scheduling is currently not supported ]. If a script fails, the container's exit code and completion status would be fed back to the controller which
>> would then have to handle retries ( may require asking the RM for a new container ).
>> 
>> 
>> 
>>> Thank you in advance for your support,
>>> Ioan Zeng
>> 

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/

Re: YARN Features

Posted by Arun C Murthy <ac...@hortonworks.com>.

On Mar 12, 2013, at 12:26 PM, Ioan Zeng wrote:

> Another evaluation criteria was the community support of the framework
> which I rate now as very good :)
> 
> I would like to ask other questions:
> 
> I have seen YARN or MR used only in the context of HDFS. Would it be
> possible to keep all YARN features without using it in relation with
> HDFS (with no HDFS installed)?

To be clear - yes. YARN doesn't need HDFS for anything other than log-aggregation (which is turned off by default).

This is pretty much what LinkedIn is doing (see LinkedIn's use case in the link Hitesh provided).

Arun

> 
> You mentioned the CapacityScheduler. Does this require MapReduce? or
> is it included in YARN? I understood that MRv2 is just an application
> built over the YARN framework. For our use case we don't need MR.
> 
> For a better understanding of my questions regarding the Distributed
> Shell. We intend to use YARN for a distributed automated test
> environment which will execute set of test suites for specific builds
> in parallel. Do you know about similar usages of YARN or MR, maybe
> case studies?
> 
> Thanks,
> Ioan
> 
> On Tue, Mar 12, 2013 at 8:47 PM, Hitesh Shah <hi...@hortonworks.com> wrote:
>> Answers regarding DistributedShell.
>> 
>> https://issues.apache.org/jira/secure/attachment/12486023/MapReduce_NextGen_Architecture.pdf has some details on YARN's architecture.
>> 
>> -- Hitesh
>> 
>> On Mar 12, 2013, at 7:31 AM, Ioan Zeng wrote:
>> 
>>> 
>>> Another point I would like to evaluate is the Distributed Shell example usage.
>>> Our use case is to start different scripts on a grid. Once a node has
>>> finished a script a new script has to be started on it. A report about
>>> the scripts execution has to be provided. in case a node has failed to
>>> execute a script it should be re-executed on a different node. Some
>>> scripts are Windows specific other are Unix specific and have to be
>>> executed on a node with a specific OS.
>>> 
>> 
>> The current implementation of distributed shell is effectively a piece of example code to help
>> folks write more complex applications. It simply supports launching a script on a given number
>> of containers ( without accounting for where the containers are assigned ), does not handle retries on failures
>> and simply reports a success/failure based on the no. of failures in running the script.
>> 
>> Based on your use case, it should be easy enough to build on the example code to handle the features that
>> you require.
>> 
>> The OS specific resource ask is something which will be need to be addressed in YARN. Could you file a JIRA
>> for this feature request with some details about your use-case.
>> 
>> 
>>> The question is:
>>> Would it be feasible to adapt the example "Distributed Shell"
>>> application to have the above features?
>>> If yes how could I run some specific scripts only on a specific OS? Is
>>> this the ResourceManager responsability? What happens if there is no
>>> Windows node for example in the grid but in the queue there is a
>>> Windows script?
>>> How to re-execute failed scripts? Does it have to be implemented by
>>> custom code, or is it a built in feature of YARN?
>>> 
>>> 
>> 
>> The way YARN works is slightly different from what you describe above.
>> 
>> What you would do is write some form of a controller which in YARN terminology is referred to as an ApplicationMaster.
>> It would request containers from the RM ( for example, 5 containers on WinOS, 5 on Linux with 1 GB each of RAM ). Once, the container is
>> assigned, the controller would be responsible for launching the correct script based on the container allocated. The RM would be responsible
>> for ensuring the correct set of containers are allocated to the container based on resource usage limits, priorities, etc. [ Again to clarify, OS type
>> scheduling is currently not supported ]. If a script fails, the container's exit code and completion status would be fed back to the controller which
>> would then have to handle retries ( may require asking the RM for a new container ).
>> 
>> 
>> 
>>> Thank you in advance for your support,
>>> Ioan Zeng
>> 

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/

Re: YARN Features

Posted by Hitesh Shah <hi...@hortonworks.com>.

Answers inline. 

-- Hitesh

On Mar 12, 2013, at 12:26 PM, Ioan Zeng wrote:

> Another evaluation criteria was the community support of the framework
> which I rate now as very good :)
> 
> I would like to ask other questions:
> 
> I have seen YARN or MR used only in the context of HDFS. Would it be
> possible to keep all YARN features without using it in relation with
> HDFS (with no HDFS installed)?

It uses the generic filesystem apis from hadoop to a very large extent so it should work with any filesytem solution. 
There are a couple of features which do depend on HDFS though - log aggregation for example ( collect all logs of all containers into a
central place ) that would need to be disabled. There may be some cases which I am may be unaware of. If you do see anything which 
depends on HDFS, please do file jiras so that we can address the issue.

> 
> You mentioned the CapacityScheduler. Does this require MapReduce? or
> is it included in YARN? I understood that MRv2 is just an application
> built over the YARN framework. For our use case we don't need MR.
> 

Yes - you are right - there would be no dependency on MapReduce. 
The CapacityScheduler is the scheduling module used inside the ResourceManager ( which is YARN only ). 


> For a better understanding of my questions regarding the Distributed
> Shell. We intend to use YARN for a distributed automated test
> environment which will execute set of test suites for specific builds
> in parallel. Do you know about similar usages of YARN or MR, maybe
> case studies?
> 

There are a few others who are using Yarn in various scenarios - none who use it for their test infrastructure as far as I know. 
The closest I can think of would be LinkedIn's use-case where they launch and monitor a bunch of services on a Yarn cluster.  
( http://riccomini.name/posts/hadoop/2012-10-12-hortonworks-yarn-meetup/ might be of help )


> Thanks,
> Ioan
> 
> On Tue, Mar 12, 2013 at 8:47 PM, Hitesh Shah <hi...@hortonworks.com> wrote:
>> Answers regarding DistributedShell.
>> 
>> https://issues.apache.org/jira/secure/attachment/12486023/MapReduce_NextGen_Architecture.pdf has some details on YARN's architecture.
>> 
>> -- Hitesh
>> 
>> On Mar 12, 2013, at 7:31 AM, Ioan Zeng wrote:
>> 
>>> 
>>> Another point I would like to evaluate is the Distributed Shell example usage.
>>> Our use case is to start different scripts on a grid. Once a node has
>>> finished a script a new script has to be started on it. A report about
>>> the scripts execution has to be provided. in case a node has failed to
>>> execute a script it should be re-executed on a different node. Some
>>> scripts are Windows specific other are Unix specific and have to be
>>> executed on a node with a specific OS.
>>> 
>> 
>> The current implementation of distributed shell is effectively a piece of example code to help
>> folks write more complex applications. It simply supports launching a script on a given number
>> of containers ( without accounting for where the containers are assigned ), does not handle retries on failures
>> and simply reports a success/failure based on the no. of failures in running the script.
>> 
>> Based on your use case, it should be easy enough to build on the example code to handle the features that
>> you require.
>> 
>> The OS specific resource ask is something which will be need to be addressed in YARN. Could you file a JIRA
>> for this feature request with some details about your use-case.
>> 
>> 
>>> The question is:
>>> Would it be feasible to adapt the example "Distributed Shell"
>>> application to have the above features?
>>> If yes how could I run some specific scripts only on a specific OS? Is
>>> this the ResourceManager responsability? What happens if there is no
>>> Windows node for example in the grid but in the queue there is a
>>> Windows script?
>>> How to re-execute failed scripts? Does it have to be implemented by
>>> custom code, or is it a built in feature of YARN?
>>> 
>>> 
>> 
>> The way YARN works is slightly different from what you describe above.
>> 
>> What you would do is write some form of a controller which in YARN terminology is referred to as an ApplicationMaster.
>> It would request containers from the RM ( for example, 5 containers on WinOS, 5 on Linux with 1 GB each of RAM ). Once, the container is
>> assigned, the controller would be responsible for launching the correct script based on the container allocated. The RM would be responsible
>> for ensuring the correct set of containers are allocated to the container based on resource usage limits, priorities, etc. [ Again to clarify, OS type
>> scheduling is currently not supported ]. If a script fails, the container's exit code and completion status would be fed back to the controller which
>> would then have to handle retries ( may require asking the RM for a new container ).
>> 
>> 
>> 
>>> Thank you in advance for your support,
>>> Ioan Zeng
>>

Re: YARN Features

Posted by Hitesh Shah <hi...@hortonworks.com>.

Answers inline. 

-- Hitesh

On Mar 12, 2013, at 12:26 PM, Ioan Zeng wrote:

> Another evaluation criteria was the community support of the framework
> which I rate now as very good :)
> 
> I would like to ask other questions:
> 
> I have seen YARN or MR used only in the context of HDFS. Would it be
> possible to keep all YARN features without using it in relation with
> HDFS (with no HDFS installed)?

It uses the generic filesystem apis from hadoop to a very large extent so it should work with any filesytem solution. 
There are a couple of features which do depend on HDFS though - log aggregation for example ( collect all logs of all containers into a
central place ) that would need to be disabled. There may be some cases which I am may be unaware of. If you do see anything which 
depends on HDFS, please do file jiras so that we can address the issue.

> 
> You mentioned the CapacityScheduler. Does this require MapReduce? or
> is it included in YARN? I understood that MRv2 is just an application
> built over the YARN framework. For our use case we don't need MR.
> 

Yes - you are right - there would be no dependency on MapReduce. 
The CapacityScheduler is the scheduling module used inside the ResourceManager ( which is YARN only ). 


> For a better understanding of my questions regarding the Distributed
> Shell. We intend to use YARN for a distributed automated test
> environment which will execute set of test suites for specific builds
> in parallel. Do you know about similar usages of YARN or MR, maybe
> case studies?
> 

There are a few others who are using Yarn in various scenarios - none who use it for their test infrastructure as far as I know. 
The closest I can think of would be LinkedIn's use-case where they launch and monitor a bunch of services on a Yarn cluster.  
( http://riccomini.name/posts/hadoop/2012-10-12-hortonworks-yarn-meetup/ might be of help )


> Thanks,
> Ioan
> 
> On Tue, Mar 12, 2013 at 8:47 PM, Hitesh Shah <hi...@hortonworks.com> wrote:
>> Answers regarding DistributedShell.
>> 
>> https://issues.apache.org/jira/secure/attachment/12486023/MapReduce_NextGen_Architecture.pdf has some details on YARN's architecture.
>> 
>> -- Hitesh
>> 
>> On Mar 12, 2013, at 7:31 AM, Ioan Zeng wrote:
>> 
>>> 
>>> Another point I would like to evaluate is the Distributed Shell example usage.
>>> Our use case is to start different scripts on a grid. Once a node has
>>> finished a script a new script has to be started on it. A report about
>>> the scripts execution has to be provided. in case a node has failed to
>>> execute a script it should be re-executed on a different node. Some
>>> scripts are Windows specific other are Unix specific and have to be
>>> executed on a node with a specific OS.
>>> 
>> 
>> The current implementation of distributed shell is effectively a piece of example code to help
>> folks write more complex applications. It simply supports launching a script on a given number
>> of containers ( without accounting for where the containers are assigned ), does not handle retries on failures
>> and simply reports a success/failure based on the no. of failures in running the script.
>> 
>> Based on your use case, it should be easy enough to build on the example code to handle the features that
>> you require.
>> 
>> The OS specific resource ask is something which will be need to be addressed in YARN. Could you file a JIRA
>> for this feature request with some details about your use-case.
>> 
>> 
>>> The question is:
>>> Would it be feasible to adapt the example "Distributed Shell"
>>> application to have the above features?
>>> If yes how could I run some specific scripts only on a specific OS? Is
>>> this the ResourceManager responsability? What happens if there is no
>>> Windows node for example in the grid but in the queue there is a
>>> Windows script?
>>> How to re-execute failed scripts? Does it have to be implemented by
>>> custom code, or is it a built in feature of YARN?
>>> 
>>> 
>> 
>> The way YARN works is slightly different from what you describe above.
>> 
>> What you would do is write some form of a controller which in YARN terminology is referred to as an ApplicationMaster.
>> It would request containers from the RM ( for example, 5 containers on WinOS, 5 on Linux with 1 GB each of RAM ). Once, the container is
>> assigned, the controller would be responsible for launching the correct script based on the container allocated. The RM would be responsible
>> for ensuring the correct set of containers are allocated to the container based on resource usage limits, priorities, etc. [ Again to clarify, OS type
>> scheduling is currently not supported ]. If a script fails, the container's exit code and completion status would be fed back to the controller which
>> would then have to handle retries ( may require asking the RM for a new container ).
>> 
>> 
>> 
>>> Thank you in advance for your support,
>>> Ioan Zeng
>>

Re: YARN Features

Posted by Hitesh Shah <hi...@hortonworks.com>.

Answers inline. 

-- Hitesh

On Mar 12, 2013, at 12:26 PM, Ioan Zeng wrote:

> Another evaluation criteria was the community support of the framework
> which I rate now as very good :)
> 
> I would like to ask other questions:
> 
> I have seen YARN or MR used only in the context of HDFS. Would it be
> possible to keep all YARN features without using it in relation with
> HDFS (with no HDFS installed)?

It uses the generic filesystem apis from hadoop to a very large extent so it should work with any filesytem solution. 
There are a couple of features which do depend on HDFS though - log aggregation for example ( collect all logs of all containers into a
central place ) that would need to be disabled. There may be some cases which I am may be unaware of. If you do see anything which 
depends on HDFS, please do file jiras so that we can address the issue.

> 
> You mentioned the CapacityScheduler. Does this require MapReduce? or
> is it included in YARN? I understood that MRv2 is just an application
> built over the YARN framework. For our use case we don't need MR.
> 

Yes - you are right - there would be no dependency on MapReduce. 
The CapacityScheduler is the scheduling module used inside the ResourceManager ( which is YARN only ). 


> For a better understanding of my questions regarding the Distributed
> Shell. We intend to use YARN for a distributed automated test
> environment which will execute set of test suites for specific builds
> in parallel. Do you know about similar usages of YARN or MR, maybe
> case studies?
> 

There are a few others who are using Yarn in various scenarios - none who use it for their test infrastructure as far as I know. 
The closest I can think of would be LinkedIn's use-case where they launch and monitor a bunch of services on a Yarn cluster.  
( http://riccomini.name/posts/hadoop/2012-10-12-hortonworks-yarn-meetup/ might be of help )


> Thanks,
> Ioan
> 
> On Tue, Mar 12, 2013 at 8:47 PM, Hitesh Shah <hi...@hortonworks.com> wrote:
>> Answers regarding DistributedShell.
>> 
>> https://issues.apache.org/jira/secure/attachment/12486023/MapReduce_NextGen_Architecture.pdf has some details on YARN's architecture.
>> 
>> -- Hitesh
>> 
>> On Mar 12, 2013, at 7:31 AM, Ioan Zeng wrote:
>> 
>>> 
>>> Another point I would like to evaluate is the Distributed Shell example usage.
>>> Our use case is to start different scripts on a grid. Once a node has
>>> finished a script a new script has to be started on it. A report about
>>> the scripts execution has to be provided. in case a node has failed to
>>> execute a script it should be re-executed on a different node. Some
>>> scripts are Windows specific other are Unix specific and have to be
>>> executed on a node with a specific OS.
>>> 
>> 
>> The current implementation of distributed shell is effectively a piece of example code to help
>> folks write more complex applications. It simply supports launching a script on a given number
>> of containers ( without accounting for where the containers are assigned ), does not handle retries on failures
>> and simply reports a success/failure based on the no. of failures in running the script.
>> 
>> Based on your use case, it should be easy enough to build on the example code to handle the features that
>> you require.
>> 
>> The OS specific resource ask is something which will be need to be addressed in YARN. Could you file a JIRA
>> for this feature request with some details about your use-case.
>> 
>> 
>>> The question is:
>>> Would it be feasible to adapt the example "Distributed Shell"
>>> application to have the above features?
>>> If yes how could I run some specific scripts only on a specific OS? Is
>>> this the ResourceManager responsability? What happens if there is no
>>> Windows node for example in the grid but in the queue there is a
>>> Windows script?
>>> How to re-execute failed scripts? Does it have to be implemented by
>>> custom code, or is it a built in feature of YARN?
>>> 
>>> 
>> 
>> The way YARN works is slightly different from what you describe above.
>> 
>> What you would do is write some form of a controller which in YARN terminology is referred to as an ApplicationMaster.
>> It would request containers from the RM ( for example, 5 containers on WinOS, 5 on Linux with 1 GB each of RAM ). Once, the container is
>> assigned, the controller would be responsible for launching the correct script based on the container allocated. The RM would be responsible
>> for ensuring the correct set of containers are allocated to the container based on resource usage limits, priorities, etc. [ Again to clarify, OS type
>> scheduling is currently not supported ]. If a script fails, the container's exit code and completion status would be fed back to the controller which
>> would then have to handle retries ( may require asking the RM for a new container ).
>> 
>> 
>> 
>>> Thank you in advance for your support,
>>> Ioan Zeng
>>

Re: YARN Features

Posted by Hitesh Shah <hi...@hortonworks.com>.

Answers inline. 

-- Hitesh

On Mar 12, 2013, at 12:26 PM, Ioan Zeng wrote:

> Another evaluation criteria was the community support of the framework
> which I rate now as very good :)
> 
> I would like to ask other questions:
> 
> I have seen YARN or MR used only in the context of HDFS. Would it be
> possible to keep all YARN features without using it in relation with
> HDFS (with no HDFS installed)?

It uses the generic filesystem apis from hadoop to a very large extent so it should work with any filesytem solution. 
There are a couple of features which do depend on HDFS though - log aggregation for example ( collect all logs of all containers into a
central place ) that would need to be disabled. There may be some cases which I am may be unaware of. If you do see anything which 
depends on HDFS, please do file jiras so that we can address the issue.

> 
> You mentioned the CapacityScheduler. Does this require MapReduce? or
> is it included in YARN? I understood that MRv2 is just an application
> built over the YARN framework. For our use case we don't need MR.
> 

Yes - you are right - there would be no dependency on MapReduce. 
The CapacityScheduler is the scheduling module used inside the ResourceManager ( which is YARN only ). 


> For a better understanding of my questions regarding the Distributed
> Shell. We intend to use YARN for a distributed automated test
> environment which will execute set of test suites for specific builds
> in parallel. Do you know about similar usages of YARN or MR, maybe
> case studies?
> 

There are a few others who are using Yarn in various scenarios - none who use it for their test infrastructure as far as I know. 
The closest I can think of would be LinkedIn's use-case where they launch and monitor a bunch of services on a Yarn cluster.  
( http://riccomini.name/posts/hadoop/2012-10-12-hortonworks-yarn-meetup/ might be of help )


> Thanks,
> Ioan
> 
> On Tue, Mar 12, 2013 at 8:47 PM, Hitesh Shah <hi...@hortonworks.com> wrote:
>> Answers regarding DistributedShell.
>> 
>> https://issues.apache.org/jira/secure/attachment/12486023/MapReduce_NextGen_Architecture.pdf has some details on YARN's architecture.
>> 
>> -- Hitesh
>> 
>> On Mar 12, 2013, at 7:31 AM, Ioan Zeng wrote:
>> 
>>> 
>>> Another point I would like to evaluate is the Distributed Shell example usage.
>>> Our use case is to start different scripts on a grid. Once a node has
>>> finished a script a new script has to be started on it. A report about
>>> the scripts execution has to be provided. in case a node has failed to
>>> execute a script it should be re-executed on a different node. Some
>>> scripts are Windows specific other are Unix specific and have to be
>>> executed on a node with a specific OS.
>>> 
>> 
>> The current implementation of distributed shell is effectively a piece of example code to help
>> folks write more complex applications. It simply supports launching a script on a given number
>> of containers ( without accounting for where the containers are assigned ), does not handle retries on failures
>> and simply reports a success/failure based on the no. of failures in running the script.
>> 
>> Based on your use case, it should be easy enough to build on the example code to handle the features that
>> you require.
>> 
>> The OS specific resource ask is something which will be need to be addressed in YARN. Could you file a JIRA
>> for this feature request with some details about your use-case.
>> 
>> 
>>> The question is:
>>> Would it be feasible to adapt the example "Distributed Shell"
>>> application to have the above features?
>>> If yes how could I run some specific scripts only on a specific OS? Is
>>> this the ResourceManager responsability? What happens if there is no
>>> Windows node for example in the grid but in the queue there is a
>>> Windows script?
>>> How to re-execute failed scripts? Does it have to be implemented by
>>> custom code, or is it a built in feature of YARN?
>>> 
>>> 
>> 
>> The way YARN works is slightly different from what you describe above.
>> 
>> What you would do is write some form of a controller which in YARN terminology is referred to as an ApplicationMaster.
>> It would request containers from the RM ( for example, 5 containers on WinOS, 5 on Linux with 1 GB each of RAM ). Once, the container is
>> assigned, the controller would be responsible for launching the correct script based on the container allocated. The RM would be responsible
>> for ensuring the correct set of containers are allocated to the container based on resource usage limits, priorities, etc. [ Again to clarify, OS type
>> scheduling is currently not supported ]. If a script fails, the container's exit code and completion status would be fed back to the controller which
>> would then have to handle retries ( may require asking the RM for a new container ).
>> 
>> 
>> 
>>> Thank you in advance for your support,
>>> Ioan Zeng
>>

Re: YARN Features

Posted by Arun C Murthy <ac...@hortonworks.com>.

On Mar 12, 2013, at 12:26 PM, Ioan Zeng wrote:

> Another evaluation criteria was the community support of the framework
> which I rate now as very good :)
> 
> I would like to ask other questions:
> 
> I have seen YARN or MR used only in the context of HDFS. Would it be
> possible to keep all YARN features without using it in relation with
> HDFS (with no HDFS installed)?

To be clear - yes. YARN doesn't need HDFS for anything other than log-aggregation (which is turned off by default).

This is pretty much what LinkedIn is doing (see LinkedIn's use case in the link Hitesh provided).

Arun

> 
> You mentioned the CapacityScheduler. Does this require MapReduce? or
> is it included in YARN? I understood that MRv2 is just an application
> built over the YARN framework. For our use case we don't need MR.
> 
> For a better understanding of my questions regarding the Distributed
> Shell. We intend to use YARN for a distributed automated test
> environment which will execute set of test suites for specific builds
> in parallel. Do you know about similar usages of YARN or MR, maybe
> case studies?
> 
> Thanks,
> Ioan
> 
> On Tue, Mar 12, 2013 at 8:47 PM, Hitesh Shah <hi...@hortonworks.com> wrote:
>> Answers regarding DistributedShell.
>> 
>> https://issues.apache.org/jira/secure/attachment/12486023/MapReduce_NextGen_Architecture.pdf has some details on YARN's architecture.
>> 
>> -- Hitesh
>> 
>> On Mar 12, 2013, at 7:31 AM, Ioan Zeng wrote:
>> 
>>> 
>>> Another point I would like to evaluate is the Distributed Shell example usage.
>>> Our use case is to start different scripts on a grid. Once a node has
>>> finished a script a new script has to be started on it. A report about
>>> the scripts execution has to be provided. in case a node has failed to
>>> execute a script it should be re-executed on a different node. Some
>>> scripts are Windows specific other are Unix specific and have to be
>>> executed on a node with a specific OS.
>>> 
>> 
>> The current implementation of distributed shell is effectively a piece of example code to help
>> folks write more complex applications. It simply supports launching a script on a given number
>> of containers ( without accounting for where the containers are assigned ), does not handle retries on failures
>> and simply reports a success/failure based on the no. of failures in running the script.
>> 
>> Based on your use case, it should be easy enough to build on the example code to handle the features that
>> you require.
>> 
>> The OS specific resource ask is something which will be need to be addressed in YARN. Could you file a JIRA
>> for this feature request with some details about your use-case.
>> 
>> 
>>> The question is:
>>> Would it be feasible to adapt the example "Distributed Shell"
>>> application to have the above features?
>>> If yes how could I run some specific scripts only on a specific OS? Is
>>> this the ResourceManager responsability? What happens if there is no
>>> Windows node for example in the grid but in the queue there is a
>>> Windows script?
>>> How to re-execute failed scripts? Does it have to be implemented by
>>> custom code, or is it a built in feature of YARN?
>>> 
>>> 
>> 
>> The way YARN works is slightly different from what you describe above.
>> 
>> What you would do is write some form of a controller which in YARN terminology is referred to as an ApplicationMaster.
>> It would request containers from the RM ( for example, 5 containers on WinOS, 5 on Linux with 1 GB each of RAM ). Once, the container is
>> assigned, the controller would be responsible for launching the correct script based on the container allocated. The RM would be responsible
>> for ensuring the correct set of containers are allocated to the container based on resource usage limits, priorities, etc. [ Again to clarify, OS type
>> scheduling is currently not supported ]. If a script fails, the container's exit code and completion status would be fed back to the controller which
>> would then have to handle retries ( may require asking the RM for a new container ).
>> 
>> 
>> 
>>> Thank you in advance for your support,
>>> Ioan Zeng
>> 

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/

Re: YARN Features

Posted by Ioan Zeng <ze...@gmail.com>.

Another evaluation criteria was the community support of the framework
which I rate now as very good :)

I would like to ask other questions:

I have seen YARN or MR used only in the context of HDFS. Would it be
possible to keep all YARN features without using it in relation with
HDFS (with no HDFS installed)?

You mentioned the CapacityScheduler. Does this require MapReduce? or
is it included in YARN? I understood that MRv2 is just an application
built over the YARN framework. For our use case we don't need MR.

For a better understanding of my questions regarding the Distributed
Shell. We intend to use YARN for a distributed automated test
environment which will execute set of test suites for specific builds
in parallel. Do you know about similar usages of YARN or MR, maybe
case studies?

Thanks,
Ioan

On Tue, Mar 12, 2013 at 8:47 PM, Hitesh Shah <hi...@hortonworks.com> wrote:
> Answers regarding DistributedShell.
>
> https://issues.apache.org/jira/secure/attachment/12486023/MapReduce_NextGen_Architecture.pdf has some details on YARN's architecture.
>
> -- Hitesh
>
> On Mar 12, 2013, at 7:31 AM, Ioan Zeng wrote:
>
>>
>> Another point I would like to evaluate is the Distributed Shell example usage.
>> Our use case is to start different scripts on a grid. Once a node has
>> finished a script a new script has to be started on it. A report about
>> the scripts execution has to be provided. in case a node has failed to
>> execute a script it should be re-executed on a different node. Some
>> scripts are Windows specific other are Unix specific and have to be
>> executed on a node with a specific OS.
>>
>
> The current implementation of distributed shell is effectively a piece of example code to help
> folks write more complex applications. It simply supports launching a script on a given number
> of containers ( without accounting for where the containers are assigned ), does not handle retries on failures
> and simply reports a success/failure based on the no. of failures in running the script.
>
> Based on your use case, it should be easy enough to build on the example code to handle the features that
> you require.
>
> The OS specific resource ask is something which will be need to be addressed in YARN. Could you file a JIRA
> for this feature request with some details about your use-case.
>
>
>> The question is:
>> Would it be feasible to adapt the example "Distributed Shell"
>> application to have the above features?
>> If yes how could I run some specific scripts only on a specific OS? Is
>> this the ResourceManager responsability? What happens if there is no
>> Windows node for example in the grid but in the queue there is a
>> Windows script?
>> How to re-execute failed scripts? Does it have to be implemented by
>> custom code, or is it a built in feature of YARN?
>>
>>
>
> The way YARN works is slightly different from what you describe above.
>
> What you would do is write some form of a controller which in YARN terminology is referred to as an ApplicationMaster.
> It would request containers from the RM ( for example, 5 containers on WinOS, 5 on Linux with 1 GB each of RAM ). Once, the container is
> assigned, the controller would be responsible for launching the correct script based on the container allocated. The RM would be responsible
> for ensuring the correct set of containers are allocated to the container based on resource usage limits, priorities, etc. [ Again to clarify, OS type
> scheduling is currently not supported ]. If a script fails, the container's exit code and completion status would be fed back to the controller which
> would then have to handle retries ( may require asking the RM for a new container ).
>
>
>
>> Thank you in advance for your support,
>> Ioan Zeng
>

Re: YARN Features

Posted by Ioan Zeng <ze...@gmail.com>.

Another evaluation criteria was the community support of the framework
which I rate now as very good :)

I would like to ask other questions:

I have seen YARN or MR used only in the context of HDFS. Would it be
possible to keep all YARN features without using it in relation with
HDFS (with no HDFS installed)?

You mentioned the CapacityScheduler. Does this require MapReduce? or
is it included in YARN? I understood that MRv2 is just an application
built over the YARN framework. For our use case we don't need MR.

For a better understanding of my questions regarding the Distributed
Shell. We intend to use YARN for a distributed automated test
environment which will execute set of test suites for specific builds
in parallel. Do you know about similar usages of YARN or MR, maybe
case studies?

Thanks,
Ioan

On Tue, Mar 12, 2013 at 8:47 PM, Hitesh Shah <hi...@hortonworks.com> wrote:
> Answers regarding DistributedShell.
>
> https://issues.apache.org/jira/secure/attachment/12486023/MapReduce_NextGen_Architecture.pdf has some details on YARN's architecture.
>
> -- Hitesh
>
> On Mar 12, 2013, at 7:31 AM, Ioan Zeng wrote:
>
>>
>> Another point I would like to evaluate is the Distributed Shell example usage.
>> Our use case is to start different scripts on a grid. Once a node has
>> finished a script a new script has to be started on it. A report about
>> the scripts execution has to be provided. in case a node has failed to
>> execute a script it should be re-executed on a different node. Some
>> scripts are Windows specific other are Unix specific and have to be
>> executed on a node with a specific OS.
>>
>
> The current implementation of distributed shell is effectively a piece of example code to help
> folks write more complex applications. It simply supports launching a script on a given number
> of containers ( without accounting for where the containers are assigned ), does not handle retries on failures
> and simply reports a success/failure based on the no. of failures in running the script.
>
> Based on your use case, it should be easy enough to build on the example code to handle the features that
> you require.
>
> The OS specific resource ask is something which will be need to be addressed in YARN. Could you file a JIRA
> for this feature request with some details about your use-case.
>
>
>> The question is:
>> Would it be feasible to adapt the example "Distributed Shell"
>> application to have the above features?
>> If yes how could I run some specific scripts only on a specific OS? Is
>> this the ResourceManager responsability? What happens if there is no
>> Windows node for example in the grid but in the queue there is a
>> Windows script?
>> How to re-execute failed scripts? Does it have to be implemented by
>> custom code, or is it a built in feature of YARN?
>>
>>
>
> The way YARN works is slightly different from what you describe above.
>
> What you would do is write some form of a controller which in YARN terminology is referred to as an ApplicationMaster.
> It would request containers from the RM ( for example, 5 containers on WinOS, 5 on Linux with 1 GB each of RAM ). Once, the container is
> assigned, the controller would be responsible for launching the correct script based on the container allocated. The RM would be responsible
> for ensuring the correct set of containers are allocated to the container based on resource usage limits, priorities, etc. [ Again to clarify, OS type
> scheduling is currently not supported ]. If a script fails, the container's exit code and completion status would be fed back to the controller which
> would then have to handle retries ( may require asking the RM for a new container ).
>
>
>
>> Thank you in advance for your support,
>> Ioan Zeng
>

Re: YARN Features

Posted by Ioan Zeng <ze...@gmail.com>.

Another evaluation criteria was the community support of the framework
which I rate now as very good :)

I would like to ask other questions:

I have seen YARN or MR used only in the context of HDFS. Would it be
possible to keep all YARN features without using it in relation with
HDFS (with no HDFS installed)?

You mentioned the CapacityScheduler. Does this require MapReduce? or
is it included in YARN? I understood that MRv2 is just an application
built over the YARN framework. For our use case we don't need MR.

For a better understanding of my questions regarding the Distributed
Shell. We intend to use YARN for a distributed automated test
environment which will execute set of test suites for specific builds
in parallel. Do you know about similar usages of YARN or MR, maybe
case studies?

Thanks,
Ioan

On Tue, Mar 12, 2013 at 8:47 PM, Hitesh Shah <hi...@hortonworks.com> wrote:
> Answers regarding DistributedShell.
>
> https://issues.apache.org/jira/secure/attachment/12486023/MapReduce_NextGen_Architecture.pdf has some details on YARN's architecture.
>
> -- Hitesh
>
> On Mar 12, 2013, at 7:31 AM, Ioan Zeng wrote:
>
>>
>> Another point I would like to evaluate is the Distributed Shell example usage.
>> Our use case is to start different scripts on a grid. Once a node has
>> finished a script a new script has to be started on it. A report about
>> the scripts execution has to be provided. in case a node has failed to
>> execute a script it should be re-executed on a different node. Some
>> scripts are Windows specific other are Unix specific and have to be
>> executed on a node with a specific OS.
>>
>
> The current implementation of distributed shell is effectively a piece of example code to help
> folks write more complex applications. It simply supports launching a script on a given number
> of containers ( without accounting for where the containers are assigned ), does not handle retries on failures
> and simply reports a success/failure based on the no. of failures in running the script.
>
> Based on your use case, it should be easy enough to build on the example code to handle the features that
> you require.
>
> The OS specific resource ask is something which will be need to be addressed in YARN. Could you file a JIRA
> for this feature request with some details about your use-case.
>
>
>> The question is:
>> Would it be feasible to adapt the example "Distributed Shell"
>> application to have the above features?
>> If yes how could I run some specific scripts only on a specific OS? Is
>> this the ResourceManager responsability? What happens if there is no
>> Windows node for example in the grid but in the queue there is a
>> Windows script?
>> How to re-execute failed scripts? Does it have to be implemented by
>> custom code, or is it a built in feature of YARN?
>>
>>
>
> The way YARN works is slightly different from what you describe above.
>
> What you would do is write some form of a controller which in YARN terminology is referred to as an ApplicationMaster.
> It would request containers from the RM ( for example, 5 containers on WinOS, 5 on Linux with 1 GB each of RAM ). Once, the container is
> assigned, the controller would be responsible for launching the correct script based on the container allocated. The RM would be responsible
> for ensuring the correct set of containers are allocated to the container based on resource usage limits, priorities, etc. [ Again to clarify, OS type
> scheduling is currently not supported ]. If a script fails, the container's exit code and completion status would be fed back to the controller which
> would then have to handle retries ( may require asking the RM for a new container ).
>
>
>
>> Thank you in advance for your support,
>> Ioan Zeng
>

Re: YARN Features

Posted by Ioan Zeng <ze...@gmail.com>.

Another evaluation criteria was the community support of the framework
which I rate now as very good :)

I would like to ask other questions:

I have seen YARN or MR used only in the context of HDFS. Would it be
possible to keep all YARN features without using it in relation with
HDFS (with no HDFS installed)?

You mentioned the CapacityScheduler. Does this require MapReduce? or
is it included in YARN? I understood that MRv2 is just an application
built over the YARN framework. For our use case we don't need MR.

For a better understanding of my questions regarding the Distributed
Shell. We intend to use YARN for a distributed automated test
environment which will execute set of test suites for specific builds
in parallel. Do you know about similar usages of YARN or MR, maybe
case studies?

Thanks,
Ioan

On Tue, Mar 12, 2013 at 8:47 PM, Hitesh Shah <hi...@hortonworks.com> wrote:
> Answers regarding DistributedShell.
>
> https://issues.apache.org/jira/secure/attachment/12486023/MapReduce_NextGen_Architecture.pdf has some details on YARN's architecture.
>
> -- Hitesh
>
> On Mar 12, 2013, at 7:31 AM, Ioan Zeng wrote:
>
>>
>> Another point I would like to evaluate is the Distributed Shell example usage.
>> Our use case is to start different scripts on a grid. Once a node has
>> finished a script a new script has to be started on it. A report about
>> the scripts execution has to be provided. in case a node has failed to
>> execute a script it should be re-executed on a different node. Some
>> scripts are Windows specific other are Unix specific and have to be
>> executed on a node with a specific OS.
>>
>
> The current implementation of distributed shell is effectively a piece of example code to help
> folks write more complex applications. It simply supports launching a script on a given number
> of containers ( without accounting for where the containers are assigned ), does not handle retries on failures
> and simply reports a success/failure based on the no. of failures in running the script.
>
> Based on your use case, it should be easy enough to build on the example code to handle the features that
> you require.
>
> The OS specific resource ask is something which will be need to be addressed in YARN. Could you file a JIRA
> for this feature request with some details about your use-case.
>
>
>> The question is:
>> Would it be feasible to adapt the example "Distributed Shell"
>> application to have the above features?
>> If yes how could I run some specific scripts only on a specific OS? Is
>> this the ResourceManager responsability? What happens if there is no
>> Windows node for example in the grid but in the queue there is a
>> Windows script?
>> How to re-execute failed scripts? Does it have to be implemented by
>> custom code, or is it a built in feature of YARN?
>>
>>
>
> The way YARN works is slightly different from what you describe above.
>
> What you would do is write some form of a controller which in YARN terminology is referred to as an ApplicationMaster.
> It would request containers from the RM ( for example, 5 containers on WinOS, 5 on Linux with 1 GB each of RAM ). Once, the container is
> assigned, the controller would be responsible for launching the correct script based on the container allocated. The RM would be responsible
> for ensuring the correct set of containers are allocated to the container based on resource usage limits, priorities, etc. [ Again to clarify, OS type
> scheduling is currently not supported ]. If a script fails, the container's exit code and completion status would be fed back to the controller which
> would then have to handle retries ( may require asking the RM for a new container ).
>
>
>
>> Thank you in advance for your support,
>> Ioan Zeng
>

Re: YARN Features

Posted by Hitesh Shah <hi...@hortonworks.com>.

Answers regarding DistributedShell. 

https://issues.apache.org/jira/secure/attachment/12486023/MapReduce_NextGen_Architecture.pdf has some details on YARN's architecture.

-- Hitesh

On Mar 12, 2013, at 7:31 AM, Ioan Zeng wrote:

> 
> Another point I would like to evaluate is the Distributed Shell example usage.
> Our use case is to start different scripts on a grid. Once a node has
> finished a script a new script has to be started on it. A report about
> the scripts execution has to be provided. in case a node has failed to
> execute a script it should be re-executed on a different node. Some
> scripts are Windows specific other are Unix specific and have to be
> executed on a node with a specific OS.
> 

The current implementation of distributed shell is effectively a piece of example code to help 
folks write more complex applications. It simply supports launching a script on a given number 
of containers ( without accounting for where the containers are assigned ), does not handle retries on failures
and simply reports a success/failure based on the no. of failures in running the script. 

Based on your use case, it should be easy enough to build on the example code to handle the features that 
you require. 

The OS specific resource ask is something which will be need to be addressed in YARN. Could you file a JIRA 
for this feature request with some details about your use-case. 

> The question is:
> Would it be feasible to adapt the example "Distributed Shell"
> application to have the above features?
> If yes how could I run some specific scripts only on a specific OS? Is
> this the ResourceManager responsability? What happens if there is no
> Windows node for example in the grid but in the queue there is a
> Windows script?
> How to re-execute failed scripts? Does it have to be implemented by
> custom code, or is it a built in feature of YARN?
> 
> 

The way YARN works is slightly different from what you describe above. 

What you would do is write some form of a controller which in YARN terminology is referred to as an ApplicationMaster. 
It would request containers from the RM ( for example, 5 containers on WinOS, 5 on Linux with 1 GB each of RAM ). Once, the container is
assigned, the controller would be responsible for launching the correct script based on the container allocated. The RM would be responsible
for ensuring the correct set of containers are allocated to the container based on resource usage limits, priorities, etc. [ Again to clarify, OS type 
scheduling is currently not supported ]. If a script fails, the container's exit code and completion status would be fed back to the controller which 
would then have to handle retries ( may require asking the RM for a new container ).

> Thank you in advance for your support,
> Ioan Zeng

Re: YARN Features

Posted by Hitesh Shah <hi...@hortonworks.com>.

Answers inline. Will address the DistributedShell questions in a follow-up. 

-- Hitesh

On Mar 12, 2013, at 7:31 AM, Ioan Zeng wrote:

> Hello Hadoop Team,
> 
> I need some help/support in evaluating Hadoop YARN from the perspective of:
> 
> - Adding/removing test nodes dynamically without restart
> Is this possible in YARN? or is it only a MapReduce application feature?
> 

Yarn Feature. Each node has a NodeManager daemon running on it. The Resourcemanager tracks nodes' health
via heartbeats from the NodeManager. In that respect, unhealthy nodes will be removed from the available 
pool and when an application requests for resources, it will not get allocated to an unhealthy or down node.

> - Failover from crashed test nodes
> Is this automatically done by the YARN framework?
> 

When a node fails, containers running on it are also reported as failed. This feedback about failed nodes and failed containers
is passed to the application by the ResourceManager. How the application handles these failures and acts upon the failure events to 
recover is based on application code. Each application will have its own logic to recover - YARN provides all the information for an application 
to act upon but does not take any action on the application's behalf ( slight caveat that the yarn framework does support restarting ( up to a max 
retry attempt limit ) the application itself if the application died or the node on which the application was running died.

> - Simple prioritization of jobs
> I know there is a prioritization possibility of the tasks in the
> Scheduler, right?
> 

Yes and no - depends on which scheduler you configure the RM with and how you configure it. The default scheduler used is the CapacityScheduler. 
Details on that are at http://hadoop.apache.org/docs/stable/capacity_scheduler.html.

> - Insight into queue (simple reporting, including running jobs on nodes)
> Is there any reporting possibility in YARN regarding the status of
> the not executed, running, finished jobs?
> 

The YARN UI provides these details for the last N applications that were handled ( N being around 10000 ). There are also webservices to access this 
data in xml/json format. There is a plan for a generic application history server but that is still being worked on.

> - Support heterogenous OSes on nodes (or use separate masters for
> homogenous grids)
> I think this is supported, right?
> 

I don't believe that there is anything stopping support of a heterogenous cluster. However, I am not sure if we have come across anyone 
who has tried that out. A lot of the problems may arise based on how well a application is written to handle launching containers correctly on different OS types.
The scheduler today does not account for asking for containers on only certain OS types. I am guessing there might be some minor features that be needed to be
addressed to fully support it. If you do try the above use case, please let us know if there are features/issues that you would like to see addressed.

> Another point I would like to evaluate is the Distributed Shell example usage.
> Our use case is to start different scripts on a grid. Once a node has
> finished a script a new script has to be started on it. A report about
> the scripts execution has to be provided. in case a node has failed to
> execute a script it should be re-executed on a different node. Some
> scripts are Windows specific other are Unix specific and have to be
> executed on a node with a specific OS.
> 
> The question is:
> Would it be feasible to adapt the example "Distributed Shell"
> application to have the above features?
> If yes how could I run some specific scripts only on a specific OS? Is
> this the ResourceManager responsability? What happens if there is no
> Windows node for example in the grid but in the queue there is a
> Windows script?
> How to re-execute failed scripts? Does it have to be implemented by
> custom code, or is it a built in feature of YARN?
> 
> 
> Thank you in advance for your support,
> Ioan Zeng

Re: YARN Features

Posted by Hitesh Shah <hi...@hortonworks.com>.

Answers regarding DistributedShell. 

https://issues.apache.org/jira/secure/attachment/12486023/MapReduce_NextGen_Architecture.pdf has some details on YARN's architecture.

-- Hitesh

On Mar 12, 2013, at 7:31 AM, Ioan Zeng wrote:

> 
> Another point I would like to evaluate is the Distributed Shell example usage.
> Our use case is to start different scripts on a grid. Once a node has
> finished a script a new script has to be started on it. A report about
> the scripts execution has to be provided. in case a node has failed to
> execute a script it should be re-executed on a different node. Some
> scripts are Windows specific other are Unix specific and have to be
> executed on a node with a specific OS.
> 

The current implementation of distributed shell is effectively a piece of example code to help 
folks write more complex applications. It simply supports launching a script on a given number 
of containers ( without accounting for where the containers are assigned ), does not handle retries on failures
and simply reports a success/failure based on the no. of failures in running the script. 

Based on your use case, it should be easy enough to build on the example code to handle the features that 
you require. 

The OS specific resource ask is something which will be need to be addressed in YARN. Could you file a JIRA 
for this feature request with some details about your use-case. 

> The question is:
> Would it be feasible to adapt the example "Distributed Shell"
> application to have the above features?
> If yes how could I run some specific scripts only on a specific OS? Is
> this the ResourceManager responsability? What happens if there is no
> Windows node for example in the grid but in the queue there is a
> Windows script?
> How to re-execute failed scripts? Does it have to be implemented by
> custom code, or is it a built in feature of YARN?
> 
> 

The way YARN works is slightly different from what you describe above. 

What you would do is write some form of a controller which in YARN terminology is referred to as an ApplicationMaster. 
It would request containers from the RM ( for example, 5 containers on WinOS, 5 on Linux with 1 GB each of RAM ). Once, the container is
assigned, the controller would be responsible for launching the correct script based on the container allocated. The RM would be responsible
for ensuring the correct set of containers are allocated to the container based on resource usage limits, priorities, etc. [ Again to clarify, OS type 
scheduling is currently not supported ]. If a script fails, the container's exit code and completion status would be fed back to the controller which 
would then have to handle retries ( may require asking the RM for a new container ).

> Thank you in advance for your support,
> Ioan Zeng

Re: YARN Features

Posted by Hitesh Shah <hi...@hortonworks.com>.

Answers inline. Will address the DistributedShell questions in a follow-up. 

-- Hitesh

On Mar 12, 2013, at 7:31 AM, Ioan Zeng wrote:

> Hello Hadoop Team,
> 
> I need some help/support in evaluating Hadoop YARN from the perspective of:
> 
> - Adding/removing test nodes dynamically without restart
> Is this possible in YARN? or is it only a MapReduce application feature?
> 

Yarn Feature. Each node has a NodeManager daemon running on it. The Resourcemanager tracks nodes' health
via heartbeats from the NodeManager. In that respect, unhealthy nodes will be removed from the available 
pool and when an application requests for resources, it will not get allocated to an unhealthy or down node.

> - Failover from crashed test nodes
> Is this automatically done by the YARN framework?
> 

When a node fails, containers running on it are also reported as failed. This feedback about failed nodes and failed containers
is passed to the application by the ResourceManager. How the application handles these failures and acts upon the failure events to 
recover is based on application code. Each application will have its own logic to recover - YARN provides all the information for an application 
to act upon but does not take any action on the application's behalf ( slight caveat that the yarn framework does support restarting ( up to a max 
retry attempt limit ) the application itself if the application died or the node on which the application was running died.

> - Simple prioritization of jobs
> I know there is a prioritization possibility of the tasks in the
> Scheduler, right?
> 

Yes and no - depends on which scheduler you configure the RM with and how you configure it. The default scheduler used is the CapacityScheduler. 
Details on that are at http://hadoop.apache.org/docs/stable/capacity_scheduler.html.

> - Insight into queue (simple reporting, including running jobs on nodes)
> Is there any reporting possibility in YARN regarding the status of
> the not executed, running, finished jobs?
> 

The YARN UI provides these details for the last N applications that were handled ( N being around 10000 ). There are also webservices to access this 
data in xml/json format. There is a plan for a generic application history server but that is still being worked on.

> - Support heterogenous OSes on nodes (or use separate masters for
> homogenous grids)
> I think this is supported, right?
> 

I don't believe that there is anything stopping support of a heterogenous cluster. However, I am not sure if we have come across anyone 
who has tried that out. A lot of the problems may arise based on how well a application is written to handle launching containers correctly on different OS types.
The scheduler today does not account for asking for containers on only certain OS types. I am guessing there might be some minor features that be needed to be
addressed to fully support it. If you do try the above use case, please let us know if there are features/issues that you would like to see addressed.

> Another point I would like to evaluate is the Distributed Shell example usage.
> Our use case is to start different scripts on a grid. Once a node has
> finished a script a new script has to be started on it. A report about
> the scripts execution has to be provided. in case a node has failed to
> execute a script it should be re-executed on a different node. Some
> scripts are Windows specific other are Unix specific and have to be
> executed on a node with a specific OS.
> 
> The question is:
> Would it be feasible to adapt the example "Distributed Shell"
> application to have the above features?
> If yes how could I run some specific scripts only on a specific OS? Is
> this the ResourceManager responsability? What happens if there is no
> Windows node for example in the grid but in the queue there is a
> Windows script?
> How to re-execute failed scripts? Does it have to be implemented by
> custom code, or is it a built in feature of YARN?
> 
> 
> Thank you in advance for your support,
> Ioan Zeng

Re: YARN Features

Posted by Hitesh Shah <hi...@hortonworks.com>.

Answers regarding DistributedShell. 

https://issues.apache.org/jira/secure/attachment/12486023/MapReduce_NextGen_Architecture.pdf has some details on YARN's architecture.

-- Hitesh

On Mar 12, 2013, at 7:31 AM, Ioan Zeng wrote:

> 
> Another point I would like to evaluate is the Distributed Shell example usage.
> Our use case is to start different scripts on a grid. Once a node has
> finished a script a new script has to be started on it. A report about
> the scripts execution has to be provided. in case a node has failed to
> execute a script it should be re-executed on a different node. Some
> scripts are Windows specific other are Unix specific and have to be
> executed on a node with a specific OS.
> 

The current implementation of distributed shell is effectively a piece of example code to help 
folks write more complex applications. It simply supports launching a script on a given number 
of containers ( without accounting for where the containers are assigned ), does not handle retries on failures
and simply reports a success/failure based on the no. of failures in running the script. 

Based on your use case, it should be easy enough to build on the example code to handle the features that 
you require. 

The OS specific resource ask is something which will be need to be addressed in YARN. Could you file a JIRA 
for this feature request with some details about your use-case. 

> The question is:
> Would it be feasible to adapt the example "Distributed Shell"
> application to have the above features?
> If yes how could I run some specific scripts only on a specific OS? Is
> this the ResourceManager responsability? What happens if there is no
> Windows node for example in the grid but in the queue there is a
> Windows script?
> How to re-execute failed scripts? Does it have to be implemented by
> custom code, or is it a built in feature of YARN?
> 
> 

The way YARN works is slightly different from what you describe above. 

What you would do is write some form of a controller which in YARN terminology is referred to as an ApplicationMaster. 
It would request containers from the RM ( for example, 5 containers on WinOS, 5 on Linux with 1 GB each of RAM ). Once, the container is
assigned, the controller would be responsible for launching the correct script based on the container allocated. The RM would be responsible
for ensuring the correct set of containers are allocated to the container based on resource usage limits, priorities, etc. [ Again to clarify, OS type 
scheduling is currently not supported ]. If a script fails, the container's exit code and completion status would be fed back to the controller which 
would then have to handle retries ( may require asking the RM for a new container ).

> Thank you in advance for your support,
> Ioan Zeng

Re: YARN Features

Posted by Hitesh Shah <hi...@hortonworks.com>.

Answers inline. Will address the DistributedShell questions in a follow-up. 

-- Hitesh

On Mar 12, 2013, at 7:31 AM, Ioan Zeng wrote:

> Hello Hadoop Team,
> 
> I need some help/support in evaluating Hadoop YARN from the perspective of:
> 
> - Adding/removing test nodes dynamically without restart
> Is this possible in YARN? or is it only a MapReduce application feature?
> 

Yarn Feature. Each node has a NodeManager daemon running on it. The Resourcemanager tracks nodes' health
via heartbeats from the NodeManager. In that respect, unhealthy nodes will be removed from the available 
pool and when an application requests for resources, it will not get allocated to an unhealthy or down node.

> - Failover from crashed test nodes
> Is this automatically done by the YARN framework?
> 

When a node fails, containers running on it are also reported as failed. This feedback about failed nodes and failed containers
is passed to the application by the ResourceManager. How the application handles these failures and acts upon the failure events to 
recover is based on application code. Each application will have its own logic to recover - YARN provides all the information for an application 
to act upon but does not take any action on the application's behalf ( slight caveat that the yarn framework does support restarting ( up to a max 
retry attempt limit ) the application itself if the application died or the node on which the application was running died.

> - Simple prioritization of jobs
> I know there is a prioritization possibility of the tasks in the
> Scheduler, right?
> 

Yes and no - depends on which scheduler you configure the RM with and how you configure it. The default scheduler used is the CapacityScheduler. 
Details on that are at http://hadoop.apache.org/docs/stable/capacity_scheduler.html.

> - Insight into queue (simple reporting, including running jobs on nodes)
> Is there any reporting possibility in YARN regarding the status of
> the not executed, running, finished jobs?
> 

The YARN UI provides these details for the last N applications that were handled ( N being around 10000 ). There are also webservices to access this 
data in xml/json format. There is a plan for a generic application history server but that is still being worked on.

> - Support heterogenous OSes on nodes (or use separate masters for
> homogenous grids)
> I think this is supported, right?
> 

I don't believe that there is anything stopping support of a heterogenous cluster. However, I am not sure if we have come across anyone 
who has tried that out. A lot of the problems may arise based on how well a application is written to handle launching containers correctly on different OS types.
The scheduler today does not account for asking for containers on only certain OS types. I am guessing there might be some minor features that be needed to be
addressed to fully support it. If you do try the above use case, please let us know if there are features/issues that you would like to see addressed.

> Another point I would like to evaluate is the Distributed Shell example usage.
> Our use case is to start different scripts on a grid. Once a node has
> finished a script a new script has to be started on it. A report about
> the scripts execution has to be provided. in case a node has failed to
> execute a script it should be re-executed on a different node. Some
> scripts are Windows specific other are Unix specific and have to be
> executed on a node with a specific OS.
> 
> The question is:
> Would it be feasible to adapt the example "Distributed Shell"
> application to have the above features?
> If yes how could I run some specific scripts only on a specific OS? Is
> this the ResourceManager responsability? What happens if there is no
> Windows node for example in the grid but in the queue there is a
> Windows script?
> How to re-execute failed scripts? Does it have to be implemented by
> custom code, or is it a built in feature of YARN?
> 
> 
> Thank you in advance for your support,
> Ioan Zeng

Re: YARN Features

Posted by Hitesh Shah <hi...@hortonworks.com>.

Answers regarding DistributedShell. 

https://issues.apache.org/jira/secure/attachment/12486023/MapReduce_NextGen_Architecture.pdf has some details on YARN's architecture.

-- Hitesh

On Mar 12, 2013, at 7:31 AM, Ioan Zeng wrote:

> 
> Another point I would like to evaluate is the Distributed Shell example usage.
> Our use case is to start different scripts on a grid. Once a node has
> finished a script a new script has to be started on it. A report about
> the scripts execution has to be provided. in case a node has failed to
> execute a script it should be re-executed on a different node. Some
> scripts are Windows specific other are Unix specific and have to be
> executed on a node with a specific OS.
> 

The current implementation of distributed shell is effectively a piece of example code to help 
folks write more complex applications. It simply supports launching a script on a given number 
of containers ( without accounting for where the containers are assigned ), does not handle retries on failures
and simply reports a success/failure based on the no. of failures in running the script. 

Based on your use case, it should be easy enough to build on the example code to handle the features that 
you require. 

The OS specific resource ask is something which will be need to be addressed in YARN. Could you file a JIRA 
for this feature request with some details about your use-case. 

> The question is:
> Would it be feasible to adapt the example "Distributed Shell"
> application to have the above features?
> If yes how could I run some specific scripts only on a specific OS? Is
> this the ResourceManager responsability? What happens if there is no
> Windows node for example in the grid but in the queue there is a
> Windows script?
> How to re-execute failed scripts? Does it have to be implemented by
> custom code, or is it a built in feature of YARN?
> 
> 

The way YARN works is slightly different from what you describe above. 

What you would do is write some form of a controller which in YARN terminology is referred to as an ApplicationMaster. 
It would request containers from the RM ( for example, 5 containers on WinOS, 5 on Linux with 1 GB each of RAM ). Once, the container is
assigned, the controller would be responsible for launching the correct script based on the container allocated. The RM would be responsible
for ensuring the correct set of containers are allocated to the container based on resource usage limits, priorities, etc. [ Again to clarify, OS type 
scheduling is currently not supported ]. If a script fails, the container's exit code and completion status would be fed back to the controller which 
would then have to handle retries ( may require asking the RM for a new container ).

> Thank you in advance for your support,
> Ioan Zeng