Posted to mapreduce-user@hadoop.apache.org by Rahul Bhattacharjee <ra...@gmail.com> on 2013/05/29 16:34:25 UTC

What else can be built on top of YARN.

Hi all,

I was going through the motivation behind YARN. Splitting the
responsibilities of the JobTracker (JT) is the major concern. Ultimately, the
base (YARN) was built in a generic way so that other generic distributed
applications could be built on it too.

I am not able to think of any other parallel-processing use case that would
be useful to build on top of YARN. I thought of a lot of use cases that
would benefit from running in parallel, but again, we can do those using
map-only jobs in MR.

Can someone tell me a scenario where an application can utilize YARN
features, or can be built on top of YARN, and at the same time cannot be
done efficiently using MRv2 jobs?

thanks,
Rahul
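For reference, a map-only job is an MR job configured with zero reduce tasks (`job.setNumReduceTasks(0)` in the Java API), so each input split is processed independently and map output is written straight out with no shuffle or sort. The Python sketch below simulates that execution model with the standard library only; the splitting helper, the split size, and the word-splitting map function are illustrative assumptions, not Hadoop code.

```python
from concurrent.futures import ThreadPoolExecutor


def make_splits(records, split_size):
    """Chop the input into fixed-size chunks, standing in for HDFS input splits."""
    return [records[i:i + split_size] for i in range(0, len(records), split_size)]


def run_map_only_job(records, map_fn, split_size=2, workers=4):
    """Run map_fn over every record, one task per split, with no shuffle or reduce.

    Each task's output stays separate, mirroring how a map-only job writes one
    output file per map task (part-m-00000, part-m-00001, ...).
    """

    def map_task(split):
        out = []
        for record in split:
            out.extend(map_fn(record))
        return out

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves split order, so output i belongs to split i.
        return list(pool.map(map_task, make_splits(records, split_size)))


if __name__ == "__main__":
    lines = ["a b", "c", "d e f", "g"]
    # Hypothetical map function: emit (word, 1) for every word in a line.
    parts = run_map_only_job(lines, lambda line: [(w, 1) for w in line.split()])
    for i, part in enumerate(parts):
        print(f"part-m-{i:05d}", part)
```

The property that makes this work is that no map task ever needs another task's output. Workloads that do need such communication, or need control over how much memory each task gets, are the cases the replies in this thread point to as reasons to write directly against YARN.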

RE: What else can be built on top of YARN.

Posted by John Lilley <jo...@redpoint.net>.
Thanks! It would be nice if there were a tiny bit of documentation :)
But maybe it is good to use as an ApplicationMaster (AM) example.
John

From: Arun C Murthy [mailto:acm@hortonworks.com]
Sent: Thursday, June 06, 2013 7:42 AM
To: user@hadoop.apache.org
Subject: Re: What else can be built on top of YARN.

John,

On Jun 1, 2013, at 7:02 AM, John Lilley wrote:

> · Algorithms that are not well-suited to the MR model, such as transitive closure.  They are more naturally expressed as MPI-like algorithms.

You might be interested in MPICH2 on YARN:
https://github.com/clarkyzl/mpich2-yarn

Disclaimer: I haven't used it myself.

Arun


Re: What else can be built on top of YARN.

Posted by Arun C Murthy <ac...@hortonworks.com>.
John,

On Jun 1, 2013, at 7:02 AM, John Lilley wrote:

> · Algorithms that are not well-suited to the MR model, such as transitive closure.  They are more naturally expressed as MPI-like algorithms.

You might be interested in MPICH2 on YARN:
https://github.com/clarkyzl/mpich2-yarn

Disclaimer: I haven't used it myself.

Arun
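John's transitive-closure example (quoted above) shows why iteration hurts under MR: computed with MapReduce, closure is usually a chain of jobs, one per round, each re-reading the previous round's output from HDFS until no new pairs appear. The Python sketch below counts those rounds using a semi-naive join over an in-memory edge set; it illustrates the iteration structure only and is not code from MPICH2-on-YARN or any Hadoop library.

```python
def transitive_closure(edges):
    """Return (closure, rounds) for a directed edge set.

    Each loop iteration corresponds to one job in the iterative MR formulation:
    join the pairs discovered last round (delta) against the original edges on
    the shared middle vertex, then keep only pairs not already known.
    """
    # Index the base edges by source vertex so the join is a dictionary lookup.
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)

    closure = set(edges)
    delta = set(edges)
    rounds = 0
    while delta:
        rounds += 1
        joined = {(a, d) for (a, b) in delta for d in successors.get(b, [])}
        delta = joined - closure
        closure |= delta
    return closure, rounds


if __name__ == "__main__":
    chain = {(0, 1), (1, 2), (2, 3), (3, 4)}
    closure, rounds = transitive_closure(chain)
    print(len(closure), "pairs after", rounds, "rounds")
```

On the five-vertex chain in the example the loop runs four times, and in general a chain of n vertices needs n - 1 rounds, each paying job-startup and HDFS round-trip costs under MR. An MPI-style program can instead keep the partial closure in memory across rounds and exchange only each round's new pairs between workers, which is the motivation for running MPI-like code on YARN.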


RE: What else can be built on top of YARN.

Posted by John Lilley <jo...@redpoint.net>.
Rahul,

This is a very good question, and one we are currently grappling with in our application port. I think there are a lot of legacy data-processing applications like ours which would benefit from a port to Hadoop. However, because we have a large body of C++ code, it is not necessarily a good fit for MR. There seem to be two main choices:

· Run under Hadoop Streaming

· Run as a custom ApplicationMaster

One of the selling points of our application is its performance and single-code efficiency. I have concerns about Streaming:

· We will lose performance because of the extra layers of translation and I/O, and because Streaming data is uncompressed

· The Streaming model is limited to a single input and a single output

· We have a very large number of files, with a large total size, to make available locally, and it is unclear whether the -files option will recursively copy and cache all of them
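A Hadoop Streaming mapper is an ordinary executable that receives input records as lines on stdin and emits key/value lines on stdout, with the key separated from the value by a tab by default. That text pipe is the source of both the translation overhead and the single-input, single-output shape discussed above. The sketch below is a minimal Python mapper of that form; the word-count logic is only a placeholder for real processing.

```python
import sys


def map_lines(lines):
    """Word-count style Streaming mapper body: text lines in, 'key<TAB>value' lines out.

    In the default text mode, records cross this boundary as plain text lines,
    so any binary or compressed data has to be decoded before it reaches the
    script and re-encoded after it leaves.
    """
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


if __name__ == "__main__":
    # Streaming feeds records on stdin and collects whatever is printed to stdout.
    for record in map_lines(sys.stdin):
        print(record)
```

Such a script would typically be passed to the streaming jar through its `-mapper` option and shipped to the nodes with `-files`.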

In contrast, porting our application as a YARN ApplicationMaster appears to offer several benefits (which come at the expense of extra complexity):

· Negotiation for container resources and scheduling. Some of our operations are very heavy (load time and memory use), so they need larger containers and will benefit from larger data splits.

· Direct access to HDFS via JNI without translation layers.

· Algorithms that are not well-suited to the MR model, such as transitive closure. They are more naturally expressed as MPI-like algorithms.

· If warranted, the ability to replace the MR shuffle with a C++ data partition (this could be a discussion thread in its own right).
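A replacement shuffle has to reproduce at least the partitioning rule: with Hadoop's default `HashPartitioner`, a key goes to reduce task `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The Python sketch below mirrors that formula using a Java `String.hashCode`-style hash; that hash is an assumption for illustration, since the real value depends on the key class's own `hashCode()` (Hadoop `Text` keys, for example, hash their UTF-8 bytes). A deterministic hash is used deliberately: Python's built-in `hash()` for strings is randomized per process, which would send the same key to different partitions on different workers.

```python
def java_string_hash(s):
    """Java-style String.hashCode(): 31*h + c over the characters, as a signed 32-bit int.

    Assumes characters in the Basic Multilingual Plane, where one Python
    character equals one Java UTF-16 code unit.
    """
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # keep 32 bits, like Java int overflow
    return h - (1 << 32) if h >= (1 << 31) else h


def hash_partition(key, num_partitions):
    """HashPartitioner-style mapping: clear the sign bit, then take the modulus."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_partitions


if __name__ == "__main__":
    for key in ["hello", "Aa", "BB", "transitive closure"]:
        print(key, "->", hash_partition(key, 4))
```

Any replacement shuffle that feeds stock MR reducers would have to agree with this mapping; a standalone ApplicationMaster could choose its own, for example range partitioning when globally sorted output is needed.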

Moving our processing into native Java for a more seamless MR integration is not an option due to the size and complexity of the code base.

It may be that I am completely wrong about the limitations of the Streaming interface; if so, please tell me why.

john

From: Rahul Bhattacharjee [mailto:rahul.rec.dgp@gmail.com]
Sent: Wednesday, May 29, 2013 8:34 AM
To: user@hadoop.apache.org
Subject: What else can be built on top of YARN.

Hi all,
I was going through the motivation behind YARN. Splitting the responsibilities of the JobTracker (JT) is the major concern. Ultimately, the base (YARN) was built in a generic way so that other generic distributed applications could be built on it too.
I am not able to think of any other parallel-processing use case that would be useful to build on top of YARN. I thought of a lot of use cases that would benefit from running in parallel, but again, we can do those using map-only jobs in MR.
Can someone tell me a scenario where an application can utilize YARN features, or can be built on top of YARN, and at the same time cannot be done efficiently using MRv2 jobs?
thanks,
Rahul


Re: What else can be built on top of YARN.

Posted by Rahul Bhattacharjee <ra...@gmail.com>.
Thanks a lot for the responses. I now have a better understanding.

To answer Jay's question, I think ZooKeeper (ZK) can be used as a coordination
service for a distributed program (you build the coordination on top of its
exposed fine-grained APIs), but it doesn't have features like resource
management of cluster nodes (including allocation of resources based on
requests), which YARN has.

Rahul
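Rahul's distinction can be illustrated with ZooKeeper's documented leader-election recipe: each participant creates an ephemeral, sequential znode under a common parent, and whoever holds the lowest sequence number leads. ZooKeeper only stores and orders those nodes; it never decides which machine a participant runs on or how much memory it gets, which is the resource-management role YARN plays. The sketch below is an in-memory simulation of the recipe, with a hypothetical `FakeEnsemble` class standing in for a real ZooKeeper client; it is not the ZooKeeper API.

```python
class FakeEnsemble:
    """In-memory stand-in for one ZooKeeper parent znode (hypothetical, not the real client API)."""

    def __init__(self):
        self._counter = 0
        self._children = {}  # znode name -> owning session id

    def create_ephemeral_sequential(self, prefix, session):
        # ZooKeeper appends a monotonically increasing, zero-padded 10-digit counter.
        name = f"{prefix}{self._counter:010d}"
        self._counter += 1
        self._children[name] = session
        return name

    def expire_session(self, session):
        # Ephemeral nodes disappear when the session that created them ends.
        self._children = {n: s for n, s in self._children.items() if s != session}

    def leader(self):
        # The recipe: the participant owning the lowest sequence number leads.
        if not self._children:
            return None
        return self._children[min(self._children)]


if __name__ == "__main__":
    zk = FakeEnsemble()
    for worker in ["worker-a", "worker-b", "worker-c"]:
        zk.create_ephemeral_sequential("n_", worker)
    print("leader:", zk.leader())
    zk.expire_session("worker-a")
    print("leader after failure:", zk.leader())
```

In the real recipe each non-leader sets a watch only on the znode immediately preceding its own, so a leader failure wakes a single successor instead of every participant.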


On Thu, May 30, 2013 at 5:59 PM, Jay Vyas <ja...@gmail.com> wrote:

> What is the separation of concerns between YARN and ZooKeeper? That is,
> where does YARN leave off and where does ZooKeeper begin? Or is there some
> overlap?

Re: What else can be built on top of YARN.

Posted by Jay Vyas <ja...@gmail.com>.
What is the separation of concerns between YARN and ZooKeeper? That is,
where does YARN leave off and where does ZooKeeper begin? Or is there some
overlap?


On Thu, May 30, 2013 at 2:42 AM, Krishna Kishore Bonagiri <
write2kishore@gmail.com> wrote:

> Hi Rahul,
>
>   At the very least, the reasons that Vinod listed make it easier for me to
> port my application onto YARN than to make it work in the MapReduce
> framework. My main purpose in using YARN is to exploit its
> resource-management capabilities.
>
> Thanks,
> Kishore


-- 
Jay Vyas
http://jayunit100.blogspot.com

Re: What else can be built on top of YARN.

Posted by Krishna Kishore Bonagiri <wr...@gmail.com>.
Hi Rahul,

  At the very least, the reasons that Vinod listed make it easier for me to
port my application onto YARN than to make it work in the MapReduce
framework. My main purpose in using YARN is to exploit its
resource-management capabilities.

Thanks,
Kishore


On Wed, May 29, 2013 at 11:00 PM, Rahul Bhattacharjee <
rahul.rec.dgp@gmail.com> wrote:

> Thanks for the response, Krishna.
>
> I was wondering whether it would be possible to use MR to solve your
> problem instead of building the whole stack on top of YARN.
> Most likely it's not possible, and that's why you are building it. I wanted
> to know why that is.
>
> I am just trying to find out why we might need to write an application
> on YARN.
>
> Rahul
>
>
> On Wed, May 29, 2013 at 8:23 PM, Krishna Kishore Bonagiri <
> write2kishore@gmail.com> wrote:
>
>> Hi Rahul,
>>
>>   I am porting a distributed application that runs on a fixed set of
>> given resources to YARN, with the aim of being able to run it on
>> dynamically selected resources, whichever are available at the time
>> the application runs.
>>
>> Thanks,
>> Kishore
>>
>

Re: What else can be built on top of YARN.

Posted by Viral Bajaria <vi...@gmail.com>.
There is a project at Yahoo which makes it possible to run Storm on Yarn. I
think the team behind it is going to give a talk at Hadoop Summit and plans
to open-source it after that.

-Viral

On Wed, May 29, 2013 at 11:04 AM, John Conwell <jo...@iamjohn.me> wrote:

> Storm, a distributed realtime computation framework used for analyzing
> realtime streams of data, doesn't really need to be ported.  It's doing fine
> by itself, though I think it's a prime candidate for a Yarn port.

Re: What else can be built on top of YARN.

Posted by John Conwell <jo...@iamjohn.me>.
Two scenarios I can think of are re-implementations of Twitter's Storm (
http://storm-project.net/) and DryadLinq (
http://research.microsoft.com/en-us/projects/dryadlinq/).

Storm, a distributed realtime computation framework used for analyzing
realtime streams of data, doesn't really need to be ported.  It's doing fine
by itself, though I think it's a prime candidate for a Yarn port.

DryadLinq is a (now closed) research project out of Microsoft Research that
allowed the user to write standard LINQ code (in any .NET language); it
built an execution DAG from the LINQ statement and executed the DAG on an
MS HPC cluster.

The LINQ syntax is very much like Pig, though way more flexible; it has
full IDE support (in Visual Studio) and is used in standard single-process
programming.  That, to me, was the beauty behind DryadLinq: the programming
language for distributed execution was exactly the same as a well-known
language already used for standard single-process programming by
hundreds of thousands of programmers, so the learning curve and acceptance
debt are really low.  But, like all good things that come out of MS Research,
it was killed because they sat on it too long.

The interesting thing is that distributed DAG execution is one of the main
examples given for the types of Yarn applications that could be developed.
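The kind of DAG execution described here starts from a dependency order over stages. As a minimal sketch (stage names invented for the example; Kahn's algorithm chosen here, not taken from DryadLinq), the code below groups the stages of a small query plan into waves in which every stage's inputs are already complete. A DAG engine built as a YARN application could launch each ready stage's tasks in containers rather than just returning the grouping.

```java
import java.util.*;

// Sketch: ordering the stages of a Dryad-style job DAG with Kahn's algorithm.
// Stage names are hypothetical; nothing here is a YARN or Dryad API.
final class DagOrder {
    private DagOrder() {}

    /** Groups stages into waves; every stage in a wave has all its inputs in earlier waves. */
    static List<List<String>> waves(Map<String, List<String>> edges) {
        // Count incoming edges per stage (TreeMap keeps the output deterministic).
        Map<String, Integer> indegree = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : edges.entrySet()) {
            indegree.putIfAbsent(e.getKey(), 0);
            for (String next : e.getValue()) {
                indegree.merge(next, 1, Integer::sum);
            }
        }
        List<List<String>> result = new ArrayList<>();
        List<String> ready = new ArrayList<>();
        for (Map.Entry<String, Integer> e : indegree.entrySet()) {
            if (e.getValue() == 0) ready.add(e.getKey());
        }
        int scheduled = 0;
        while (!ready.isEmpty()) {
            result.add(ready);
            scheduled += ready.size();
            List<String> nextReady = new ArrayList<>();
            for (String stage : ready) {
                // "Finishing" a stage removes one pending input from each consumer.
                for (String next : edges.getOrDefault(stage, List.of())) {
                    if (indegree.merge(next, -1, Integer::sum) == 0) nextReady.add(next);
                }
            }
            Collections.sort(nextReady);
            ready = nextReady;
        }
        if (scheduled != indegree.size()) {
            throw new IllegalArgumentException("graph has a cycle; not a DAG");
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> dag = new HashMap<>();
        dag.put("scanOrders", List.of("join"));
        dag.put("scanCustomers", List.of("join"));
        dag.put("join", List.of("groupBy"));
        dag.put("groupBy", List.of("writeOutput"));
        System.out.println(waves(dag));
    }
}
```

The two scans land in the same wave and can run in parallel; a chain of MapReduce jobs would instead have to be wired together by hand, with intermediate results written to HDFS between them.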

On Wed, May 29, 2013 at 10:30 AM, Rahul Bhattacharjee <
rahul.rec.dgp@gmail.com> wrote:

> Thanks for the response, Krishna.
>
> I was wondering whether it would be possible to use MR to solve your problem
> instead of building the whole stack on top of Yarn.
> Most likely it's not possible, and that's why you are building it. I wanted to
> know why that is.
>
> I am just trying to understand when and why we might need to write an
> application on Yarn.
>
> Rahul
>
>
> On Wed, May 29, 2013 at 8:23 PM, Krishna Kishore Bonagiri <
> write2kishore@gmail.com> wrote:
>
>> Hi Rahul,
>>
>>   I am porting a distributed application that runs on a fixed set of
>> given resources to YARN, with the aim of being able to run it on
>> dynamically selected resources, whichever are available at the time the
>> application runs.
>>
>> Thanks,
>> Kishore
>>
>>
>> On Wed, May 29, 2013 at 8:04 PM, Rahul Bhattacharjee <
>> rahul.rec.dgp@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I was going through the motivation behind Yarn. Splitting the
>>> responsibility of the JT is the major concern. Ultimately the base (Yarn)
>>> was built in a generic way, for building other generic distributed
>>> applications too.
>>>
>>> I am not able to think of any other parallel processing use case that
>>> would be useful to build on top of YARN. I thought of a lot of use cases
>>> that would be beneficial when run in parallel, but again, we can do those
>>> using map-only jobs in MR.
>>>
>>> Can someone tell me a scenario where an application can utilize Yarn
>>> features or can be built on top of YARN and, at the same time, cannot be
>>> done efficiently using MRv2 jobs?
>>>
>>> thanks,
>>> Rahul
>>>
>>>
>>>
>>
>


-- 

Thanks,
John C

Re: What else can be built on top of YARN.

Posted by Rahul Bhattacharjee <ra...@gmail.com>.
Thanks for the response, Krishna.

I was wondering whether it would be possible to use MR to solve your problem
instead of building the whole stack on top of Yarn.
Most likely it's not possible, and that's why you are building it. I wanted to
know why that is.

I am just trying to understand when and why we might need to write an
application on Yarn.

Rahul


On Wed, May 29, 2013 at 8:23 PM, Krishna Kishore Bonagiri <
write2kishore@gmail.com> wrote:

> Hi Rahul,
>
>   I am porting a distributed application that runs on a fixed set of given
> resources to YARN, with the aim of being able to run it on dynamically
> selected resources, whichever are available at the time the application
> runs.
>
> Thanks,
> Kishore
>
>
> On Wed, May 29, 2013 at 8:04 PM, Rahul Bhattacharjee <
> rahul.rec.dgp@gmail.com> wrote:
>
>> Hi all,
>>
>> I was going through the motivation behind Yarn. Splitting the
>> responsibility of the JT is the major concern. Ultimately the base (Yarn)
>> was built in a generic way, for building other generic distributed
>> applications too.
>>
>> I am not able to think of any other parallel processing use case that
>> would be useful to build on top of YARN. I thought of a lot of use cases
>> that would be beneficial when run in parallel, but again, we can do those
>> using map-only jobs in MR.
>>
>> Can someone tell me a scenario where an application can utilize Yarn
>> features or can be built on top of YARN and, at the same time, cannot be
>> done efficiently using MRv2 jobs?
>>
>> thanks,
>> Rahul
>>
>>
>>
>

Re: What else can be built on top of YARN.

Posted by Krishna Kishore Bonagiri <wr...@gmail.com>.
Hi Rahul,

  I am porting a distributed application that runs on a fixed set of given
resources to YARN, with the aim of being able to run it on dynamically
selected resources, whichever are available at the time the application
runs.

Thanks,
Kishore


On Wed, May 29, 2013 at 8:04 PM, Rahul Bhattacharjee <
rahul.rec.dgp@gmail.com> wrote:

> Hi all,
>
> I was going through the motivation behind Yarn. Splitting the
> responsibility of the JT is the major concern. Ultimately the base (Yarn)
> was built in a generic way, for building other generic distributed
> applications too.
>
> I am not able to think of any other parallel processing use case that
> would be useful to build on top of YARN. I thought of a lot of use cases
> that would be beneficial when run in parallel, but again, we can do those
> using map-only jobs in MR.
>
> Can someone tell me a scenario where an application can utilize Yarn
> features or can be built on top of YARN and, at the same time, cannot be
> done efficiently using MRv2 jobs?
>
> thanks,
> Rahul
>
>
>

Re: What else can be built on top of YARN.

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.

Historically, many applications/frameworks wanted to take advantage of just the resource management capabilities and failure handling of Hadoop (via JobTracker/TaskTracker), but were forced to use MapReduce even though they didn't need MapReduce itself. Obvious examples are graph processing (Giraph), BSP (Hama), Storm/S4 and even a simple tool like DistCp.

There are issues even with map-only jobs.
 - You have to fake key-value processing, periodic pings, key-value outputs
 - You are limited to map slot capacity in the cluster
 - The number of tasks is static, so you cannot grow and shrink your job
 - You are forced to sort data all the time (even though this has changed recently)
 - You are tied to faking things like OutputCommit even if you don't need to.

That's just for starters. I can definitely think harder and list more ;)

YARN lets you move ahead without those limitations.
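The point about a static task count can be made concrete: under YARN, the ApplicationMaster itself decides at runtime how many containers to hold. The sketch below is a hypothetical sizing policy (enough containers to drain the current backlog within a target time, clamped between a floor and a ceiling); the names, rates, and limits are illustrative only, and the actual request and release calls to the ResourceManager are deliberately left out.

```java
// Sketch of a hypothetical elastic sizing policy for an ApplicationMaster.
// A real AM would turn a positive delta into container requests and a
// negative delta into container releases; that plumbing is omitted here.
final class ElasticSizing {
    private ElasticSizing() {}

    /**
     * Containers needed to drain the backlog within the target time,
     * clamped to [minContainers, maxContainers].
     */
    static int desiredContainers(long pendingItems, long itemsPerContainerPerMinute,
                                 long targetMinutes, int minContainers, int maxContainers) {
        long perContainer = itemsPerContainerPerMinute * targetMinutes;
        if (perContainer <= 0 || minContainers < 0 || maxContainers < minContainers) {
            throw new IllegalArgumentException("invalid sizing parameters");
        }
        long needed = (pendingItems + perContainer - 1) / perContainer; // ceiling division
        return (int) Math.max(minContainers, Math.min(maxContainers, needed));
    }

    /** Positive: containers to request; negative: containers to release. */
    static int delta(int running, int desired) {
        return desired - running;
    }

    public static void main(String[] args) {
        int running = 4;
        long[] backlogSamples = {50_000, 200_000, 5_000}; // hypothetical backlog over time
        for (long backlog : backlogSamples) {
            int desired = desiredContainers(backlog, 1_000, 10, 2, 32);
            System.out.println("backlog=" + backlog + " desired=" + desired
                    + " delta=" + delta(running, desired));
            running = desired;
        }
    }
}
```

The grow step (request more containers as the backlog rises) and the shrink step (release them when it falls) are exactly what a map-only job with a task count fixed at submission cannot do.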

HTH
+Vinod Kumar Vavilapalli
Hortonworks Inc.
http://hortonworks.com/


On May 29, 2013, at 7:34 AM, Rahul Bhattacharjee wrote:

> Hi all,
> 
> I was going through the motivation behind Yarn. Splitting the responsibility of the JT is the major concern. Ultimately the base (Yarn) was built in a generic way, for building other generic distributed applications too.
> 
> I am not able to think of any other parallel processing use case that would be useful to build on top of YARN. I thought of a lot of use cases that would be beneficial when run in parallel, but again, we can do those using map-only jobs in MR.
> 
> Can someone tell me a scenario where an application can utilize Yarn features or can be built on top of YARN and, at the same time, cannot be done efficiently using MRv2 jobs?
> 
> thanks,
> Rahul
> 
> 


RE: What else can be built on top of YARN.

Posted by John Lilley <jo...@redpoint.net>.
Rahul,

This is a very good question, and one we are currently grappling with in our application port.  I think there are a lot of legacy data-processing applications like ours which would benefit from a port to Hadoop.  However, because we have a great deal of C++ code, it is not necessarily a good fit for MR.  There seem to be two main choices:

·         Run under Hadoop “streams”

·         Run as a custom ApplicationMaster

One of the selling points of our application is its performance and single-code efficiency.  I have concerns about streams:

·         We will lose performance, because of the extra layers of translation and I/O and because streams data is uncompressed

·         The streams model is limited to single-in, single-out

·         We have a very large number of large files to make available locally, and it is unclear whether the -files option will recursively copy and cache all of them
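For reference, the contract being weighed here: a Hadoop Streaming mapper is an ordinary executable that reads input records as lines on stdin and writes `key<TAB>value` lines to stdout (by default the key is the text up to the first tab). The minimal sketch below uses word count purely as a familiar example; the class name is hypothetical, and it would be launched through the streaming jar's `-mapper` option. It shows both concerns above: everything crosses the pipe as text, and there is exactly one input stream and one output stream.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch of a Streaming-style mapper: one text stream in (stdin), one text
// stream out (stdout), each output record serialized as "key<TAB>value".
final class StreamingWordCountMapper {
    private StreamingWordCountMapper() {}

    /** Turns one input line into "word\t1" records, one per whitespace-separated token. */
    static List<String> mapLine(String line) {
        List<String> out = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(token + "\t1");
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        // Process records as they arrive; binary or structured data would need
        // its own text encoding on both sides of the pipe.
        for (String line; (line = in.readLine()) != null; ) {
            for (String record : mapLine(line)) {
                System.out.println(record);
            }
        }
    }
}
```

Usage for a quick local check: `echo "to be or not to be" | java StreamingWordCountMapper.java` (single-file launch, Java 11 or later). A C++ mapper would follow the same line protocol.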

In contrast, porting our application as a YARN ApplicationMaster appears to offer several benefits (which come at the expense of extra complexity):

·         Negotiation for container resources and scheduling.  Some of our operations are very heavy (load time and memory use), so they need larger containers and will benefit from larger data splits.

·         Direct access to HDFS via JNI without translation layers.

·         Algorithms that are not well-suited to the MR model, such as transitive closure.  They are more naturally expressed as MPI-like algorithms.

·         If warranted, the ability to replace MR shuffle with a C++ data partition (this could be a discussion thread in its own right).
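The following minimal Python sketch shows one way to see why transitive closure sits awkwardly in MR. It uses semi-naive evaluation. Each round joins only the newly discovered pairs against the original edges, and the loop stops when a round finds nothing new. Under plain MapReduce, each round would typically be a separate job that rereads and rewrites its state through HDFS. A custom ApplicationMaster could instead keep long-running workers with the edge set in memory. This single-process version is only an illustration. A distributed version would partition the pairs across containers.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[int, int]


def transitive_closure(edges: Set[Edge]) -> Tuple[Set[Edge], int]:
    """Semi-naive transitive closure. Returns (closure, rounds).

    Each round joins only the pairs discovered in the previous round
    ("delta") with the original edges; under plain MapReduce each round
    would typically be its own job.
    """
    succ: Dict[int, Set[int]] = {}
    for a, b in edges:
        succ.setdefault(a, set()).add(b)

    closure = set(edges)
    delta = set(edges)
    rounds = 0
    while delta:
        rounds += 1
        new: Set[Edge] = set()
        for a, b in delta:
            for c in succ.get(b, ()):
                if (a, c) not in closure:
                    new.add((a, c))
        closure |= new
        delta = new  # only fresh pairs feed the next round
    return closure, rounds


closure, rounds = transitive_closure({(1, 2), (2, 3), (3, 4)})
print(sorted(closure))  # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
print(rounds)           # 3 (the last round only confirms nothing new)
```

The number of rounds grows with the length of the longest path, so a chain of MR jobs pays the job start-up and HDFS round-trip cost many times over.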

Moving our processing into native Java for a more seamless MR integration is not an option due to the size and complexity of the code base.

It may be that I am completely wrong about the limitations of the Streaming interface; if so, please tell me why.

john

From: Rahul Bhattacharjee [mailto:rahul.rec.dgp@gmail.com]
Sent: Wednesday, May 29, 2013 8:34 AM
To: user@hadoop.apache.org
Subject: What else can be built on top of YARN.

Hi all,
I was going through the motivation behind Yarn. Splitting the responsibility of JT is the major concern. Ultimately the base (Yarn) was built in a generic way so that other distributed applications could be built on it too.
I am not able to think of any other parallel processing use case that would be useful to build on top of YARN. I thought of a lot of use cases that would benefit from running in parallel, but again, we can do those using map-only jobs in MR.
Can someone tell me a scenario where an application can utilize Yarn features, or can be built on top of YARN, and at the same time cannot be done efficiently using MRv2 jobs?
thanks,
Rahul


Re: What else can be built on top of YARN.

Posted by Krishna Kishore Bonagiri <wr...@gmail.com>.
Hi Rahul,

  I am porting a distributed application that currently runs on a fixed set
of resources to YARN, with the aim of being able to run it on whichever
resources are dynamically available at the time the application is
launched.
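Here is a minimal Python sketch of running on whichever resources are free at launch time, instead of on a fixed host list. Under YARN, the ResourceManager's scheduler makes the final placement decision, not the application, so this sketch only mimics the effect. place_workers() and the free-memory snapshot it takes are hypothetical. The heuristic (most free memory first, ties broken by host name) is just one simple choice.

```python
from typing import Dict, List, Optional


def place_workers(available_mb: Dict[str, int], workers: int, worker_mb: int) -> List[str]:
    """Pick a host for each worker from a snapshot of free memory (MB).

    Hypothetical helper, not a YARN API. Returns one host per placed worker;
    the list is shorter than `workers` when the free capacity runs out.
    """
    free = dict(available_mb)  # don't mutate the caller's snapshot
    placement: List[str] = []
    for _ in range(workers):
        # Most free memory first; ties broken by host name for determinism.
        host: Optional[str] = min(free, key=lambda h: (-free[h], h), default=None)
        if host is None or free[host] < worker_mb:
            break  # nothing left that can fit another worker
        free[host] -= worker_mb
        placement.append(host)
    return placement


snapshot = {"n1": 4096, "n2": 8192, "n3": 1024}
print(place_workers(snapshot, workers=4, worker_mb=2048))   # ['n2', 'n2', 'n1', 'n2']
print(place_workers(snapshot, workers=10, worker_mb=2048))  # ['n2', 'n2', 'n1', 'n2', 'n1', 'n2']: only 6 fit
```

If fewer workers fit than requested, the caller could wait and retry, or start with a smaller set. An application tied to a fixed resource list never has to make that decision.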

Thanks,
Kishore


On Wed, May 29, 2013 at 8:04 PM, Rahul Bhattacharjee <
rahul.rec.dgp@gmail.com> wrote:

> Hi all,
>
> I was going through the motivation behind Yarn. Splitting the
> responsibility of JT is the major concern. Ultimately the base (Yarn) was
> built in a generic way so that other distributed applications could be
> built on it too.
>
> I am not able to think of any other parallel processing use case that
> would be useful to build on top of YARN. I thought of a lot of use cases
> that would benefit from running in parallel, but again, we can do those
> using map-only jobs in MR.
>
> Can someone tell me a scenario where an application can utilize Yarn
> features, or can be built on top of YARN, and at the same time cannot be
> done efficiently using MRv2 jobs?
>
> thanks,
> Rahul
>
>
>
