Posted to user@storm.apache.org by Spico Florin <sp...@gmail.com> on 2015/05/27 11:57:56 UTC

Status of running storm on yarn (the yahoo project)

Hello!
I'm interested in running Storm topologies on YARN.
I was looking at the Yahoo project https://github.com/yahoo/storm-yarn, and
I noticed that there has been no activity for the past 7 months. Also, the
issue and pull request lists are not being updated.
Therefore I have some questions:
1. Is there any plan to evolve this project?
2. Is there any plan to integrate this project into the main branch?
3. Is anyone using this approach in production?

I look forward to your answers.
 Regards,
 Florin

RE: Status of running storm on yarn (the yahoo project)

Posted by prasad ch <ch...@outlook.com>.
Hi Nathan,
I want to do real-time computation using Storm. Which is the better choice, plain Storm or Trident? I need to handle a huge amount of data with exactly-once processing. Please help me.

Thanks!

Re: Status of running storm on yarn (the yahoo project)

Posted by Nathan Marz <na...@nathanmarz.com>.
Mesosphere has official support for Storm on Mesos:
https://github.com/mesos/storm


-- 
Twitter: @nathanmarz
http://nathanmarz.com

RE: Status of running storm on yarn (the yahoo project)

Posted by Ra...@DellTeam.com.
Thanks, Bobby, for the detailed answer.

So it sounds like it is better not to combine Storm with batch workloads at this point (on YARN, Mesos, or EC2), because of the risk of network saturation and timeouts.

Is this behavior also seen in other streaming frameworks, such as Spark Streaming running on YARN?


Re: Status of running storm on yarn (the yahoo project)

Posted by "P. Taylor Goetz" <pt...@gmail.com>.
I also developed a prototype/proof-of-concept (read: duct tape and baling twine) for running Storm on YARN.

I took a slightly different approach from Yahoo’s storm-yarn and Slider, which at a high level let you spin up a Storm cluster on top of YARN. In my PoC a topology is treated as a single YARN application: you use a specialized `storm jar` command to submit a topology and request the resources that will be dedicated to that topology. Behind the scenes it spins up the necessary resources to run the topology, essentially a Storm cluster where all worker slots, resources, etc. are dedicated to a single topology. That approach makes it easier to deal with things like multi-tenancy.

The way I did it was to develop YARN-aware implementations of the INimbus and ISupervisor interfaces that talked to the YARN resource manager, very similar to the approach Nathan took with storm-mesos. The ultimate goal was to implement elastic scaling of a topology based on demand, SLAs, etc., similar to what Bobby described.
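
To make the "one topology, one YARN application" idea more concrete, here is a minimal sketch of how a per-topology resource request could be expanded into container requests. It is plain Python, not the PoC's code and not the YARN or Storm API: the ContainerRequest type, the containers_for_topology function, the dedicated Nimbus container, and all sizes are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerRequest:
    """Hypothetical description of one container to ask the resource manager for."""
    role: str          # "nimbus" or "supervisor"
    memory_mb: int
    worker_slots: int


def containers_for_topology(num_workers, slots_per_supervisor,
                            worker_memory_mb, nimbus_memory_mb=1024):
    """Expand a single topology's request into the containers of a dedicated cluster.

    Assumption: the dedicated cluster has one Nimbus container plus as many
    supervisor containers as are needed to host num_workers worker slots.
    """
    if num_workers < 1 or slots_per_supervisor < 1:
        raise ValueError("num_workers and slots_per_supervisor must be >= 1")

    requests = [ContainerRequest("nimbus", nimbus_memory_mb, 0)]
    remaining = num_workers
    while remaining > 0:
        # Fill each supervisor up to its slot limit; the last one may be smaller.
        slots = min(remaining, slots_per_supervisor)
        requests.append(
            ContainerRequest("supervisor", slots * worker_memory_mb, slots))
        remaining -= slots
    return requests


# Example: a topology that asks for 5 workers, with 2 slots per supervisor.
for request in containers_for_topology(5, 2, 768):
    print(request.role, request.worker_slots, request.memory_mb)
```

Because every container in the list belongs to exactly one topology, tearing the topology down releases all of them at once, which is one reason this model can simplify multi-tenancy.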

Unfortunately, I haven’t had much time to develop it further, though I hope to revive it at some point in the future.

-Taylor



Re: Status of running storm on yarn (the yahoo project)

Posted by Bobby Evans <ev...@yahoo-inc.com>.
Mesos is very similar to YARN: it is a resource scheduler.  In the past Storm had support for Mesos through a separate repo,
 https://github.com/nathanmarz/storm-mesos
It might still work with the latest versions of Storm; I don't know.  The concept was that a special layer was installed that would watch for times when the cluster had outstanding requests and not enough resources to meet them.  It would then request that many resources from Mesos, launch supervisors on those nodes, and let the scheduler do the rest.  It works quite well for elasticity at a small scale, or when you have a lot more network bandwidth than you need.  The problem comes if Mesos, or YARN, or OpenStack, or EC2, or ... collocates your Storm topology with some big batch job that suddenly saturates the network for a few seconds to a minute: heartbeats could start to time out, traffic would not flow from one worker to another, etc.  For some topologies all you do is tune your timeouts so workers don't get shot and relaunched too frequently, and live with the noise from other stuff happening on the network.  For us, though, we have some very tight SLAs: if the data is 5 seconds old, throw it away, I cannot use it any more.
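
As a rough illustration of two ideas in the paragraph above, the sketch below computes how many extra supervisors an elasticity layer would have to request to cover a shortfall in worker slots, and checks whether a record has outlived a 5-second freshness limit. It is plain Python rather than Storm or Mesos code; the function names, parameters, and example numbers are assumptions made for the illustration.

```python
import math


def supervisors_to_request(pending_slots, free_slots, slots_per_supervisor):
    """Return how many extra supervisor nodes are needed to cover the shortfall.

    pending_slots: worker slots that topologies are waiting for
    free_slots: worker slots currently unused in the cluster
    slots_per_supervisor: worker slots each new supervisor would add
    """
    if slots_per_supervisor < 1:
        raise ValueError("slots_per_supervisor must be >= 1")
    shortfall = max(0, pending_slots - free_slots)
    return math.ceil(shortfall / slots_per_supervisor)


def is_stale(event_time_s, now_s, max_age_s=5.0):
    """True if the record is older than the freshness limit and should be dropped."""
    return (now_s - event_time_s) > max_age_s


# 10 slots requested, 4 free, 4 slots per supervisor: shortfall of 6 slots.
print(supervisors_to_request(10, 4, 4))  # 2
# A record stamped at t=100 s and seen at t=106 s is 6 s old.
print(is_stale(100.0, 106.0))            # True
```

With a freshness limit this tight, a network stall of a few seconds is enough to push otherwise healthy records past the cutoff, which is why longer timeouts alone do not help in that case.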

My current goal with Storm in this area is to have it be aware of the resources that your topology is using, the SLAs that it has, its desired budget for resources, how far over that budget it is willing to go, where it could possibly get other resources if needed (e.g. YARN, Mesos, OpenStack), and any other constraints it might have.  Storm would then take all of this into account and adjust the scheduling of your topology so that it can grow and shrink with the resources it needs to meet the SLAs it has, optionally taking some of those resources from other systems if needed.  This is still a ways out, but looking at the research that is being done in this area, it should be doable in the next year or so.
 - Bobby
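
The budget-and-overage idea described above can be sketched as a small allocation rule. This is only one possible reading, written in plain Python: the target_slots function, its parameters, and the assumption that within-budget slots come from the Storm cluster while over-budget slots are borrowed from another system are all hypothetical, not a description of any existing Storm scheduler.

```python
def target_slots(needed, budget, max_overage, external_available):
    """Decide how many slots a topology should hold right now.

    needed: slots required to meet the SLA at the current load
    budget: slots the topology would like to stay within
    max_overage: how many slots beyond the budget it is willing to use
    external_available: slots another system (e.g. YARN or Mesos) could lend

    Returns a pair (own_slots, borrowed_slots).
    """
    cap = budget + max_overage          # hard upper limit for this topology
    wanted = min(needed, cap)           # never ask for more than the cap
    own = min(wanted, budget)           # within-budget part
    borrowed = min(wanted - own, external_available)  # over-budget part
    return own, borrowed


# Needs 12 slots, budget of 8, may exceed it by 6, but only 3 can be borrowed.
print(target_slots(12, 8, 6, 3))  # (8, 3)
```

When demand drops back under the budget the rule returns zero borrowed slots, so the topology shrinks and gives the borrowed resources back.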
 



Re: Status of running storm on yarn (the yahoo project)

Posted by Jeffery Maass <ma...@gmail.com>.
I have heard Nathan Marz mention Mesos.

How is yarn / storm-yarn / slider-yarn different from Mesos?

These are the links I found to Mesos:
https://github.com/mesos/storm
https://github.com/nathanmarz/storm-mesos
http://mesos.apache.org/


Thank you for your time!

+++++++++++++++++++++
Jeff Maass <ma...@gmail.com>
linkedin.com/in/jeffmaass
stackoverflow.com/users/373418/maassql
+++++++++++++++++++++



Re: Status of running storm on yarn (the yahoo project)

Posted by Bobby Evans <ev...@yahoo-inc.com>.
storm-yarn was originally done as a proof of concept.  We had plans to take it further, but the amount of work required to make it production ready on a very heavily used cluster was more than we were willing to invest at the time.  Most of that work was around network scheduling, isolation, and prioritization, mainly in YARN itself.  There has been some work looking into this, but nothing much has happened with it.  At the same time http://slider.incubator.apache.org/ showed up and is now the preferred way to run Storm on YARN.  To get around the networking issues, most people will tag a subset of their cluster, a few racks, and only schedule Storm to run on those nodes.  Long term I really would like to revive Storm on YARN and integrate it directly into Storm.  Giving Storm and the scheduler the ability to request new resources with specific constraints opens up a lot of new possibilities.  If you want to help out, or if anyone else wants to help out with this work, I would be very happy to file some JIRAs in open source and help direct what needs to be done.
- Bobby
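
The workaround of tagging a few racks and scheduling Storm only there amounts to a filter over the cluster's hosts. The sketch below shows that filter in plain Python; the host names, the "storm" tag, and the eligible_hosts function are made up for the example, and a real cluster would express the same thing through its resource manager's own labeling mechanism rather than through code like this.

```python
def eligible_hosts(hosts, required_tag):
    """Return the sorted names of hosts that carry the required tag.

    hosts: mapping of host name -> set of tags assigned to that host
    """
    return sorted(name for name, tags in hosts.items() if required_tag in tags)


# Hypothetical cluster: two nodes on rack 1 are reserved for Storm.
cluster = {
    "rack1-node1": {"storm"},
    "rack1-node2": {"storm"},
    "rack7-node1": {"batch"},
    "rack7-node2": set(),
}

print(eligible_hosts(cluster, "storm"))  # ['rack1-node1', 'rack1-node2']
```

Keeping batch jobs off the tagged racks is what protects the Storm workers from the network saturation discussed earlier in the thread, at the cost of a smaller pool to scale into.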
 

