Posted to dev@storm.apache.org by Jerry Peng <je...@gmail.com> on 2016/01/13 06:56:48 UTC

JStorm CGroup

Hello everyone,

This question is directed more towards the people who worked on
JStorm.  If I recall correctly, JStorm offers some sort of resource
isolation through CGroups.  What kind of support does JStorm offer for
resource isolation? Can someone elaborate on this feature in JStorm?

Best,

Jerry

Re: JStorm CGroup

Posted by "Boyang(Jerry) Peng" <je...@yahoo-inc.com.INVALID>.
Hello Everyone,
Currently at Yahoo, we want to enable the Resource Aware Scheduler we built to have cgroup support. The CGroup code that is part of JStorm looks good, and perhaps we can modify it slightly so that the Resource Aware Scheduler can interact with it. What I would like to do is modify the CGroup code that already exists in JStorm so that it can start JVM workers limited to the amount of resources the Resource Aware Scheduler has allocated for each worker, and move that code into Storm. I would like to have a discussion (especially with the people who worked on JStorm) about how we can integrate support for the Resource Aware Scheduler into the existing CGroup code. Also, I know the folks at Alibaba are working on converting supervisor.clj to Java, which is tied to launching workers and in the future would include CGroups. What is the status of that?
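
To make this concrete, here is a hypothetical sketch (not existing Storm or JStorm code; all class and method names are my own assumptions) of the shape such an integration point could take: the supervisor hands the Resource Aware Scheduler's per-worker allocation to a cgroup-aware launcher, which sizes the worker's cgroup before the JVM starts.

    // Hypothetical sketch only -- not the existing JStorm CgroupManager API. It illustrates how a
    // per-worker allocation from the Resource Aware Scheduler could be handed to a cgroup manager
    // before the worker JVM is started. All names here are illustrative assumptions.
    public class RasCgroupLaunchSketch {

        /** Per-worker allocation, roughly as a resource-aware scheduler might express it. */
        public static final class WorkerAllocation {
            public final double cpuPercent;   // 100.0 == one core
            public final long memoryMb;
            public WorkerAllocation(double cpuPercent, long memoryMb) {
                this.cpuPercent = cpuPercent;
                this.memoryMb = memoryMb;
            }
        }

        /** The role the JStorm cgroup code could play: turn an allocation into an isolated launch. */
        public interface CgroupWorkerLauncher {
            // Create/size the per-worker cgroup and return a command prefix (e.g. "cgexec -g ...")
            // that the supervisor prepends to the normal worker JVM command line.
            String prepareWorker(String workerId, WorkerAllocation allocation) throws Exception;
            void releaseWorker(String workerId) throws Exception;
        }

        /** Supervisor-side launch path using the pieces above. */
        static Process launchWorker(CgroupWorkerLauncher launcher, String workerId,
                                    WorkerAllocation allocation, String workerCommand) throws Exception {
            String prefix = launcher.prepareWorker(workerId, allocation);
            return new ProcessBuilder("/bin/sh", "-c", prefix + " " + workerCommand)
                    .inheritIO()
                    .start();
        }
    }
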
Best,
Boyang Jerry Peng
 


Re: JStorm CGroup

Posted by Bobby Evans <ev...@yahoo-inc.com.INVALID>.
Yes, I am familiar with the Mesos API too.  The problem of how long it takes for resources to become available is fairly common on both YARN and Mesos.  Neither of them offers gang scheduling either, so no matter what, it is going to require special support from the scheduler, and it will probably also require some compromises in the quality of the scheduling.
 - Bobby 


Re: JStorm CGroup

Posted by Erik Weathers <ew...@groupon.com.INVALID>.
Thanks for the detailed response Bobby.

Please include me in discussions about the pluggable interfaces when you
get to that step, as I can provide some insight from working on the
Storm-to-Mesos integration for a while.  e.g., Mesos client applications
("frameworks") don't "request external resources" the way they do in YARN;
instead they wait for Mesos to offer them resources.  So that behavior
difference has implications for the Nimbus scheduler (e.g., it shouldn't
assume all potential resources are present when getting the available
slots, as it might take some time for all available resources to percolate
from Mesos).
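
As a minimal sketch of that offer-driven model (using the org.apache.mesos
Java bindings; only the one relevant callback is shown, and the
"neededRightNow" decision is a placeholder for whatever state the
integration would keep):

    // Sketch of the Mesos offer model: Mesos pushes offers to the framework via resourceOffers();
    // the framework never "requests" resources up front. This mirrors the
    // org.apache.mesos.Scheduler#resourceOffers callback; the rest of the Scheduler interface and
    // all error handling are omitted for brevity.
    import java.util.List;
    import org.apache.mesos.Protos;
    import org.apache.mesos.SchedulerDriver;

    public class OfferModelSketch {
        public void resourceOffers(SchedulerDriver driver, List<Protos.Offer> offers) {
            for (Protos.Offer offer : offers) {
                boolean neededRightNow = false;  // placeholder: would consult pending topology needs
                if (neededRightNow) {
                    // build Protos.TaskInfo for a Storm worker and launch it against this offer
                } else {
                    // declining promptly avoids hoarding offers and starving other frameworks
                    driver.declineOffer(offer.getId());
                }
            }
        }
    }
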

That being said, as long as the "RAS" (resource aware scheduler) + cgroups
feature allows for dynamically partitioned, per-topology-declared
resources across a cluster of hosts, then it sounds like a vast improvement
for multi-tenancy in native Storm.  (i.e., the "isolation scheduler" of
storm-0.8.2 and the "multi-tenant scheduler with resource limits" of
storm-0.10.0 are too static, isolating topologies at a host level instead
of allowing individual topologies to declare the amount of CPU/memory
resources they need.)

Thanks!

- Erik

Re: JStorm CGroup

Posted by Bobby Evans <ev...@yahoo-inc.com.INVALID>.
I would love to see true support for Mesos, YARN, OpenStack, etc. added, but I also see stand-alone mode offering a lot more flexibility, especially in the area of scheduling, than a two-level scheduler can currently offer.  It is on my roadmap to look into after the JStorm migration (just started), Resource Aware Scheduling (almost done; needs testing and better isolation), and adding automatic elasticity around topology-specified SLAs (working with a few researchers on some prototypes in this area).

To support running on other cluster technologies in a proper way, we need to provide pluggability in a few different places.
First we need a way for a scheduler/cluster to request topology-specific dedicated resources, and for Nimbus to provision, manage, monitor, and ideally resize (for elasticity) those resources.  With security and resource aware scheduling, we need these external requests to be on a per-topology basis, not bolted on like they are now.  This would also necessitate updating the schedulers so that they can take advantage of these new APIs for requesting external resources, either when a topology explicitly asks to be on a given external resource, or optionally when dedicated resources are no longer available and the topology has specified the proper configurations/credentials to allow it to run on those external resources.

That handles scheduling, but there are some additional features that Storm offers which other systems don't yet offer, and many never will.  For example, the Storm blob store API is similar to the dist cache in YARN, but with it we can do in-place replacement without relaunching.  We also favor fast fail, and I don't think all of these types of clusters will, nor should, offer the process monitoring and re-spawning needed for it.  As such we would need some sort of supervisor that would also run under YARN/Mesos, etc. to provide this extra functionality.  I have not totally thought through what it would need from a pluggability standpoint to make that work.  There is also the logviewer, which does more than just logs, so we would need some pluggable way to point people to where their logs/artifacts are, and to monitor the resource usage of the logs (perhaps that part should move off to the supervisor). All of that seems like a lot more work compared to providing a pluggable interface in the supervisor that would allow it to provision, manage, monitor, and again possibly resize, local workers.  In fact I see a lot of potential overlap between the two of them and the pluggability that would be needed in the supervisor for running on Mesos, YARN, etc.
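
As a strawman for what that pluggable supervisor interface could look like (none of this exists today; the interface and type names are assumptions), something along these lines would cover bare processes, cgroup-isolated workers, and YARN/Mesos-backed workers alike:

    // Hypothetical sketch of the pluggable supervisor interface described above; none of these
    // types exist in Storm, and the names (IWorkerContainerManager, WorkerSpec) are assumptions.
    // The point is the shape: provision, monitor, resize, and release local workers, regardless of
    // what technology backs them.
    import java.util.Map;

    public interface IWorkerContainerManager {

        /** Per-worker request: what the scheduler decided this worker should get. */
        final class WorkerSpec {
            public final String topologyId;
            public final int port;
            public final double cpuPercent;  // 100.0 == one core
            public final long memoryMb;
            public WorkerSpec(String topologyId, int port, double cpuPercent, long memoryMb) {
                this.topologyId = topologyId;
                this.port = port;
                this.cpuPercent = cpuPercent;
                this.memoryMb = memoryMb;
            }
        }

        void prepare(Map<String, Object> conf);

        /** Provision isolation (e.g. a cgroup) and start the worker JVM; returns a worker id. */
        String launchWorker(WorkerSpec spec) throws Exception;

        /** True if the worker is still healthy; the supervisor re-spawns on failure (fast fail). */
        boolean isWorkerAlive(String workerId);

        /** Adjust limits in place for elasticity, where the backing technology supports it. */
        void resizeWorker(String workerId, WorkerSpec newSpec) throws Exception;

        /** Stop the worker and release any resources/isolation that were provisioned for it. */
        void killWorker(String workerId) throws Exception;
    }
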

- Bobby 


Re: JStorm CGroup

Posted by Erik Weathers <ew...@groupon.com.INVALID>.
Perhaps rather than just bolting on "cgroup support", we could instead open
a dialogue about having Mesos support be a core feature of Storm.

The current integration is a bit unwieldy & hackish at the moment, arising
from the conflicting natures of Mesos and Storm w.r.t. scheduling of
resources.  i.e., Storm assumes you have existing "slots" for running
workers on, whereas Mesos is more dynamic, requiring frameworks that run on
top of it to tell Mesos just how many resources (CPUs, Memory, etc.) are
needed by the framework's tasks.

One example of an issue with Storm-on-Mesos:  the Storm logviewer is
completely busted when you are using Mesos; I filed a ticket with a
description of the issue and proposed modifications to allow it to function:

   - https://issues.apache.org/jira/browse/STORM-1342

Furthermore, there are fundamental behaviors in Storm that don't mesh well
with Mesos:

   - the interfaces of INimbus (allSlotsAvailableForScheduling(),
   assignSlots(), getForcedScheduler(), etc.; see the rough sketch after
   this list) make it difficult to create an ideal Mesos integration
   framework, since they don't allow the Mesos integration code to *really*
   know what's going on from the Nimbus's perspective. e.g.,
      - knowing which topologies & how many workers need to be scheduled at
      any given moment.
      - since the integration code cannot know what is actually needed to
      be run when it receives offers from Mesos, it just hoards those offers,
      leading to resource starvation in the Mesos cluster.
   - the "fallback" behavior of allowing the topology to settle for having
   less worker processes than requested should be disable-able.  For carefully
   tuned topologies it is quite bad to run on less than the expected number of
   worker processes.
      - also, this behavior endangers the idea of having the Mesos
      integration code *only* hoard Mesos offers after a successful round-trip
      through the allSlotsAvailableForScheduling() polling calls (i.e., only
      hoard when we know there are pending topologies).  It's dangerous because
      while we wait for another call to allSlotsAvailableForScheduling(), the
      Nimbus may have decided that it's okie dokie to use fewer than the
      requested number of worker processes.
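
For reference, the INimbus interface mentioned above looks roughly like this
(paraphrased from backtype.storm.scheduler in the Storm releases of this era;
exact signatures may differ slightly, and the referenced types -- WorkerSlot,
SupervisorDetails, Topologies, IScheduler -- live in that same package):

    // Rough paraphrase of backtype.storm.scheduler.INimbus; signatures may differ slightly from
    // the actual source. An implementation only ever sees slot-level questions and answers, which
    // is why the Mesos integration cannot tell what Nimbus actually intends to schedule.
    import java.util.Collection;
    import java.util.Map;
    import java.util.Set;

    public interface INimbus {
        void prepare(Map stormConf, String schedulerLocalDir);

        // "Which slots could workers run on right now?" -- called during scheduling.
        Collection<WorkerSlot> allSlotsAvailableForScheduling(
                Collection<SupervisorDetails> existingSupervisors,
                Topologies topologies,
                Set<String> topologiesMissingAssignments);

        // Called after scheduling with the slots that were actually assigned, per topology id.
        void assignSlots(Topologies topologies,
                         Map<String, Collection<WorkerSlot>> newSlotsByTopologyId);

        String getHostName(Map<String, SupervisorDetails> existingSupervisors, String nodeId);

        IScheduler getForcedScheduler();
    }
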

I'm sure there are other issues that I can conjure up, but those are the
major ones that came to mind instantly.  I'm happy to explain more about
this, since I realize the above bulleted info may lack context.

I wish I knew something about how Twitter's new Heron project addresses the
concerns above since it comes with Mesos support out-of-the-box, but it's
unclear at this point what they're doing until they open source it.

Thanks!

- Erik


RE: JStorm CGroup

Posted by "刘键(Basti Liu)" <ba...@alibaba-inc.com>.
Hi Bobby & Jerry,

Yes, JStorm implements generic cgroup support, but currently only CPU control is enabled when starting a worker.

Regards
Basti

Re: JStorm CGroup

Posted by Bobby Evans <ev...@yahoo-inc.com.INVALID>.
Jerry,
I think most of the code you are going to want to look at is here:
https://github.com/apache/storm/blob/jstorm-import/jstorm-core/src/main/java/com/alibaba/jstorm/daemon/supervisor/CgroupManager.java
The back end for most of it seems to come from:

https://github.com/apache/storm/tree/jstorm-import/jstorm-core/src/main/java/com/alibaba/jstorm/container

which looks like it implements somewhat generic cgroup support.
 - Bobby 


RE: JStorm CGroup

Posted by "刘键(Basti Liu)" <ba...@alibaba-inc.com>.
Hi Jerry,

Currently, JStorm supports controlling the upper limit of CPU time for a worker via cpu.cfs_period_us & cpu.cfs_quota_us in cgroups.
e.g., with cpu.cfs_period_us=100000 and cpu.cfs_quota_us=3*100000, the cgroup will limit the corresponding process to at most 300% CPU (3 cores).
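
In file-system terms, that amounts to something like the following minimal sketch (assuming a cgroup v1 cpu controller mounted at /sys/fs/cgroup/cpu; this is not JStorm's actual code, and the "storm" parent directory is an assumption):

    // Minimal sketch of applying the CFS quota described above to a worker process, assuming a
    // cgroup v1 cpu controller mounted at /sys/fs/cgroup/cpu. Error handling omitted for brevity.
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class CfsQuotaSketch {
        public static void capWorkerCpu(String workerId, long pid, int cores) throws IOException {
            Path dir = Files.createDirectories(Paths.get("/sys/fs/cgroup/cpu/storm", workerId));
            long periodUs = 100_000L;            // cpu.cfs_period_us
            long quotaUs = cores * periodUs;     // cpu.cfs_quota_us, e.g. 3 cores -> 300000 (300% CPU)
            write(dir.resolve("cpu.cfs_period_us"), Long.toString(periodUs));
            write(dir.resolve("cpu.cfs_quota_us"), Long.toString(quotaUs));
            write(dir.resolve("cgroup.procs"), Long.toString(pid));  // move the worker into the cgroup
        }

        private static void write(Path file, String value) throws IOException {
            Files.write(file, value.getBytes(StandardCharsets.UTF_8));
        }
    }
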

Regards
Basti
