Posted to dev@storm.apache.org by Aakash Khochare <aa...@grads.cds.iisc.ac.in> on 2016/08/21 13:08:36 UTC

Details about the Storm Scheduler

Greetings,

I am Aakash Khochare. I am currently a Master's student at the Indian Institute of Science, Bangalore. I know that the number of worker processes used by a topology and the number of executors can be changed using storm rebalance.  However, is there any facility that does this automatically, based on, say, the workload of a component or node? And if it doesn't exist, would such a facility be a valuable addition to Storm?
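For reference, the rebalance command referred to above takes the new worker count and per-component executor counts on the command line; a minimal example of the documented usage (the topology and component names here are made up):

    storm rebalance mytopology -w 10 -n 4 -e mybolt=8

That waits 10 seconds, then spreads the topology across 4 worker processes and runs the mybolt component on 8 executors. The question is whether Storm could drive such adjustments on its own.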


Regards,

Aakash Khochare


Re: Details about the Storm Scheduler

Posted by Roshan Naik <ro...@hortonworks.com>.
Indeed. Since Bobby pointed out that CPU consumption monitoring is impractical at the component level, my suggestion was merely intended as the next best alternative.
-roshan

From: Nathan Leung <nc...@gmail.com>
Date: Wednesday, August 24, 2016 at 7:24 AM
To: "dev@storm.apache.org" <de...@storm.apache.org>, Bobby Evans <ev...@yahoo-inc.com>
Cc: Roshan Naik <ro...@hortonworks.com>
Subject: Re: Details about the Storm Scheduler

Also, your bolt may be blocked on a call to an external resource (e.g. a DB) and thus not consuming much CPU despite a relatively high apparent load.

On Wed, Aug 24, 2016 at 10:09 AM, Bobby Evans <ev...@yahoo-inc.com.invalid> wrote:
But CGroups restricts the actual CPU usage, and scheduling takes that CPU usage into account so as not to overload a box.  You can use latency to guess how much CPU is being used, but that only works for a single-threaded bolt/spout.  Not all bolts/spouts are single-threaded.  Think about a shell bolt or a shell spout.
 - Bobby

    On Tuesday, August 23, 2016 5:38 PM, Roshan Naik <ro...@hortonworks.com> wrote:




On 8/22/16, 6:55 AM, "Bobby Evans" <ev...@yahoo-inc.com.INVALID> wrote:

>Getting the CPU used for a worker is simple, but getting the CPU used for
>individual components is not so simple, and is almost impossible for
>multi-threaded bolts/spouts.  The current scheduling assumes that CPU
>usage is the same for all instances of the same component.  As a result,
>a hot spot in one part of the topology could needlessly trigger
>rescheduling in other parts of the topology.


I think that can be worked around by looking at each component's latency
instead of its CPU consumption.

-roshan





Re: Details about the Storm Scheduler

Posted by Nathan Leung <nc...@gmail.com>.
Also, your bolt may be blocked on a call to an external resource (e.g. a DB)
and thus not consuming much CPU despite a relatively high apparent load.
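A minimal sketch of such a bolt, assuming the Storm 1.x Java API (the sleep stands in for the slow external call; the class and field names are made up):

    import org.apache.storm.topology.BasicOutputCollector;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseBasicBolt;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Tuple;
    import org.apache.storm.tuple.Values;

    // execute() spends almost all of its time blocked on I/O, so the measured execute
    // latency is high while the executor's actual CPU consumption stays near zero.
    public class BlockingLookupBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            try {
                Thread.sleep(200);  // stand-in for a slow remote DB lookup; burns no CPU
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            collector.emit(new Values(tuple.getString(0)));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("value"));
        }
    }

A scheduler watching only latency would flag this bolt as a bottleneck; one watching CPU would not.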

On Wed, Aug 24, 2016 at 10:09 AM, Bobby Evans <ev...@yahoo-inc.com.invalid>
wrote:

> But CGroups restricts the actual CPU usage, and scheduling takes that CPU
> usage into account so as not to overload a box.  You can use latency to
> guess how much CPU is being used, but that only works for a single-threaded
> bolt/spout.  Not all bolts/spouts are single-threaded.  Think about a shell
> bolt or a shell spout.
>  - Bobby
>
>     On Tuesday, August 23, 2016 5:38 PM, Roshan Naik <
> roshan@hortonworks.com> wrote:
>
>
>
>
> On 8/22/16, 6:55 AM, "Bobby Evans" <ev...@yahoo-inc.com.INVALID> wrote:
>
> >Getting the CPU used for a worker is simple, but getting the CPU used for
> >individual components is not so simple, and is almost impossible for
> >multi-threaded bolts/spouts.  The current scheduling assumes that CPU
> >usage is the same for all instances of the same component.  As a result,
> >a hot spot in one part of the topology could needlessly trigger
> >rescheduling in other parts of the topology.
>
>
> I think that can be worked around by looking at each component's latency
> instead of its CPU consumption.
>
> -roshan
>
>
>
>

Re: Details about the Storm Scheduler

Posted by Bobby Evans <ev...@yahoo-inc.com.INVALID>.
But CGroups restricts the actual CPU usage, and scheduling takes that CPU usage into account so as not to overload a box.  You can use latency to guess how much CPU is being used, but that only works for a single-threaded bolt/spout.  Not all bolts/spouts are single-threaded.  Think about a shell bolt or a shell spout.
 - Bobby 
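For what it's worth, a sketch of the multi-threaded case Bobby mentions, assuming the Storm 1.x Java API (class and field names are made up): execute() returns almost immediately, so the executor's measured latency is tiny, while the real work happens on a private thread pool that neither latency nor per-executor CPU accounting attributes to the component.

    import java.util.Map;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import org.apache.storm.task.OutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseRichBolt;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Tuple;
    import org.apache.storm.tuple.Values;

    public class AsyncHashBolt extends BaseRichBolt {
        private transient OutputCollector collector;
        private transient ExecutorService pool;

        @Override
        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
            this.pool = Executors.newFixedThreadPool(4);  // the work happens on these threads
        }

        @Override
        public void execute(Tuple tuple) {
            final String word = tuple.getString(0);
            pool.submit(() -> {
                int h = word.hashCode();          // stand-in for CPU-heavy work off the executor thread
                synchronized (collector) {        // the collector is not safe for concurrent use
                    collector.emit(tuple, new Values(h));
                    collector.ack(tuple);
                }
            });
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("hash"));
        }
    }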

    On Tuesday, August 23, 2016 5:38 PM, Roshan Naik <ro...@hortonworks.com> wrote:
 

 

On 8/22/16, 6:55 AM, "Bobby Evans" <ev...@yahoo-inc.com.INVALID> wrote:

>Getting the CPU used for a worker is simple, but getting the CPU used for
>individual components is not so simple, and is almost impossible for
>multi-threaded bolts/spouts.  The current scheduling assumes that CPU
>usage is the same for all instances of the same component.  As a result,
>a hot spot in one part of the topology could needlessly trigger
>rescheduling in other parts of the topology.


I think that can be worked around by looking at each component's latency
instead of its CPU consumption.

-roshan


   

Re: Details about the Storm Scheduler

Posted by Roshan Naik <ro...@hortonworks.com>.

On 8/22/16, 6:55 AM, "Bobby Evans" <ev...@yahoo-inc.com.INVALID> wrote:

>Getting the CPU used for a worker is simple, but getting the CPU used for
>individual components is not so simple, and is almost impossible for
>multi-threaded bolts/spouts.  The current scheduling assumes that CPU
>usage is the same for all instances of the same component.  As a result,
>a hot spot in one part of the topology could needlessly trigger
>rescheduling in other parts of the topology.


I think that can be worked around by looking at each component's latency
instead of its CPU consumption.

-roshan
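For reference, this latency-based workaround is roughly what the "capacity" figure in the Storm UI already computes; a rough sketch of the arithmetic (the class and method here are illustrative, not a Storm API), which also shows the caveat raised elsewhere in the thread: the estimate counts time spent blocked in execute() as if it were CPU.

    public final class BusyFraction {
        /**
         * Approximate fraction of a metrics window an executor spent inside execute().
         *
         * @param executed            tuples executed during the window
         * @param avgExecuteLatencyMs average execute latency in milliseconds
         * @param windowMs            window length in milliseconds
         * @return roughly 1.0 when the executor is saturated
         */
        public static double estimate(long executed, double avgExecuteLatencyMs, long windowMs) {
            return (executed * avgExecuteLatencyMs) / windowMs;
        }
    }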


Re: Details about the Storm Scheduler

Posted by Bobby Evans <ev...@yahoo-inc.com.INVALID>.
That is something that we have been thinking about for a while (elasticity in a topology).  There are a lot of obstacles to overcome, beyond just the scheduler.  

1) The metrics feedback loop is far from ideal for automatically detecting a bottleneck.  Capacity kind of works, but the way we collect and aggregate these metrics makes it very difficult to scale and to fix some of the issues.
2) Currently the only mechanism for changing the parallelism is to increase/decrease the number of executors.  By default the number of executors matches the number of tasks, but you can have several tasks share a thread of execution, a.k.a. an executor (see the sketch after this list).  This kind of works, but if you are not careful the workload can be very uneven and result in hot spots in the topology.
3) Any adjustment to the parallelism can cause the entire topology to be rescheduled, with a high probability that all of the workers will need to be relaunched.
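A minimal sketch of the task/executor decoupling described in #2, assuming the Storm 1.x Java API (the spout/bolt classes and component names are hypothetical):

    import org.apache.storm.topology.TopologyBuilder;

    public class ParallelismSketch {
        public static TopologyBuilder build() {
            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout("sentences", new SentenceSpout(), 2);  // 2 executors, 2 tasks
            builder.setBolt("split", new SplitSentenceBolt(), 4)    // 4 executor threads...
                   .setNumTasks(8)                                  // ...sharing 8 tasks
                   .shuffleGrouping("sentences");
            // "storm rebalance <topology> -e split=8" can later raise the executor count
            // up to the task count (8) without redeploying, but never beyond it.
            return builder;
        }
    }

If the 8 tasks end up unevenly loaded across the 4 threads, you get exactly the hot spots described above.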

Some of the work we have been doing on the Resource Aware Scheduler, together with CGroup resource enforcement, can help to address #2 and some of #3, in that we could adjust the CPU for a given component in a much more fine-grained manner.  If it is just the CPU that is being adjusted, and there is enough free CPU on the box to handle the change, then we could change the CGroup to increase/decrease the CPU allocation without killing any workers.  But there are still problems.
Getting the CPU used for a worker is simple, but getting the CPU used for individual components is not so simple, and is almost impossible for multi-threaded bolts/spouts.  The current scheduling assumes that CPU usage is the same for all instances of the same component.  As a result, a hot spot in one part of the topology could needlessly trigger rescheduling in other parts of the topology.
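One way to see the difficulty, for what it's worth: the JVM can report CPU time for the executor thread itself, but any work a component does on its own threads, or in a shell bolt's subprocess, never shows up against that thread. A small illustrative helper (not a Storm API):

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadMXBean;

    public final class ExecutorCpuSampler {
        private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();

        /** CPU nanoseconds consumed so far by one thread, or -1 if the JVM cannot measure it. */
        public static long cpuTimeNanos(long threadId) {
            return THREADS.isThreadCpuTimeSupported() ? THREADS.getThreadCpuTime(threadId) : -1L;
        }
    }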

And #1 is still a problem.
Any work you do on this would be appreciated and very interesting, but I want to be sure that you know what you are getting into before you get started.  Also be aware that this is something others in the community are very interested in.  If your plan is to contribute it back, including the community in the design and implementation of this feature would be really good.
 - Bobby
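As background for the Resource Aware Scheduler mentioned above: it packs executors onto nodes using per-component resource hints declared when the topology is built, which is also the granularity at which a CGroup-based adjustment would apply. A minimal sketch, assuming the Storm 1.x API and a cluster with the resource aware scheduler enabled (component classes and numbers are hypothetical):

    import org.apache.storm.Config;
    import org.apache.storm.topology.TopologyBuilder;

    public class ResourceHintsSketch {
        public static void declare(TopologyBuilder builder, Config conf) {
            builder.setSpout("feed", new FeedSpout(), 2)
                   .setCPULoad(25.0);                  // ~25% of one core per executor
            builder.setBolt("parse", new ParseBolt(), 4)
                   .shuffleGrouping("feed")
                   .setCPULoad(50.0)                   // per-executor CPU hint used when packing
                   .setMemoryLoad(512.0);              // per-executor on-heap memory hint, in MB
            conf.setTopologyWorkerMaxHeapSize(1024.0); // cap on the on-heap memory packed into one worker
        }
    }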

    On Sunday, August 21, 2016 8:34 AM, Abhishek Agarwal <ab...@gmail.com> wrote:
 

Auto scaling doesn't exist right now. And yes, adding it will definitely add
a lot of value.

On Aug 21, 2016 6:38 PM, "Aakash Khochare" <aa...@grads.cds.iisc.ac.in>
wrote:

Greetings,

I am Aakash Khochare. I am currently a Master's student at the Indian
Institute of Science, Bangalore. I know that the number of worker processes
used by a topology and the number of executors can be changed using storm
rebalance.  However, is there any facility that does this automatically,
based on, say, the workload of a component or node? And if it doesn't
exist, would such a facility be a valuable addition to Storm?


Regards,

Aakash Khochare


   

Re: Details about the Storm Scheduler

Posted by Abhishek Agarwal <ab...@gmail.com>.
Auto scaling doesn't exist right now. And yes, adding it will definitely add
a lot of value.

On Aug 21, 2016 6:38 PM, "Aakash Khochare" <aa...@grads.cds.iisc.ac.in>
wrote:

Greetings,

I am Aakash Khochare. I am currently a Master's student at the Indian
Institute of Science, Bangalore. I know that the number of worker processes
used by a topology and the number of executors can be changed using storm
rebalance.  However, is there any facility that does this automatically,
based on, say, the workload of a component or node? And if it doesn't
exist, would such a facility be a valuable addition to Storm?


Regards,

Aakash Khochare