Posted to user@storm.apache.org by "Dan DeCapria, CivicScience" <da...@civicscience.com> on 2014/11/04 22:45:55 UTC

On Multiple Topologies and Worker Auto-Normalization

Use Case:
I have a production Storm cluster running with six workers. Currently,
Topology A is active and consuming all six workers via conf.setNumWorkers(6).
Now, launching Topology B with six workers (again via conf.setNumWorkers(6))
reports the topology as active, but there are currently no available workers
on the cluster for Topology B to use (Topology A has already claimed them
all), and hence Topology B is doing nothing. I believe this is because Storm
requires, a priori, that the sum of the workers requested by all topologies
be <= the cluster's worker capacity.
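
For reference, the submission code is roughly the following (the class and
topology names are placeholders):

import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;

public class SubmitTopologyA {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        // ... spouts and bolts declared here ...

        Config conf = new Config();
        conf.setNumWorkers(6);  // request six worker slots on the cluster

        StormSubmitter.submitTopology("topology-A", conf, builder.createTopology());
    }
}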

I am wondering why Storm does not normalize the worker-pool allocations
when the requested total exceeds capacity, auto-adjusting as new topologies
come and go. Meaning, from the use case: since both topologies requested the
same count of six workers, and given the cluster's finite capacity of six
actual workers, the *implemented* normalized proportion of cluster resources
for each topology would be a 50% split - such that Topology A gets three
actual workers and Topology B gets three actual workers as well.

How would I go about implementing a dynamic re-allocation of topology
workers based on the proportion of expected workers given the cluster's
finite worker capacity? An idea would be allocating workers as desired for
all topologies until the cluster capacity is reached, at which point the
actual number of workers desired becomes a normalized proportion allocation
model over the cluster.
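
Concretely, the rule I have in mind is just a proportional share (a sketch
only; the method and parameter names are illustrative):

// Proportional share for one topology, e.g. (6 * 6) / 12 = 3 workers each.
static int grantedWorkers(int capacity, int requested, int totalRequested) {
    return Math.max(1, (capacity * requested) / totalRequested);
}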

Many thanks,

-Dan

Re: On Multiple Topologies and Worker Auto-Normalization

Posted by Ahmed El Rheddane <ah...@imag.fr>.
Hello everybody,

I believe that the number of workers is only relevant if you use Storm's 
default scheduler. You can also define your own scheduler by implementing 
the IScheduler interface and assigning the executors to the available 
slots as you see fit.

Ahmed
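
A minimal sketch of such a custom scheduler, assuming the
backtype.storm.scheduler API of the 0.9.x line; the class name and the
proportional-share policy are illustrative, not something Storm ships with:

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import backtype.storm.scheduler.Cluster;
import backtype.storm.scheduler.ExecutorDetails;
import backtype.storm.scheduler.IScheduler;
import backtype.storm.scheduler.Topologies;
import backtype.storm.scheduler.TopologyDetails;
import backtype.storm.scheduler.WorkerSlot;

// Illustrative only: splits the free slots among pending topologies in
// proportion to what each asked for via setNumWorkers().
public class ProportionalScheduler implements IScheduler {

    @Override
    public void prepare(Map conf) {
        // No configuration needed for this sketch.
    }

    @Override
    public void schedule(Topologies topologies, Cluster cluster) {
        // Collect topologies that still have unassigned executors.
        List<TopologyDetails> pending = new ArrayList<TopologyDetails>();
        int requestedTotal = 0;
        for (TopologyDetails t : topologies.getTopologies()) {
            if (cluster.needsScheduling(t)) {
                pending.add(t);
                requestedTotal += t.getNumWorkers();
            }
        }
        if (pending.isEmpty() || requestedTotal == 0) {
            return;
        }

        List<WorkerSlot> freeSlots = cluster.getAvailableSlots();
        int slotCursor = 0;

        for (TopologyDetails topology : pending) {
            // Grant each topology a share of the free slots proportional
            // to its requested worker count.
            int share = Math.max(1,
                    (freeSlots.size() * topology.getNumWorkers()) / requestedTotal);

            List<WorkerSlot> granted = new ArrayList<WorkerSlot>();
            while (granted.size() < share && slotCursor < freeSlots.size()) {
                granted.add(freeSlots.get(slotCursor++));
            }
            if (granted.isEmpty()) {
                continue;  // No slots left to hand out.
            }

            // Spread the topology's unassigned executors evenly over its slots.
            List<ExecutorDetails> executors =
                    new ArrayList<ExecutorDetails>(cluster.getUnassignedExecutors(topology));
            List<List<ExecutorDetails>> buckets = new ArrayList<List<ExecutorDetails>>();
            for (int i = 0; i < granted.size(); i++) {
                buckets.add(new ArrayList<ExecutorDetails>());
            }
            for (int i = 0; i < executors.size(); i++) {
                buckets.get(i % granted.size()).add(executors.get(i));
            }
            for (int i = 0; i < granted.size(); i++) {
                if (!buckets.get(i).isEmpty()) {
                    cluster.assign(granted.get(i), topology.getId(), buckets.get(i));
                }
            }
        }
    }
}

It would be enabled by setting storm.scheduler to the scheduler's fully
qualified class name in the Nimbus storm.yaml, with the jar on Nimbus's
classpath.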


Re: On Multiple Topologies and Worker Auto-Normalization

Posted by Tyson Norris <tn...@adobe.com>.
I agree the behavior is awkward.

The setNumWorkers config appears to behave as an upper limit on the number of workers that will be utilized (e.g. setNumWorkers(200) will not fail to deploy when workers < 200), so it can also impact the ORDER in which topologies must be deployed when workers < sum of all setNumWorkers configs for all topologies.

For example, I originally called setNumWorkers(200) - artificially high - so that we can scale our worker pool up to 200 without deploying new code. However, when the worker pool is 10 and this topology gets deployed first, no other topologies can be deployed, since all the workers are assigned to this single topology.

Now, while this is awkward IMHO (I would rather have some additional knobs, like “assign a percentage of workers” or “assign a topology priority”), we can get around it by using rebalance and setting the number of workers to reflect the current state of the cluster. Since you MUST rebalance to increase/decrease the number of threads (which would typically go hand in hand with changing the number of workers, I think), it's not really extra work.
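
For example, shrinking a topology to four workers would be something like the following (topology name and wait time are illustrative):

storm rebalance my-topology -w 30 -n 4

where -w is the number of seconds to wait before workers are reassigned and -n is the new worker count.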

At some point it would be nice to have some pluggable logic that controls the number of executors dynamically, so that when a worker is added, the number of executors can be altered programmatically, instead of requiring manual intervention.

Thanks
Tyson



Re: On Multiple Topologies and Worker Auto-Normalization

Posted by "Dan DeCapria, CivicScience" <da...@civicscience.com>.
Hi Nathan,

Sounds like I need to just bite the bullet and manually define the number
of workers for each topology, taking into account all topologies that will
be running concurrently. The secondary process you mentioned - using Thrift
to query worker utilization and then auto-balancing all topologies at
runtime - is interesting; I'll have to look into that further.

Thanks for your help,

-Dan



Re: On Multiple Topologies and Worker Auto-Normalization

Posted by Nathan Leung <nc...@gmail.com>.
It doesn't make sense to automatically balance worker load because you can
get yourself into strange situations (e.g. 3 workers and each topology
requests 4, or 4 workers and > 4 topologies, etc).  It would be nice if the
UI or the logs gave a better indication that there were not enough workers
to go around though.  You could write something that reads cluster
information from Nimbus over Thrift and rebalances all topologies as
necessary, but I think it would be better to just make sure that you have
enough workers available (or even better, more than enough workers) to
satisfy the needs of your applications.
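
A rough sketch of such an external balancer, assuming Storm's Thrift client
classes (NimbusClient, Nimbus.Client, ClusterSummary, RebalanceOptions); the
proportional-share policy and the class name are illustrative only:

import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.json.simple.JSONValue;

import backtype.storm.Config;
import backtype.storm.generated.ClusterSummary;
import backtype.storm.generated.Nimbus;
import backtype.storm.generated.RebalanceOptions;
import backtype.storm.generated.SupervisorSummary;
import backtype.storm.generated.TopologySummary;
import backtype.storm.utils.NimbusClient;
import backtype.storm.utils.Utils;

public class ProportionalRebalancer {

    public static void main(String[] args) throws Exception {
        // Connect to Nimbus using the storm.yaml found on the local classpath.
        Map conf = Utils.readStormConfig();
        Nimbus.Client client = NimbusClient.getConfiguredClient(conf).getClient();

        ClusterSummary summary = client.getClusterInfo();

        // Total worker slots offered by all supervisors.
        int capacity = 0;
        for (SupervisorSummary s : summary.get_supervisors()) {
            capacity += s.get_num_workers();
        }

        // Workers each topology asked for (topology.workers in its own conf).
        Map<String, Integer> requested = new HashMap<String, Integer>();
        int requestedTotal = 0;
        for (TopologySummary t : summary.get_topologies()) {
            Map topoConf = (Map) JSONValue.parse(client.getTopologyConf(t.get_id()));
            int want = ((Number) topoConf.get(Config.TOPOLOGY_WORKERS)).intValue();
            requested.put(t.get_name(), want);
            requestedTotal += want;
        }

        if (requestedTotal <= capacity || requestedTotal == 0) {
            return;  // Everything fits; nothing to normalize.
        }

        // Scale each topology down to its proportional share of the capacity.
        for (Map.Entry<String, Integer> e : requested.entrySet()) {
            int share = Math.max(1, (capacity * e.getValue()) / requestedTotal);
            RebalanceOptions options = new RebalanceOptions();
            options.set_num_workers(share);
            options.set_wait_secs(30);  // drain time before slots are reassigned
            client.rebalance(e.getKey(), options);
        }
    }
}

Run periodically against the Nimbus host, something like this would shrink
each topology to its proportional share whenever the requested total exceeds
the slots the supervisors offer.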
