Posted to user@storm.apache.org by ab...@gmail.com on 2016/04/21 03:17:55 UTC

RE: How to control Affinity of Executor to a Worker

Hi,

I have set up 4 workers on each machine. In my topology, there is one bolt that needs a lot of memory, so ideally I don’t want more than one instance of it scheduled on any machine. Computationally it is fast, so it sustains good throughput when running. Let’s call it BoltHighTension.
My other bolts, however, are very lightweight, and I can run them with a lot of parallelism.

How do I ensure that, with 20 supervisors, there is never more than one ‘BoltHighTension’ instance on each machine? I want to give this bolt a parallelism hint of 20, but I notice that sometimes more than one instance gets allocated on the same machine. (A machine can handle 2, but paging then becomes a performance problem.)
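
The constraint I am after can be stated concretely. Here is a small Python sketch (hypothetical host and executor names, not Storm API) that checks whether a given assignment satisfies "at most one BoltHighTension executor per host":

```python
from collections import Counter

def violates_one_per_host(assignment):
    """assignment maps executor id -> host name.

    Returns the hosts holding more than one BoltHighTension
    executor (an empty list means the constraint holds)."""
    per_host = Counter(assignment.values())
    return [h for h, n in per_host.items() if n > 1]

# 20 executors spread over 20 hosts: constraint holds
ok = {f"bolt-high-tension-{i}": f"host-{i}" for i in range(20)}
print(violates_one_per_host(ok))   # []

# executor 7 also lands on host-3: violation
bad = dict(ok, **{"bolt-high-tension-7": "host-3"})
print(violates_one_per_host(bad))  # ['host-3']
```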

Thanks for your help/advice/hints.

Thanks
-Abhishek

Sent from Mail for Windows 10



RE: How to control Affinity of Executor to a Worker

Posted by ab...@gmail.com.
@Vijay: The memory is static; it is not consumed per tuple, the bolt just loads some large datasets. Refactoring is certainly an option, but it would require plenty of code changes and would also cause a lot of data to be transferred over the wire.



@John: That is correct, but my scenario is a bit different. I have not tested it, so I am not sure how it would work.

Say I set a parallelism of 20 on the ‘HighTensionBolt’, with 20 workers and exactly 1 slot per machine. If I also have 10 other bolts, there is no way to ensure that all 20 instances of HighTensionBolt are distributed evenly.



Thanks
-Abhishek

Sent from Mail for Windows 10



Re: How to control Affinity of Executor to a Worker

Posted by ho...@gmail.com.
That's interesting. When I match the number of executors to the number of workers, I always get exactly one executor per worker, at least with versions 0.9.4-0.9.6. 
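
John's observation is consistent with a simple round-robin placement model. A minimal sketch (this is an illustration under that assumption, not Storm's actual EvenScheduler code):

```python
def round_robin(executors, slots):
    """Assign executors to worker slots in round-robin order."""
    placement = {s: [] for s in slots}
    for i, e in enumerate(executors):
        placement[slots[i % len(slots)]].append(e)
    return placement

slots = [f"host-{i}:6700" for i in range(20)]
heavy = [f"high-tension-{i}" for i in range(20)]
placement = round_robin(heavy, slots)

# with executors == slots, every slot gets exactly one executor
print(all(len(v) == 1 for v in placement.values()))  # True
```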

--John

Sent from my iPhone


Re: How to control Affinity of Executor to a Worker

Posted by Vijay Patil <vi...@gmail.com>.
If your topology is the only topology running on the 20-node cluster, then I think you can reduce the slots per supervisor to 1 by setting "supervisor.slots.ports" (list just one port number there) in storm.yaml. If you are running multiple topologies on this 20-node cluster, that solution may not work, and you would need something else, such as writing your own metadata-aware custom scheduler by implementing backtype.storm.scheduler.IScheduler.
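
On each supervisor node, the storm.yaml change described above would look something like this (6700 is just the customary first slot port; any single free port works):

```yaml
supervisor.slots.ports:
    - 6700
```

Each supervisor then advertises exactly one worker slot, so at most one worker process runs per machine.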

But I think there may be some scope for refactoring that HighTensionBolt to reduce its memory usage. Does all the memory used by that bolt remain static across tuples? Or is it the execute() method that consumes that much memory each time and discards it once tuple processing is done?
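
The core of such a metadata-aware scheduler, independent of the IScheduler plumbing, is a greedy pass that places each heavy executor on a host that does not already hold one. A Python sketch of that logic (hypothetical names; a real implementation would do this inside backtype.storm.scheduler.IScheduler's schedule method):

```python
def place_heavy_executors(executors, hosts):
    """Greedy one-per-host placement for memory-hungry executors.

    Raises if there are more heavy executors than hosts, since
    the one-per-host constraint cannot be satisfied then."""
    if len(executors) > len(hosts):
        raise ValueError("more heavy executors than hosts")
    used = set()
    placement = {}
    for e in executors:
        host = next(h for h in hosts if h not in used)
        used.add(host)
        placement[e] = host
    return placement

hosts = [f"host-{i}" for i in range(20)]
heavy = [f"high-tension-{i}" for i in range(20)]
p = place_heavy_executors(heavy, hosts)

# every heavy executor sits on its own host
print(len(set(p.values())) == len(p))  # True
```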
