Posted to user@storm.apache.org by 이승진 <sw...@navercorp.com> on 2015/08/07 08:47:08 UTC

one worker per machine per topology, is it still recommended?

I have read articles recommending this in the past.

I also tried running a topology with 2 workers on the same node, but
performance was poor, even though resource usage was not that high.
(Sorry, I cannot give exact metrics since I didn't keep them.)

As I remember it, the reason is that inter-process communication is expensive.

Is this recommendation still valid, and should I use Storm that way?

Re: one worker per machine per topology, is it still recommended?

Posted by Kishore Senji <ks...@gmail.com>.

The two recommendations, 1) to have more workers per Supervisor (for the
bulkhead pattern) and 2) one worker/topology/machine, create some
confusion, I guess, about whether we are failing to use resources to the
fullest when there are free slots in the cluster.

Being able to use more than one worker/topology/machine itself implies
that there are free slots in the cluster, so I guess the question is: why
not use them when they are available?

But because of IPC, it makes sense not to use more than one
worker/topology/machine and instead to make the parallelism a multiple of
the number of workers. For example, a cluster with 3 supervisors and 2
workers/supervisor has 6 slots. If the topology needs 4 workers, it will
use two slots on one node for the same topology. Instead, we can use only
3 workers and adjust the parallelism to suit 3 workers: a bolt parallelism
of 4 with 4 workers becomes 6 (rounded up) with 3 workers, so instead of
one executor/worker in the 4-worker scenario, we get 2 executors/worker
with 3 workers.

The worker process running the topology will consume more CPU, taking the
share that the other worker slot on the node would have used. Since that
slot is free anyway, the worker process running the topology can make use
of the extra CPU, without the overhead of an extra process. The same
applies to heap space: we can oversubscribe heap for the workers so that
they can grow into it when memory is available. Later, if another topology
is deployed, the other worker slot will be filled as well, and depending on
whether the resources are oversubscribed, we might have to scale the
cluster horizontally and adjust the number of workers used per topology
accordingly.
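
The arithmetic in the example above can be sketched in a few lines of
Python. This is only an illustration: the helper names are hypothetical
and not part of Storm, and it assumes executors are spread evenly across
workers.

```python
import math


def round_up_to_multiple(parallelism: int, workers: int) -> int:
    """Smallest multiple of `workers` that is >= `parallelism`."""
    return math.ceil(parallelism / workers) * workers


def executors_per_worker(parallelism: int, workers: int) -> int:
    """Executors each worker hosts, assuming an even spread (an assumption)."""
    return round_up_to_multiple(parallelism, workers) // workers


supervisors = 3
slots_per_supervisor = 2
total_slots = supervisors * slots_per_supervisor  # slots in the cluster

# One worker per machine: use 3 workers instead of 4.
workers = supervisors
bolt_parallelism = round_up_to_multiple(4, workers)  # 4 rounded up for 3 workers
per_worker = executors_per_worker(4, workers)

print(total_slots, bolt_parallelism, per_worker)
```

With 4 workers the same bolt would keep a parallelism of 4, one executor
per worker, at the cost of two workers sharing a node.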

Re: one worker per machine per topology, is it still recommended?

Posted by 임정택 <ka...@gmail.com>.

In addition to Matthias's opinion,
it's true that inter-process communication is costly: when the destination
is in the same worker process, Storm also skips serialization /
deserialization. Having fewer workers also reduces ZK load.

But you need to configure more heap memory for each worker when using fewer
workers, and without proper GC tuning this may incur long stop-the-world
(STW) pauses.
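
As a rough illustration of the heap point: if the executors need about the
same total heap however they are packed, each worker's share grows as the
worker count shrinks. The function name and the 512 MB per-executor figure
below are made-up assumptions for this sketch, not Storm defaults or
measurements.

```python
def heap_per_worker_mb(total_executors: int, workers: int, mb_per_executor: int) -> int:
    """Heap each worker needs if executors are spread as evenly as possible."""
    # Size for the most loaded worker (ceiling division).
    executors_on_busiest_worker = -(-total_executors // workers)
    return executors_on_busiest_worker * mb_per_executor


# Same 12 executors, packed into fewer and fewer workers.
for workers in (6, 3):
    print(workers, "workers ->", heap_per_worker_mb(12, workers, 512), "MB each")
```

A larger heap per JVM is where the GC tuning concern comes in.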

Best,
Jungtaek Lim (HeartSaVioR)

-- 
Name : 임 정택
Blog : http://www.heartsavior.net / http://dev.heartsavior.net
Twitter : http://twitter.com/heartsavior
LinkedIn : http://www.linkedin.com/in/heartsavior

Re: one worker per machine per topology, is it still recommended?

Posted by "Matthias J. Sax" <mj...@informatik.hu-berlin.de>.

IMHO, it's a question of fault tolerance.

If you have a single worker per node per topology, the impact of a failure
(e.g., a rack going down) on any one topology is low. Of course, all
topologies using the failed rack are affected.

If you use multiple workers for a single topology on the same
supervisor, the impact is high. In fact, if you use a single supervisor,
the whole topology goes down. On the other hand, it might affect fewer
topologies...

Thus, it's a tradeoff you need to weigh for yourself.
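
To make the tradeoff concrete, here is a small Python sketch that computes
what fraction of each topology's workers is lost when one node fails. The
function name, node names, and placements are hypothetical and chosen only
for illustration; nothing here is a Storm API.

```python
from collections import Counter
from typing import Dict, List


def failure_impact(placement: Dict[str, List[str]], failed_node: str) -> Dict[str, float]:
    """Fraction of each topology's workers lost when `failed_node` goes down.

    `placement` maps a node name to the topology that owns each worker
    slot in use on that node.
    """
    total = Counter(t for workers in placement.values() for t in workers)
    lost = Counter(placement.get(failed_node, []))
    return {topology: lost[topology] / total[topology] for topology in total}


# One worker per node per topology: a node failure touches both topologies,
# but each loses only a share of its workers.
spread = {"node1": ["A", "B"], "node2": ["A", "B"], "node3": ["A", "B"]}

# Both workers of a topology packed onto one node: fewer topologies are hit,
# but the one that is hit loses all of its workers.
packed = {"node1": ["A", "A"], "node2": ["B", "B"]}

print(failure_impact(spread, "node1"))
print(failure_impact(packed, "node1"))
```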

-Matthias

Re: one worker per machine per topology, is it still recommended?

Posted by Denis DEBARBIEUX <dd...@norsys.fr>.

Hi,

I always do it that way. I would be interested to know whether this
invariant has been updated.

Denis
