You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@storm.apache.org by Vincenzo Gulisano <vi...@gmail.com> on 2014/03/30 18:35:33 UTC

Why more than 1 worker per supervisor?

Hi,
If you have: (1) a topology composed by a certain number of spouts and
bolts, (2) each task assigned to a single executor and (3) a multi-core
machine that acts as supervisor, how many workers should you define for the
latter?

If my understanding is correct, you will have distinct threads running each
spout and bolt independently of the number of workers you span.
Also, the inter-worker communication between tasks will be slower than the
intra-worker one, isn't it?

Is there any reason related to the application throughput for having
multiple workers?

Thank you very much!

Re: Why more than 1 worker per supervisor?

Posted by Jon Logan <jm...@buffalo.edu>.
The actual CPU utilization is determined by contention for the processor
time by the operating system. While it's likely that T1 would use more
resources, it's not guaranteed. The workers could be running at different
scheduling priority levels, or one worker may be blocking on IO or
something.


Additionally, you can set the number of tasks independently from the number
of threads. So one worker could have 100 tasks, but be running them on only
2 threads.


On Sun, Mar 30, 2014 at 2:01 PM, Vincenzo Gulisano <
vincenzo.gulisano@gmail.com> wrote:

> Hi Jon, thank you for your answer!
>
> I imagined fault tolerance and number of topologies would play a role wrt
> to the number of workers.
>
> I hoped the number or workers could be used to limit the resource
> utilization of a topology for a given supervisor. That is, if a supervisor
> node has 4 workers and the topology runs on 1, then the topology will use
> 1/4 of the supervisor resources (e.g., 1 out of 4 cores). Since the number
> of executors/threads is independent of the number of workers, that will not
> be the case, isn't it?
>
> As an example:
> 1) 1 supervisor
> 2) 4 topologies T1 (10 tasks), T2 (1 task), T3 (1 task) and T4 (1 task),
> 3) 4 workers (each topology assigned to 1 distinct worker)
> and all tasks perform the same amount of work.
> At their maximum throughput, topology T1 will consume 10/13 of the
> supervisor resources.
> If this correct?
>
> Thanks again!
>
>
> On 30 March 2014 19:11, Jon Logan <jm...@buffalo.edu> wrote:
>
>> All tasks on a worker run in the same JVM. This can be good for
>> performance in some cases (like a localShuffle), but can cause issues. If
>> you have 100 tasks, and one of them runs wild, and crashes, that will take
>> down the entire JVM. Similarly for memory -- if you run out of memory, it
>> can take out more.
>>
>> Some applications also run into issues co-existing multiple instances in
>> the same JVM (usually relying on static variables).
>>
>>
>> Additionally, workers cannot be shared across different topologies. So if
>> you only have one worker per machine, you can't run multiple topologies on
>> that machine. There's no real good rule of thumb for number of workers vs
>> size of workers. It's all application-dependent.
>>
>>
>> On Sun, Mar 30, 2014 at 12:35 PM, Vincenzo Gulisano <
>> vincenzo.gulisano@gmail.com> wrote:
>>
>>> Hi,
>>> If you have: (1) a topology composed by a certain number of spouts and
>>> bolts, (2) each task assigned to a single executor and (3) a multi-core
>>> machine that acts as supervisor, how many workers should you define for the
>>> latter?
>>>
>>> If my understanding is correct, you will have distinct threads running
>>> each spout and bolt independently of the number of workers you span.
>>> Also, the inter-worker communication between tasks will be slower than
>>> the intra-worker one, isn't it?
>>>
>>> Is there any reason related to the application throughput for having
>>> multiple workers?
>>>
>>> Thank you very much!
>>>
>>
>>
>

Re: Why more than 1 worker per supervisor?

Posted by Vincenzo Gulisano <vi...@gmail.com>.
Hi Jon, thank you for your answer!

I imagined fault tolerance and number of topologies would play a role wrt
to the number of workers.

I hoped the number or workers could be used to limit the resource
utilization of a topology for a given supervisor. That is, if a supervisor
node has 4 workers and the topology runs on 1, then the topology will use
1/4 of the supervisor resources (e.g., 1 out of 4 cores). Since the number
of executors/threads is independent of the number of workers, that will not
be the case, isn't it?

As an example:
1) 1 supervisor
2) 4 topologies T1 (10 tasks), T2 (1 task), T3 (1 task) and T4 (1 task),
3) 4 workers (each topology assigned to 1 distinct worker)
and all tasks perform the same amount of work.
At their maximum throughput, topology T1 will consume 10/13 of the
supervisor resources.
If this correct?

Thanks again!


On 30 March 2014 19:11, Jon Logan <jm...@buffalo.edu> wrote:

> All tasks on a worker run in the same JVM. This can be good for
> performance in some cases (like a localShuffle), but can cause issues. If
> you have 100 tasks, and one of them runs wild, and crashes, that will take
> down the entire JVM. Similarly for memory -- if you run out of memory, it
> can take out more.
>
> Some applications also run into issues co-existing multiple instances in
> the same JVM (usually relying on static variables).
>
>
> Additionally, workers cannot be shared across different topologies. So if
> you only have one worker per machine, you can't run multiple topologies on
> that machine. There's no real good rule of thumb for number of workers vs
> size of workers. It's all application-dependent.
>
>
> On Sun, Mar 30, 2014 at 12:35 PM, Vincenzo Gulisano <
> vincenzo.gulisano@gmail.com> wrote:
>
>> Hi,
>> If you have: (1) a topology composed by a certain number of spouts and
>> bolts, (2) each task assigned to a single executor and (3) a multi-core
>> machine that acts as supervisor, how many workers should you define for the
>> latter?
>>
>> If my understanding is correct, you will have distinct threads running
>> each spout and bolt independently of the number of workers you span.
>> Also, the inter-worker communication between tasks will be slower than
>> the intra-worker one, isn't it?
>>
>> Is there any reason related to the application throughput for having
>> multiple workers?
>>
>> Thank you very much!
>>
>
>

Re: Why more than 1 worker per supervisor?

Posted by Jon Logan <jm...@buffalo.edu>.
All tasks on a worker run in the same JVM. This can be good for performance
in some cases (like a localShuffle), but can cause issues. If you have 100
tasks, and one of them runs wild, and crashes, that will take down the
entire JVM. Similarly for memory -- if you run out of memory, it can take
out more.

Some applications also run into issues co-existing multiple instances in
the same JVM (usually relying on static variables).


Additionally, workers cannot be shared across different topologies. So if
you only have one worker per machine, you can't run multiple topologies on
that machine. There's no real good rule of thumb for number of workers vs
size of workers. It's all application-dependent.


On Sun, Mar 30, 2014 at 12:35 PM, Vincenzo Gulisano <
vincenzo.gulisano@gmail.com> wrote:

> Hi,
> If you have: (1) a topology composed by a certain number of spouts and
> bolts, (2) each task assigned to a single executor and (3) a multi-core
> machine that acts as supervisor, how many workers should you define for the
> latter?
>
> If my understanding is correct, you will have distinct threads running
> each spout and bolt independently of the number of workers you span.
> Also, the inter-worker communication between tasks will be slower than the
> intra-worker one, isn't it?
>
> Is there any reason related to the application throughput for having
> multiple workers?
>
> Thank you very much!
>