You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@storm.apache.org by Alok Kumbhare <ku...@usc.edu> on 2014/07/16 02:36:50 UTC

Storm Parallelism/Scaling

Hi,
I have a few questions regarding the parallelism in Storm.

I have gone through the documentation at
http://storm.incubator.apache.org/documentation/Understanding-the-parallelism-of-a-Storm-topology.html
but I have some confusion regarding executors and tasks.

>From what I understand (to simplify, assume there is only one topology
running):

Worker processes runs executor threads (one or more executor per worker).
Since each executor is an independent thread, they run in parallel.

Now the documentation says "each Executor, runs one or more task of same
component". My question is whether the tasks run in parallel or in
sequence? What does it mean to have multiple tasks of the same component
per executor?


Further, the documentation mentions: "The number of tasks for a component
is always the same throughout the lifetime of a topology, but the number of
executors (threads) for a component can change over time."

I don't understand this part. Assuming the default configuration where each
executor runs one task (i.e. number of tasks is equal to the number of
executors) and I run the topology with some initial configuration. Now, if
I use the "rebalance" command and increase the number of executors, what
happens to the number of tasks?

Please help.

Thanks,
Alok

Re: Storm Parallelism/Scaling

Posted by David DIDIER <ci...@gmail.com>.
Hi,
What I understood is that basically you set the number of tasks when the
topology starts and that cannot be changed later. If there are N tasks for
T1 threads with N > T1, that means that the tasks are indeed running
sequentially on at least some threads. When you add more power (nodes) and
rebalance the topology, let's say you have now T2 > T1 threads, you are
actually increasing the available cluster processing power since there will
be more tasks running in parallel. This limitation comes from the fact that
the "by field" grouping needs a never changing set of bolts so the same
field value will always be directed to the same bolt (task) : the "routes"
are calculated at starting time. I'm just wondering if this could be lifted
for other shuffle type... ?
If tasks = executors then you add more nodes and rebalance : some executors
will be idle.
Conclusion : set the right number of tasks at the beginning, since this is
the max parallel processing units that you will get throughout the topology
lifetime.
David


2014-07-16 2:36 GMT+02:00 Alok Kumbhare <ku...@usc.edu>:

> Hi,
> I have a few questions regarding the parallelism in Storm.
>
> I have gone through the documentation at
> http://storm.incubator.apache.org/documentation/Understanding-the-parallelism-of-a-Storm-topology.html
> but I have some confusion regarding executors and tasks.
>
> From what I understand (to simplify, assume there is only one topology
> running):
>
> Worker processes runs executor threads (one or more executor per worker).
> Since each executor is an independent thread, they run in parallel.
>
> Now the documentation says "each Executor, runs one or more task of same
> component". My question is whether the tasks run in parallel or in
> sequence? What does it mean to have multiple tasks of the same component
> per executor?
>
>
> Further, the documentation mentions: "The number of tasks for a component
> is always the same throughout the lifetime of a topology, but the number of
> executors (threads) for a component can change over time."
>
> I don't understand this part. Assuming the default configuration where
> each executor runs one task (i.e. number of tasks is equal to the number of
> executors) and I run the topology with some initial configuration. Now, if
> I use the "rebalance" command and increase the number of executors, what
> happens to the number of tasks?
>
> Please help.
>
> Thanks,
> Alok
>