Posted to user@storm.apache.org by Adrien Carreira <ac...@reportlinker.com> on 2016/06/09 12:22:06 UTC

Another parallelism question

Hi,

After a month of building a topology on Storm, I have one question about
parallelism that I can't answer.

I've developed my topology and tested it on a cluster with two nodes.

My parallelism_hint values are OK, and everything is fine.

My question is: if I need to scale the number of workers in the topology, to
have more workers doing the same thing, how can I achieve that without
killing and restarting the topology?

Thanks for your reply

Re: Another parallelism question

Posted by "Matthias J. Sax" <mj...@apache.org>.
If you rebalance to 2 workers via

$ storm rebalance topology-name -n new-num-workers

the number of threads/executors does not change. Thus, one worker will
execute one thread, while the other will execute two. Each thread still
executes 2 tasks.

You can also change the number of executors using the "-e" flag of the
rebalance command.

However, the behavior you want, having one task when you are on one
machine and two tasks after rebalancing to two machines, is not possible.
The number of tasks cannot be changed dynamically; you would need to
kill and redeploy your topology to get this behavior.

However, having more tasks does not result in measurable overhead, so
why should it be a problem to have more tasks?


-Matthias
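To make the executor/task arithmetic above concrete, here is a small plain-Java model (not Storm code; it needs no Storm jar). It assumes that a component's tasks are split into contiguous, near-equal ranges per executor, that a component never gets more executors than tasks, and that executors are placed on workers round-robin. Acker/system executors and the real scheduler's choices are ignored, so actual placement on a cluster may differ. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model, not Storm code: tasks are fixed at submit time, a rebalance
// can change executors (threads) and workers, and executors are spread over
// workers round-robin. Acker/system executors and scheduler details are ignored.
final class RebalanceModel {

    // One executor (thread) owns a contiguous range of task ids.
    record Executor(String component, int firstTask, int lastTask) {
        int taskCount() {
            return lastTask - firstTask + 1;
        }
    }

    // Splits numTasks task ids into near-equal contiguous ranges, one per executor.
    static List<Executor> executorsFor(String component, int numTasks, int requestedExecutors, int firstTaskId) {
        if (requestedExecutors <= 0 || numTasks <= 0) {
            throw new IllegalArgumentException("executors and tasks must be positive");
        }
        int executors = Math.min(requestedExecutors, numTasks); // no executor without a task
        int base = numTasks / executors;
        int extra = numTasks % executors;
        List<Executor> result = new ArrayList<>();
        int next = firstTaskId;
        for (int i = 0; i < executors; i++) {
            int size = base + (i < extra ? 1 : 0); // first 'extra' executors get one more task
            result.add(new Executor(component, next, next + size - 1));
            next += size;
        }
        return result;
    }

    // The three bolts from the example, each declared with setNumTasks(2).
    static List<Executor> bolts(int executorsPerBolt) {
        List<Executor> all = new ArrayList<>();
        int nextTaskId = 1;
        for (String bolt : List.of("fetcher", "extract", "indexer")) {
            all.addAll(executorsFor(bolt, 2, executorsPerBolt, nextTaskId));
            nextTaskId += 2;
        }
        return all;
    }

    // Round-robin placement of executors onto workers; returns one list per worker.
    static List<List<Executor>> assign(List<Executor> executors, int numWorkers) {
        List<List<Executor>> workers = new ArrayList<>();
        for (int w = 0; w < numWorkers; w++) {
            workers.add(new ArrayList<>());
        }
        for (int i = 0; i < executors.size(); i++) {
            workers.get(i % numWorkers).add(executors.get(i));
        }
        return workers;
    }

    public static void main(String[] args) {
        // Like "storm rebalance ... -n 2": still 3 executors, 2 tasks each, over 2 workers.
        System.out.println(assign(bolts(1), 2));
        // Adding "-e fetcher=2 -e extract=2 -e indexer=2": 6 executors, 1 task each.
        System.out.println(assign(bolts(2), 2));
    }
}
```

With one executor per bolt (only "-n 2"), the model puts two executors on one worker and one on the other, each still owning 2 tasks. With two executors per bolt, each worker ends up with one executor of every bolt, which is only possible because each bolt was declared with 2 tasks.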




Re: Another parallelism question

Posted by Satish Duggana <sa...@gmail.com>.
You need to rebalance the topology with the desired number of executors for
the respective bolts in the topology.

Hope it helps,
Satish.


Re: Another parallelism question

Posted by Adrien Carreira <ac...@reportlinker.com>.
I think I understood that.

But in my example:

1 machine in the cluster, this basic topology, and 1 worker in the conf:

builder.setBolt("fetcher", new Fetch()).setNumTasks(2).shuffleGrouping("spout");

builder.setBolt("extract", new Extract()).setNumTasks(2).shuffleGrouping("fetcher");

builder.setBolt("indexer", new Indexer()).setNumTasks(2).shuffleGrouping("extract");

Storm will spawn 3 threads with 6 tasks on 1 worker. Am I right?

Then, if I rebalance to 2 workers, I will have 6 threads for the tasks.

Am I still right?

My problem is: to scale up, I understood that I need to set numTasks to
a bigger value, but then it will spawn more tasks than I want... I only want one
task when I have one machine, two when I have two machines, etc.

Hope I'm clear



Re: Another parallelism question

Posted by "Matthias J. Sax" <mj...@apache.org>.
See here:

https://stackoverflow.com/questions/31932573/rebalancing-executors-in-apache-storm/31941796#31941796

https://stackoverflow.com/questions/20371073/how-to-tune-the-parallelism-hint-in-storm

http://www.michael-noll.com/blog/2012/10/16/understanding-the-parallelism-of-a-storm-topology/


-Matthias




Re: Another parallelism question

Posted by Nathan Leung <nc...@gmail.com>.
At that point you have to think about what makes sense for your system
right now. For example, maybe it makes sense to set the number of tasks to
4 times what you need right now, and then reload the topology when you
outgrow that.

Alternatively, you can consider bringing up a larger replacement topology
and then killing the older one. In this case you will have to be more
careful with names, and possibly with things like resource (worker) allocation.
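A rough sketch of the "more tasks than you need right now" approach, written as a hypothetical planning helper in plain Java (not a Storm API). It assumes, as described in this thread, that the task count is fixed at submit time and that a rebalance can raise a component's executors only up to that count. The 4x factor is just the example above; the Storm declaration in the comment is shown only for orientation.

```java
// Hypothetical planning helper (not part of Storm). In a topology the declaration
// would look roughly like:
//   builder.setBolt("fetcher", new Fetch(), 2).setNumTasks(8)...
// i.e. 2 executors today, 8 tasks as the ceiling for later rebalances.
final class TaskHeadroom {

    // Tasks to declare at submit time: today's executor count times a headroom factor.
    static int plannedTasks(int executorsNeededNow, int headroomFactor) {
        return executorsNeededNow * headroomFactor;
    }

    // A rebalance can raise executors, but never beyond the fixed task count.
    static int executorsAfterRebalance(int requestedExecutors, int numTasks) {
        return Math.min(requestedExecutors, numTasks);
    }

    // Past the task ceiling, only a kill/redeploy (or a replacement topology) helps.
    static boolean needsRedeploy(int requestedExecutors, int numTasks) {
        return requestedExecutors > numTasks;
    }

    public static void main(String[] args) {
        int tasks = plannedTasks(2, 4); // 8 tasks declared up front
        for (int requested : new int[] {2, 4, 8, 16}) {
            System.out.printf("requested=%d -> executors=%d, redeploy needed=%b%n",
                    requested, executorsAfterRebalance(requested, tasks), needsRedeploy(requested, tasks));
        }
    }
}
```

The trade-off is the one discussed in the thread: a larger task count costs little at runtime, but it has to be chosen before submitting, so the headroom factor is a guess about future load.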

On Thu, Jun 9, 2016 at 9:30 AM, Adrien Carreira <ac...@reportlinker.com>
wrote:

> So let's say one day I would like to have 100 machines.
>
> Should I set 100 in setNumTasks?
>
> 2016-06-09 15:20 GMT+02:00 Nathan Leung <nc...@gmail.com>:
>
>> You can create your topology with more tasks than executors, then when
>> the rebalance happens you can add executors. However, at the moment you
>> cannot add more tasks to a running topology.
>>
>> On Thu, Jun 9, 2016 at 8:58 AM, Adrien Carreira <ac...@reportlinker.com>
>> wrote:
>>
>>> I've just created a topology like this:
>>>
>>> builder.setBolt("fetcher", new Fetch())
>>>         .shuffleGrouping("spout");
>>>
>>> builder.setBolt("extract", new Extract())
>>>         .shuffleGrouping("fetcher");
>>>
>>> builder.setBolt("indexer", new Indexer())
>>>         .shuffleGrouping("extract");
>>>
>>>
>>> This means I have three bolts with one worker and a parallelism_hint of 1.
>>>
>>> Now, let's say that I have another machine available, or that I have too many tuples to process and need another machine.
>>>
>>>
>>> I've executed this command:
>>>
>>> storm rebalance kairos-who -n 2 -e indexer=2 -e fetcher=2 -e extract=2
>>>
>>>
>>> But what I get is two workers with:
>>>
>>> worker 1 => Spout + extract
>>>
>>> worker 2 => fetcher + indexer
>>>
>>>
>>> What I would love:
>>>
>>> Worker 1 => Spout + fetcher + extract + indexer
>>>
>>> Worker 2 => Same...
>>>
>>>
>>> I hope I'm clear...
>>>
>>> 2016-06-09 14:47 GMT+02:00 Andrew Xor <an...@gmail.com>:
>>>
>>>> Hello,
>>>>
>>>> I am sorry, but I don't know why you cannot emulate those scale-up
>>>> factors by using rebalance; after all, it spawns the requested number of
>>>> workers (per topology) and executors (per spout/bolt), bounded only by
>>>> topology_max_task_parallelism. Have you read the article to
>>>> understand how parallelism works in Storm?
>>>>
>>>> Regards.
>>>>
>>>> On Thu, Jun 9, 2016 at 3:34 PM, Adrien Carreira <ac...@reportlinker.com>
>>>> wrote:
>>>>
>>>>> Yes,
>>>>>
>>>>> But the rebalance command doesn't do what I would like.
>>>>>
>>>>>
>>>>> Let's suppose that I have:
>>>>>
>>>>> SPOUT A (1) => BOLT 1 (1) => BOLT2 (1) => BOLT3 (3)
>>>>>
>>>>> (the number is the parallelism hint)
>>>>> If I scale to n workers, I would like:
>>>>>
>>>>> SPOUT A (1*n) => BOLT 1 (1*n) => BOLT2 (1*n) => BOLT3 (3*n)
>>>>>
>>>>>
>>>>> But storm rebalance keeps the parallelism_hint :/
>>>>>
>>>>>
>>>>>
>>>>> 2016-06-09 14:29 GMT+02:00 Andrew Xor <an...@gmail.com>:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> Why not use the rebalance command? It's well documented here:
>>>>>> <http://storm.apache.org/releases/current/Understanding-the-parallelism-of-a-Storm-topology.html>
>>>>>>
>>>>>> Regards.
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Another parallelism question

Posted by Adrien Carreira <ac...@reportlinker.com>.
So let's say that one day I would like to have 100 machines.

Should I set setNumTasks to 100?

2016-06-09 15:20 GMT+02:00 Nathan Leung <nc...@gmail.com>:

> You can create your topology with more tasks than executors; then, when the
> rebalance happens, you can add executors. However, at the moment you cannot
> add more tasks to a running topology.

Re: Another parallelism question

Posted by Nathan Leung <nc...@gmail.com>.
You can create your topology with more tasks than executors; then, when the
rebalance happens, you can add executors. However, at the moment you cannot
add more tasks to a running topology.
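To make the task/executor distinction concrete, here is a small standard-library Java model. It is a sketch, not Storm's code. It assumes that a component's task count is fixed at submission time (in Storm's API via setNumTasks, which otherwise defaults to the parallelism hint), that a rebalance can only regroup those tasks into a different number of executors, and that tasks are split into contiguous, roughly even ranges. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not Storm source code: a component's tasks are fixed at submission,
// and a rebalance only regroups them into executors (threads).
class TaskPartitionSketch {

    /** Returns one list of task ids per executor; ids are local to this one component. */
    static List<List<Integer>> assignTasks(int numTasks, int requestedExecutors) {
        if (numTasks < 1 || requestedExecutors < 1) {
            throw new IllegalArgumentException("need at least one task and one executor");
        }
        // Assumption: an executor with no task would be idle, so the count is capped at numTasks.
        int executors = Math.min(requestedExecutors, numTasks);
        int base = numTasks / executors;   // every executor gets at least this many tasks
        int extra = numTasks % executors;  // the first 'extra' executors get one more
        List<List<Integer>> result = new ArrayList<>();
        int nextTask = 1;
        for (int e = 0; e < executors; e++) {
            int size = base + (e < extra ? 1 : 0);
            List<Integer> tasks = new ArrayList<>();
            for (int i = 0; i < size; i++) {
                tasks.add(nextTask++);
            }
            result.add(tasks);
        }
        return result;
    }

    public static void main(String[] args) {
        // Submitted with hint 1 and no setNumTasks: 1 task, so "-e fetcher=2" cannot add a thread.
        System.out.println(assignTasks(1, 2)); // [[1]]
        // Submitted with 4 tasks: the same request now yields 2 executors of 2 tasks each.
        System.out.println(assignTasks(4, 2)); // [[1, 2], [3, 4]]
        // Uneven split: 4 tasks over 3 executors.
        System.out.println(assignTasks(4, 3)); // [[1, 2], [3], [4]]
    }
}
```

The cap in the model is the point of the reply above: extra executors only help if tasks were reserved when the topology was submitted.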

On Thu, Jun 9, 2016 at 8:58 AM, Adrien Carreira <ac...@reportlinker.com>
wrote:

> I've just created a topology like this:
>
> builder.setBolt("fetcher", new Fetch())
>         .shuffleGrouping("spout");
>
> builder.setBolt("extract", new Extract())
>         .shuffleGrouping("fetcher");
>
> builder.setBolt("indexer", new Indexer())
>         .shuffleGrouping("extract");
>
> This means I have three bolts, one worker, and a parallelism_hint of 1.
>
> Now, let's say that I have another machine available, or that I have too
> many tuples to process and I need another machine.
>
> I executed this command:
>
> storm rebalance kairos-who -n 2 -e indexer=2 -e fetcher=2 -e extract=2
>
> But what I get is two workers with:
>
> worker 1 => Spout + extract
>
> worker 2 => fetcher + indexer
>
> What I would love:
>
> worker 1 => Spout + fetcher + extract + indexer
>
> worker 2 => the same...
>
> I hope I'm clear...

Re: Another parallelism question

Posted by Adrien Carreira <ac...@reportlinker.com>.
I've just created a topology like this:

builder.setBolt("fetcher", new Fetch())
        .shuffleGrouping("spout");

builder.setBolt("extract", new Extract())
        .shuffleGrouping("fetcher");

builder.setBolt("indexer", new Indexer())
        .shuffleGrouping("extract");

This means I have three bolts, one worker, and a parallelism_hint of 1.

Now, let's say that I have another machine available, or that I have too
many tuples to process and I need another machine.

I executed this command:

storm rebalance kairos-who -n 2 -e indexer=2 -e fetcher=2 -e extract=2

But what I get is two workers with:

worker 1 => Spout + extract

worker 2 => fetcher + indexer

What I would love:

worker 1 => Spout + fetcher + extract + indexer

worker 2 => the same...

I hope I'm clear...
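A deliberately simplified placement model shows why a pipeline can end up split across workers. This is a hypothetical sketch, not Storm's scheduler. It assumes executors are listed component by component and dealt round-robin onto workers, and it ignores system executors such as ackers. Under that assumption, one executor per component spreads the components across the two workers. Two executors per component, which requires each component to have been submitted with at least two tasks, give each worker one copy of every component.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified placement model; Storm's real scheduler is more involved.
class PlacementSketch {

    /** Deals executors round-robin onto workers, in the map's iteration order. */
    static List<List<String>> place(Map<String, Integer> executorsPerComponent, int workers) {
        if (workers < 1) {
            throw new IllegalArgumentException("need at least one worker");
        }
        List<List<String>> byWorker = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            byWorker.add(new ArrayList<>());
        }
        int next = 0;
        for (Map.Entry<String, Integer> entry : executorsPerComponent.entrySet()) {
            for (int i = 0; i < entry.getValue(); i++) {
                // "fetcher#1" means the second executor of the fetcher component.
                byWorker.get(next % workers).add(entry.getKey() + "#" + i);
                next++;
            }
        }
        return byWorker;
    }

    /** Builds an insertion-ordered map giving every component the same executor count. */
    static Map<String, Integer> uniform(int executors, String... components) {
        Map<String, Integer> map = new LinkedHashMap<>();
        for (String c : components) {
            map.put(c, executors);
        }
        return map;
    }

    public static void main(String[] args) {
        String[] pipeline = {"spout", "fetcher", "extract", "indexer"};
        // One executor per component on two workers.
        System.out.println(place(uniform(1, pipeline), 2));
        // [[spout#0, extract#0], [fetcher#0, indexer#0]]
        // Two executors per component on two workers.
        System.out.println(place(uniform(2, pipeline), 2));
        // [[spout#0, fetcher#0, extract#0, indexer#0], [spout#1, fetcher#1, extract#1, indexer#1]]
    }
}
```

The first result happens to match the split reported in this message, but that is a property of the toy model. Real placement also depends on the scheduler in use and on system executors.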

2016-06-09 14:47 GMT+02:00 Andrew Xor <an...@gmail.com>:

> Hello,
>
> I am sorry, but I don't see why you cannot emulate those scale-up
> factors by using rebalance; after all, it spawns the requested number of
> workers (for the topology) and executors (for spouts/bolts), bounded only by
> topology.max.task.parallelism. Have you read the article on how
> parallelism works in Storm?
>
> Regards.

Re: Another parallelism question

Posted by Andrew Xor <an...@gmail.com>.
Hello,

I am sorry, but I don't see why you cannot emulate those scale-up
factors by using rebalance; after all, it spawns the requested number of
workers (for the topology) and executors (for spouts/bolts), bounded only by
topology.max.task.parallelism. Have you read the article on how
parallelism works in Storm?

Regards.

On Thu, Jun 9, 2016 at 3:34 PM, Adrien Carreira <ac...@reportlinker.com>
wrote:

> Yes, but the rebalance command doesn't do what I would like.
>
> Let's suppose that I have:
>
> SPOUT A (1) => BOLT 1 (1) => BOLT 2 (1) => BOLT 3 (3)
>
> (the number is the parallelism hint)
>
> This means that if I scale to n workers, I would like:
>
> SPOUT A (1*n) => BOLT 1 (1*n) => BOLT 2 (1*n) => BOLT 3 (3*n)
>
> But storm rebalance keeps the parallelism_hint :/

Re: Another parallelism question

Posted by Adrien Carreira <ac...@reportlinker.com>.
Yes, but the rebalance command doesn't do what I would like.

Let's suppose that I have:

SPOUT A (1) => BOLT 1 (1) => BOLT 2 (1) => BOLT 3 (3)

(the number is the parallelism hint)

This means that if I scale to n workers, I would like:

SPOUT A (1*n) => BOLT 1 (1*n) => BOLT 2 (1*n) => BOLT 3 (3*n)

But storm rebalance keeps the parallelism_hint :/
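The scaling rule asked for here (hint × n executors per component) can be written out as arithmetic. The hypothetical sketch below assumes what Nathan's reply in this thread states: tasks cannot be added to a running topology. Each component therefore has to be submitted with at least hint × maxWorkers tasks. A later rebalance then requests hint × n executors, capped at that task count. The component names (spoutA, bolt1, ...) and the 100-worker ceiling are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sizing helper: reserve tasks for the largest expected cluster up front,
// then pick executor counts for the current cluster size at rebalance time.
class ScalingSketch {

    /** Tasks to request at submission time (e.g. via setNumTasks). */
    static int tasksFor(int hintPerWorker, int maxWorkers) {
        return hintPerWorker * maxWorkers;
    }

    /** Executors to request in a rebalance for n workers, never more than the tasks available. */
    static int executorsFor(int hintPerWorker, int workers, int numTasks) {
        return Math.min(hintPerWorker * workers, numTasks);
    }

    public static void main(String[] args) {
        Map<String, Integer> hints = new LinkedHashMap<>(); // per-worker parallelism hints
        hints.put("spoutA", 1);
        hints.put("bolt1", 1);
        hints.put("bolt2", 1);
        hints.put("bolt3", 3);
        int maxWorkers = 100; // assumed upper bound on cluster size
        int workers = 2;      // current cluster size
        StringBuilder eFlags = new StringBuilder();
        for (Map.Entry<String, Integer> e : hints.entrySet()) {
            int tasks = tasksFor(e.getValue(), maxWorkers);
            int executors = executorsFor(e.getValue(), workers, tasks);
            System.out.println(e.getKey() + ": numTasks=" + tasks + ", executors=" + executors);
            eFlags.append(" -e ").append(e.getKey()).append('=').append(executors);
        }
        // Prints: -n 2 -e spoutA=2 -e bolt1=2 -e bolt2=2 -e bolt3=6
        System.out.println("-n " + workers + eFlags);
    }
}
```

The last line follows the `-n`/`-e` form of the `storm rebalance` command used elsewhere in this thread. The topology name still has to be supplied in front of it.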

2016-06-09 14:29 GMT+02:00 Andrew Xor <an...@gmail.com>:

> Hello,
>
> Why not use the rebalance command? It's well documented here:
> <http://storm.apache.org/releases/current/Understanding-the-parallelism-of-a-Storm-topology.html>
>
> Regards.

Re: Another parallelism question

Posted by Andrew Xor <an...@gmail.com>.
Hello,

Why not use the rebalance command? It's well documented here:
<http://storm.apache.org/releases/current/Understanding-the-parallelism-of-a-Storm-topology.html>

Regards.

On Thu, Jun 9, 2016 at 3:22 PM, Adrien Carreira <ac...@reportlinker.com>
wrote:

> Hi,
>
> After a month of building a topology on Storm, I have one question about
> parallelism that I can't answer.
>
> I've developed my topology and tested it on a cluster with two nodes.
>
> My parallelism_hint values are OK; everything is fine.
>
> My question is: if I need to scale the number of workers in the topology,
> so that more workers are doing the same thing, how can I achieve that
> without killing/restarting the topology?
>
> Thanks for your reply
>