Posted to user@storm.apache.org by tishan pubudu kanishka dahanayakage <dt...@gmail.com> on 2015/03/11 09:45:18 UTC

Understanding parallelism

Hi,

I went through [1] and tried a few topology deployments. I just want to
clear up a small doubt. From [1], what I understood was that the parallelism
hint is the initial parallelism value and that this value increases
dynamically at run-time. The last comment on [2] also suggests the same.
However, when I tested it in Storm, I did not see the parallelism for that
bolt increase with load.

Is my understanding of how the parallelism hint operates in Storm correct?
If so, do I need any further configuration to make it work?

Thanks,
Tishan

Re: Understanding parallelism

Posted by Susheel Kumar Gadalay <sk...@gmail.com>.
Changes on the fly are made with the storm rebalance command:
storm rebalance <topology-name> -e <component>=<parallelism>


Re: Understanding parallelism

Posted by tishan pubudu kanishka dahanayakage <dt...@gmail.com>.
Hi all,

Thanks a lot for the great clarification. So if I have multiple tasks in one
executor, the number of executors can be increased by rebalancing. But
nothing will change with the default of one task per executor.

Thanks,
Tishan


Re: Understanding parallelism

Posted by Kosala Dissanayake <um...@gmail.com>.
I am pretty sure Nathan is referring to rebalancing in that response.

*"When you set the parallelism to 'x', you will have 'x' executors forever."*
No. The number of *tasks* is static. You can change the number of
*executors* using the rebalance command.

Since 0.8.0, 'parallelism' refers to the initial number of executors, which
can be changed, so in that sense the 'parallelism' can be changed on the
fly. It's confusing because 0.8.0 redefined the meaning of parallelism and
then said that the 'parallelism' could be changed on the fly. That is true,
but you need to realize that the number of tasks remains the same
regardless.



Rebalancing becomes useful when you have more than one task per executor.
The default is one task per executor. However, you can override the one
task per executor default and manually set the number of tasks using
setNumTasks.
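
A minimal sketch of the distinction above, written as plain Java rather than
against the Storm API. The class ComponentModel and its methods are
hypothetical names used only for illustration. The even split of tasks across
executors and the cap of executors at the task count are simplifying
assumptions, not a description of Storm's scheduler. In a real topology the
task count would be set with setNumTasks on the declarer returned by setBolt.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not Storm code: a component whose task count is fixed
// at submission while its executor count can be changed afterwards.
class ComponentModel {
    private final int numTasks;  // static for the lifetime of the topology
    private int numExecutors;    // starts at the parallelism hint

    ComponentModel(int parallelismHint, int numTasks) {
        if (parallelismHint < 1 || numTasks < 1) {
            throw new IllegalArgumentException("counts must be positive");
        }
        this.numTasks = numTasks;
        // Assumption: an executor without a task is pointless, so cap at numTasks.
        this.numExecutors = Math.min(parallelismHint, numTasks);
    }

    // Default described in the thread: one task per executor.
    static ComponentModel withDefaultTasks(int parallelismHint) {
        return new ComponentModel(parallelismHint, parallelismHint);
    }

    // Models a rebalance: only the executor count changes, never the task count.
    void rebalance(int requestedExecutors) {
        numExecutors = Math.max(1, Math.min(requestedExecutors, numTasks));
    }

    int executors() { return numExecutors; }

    int tasks() { return numTasks; }

    // Number of tasks each executor runs, split as evenly as possible.
    List<Integer> tasksPerExecutor() {
        List<Integer> counts = new ArrayList<>();
        int base = numTasks / numExecutors;
        int extra = numTasks % numExecutors;
        for (int i = 0; i < numExecutors; i++) {
            counts.add(base + (i < extra ? 1 : 0));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Two executors, four tasks: room to grow later.
        ComponentModel bolt = new ComponentModel(2, 4);
        System.out.println(bolt.tasksPerExecutor());
        bolt.rebalance(4);
        System.out.println(bolt.tasksPerExecutor());

        // Default of one task per executor: asking for more executors changes nothing.
        ComponentModel plain = withDefaultTasks(2);
        plain.rebalance(4);
        System.out.println(plain.executors());
    }
}
```

In this model, the component created with more tasks than executors can grow
from two to four executors, while the one left at the default stays at two
however many executors are requested.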


Why do this? I'll just copy Michael's excellent explanation.

*"So one reason for having 2+ tasks per executor thread is to give you the
flexibility to expand/scale up the topology through the storm rebalance
command in the future without taking the topology offline. For instance,
imagine you start out with a Storm cluster of 15 machines but already know
that next week another 10 boxes will be added. Here you could opt for
running the topology at the anticipated parallelism level of 25 machines
already on the 15 initial boxes (which is of course slower than 25 boxes).
Once the additional 10 boxes are integrated you can then storm rebalance
the topology to make full use of all 25 boxes without any downtime."*


http://stackoverflow.com/questions/17257448/what-is-the-task-in-twitter-storm-parallelism
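
The arithmetic behind the 15-to-25-box example in the quote can be sketched
as follows. This is only an illustration under stated assumptions: it
pretends there is one executor per machine and that tasks are dealt out
round-robin, neither of which is claimed by the quote or by Storm. The class
RebalanceArithmetic and its methods are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical helper, not Storm code: shows how a fixed set of tasks is
// shared among executors before and after a rebalance.
class RebalanceArithmetic {

    // Deal `tasks` out to `executors` round-robin and return the per-executor counts.
    static int[] spread(int tasks, int executors) {
        int[] counts = new int[executors];
        for (int t = 0; t < tasks; t++) {
            counts[t % executors]++;
        }
        return counts;
    }

    // Largest number of tasks that any single executor has to run.
    static int busiest(int[] counts) {
        int max = 0;
        for (int c : counts) {
            max = Math.max(max, c);
        }
        return max;
    }

    public static void main(String[] args) {
        int tasks = 25; // chosen up front for the anticipated cluster size

        // Assumption: one executor per machine, 15 machines at first.
        int[] before = spread(tasks, 15);
        // After the extra 10 machines arrive and the topology is rebalanced.
        int[] after = spread(tasks, 25);

        System.out.println("before: " + Arrays.toString(before));
        System.out.println("after:  " + Arrays.toString(after));
        System.out.println("busiest executor before/after: "
                + busiest(before) + "/" + busiest(after));
    }
}
```

With 25 tasks on 15 executors, some executors have to run two tasks. After
the rebalance to 25 executors each one runs a single task, and the task count
itself never changed.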





Re: Understanding parallelism

Posted by tishan pubudu kanishka dahanayakage <dt...@gmail.com>.
Hi Kosala,

Thanks for the response. Yes, I came across that. But it was written in
2012, whereas [1] is more recent. It says "Note that as of Storm 0.8 the
parallelism_hint parameter now specifies the 'initial' number of executors
(not tasks!) for that bolt". Also, in [2] Nathan says that "0.8.0 will let
you change the parallelism of topologies on the fly". That's why I raised
this concern. So what you are saying is that if I set the parallelism to
'x', it will have x executors forever. Please correct me if I am wrong.


[1]
http://storm.apache.org/documentation/Understanding-the-parallelism-of-a-Storm-topology.html
[2] https://groups.google.com/forum/#!topic/storm-user/Rr9K7f-AMLc

Thanks,
Tishan


Re: Understanding parallelism

Posted by Kosala Dissanayake <um...@gmail.com>.
*"initial parallelism value and that value increase dynamically in
run-time."*

No. The parallelism value is the number of executors you get. This does not
change at run-time.

Read this.
http://www.michael-noll.com/blog/2012/10/16/understanding-the-parallelism-of-a-storm-topology/
