Posted to user@storm.apache.org by clay teahouse <cl...@gmail.com> on 2015/02/01 23:54:05 UTC

questions on task, threads and workers

Hi,
I have a few simple questions.
1) In Storm 0.9.x, what is the default value for a bolt's number of tasks?
According to the docs, the parallelism hint no longer sets the number of
tasks, but the number of executor threads.
2) What happens if the number of tasks is less than the number of threads?
Should I assume this results in idle threads?
3) Does the number of workers multiply the number of tasks and threads?

Feedback appreciated,
Clay

Re: Is there anyone who have experience of using storm with druid?

Posted by Brian O'Neill <bo...@alumni.brown.edu>.
Ironically, we dedicated two chapters to a Storm/Druid integration when we
wrote the book:
http://www.amazon.com/Storm-Blueprints-Distributed-Real-time-Computation/dp/178216829X/

If you want to contact me directly, I’d be interested in hearing more about
your use case, and what specifically you are looking to do.  (I might also
have code you can use)

-brian


---
Brian O'Neill 
Chief Technology Officer
Health Market Science, a LexisNexis Company
215.588.6024 Mobile ? @boneill42 <http://www.twitter.com/boneill42>


From:  Andrew Neilson <ar...@gmail.com>
Reply-To:  <us...@storm.apache.org>
Date:  Wednesday, February 4, 2015 at 3:49 PM
To:  <us...@storm.apache.org>, 이승진 <sw...@navercorp.com>
Subject:  Re: Is there anyone who have experience of using storm with druid?

I can't answer your question directly as I haven't used Druid, but I will
point out that Druid was developed by Metamarkets and they are using Druid
with Storm and Kafka in their architecture
(https://metamarkets.com/2014/building-a-data-pipeline-that-handles-billions-of-events-in-real-time/).
They've open-sourced Tranquility, which includes components for streaming
data from Storm to Druid:

https://github.com/metamx/tranquility

Hope this helps.


Re: Is there anyone who have experience of using storm with druid?

Posted by Andrew Neilson <ar...@gmail.com>.
I can't answer your question directly as I haven't used Druid, but I will
point out that Druid was developed by Metamarkets and they are using Druid
with Storm and Kafka in their architecture
(https://metamarkets.com/2014/building-a-data-pipeline-that-handles-billions-of-events-in-real-time/).
They've open-sourced Tranquility, which includes components for streaming
data from Storm to Druid:

https://github.com/metamx/tranquility

Hope this helps.

On Tue, Feb 3, 2015 at 7:17 PM, 이승진 <sw...@navercorp.com> wrote:

> Hello all,
>
> We are about to implement an aggregation feature in our system, and we
> think using Druid with Storm is a good option.
>
> If any of you have experience with this, I would like to hear from you
> about the following:
>
> - Data feeding: do you feed data from Kafka to Druid directly, or use
>   Storm as an intermediate refiner?
> - Did you develop a custom Druid driver for that?
> - Did you run into any difficulties during implementation?
>
> Thanks in advance
>

Is there anyone who have experience of using storm with druid?

Posted by 이승진 <sw...@navercorp.com>.
Hello all,

We are about to implement an aggregation feature in our system, and we
think using Druid with Storm is a good option.

If any of you have experience with this, I would like to hear from you
about the following:

- Data feeding: do you feed data from Kafka to Druid directly, or use
  Storm as an intermediate refiner?
- Did you develop a custom Druid driver for that?
- Did you run into any difficulties during implementation?

Thanks in advance

Re: questions on task, threads and workers

Posted by Kosala Dissanayake <um...@gmail.com>.
@Ben

No. setNumTasks sets the *total* number of tasks for the component, not the
number of tasks per executor.

In any case, 2 executors with only 1 task is not possible, as Nathan
confirms in a different thread:

*"The number of executors for a component must be <= than the number of
tasks"*
(https://groups.google.com/d/msg/storm-user/VvXCG-TqMx0/7DfWltEkzvAJ)
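
To make the constraint concrete, here is a minimal sketch in plain Java. It
does not call Storm; ParallelismCheck and isValid are hypothetical names used
only for illustration. How Storm itself reacts to a request that violates the
constraint is not covered in this thread, so the sketch only reports whether
a requested layout satisfies it.

```java
// Illustrative model only: plain Java, no Storm dependency.
// ParallelismCheck and isValid are hypothetical names, not Storm APIs.
final class ParallelismCheck {

    /** Returns true when the layout satisfies "executors <= tasks". */
    static boolean isValid(int executors, int tasks) {
        if (executors < 1 || tasks < 1) {
            throw new IllegalArgumentException("executors and tasks must be positive");
        }
        return executors <= tasks;
    }

    public static void main(String[] args) {
        // Parallelism hint 2 with setNumTasks(4): allowed.
        System.out.println("2 executors, 4 tasks: " + isValid(2, 4));
        // Parallelism hint 2 with setNumTasks(1): violates the constraint.
        System.out.println("2 executors, 1 task:  " + isValid(2, 1));
    }
}
```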

On Wed, Feb 4, 2015 at 2:30 AM, Ben Gould <be...@inovexcorp.com> wrote:

>  Just a guess, but if you said:
>
> topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2).setNumTasks(1)
> (2 executor threads, but only 1 task)
>
> Then you'd get two executors with 1 task each?
>

Re: questions on task, threads and workers

Posted by Ben Gould <be...@inovexcorp.com>.
Just a guess, but if you said:

topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2).setNumTasks(1) 
   (2 executor threads, but only 1 task)

Then you'd get two executors with 1 task each?


On 02/02/2015 08:15 PM, Kosala Dissanayake wrote:
> Yes, it specifies the number of executors.
>
> *But by default, Storm assigns one task per executor.*
>
> Therefore, unless you say otherwise, the number of tasks equals the
> number of executors you set.
>
> If you wish, you can override the one-task-per-executor default and
> set the number of tasks manually:
>
> topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2).setNumTasks(4)
>
> This assigns 4 tasks to the green bolt. Since the parallelism hint is
> 2, there will be 2 executors. Therefore, each executor will get 4/2 =
> 2 tasks.
>
> Your other question is whether we can do this:
>
> topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2).setNumTasks(1)
> (2 executor threads, but only 1 task)
>
> I do not know off the top of my head, but I suspect that you can't do
> this. Maybe you can try it out.
>
>
> You can read this thread as well to understand this better
> http://stackoverflow.com/questions/20371073/how-to-tune-the-parallelism-hint-in-storm
>

Re: questions on task, threads and workers

Posted by Kosala Dissanayake <um...@gmail.com>.
Yes, it specifies the number of executors.

*But by default, Storm assigns one task per executor.*

Therefore, unless you say otherwise, the number of tasks equals the number
of executors you set.


If you wish, you can override the one-task-per-executor default and set the
number of tasks manually:

topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2).setNumTasks(4)

This assigns 4 tasks to the green bolt. Since the parallelism hint is 2,
there will be 2 executors. Therefore, each executor will get 4/2 = 2 tasks.
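
The arithmetic above can be sketched in plain Java. This is only a model of
the counting, not Storm code: TaskLayout, resolveNumTasks and
tasksPerExecutor are hypothetical names, and handing any leftover tasks to
the first executors when the division is uneven is an assumption made for
the example, not something stated in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative arithmetic only: plain Java, no Storm dependency.
// All names here are hypothetical, not Storm APIs.
final class TaskLayout {

    // A null numTasks models "setNumTasks was never called", so the task
    // count falls back to the parallelism hint (one task per executor).
    static int resolveNumTasks(int parallelismHint, Integer numTasks) {
        return numTasks == null ? parallelismHint : numTasks;
    }

    // Splits the component's total tasks over its executors.
    // Assumption: leftover tasks go to the first executors.
    static List<Integer> tasksPerExecutor(int executors, int tasks) {
        if (executors < 1 || tasks < executors) {
            throw new IllegalArgumentException("need 1 <= executors <= tasks");
        }
        List<Integer> counts = new ArrayList<>();
        int base = tasks / executors;
        int extra = tasks % executors;
        for (int i = 0; i < executors; i++) {
            counts.add(base + (i < extra ? 1 : 0));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Parallelism hint 2, no setNumTasks: prints [1, 1]
        System.out.println(tasksPerExecutor(2, resolveNumTasks(2, null)));
        // Parallelism hint 2, setNumTasks(4): prints [2, 2]
        System.out.println(tasksPerExecutor(2, resolveNumTasks(2, 4)));
    }
}
```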


Your other question is whether we can do this:

topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2).setNumTasks(1)
(2 executor threads, but only 1 task)

I do not know off the top of my head, but I suspect that you can't do this.
Maybe you can try it out.


You can read this thread as well to understand this better:
http://stackoverflow.com/questions/20371073/how-to-tune-the-parallelism-hint-in-storm


On Tue, Feb 3, 2015 at 10:44 AM, clay teahouse <cl...@gmail.com>
wrote:

> According to the Storm docs, as of Storm 0.8 the *parallelism_hint* parameter
> now specifies the initial number of executors (not tasks!) for that bolt.
> I assume this means that the number of tasks and the parallelism hint have
> to be set separately, and that the number of executor threads (i.e., the
> parallelism hint) does not determine the number of tasks. Hence my
> question: what happens if the number of tasks is less than the number of
> executors?

Re: questions on task, threads and workers

Posted by clay teahouse <cl...@gmail.com>.
According to the Storm docs, as of Storm 0.8 the *parallelism_hint* parameter
now specifies the initial number of executors (not tasks!) for that bolt.
I assume this means that the number of tasks and the parallelism hint have
to be set separately, and that the number of executor threads (i.e., the
parallelism hint) does not determine the number of tasks. Hence my
question: what happens if the number of tasks is less than the number of
executors?


On Mon, Feb 2, 2015 at 5:37 PM, Kosala Dissanayake <um...@gmail.com>
wrote:

> 1. By default, each executor thread runs 1 task. So if you just specify a
> parallelism hint of 3 for a bolt, you will have 3 executor threads, and
> since each runs 1 task by default, you will get 3 tasks.
>
> You can run more than one task per executor by setting the bolt's task
> count with setNumTasks.
>
> 2. I don't think that's possible.
>
> 3. No. The number of workers is just the number of processes to which
> executors can be allocated. If you have fewer workers, more executors will
> be assigned to each worker, and vice versa.
>
> http://www.michael-noll.com/blog/2012/10/16/understanding-the-parallelism-of-a-storm-topology/
> is a pretty good introduction to these concepts.

Re: questions on task, threads and workers

Posted by Kosala Dissanayake <um...@gmail.com>.
1. By default, each executor thread runs 1 task. So if you just specify a
parallelism hint of 3 for a bolt, you will have 3 executor threads, and
since each runs 1 task by default, you will get 3 tasks.

You can run more than one task per executor by setting the bolt's task
count with setNumTasks.

2. I don't think that's possible.

3. No. The number of workers is just the number of processes to which
executors can be allocated. If you have fewer workers, more executors will
be assigned to each worker, and vice versa.
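
As a rough illustration of point 3, the sketch below spreads a fixed set of
executors over a varying number of workers. It is plain Java, not Storm
code: WorkerSpread and assign are hypothetical names, and the simple
round-robin placement is an assumption for the example rather than a
description of Storm's scheduler. The point is only that the executor count
stays the same while the number per worker changes.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only: plain Java, no Storm dependency.
// WorkerSpread and assign are hypothetical names, not Storm APIs.
final class WorkerSpread {

    // Places executors 0..executors-1 onto workers round-robin (assumption).
    // Returns one list of executor ids per worker.
    static List<List<Integer>> assign(int executors, int workers) {
        if (executors < 0 || workers < 1) {
            throw new IllegalArgumentException("need executors >= 0 and workers >= 1");
        }
        List<List<Integer>> slots = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            slots.add(new ArrayList<>());
        }
        for (int e = 0; e < executors; e++) {
            slots.get(e % workers).add(e);
        }
        return slots;
    }

    public static void main(String[] args) {
        // 6 executors on 2 workers: prints [[0, 2, 4], [1, 3, 5]]
        System.out.println(assign(6, 2));
        // 6 executors on 3 workers: prints [[0, 3], [1, 4], [2, 5]]
        System.out.println(assign(6, 3));
    }
}
```

With fewer workers each worker holds more executors; the total of 6 does not
change in either case.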



http://www.michael-noll.com/blog/2012/10/16/understanding-the-parallelism-of-a-storm-topology/
is a pretty good introduction to these concepts.

On Mon, Feb 2, 2015 at 9:54 AM, clay teahouse <cl...@gmail.com>
wrote:

> Hi,
> I have a few simple questions.
> 1) In Storm 0.9.x, what is the default value for a bolt's number of tasks?
> According to the docs, the parallelism hint no longer sets the number of
> tasks, but the number of executor threads.
> 2) What happens if the number of tasks is less than the number of threads?
> Should I assume this results in idle threads?
> 3) Does the number of workers multiply the number of tasks and threads?
>
> Feedback appreciated,
> Clay
>