Posted to user@storm.apache.org by Seungtack Baek <se...@precocityllc.com> on 2015/06/09 00:56:14 UTC

Question on Parallelism

I was reading the FAQ entry "How many Workers should I use?" (link
<https://storm.apache.org/documentation/FAQ.html#how-many-workers-should-i-use?>),
which suggests using a parallelism hint equal to the total number of
cores in the cluster. I just want to clarify: this parallelism applies to
a single bolt only, without counting the acker and spout tasks, right?

Also, even if the number of bolts (not tasks) increases, are we still
encouraged to keep the parallelism equal to the total number of cores in
the cluster?

Thanks,
Baek



Re: Question on Parallelism

Posted by "Matthias J. Sax" <mj...@informatik.hu-berlin.de>.
One comment: The suggestion to use a single worker to avoid overhead is
basically right. Its only drawback is coarse-grained fault tolerance: if
the worker JVM goes down because of one badly behaving spout/bolt, all
other spouts/bolts in it die, too. Also keep in mind that a worker only
processes spouts/bolts of a single topology (this is enforced to isolate
topologies from each other for fault-tolerance reasons). Thus, you need
at least one worker (per supervisor) for each topology that runs in
parallel.

-Matthias



Re: Question on Parallelism

Posted by Javier Gonzalez <ja...@gmail.com>.
In that case, I would increase the number of bolts and/or spouts, if your
use case permits*. The machine you describe should be able to support
about 15 times as many. Study your current performance to see where you
need more power. Is your spout running away and your bolts lagging behind?
Add more bolts. Are your bolts idle because you can't feed them enough? Add
more spouts. Is everything running cool? Add more of everything :)

* that is, if for some reason you are not restricted to only 4 spouts
and/or only 13 bolts
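
As a rough check on the "about 15 times" figure, here is a minimal Python
sketch of the arithmetic. It assumes the numbers given in the question in
this thread (32 CPUs with 8 cores each, 4 spouts, 13 bolts) and the rule of
thumb of one spout or bolt per core. The variable names are illustrative
only.

```python
# Headroom estimate using the figures quoted in this thread.
cpus = 32            # CPUs in the machine
cores_per_cpu = 8    # cores per CPU
spouts = 4           # current number of spouts
bolts = 13           # current number of bolts

total_cores = cpus * cores_per_cpu     # 256 cores in total
components = spouts + bolts            # 17 spouts and bolts in total
headroom = total_cores / components    # how many times more could fit

print(f"{total_cores} cores / {components} components = {headroom:.1f}x")
```

This does not account for acker tasks or for other processes on the
machine, so treat the result as an upper bound rather than a target.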

Regards,
Javier



-- 
Javier González Nicolini

Re: Question on Parallelism

Posted by Seungtack Baek <se...@precocityllc.com>.
What would be best to do if you have more cores than spouts and bolts?

For example, we have 4 spouts and 13 bolts, and our machine has 32 CPUs with
8 cores each.



Re: Question on Parallelism

Posted by Javier Gonzalez <ja...@gmail.com>.
I would say, configure your topology so that its total parallelism matches
the number of cores available (i.e. if you have a topology with X spouts, Y
boltAs and Z boltBs, make it so that X+Y+Z = cores available). Also use one
worker per machine, because inter-JVM communication is expensive. When you
have more bolts and spouts than available cores, you lose time switching
the available CPUs between them. In an ideal world, your topology can map
components to cores 1-to-1, with no switching.
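
The X+Y+Z = cores rule can be sketched as follows. This is only an
illustration in Python, not Storm API: `scale_parallelism` is a hypothetical
helper that scales a set of current component counts proportionally so that
they sum to the number of cores. The component names, the 4/8/5 split, and
the 256-core figure are assumptions made up for the example.

```python
def scale_parallelism(current, cores):
    """Scale per-component counts proportionally so they sum to `cores`.

    Hypothetical helper for illustration; it is not part of Storm.
    Assumes `cores` is large compared to the number of components.
    """
    total = sum(current.values())
    result, remainders = {}, {}
    for name, count in current.items():
        # Integer share of the cores, plus what is left over after rounding down.
        result[name], remainders[name] = divmod(count * cores, total)
    # Hand out the cores lost to rounding, largest remainder first.
    leftover = cores - sum(result.values())
    for name in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        result[name] += 1
    return result


# Assumed example: X=4 spouts, Y=8 boltAs, Z=5 boltBs on 256 cores.
hints = scale_parallelism({"spout": 4, "boltA": 8, "boltB": 5}, 256)
print(hints)  # {'spout': 60, 'boltA': 121, 'boltB': 75}
```

Proportional scaling is just one possible choice. In practice the right
ratio between spouts and bolts depends on where the topology is actually
bottlenecked.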

Regards,
JG




-- 
Javier González Nicolini