Posted to user@storm.apache.org by Dima Dragan <di...@belleron.net> on 2015/05/19 15:54:01 UTC

Decreasing Complete latency with growing number of executors

Hi everyone,

I have found a strange behavior in topology metrics.

Let's say we have a single-node, 2-core machine and a simple Storm topology:
Spout A -> Bolt B -> Bolt C

Bolt B splits each message into 320 parts and emits each part (shuffle
grouping) to Bolt C. Bolts B and C also perform some read/write operations
against a database.
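
For illustration, a minimal sketch of how such a topology might be wired
(SpoutA, BoltB and BoltC are placeholder class names; the parallelism hint
on Bolt C is the executor count that varies between tests):

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.topology.TopologyBuilder;

// Sketch only: SpoutA, BoltB and BoltC stand in for the real components.
public class TopologySketch {
    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("A", new SpoutA(), 1);
        builder.setBolt("B", new BoltB(), 1).shuffleGrouping("A");
        // The parallelism hint is the number of executors for Bolt C;
        // this is the value varied across the experiments.
        builder.setBolt("C", new BoltC(), 2).shuffleGrouping("B");

        Config conf = new Config();
        conf.setNumWorkers(1); // a single worker throughout
        new LocalCluster().submitTopology("latency-test", conf,
                builder.createTopology());
    }
}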

The input flow is continuous and steady.

Logically, setting the number of executors for Bolt C higher than the
number of cores should be useless (most of the threads will just be
sleeping).
This is confirmed by the increasing execute and process latency.

But I noticed that the complete latency started to decrease, and I do not
understand why.

For example, the stats for Bolt C:

Executors | Process latency (ms) | Complete latency (ms)
2         | 5.599                | 897.276
4         | 6.3                  | 526.3
64        | 28.432               | 345.454

Is it a side effect of the IO-bound tasks?

Thanks in advance.

-- 
Best regards,
Dmytro Dragan

Re: Decreasing Complete latency with growing number of executors

Posted by Dima Dragan <di...@belleron.net>.
Nathan,

Thank you very much for your explanation and time!
I've got it.

On Wed, May 20, 2015 at 4:19 PM, Nathan Leung <nc...@gmail.com> wrote:



-- 
Best regards,
Dmytro Dragan

Re: Decreasing Complete latency with growing number of executors

Posted by Nathan Leung <nc...@gmail.com>.
I'll use a somewhat canned example for illustrative purposes.  Let's say
the spout has emitted 1000 tuples and they are sitting in the output queue,
and your bolt is the only bolt in the system.

If you have 2 executors, each tuple takes 5.6ms.  This means that each
bolt can process 178.5 tuples / s, and combined they can process 357
tuples / s.  This means it will take 2.8 seconds to process all of the
tuples.  The average tuple complete latency in this case will be about
1.4 seconds (divide the total time by 2 for the average end time, since
completions are spread evenly across the 2.8 seconds).

If you have 64 executors, each tuple takes 28.5ms.  This means each bolt
can process 35 tuples / s, and combined they can process 2245 tuples / s
(note the values aren't precise, to make things more legible).  This
means it takes 0.445 seconds to process your tuples.  Average complete
latency is about 0.22 seconds.
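
To make the arithmetic concrete, here is a small self-contained sketch of
the same back-of-the-envelope model (the tuple count and per-tuple times
are the assumptions stated above):

// Model of the example above: `tuples` messages sit in the spout's
// output queue at t = 0 and are drained by `executors` bolts, each
// taking `perTupleMs` milliseconds per tuple.
public class CompleteLatencyModel {
    static double avgCompleteLatencySec(int tuples, int executors,
                                        double perTupleMs) {
        double perBoltRate = 1000.0 / perTupleMs;     // tuples/s per executor
        double totalRate   = perBoltRate * executors; // tuples/s combined
        double drainTime   = tuples / totalRate;      // seconds to drain queue
        // Completions are spread evenly over the drain time, so the
        // average tuple finishes halfway through it.
        return drainTime / 2.0;
    }

    public static void main(String[] args) {
        System.out.printf("2 executors:  %.2f s%n",
                avgCompleteLatencySec(1000, 2, 5.6));   // ~1.40 s
        System.out.printf("64 executors: %.2f s%n",
                avgCompleteLatencySec(1000, 64, 28.5)); // ~0.22 s
    }
}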

On Wed, May 20, 2015 at 8:12 AM, Dima Dragan <di...@belleron.net>
wrote:


Re: Decreasing Complete latency with growing number of executors

Posted by Dima Dragan <di...@belleron.net>.
Nathan,

Process and execute latency are growing; does that mean we spend more
time processing each tuple because it spends more time in the bolt's
queue?

I thought that "Complete latency" and "Process latency" should be
correlated. Am I right?


On Wed, May 20, 2015 at 2:10 PM, Nathan Leung <nc...@gmail.com> wrote:



-- 
Best regards,
Dmytro Dragan

Re: Decreasing Complete latency with growing number of executors

Posted by Nathan Leung <nc...@gmail.com>.
My point about increased throughput was that if you have items queued
from the spout waiting to be processed, that time counts towards the
complete latency for the spout. If your bolts go through the tuples
faster (and as you add more, they do: you get roughly a 6x speedup from
the extra bolts), then you will see the complete latency drop.
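
Put differently, this is essentially Little's law applied to the spout's
pending tuples: with a roughly steady backlog of N queued tuples and
total topology throughput X, the average complete latency W is the
backlog divided by the throughput, so a ~6x gain in throughput gives a
~6x drop in complete latency. As a sketch:

    W \approx N / X, \qquad
    W_{after} / W_{before} \approx X_{before} / X_{after} \approx 1/6
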
On May 20, 2015 4:01 AM, "Dima Dragan" <di...@belleron.net> wrote:


Re: Decreasing Complete latency with growing number of executors

Posted by Dima Dragan <di...@belleron.net>.
Thank you, Jeffrey and Devang for your answers.

Jeffrey, since I use shuffle grouping, I think the serialization step
remains, but there are no network delays (to remove it there is the
localOrShuffleGrouping option). For all experiments I use only one
worker, so that does not explain why the complete latency could
decrease.
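
To illustrate, the difference is a single call when wiring the bolt (a
sketch assuming the TopologyBuilder named `builder` from the original
topology; BoltC is a placeholder class name, 64 an example executor
count, and only one of the two declarations would actually be used):

// shuffleGrouping: each tuple goes to a random Bolt C executor,
// regardless of which worker that executor lives in.
builder.setBolt("C", new BoltC(), 64).shuffleGrouping("B");

// localOrShuffleGrouping: prefer Bolt C executors in the same worker
// as the emitting task, shuffling to remote executors only when no
// local one exists.
builder.setBolt("C", new BoltC(), 64).localOrShuffleGrouping("B");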

But I think you are right about definitions)

Devang, no, I set up 1 worker and 1 acker for all tests.


Best regards,
Dmytro Dragan
On May 20, 2015 05:03, "Devang Shah" <de...@gmail.com> wrote:


Re: Decreasing Complete latency with growing number of executors

Posted by Devang Shah <de...@gmail.com>.
Was the number of workers or the number of ackers changed across your
experiments? What numbers did you use?

When you have many executors, increasing the ackers reduces the complete
latency.
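
For reference, both knobs are ordinary topology settings (a sketch using
the standard Config API; the acker value 4 is only an example):

import backtype.storm.Config;

Config conf = new Config();
conf.setNumWorkers(1); // the tests in this thread used a single worker
// More acker executors parallelize the ack/fail bookkeeping for
// in-flight tuples, which is what can shave the complete latency.
conf.setNumAckers(4);  // sets topology.acker.executors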

Thanks and Regards,
Devang
 On 20 May 2015 03:15, "Jeffery Maass" <ma...@gmail.com> wrote:


Re: Decreasing Complete latency with growing number of executors

Posted by Jeffery Maass <ma...@gmail.com>.
Maybe the difference has to do with where the executors were running.  If
your entire topology is running within the same worker, the serialization
for the worker-to-worker networking layer is left out of the picture.  I
suppose that would mean the complete latency could decrease.  At the same
time, process latency could very well increase, since all the work is
being done within the same worker.  My understanding is that process
latency is measured from the time the tuple enters the executor until it
leaves the executor.  Or was it from the time the tuple enters the worker
until it leaves the worker?  I don't recall.

I bet a firm definition of the latency terms would shed some light.

Thank you for your time!

+++++++++++++++++++++
Jeff Maass <ma...@gmail.com>
linkedin.com/in/jeffmaass
stackoverflow.com/users/373418/maassql
+++++++++++++++++++++


On Tue, May 19, 2015 at 9:47 AM, Dima Dragan <di...@belleron.net>
wrote:


Re: Decreasing Complete latency with growing number of executors

Posted by Dima Dragan <di...@belleron.net>.
Thanks Nathan for your answer,

But I'm afraid you have understood me wrong: with executors increased by
32x, each executor's throughput *increased* by 5x, but the complete
latency dropped.

On Tue, May 19, 2015 at 5:16 PM, Nathan Leung <nc...@gmail.com> wrote:



-- 
Best regards,
Dmytro Dragan

Re: Decreasing Complete latency with growing number of executors

Posted by Nathan Leung <nc...@gmail.com>.
It depends on your application and the characteristics of the I/O.  You
increased executors by 32x and each executor's throughput dropped by 5x,
so it makes sense that the latency will drop.
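
To spell out the arithmetic as a sketch (the executor counts and the 5x
figure come from the stats in the original post): total throughput is the
per-executor throughput times the number of executors, so

    X_{total} = n \cdot X_{executor}, \qquad
    \frac{X_{64}}{X_{2}} \approx \frac{64}{2} \cdot \frac{1}{5} \approx 6.4

i.e. the topology as a whole drains the queued tuples roughly 6x faster,
even though each individual executor got slower.
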
On May 19, 2015 9:54 AM, "Dima Dragan" <di...@belleron.net> wrote:
