Posted to user@storm.apache.org by Matthew Lowe <gi...@gmail.com> on 2016/03/22 21:13:51 UTC

[HELP!] Storm Network Performance?

Hello, I seem to be getting decreasing performance for each additional bolt I add to a topology.
I would like to know what performance decreases to expect when making longer topologies.
Here is my current topology:

Spout --shuffleGrouping--> Bolt1 --fieldsGrouping--> Bolt2

1 x Spout, 1 x Bolt1, 1 x Bolt2.
3 workers, 3 ackers, each component has 1 task and 1 executor.
Nimbus is set up with 3 AWS supervisors, which are c4.xlarge instances.
Each AWS instance has only one worker port available (6000 I think), meaning one worker/component/executor per AWS instance.
The topology uses message guaranteeing (sending an id with the spout tuple and anchoring the tuple in bolts).
I think the average tuple size is <= 450 bytes.
The maxTuplePending is set to 35 (during testing we found 3 ackers and 35 max pending was "good").
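
For reference, a minimal sketch of how a topology like this might be wired up. The component names, the grouping field ("key") and the spout/bolt classes are placeholders rather than our actual code, and the package is backtype.storm.* on pre-1.0 Storm (org.apache.storm.* from 1.0 on):

    import org.apache.storm.Config;
    import org.apache.storm.StormSubmitter;
    import org.apache.storm.topology.TopologyBuilder;
    import org.apache.storm.tuple.Fields;

    public class MyTopology {
        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();
            // 1 executor / 1 task per component, as described above
            builder.setSpout("kestrel-spout", new MyKestrelSpout(), 1);
            builder.setBolt("bolt1", new Bolt1(), 1).shuffleGrouping("kestrel-spout");
            builder.setBolt("bolt2", new Bolt2(), 1).fieldsGrouping("bolt1", new Fields("key"));

            Config conf = new Config();
            conf.setNumWorkers(3);        // one worker per supervisor, since each exposes one port
            conf.setNumAckers(3);
            conf.setMaxSpoutPending(35);  // the "maxTuplePending" mentioned above
            StormSubmitter.submitTopology("test-topology", conf, builder.createTopology());
        }
    }

For the guaranteeing to work, the spout emits with a message id (collector.emit(values, msgId)) and each bolt emits with the input tuple as anchor and then acks it (collector.emit(input, values); collector.ack(input)).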

When testing with only the Spout (no bolts) I was emitting about 70,000 per second.
When adding the first bolt, this dropped to about 10,000 - 14,000 per second.
Finally, adding the final bolt dropped me down to about 5,000 per second.

When looking at the Storm UI, the spout takes about 11ms (complete latency, when running the full topology).
And each bolt takes about 0.2ms to execute.
When running only the Spout the complete latency is a lot less, maybe 2-3ms.

I understand that message guaranteeing costs some performance, but is it supposed to drop this much?
Is this a networking issue? My AWS instances are rated "High" for networking performance and they are in a placement group.


Thanks for any feedback. Sorry if this is the wrong mailing list.
I will add any information you need in a reply.



Re: [HELP!] Storm Network Performance?

Posted by Olivier Mallassi <ol...@gmail.com>.
Hi Matt

Some ideas (not sure they will help):
- could you check the capacity KPI in the UI? AFAIR, this is an average over the last 10 min.
- are you sure your data is evenly distributed across the bolts (when doing the fields grouping)?
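
(For what it is worth, my understanding of the capacity figure for a bolt is roughly:

    capacity ~= tuples executed in the window x average execute latency (ms) / window length (ms)

i.e. the fraction of the last 10 minutes the bolt actually spent executing; a value close to 1.0 means that bolt is the bottleneck.)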

Oliv

On Wednesday, 23 March 2016, Matthew Lowe <gi...@gmail.com> wrote:

> Thanks again for the feedback.
>
> When running `htop` I see that the CPU utilisation for each core is <25%,
> this is true for all components.
> I have a c4.xlarge instance that has 4 vCPUs (2 hyper-threaded Intel cores).
> We are getting a json formatted item from our Kestrel Message queue
> server. Is there a way I can check the tuple size?
> When I save the JSON file on my mac, the file size is about 450 bytes.
>
> Additional information about the spout:
> We grab a batch of kestrel items (4000) and then emit them one by one in
> `nextTuple`. This reduces the need to consistently communicate with kestrel.
> We also batch the confirmations back to kestrel (which is done when we
> have ack’d half the original batch(2000)).
>
> Adding this batching has increased performance.
>
> //Matt
>
> > On 23 Mar 2016, at 10:12, Brian Candler <b.candler@pobox.com> wrote:
> >
> > On 23/03/2016 08:45, Matthew Lowe wrote:
> >> The cpu is not under a lot of strain for the spout
> > On each VM, have you tried running "top" and then hitting "1"? This will
> show you the CPU utilisation per core. If on any VM you see one core at
> 100% then you need to increase parallelism by more threads (executors).
> >
> >> , so my only conclusion can be the networking performance?
> > Are your tuples large? A 1G NIC has a usable capacity of about
> 112MiB/sec. So a limit of 6000/sec would imply 19KB/tuple.
> >
> > >> Someone mentioned that it could be JVM related, to do with the GC or
> available memory? I'm not sure about this though.
> >> These values are very similar to the ones I get on a c4.large, which
> maxes out at 3.5-4k per second.
> >>
> >> Interestingly I found this when the topology starts:
> >> [INFO] Spout tuples per second: 1
> >> [INFO] Spout tuples per second: 10000
> >> [INFO] Spout tuples per second: 936
> >> [INFO] Spout tuples per second: 1851
> >> [INFO] Spout tuples per second: 4480
> >> [INFO] Spout tuples per second: 5174
> >> [INFO] Spout tuples per second: 6399
> >> It then holds at about 6500 from here on.
> > I expect the spout will send as fast as it can until it reaches
> maxTuplePending, then stop and wait for tuples to be acked, and eventually
> will stabilise at the achievable throughput.
> >
> > Regards,
> >
> > Brian.
> >
>
>

Re: [HELP!] Storm Network Performance?

Posted by Matthew Lowe <gi...@gmail.com>.
Thanks again for the feedback.

When running `htop` I see that the CPU utilisation for each core is <25%; this is true for all components.
I have a c4.xlarge instance that has 4 vCPUs (2 hyper-threaded Intel cores).
We are getting a JSON-formatted item from our Kestrel message queue server. Is there a way I can check the tuple size?
When I save the JSON file on my Mac, the file size is about 450 bytes.

Additional information about the spout:
We grab a batch of Kestrel items (4000) and then emit them one by one in `nextTuple`. This reduces the need to constantly communicate with Kestrel.
We also batch the confirmations back to Kestrel (which is done when we have acked half the original batch (2000)).

Adding this batching has increased performance.
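
For anyone curious, the pattern is roughly the sketch below. The `kestrel` client and its get/confirm calls stand in for whatever Kestrel library is actually used, and the names are made up; this is a fragment of the spout class, where `collector` is the SpoutOutputCollector saved in `open`:

    private final Queue<KestrelItem> buffer = new LinkedList<>();
    private final List<Object> ackedIds = new ArrayList<>();

    @Override
    public void nextTuple() {
        if (buffer.isEmpty()) {
            buffer.addAll(kestrel.get("my-queue", 4000));       // fetch a batch of 4000 items
        }
        KestrelItem item = buffer.poll();
        if (item != null) {
            collector.emit(new Values(item.json()), item.id()); // message id enables acking
        }
    }

    @Override
    public void ack(Object msgId) {
        ackedIds.add(msgId);
        if (ackedIds.size() >= 2000) {             // confirm back to Kestrel in batches of 2000
            kestrel.confirm("my-queue", ackedIds);
            ackedIds.clear();
        }
    }

(`fail` would need similar handling so that failed items are re-queued rather than confirmed.)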

//Matt

> On 23 Mar 2016, at 10:12, Brian Candler <b....@pobox.com> wrote:
> 
> On 23/03/2016 08:45, Matthew Lowe wrote:
>> The cpu is not under a lot of strain for the spout
> On each VM, have you tried running "top" and then hitting "1"? This will show you the CPU utilisation per core. If on any VM you see one core at 100% then you need to increase parallelism by more threads (executors).
> 
>> , so my only conclusion can be the networking performance?
> Are your tuples large? A 1G NIC has a usable capacity of about 112MiB/sec. So a limit of 6000/sec would imply 19KB/tuple.
> 
> >> Someone mentioned that it could be JVM related, to do with the GC or available memory? I'm not sure about this though.
>> These values are very similar to the ones I get on a c4.large, which maxes out at 3.5-4k per second.
>> 
>> Interestingly I found this when the topology starts:
>> [INFO] Spout tuples per second: 1
>> [INFO] Spout tuples per second: 10000
>> [INFO] Spout tuples per second: 936
>> [INFO] Spout tuples per second: 1851
>> [INFO] Spout tuples per second: 4480
>> [INFO] Spout tuples per second: 5174
>> [INFO] Spout tuples per second: 6399
>> It then holds at about 6500 from here on.
> I expect the spout will send as fast as it can until it reaches maxTuplePending, then stop and wait for tuples to be acked, and eventually will stabilise at the achievable throughput.
> 
> Regards,
> 
> Brian.
> 


Re: [HELP!] Storm Network Performance?

Posted by Brian Candler <b....@pobox.com>.
On 23/03/2016 08:45, Matthew Lowe wrote:
> The cpu is not under a lot of strain for the spout
On each VM, have you tried running "top" and then hitting "1"? This will 
show you the CPU utilisation per core. If on any VM you see one core at 
100% then you need to increase parallelism by more threads (executors).

> , so my only conclusion can be the networking performance?
Are your tuples large? A 1G NIC has a usable capacity of about 
112MiB/sec. So a limit of 6000/sec would imply 19KB/tuple.
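
(For comparison, if the tuples really are around the 450 bytes mentioned earlier:

    450 bytes/tuple x 6,000 tuples/sec ~= 2.7 MB/sec

which is only a few percent of that, so raw NIC bandwidth alone seems unlikely to be the limit.)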

> Someone mentioned that it could be JVM related, to do with the GC or available memory? I'm not sure about this though.
> These values are very similar to the ones I get on a c4.large, which maxes out at 3.5-4k per second.
>
> Interestingly I found this when the topology starts:
> [INFO] Spout tuples per second: 1
> [INFO] Spout tuples per second: 10000
> [INFO] Spout tuples per second: 936
> [INFO] Spout tuples per second: 1851
> [INFO] Spout tuples per second: 4480
> [INFO] Spout tuples per second: 5174
> [INFO] Spout tuples per second: 6399
> It then holds at about 6500 from here on.
I expect the spout will send as fast as it can until it reaches 
maxTuplePending, then stop and wait for tuples to be acked, and 
eventually will stabilise at the achievable throughput.

Regards,

Brian.


Re: [HELP!] Storm Network Performance?

Posted by Matthew Lowe <gi...@gmail.com>.
Thanks for getting back to me so fast.
I have just tried the following tests:

maxPendingSpout    tuples per second    ackers
35                 5k                   3
3500               5-6k                 3
10000              5.5-6k               3
10000              5.5k                 10

They all seem to emit at the same speed, as if I have hit a wall in performance.
The CPU is not under a lot of strain for the spout, so my only conclusion can be the networking performance?
Someone mentioned that it could be JVM related, to do with the GC or available memory? I'm not sure about this though.
These values are very similar to the ones I get on a c4.large, which maxes out at 3.5-4k per second.

Interestingly I found this when the topology starts:
[INFO] Spout tuples per second: 1
[INFO] Spout tuples per second: 10000
[INFO] Spout tuples per second: 936
[INFO] Spout tuples per second: 1851
[INFO] Spout tuples per second: 4480
[INFO] Spout tuples per second: 5174
[INFO] Spout tuples per second: 6399
It then holds at about 6500 from here on.

Seems like it emits the 10000 I specify in the maxPending, then it drops.
Part of me is wondering if the bolts are to blame, but the UI and my timers say they take less than a ms to perform.
That can leave only the network speed, right?

Thanks for any help!

> On 22 Mar 2016, at 22:01, Brian Candler <b....@pobox.com> wrote:
> 
> On 22/03/2016 20:13, Matthew Lowe wrote:
>> The maxTuplePending is set to 35 (During testing we found 3 ackers and 35 max pending was “good")
> Have you tried increasing this now you have built your spout-bolt-bolt topology?
>> When testing with only the Spout (no bolts) I was emitting about 70,000 per second.
>> When adding the first bolt, this dropped to about 10,000 - 14,000 per second
>> Finally, adding the final bolt dropped me down to about 5,000 per second.
>> 
>> When looking at Storm ui the spout takes about 11ms (complete latency, when running full topology)
>> And each spout takes about 0.2ms to execute.
>> When running only the Spout the complete latency is a lot less, maybe 2-3ms.
> Once maxTuplePending is reached the spout stops sending, and won't send another message until a previous message has been fully processed, all the way to the end.
> 
> You have set a limit of 35 messages "in flight" and each message takes 11ms end to end. This gives you an expected throughput of 35 x (1000/11) ~= 3,200 messages per second.
> 
> When running only the spout with 2ms latency you would expect 35 x (1000/2) = 17,500 messages per second.
> 
> Those numbers don't tie exactly with what you see, but gives a rough idea of how maxTuplePending might be limiting your throughput. At least, that's my guess as to what's happening :-)
> 
> Regards,
> 
> Brian.
> 


Re: [HELP!] Storm Network Performance?

Posted by Brian Candler <b....@pobox.com>.
On 22/03/2016 20:13, Matthew Lowe wrote:
> The maxTuplePending is set to 35 (During testing we found 3 ackers and 35 max pending was “good")
Have you tried increasing this now you have built your spout-bolt-bolt 
topology?
> When testing with only the Spout (no bolts) I was emitting about 70,000 per second.
> When adding the first bolt, this dropped to about 10,000 - 14,000 per second
> Finally, adding the final bolt dropped me down to about 5,000 per second.
>
> When looking at Storm ui the spout takes about 11ms (complete latency, when running full topology)
> And each spout takes about 0.2ms to execute.
> When running only the Spout the complete latency is a lot less, maybe 2-3ms.
Once maxTuplePending is reached the spout stops sending, and won't send 
another message until a previous message has been fully processed, all 
the way to the end.

You have set a limit of 35 messages "in flight" and each message takes 
11ms end to end. This gives you an expected throughput of 35 x (1000/11) 
~= 3,200 messages per second.

When running only the spout with 2ms latency you would expect 35 x 
(1000/2) = 17,500 messages per second.

Those numbers don't tie exactly with what you see, but they give a rough 
idea of how maxTuplePending might be limiting your throughput. At least, 
that's my guess as to what's happening :-)
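
In case it is useful, raising the limit is just a config change (the value below is only an example, and the package is backtype.storm or org.apache.storm depending on your Storm version):

    Config conf = new Config();
    conf.setMaxSpoutPending(2000);  // max un-acked tuples per spout task
    // rough ceiling: maxSpoutPending x (1000 / complete latency in ms)
    // e.g. 2000 x (1000 / 11) ~= 180,000/sec, at which point something else becomes the limit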

Regards,

Brian.