Posted to user@storm.apache.org by Alessio Pagliari <pa...@i3s.unice.fr> on 2018/03/30 14:18:46 UTC

Storm throughput

Hi everybody,

I’m trying to do some preliminary tests with Storm, to understand how far it can go. Right now I’m focusing on finding its maximum throughput in terms of tuples per second. I saw the benchmark done by the folks at Hortonworks (ref: https://it.hortonworks.com/blog/microbenchmarking-storm-1-0-performance/), where in the first test they reach a spout emission rate of 3.2 million tuples/s.

I tried to replicate the test with a simple spout that continuously emits the same string, “some data”. Unlike them, I’m using Storm 1.1.1, and the cluster is set up on my laptop; in any case I’m testing just one spout, not an entire topology. If more configuration information is needed, just ask.

To compute the throughput, I query the UI API every 10 s for the total number of tuples processed and subtract the previous reading to get the count for the last 10 s. The arithmetic gives me something around 32k tuples/s.
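
As a sanity check, the delta computation I describe can be sketched in Java. The cumulative counts would come from the Storm UI REST API (I'm not showing the HTTP call, since the exact endpoint and JSON field names would be assumptions); the class name is made up and only the arithmetic is modeled:

```java
public class ThroughputDelta {

    // Tuples per second over one polling interval, given two cumulative
    // "tuples processed" readings taken intervalSeconds apart.
    static double rate(long previousTotal, long currentTotal, double intervalSeconds) {
        return (currentTotal - previousTotal) / intervalSeconds;
    }

    public static void main(String[] args) {
        // Example: 320,000 additional tuples observed over a 10 s window.
        System.out.println(rate(1_000_000L, 1_320_000L, 10.0)); // 32000.0
    }
}
```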

I don’t think I’m wrong in saying that 32k is not even comparable to 3.2 million. Is there something I’m missing? Is this output normal?

Thank you for your help and for your time,

Alessio

Re: Storm throughput

Posted by Alessio Pagliari <pa...@i3s.unice.fr>.
> Something is definitely broken in your run or in your measurement method….

The problem doesn’t lie in my measurement method; I double-checked by trying it as you said. Thank you for sharing the topo you used: with it I was able to understand where I was failing. I had based my topo on other sample benchmark topologies I found online, which enabled the option setDebug(true); printing a log message for each tuple was slowing me down. With debug off, I’m now able to reach a spout emission rate of ~4.5M tuples per second.
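
For anyone hitting the same issue, the flag is set on the topology Config before submission. A minimal sketch (the topology name and builder variable are placeholders, not from the thread):

```java
Config conf = new Config();
conf.setDebug(false); // debug logging prints every emitted tuple and dominates benchmark runs

// Submit as usual; "const-spout-bench" and builder are placeholders.
StormSubmitter.submitTopology("const-spout-bench", conf, builder.createTopology());
```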

Thank you all for the support.

----------
Alessio Pagliari
Scale Team, PhD Student
Université Côte d’Azur, CNRS, I3S



> On 31 Mar 2018, at 07:19, Roshan Naik <ro...@yahoo.com> wrote:
> 
> 
> Something is definitely broken in your run or in your measurement method... and it's not your hardware that is at fault. The machine on which those numbers were run had lots of cores, but the cores were not fast at all. Even my mid-2015 MacBook Pro has faster cores than that machine, which had old Intel CPUs.
> 
> You may be making some mistakes in your calculations. Just run the topo for about 14 mins and take the 10-min window reading directly from the UI, then calculate the per-second throughput from that (that way you disregard the first 3 or 4 mins to allow for warm-up). Also, are you overriding any default settings?
> 
> 
> Here is the code for the topo that was used: https://github.com/apache/storm/blob/1.1.x-branch/examples/storm-perf/src/main/java/org/apache/storm/perf/ConstSpoutOnlyTopo.java
> 
> 
> 
> -roshan


Re: Storm throughput

Posted by Roshan Naik <ro...@yahoo.com>.
 
Something is definitely broken in your run or in your measurement method... and it's not your hardware that is at fault. The machine on which those numbers were run had lots of cores, but the cores were not fast at all. Even my mid-2015 MacBook Pro has faster cores than that machine, which had old Intel CPUs.
You may be making some mistakes in your calculations. Just run the topo for about 14 mins and take the 10-min window reading directly from the UI, then calculate the per-second throughput from that (that way you disregard the first 3 or 4 mins to allow for warm-up). Also, are you overriding any default settings?

Here is the code for the topo that was used: https://github.com/apache/storm/blob/1.1.x-branch/examples/storm-perf/src/main/java/org/apache/storm/perf/ConstSpoutOnlyTopo.java


-roshan

On Friday, March 30, 2018, 8:24:39 AM PDT, Alessio Pagliari <pa...@i3s.unice.fr> wrote:

Surely they work on a way more powerful cluster, but the topology is composed of just one spout: no parallelization, no bolts, for a total of one worker, so one thread in a JVM. Even if I had 100 cores like them, it shouldn't make any difference. Please correct me if I'm wrong.

Such a topology will assign its only spout to a worker on a single node, so the multi-node cluster is pointless. As for the number of cores, one executor cannot run on multiple cores at the same time, since it is not a multi-threaded process.

Is there some Storm or Java behavior that I'm not aware of?

Thank you,

Alessio


Re: Storm throughput

Posted by Alessio Pagliari <pa...@i3s.unice.fr>.
Surely they work on a way more powerful cluster, but the topology is composed of just one spout: no parallelization, no bolts, for a total of one worker, so one thread in a JVM. Even if I had 100 cores like them, it shouldn't make any difference. Please correct me if I'm wrong.

Such a topology will assign its only spout to a worker on a single node, so the multi-node cluster is pointless. As for the number of cores, one executor cannot run on multiple cores at the same time, since it is not a multi-threaded process.

Is there some Storm or Java behavior that I'm not aware of?

Thank you,

Alessio

Sent from BlueMail

On Mar 30, 2018, at 4:28 PM, Jacob Johansen <jo...@gmail.com> wrote:
>for their test, they were using 4 worker nodes (servers) each with 24vCores for a total of 96vCores.
>Most laptops max out at 8vCores and are typically at 4-6vCores
>
>Jacob Johansen

Re: Storm throughput

Posted by Bobby Evans <bo...@apache.org>.
Please be very careful of any benchmark.  You are doing the right thing
trying to reproduce it.  From the article they are talking about a
"microbenchmark". I have no idea what they set up to do that.  If you are a
Hortonworks customer I would suggest that you talk to their support about
that.  I did some micro-benchmarks when I wrote some of the original
performance improvements for 1.x and got somewhere close to 3.2 million
tuples/sec, but that was measuring the maximum throughput a single
disruptor queue could do with a single consumer and multiple producers.  In
the real world the topology numbers were much smaller.

A few things you want to look at when setting up your topology for maximum
throughput are:

1) Do you need ackers or not?
For each real message sent, there are about 2 messages sent to ackers to
track it.  This adds a lot of overhead, so if you don't need guaranteed
delivery, shutting off acking helps a lot.

2) Flow control
If you do need ackers, play around with the topology.max.spout.pending
setting.  If a queue fills up, Storm will keep working, but GC and other
things really slow it down.  The default queue size is 1024, and typically
setting the max pending to around 500 works well for me, but that is very
specific to your topology.

3) GC will kill you
Storm by its very nature churns through a massive number of objects on the
heap.  Tuning GC is critical for maximum performance.  These are the default
options that tend to work well for us:

-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:NewSize=128m
-XX:CMSInitiatingOccupancyFraction=70 -XX:-CMSConcurrentMTEnabled
-XX:ParallelGCThreads=4

We use concurrent mark and sweep even though G1 is out, because in our
experience G1 does not handle the churn as well as CMS does.  We limit the
number of GC threads to 4 because, without that, Java will launch a thread
for each core by default, and having 24 or 96 threads all walking through
memory at the same time can really tax the memory bus.

4) Keep it local if possible
If you don't need more than one worker, just stick with a single
worker.  We have been working on making improvements to scheduling and the
shuffle grouping to try and take advantage of locality more, but those are
still waiting for a 2.x release to happen.
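
Taken together, points 1, 2, and 4 (and the worker JVM options from point 3) can be sketched on a 1.x topology Config. The values are illustrative, not universal recommendations:

```java
Config conf = new Config();
conf.setNumWorkers(1);        // point 4: keep the whole topology in one worker JVM
conf.setNumAckers(0);         // point 1: disable acking if guaranteed delivery is not needed
conf.setMaxSpoutPending(500); // point 2: flow control; only takes effect when acking is on

// Point 3: GC options for the worker JVMs.
conf.put(Config.TOPOLOGY_WORKER_CHILDOPTS,
    "-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:NewSize=128m"
    + " -XX:CMSInitiatingOccupancyFraction=70 -XX:ParallelGCThreads=4");
```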

Also 2.x has made some really big improvements on performance too, so
hopefully we will get a beta release of 2.x out very soon that you all can
try out.

Thanks,

Bobby


On Fri, Mar 30, 2018 at 9:28 AM Jacob Johansen <jo...@gmail.com>
wrote:

> for their test, they were using 4 worker nodes (servers) each with
> 24vCores for a total of 96vCores.
> Most laptops max out at 8vCores and are typically at 4-6vCores
>
> Jacob Johansen

Re: Storm throughput

Posted by Jacob Johansen <jo...@gmail.com>.
for their test, they were using 4 worker nodes (servers) each with 24vCores
for a total of 96vCores.
Most laptops max out at 8vCores and are typically at 4-6vCores

Jacob Johansen

On Fri, Mar 30, 2018 at 9:18 AM, Alessio Pagliari <pa...@i3s.unice.fr>
wrote:

> Hi everybody,
>
> I’m trying to do some preliminary tests with Storm, to understand how far
> it can go. Right now I’m focusing on finding its maximum throughput in
> terms of tuples per second. I saw the benchmark done by the folks at
> Hortonworks (ref:
> https://it.hortonworks.com/blog/microbenchmarking-storm-1-0-performance/)
> and in the first test they reach a spout emission rate of 3.2 million
> tuples/s.
>
> I tried to replicate the test with a simple spout that continuously emits
> the same string, “some data”. Unlike them, I’m using Storm 1.1.1, and the
> cluster is set up on my laptop; in any case I’m testing just one spout,
> not an entire topology. If more configuration information is needed, just
> ask.
>
> To compute the throughput, I query the UI API every 10 s for the total
> number of tuples processed and subtract the previous reading to get the
> count for the last 10 s. The arithmetic gives me something around 32k
> tuples/s.
>
> I don’t think I’m wrong in saying that 32k is not even comparable to 3.2
> million. Is there something I’m missing? Is this output normal?
>
> Thank you for your help and for your time,
>
> Alessio
>