Posted to user@storm.apache.org by ja...@yahoo.com.tw on 2014/06/24 16:10:35 UTC

Storm's performance limits to 1000 tuples/sec

Hi all,

I'm facing a critical performance problem with my Storm topology: I can only process 1000 tuples/sec from Kafka via KafkaSpout. I use plain Storm (not Trident) to build my topology, and my topology information is as follows:
[Machines] 
I have 1 Nimbus and 3 Supervisors, each with a 2-core CPU, on GCE (Google Compute Engine)
Number of workers: 12
Number of executors: 51
[Topology]
Number of KafkaSpouts: 13 (fetching 13 topics from Kafka brokers)
Number of bolts: 12 (5 of them are MySQL-dumper bolts)

KafkaSpout(topic) emits to boltA and boltB
boltA (parallelism=9): parses the Avro tuples from the KafkaSpout
boltB (parallelism=1): counts tuples only

I found that boltA's capacity is sometimes 1 or above in the Storm UI, and my 5 MySQL-dumper bolts' execute latency is more than 300 ms (the other bolts are under 10 ms). In addition, the complete latency of these KafkaSpouts is more than 2000 ms in the beginning, but it drops to 1000 ms after a while.

I found this topology can only process 1000 tuples/s or less, but my goal is 10000 tuples/s. Is anything wrong with my topology config? My topology actually only does simple things like counting and dumping to MySQL. Storm doesn't seem to deliver the performance it claims (a million tuples per second on a 10-node cluster). Can anyone give me some suggestions?

Thanks a lot.

Best regards,
James

Re: Storm's performance limits to 1000 tuples/sec

Posted by Danijel Schiavuzzi <da...@schiavuzzi.com>.
Hi Cody and James,

Both the Kafka brokers and the Storm Supervisors run on a hypervisor on the same machine. The topology runs with the number of workers set to 4, and the Kafka spout fetch size is set to 25 MB. The parallelism of all components is 4. maxSpoutPending is set to 8. The transport mechanism is Netty, and the configuration is pretty standard. All servers run Linux.

Mind you again, this was a simplified, "consume-only" Trident transactional topology, with only a spout and a debugging, throughput-logging Trident filter. Adding a groupBy and persistentAggregate (with a MemoryMapState) dropped this to about 160,000 messages/s. Messages vary in size, from about 1 to 1.5 KB. All this with Kryo serialization disabled.

Best regards,

Danijel



-- 
Danijel Schiavuzzi

E: danijel@schiavuzzi.com
W: www.schiavuzzi.com
T: +385989035562
Skype: danijels7

Re: Storm's performance limits to 1000 tuples/sec

Posted by "Cody A. Ray" <co...@gmail.com>.
Hi Danijel -

What sort of hardware are your Kafka brokers and Storm workers running on in your 400k msgs/s from Kafka example? (We're also running into a throughput problem, but we haven't yet benchmarked with a simplified topology such as the one you mention. I'll email our specs and details in a post to the list soon.)

-Cody




-- 
Cody A. Ray, LEED AP
cody.a.ray@gmail.com
215.501.7891

Re: Storm's performance limits to 1000 tuples/sec

Posted by ja...@yahoo.com.tw.
Hi,

Perhaps MySQL is the bottleneck; I'll try that. However, if some bolt is very busy, will Storm be slower to emit tuples? My messages are Avro records from Kafka, and each Avro message is about 3 KB. What types of messages do you fetch from Kafka?

Another important question: which kafka-storm spout do you use? I see so many different versions of it, which confuses me. Can you share your topology's Storm config and your KafkaSpout config with me?

Thank you very much!


Best regards,
James Fu




Re: Storm's performance limits to 1000 tuples/sec

Posted by Danijel Schiavuzzi <da...@schiavuzzi.com>.
Try to run the topology without the MySQL bolt to find out if that's the bottleneck. Do you update the database in batches? That's an essential optimization you should implement.
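As a hedged illustration of the batching idea (class and method names are hypothetical; the actual flush would be a JDBC executeBatch or a multi-row INSERT against your own schema), the bolt-side buffering can be as small as:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: buffer tuples inside the MySQL-dumper bolt and
// flush them in one database round trip instead of one INSERT per tuple.
public class BatchExample {
    static class BatchBuffer<T> {
        private final int batchSize;
        private final Consumer<List<T>> flusher;
        private final List<T> buffer = new ArrayList<>();
        private int flushCount = 0;

        BatchBuffer(int batchSize, Consumer<List<T>> flusher) {
            this.batchSize = batchSize;
            this.flusher = flusher;
        }

        // Called from the bolt's execute(); flushes when the buffer fills.
        void add(T item) {
            buffer.add(item);
            if (buffer.size() >= batchSize) {
                flush();
            }
        }

        // The flusher would call Statement.addBatch()/executeBatch() here.
        void flush() {
            if (!buffer.isEmpty()) {
                flusher.accept(new ArrayList<>(buffer));
                buffer.clear();
                flushCount++;
            }
        }

        int getFlushCount() { return flushCount; }
    }

    public static void main(String[] args) {
        List<Integer> flushedSizes = new ArrayList<>();
        BatchBuffer<Integer> buf =
            new BatchBuffer<>(100, batch -> flushedSizes.add(batch.size()));
        for (int i = 0; i < 250; i++) buf.add(i);
        buf.flush(); // final partial batch
        // 250 tuples become 3 database round trips instead of 250
        System.out.println(flushedSizes); // [100, 100, 50]
    }
}
```

In a real bolt you would also flush on a tick tuple, so a trickle of traffic doesn't sit in the buffer indefinitely.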

With a two-node Storm cluster I can fetch 450,000 messages/s from Kafka, and that's with a Trident transactional topology (just the spout and a debug filter bolt). Kafka has two nodes with only 4 partitions. Plain Storm should be faster.

Re: Storm's performance limits to 1000 tuples/sec

Posted by Michael Rose <mi...@fullcontact.com>.
On serialization: make sure your custom classes are registered with Kryo, otherwise Storm may fall back to Java serialization (slow).
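
A minimal sketch of that registration, assuming the Storm 0.9.x Config API and a hypothetical custom tuple class named CountResult:

```java
import backtype.storm.Config;

// Register each custom tuple class so Kryo serializes it directly
// instead of Storm silently falling back to Java serialization.
Config conf = new Config();
conf.registerSerialization(CountResult.class);
// or, with a hand-written com.esotericsoftware.kryo.Serializer:
// conf.registerSerialization(CountResult.class, CountResultSerializer.class);
```
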

RE: Storm's performance limits to 1000 tuples/sec

Posted by 傅駿浩 <ja...@yahoo.com.tw>.
Hi,
I do indeed emit tuples from one bolt to another using Java serialization only. It's a class which stores some results that are sent on to another bolt, so I only add "implements Serializable" after the class name to achieve this. Is there a simple way or an example of how to use Kryo serialization for such a class? I read Storm's manual but don't really understand it; an example would be much better. Thank you a lot.
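For reference, a minimal sketch of the Java-serialization approach described here (the class and field names are hypothetical), including the serialize/deserialize round trip that effectively happens when tuples cross workers with the Java-serialization fallback enabled:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ResultExample {
    // The "results" class: just implements Serializable, no Kryo registration.
    static class CountResult implements Serializable {
        private static final long serialVersionUID = 1L;
        final String topic;
        final long count;
        CountResult(String topic, long count) {
            this.topic = topic;
            this.count = count;
        }
    }

    // Round-trip through Java serialization, roughly what happens per
    // tuple when no Kryo serializer is registered for the class.
    static CountResult roundTrip(CountResult in) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ObjectOutputStream oos = new ObjectOutputStream(bos);
            oos.writeObject(in);
            oos.flush();
            ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()));
            return (CountResult) ois.readObject();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        CountResult out = roundTrip(new CountResult("topicA", 42L));
        System.out.println(out.topic + " " + out.count);
    }
}
```

This round trip per tuple is the cost the other posters suggest avoiding, either by registering the class with Kryo or by keeping tuples inside one worker.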



Best regards,
James Fu

Re: Storm's performance limits to 1000 tuples/sec

Posted by Danijel Schiavuzzi <da...@schiavuzzi.com>.
You should post a screenshot of your topology in Storm UI for us to analyze.

The issue may be any one of, or a combination of:
* The hardware and OS environment the cluster runs on
* Storm and topology settings (maxSpoutPending, numWorkers, Java or Kryo serialization, worker JVM settings, etc.)
* Topology structure, i.e. the number and type of your components, their parallelism, the type of bolt groupings, and the usage of Trident (which incurs a performance hit compared to the plain Storm API)
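
For the settings bullet, a hedged sketch of two commonly tuned knobs (the values are examples to measure against, not recommendations):

```java
import backtype.storm.Config;

Config conf = new Config();
// fewer, larger workers reduce inter-worker (serialized) traffic
conf.setNumWorkers(6);
// caps unacked tuples in flight per spout task; the main backpressure knob
// for acked topologies
conf.setMaxSpoutPending(1000);
```
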

The choice of groupings is very important, as others already mentioned. You
should strive to minimize inter-worker tuple traffic whenever possible.
First reduce your data, and then route it to the next bolt. Partition data
as much as possible.

The LocalOrShuffle grouping is very useful here, as are CombinerAggregators in Trident, for example. Try to measure the network throughput of your cluster to see if the network is saturated, and monitor your CPU and memory usage. Monitor JVM GC pauses and other parameters too. Just tuning a few parameters can give you an order of magnitude performance boost, but you should first identify the bottleneck to know which parameter to tune.

As for Kryo serialization, set Config.setFallBackToJavaSerialization to 'false' to disable falling back to Java serialization when Kryo can't be used; this way you'll know whether Kryo is actually being used and, if not, the reason why (check the logs).
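
That check can be as small as the following sketch (CountResult is a hypothetical tuple class):

```java
import backtype.storm.Config;

Config conf = new Config();
// fail loudly at serialization time instead of silently running slow
conf.setFallBackToJavaSerialization(false);
conf.registerSerialization(CountResult.class);
```
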

Danijel


-- 
Danijel Schiavuzzi

E: danijel@schiavuzzi.com
W: www.schiavuzzi.com
T: +385989035562
Skype: danijels7

Re: Storm's performance limits to 1000 tuples/sec

Posted by ja...@yahoo.com.tw.
Hi,
Yes, you're correct. After my adjustments, the whole topology can process 5500 tuples/s. And I did a simple experiment:
a. CountingBolt only: 11500 tuples/sec
b. CountingBolt + parseDataBolt: 6000 tuples/sec

These two bolts are both connected to the kafkaSpout, so I think parseDataBolt is the bottleneck!!

BUT!!! I tried increasing the parallelism hint of parseDataBolt from 20 to 50, and it had almost no effect on throughput. What's the problem? If I need to process more tuples in the future, what's the solution?


Best regards,
James




RE: Storm's performance limits to 1000 tuples/sec

Posted by Danijel Schiavuzzi <da...@schiavuzzi.com>.
At your current throughput rate, the choice of Java or Kryo serialization
doesn't matter much. The bottleneck seems to be somewhere else.


Re: Storm's performance limits to 1000 tuples/sec

Posted by Robert Turner <ro...@bigfoot.com>.
Serialisation across workers might be your problem. If you can use the
"localOrShuffle" grouping and arrange for the number of spouts and bolts
to be a multiple of the number of workers, then this will minimise the
serialisation across workers. If there is only one counting bolt for the
topology, then tuples are serialised and sent to the worker with that
single counting bolt. A better approach might be to have a single counting
bolt per worker and aggregate those periodically.
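
A hedged sketch of that wiring (component ids, bolt classes, and parallelism numbers are all hypothetical): localOrShuffleGrouping keeps a tuple inside the current worker whenever a target executor is available locally, so no serialisation happens, and one local counter per worker feeds a single low-volume global aggregator.

```java
import backtype.storm.topology.TopologyBuilder;

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("kafka-spout", kafkaSpout, 12);
// local delivery where possible: no inter-worker serialisation
builder.setBolt("parse-bolt", new ParseBolt(), 12)
       .localOrShuffleGrouping("kafka-spout");
// one counting bolt per worker, fed locally
builder.setBolt("local-count", new LocalCountBolt(), 12)
       .localOrShuffleGrouping("parse-bolt");
// periodic partial counts -> a single global aggregator (low volume)
builder.setBolt("global-count", new GlobalCountBolt(), 1)
       .globalGrouping("local-count");
```
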

Regards
   Rob Turner.






-- 
Cheers
   Rob.