Posted to user@storm.apache.org by clay teahouse <cl...@gmail.com> on 2015/02/03 13:03:07 UTC

kafkaspout is very slow

Hi all,

In my topology, the Kafka spout is responsible for over 85% of the latency. I
have tried different max spout pending values and played with the buffer size
and fetch size, but still no luck. Any hints on how to optimize the spout? The
issue doesn't seem to be on the Kafka side, as I see high throughput with a
simple Kafka consumer.

thank you for your feedback
Clay

Re: kafkaspout is very slow

Posted by Filipa Moura <fi...@gmail.com>.
How many messages are you reading per second?
I had a few problems with my spout originally, and each time the cause was one
of these:
1) I was not acking the messages, so because of max pending they were never
removed from the "queue".
2) The buffer size and fetch size were too small. Have you tried to figure out
how many bytes you write to Kafka and increase the sizes to match? This
helped in my case.
3) I was trying to read too far back in the past when I restarted the topology,
so I ended up consuming only from the latest offset.

With the above tweaks I was able to increase my throughput about ninefold. It
obviously depends on the size of the messages, but this helped me.
As Haralds suggested, have a look at the dashboard and try to understand
where the problem is.


On Wed, Feb 4, 2015 at 9:26 PM, Haralds Ulmanis <ha...@evilezh.net> wrote:

> I'm not sure that I understand your problem, but here are a few points:
> If you have a large max spout pending and slow processing, you will probably
> see large latency at the Kafka spout. The spout emits a message, it stays in
> the queue for a long time (which adds latency), and finally it is processed
> and the ack is received. You will see queue time + processing time in the
> Kafka spout latency.
> Take a look at the load factors of your bolts (are they close to 1 or more?)
> and the load factor of the Kafka spout.
>
> On 4 February 2015 at 21:19, Andrey Yegorov <an...@gmail.com>
> wrote:
>
>> have you tried increasing max spout pending parameter for the spout?
>>
>> builder.setSpout("kafka",
>>                  new KafkaSpout(spoutConfig),
>>                  TOPOLOGY_NUM_TASKS_KAFKA_SPOUT)
>>        .setNumTasks(TOPOLOGY_NUM_TASKS_KAFKA_SPOUT)
>>        // the maximum parallelism you can have on a KafkaSpout is the
>>        // number of partitions
>>        .setMaxSpoutPending(TOPOLOGY_MAX_SPOUT_PENDING);
>>
>> ----------
>> Andrey Yegorov
>
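
Haralds's point that the spout latency is queue time plus processing time can be made concrete with Little's law: average time in the system is roughly the number of in-flight tuples divided by the acked throughput. The sketch below is only an illustration under that assumption. The class and method names are hypothetical, the numbers are example values rather than measurements from this thread, and it assumes the spout is always throttled by the pending limit.

```java
public class SpoutLatencyEstimate {
    /**
     * Little's law: average time in system = items in system / throughput.
     * Assumes maxSpoutPending tuples are always in flight, which is a
     * simplification (the limit applies per spout task).
     */
    static double estimatedLatencyMs(int maxSpoutPending, double ackedTuplesPerSecond) {
        if (ackedTuplesPerSecond <= 0) {
            throw new IllegalArgumentException("throughput must be positive");
        }
        return 1000.0 * maxSpoutPending / ackedTuplesPerSecond;
    }

    public static void main(String[] args) {
        // Example values only: 1000 pending tuples, 2000 acked tuples per second.
        System.out.println(estimatedLatencyMs(1000, 2000.0) + " ms"); // 500.0 ms
        // Doubling the pending limit without raising throughput doubles the latency.
        System.out.println(estimatedLatencyMs(2000, 2000.0) + " ms"); // 1000.0 ms
    }
}
```

Read this way, a high spout latency with a large pending limit may simply reflect tuples waiting in queues, not a slow spout.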

Re: kafkaspout is very slow

Posted by clay teahouse <cl...@gmail.com>.
CPU is around 100%

On Wed, Feb 4, 2015 at 9:30 PM, Michael Rose <mi...@fullcontact.com>
wrote:

> How does your CPU look at 23000 tuples/s? Still low?
>
> Have you profiled to see if anything is blocking? Is your spout constantly
> doing work?

Re: kafkaspout is very slow

Posted by Michael Rose <mi...@fullcontact.com>.
How does your CPU look at 23000 tuples/s? Still low?

Have you profiled to see if anything is blocking? Is your spout constantly
doing work?

*Michael Rose*
Senior Platform Engineer

On Wed, Feb 4, 2015 at 8:20 PM, clay teahouse <cl...@gmail.com>
wrote:

> I bumped the kafka buffer/fetch sizes to
>
> kafka.fetch.size.bytes:  12582912
> kafka.buffer.size.bytes: 12582912
>
> The throughput almost doubled (to about 23000 un-acked tuples/second).
> Increasing these two parameters further does not seem to improve
> performance any more. Is there anything else that I can try?

Re: kafkaspout is very slow

Posted by clay teahouse <cl...@gmail.com>.
I bumped the kafka buffer/fetch sizes to

kafka.fetch.size.bytes:  12582912
kafka.buffer.size.bytes: 12582912

The throughput almost doubled (to about 23000 un-acked tuples/second).
Increasing these two parameters further does not seem to improve
performance any more. Is there anything else that I can try?

On Wed, Feb 4, 2015 at 6:51 PM, clay teahouse <cl...@gmail.com>
wrote:

> 100,000 records is about 12MB.
> I'll try bumping the numbers 100-fold to see if it makes any
> difference.
> thanks,
> -Clay

Re: kafkaspout is very slow

Posted by clay teahouse <cl...@gmail.com>.
100,000 records is about 12MB.
I'll try bumping the numbers 100-fold to see if it makes any difference.
thanks,
-Clay
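
As a rough sketch of the sizing arithmetic behind this: if 100,000 records take about 12 MB, one record is roughly 125 bytes, so a 102400-byte fetch holds only around 800 records. The class and method names below are hypothetical, "12 MB" is assumed to mean 12 MiB (12582912 bytes, the value used elsewhere in this thread), and Kafka's per-message framing overhead is ignored.

```java
public class FetchSizeEstimate {
    /** Average record size in bytes, rounded down. */
    static long bytesPerRecord(long totalBytes, long recordCount) {
        return totalBytes / recordCount;
    }

    /** Approximate number of records that fit in one fetch of the given size. */
    static long recordsPerFetch(long fetchSizeBytes, long bytesPerRecord) {
        return fetchSizeBytes / bytesPerRecord;
    }

    public static void main(String[] args) {
        long twelveMiB = 12L * 1024 * 1024;                    // 12582912 bytes
        long recordSize = bytesPerRecord(twelveMiB, 100_000);  // 125 bytes
        System.out.println(recordsPerFetch(102_400, recordSize));   // 819
        System.out.println(recordsPerFetch(twelveMiB, recordSize)); // 100663
    }
}
```

Under these assumptions, the original 102400-byte setting would need on the order of a hundred fetches to pull 100,000 records, while a 12 MiB fetch could cover them in one.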

On Wed, Feb 4, 2015 at 5:47 PM, Filipa Moura <fi...@gmail.com>
wrote:

> I would bump these numbers up by a lot:
>
> kafka.fetch.size.bytes: 102400    kafka.buffer.size.bytes: 102400
>
> Say 10 or 100 times that, or more. I don't know by heart how much I
> increased those numbers on my topology.
>
> How many bytes are you writing per minute to Kafka? Try dumping 1 minute
> of messages to a file to figure out how many bytes that is.

Re: kafkaspout is very slow

Posted by Michael Rose <mi...@fullcontact.com>.
You might increase the number of ackers too if acking is slow.

*Michael Rose*
Senior Platform Engineer

On Wed, Feb 4, 2015 at 4:47 PM, Filipa Moura <fi...@gmail.com>
wrote:

> I would bump these numbers up by a lot:
>
> kafka.fetch.size.bytes: 102400    kafka.buffer.size.bytes: 102400
>
> Say 10 or 100 times that, or more. I don't know by heart how much I
> increased those numbers on my topology.
>
> How many bytes are you writing per minute to Kafka? Try dumping 1 minute
> of messages to a file to figure out how many bytes that is.

Re: kafkaspout is very slow

Posted by Filipa Moura <fi...@gmail.com>.
I would bump these numbers up by a lot:

kafka.fetch.size.bytes: 102400    kafka.buffer.size.bytes: 102400

Say 10 or 100 times that, or more. I don't know by heart how much I increased
those numbers on my topology.

How many bytes are you writing per minute to Kafka? Try dumping 1 minute
of messages to a file to figure out how many bytes that is.

> I am sending about 100,000 records per second to the topic.
> My Kafka consumer can consume the 3 million records in less than 50
> seconds. I have disabled acking; with acking enabled, I don't even
> get 1500 records per second from the topology. With acking disabled, I get
> about 12000/second.
> I don't lose any data; it is just that the data is emitted from the spout to
> the bolt very slowly.
>
> I did bump my buffer sizes but I am not sure if they are sufficient.
>
>     topology.transfer.buffer.size: 2048
>     topology.executor.buffer.size: 65536
>     topology.receiver.buffer.size: 16
>     topology.executor.send.buffer.size: 65536
>
>     kafka.fetch.size.bytes: 102400
>     kafka.buffer.size.bytes: 102400
>
> thanks
> Clay


Re: kafkaspout is very slow

Posted by clay teahouse <cl...@gmail.com>.
I am sending about 100,000 records per second to the topic.
My Kafka consumer can consume the 3 million records in less than 50
seconds. I have disabled acking; with acking enabled, I don't even
get 1500 records per second from the topology. With acking disabled, I get
about 12000/second.
I don't lose any data; it is just that the data is emitted from the spout to the
bolt very slowly.
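
To put the figures above side by side, here is a small sketch of the arithmetic. The class and method names are hypothetical, and the consumer rate is treated as a lower bound because the 3 million records took "less than" 50 seconds.

```java
public class ThroughputComparison {
    /** Records per second for a batch consumed in the given time. */
    static double recordsPerSecond(long records, double seconds) {
        return records / seconds;
    }

    /** How many times faster the first rate is than the second. */
    static double speedup(double fasterRate, double slowerRate) {
        return fasterRate / slowerRate;
    }

    public static void main(String[] args) {
        // 3 million records in 50 seconds: a lower bound for the plain consumer.
        double consumerRate = recordsPerSecond(3_000_000, 50.0); // 60000.0
        System.out.println(speedup(consumerRate, 12_000)); // 5.0  consumer vs. topology, acking off
        System.out.println(speedup(12_000, 1_500));        // 8.0  acking off vs. acking on
        System.out.println(speedup(consumerRate, 1_500));  // 40.0 consumer vs. topology, acking on
    }
}
```

So the plain consumer is at least 5 times faster than the topology without acking, and enabling acking costs roughly another factor of 8.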

 I did bump my buffer sizes but I am not sure if they are sufficient.

    topology.transfer.buffer.size: 2048
    topology.executor.buffer.size: 65536
    topology.receiver.buffer.size: 16
    topology.executor.send.buffer.size: 65536

    kafka.fetch.size.bytes: 102400
    kafka.buffer.size.bytes: 102400
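
One sanity check worth running on settings like these: Storm's internal queues are backed by the LMAX Disruptor, whose ring buffers require power-of-two sizes, so as far as I know the topology buffer sizes should be powers of two. The sketch below only checks the values listed above. The class name is hypothetical, and it does not verify that each key is actually recognized by a given Storm version.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BufferSizeCheck {
    /** True if n is a positive power of two. */
    static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    public static void main(String[] args) {
        // Keys and values copied from the settings listed above.
        Map<String, Integer> sizes = new LinkedHashMap<>();
        sizes.put("topology.transfer.buffer.size", 2048);
        sizes.put("topology.executor.buffer.size", 65536);
        sizes.put("topology.receiver.buffer.size", 16);
        sizes.put("topology.executor.send.buffer.size", 65536);

        for (Map.Entry<String, Integer> e : sizes.entrySet()) {
            String verdict = isPowerOfTwo(e.getValue()) ? "power of two" : "NOT a power of two";
            System.out.println(e.getKey() + " = " + e.getValue() + " (" + verdict + ")");
        }
    }
}
```

All four topology values pass this check. The two Kafka byte sizes (102400) are not powers of two, but they are fetch and socket buffer sizes, not ring buffers, so the constraint should not apply to them.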

thanks
Clay

On Wed, Feb 4, 2015 at 4:24 PM, Filipa Moura <fi...@gmail.com>
wrote:

> can you share a  screenshot of the Storm UI for your spout?

Re: kafkaspout is very slow

Posted by Filipa Moura <fi...@gmail.com>.
can you share a  screenshot of the Storm UI for your spout?

On Wed, Feb 4, 2015 at 9:58 PM, clay teahouse <cl...@gmail.com>
wrote:

> I have this issue with any amount of load. Different max spout pending
> values do not seem to make much of a difference. I've lowered this parameter
> to 100, and there is still little difference. At this point the bolt
> consuming the data does no processing.
>
> On Wed, Feb 4, 2015 at 3:26 PM, Haralds Ulmanis <ha...@evilezh.net>
> wrote:
>
>> I'm not sure, that i understand your problem .. but here is few points:
>> If you have large pending spout size and slow processing - you will see
>> large latency at kafka spout probably. Spout emits message .. it stays in
>> queue for long time (that will add latency) .. and finally is processed and
>> ack received. You will see queue time + processing time in kafka spout
>> latency.
>> Take a look at load factors of your bolts - are they close to 1 or more ?
>> and load factor of kafka spout.
>>
>> On 4 February 2015 at 21:19, Andrey Yegorov <an...@gmail.com>
>> wrote:
>>
>>> have you tried increasing max spout pending parameter for the spout?
>>>
>>> builder.setSpout("kafka",
>>>                        new KafkaSpout(spoutConfig),
>>>                        TOPOLOGY_NUM_TASKS_KAFKA_SPOUT)
>>>           .setNumTasks(TOPOLOGY_NUM_TASKS_KAFKA_SPOUT)
>>>           //the maximum parallelism you can have on a KafkaSpout is the
>>> number of partitions
>>>           .setMaxSpoutPending(*TOPOLOGY_MAX_SPOUT_PENDING*);
>>>
>>> ----------
>>> Andrey Yegorov
>>>
>>> On Tue, Feb 3, 2015 at 4:03 AM, clay teahouse <cl...@gmail.com>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> In my topology,  kafka spout is responsible for over 85% of the
>>>> latency. I have tried different spout max pending and played with the
>>>> buffer size and fetch size, still no luck. Any hint on how to optimize the
>>>> spout? The issue doesn't seem to be with the kafka side, as I see high
>>>> throughput with the simple kafka consumer.
>>>>
>>>> thank you for your feedback
>>>> Clay
>>>>
>>>>
>>>
>>
>

Re: kafkaspout is very slow

Posted by clay teahouse <cl...@gmail.com>.
I have this issue with any amount of load. Different max spout pending
values do not seem to make much of a difference. I've lowered this
parameter to 100, and it still makes little difference. At this point the
bolt consuming the data does no processing.

On Wed, Feb 4, 2015 at 3:26 PM, Haralds Ulmanis <ha...@evilezh.net> wrote:

> I'm not sure, that i understand your problem .. but here is few points:
> If you have large pending spout size and slow processing - you will see
> large latency at kafka spout probably. Spout emits message .. it stays in
> queue for long time (that will add latency) .. and finally is processed and
> ack received. You will see queue time + processing time in kafka spout
> latency.
> Take a look at load factors of your bolts - are they close to 1 or more ?
> and load factor of kafka spout.
>
> On 4 February 2015 at 21:19, Andrey Yegorov <an...@gmail.com>
> wrote:
>
>> have you tried increasing max spout pending parameter for the spout?
>>
>> builder.setSpout("kafka",
>>                        new KafkaSpout(spoutConfig),
>>                        TOPOLOGY_NUM_TASKS_KAFKA_SPOUT)
>>           .setNumTasks(TOPOLOGY_NUM_TASKS_KAFKA_SPOUT)
>>           //the maximum parallelism you can have on a KafkaSpout is the
>> number of partitions
>>           .setMaxSpoutPending(*TOPOLOGY_MAX_SPOUT_PENDING*);
>>
>> ----------
>> Andrey Yegorov
>>
>> On Tue, Feb 3, 2015 at 4:03 AM, clay teahouse <cl...@gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> In my topology,  kafka spout is responsible for over 85% of the latency.
>>> I have tried different spout max pending and played with the buffer size
>>> and fetch size, still no luck. Any hint on how to optimize the spout? The
>>> issue doesn't seem to be with the kafka side, as I see high throughput with
>>> the simple kafka consumer.
>>>
>>> thank you for your feedback
>>> Clay
>>>
>>>
>>
>

Re: kafkaspout is very slow

Posted by Haralds Ulmanis <ha...@evilezh.net>.
I'm not sure I understand your problem, but here are a few points.
If you have a large max spout pending value and slow processing, you will
probably see large latency at the Kafka spout. The spout emits a message,
it stays in the queue for a long time (which adds latency), and finally it
is processed and the ack is received. The latency you see for the Kafka
spout is queue time plus processing time.
Take a look at the load factors of your bolts: are they close to 1 or
more? Also check the load factor of the Kafka spout.
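
To put rough numbers on the queue time described above, here is a minimal
sketch based on Little's law (average time in the system = tuples in
flight / throughput). The class name, method name and the throughput
figure are made up for illustration; this is not Storm code, only
arithmetic.

```java
public class SpoutLatencyEstimate {

    /**
     * Rough estimate of the average time a tuple spends between emit and
     * ack when the spout keeps `tuplesInFlight` tuples pending and the
     * topology completes `tuplesPerSecond` tuples per second.
     */
    static double estimatedCompleteLatencyMs(int tuplesInFlight, double tuplesPerSecond) {
        if (tuplesPerSecond <= 0) {
            throw new IllegalArgumentException("throughput must be positive");
        }
        return tuplesInFlight * 1000.0 / tuplesPerSecond;
    }

    public static void main(String[] args) {
        double throughput = 5000.0; // hypothetical tuples/second the bolts can complete
        int[] maxSpoutPendingValues = {100, 1000, 10000};
        for (int pending : maxSpoutPendingValues) {
            System.out.printf("max spout pending %d -> about %.1f ms per tuple%n",
                    pending, estimatedCompleteLatencyMs(pending, throughput));
        }
    }
}
```

If the bolts cannot go faster, a larger max spout pending mostly makes
tuples wait longer in the queue, so the spout latency grows with it.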

On 4 February 2015 at 21:19, Andrey Yegorov <an...@gmail.com>
wrote:

> have you tried increasing max spout pending parameter for the spout?
>
> builder.setSpout("kafka",
>                        new KafkaSpout(spoutConfig),
>                        TOPOLOGY_NUM_TASKS_KAFKA_SPOUT)
>           .setNumTasks(TOPOLOGY_NUM_TASKS_KAFKA_SPOUT)
>           //the maximum parallelism you can have on a KafkaSpout is the
> number of partitions
>           .setMaxSpoutPending(*TOPOLOGY_MAX_SPOUT_PENDING*);
>
> ----------
> Andrey Yegorov
>
> On Tue, Feb 3, 2015 at 4:03 AM, clay teahouse <cl...@gmail.com>
> wrote:
>
>> Hi all,
>>
>> In my topology,  kafka spout is responsible for over 85% of the latency.
>> I have tried different spout max pending and played with the buffer size
>> and fetch size, still no luck. Any hint on how to optimize the spout? The
>> issue doesn't seem to be with the kafka side, as I see high throughput with
>> the simple kafka consumer.
>>
>> thank you for your feedback
>> Clay
>>
>>
>

Re: kafkaspout is very slow

Posted by Andrey Yegorov <an...@gmail.com>.
Have you tried increasing the max spout pending parameter for the spout?

builder.setSpout("kafka",
                 new KafkaSpout(spoutConfig),
                 TOPOLOGY_NUM_TASKS_KAFKA_SPOUT)
       .setNumTasks(TOPOLOGY_NUM_TASKS_KAFKA_SPOUT)
       // the maximum parallelism you can have on a KafkaSpout is the
       // number of partitions
       .setMaxSpoutPending(TOPOLOGY_MAX_SPOUT_PENDING);
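
The comment about parallelism can be illustrated with a small standalone
sketch. It assumes partitions are handed to spout tasks round-robin by
task index; the class and method names are hypothetical and this is not
the storm-kafka implementation, only a model of why tasks beyond the
partition count have nothing to read.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionAssignmentSketch {

    /** Partitions a given task would read under round-robin assignment (an assumption). */
    static List<Integer> partitionsForTask(int taskIndex, int totalTasks, int numPartitions) {
        List<Integer> mine = new ArrayList<>();
        for (int p = taskIndex; p < numPartitions; p += totalTasks) {
            mine.add(p);
        }
        return mine;
    }

    public static void main(String[] args) {
        int numPartitions = 4; // hypothetical topic with 4 partitions
        int totalTasks = 6;    // more spout tasks than partitions
        for (int task = 0; task < totalTasks; task++) {
            // Tasks 4 and 5 get an empty list and would sit idle.
            System.out.println("task " + task + " -> "
                    + partitionsForTask(task, totalTasks, numPartitions));
        }
    }
}
```

With fewer tasks than partitions, each task simply reads several
partitions instead.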

----------
Andrey Yegorov

On Tue, Feb 3, 2015 at 4:03 AM, clay teahouse <cl...@gmail.com>
wrote:

> Hi all,
>
> In my topology,  kafka spout is responsible for over 85% of the latency. I
> have tried different spout max pending and played with the buffer size and
> fetch size, still no luck. Any hint on how to optimize the spout? The issue
> doesn't seem to be with the kafka side, as I see high throughput with the
> simple kafka consumer.
>
> thank you for your feedback
> Clay
>
>