Posted to user@storm.apache.org by Sid <ja...@gmail.com> on 2015/04/24 20:16:58 UTC

Storm multilang performance

My storm topology has python bolts using multilang support.

Kafka_spout(java) -> format_filter_bolt -> process_bolt

My storm cluster has 3 32 core EC2 instances. format_filter_bolt has 1
executor and 1 task, process_bolt has 36 executor and 32 tasks. I have max
spout pending = 250.

I observe that format_filter_bolt has an execute latency of 0.018 ms and a
process latency of 7.585 ms, while process_bolt has an execute latency of
0.008 ms and a process latency of 94664.242 ms!

These numbers look strange. Comparing the two bolts, the one with the lower
execute latency, handling fewer tuples, has an orders-of-magnitude larger
process latency.

Looking for suggestions on what I can do to improve the latencies without
throwing more compute at it. Thanks
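A back-of-envelope check on the numbers above (a sketch, not a measurement, and it assumes the Storm UI convention that latencies are reported in milliseconds): if 250 tuples are in flight and each spends ~94664 ms from emit to ack, Little's law bounds the end-to-end throughput.

```python
MAX_SPOUT_PENDING = 250          # tuples in flight, from the topology config
PROCESS_LATENCY_MS = 94664.242   # process_bolt process latency from Storm UI

def implied_throughput(pending, latency_ms):
    """Little's law: throughput = in-flight tuples / time each spends in flight."""
    return pending / (latency_ms / 1000.0)

tput = implied_throughput(MAX_SPOUT_PENDING, PROCESS_LATENCY_MS)
print(f"implied end-to-end throughput: {tput:.2f} tuples/s")
```

A few tuples per second across 96 cores is the kind of number that points at queueing rather than compute.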

Re: Storm multilang performance

Posted by Srikanth <sr...@gmail.com>.
I guess tuples are waiting to be read by your python bolt.
200 ms per tuple is a lot of processing time. Your upstream bolt/spout might
have emitted thousands of tuples by then, and they have nowhere to go.
Have you measured how many tuples are emitted per second by your spout?
Add a timestamp while emitting to see how much time a tuple spends waiting
to be picked up. See if this time increases gradually.
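A minimal sketch of the timestamping idea (the field layout and names here are my own illustration; adapt them to however your bolts build tuples):

```python
import time

def stamp(values):
    """Upstream: append an emit timestamp to the tuple's values."""
    return values + [time.time()]

def queue_wait_ms(values):
    """Downstream: milliseconds the tuple spent waiting to be picked up."""
    emit_ts = values[-1]
    return (time.time() - emit_ts) * 1000.0

# Simulated round trip: stamp, sit in a queue for a moment, then measure.
tup = stamp(["payload"])
time.sleep(0.05)                  # stand-in for time spent queued
print(f"waited ~{queue_wait_ms(tup):.0f} ms")
```

Logging this per tuple (or a sampled subset) shows whether the wait is flat or growing, which distinguishes a slow consumer from an unbounded backlog.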

Regarding the large string payload, you may want to experiment with a custom
serializer for ShellBolt, e.g. protobuf or JSON with compression. YMMV.
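A rough sketch of the compression half of that suggestion, using only the standard library: serialize once as plain JSON and once zlib-compressed, and compare payload sizes. Whether this wins overall depends on CPU cost versus pipe bandwidth, hence the YMMV.

```python
import json
import zlib

record = {"id": 1, "body": "some repetitive log line " * 200}

plain = json.dumps(record).encode("utf-8")
packed = zlib.compress(plain)

print(f"plain JSON: {len(plain)} bytes, compressed: {len(packed)} bytes")

# The receiving side reverses the steps:
restored = json.loads(zlib.decompress(packed).decode("utf-8"))
assert restored == record
```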

Unix IO redirection has its limitations with huge payloads, but in general it
is much faster than TCP. If you do want to try writing to a DB, use Redis.
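The "reference the payload instead of shipping it" pattern can be sketched like this. A plain dict stands in for Redis here; with a real Redis client you would use set()/get() with a TTL instead, and the key scheme is my own illustration.

```python
import json
import uuid

store = {}  # stand-in for an external store such as Redis

def stash(payload: dict) -> str:
    """Store the large payload and return a small key to carry in the tuple."""
    key = f"payload:{uuid.uuid4().hex}"
    store[key] = json.dumps(payload)
    return key

def fetch(key: str) -> dict:
    """Downstream bolt resolves the key back into the payload."""
    return json.loads(store.pop(key))  # pop so entries do not accumulate

key = stash({"big": "x" * 10000})
print(f"tuple now carries {len(key)} bytes instead of ~10 kB")
```

The tuple crossing the multilang pipe then holds only the key, so the per-tuple JSON the shell bolt must parse stays small.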

Srikanth


Re: Storm multilang performance

Posted by Sid <ja...@gmail.com>.
Thanks Srikanth. I played with 1-2 tasks per executor; it helps a bit, but
not by much. I need a fields grouping, so parallelism alone won't solve my
problem.

I profiled the python bolt, and it takes on the order of 200 ms per tuple,
which is in line with the execute latency. But the process latency is in the
tens of minutes!
I played with worker and executor buffer sizes, but that didn't help either.

It seems that tuples are waiting somewhere in a queue, but I'm not able to
reduce that wait. A really low max spout pending does help, but anything
above 256 kills the performance.
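A rough consistency check on these numbers (assumptions: the ~200 ms/tuple figure above and the 32 process_bolt tasks from the original post). If the only backlog were the max-spout-pending window shared across the running tasks, the wait per tuple would be small; tens of minutes points at a queue somewhere else, e.g. the multilang subprocess pipe.

```python
PENDING = 250        # max spout pending
SERVICE_S = 0.200    # measured per-tuple processing time in the python bolt
TASKS = 32           # process_bolt tasks actually running

# Naive expected wait if PENDING in-flight tuples share TASKS servers:
expected_wait_s = PENDING * SERVICE_S / TASKS
print(f"expected wait ~{expected_wait_s:.1f} s, far below the observed tens of minutes")
```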

One more data point: I have a large JSON-encoded object (str) in my tuple.
I'm thinking of dumping it in a DB and referencing it in the tuple.

Open to suggestions. Thanks.

Sid

Re: Storm multilang performance

Posted by Srikanth <sr...@gmail.com>.
Why have you configured 32 tasks on 36 executors? Set the number of tasks to
at least 36.
It looks like your Python bolt takes some time to process a tuple. You may
need to tune that or give it more threads.
If you are not maxing out on resource usage, set the number of tasks to 72
and see if that helps.

Srikanth
