Posted to user@storm.apache.org by Roshan Naik <ro...@hortonworks.com> on 2017/09/04 19:46:11 UTC
Re: Performance issues on idle trident topology
Roee, have you found a resolution? If not, please bring it up on the dev list.
-roshan
From: Roee Shenberg <sh...@alooma.io>
Reply-To: "user@storm.apache.org" <us...@storm.apache.org>
Date: Wednesday, August 30, 2017 at 5:43 AM
To: "user@storm.apache.org" <us...@storm.apache.org>
Subject: Re: Performance issues on idle trident topology
Hi Roshan,
The only unusual setting I'm using is topology.disruptor.batch.timeout.millis set to 20 ms instead of the default 1 ms, following my previous performance issues.
Roee
On Wed, Aug 30, 2017 at 3:50 AM, Roshan Naik <ro...@hortonworks.com> wrote:
Is it running with default settings? If not, you may want to share any non-default settings being used. Somebody (more familiar with this than myself) might be able to chime in.
-roshan
From: Roee Shenberg <sh...@alooma.io>
Reply-To: "user@storm.apache.org" <us...@storm.apache.org>
Date: Tuesday, August 29, 2017 at 9:03 AM
To: "user@storm.apache.org" <us...@storm.apache.org>
Subject: Re: Performance issues on idle trident topology
Hi,
I'm running Storm 1.0.2 + Trident, and I have an issue with idle topologies consuming many CPU cycles (around four entire cores) after having processed some messages.
I'm not quite sure what the issue is, because it requires some activity on the topology to trigger said behavior.
My research into the cause has hit a dead end, since the issue isn't easy to reproduce (it shows up reliably only after a few days of idle time).
I managed to make the issue happen on a 1-machine cluster with only one topology running on it, and saw the following:
sudo strace -f -c -p 24996
strace: Process 24996 attached with 242 threads

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 77.96  228.512766         260    878277           epoll_wait
 17.78   52.106948       76628       680       180 futex
  3.73   10.924000     1365500         8         8 restart_syscall
  0.20    0.584934        2600       225           epoll_ctl
  0.20    0.574205        1704       337           read
  0.14    0.416201        1850       225           write
  0.00    0.000000           0         1           stat
  0.00    0.000000           0         1           mmap
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0         1           rt_sigreturn
  0.00    0.000000           0         1           madvise
------ ----------- ----------- --------- --------- ----------------
100.00  293.119054                879757       188 total
That's an unreasonable number of epoll_wait calls. I also noticed a large number of ZooKeeper connections: netstat -n | grep 2181 | wc -l returned 25 connections.
The topology statistics were ~2400 tuples emitted and ~3000 tuples transferred in the 10-minute window. Basically idle.
Next, I produced a flamegraph including kernel times while tracing the process, which I couldn't attach as a fully-functional, clickable SVG so I took a screenshot of it:
[image: flamegraph screenshot]
The flamegraph shows the entire time is spent in org.apache.storm.shade.org.apache.zookeeper.ClientCnxn$SendThread.run,
split 50/50 between sys_epoll_wait in the kernel and Java code, mostly the Java side of select, with a non-negligible amount of time spent in org/apache/zookeeper/ClientCnxn$SendThread:::clientTunneledAuthenticationInProgress
It seems like the ZooKeeper client is malfunctioning somehow, but I'm not sure how.
Do you have any advice regarding knobs to tweak in the Storm or ZK config, or additional diagnostics to run? (The relevant ZK config AFAIK is the tick interval, which is at the default 2 seconds.)
Thanks,
Roee
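For reference, the knobs in question are the client-side ZooKeeper settings in storm.yaml plus the server-side tick interval. A rough sketch of the relevant entries; the values shown are the usual Storm 1.x defaults, quoted as an illustration rather than a recommendation:

```yaml
# storm.yaml -- client-side ZooKeeper tuning (Storm 1.x default values)
storm.zookeeper.session.timeout: 20000      # ms
storm.zookeeper.connection.timeout: 15000   # ms
storm.zookeeper.retry.times: 5
storm.zookeeper.retry.interval: 1000        # ms

# zoo.cfg (ZooKeeper server side) -- the tick interval mentioned above:
# tickTime=2000
```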
Re: Performance issues on idle trident topology
Posted by Roee Shenberg <sh...@alooma.io>.
OK, the problem was an interaction between a broken Linux API (epoll, which underlies Java selectors) and a broken async replacement for java.lang.Process called NuProcess. Not related to Storm.
Roee
P.S. There's a good description of the general issue here:
https://idea.popcount.org/2017-03-20-epoll-is-fundamentally-broken-22/
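The failure mode described here can be illustrated in miniature (this is a hypothetical sketch of the general mechanism, not NuProcess itself): with level-triggered readiness, an interest that is never satisfied or deregistered makes every select()/epoll_wait() call return immediately, so the I/O loop spins at full speed instead of sleeping. In Python's selectors module (which wraps epoll on Linux):

```python
import os
import selectors
import time

# The write end of an empty pipe is always writable; registering it for
# EVENT_WRITE and then never writing to it (nor deregistering it) mimics a
# readiness condition that is never cleared.
sel = selectors.DefaultSelector()
r, w = os.pipe()
sel.register(w, selectors.EVENT_WRITE)

spins = 0
start = time.monotonic()
for _ in range(1000):
    # A 1-second timeout, yet each call returns instantly: the fd is "ready",
    # nothing clears that state, so the loop immediately selects again.
    events = sel.select(timeout=1.0)
    assert events
    spins += 1
elapsed = time.monotonic() - start

sel.close()
os.close(r)
os.close(w)
```

All 1000 wakeups complete almost instantly (rather than taking ~1000 seconds of blocking), which is exactly the epoll_wait storm visible in the strace summary above.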
Re: Performance issues on idle trident topology
Posted by Roee Shenberg <sh...@alooma.io>.
It's not going away. This time around I did a per-thread CPU usage breakdown and went back to jstack, and I verified that all the offending threads are of the form Thread-N-spout/bolt-<NAME>-executor[33 33]-SendThread(<ZK server address>)
It looks like ZooKeeper is somehow not clearing a socket after select() (well, epoll_wait()), and so keeps on selecting as fast as it can. Using argdist (a piece of eBPF magic, part of BCC), a histogram of calls parameterized by the timeout parameter looks as follows:
timeout             : count     distribution
    0 -> 1          : 0        |                                        |
    2 -> 3          : 0        |                                        |
    4 -> 7          : 0        |                                        |
    8 -> 15         : 0        |                                        |
   16 -> 31         : 0        |                                        |
   32 -> 63         : 0        |                                        |
   64 -> 127        : 0        |                                        |
  128 -> 255        : 4        |                                        |
  256 -> 511        : 6        |                                        |
  512 -> 1023       : 0        |                                        |
 1024 -> 2047       : 145259   |*****                                   |
 2048 -> 4095       : 1042417  |****************************************|
 4096 -> 8191       : 284079   |**********                              |
These are actual numbers: calls on a live system in the span of one second. That's 1.5M calls to epoll_wait(), none with low timeouts, so the only explanation I can see for epoll_wait() being called this many times per second is that the readiness is somehow not being cleared.
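The per-thread CPU breakdown mentioned above can be reproduced from /proc. A rough Linux-only sketch (field positions per proc(5); the helper name is ours, for illustration): it reads utime+stime per thread, and the resulting thread ids can be matched in hex against the nid=0x... values jstack prints.

```python
import os

def thread_cpu_ticks(pid):
    """Map tid -> (thread name, utime+stime in clock ticks) for one process."""
    ticks = {}
    for tid in os.listdir(f"/proc/{pid}/task"):
        try:
            with open(f"/proc/{pid}/task/{tid}/stat") as f:
                data = f.read()
        except FileNotFoundError:
            continue  # thread exited between listdir() and open()
        # The thread name sits in parentheses and may itself contain spaces,
        # so split only after the last ')'.
        name = data[data.index("(") + 1:data.rindex(")")]
        fields = data[data.rindex(")") + 2:].split()
        # fields[0] is field 3 (state); utime/stime are fields 14 and 15.
        utime, stime = int(fields[11]), int(fields[12])
        ticks[int(tid)] = (name, utime + stime)
    return ticks

# Usage: five busiest threads of the current process, busiest first;
# hex(tid) can then be grepped as nid=0x... in jstack output.
busiest = sorted(thread_cpu_ticks(os.getpid()).items(),
                 key=lambda kv: -kv[1][1])[:5]
```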
I've enabled all log levels for ZooKeeper in an attempt to glean something interesting (I haven't), and I've started looking over the ZooKeeper client code (3.4.5, matching the version bundled with Storm 1.0.2).
Any and all suggestions on what to try next are welcome.
Thanks,
Roee
Re: Performance issues on idle trident topology
Posted by Roee Shenberg <sh...@alooma.io>.
Unfortunately, I haven't found a reliable way to replicate this behavior,
so I've moved on for now. Will update if the behavior recurs.