Posted to user@storm.apache.org by Roshan Naik <ro...@hortonworks.com> on 2017/09/04 19:46:11 UTC

Re: Performance issues on idle trident topology

Roee, have you found a resolution? If not, please bring it up on the dev list.
-roshan

From: Roee Shenberg <sh...@alooma.io>
Reply-To: "user@storm.apache.org" <us...@storm.apache.org>
Date: Wednesday, August 30, 2017 at 5:43 AM
To: "user@storm.apache.org" <us...@storm.apache.org>
Subject: Re: Performance issues on idle trident topology

Hi Roshan,

The only unusual setting I'm using is topology.disruptor.batch.timeout.millis, set to 20 ms instead of the default 1 ms, following my previous performance issues.
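
For reference, that's set when building the topology config at submission time; a minimal sketch of what that looks like (illustrative only, not our exact code, and the class/method names are made up for the example):

    import org.apache.storm.Config;
    import org.apache.storm.StormSubmitter;
    import org.apache.storm.generated.StormTopology;

    public class SubmitWithBatchTimeout {
        public static void submit(String name, StormTopology topology) throws Exception {
            Config conf = new Config();
            // Default is 1 ms; we run with 20 ms to batch more per disruptor flush.
            conf.put("topology.disruptor.batch.timeout.millis", 20);
            StormSubmitter.submitTopology(name, conf, topology);
        }
    }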

Roee


On Wed, Aug 30, 2017 at 3:50 AM, Roshan Naik <ro...@hortonworks.com> wrote:
Is it running with default settings? If not, you may want to share any non-default settings being used. Somebody (more familiar with this than I am) might be able to chime in.
-roshan

From: Roee Shenberg <sh...@alooma.io>
Reply-To: "user@storm.apache.org" <us...@storm.apache.org>
Date: Tuesday, August 29, 2017 at 9:03 AM
To: "user@storm.apache.org" <us...@storm.apache.org>
Subject: Re: Performance issues on idle trident topology


Hi,

I'm running Storm 1.0.2 + Trident and I have an issue with idle topologies consuming many CPU cycles (around four entire cores), after having processed some messages.

I'm not quite sure what the issue is, because it requires some activity on the topology to trigger said behavior.

My research into what's causing the issue has reached a dead end, since it's not easy to reproduce (it happens reliably after a few days of idle time).

I managed to make the issue happen on a 1-machine cluster with only one topology running on it, and saw the following:

sudo strace -f -c -p 24996
strace: Process 24996 attached with 242 threads
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 77.96  228.512766         260    878277           epoll_wait
 17.78   52.106948       76628       680       180 futex
  3.73   10.924000     1365500         8         8 restart_syscall
  0.20    0.584934        2600       225           epoll_ctl
  0.20    0.574205        1704       337           read
  0.14    0.416201        1850       225           write
  0.00    0.000000           0         1           stat
  0.00    0.000000           0         1           mmap
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0         1           rt_sigreturn
  0.00    0.000000           0         1           madvise
------ ----------- ----------- --------- --------- ----------------
100.00  293.119054                879757       188 total

That's an unreasonable number of epoll_wait calls, and I noticed a large number of ZooKeeper connections: nc -n | grep 2181 | wc -l returned 25 connections.

The topology statistics were ~2400 tuples emitted and ~3000 tuples transferred in the 10-minute window. Basically idle.

Next, I produced a flamegraph including kernel time while tracing the process. I couldn't attach it as a fully functional, clickable SVG, so I took a screenshot of it:
[image: flamegraph screenshot]

The flamegraph shows the entire time is spent in org.apache.storm.shade.org.apache.zookeeper.ClientCnxn$SendThread.run

The time is split 50/50 between sys_epoll_wait in the kernel and Java code (mostly the Java side of select), with a non-negligible amount of time spent in org/apache/zookeeper/ClientCnxn$SendThread:::clientTunneledAuthenticationInProgress

This looks like the ZooKeeper client is malfunctioning somehow, but I'm not sure how.
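
To illustrate the failure mode I suspect: the SendThread is essentially an NIO selector loop, and if the epoll set underneath it keeps reporting readiness that never gets consumed, select() stops blocking and the loop spins on epoll_wait. A minimal, self-contained sketch of that shape (not ZooKeeper's actual code, just the pattern):

    import java.io.IOException;
    import java.nio.channels.Selector;

    public class SelectorSpinSketch {
        public static void main(String[] args) throws IOException {
            Selector selector = Selector.open();
            while (true) {
                // Intended to block for up to 2 seconds per iteration.
                int ready = selector.select(2000);
                if (ready > 0) {
                    // Normally: handle I/O on each selected key, then clear the set.
                    // If readiness on some descriptor is never actually consumed
                    // (or the fd lingers in the epoll set), select() returns
                    // immediately every time and this loop degenerates into a
                    // busy spin of epoll_wait calls, like the strace above shows.
                    selector.selectedKeys().clear();
                }
            }
        }
    }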

Do you have any advice on knobs to tweak in the Storm or ZK config, or additional diagnostics to run? (The relevant ZK-side config, AFAIK, is the tick interval, which is at the default 2 seconds.)
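
For context, the Storm-side ZooKeeper client knobs I'm aware of are the following (normally set in storm.yaml; the values below are what I believe 1.0.x ships with in defaults.yaml, so treat them as approximate):

    storm.zookeeper.session.timeout: 20000        # ms
    storm.zookeeper.connection.timeout: 15000     # ms
    storm.zookeeper.retry.times: 5
    storm.zookeeper.retry.interval: 1000          # ms
    storm.zookeeper.retry.intervalceiling.millis: 30000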

Thanks,
Roee



Re: Performance issues on idle trident topology

Posted by Roee Shenberg <sh...@alooma.io>.
OK, the problem was an interaction between a broken Linux API (epoll, which
underlies Java selectors) and a broken async replacement for
java.lang.Process called NuProcess. It was not related to Storm.

Roee

P.S. There's a good description of the general issue here:
https://idea.popcount.org/2017-03-20-epoll-is-fundamentally-broken-22/


Re: Performance issues on idle trident topology

Posted by Roee Shenberg <sh...@alooma.io>.
Not going away. This time around I did a per-thread CPU usage breakdown and
went back to jstack, and I verified that all the offending threads are
of the form Thread-N-spout/bolt-<NAME>-executor[33 33]-SendThread(<ZK
server address>)

It looks like ZooKeeper is somehow not clearing a socket after select() (well,
epoll_wait()) and so keeps on selecting as fast as it can. Using
argdist (a piece of eBPF magic, part of bcc), a histogram of epoll_wait() calls
parametrized by the timeout parameter looks as follows:
timeout             : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 0        |                                        |
       128 -> 255        : 4        |                                        |
       256 -> 511        : 6        |                                        |
       512 -> 1023       : 0        |                                        |
      1024 -> 2047       : 145259   |*****                                   |
      2048 -> 4095       : 1042417  |****************************************|
      4096 -> 8191       : 284079   |**********                              |

These are actual numbers: calls on a live system in the span of one second.
That's ~1.5M calls to epoll_wait(), none with low timeouts, so I see no
explanation for how epoll_wait() could be called that many times per second
other than that the socket is somehow not being cleared.
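
To spell out the arithmetic (a rough back-of-the-envelope, using the ~242 threads reported by the earlier strace run and assuming the usual millisecond units for epoll_wait's timeout argument):

    145259 + 1042417 + 284079 + 6 + 4          ~ 1.47M epoll_wait calls in one second
    requested timeouts (1024-8191 ms buckets)  ~ 1-8 s each
    242 threads x at most ~1 blocking call/s   ~ a few hundred calls/s, if they actually blocked

If each call honored its requested timeout, a few hundred calls per second would be the ceiling for this process; 1.47M/s is only possible if virtually every call returns immediately, i.e. the readiness it reports is never cleared.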

I've enabled all log levels for ZooKeeper in an attempt to glean something
interesting (no luck so far), and I've started looking over the ZooKeeper client
code (3.4.5, matching the version bundled with Storm 1.0.2).

Any and all suggestions on what to try next are welcome.

Thanks,
Roee


Re: Performance issues on idle trident topology

Posted by Roee Shenberg <sh...@alooma.io>.
Unfortunately, I haven't found a reliable way to replicate this behavior,
so I've moved on for now. Will update if the behavior recurs.
