You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@geode.apache.org by Bruce Schuchardt <bs...@pivotal.io> on 2016/10/26 23:50:51 UTC

CI failures due to JMX client heartbeat thread

We have 23 CI failure tickets concerning suspect strings from "JMX 
client heartbeat" threads.  I think we should add this to 
ExpectedStrings.java and close these tickets.  The suspect strings are 
coming from javax.management.remote.rmi.RMIConnector and I don't think 
there's anything we can do about them.  Most, if not all, are assigned 
to security or JMX components.


    GEODE-1858
    GEODE-1955
    GEODE-1492
    GEODE-1539
    GEODE-1538
    GEODE-1773
    GEODE-1922
    GEODE-1482
    GEODE-1475
    GEODE-1480
    GEODE-1481
    GEODE-1826
    GEODE-1825
    GEODE-1820
    GEODE-1878
    GEODE-1879
    GEODE-1877
    GEODE-1875
    GEODE-1876
    GEODE-1499
    GEODE-1476
    GEODE-1869
    GEODE-1769


Re: CI failures due to JMX client heartbeat thread

Posted by Kirk Lund <kl...@pivotal.io>.
Just caught this message. I'll fix ConnectToLocatorSSLDUnitTest and look
for other tests that aren't tearing down gfsh properly.

-Kirk


On Thu, Oct 27, 2016 at 12:49 PM, Jinmei Liao <ji...@pivotal.io> wrote:

> Thanks Bruce. Looking at the ConnectToLocatorSSLDUnitTest, we did have a
> gfsh there that's not properly closed. I will work with Kirk to close this.
> Thanks!
>
> On Thu, Oct 27, 2016 at 11:46 AM, Bruce Schuchardt <bschuchardt@pivotal.io
> >
> wrote:
>
> > I think the code is in Gfsh.notifyDisconnect().  I think the test that's
> > polluting others is ConnectToLocatorSSLDUnitTest.  I had a failure with
> > this suspect string in a test that ran just after it. I added a thread
> dump
> > in the @After of that test and see that it's leaving one of these threads
> > behind along with some Gfsh Launcher threads.
> >
> >
> > Le 10/27/2016 à 10:54 AM, Kirk Lund a écrit :
> >
> >> Bruce, can you please point me at the JDK code you think is generating
> the
> >> fatal message? I looked in RMIConnector.java and can't find it in there.
> >>
> >> Maybe we can fix ordering of tearDown or something else in order to
> "fix"
> >> this message.
> >>
> >> Thanks,
> >> Kirk
> >>
> >>
> >> On Wed, Oct 26, 2016 at 5:00 PM, Dan Smith <ds...@pivotal.io> wrote:
> >>
> >> Just looking at a couple of these bugs, these are fatal level errors. Do
> >>> you know what is causing them or what affect this might have?
> >>>
> >>> [fatal 2016/09/29 12:12:03.891 PDT <JMX client heartbeat 3> tid=0x18d]
> >>> (tid=397 msgId=39) No longer connected to cc6-co6.gemstone.com[27162].
> >>>
> >>> -Dan
> >>>
> >>> On Wed, Oct 26, 2016 at 4:50 PM, Bruce Schuchardt <
> >>> bschuchardt@pivotal.io>
> >>> wrote:
> >>>
> >>> We have 23 CI failure tickets concerning suspect strings from "JMX
> client
> >>>> heartbeat" threads.  I think we should add this to
> ExpectedStrings.java
> >>>>
> >>> and
> >>>
> >>>> close these tickets.  The suspect strings are coming from
> >>>> javax.management.remote.rmi.RMIConnector and I don't think there's
> >>>> anything we can do about them.  Most, if not all, are assigned to
> >>>>
> >>> security
> >>>
> >>>> or JMX components.
> >>>>
> >>>>
> >>>>     GEODE-1858
> >>>>     GEODE-1955
> >>>>     GEODE-1492
> >>>>     GEODE-1539
> >>>>     GEODE-1538
> >>>>     GEODE-1773
> >>>>     GEODE-1922
> >>>>     GEODE-1482
> >>>>     GEODE-1475
> >>>>     GEODE-1480
> >>>>     GEODE-1481
> >>>>     GEODE-1826
> >>>>     GEODE-1825
> >>>>     GEODE-1820
> >>>>     GEODE-1878
> >>>>     GEODE-1879
> >>>>     GEODE-1877
> >>>>     GEODE-1875
> >>>>     GEODE-1876
> >>>>     GEODE-1499
> >>>>     GEODE-1476
> >>>>     GEODE-1869
> >>>>     GEODE-1769
> >>>>
> >>>>
> >>>>
> >
>
>
> --
> Cheers
>
> Jinmei

Re: CI failures due to JMX client heartbeat thread

Posted by Jinmei Liao <ji...@pivotal.io>.
Thanks Bruce. Looking at the ConnectToLocatorSSLDUnitTest, we did have a
gfsh there that's not properly closed. I will work with Kirk to close this.
Thanks!

On Thu, Oct 27, 2016 at 11:46 AM, Bruce Schuchardt <bs...@pivotal.io>
wrote:

> I think the code is in Gfsh.notifyDisconnect().  I think the test that's
> polluting others is ConnectToLocatorSSLDUnitTest.  I had a failure with
> this suspect string in a test that ran just after it. I added a thread dump
> in the @After of that test and see that it's leaving one of these threads
> behind along with some Gfsh Launcher threads.
>
>
> Le 10/27/2016 à 10:54 AM, Kirk Lund a écrit :
>
>> Bruce, can you please point me at the JDK code you think is generating the
>> fatal message? I looked in RMIConnector.java and can't find it in there.
>>
>> Maybe we can fix ordering of tearDown or something else in order to "fix"
>> this message.
>>
>> Thanks,
>> Kirk
>>
>>
>> On Wed, Oct 26, 2016 at 5:00 PM, Dan Smith <ds...@pivotal.io> wrote:
>>
>> Just looking at a couple of these bugs, these are fatal level errors. Do
>>> you know what is causing them or what affect this might have?
>>>
>>> [fatal 2016/09/29 12:12:03.891 PDT <JMX client heartbeat 3> tid=0x18d]
>>> (tid=397 msgId=39) No longer connected to cc6-co6.gemstone.com[27162].
>>>
>>> -Dan
>>>
>>> On Wed, Oct 26, 2016 at 4:50 PM, Bruce Schuchardt <
>>> bschuchardt@pivotal.io>
>>> wrote:
>>>
>>> We have 23 CI failure tickets concerning suspect strings from "JMX client
>>>> heartbeat" threads.  I think we should add this to ExpectedStrings.java
>>>>
>>> and
>>>
>>>> close these tickets.  The suspect strings are coming from
>>>> javax.management.remote.rmi.RMIConnector and I don't think there's
>>>> anything we can do about them.  Most, if not all, are assigned to
>>>>
>>> security
>>>
>>>> or JMX components.
>>>>
>>>>
>>>>     GEODE-1858
>>>>     GEODE-1955
>>>>     GEODE-1492
>>>>     GEODE-1539
>>>>     GEODE-1538
>>>>     GEODE-1773
>>>>     GEODE-1922
>>>>     GEODE-1482
>>>>     GEODE-1475
>>>>     GEODE-1480
>>>>     GEODE-1481
>>>>     GEODE-1826
>>>>     GEODE-1825
>>>>     GEODE-1820
>>>>     GEODE-1878
>>>>     GEODE-1879
>>>>     GEODE-1877
>>>>     GEODE-1875
>>>>     GEODE-1876
>>>>     GEODE-1499
>>>>     GEODE-1476
>>>>     GEODE-1869
>>>>     GEODE-1769
>>>>
>>>>
>>>>
>


-- 
Cheers

Jinmei

Re: CI failures due to JMX client heartbeat thread

Posted by Kirk Lund <kl...@apache.org>.
The "JMX client heartbeat n" threads are created
by com.sun.jmx.remote.internal.ClientCommunicatorAdmin.

GFSH registers JMXConnectionListener which is a non-inner class hiding
inside ./geode-core/src/main/java/org/apache/geode/management/internal/cli/shell/JmxOperationInvoker.java
-- we should extract all such classes to their own java file or make then
static inner classes. JMXConnectionListener
implements javax.management.NotificationListener and is registered
with JMXConnector.

"gfshFileLogger.severe" is the line in notifyDisconnect that is causing
these fatal level log messages.

It's not clear to me what would cause GFSH to keep the JDK's "JMX client
heartbeat n" threads around (I haven't actually seen this yet). Maybe the
way JmxOperationInvoker is implemented is incorrect and leaks or keeps some
JMX resources around. GFSH does keep a reference to JmxOperationInvoker and
I don't see that cleared anywhere (such as when it might disconnect from a
JMX server).

It seems reasonable to lower the log level for the log messages (I hate to
suppress it entirely though). Beyond that we should file a bug for the
leaking of "JMX client heartbeat n" threads. I suspect leaking these
threads would mostly occur in tests that are connecting/disconnecting
a GFSH instance repeatedly. The typical user of GFSH isn't going to
repeatedly connect/disconnect like this.

Maybe the tests that are hitting this are closing the manager 1st and then
GFSH's JMXConnectionListener detects this but then never frees the
JMXConnectionListener or JmxOperationInvoker.

-Kirk


On Thu, Oct 27, 2016 at 11:46 AM, Bruce Schuchardt <bs...@pivotal.io>
wrote:

> I think the code is in Gfsh.notifyDisconnect().  I think the test that's
> polluting others is ConnectToLocatorSSLDUnitTest.  I had a failure with
> this suspect string in a test that ran just after it. I added a thread dump
> in the @After of that test and see that it's leaving one of these threads
> behind along with some Gfsh Launcher threads.
>
>
> Le 10/27/2016 à 10:54 AM, Kirk Lund a écrit :
>
>> Bruce, can you please point me at the JDK code you think is generating the
>> fatal message? I looked in RMIConnector.java and can't find it in there.
>>
>> Maybe we can fix ordering of tearDown or something else in order to "fix"
>> this message.
>>
>> Thanks,
>> Kirk
>>
>>
>> On Wed, Oct 26, 2016 at 5:00 PM, Dan Smith <ds...@pivotal.io> wrote:
>>
>> Just looking at a couple of these bugs, these are fatal level errors. Do
>>> you know what is causing them or what affect this might have?
>>>
>>> [fatal 2016/09/29 12:12:03.891 PDT <JMX client heartbeat 3> tid=0x18d]
>>> (tid=397 msgId=39) No longer connected to cc6-co6.gemstone.com[27162].
>>>
>>> -Dan
>>>
>>> On Wed, Oct 26, 2016 at 4:50 PM, Bruce Schuchardt <
>>> bschuchardt@pivotal.io>
>>> wrote:
>>>
>>> We have 23 CI failure tickets concerning suspect strings from "JMX client
>>>> heartbeat" threads.  I think we should add this to ExpectedStrings.java
>>>>
>>> and
>>>
>>>> close these tickets.  The suspect strings are coming from
>>>> javax.management.remote.rmi.RMIConnector and I don't think there's
>>>> anything we can do about them.  Most, if not all, are assigned to
>>>>
>>> security
>>>
>>>> or JMX components.
>>>>
>>>>
>>>>     GEODE-1858
>>>>     GEODE-1955
>>>>     GEODE-1492
>>>>     GEODE-1539
>>>>     GEODE-1538
>>>>     GEODE-1773
>>>>     GEODE-1922
>>>>     GEODE-1482
>>>>     GEODE-1475
>>>>     GEODE-1480
>>>>     GEODE-1481
>>>>     GEODE-1826
>>>>     GEODE-1825
>>>>     GEODE-1820
>>>>     GEODE-1878
>>>>     GEODE-1879
>>>>     GEODE-1877
>>>>     GEODE-1875
>>>>     GEODE-1876
>>>>     GEODE-1499
>>>>     GEODE-1476
>>>>     GEODE-1869
>>>>     GEODE-1769
>>>>
>>>>
>>>>
>

Re: CI failures due to JMX client heartbeat thread

Posted by Bruce Schuchardt <bs...@pivotal.io>.
I think the code is in Gfsh.notifyDisconnect().  I think the test that's 
polluting others is ConnectToLocatorSSLDUnitTest.  I had a failure with 
this suspect string in a test that ran just after it. I added a thread 
dump in the @After of that test and see that it's leaving one of these 
threads behind along with some Gfsh Launcher threads.

Le 10/27/2016 � 10:54 AM, Kirk Lund a �crit :
> Bruce, can you please point me at the JDK code you think is generating the
> fatal message? I looked in RMIConnector.java and can't find it in there.
>
> Maybe we can fix ordering of tearDown or something else in order to "fix"
> this message.
>
> Thanks,
> Kirk
>
>
> On Wed, Oct 26, 2016 at 5:00 PM, Dan Smith <ds...@pivotal.io> wrote:
>
>> Just looking at a couple of these bugs, these are fatal level errors. Do
>> you know what is causing them or what affect this might have?
>>
>> [fatal 2016/09/29 12:12:03.891 PDT <JMX client heartbeat 3> tid=0x18d]
>> (tid=397 msgId=39) No longer connected to cc6-co6.gemstone.com[27162].
>>
>> -Dan
>>
>> On Wed, Oct 26, 2016 at 4:50 PM, Bruce Schuchardt <bs...@pivotal.io>
>> wrote:
>>
>>> We have 23 CI failure tickets concerning suspect strings from "JMX client
>>> heartbeat" threads.  I think we should add this to ExpectedStrings.java
>> and
>>> close these tickets.  The suspect strings are coming from
>>> javax.management.remote.rmi.RMIConnector and I don't think there's
>>> anything we can do about them.  Most, if not all, are assigned to
>> security
>>> or JMX components.
>>>
>>>
>>>     GEODE-1858
>>>     GEODE-1955
>>>     GEODE-1492
>>>     GEODE-1539
>>>     GEODE-1538
>>>     GEODE-1773
>>>     GEODE-1922
>>>     GEODE-1482
>>>     GEODE-1475
>>>     GEODE-1480
>>>     GEODE-1481
>>>     GEODE-1826
>>>     GEODE-1825
>>>     GEODE-1820
>>>     GEODE-1878
>>>     GEODE-1879
>>>     GEODE-1877
>>>     GEODE-1875
>>>     GEODE-1876
>>>     GEODE-1499
>>>     GEODE-1476
>>>     GEODE-1869
>>>     GEODE-1769
>>>
>>>


Re: CI failures due to JMX client heartbeat thread

Posted by Kirk Lund <kl...@apache.org>.
Bruce, can you please point me at the JDK code you think is generating the
fatal message? I looked in RMIConnector.java and can't find it in there.

Maybe we can fix ordering of tearDown or something else in order to "fix"
this message.

Thanks,
Kirk


On Wed, Oct 26, 2016 at 5:00 PM, Dan Smith <ds...@pivotal.io> wrote:

> Just looking at a couple of these bugs, these are fatal level errors. Do
> you know what is causing them or what affect this might have?
>
> [fatal 2016/09/29 12:12:03.891 PDT <JMX client heartbeat 3> tid=0x18d]
> (tid=397 msgId=39) No longer connected to cc6-co6.gemstone.com[27162].
>
> -Dan
>
> On Wed, Oct 26, 2016 at 4:50 PM, Bruce Schuchardt <bs...@pivotal.io>
> wrote:
>
> > We have 23 CI failure tickets concerning suspect strings from "JMX client
> > heartbeat" threads.  I think we should add this to ExpectedStrings.java
> and
> > close these tickets.  The suspect strings are coming from
> > javax.management.remote.rmi.RMIConnector and I don't think there's
> > anything we can do about them.  Most, if not all, are assigned to
> security
> > or JMX components.
> >
> >
> >    GEODE-1858
> >    GEODE-1955
> >    GEODE-1492
> >    GEODE-1539
> >    GEODE-1538
> >    GEODE-1773
> >    GEODE-1922
> >    GEODE-1482
> >    GEODE-1475
> >    GEODE-1480
> >    GEODE-1481
> >    GEODE-1826
> >    GEODE-1825
> >    GEODE-1820
> >    GEODE-1878
> >    GEODE-1879
> >    GEODE-1877
> >    GEODE-1875
> >    GEODE-1876
> >    GEODE-1499
> >    GEODE-1476
> >    GEODE-1869
> >    GEODE-1769
> >
> >
>

Re: CI failures due to JMX client heartbeat thread

Posted by Bruce Schuchardt <bs...@pivotal.io>.
I spent some time digging into this more and it looks like these threads 
are probably left behind by tests using Gfsh.  It creates a JMX 
heartbeat thread that issues "severe" (i.e. "fatal") warnings with the 
message we're seeing

>       public void notifyDisconnect(String endPoints) {
>         String message =
>     CliStrings.format(CliStrings.GFSH__MSG__NO_LONGER_CONNECTED_TO_0,
>     new Object[] {endPoints});
>         printAsSevere(LINE_SEPARATOR + message);
>         if (gfshFileLogger.severeEnabled()) {
>           gfshFileLogger.severe(message);
>         }
>         setPromptPath(
>     org.apache.geode.management.internal.cli.converters.RegionPathConverter.DEFAULT_APP_CONTEXT_PATH);
>       }

In a recent test run I can see that ConnectToLocatorSSLDUnitTest ran 
just before a test that logged this fatal-level message.  Adding an 
@After to ConnectToLocatorSSLDUnitTest that dumps threads I can see that 
it is leaving behind a "JMX client heartbeat 4" thread as well as 
several "Gfsh Launcher" threads.

We should fix these tests to properly terminate Gfsh threads during cleanup.


Le 10/26/2016 � 5:00 PM, Dan Smith a �crit :
> Just looking at a couple of these bugs, these are fatal level errors. Do
> you know what is causing them or what affect this might have?
>
> [fatal 2016/09/29 12:12:03.891 PDT <JMX client heartbeat 3> tid=0x18d]
> (tid=397 msgId=39) No longer connected to cc6-co6.gemstone.com[27162].
>
> -Dan
>
> On Wed, Oct 26, 2016 at 4:50 PM, Bruce Schuchardt <bs...@pivotal.io>
> wrote:
>
>> We have 23 CI failure tickets concerning suspect strings from "JMX client
>> heartbeat" threads.  I think we should add this to ExpectedStrings.java and
>> close these tickets.  The suspect strings are coming from
>> javax.management.remote.rmi.RMIConnector and I don't think there's
>> anything we can do about them.  Most, if not all, are assigned to security
>> or JMX components.
>>
>>
>>     GEODE-1858
>>     GEODE-1955
>>     GEODE-1492
>>     GEODE-1539
>>     GEODE-1538
>>     GEODE-1773
>>     GEODE-1922
>>     GEODE-1482
>>     GEODE-1475
>>     GEODE-1480
>>     GEODE-1481
>>     GEODE-1826
>>     GEODE-1825
>>     GEODE-1820
>>     GEODE-1878
>>     GEODE-1879
>>     GEODE-1877
>>     GEODE-1875
>>     GEODE-1876
>>     GEODE-1499
>>     GEODE-1476
>>     GEODE-1869
>>     GEODE-1769
>>
>>


Re: CI failures due to JMX client heartbeat thread

Posted by Dan Smith <ds...@pivotal.io>.
Just looking at a couple of these bugs, these are fatal level errors. Do
you know what is causing them or what affect this might have?

[fatal 2016/09/29 12:12:03.891 PDT <JMX client heartbeat 3> tid=0x18d]
(tid=397 msgId=39) No longer connected to cc6-co6.gemstone.com[27162].

-Dan

On Wed, Oct 26, 2016 at 4:50 PM, Bruce Schuchardt <bs...@pivotal.io>
wrote:

> We have 23 CI failure tickets concerning suspect strings from "JMX client
> heartbeat" threads.  I think we should add this to ExpectedStrings.java and
> close these tickets.  The suspect strings are coming from
> javax.management.remote.rmi.RMIConnector and I don't think there's
> anything we can do about them.  Most, if not all, are assigned to security
> or JMX components.
>
>
>    GEODE-1858
>    GEODE-1955
>    GEODE-1492
>    GEODE-1539
>    GEODE-1538
>    GEODE-1773
>    GEODE-1922
>    GEODE-1482
>    GEODE-1475
>    GEODE-1480
>    GEODE-1481
>    GEODE-1826
>    GEODE-1825
>    GEODE-1820
>    GEODE-1878
>    GEODE-1879
>    GEODE-1877
>    GEODE-1875
>    GEODE-1876
>    GEODE-1499
>    GEODE-1476
>    GEODE-1869
>    GEODE-1769
>
>