Posted to dev@mesos.apache.org by "Benjamin Mahler (JIRA)" <ji...@apache.org> on 2013/08/29 00:11:53 UTC

[jira] [Commented] (MESOS-670) GroupTest.GroupJoinWithDisconnect fails on master.

    [ https://issues.apache.org/jira/browse/MESOS-670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752939#comment-13752939 ] 

Benjamin Mahler commented on MESOS-670:
---------------------------------------

Confirmed this is the commit:

{noformat}
commit eb1cd4a7c0ad4310f090d4f0643cf4059ac5246b
Author: Benjamin Mahler <bm...@twitter.com>
Date:   Mon Aug 26 18:26:07 2013 -0700

    Upgraded ZooKeeper from 3.3.4 to 3.3.6.

    From: Vinson Lee <vl...@twitter.com>
    Review: https://reviews.apache.org/r/13598
{noformat}

It appears that a 'make clean' is required to correctly pick up the ZK change and trigger the test failures, which is why I didn't catch this when running 'make check' prior to committing. This is likely the same reason Vinson didn't notice.

I've looked through the ZK code, and it appears to be broken in 3.3.6 if one makes the following sequence of calls:

ZooKeeperServer.startup() -> ZooKeeperServer.shutdown() -> ZooKeeperServer.startup()

In 3.3.6:
{code}
372    public void startup() {        
373        if (sessionTracker == null) {
374            createSessionTracker(); // Creates a new session tracker.
375        }
376        startSessionTracker(); // Calls Thread.start() on the session tracker, unconditionally! On a second startup() the existing tracker has already been started, so this throws a java.lang.IllegalThreadStateException.
377        setupRequestProcessors();
378
379        registerJMX();
380
381        synchronized (this) {
382            running = true;
383            notifyAll();
384        }
385    }
{code}

In 3.3.4:
{code}
370    public void startup() {        
371        createSessionTracker(); // Creates a new session tracker and starts it.
372        setupRequestProcessors();
373
374        registerJMX();
375
376        synchronized (this) {
377            running = true;
378            notifyAll();
379        }
380    }
{code}
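
The difference between the two versions can be reproduced outside ZooKeeper. The following is a minimal sketch, not ZooKeeper code: ToyServer, its fields, and survivesRestart are hypothetical names, and the sketch assumes that shutdown() stops the tracker thread but keeps the reference to it. The only real API it relies on is java.lang.Thread, whose start() throws IllegalThreadStateException if the thread has already been started.

```java
// Toy model only: ToyServer and its members are hypothetical, not ZooKeeper classes.
public class RestartSketch {

    /** Stand-in for a server whose session tracker runs on its own thread. */
    static class ToyServer {
        private final boolean recreateOnStartup;
        private Thread tracker;

        ToyServer(boolean recreateOnStartup) {
            this.recreateOnStartup = recreateOnStartup;
        }

        void startup() {
            // true:  always build a fresh tracker (modeled on the 3.3.4 listing).
            // false: build one only if none exists (modeled on the 3.3.6 listing).
            if (recreateOnStartup || tracker == null) {
                tracker = new Thread(() -> { });
            }
            // A Thread can be started at most once, even after it has terminated.
            tracker.start();
        }

        void shutdown() {
            // Assumption: shutdown stops the tracker but does not clear the field.
            try {
                tracker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Runs startup() -> shutdown() -> startup() and reports whether the restart worked. */
    static boolean survivesRestart(boolean recreateOnStartup) {
        ToyServer server = new ToyServer(recreateOnStartup);
        server.startup();
        server.shutdown();
        try {
            server.startup();
            return true;
        } catch (IllegalThreadStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("recreate tracker on startup: " + survivesRestart(true));
        System.out.println("reuse tracker on startup:    " + survivesRestart(false));
    }
}
```

In this model the "recreate" variant survives the restart and the "reuse" variant does not, which matches the IllegalThreadStateException in the test output. Whether the real shutdown() is expected to leave the server restartable is the open question.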

It's difficult to tell from the documentation whether we're using the API incorrectly (i.e. restarting a server after shutdown() is not supported) or whether this is a bug accidentally introduced when the change was pulled into the 3.3.x branch.

This is the ZK commit:
{noformat}
➜  zookeeper-3.3.6  svn log --revision 1239983
------------------------------------------------------------------------
r1239983 | mahadev | 2012-02-02 18:41:08 -0800 (Thu, 02 Feb 2012) | 1 line

ZOOKEEPER-1367. Data inconsistencies and unexpired ephemeral nodes after cluster restart. (Benjamin Reed via mahadev)
------------------------------------------------------------------------
{noformat}

See ZOOKEEPER-1367.
                
> GroupTest.GroupJoinWithDisconnect fails on master.
> --------------------------------------------------
>
>                 Key: MESOS-670
>                 URL: https://issues.apache.org/jira/browse/MESOS-670
>             Project: Mesos
>          Issue Type: Bug
>          Components: test
>            Reporter: Benjamin Mahler
>            Assignee: Benjamin Mahler
>
> [ RUN      ] GroupTest.GroupJoinWithDisconnect
> 2013-08-28 14:15:21,348:40067(0x11c447000):ZOO_ERROR@handle_socket_error_msg@1579: Socket [127.0.0.1:64547] zk retcode=-4, errno=61(Connection refused): server refused to accept the client
> Exception in thread "AWT-AppKit" java.lang.IllegalThreadStateException
> 	at java.lang.Thread.start(Thread.java:656)
> 	at org.apache.zookeeper.server.ZooKeeperServer.startSessionTracker(ZooKeeperServer.java:402)
> 	at org.apache.zookeeper.server.ZooKeeperServer.startup(ZooKeeperServer.java:376)
> 	at org.apache.zookeeper.server.NIOServerCnxn$Factory.startup(NIOServerCnxn.java:161)
> Caught a JVM exception, not propagating
>
> I committed this patch from Vinson Lee:
> https://reviews.apache.org/r/13598/
>
> It appears this has possibly affected the ZK tests.
>
> There appears to be a code change between 3.3.4 and 3.3.6 relevant to this issue:
> http://grepcode.com/file/repo1.maven.org/maven2/org.apache.zookeeper/zookeeper/3.3.4/org/apache/zookeeper/server/ZooKeeperServer.java#370
> vs
> http://grepcode.com/file/repo1.maven.org/maven2/org.apache.zookeeper/zookeeper/3.3.6/org/apache/zookeeper/server/ZooKeeperServer.java#372
>
> I'll dig a little further; hopefully I can avoid needing to revert this commit.
