Posted to dev@zookeeper.apache.org by "Akihiro Suda (JIRA)" <ji...@apache.org> on 2015/04/13 06:34:12 UTC

[jira] [Created] (ZOOKEEPER-2162) infinite exception loop occurs when dataDir is lost

Akihiro Suda created ZOOKEEPER-2162:
---------------------------------------

             Summary: infinite exception loop occurs when dataDir is lost
                 Key: ZOOKEEPER-2162
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2162
             Project: ZooKeeper
          Issue Type: Bug
          Components: server
    Affects Versions: 3.5.0
            Reporter: Akihiro Suda


The following sequence drives server.1 and server.2 into an infinite exception loop.

 * Start server.1 and server.2 with the initial ensemble server.1=participant, server.2=observer.
   At this point, acceptedEpoch[i] == currentEpoch[i] == 1 for i = 1, 2.
 * Invoke reconfig so that acceptedEpoch[i] and currentEpoch[i] increase to 2.
 * Kill server.2.
 * Remove the contents of server.2's dataDir, except for the myid file.
   (In real production environments, both confDir and dataDir can be lost due to reprovisioning.)
 * Start server.2.
 * server.1 and server.2 enter an infinite exception loop.
   The log (threshold set to INFO in log4j.properties) can grow beyond 100MB within 30 seconds.
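
For anyone scripting the dataDir removal step by hand, a minimal Python sketch follows. It is only an illustration, not the script used in the Docker container: the function name reset_data_dir and its keep parameter are hypothetical, and it simply deletes every top-level entry of a directory except the listed names (myid by default).

```python
import shutil
from pathlib import Path


def reset_data_dir(data_dir, keep=("myid",)):
    """Delete every top-level entry of data_dir except the names in keep.

    Hypothetical helper for illustration; returns the names it removed.
    """
    root = Path(data_dir)
    removed = []
    for entry in sorted(root.iterdir()):
        if entry.name in keep:
            continue  # preserve e.g. the myid file
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # remove a real directory tree
        else:
            entry.unlink()  # remove a regular file or a symlink
        removed.append(entry.name)
    return removed
```

It would be called with the dataDir path of the stopped server, e.g. reset_data_dir("/zk02_data"), before the server is started again.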

AFAIK, the bug can be reproduced with ZooKeeper@f5fb50ed2591ba9a24685a227bb5374759516828 (Apr 7, 2015).

I made a Docker container so that anyone interested can reproduce the bug easily. (Sorry, there are no JUnit tests yet.)
{noformat}
$ docker run -i -t --rm akihirosuda/zookeeper-bug01
Reproducing the bug: infinite exception loop occurs when dataDir is lost
* Resetting
* Starting [1,2] with the initial ensemble [1]
* Sleeping for 3 seconds
* Invoking Reconfig [1]->[2]
* Sleeping for 3 seconds
* Killing server.2 (pid=10542)
* Sleeping for 3 seconds
* Resetting /zk02_data
* Starting server.2
* Sleeping for 30 seconds
/zk01_log: 81665114 bytes
The log dir is extremely large. Perhaps the bug was REPRODUCED!
/zk02_log: 23949367 bytes
The log dir is extremely large. Perhaps the bug was REPRODUCED!
* Exiting
{noformat}
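
The log-size check printed above can be approximated without the container. The following Python sketch is an assumption about how such a check might look, not the container's actual code: dir_size_bytes, looks_reproduced and the 10 MiB default threshold are all hypothetical names and values chosen for illustration.

```python
import os

# Assumed threshold; the report does not state the value the container uses.
SUSPICIOUS_BYTES = 10 * 1024 * 1024


def dir_size_bytes(path):
    """Sum the sizes of all regular files under path, recursively."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            file_path = os.path.join(dirpath, name)
            if not os.path.islink(file_path):  # skip symlinks
                total += os.path.getsize(file_path)
    return total


def looks_reproduced(log_dir, threshold=SUSPICIOUS_BYTES):
    """Return True if the log directory is larger than threshold bytes."""
    return dir_size_bytes(log_dir) > threshold
```

Running looks_reproduced on each server's log directory about 30 seconds after restarting server.2 gives a rough signal; a large log alone does not prove the loop, so the exceptions in the log should still be inspected.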
