You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zookeeper.apache.org by "suja s (JIRA)" <ji...@apache.org> on 2012/06/28 12:25:42 UTC

[jira] [Created] (ZOOKEEPER-1496) Ephemeral node not getting cleared even after client has exited

suja s created ZOOKEEPER-1496:
---------------------------------

             Summary: Ephemeral node not getting cleared even after client has exited
                 Key: ZOOKEEPER-1496
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1496
             Project: ZooKeeper
          Issue Type: Bug
            Reporter: suja s
         Attachments: Logs.rar

In one of the tests we performed, came across a case where the ephemeral node was not getting cleared from zookeeper though the client exited.

Zk version: 3.4.3

Ephemeral node still exists in Zookeeper: 

HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # date 

Tue Jun 26 16:07:04 IST 2012 
HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # ./zkCli.sh -server xx.xx.xx.55:2182 
Connecting to xx.xx.xx.55:2182 
Welcome to ZooKeeper! 
JLine support is enabled 
[zk: xx.xx.xx.55:2182(CONNECTING) 0] 
WATCHER:: 

WatchedEvent state:SyncConnected type:None path:null 

[zk: xx.xx.xx.55:2182(CONNECTED) 0] get /hadoop-ha/hacluster/ActiveStandbyElectorLock 

haclusternn2HOST-10-18-40-102 �� 
cZxid = 0x200000075 
ctime = Tue Jun 26 13:10:19 IST 2012 
mZxid = 0x200000075 
mtime = Tue Jun 26 13:10:19 IST 2012 
pZxid = 0x200000075 
cversion = 0 
dataVersion = 0 
aclVersion = 0 
ephemeralOwner = 0x1382791d4e50004 
dataLength = 42 
numChildren = 0 
[zk: xx.xx.xx.55:2182(CONNECTED) 1] 

Grepped logs at ZK side for session "0x1382791d4e50004" - close session and later create coming before closesession processed. 

HOST-10-18-40-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000074 
2012-06-26 13:10:18,834 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
2012-06-26 13:10:19,892 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
2012-06-26 13:10:19,919 [myid:3] - DEBUG [LearnerHandler-/10.18.40.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
2012-06-26 13:10:20,608 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 

HOST-10-18-40-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000075 
2012-06-26 13:10:19,893 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
2012-06-26 13:10:19,920 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
2012-06-26 13:10:20,278 [myid:3] - DEBUG [LearnerHandler-/10.18.40.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
2012-06-26 13:10:20,752 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 


 Close session and create requests coming almost parallely. 


Env:
Hadoop setup.
We were using Namenode HA with bookkeeper as shared storage and auto failover enabled.
NN102 was active and NN55 was standby. 
FailoverController at 102 got shut down due to ZK connection error. 
The lock-ActiveStandbyElectorLock created (ephemeral node) by this failovercontroller is not cleared from ZK


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Commented] (ZOOKEEPER-1496) Ephemeral node not getting cleared even after client has exited

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403872#comment-13403872 ] 

Hadoop QA commented on ZOOKEEPER-1496:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12533973/ZOOKEEPER-1496.1.patch
  against trunk revision 1353683.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1113//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1113//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1113//console

This message is automatically generated.
                
> Ephemeral node not getting cleared even after client has exited
> ---------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1496
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1496
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: suja s
>         Attachments: Logs.rar, ZOOKEEPER-1496.1.patch, ZOOKEEPER-1496.patch
>
>
> In one of the tests we performed, came across a case where the ephemeral node was not getting cleared from zookeeper though the client exited.
> Zk version: 3.4.3
> Ephemeral node still exists in Zookeeper: 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # date 
> Tue Jun 26 16:07:04 IST 2012 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # ./zkCli.sh -server xx.xx.xx.55:2182 
> Connecting to xx.xx.xx.55:2182 
> Welcome to ZooKeeper! 
> JLine support is enabled 
> [zk: xx.xx.xx.55:2182(CONNECTING) 0] 
> WATCHER:: 
> WatchedEvent state:SyncConnected type:None path:null 
> [zk: xx.xx.xx.55:2182(CONNECTED) 0] get /hadoop-ha/hacluster/ActiveStandbyElectorLock 
> haclusternn2HOSt-xx-xx-xx-102 �� 
> cZxid = 0x200000075 
> ctime = Tue Jun 26 13:10:19 IST 2012 
> mZxid = 0x200000075 
> mtime = Tue Jun 26 13:10:19 IST 2012 
> pZxid = 0x200000075 
> cversion = 0 
> dataVersion = 0 
> aclVersion = 0 
> ephemeralOwner = 0x1382791d4e50004 
> dataLength = 42 
> numChildren = 0 
> [zk: xx.xx.xx.55:2182(CONNECTED) 1] 
> Grepped logs at ZK side for session "0x1382791d4e50004" - close session and later create coming before closesession processed. 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000074 
> 2012-06-26 13:10:18,834 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,892 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,919 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:20,608 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000075 
> 2012-06-26 13:10:19,893 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:19,920 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,278 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,752 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
>  Close session and create requests coming almost parallely. 
> Env:
> Hadoop setup.
> We were using Namenode HA with bookkeeper as shared storage and auto failover enabled.
> NN102 was active and NN55 was standby. 
> FailoverController at 102 got shut down due to ZK connection error. 
> The lock-ActiveStandbyElectorLock created (ephemeral node) by this failovercontroller is not cleared from ZK

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Commented] (ZOOKEEPER-1496) Ephemeral node not getting cleared even after client has exited

Posted by "Rakesh R (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403003#comment-13403003 ] 

Rakesh R commented on ZOOKEEPER-1496:
-------------------------------------

When I go through the SessionTrackerImpl, removing the session just before expiring as follows:
{code}
if (set != null) {
   for (SessionImpl s : set.sessions) {
      sessionsById.remove(s.sessionId);
      expirer.expire(s);
   }
}
{code}

Instead of sessionsById.remove(s.sessionId); will mark the session as closing s.isClosing = true
How does it sound?

-Rakesh
                
> Ephemeral node not getting cleared even after client has exited
> ---------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1496
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1496
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: suja s
>         Attachments: Logs.rar
>
>
> In one of the tests we performed, came across a case where the ephemeral node was not getting cleared from zookeeper though the client exited.
> Zk version: 3.4.3
> Ephemeral node still exists in Zookeeper: 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # date 
> Tue Jun 26 16:07:04 IST 2012 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # ./zkCli.sh -server xx.xx.xx.55:2182 
> Connecting to xx.xx.xx.55:2182 
> Welcome to ZooKeeper! 
> JLine support is enabled 
> [zk: xx.xx.xx.55:2182(CONNECTING) 0] 
> WATCHER:: 
> WatchedEvent state:SyncConnected type:None path:null 
> [zk: xx.xx.xx.55:2182(CONNECTED) 0] get /hadoop-ha/hacluster/ActiveStandbyElectorLock 
> haclusternn2HOSt-xx-xx-xx-102 �� 
> cZxid = 0x200000075 
> ctime = Tue Jun 26 13:10:19 IST 2012 
> mZxid = 0x200000075 
> mtime = Tue Jun 26 13:10:19 IST 2012 
> pZxid = 0x200000075 
> cversion = 0 
> dataVersion = 0 
> aclVersion = 0 
> ephemeralOwner = 0x1382791d4e50004 
> dataLength = 42 
> numChildren = 0 
> [zk: xx.xx.xx.55:2182(CONNECTED) 1] 
> Grepped logs at ZK side for session "0x1382791d4e50004" - close session and later create coming before closesession processed. 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000074 
> 2012-06-26 13:10:18,834 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,892 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,919 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:20,608 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000075 
> 2012-06-26 13:10:19,893 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:19,920 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,278 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,752 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
>  Close session and create requests coming almost parallely. 
> Env:
> Hadoop setup.
> We were using Namenode HA with bookkeeper as shared storage and auto failover enabled.
> NN102 was active and NN55 was standby. 
> FailoverController at 102 got shut down due to ZK connection error. 
> The lock-ActiveStandbyElectorLock created (ephemeral node) by this failovercontroller is not cleared from ZK

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Commented] (ZOOKEEPER-1496) Ephemeral node not getting cleared even after client has exited

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403765#comment-13403765 ] 

Hadoop QA commented on ZOOKEEPER-1496:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12533949/ZOOKEEPER-1496.patch
  against trunk revision 1353683.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1112//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1112//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1112//console

This message is automatically generated.
                
> Ephemeral node not getting cleared even after client has exited
> ---------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1496
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1496
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: suja s
>         Attachments: Logs.rar, ZOOKEEPER-1496.patch
>
>
> In one of the tests we performed, came across a case where the ephemeral node was not getting cleared from zookeeper though the client exited.
> Zk version: 3.4.3
> Ephemeral node still exists in Zookeeper: 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # date 
> Tue Jun 26 16:07:04 IST 2012 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # ./zkCli.sh -server xx.xx.xx.55:2182 
> Connecting to xx.xx.xx.55:2182 
> Welcome to ZooKeeper! 
> JLine support is enabled 
> [zk: xx.xx.xx.55:2182(CONNECTING) 0] 
> WATCHER:: 
> WatchedEvent state:SyncConnected type:None path:null 
> [zk: xx.xx.xx.55:2182(CONNECTED) 0] get /hadoop-ha/hacluster/ActiveStandbyElectorLock 
> haclusternn2HOSt-xx-xx-xx-102 �� 
> cZxid = 0x200000075 
> ctime = Tue Jun 26 13:10:19 IST 2012 
> mZxid = 0x200000075 
> mtime = Tue Jun 26 13:10:19 IST 2012 
> pZxid = 0x200000075 
> cversion = 0 
> dataVersion = 0 
> aclVersion = 0 
> ephemeralOwner = 0x1382791d4e50004 
> dataLength = 42 
> numChildren = 0 
> [zk: xx.xx.xx.55:2182(CONNECTED) 1] 
> Grepped logs at ZK side for session "0x1382791d4e50004" - close session and later create coming before closesession processed. 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000074 
> 2012-06-26 13:10:18,834 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,892 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,919 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:20,608 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000075 
> 2012-06-26 13:10:19,893 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:19,920 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,278 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,752 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
>  Close session and create requests coming almost parallely. 
> Env:
> Hadoop setup.
> We were using Namenode HA with bookkeeper as shared storage and auto failover enabled.
> NN102 was active and NN55 was standby. 
> FailoverController at 102 got shut down due to ZK connection error. 
> The lock-ActiveStandbyElectorLock created (ephemeral node) by this failovercontroller is not cleared from ZK

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Commented] (ZOOKEEPER-1496) Ephemeral node not getting cleared even after client has exited

Posted by "Rakesh R (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13456852#comment-13456852 ] 

Rakesh R commented on ZOOKEEPER-1496:
-------------------------------------

Thanks a lot Mahadev for the review.

Yeah its true, attached latest patch addressing the comments. Also added one more test case for removeSession logic, as I feel its impacted.

Could you please look at the latest one.
                
> Ephemeral node not getting cleared even after client has exited
> ---------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1496
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1496
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>            Reporter: suja s
>            Assignee: Rakesh R
>            Priority: Critical
>             Fix For: 3.4.4, 3.5.0
>
>         Attachments: Logs.rar, ZOOKEEPER-1496.1.patch, ZOOKEEPER-1496.2.patch, ZOOKEEPER-1496.3.patch, ZOOKEEPER-1496.patch
>
>
> In one of the tests we performed, came across a case where the ephemeral node was not getting cleared from zookeeper though the client exited.
> Zk version: 3.4.3
> Ephemeral node still exists in Zookeeper: 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # date 
> Tue Jun 26 16:07:04 IST 2012 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # ./zkCli.sh -server xx.xx.xx.55:2182 
> Connecting to xx.xx.xx.55:2182 
> Welcome to ZooKeeper! 
> JLine support is enabled 
> [zk: xx.xx.xx.55:2182(CONNECTING) 0] 
> WATCHER:: 
> WatchedEvent state:SyncConnected type:None path:null 
> [zk: xx.xx.xx.55:2182(CONNECTED) 0] get /hadoop-ha/hacluster/ActiveStandbyElectorLock 
> haclusternn2HOSt-xx-xx-xx-102 �� 
> cZxid = 0x200000075 
> ctime = Tue Jun 26 13:10:19 IST 2012 
> mZxid = 0x200000075 
> mtime = Tue Jun 26 13:10:19 IST 2012 
> pZxid = 0x200000075 
> cversion = 0 
> dataVersion = 0 
> aclVersion = 0 
> ephemeralOwner = 0x1382791d4e50004 
> dataLength = 42 
> numChildren = 0 
> [zk: xx.xx.xx.55:2182(CONNECTED) 1] 
> Grepped logs at ZK side for session "0x1382791d4e50004" - close session and later create coming before closesession processed. 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000074 
> 2012-06-26 13:10:18,834 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,892 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,919 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:20,608 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000075 
> 2012-06-26 13:10:19,893 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:19,920 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,278 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,752 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
>  Close session and create requests coming almost parallely. 
> Env:
> Hadoop setup.
> We were using Namenode HA with bookkeeper as shared storage and auto failover enabled.
> NN102 was active and NN55 was standby. 
> FailoverController at 102 got shut down due to ZK connection error. 
> The lock-ActiveStandbyElectorLock created (ephemeral node) by this failovercontroller is not cleared from ZK

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1496) Ephemeral node not getting cleared even after client has exited

Posted by "Camille Fournier (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403753#comment-13403753 ] 

Camille Fournier commented on ZOOKEEPER-1496:
---------------------------------------------

Rakesh, when you have a patch ready please hit the "submit patch" option on the JIRA, which will trigger the patch to be picked up by the automated build and run.

I am on vacation and don't have time to look at this right now but will try to take a closer look in the next few days if no one else can.
                
> Ephemeral node not getting cleared even after client has exited
> ---------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1496
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1496
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: suja s
>         Attachments: Logs.rar, ZOOKEEPER-1496.patch
>
>
> In one of the tests we performed, came across a case where the ephemeral node was not getting cleared from zookeeper though the client exited.
> Zk version: 3.4.3
> Ephemeral node still exists in Zookeeper: 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # date 
> Tue Jun 26 16:07:04 IST 2012 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # ./zkCli.sh -server xx.xx.xx.55:2182 
> Connecting to xx.xx.xx.55:2182 
> Welcome to ZooKeeper! 
> JLine support is enabled 
> [zk: xx.xx.xx.55:2182(CONNECTING) 0] 
> WATCHER:: 
> WatchedEvent state:SyncConnected type:None path:null 
> [zk: xx.xx.xx.55:2182(CONNECTED) 0] get /hadoop-ha/hacluster/ActiveStandbyElectorLock 
> haclusternn2HOSt-xx-xx-xx-102 �� 
> cZxid = 0x200000075 
> ctime = Tue Jun 26 13:10:19 IST 2012 
> mZxid = 0x200000075 
> mtime = Tue Jun 26 13:10:19 IST 2012 
> pZxid = 0x200000075 
> cversion = 0 
> dataVersion = 0 
> aclVersion = 0 
> ephemeralOwner = 0x1382791d4e50004 
> dataLength = 42 
> numChildren = 0 
> [zk: xx.xx.xx.55:2182(CONNECTED) 1] 
> Grepped logs at ZK side for session "0x1382791d4e50004" - close session and later create coming before closesession processed. 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000074 
> 2012-06-26 13:10:18,834 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,892 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,919 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:20,608 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000075 
> 2012-06-26 13:10:19,893 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:19,920 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,278 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,752 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
>  Close session and create requests coming almost parallely. 
> Env:
> Hadoop setup.
> We were using Namenode HA with bookkeeper as shared storage and auto failover enabled.
> NN102 was active and NN55 was standby. 
> FailoverController at 102 got shut down due to ZK connection error. 
> The lock-ActiveStandbyElectorLock created (ephemeral node) by this failovercontroller is not cleared from ZK

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Assigned] (ZOOKEEPER-1496) Ephemeral node not getting cleared even after client has exited

Posted by "Rakesh R (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rakesh R reassigned ZOOKEEPER-1496:
-----------------------------------

    Assignee: Rakesh R
    
> Ephemeral node not getting cleared even after client has exited
> ---------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1496
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1496
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: suja s
>            Assignee: Rakesh R
>         Attachments: Logs.rar, ZOOKEEPER-1496.1.patch, ZOOKEEPER-1496.patch
>
>
> In one of the tests we performed, came across a case where the ephemeral node was not getting cleared from zookeeper though the client exited.
> Zk version: 3.4.3
> Ephemeral node still exists in Zookeeper: 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # date 
> Tue Jun 26 16:07:04 IST 2012 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # ./zkCli.sh -server xx.xx.xx.55:2182 
> Connecting to xx.xx.xx.55:2182 
> Welcome to ZooKeeper! 
> JLine support is enabled 
> [zk: xx.xx.xx.55:2182(CONNECTING) 0] 
> WATCHER:: 
> WatchedEvent state:SyncConnected type:None path:null 
> [zk: xx.xx.xx.55:2182(CONNECTED) 0] get /hadoop-ha/hacluster/ActiveStandbyElectorLock 
> haclusternn2HOSt-xx-xx-xx-102 �� 
> cZxid = 0x200000075 
> ctime = Tue Jun 26 13:10:19 IST 2012 
> mZxid = 0x200000075 
> mtime = Tue Jun 26 13:10:19 IST 2012 
> pZxid = 0x200000075 
> cversion = 0 
> dataVersion = 0 
> aclVersion = 0 
> ephemeralOwner = 0x1382791d4e50004 
> dataLength = 42 
> numChildren = 0 
> [zk: xx.xx.xx.55:2182(CONNECTED) 1] 
> Grepped logs at ZK side for session "0x1382791d4e50004" - close session and later create coming before closesession processed. 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000074 
> 2012-06-26 13:10:18,834 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,892 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,919 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:20,608 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000075 
> 2012-06-26 13:10:19,893 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:19,920 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,278 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,752 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
>  Close session and create requests coming almost parallely. 
> Env:
> Hadoop setup.
> We were using Namenode HA with bookkeeper as shared storage and auto failover enabled.
> NN102 was active and NN55 was standby. 
> FailoverController at 102 got shut down due to ZK connection error. 
> The lock-ActiveStandbyElectorLock created (ephemeral node) by this failovercontroller is not cleared from ZK

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Updated] (ZOOKEEPER-1496) Ephemeral node not getting cleared even after client has exited

Posted by "Rakesh R (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rakesh R updated ZOOKEEPER-1496:
--------------------------------

    Attachment: ZOOKEEPER-1496.patch

Hi All,

Attached the proposed fix based on my analysis, please help me to validate the proposal. 

I'm trying to reproduce in my test case.
                
> Ephemeral node not getting cleared even after client has exited
> ---------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1496
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1496
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: suja s
>         Attachments: Logs.rar, ZOOKEEPER-1496.patch
>
>
> In one of the tests we performed, came across a case where the ephemeral node was not getting cleared from zookeeper though the client exited.
> Zk version: 3.4.3
> Ephemeral node still exists in Zookeeper: 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # date 
> Tue Jun 26 16:07:04 IST 2012 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # ./zkCli.sh -server xx.xx.xx.55:2182 
> Connecting to xx.xx.xx.55:2182 
> Welcome to ZooKeeper! 
> JLine support is enabled 
> [zk: xx.xx.xx.55:2182(CONNECTING) 0] 
> WATCHER:: 
> WatchedEvent state:SyncConnected type:None path:null 
> [zk: xx.xx.xx.55:2182(CONNECTED) 0] get /hadoop-ha/hacluster/ActiveStandbyElectorLock 
> haclusternn2HOSt-xx-xx-xx-102 �� 
> cZxid = 0x200000075 
> ctime = Tue Jun 26 13:10:19 IST 2012 
> mZxid = 0x200000075 
> mtime = Tue Jun 26 13:10:19 IST 2012 
> pZxid = 0x200000075 
> cversion = 0 
> dataVersion = 0 
> aclVersion = 0 
> ephemeralOwner = 0x1382791d4e50004 
> dataLength = 42 
> numChildren = 0 
> [zk: xx.xx.xx.55:2182(CONNECTED) 1] 
> Grepped logs at ZK side for session "0x1382791d4e50004" - close session and later create coming before closesession processed. 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000074 
> 2012-06-26 13:10:18,834 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,892 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,919 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:20,608 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000075 
> 2012-06-26 13:10:19,893 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:19,920 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,278 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,752 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
>  Close session and create requests coming almost parallely. 
> Env:
> Hadoop setup.
> We were using Namenode HA with bookkeeper as shared storage and auto failover enabled.
> NN102 was active and NN55 was standby. 
> FailoverController at 102 got shut down due to ZK connection error. 
> The lock-ActiveStandbyElectorLock created (ephemeral node) by this failovercontroller is not cleared from ZK

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Commented] (ZOOKEEPER-1496) Ephemeral node not getting cleared even after client has exited

Posted by "Rakesh R (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402997#comment-13402997 ] 

Rakesh R commented on ZOOKEEPER-1496:
-------------------------------------

I've gone through the issue and attaching the analysis. Apologies in advance for the length of this comment - I'm trying to cover all the details.

*Environment details:*
ZK-91 -> Leader
ZK-55 -> Follower
ZK-102 -> Follower
ZKClient session timeout -> 5seconds
ZKClient side logic:- What I observed is, zclient is not waiting for any SyncConnected event before creating ephemeral znode 'ActiveStandbyElectorLock'. Also, on every connectionloss it is creating the ephemeral znode 'ActiveStandbyElectorLock'.


*Root Cause:* I'm suspecting there is a race condition between session expiry and FinalRequestProcessor. I could see, the expiry logic is removing the SessionImpl and is raising the 'closesession' request, other side FinalRequestProcessor is creating new SessionImpl into the SessionTracker for the 'createsession' request.

This case will come only if FinalRequestProcessor has some delay in processing the 'createsession' and in between the same session is getting expired.
----

I've done the analysis in perspective of Leader ZooKeeper server(ZK-91) and is as follows:

*Stage-1)* Client has sent connection request and the transaction id is '0x200000070'
Following log shows, the server has received connection request and after successful proposal, send committing request to the Follower. But the Leader side the transaction is in progress, as the request not reaches the FinalRequestProcessor.
{noformat}
2012-06-26 13:10:12,566 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:createSession cxid:0x0 zxid:0x200000070 txntype:-10 reqpath:n/a

2012-06-26 13:10:13,001 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:createSession cxid:0x0 zxid:0x200000070 txntype:-10 reqpath:n/a

2012-06-26 13:10:13,202 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:createSession cxid:0x0 zxid:0x200000070 txntype:-10 reqpath:n/a
{noformat}
----

*Stage-2)* Following log shows the Leader has'nt seen any pings from 0x1382791d4e50004 and expiring the session by creating 'closesession' request.
 If we see the SessionTrackerImpl, it is removing the SessionImpl corresponding to the 0x1382791d4e50004 before raising the expiry request. 
{code}
     sessionsById.remove(s.sessionId);
     expirer.expire(s);
{code}
At the same time, zkclient has sent 'create' command to the server - 0x200000072 transactionid, now the sessionTracker.checkSession(sessionId, owner) will not have the SessionImpl and get null then throwing the session expiry exception.
{code}
        SessionImpl session = sessionsById.get(sessionId);
        if (session == null || session.isClosing()) {
            throw new KeeperException.SessionExpiredException();
        }
{code}
{noformat}
2012-06-26 13:10:18,653 [myid:3] - INFO  [SessionTracker:ZooKeeperServer@325] - Expiring session 0x1382791d4e50004, timeout of 5000ms exceeded
2012-06-26 13:10:18,803 [myid:3] - INFO  [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException when processing sessionid:0x1382791d4e50004 type:create cxid:0x1 zxid:0x200000072 txntype:-1 reqpath:/hadoop-ha/hacluster/ActiveStandbyElectorLock Error Path:null Error:KeeperErrorCode = Session expired
{noformat}
----

*Stage-3)* Following message shows, 'closesession' 0x200000074 has got enough Ack and committing. But it didn't reaches the FinalRequestProcessor, so the session info will exists in the sessionTracker except the 'SessionImpl' object(as session is expired). 
{noformat}
2012-06-26 13:10:18,834 [myid:3] - INFO  [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@476] - Processed session termination for sessionid: 0x1382791d4e50004
2012-06-26 13:10:18,834 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a
{noformat}
----

*Stage-4)* Following log shows, 'createSession' 0x200000070 has reached FinalRequestProcessor and same time received a 'create' request from the client.
Now, the FinalRequestProcessor is processing the transaction and adding the SessionImpl back to the SessionTrackerImpl. The 'create' request is seeing this new SessionImpl and will continue successfully to next processor.
{code}
FinalRequestProcessor.java
rc = zks.processTxn(hdr, txn);       
{code}

ZooKeeperServer.java
{code}
    public ProcessTxnResult processTxn(TxnHeader hdr, Record txn) {
        ProcessTxnResult rc;
        int opCode = hdr.getType();
        long sessionId = hdr.getClientId();
        rc = getZKDatabase().processTxn(hdr, txn);
        if (opCode == OpCode.createSession) {
            if (txn instanceof CreateSessionTxn) {
                CreateSessionTxn cst = (CreateSessionTxn) txn;
                sessionTracker.addSession(sessionId, cst
                        .getTimeOut());
            } else {
                LOG.warn("*****>>>>> Got "
                        + txn.getClass() + " "
                        + txn.toString());
            }
        } else if (opCode == OpCode.closeSession) {
            sessionTracker.removeSession(sessionId);
        }
        return rc;
    }
{code}

{noformat}
2012-06-26 13:10:19,886 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:createSession cxid:0x0 zxid:0x200000070 txntype:-10 reqpath:n/a
{noformat}
----

*Stage-5)* 'closesession' 0x200000074 request has come to the FinalRequestProcessor and do the session closure
{noformat}
2012-06-26 13:10:20,608 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a
{noformat}
----

*Stage-6)* Now the 'create' 0x200000075 command, reaches in FinalRequestProcessor and resulting the orphan ephmeral znode.
{noformat}
2012-06-26 13:10:19,893 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a
2012-06-26 13:10:20,278 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a
2012-06-26 13:10:20,752 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a
{noformat}


-Rakesh
                
> Ephemeral node not getting cleared even after client has exited
> ---------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1496
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1496
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: suja s
>         Attachments: Logs.rar
>
>
> In one of the tests we performed, came across a case where the ephemeral node was not getting cleared from zookeeper though the client exited.
> Zk version: 3.4.3
> Ephemeral node still exists in Zookeeper: 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # date 
> Tue Jun 26 16:07:04 IST 2012 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # ./zkCli.sh -server xx.xx.xx.55:2182 
> Connecting to xx.xx.xx.55:2182 
> Welcome to ZooKeeper! 
> JLine support is enabled 
> [zk: xx.xx.xx.55:2182(CONNECTING) 0] 
> WATCHER:: 
> WatchedEvent state:SyncConnected type:None path:null 
> [zk: xx.xx.xx.55:2182(CONNECTED) 0] get /hadoop-ha/hacluster/ActiveStandbyElectorLock 
> haclusternn2HOSt-xx-xx-xx-102 �� 
> cZxid = 0x200000075 
> ctime = Tue Jun 26 13:10:19 IST 2012 
> mZxid = 0x200000075 
> mtime = Tue Jun 26 13:10:19 IST 2012 
> pZxid = 0x200000075 
> cversion = 0 
> dataVersion = 0 
> aclVersion = 0 
> ephemeralOwner = 0x1382791d4e50004 
> dataLength = 42 
> numChildren = 0 
> [zk: xx.xx.xx.55:2182(CONNECTED) 1] 
> Grepped logs at ZK side for session "0x1382791d4e50004" - close session and later create coming before closesession processed. 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000074 
> 2012-06-26 13:10:18,834 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,892 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,919 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:20,608 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000075 
> 2012-06-26 13:10:19,893 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:19,920 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,278 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,752 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
>  Close session and create requests coming almost parallely. 
> Env:
> Hadoop setup.
> We were using Namenode HA with bookkeeper as shared storage and auto failover enabled.
> NN102 was active and NN55 was standby. 
> FailoverController at 102 got shut down due to ZK connection error. 
> The lock-ActiveStandbyElectorLock created (ephemeral node) by this failovercontroller is not cleared from ZK

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Updated] (ZOOKEEPER-1496) Ephemeral node not getting cleared even after client has exited

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated ZOOKEEPER-1496:
-------------------------------------

    Fix Version/s: 3.5.0
    
> Ephemeral node not getting cleared even after client has exited
> ---------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1496
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1496
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>            Reporter: suja s
>            Assignee: Rakesh R
>            Priority: Critical
>             Fix For: 3.4.4, 3.5.0
>
>         Attachments: Logs.rar, ZOOKEEPER-1496.1.patch, ZOOKEEPER-1496.2.patch, ZOOKEEPER-1496.patch
>
>
> In one of the tests we performed, came across a case where the ephemeral node was not getting cleared from zookeeper though the client exited.
> Zk version: 3.4.3
> Ephemeral node still exists in Zookeeper: 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # date 
> Tue Jun 26 16:07:04 IST 2012 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # ./zkCli.sh -server xx.xx.xx.55:2182 
> Connecting to xx.xx.xx.55:2182 
> Welcome to ZooKeeper! 
> JLine support is enabled 
> [zk: xx.xx.xx.55:2182(CONNECTING) 0] 
> WATCHER:: 
> WatchedEvent state:SyncConnected type:None path:null 
> [zk: xx.xx.xx.55:2182(CONNECTED) 0] get /hadoop-ha/hacluster/ActiveStandbyElectorLock 
> haclusternn2HOSt-xx-xx-xx-102 �� 
> cZxid = 0x200000075 
> ctime = Tue Jun 26 13:10:19 IST 2012 
> mZxid = 0x200000075 
> mtime = Tue Jun 26 13:10:19 IST 2012 
> pZxid = 0x200000075 
> cversion = 0 
> dataVersion = 0 
> aclVersion = 0 
> ephemeralOwner = 0x1382791d4e50004 
> dataLength = 42 
> numChildren = 0 
> [zk: xx.xx.xx.55:2182(CONNECTED) 1] 
> Grepped logs at ZK side for session "0x1382791d4e50004" - close session and later create coming before closesession processed. 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000074 
> 2012-06-26 13:10:18,834 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,892 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,919 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:20,608 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000075 
> 2012-06-26 13:10:19,893 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:19,920 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,278 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,752 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
>  Close session and create requests coming almost parallely. 
> Env:
> Hadoop setup.
> We were using Namenode HA with bookkeeper as shared storage and auto failover enabled.
> NN102 was active and NN55 was standby. 
> FailoverController at 102 got shut down due to ZK connection error. 
> The lock-ActiveStandbyElectorLock created (ephemeral node) by this failovercontroller is not cleared from ZK

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (ZOOKEEPER-1496) Ephemeral node not getting cleared even after client has exited

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated ZOOKEEPER-1496:
-------------------------------------

    Fix Version/s: 3.4.4

This looks like a critical bugfix.
                
> Ephemeral node not getting cleared even after client has exited
> ---------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1496
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1496
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>            Reporter: suja s
>            Assignee: Rakesh R
>             Fix For: 3.4.4, 3.5.0
>
>         Attachments: Logs.rar, ZOOKEEPER-1496.1.patch, ZOOKEEPER-1496.2.patch, ZOOKEEPER-1496.patch
>
>
> In one of the tests we performed, came across a case where the ephemeral node was not getting cleared from zookeeper though the client exited.
> Zk version: 3.4.3
> Ephemeral node still exists in Zookeeper: 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # date 
> Tue Jun 26 16:07:04 IST 2012 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # ./zkCli.sh -server xx.xx.xx.55:2182 
> Connecting to xx.xx.xx.55:2182 
> Welcome to ZooKeeper! 
> JLine support is enabled 
> [zk: xx.xx.xx.55:2182(CONNECTING) 0] 
> WATCHER:: 
> WatchedEvent state:SyncConnected type:None path:null 
> [zk: xx.xx.xx.55:2182(CONNECTED) 0] get /hadoop-ha/hacluster/ActiveStandbyElectorLock 
> haclusternn2HOSt-xx-xx-xx-102 �� 
> cZxid = 0x200000075 
> ctime = Tue Jun 26 13:10:19 IST 2012 
> mZxid = 0x200000075 
> mtime = Tue Jun 26 13:10:19 IST 2012 
> pZxid = 0x200000075 
> cversion = 0 
> dataVersion = 0 
> aclVersion = 0 
> ephemeralOwner = 0x1382791d4e50004 
> dataLength = 42 
> numChildren = 0 
> [zk: xx.xx.xx.55:2182(CONNECTED) 1] 
> Grepped logs at ZK side for session "0x1382791d4e50004" - close session and later create coming before closesession processed. 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000074 
> 2012-06-26 13:10:18,834 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,892 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,919 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:20,608 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000075 
> 2012-06-26 13:10:19,893 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:19,920 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,278 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,752 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
>  Close session and create requests coming almost parallely. 
> Env:
> Hadoop setup.
> We were using Namenode HA with bookkeeper as shared storage and auto failover enabled.
> NN102 was active and NN55 was standby. 
> FailoverController at 102 got shut down due to ZK connection error. 
> The lock-ActiveStandbyElectorLock created (ephemeral node) by this failovercontroller is not cleared from ZK

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (ZOOKEEPER-1496) Ephemeral node not getting cleared even after client has exited

Posted by "Rakesh R (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rakesh R updated ZOOKEEPER-1496:
--------------------------------

    Affects Version/s: 3.4.3
    
> Ephemeral node not getting cleared even after client has exited
> ---------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1496
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1496
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.4.3
>            Reporter: suja s
>            Assignee: Rakesh R
>         Attachments: Logs.rar, ZOOKEEPER-1496.1.patch, ZOOKEEPER-1496.2.patch, ZOOKEEPER-1496.patch
>
>
> In one of the tests we performed, came across a case where the ephemeral node was not getting cleared from zookeeper though the client exited.
> Zk version: 3.4.3
> Ephemeral node still exists in Zookeeper: 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # date 
> Tue Jun 26 16:07:04 IST 2012 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # ./zkCli.sh -server xx.xx.xx.55:2182 
> Connecting to xx.xx.xx.55:2182 
> Welcome to ZooKeeper! 
> JLine support is enabled 
> [zk: xx.xx.xx.55:2182(CONNECTING) 0] 
> WATCHER:: 
> WatchedEvent state:SyncConnected type:None path:null 
> [zk: xx.xx.xx.55:2182(CONNECTED) 0] get /hadoop-ha/hacluster/ActiveStandbyElectorLock 
> haclusternn2HOSt-xx-xx-xx-102 �� 
> cZxid = 0x200000075 
> ctime = Tue Jun 26 13:10:19 IST 2012 
> mZxid = 0x200000075 
> mtime = Tue Jun 26 13:10:19 IST 2012 
> pZxid = 0x200000075 
> cversion = 0 
> dataVersion = 0 
> aclVersion = 0 
> ephemeralOwner = 0x1382791d4e50004 
> dataLength = 42 
> numChildren = 0 
> [zk: xx.xx.xx.55:2182(CONNECTED) 1] 
> Grepped logs at ZK side for session "0x1382791d4e50004" - close session and later create coming before closesession processed. 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000074 
> 2012-06-26 13:10:18,834 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,892 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,919 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:20,608 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000075 
> 2012-06-26 13:10:19,893 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:19,920 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,278 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,752 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
>  Close session and create requests coming almost parallely. 
> Env:
> Hadoop setup.
> We were using Namenode HA with bookkeeper as shared storage and auto failover enabled.
> NN102 was active and NN55 was standby. 
> FailoverController at 102 got shut down due to ZK connection error. 
> The lock-ActiveStandbyElectorLock created (ephemeral node) by this failovercontroller is not cleared from ZK

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Updated] (ZOOKEEPER-1496) Ephemeral node not getting cleared even after client has exited

Posted by "Rakesh R (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rakesh R updated ZOOKEEPER-1496:
--------------------------------

    Attachment: ZOOKEEPER-1496.1.patch
    
> Ephemeral node not getting cleared even after client has exited
> ---------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1496
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1496
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: suja s
>         Attachments: Logs.rar, ZOOKEEPER-1496.1.patch, ZOOKEEPER-1496.patch
>
>
> In one of the tests we performed, came across a case where the ephemeral node was not getting cleared from zookeeper though the client exited.
> Zk version: 3.4.3
> Ephemeral node still exists in Zookeeper: 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # date 
> Tue Jun 26 16:07:04 IST 2012 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # ./zkCli.sh -server xx.xx.xx.55:2182 
> Connecting to xx.xx.xx.55:2182 
> Welcome to ZooKeeper! 
> JLine support is enabled 
> [zk: xx.xx.xx.55:2182(CONNECTING) 0] 
> WATCHER:: 
> WatchedEvent state:SyncConnected type:None path:null 
> [zk: xx.xx.xx.55:2182(CONNECTED) 0] get /hadoop-ha/hacluster/ActiveStandbyElectorLock 
> haclusternn2HOSt-xx-xx-xx-102 �� 
> cZxid = 0x200000075 
> ctime = Tue Jun 26 13:10:19 IST 2012 
> mZxid = 0x200000075 
> mtime = Tue Jun 26 13:10:19 IST 2012 
> pZxid = 0x200000075 
> cversion = 0 
> dataVersion = 0 
> aclVersion = 0 
> ephemeralOwner = 0x1382791d4e50004 
> dataLength = 42 
> numChildren = 0 
> [zk: xx.xx.xx.55:2182(CONNECTED) 1] 
> Grepped logs at ZK side for session "0x1382791d4e50004" - close session and later create coming before closesession processed. 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000074 
> 2012-06-26 13:10:18,834 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,892 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,919 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:20,608 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000075 
> 2012-06-26 13:10:19,893 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:19,920 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,278 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,752 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
>  Close session and create requests coming almost parallely. 
> Env:
> Hadoop setup.
> We were using Namenode HA with bookkeeper as shared storage and auto failover enabled.
> NN102 was active and NN55 was standby. 
> FailoverController at 102 got shut down due to ZK connection error. 
> The lock-ActiveStandbyElectorLock created (ephemeral node) by this failovercontroller is not cleared from ZK

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Commented] (ZOOKEEPER-1496) Ephemeral node not getting cleared even after client has exited

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13456863#comment-13456863 ] 

Hadoop QA commented on ZOOKEEPER-1496:
--------------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12545370/ZOOKEEPER-1496.3.patch
  against trunk revision 1385378.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1183//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1183//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1183//console

This message is automatically generated.
                
> Ephemeral node not getting cleared even after client has exited
> ---------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1496
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1496
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>            Reporter: suja s
>            Assignee: Rakesh R
>            Priority: Critical
>             Fix For: 3.4.4, 3.5.0
>
>         Attachments: Logs.rar, ZOOKEEPER-1496.1.patch, ZOOKEEPER-1496.2.patch, ZOOKEEPER-1496.3.patch, ZOOKEEPER-1496.patch
>
>
> In one of the tests we performed, came across a case where the ephemeral node was not getting cleared from zookeeper though the client exited.
> Zk version: 3.4.3
> Ephemeral node still exists in Zookeeper: 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # date 
> Tue Jun 26 16:07:04 IST 2012 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # ./zkCli.sh -server xx.xx.xx.55:2182 
> Connecting to xx.xx.xx.55:2182 
> Welcome to ZooKeeper! 
> JLine support is enabled 
> [zk: xx.xx.xx.55:2182(CONNECTING) 0] 
> WATCHER:: 
> WatchedEvent state:SyncConnected type:None path:null 
> [zk: xx.xx.xx.55:2182(CONNECTED) 0] get /hadoop-ha/hacluster/ActiveStandbyElectorLock 
> haclusternn2HOSt-xx-xx-xx-102 �� 
> cZxid = 0x200000075 
> ctime = Tue Jun 26 13:10:19 IST 2012 
> mZxid = 0x200000075 
> mtime = Tue Jun 26 13:10:19 IST 2012 
> pZxid = 0x200000075 
> cversion = 0 
> dataVersion = 0 
> aclVersion = 0 
> ephemeralOwner = 0x1382791d4e50004 
> dataLength = 42 
> numChildren = 0 
> [zk: xx.xx.xx.55:2182(CONNECTED) 1] 
> Grepped logs at ZK side for session "0x1382791d4e50004" - close session and later create coming before closesession processed. 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000074 
> 2012-06-26 13:10:18,834 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,892 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,919 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:20,608 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000075 
> 2012-06-26 13:10:19,893 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:19,920 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,278 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,752 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
>  Close session and create requests coming almost parallely. 
> Env:
> Hadoop setup.
> We were using Namenode HA with bookkeeper as shared storage and auto failover enabled.
> NN102 was active and NN55 was standby. 
> FailoverController at 102 got shut down due to ZK connection error. 
> The lock-ActiveStandbyElectorLock created (ephemeral node) by this failovercontroller is not cleared from ZK

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1496) Ephemeral node not getting cleared even after client has exited

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13456860#comment-13456860 ] 

Mahadev konar commented on ZOOKEEPER-1496:
------------------------------------------

Rakesh,
 The patch looks good to me. Ill wait for hudson to check this in. We are good to go for 3.4 RC now! Thanks Rakesh!
                
> Ephemeral node not getting cleared even after client has exited
> ---------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1496
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1496
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>            Reporter: suja s
>            Assignee: Rakesh R
>            Priority: Critical
>             Fix For: 3.4.4, 3.5.0
>
>         Attachments: Logs.rar, ZOOKEEPER-1496.1.patch, ZOOKEEPER-1496.2.patch, ZOOKEEPER-1496.3.patch, ZOOKEEPER-1496.patch
>
>
> In one of the tests we performed, came across a case where the ephemeral node was not getting cleared from zookeeper though the client exited.
> Zk version: 3.4.3
> Ephemeral node still exists in Zookeeper: 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # date 
> Tue Jun 26 16:07:04 IST 2012 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # ./zkCli.sh -server xx.xx.xx.55:2182 
> Connecting to xx.xx.xx.55:2182 
> Welcome to ZooKeeper! 
> JLine support is enabled 
> [zk: xx.xx.xx.55:2182(CONNECTING) 0] 
> WATCHER:: 
> WatchedEvent state:SyncConnected type:None path:null 
> [zk: xx.xx.xx.55:2182(CONNECTED) 0] get /hadoop-ha/hacluster/ActiveStandbyElectorLock 
> haclusternn2HOSt-xx-xx-xx-102 �� 
> cZxid = 0x200000075 
> ctime = Tue Jun 26 13:10:19 IST 2012 
> mZxid = 0x200000075 
> mtime = Tue Jun 26 13:10:19 IST 2012 
> pZxid = 0x200000075 
> cversion = 0 
> dataVersion = 0 
> aclVersion = 0 
> ephemeralOwner = 0x1382791d4e50004 
> dataLength = 42 
> numChildren = 0 
> [zk: xx.xx.xx.55:2182(CONNECTED) 1] 
> Grepped logs at ZK side for session "0x1382791d4e50004" - close session and later create coming before closesession processed. 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000074 
> 2012-06-26 13:10:18,834 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,892 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,919 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:20,608 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000075 
> 2012-06-26 13:10:19,893 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:19,920 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,278 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,752 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
>  Close session and create requests coming almost parallely. 
> Env:
> Hadoop setup.
> We were using Namenode HA with bookkeeper as shared storage and auto failover enabled.
> NN102 was active and NN55 was standby. 
> FailoverController at 102 got shut down due to ZK connection error. 
> The lock-ActiveStandbyElectorLock created (ephemeral node) by this failovercontroller is not cleared from ZK

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1496) Ephemeral node not getting cleared even after client has exited

Posted by "Rakesh R (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412663#comment-13412663 ] 

Rakesh R commented on ZOOKEEPER-1496:
-------------------------------------

Attached the patch with test case. Please review the analysis and the proposed solution.

In our NN-HA cluster, similar issue has been reported twice as the session expiry time is very small and default value is 5seconds.

Thanks,
Rakesh
                
> Ephemeral node not getting cleared even after client has exited
> ---------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1496
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1496
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: suja s
>            Assignee: Rakesh R
>         Attachments: Logs.rar, ZOOKEEPER-1496.1.patch, ZOOKEEPER-1496.2.patch, ZOOKEEPER-1496.patch
>
>
> In one of the tests we performed, came across a case where the ephemeral node was not getting cleared from zookeeper though the client exited.
> Zk version: 3.4.3
> Ephemeral node still exists in Zookeeper: 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # date 
> Tue Jun 26 16:07:04 IST 2012 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # ./zkCli.sh -server xx.xx.xx.55:2182 
> Connecting to xx.xx.xx.55:2182 
> Welcome to ZooKeeper! 
> JLine support is enabled 
> [zk: xx.xx.xx.55:2182(CONNECTING) 0] 
> WATCHER:: 
> WatchedEvent state:SyncConnected type:None path:null 
> [zk: xx.xx.xx.55:2182(CONNECTED) 0] get /hadoop-ha/hacluster/ActiveStandbyElectorLock 
> haclusternn2HOSt-xx-xx-xx-102 �� 
> cZxid = 0x200000075 
> ctime = Tue Jun 26 13:10:19 IST 2012 
> mZxid = 0x200000075 
> mtime = Tue Jun 26 13:10:19 IST 2012 
> pZxid = 0x200000075 
> cversion = 0 
> dataVersion = 0 
> aclVersion = 0 
> ephemeralOwner = 0x1382791d4e50004 
> dataLength = 42 
> numChildren = 0 
> [zk: xx.xx.xx.55:2182(CONNECTED) 1] 
> Grepped logs at ZK side for session "0x1382791d4e50004" - close session and later create coming before closesession processed. 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000074 
> 2012-06-26 13:10:18,834 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,892 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,919 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:20,608 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000075 
> 2012-06-26 13:10:19,893 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:19,920 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,278 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,752 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
>  Close session and create requests coming almost parallely. 
> Env:
> Hadoop setup.
> We were using Namenode HA with bookkeeper as shared storage and auto failover enabled.
> NN102 was active and NN55 was standby. 
> FailoverController at 102 got shut down due to ZK connection error. 
> The lock-ActiveStandbyElectorLock created (ephemeral node) by this failovercontroller is not cleared from ZK

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Commented] (ZOOKEEPER-1496) Ephemeral node not getting cleared even after client has exited

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13456758#comment-13456758 ] 

Mahadev konar commented on ZOOKEEPER-1496:
------------------------------------------

Rakesh,
 I looked at the patch and it looks good, except for this one:

{code}
-                set = sessionSets.remove(nextExpirationTime);
+                SessionSet set = sessionSets.get(nextExpirationTime);
{code}

I think the remove still needs to happen else the session sets will keep growing in the hashset.

Also, 

{code}
         if (s != null) {
-            sessionSets.get(s.tickTime).sessions.remove(s);
+            SessionSet sessionSet = sessionSets.get(s.tickTime);
+            sessionSet.sessions.remove(s);
+            // Cleanup sessionSets, if no session exists
+            if (sessionSet.sessions.size() == 0) {
+                sessionSets.remove(s.tickTime);
+            }
         }
     }

{code}

I see that you are removing the sessionSet once the session cleans up but I think we need to still do the remove the session set when iterating for expiry.

Does that make sense?

                
> Ephemeral node not getting cleared even after client has exited
> ---------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1496
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1496
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>            Reporter: suja s
>            Assignee: Rakesh R
>            Priority: Critical
>             Fix For: 3.4.4, 3.5.0
>
>         Attachments: Logs.rar, ZOOKEEPER-1496.1.patch, ZOOKEEPER-1496.2.patch, ZOOKEEPER-1496.patch
>
>
> In one of the tests we performed, came across a case where the ephemeral node was not getting cleared from zookeeper though the client exited.
> Zk version: 3.4.3
> Ephemeral node still exists in Zookeeper: 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # date 
> Tue Jun 26 16:07:04 IST 2012 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # ./zkCli.sh -server xx.xx.xx.55:2182 
> Connecting to xx.xx.xx.55:2182 
> Welcome to ZooKeeper! 
> JLine support is enabled 
> [zk: xx.xx.xx.55:2182(CONNECTING) 0] 
> WATCHER:: 
> WatchedEvent state:SyncConnected type:None path:null 
> [zk: xx.xx.xx.55:2182(CONNECTED) 0] get /hadoop-ha/hacluster/ActiveStandbyElectorLock 
> haclusternn2HOSt-xx-xx-xx-102 �� 
> cZxid = 0x200000075 
> ctime = Tue Jun 26 13:10:19 IST 2012 
> mZxid = 0x200000075 
> mtime = Tue Jun 26 13:10:19 IST 2012 
> pZxid = 0x200000075 
> cversion = 0 
> dataVersion = 0 
> aclVersion = 0 
> ephemeralOwner = 0x1382791d4e50004 
> dataLength = 42 
> numChildren = 0 
> [zk: xx.xx.xx.55:2182(CONNECTED) 1] 
> Grepped logs at ZK side for session "0x1382791d4e50004" - close session and later create coming before closesession processed. 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000074 
> 2012-06-26 13:10:18,834 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,892 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,919 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:20,608 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000075 
> 2012-06-26 13:10:19,893 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:19,920 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,278 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,752 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
>  Close session and create requests coming almost parallely. 
> Env:
> Hadoop setup.
> We were using Namenode HA with bookkeeper as shared storage and auto failover enabled.
> NN102 was active and NN55 was standby. 
> FailoverController at 102 got shut down due to ZK connection error. 
> The lock-ActiveStandbyElectorLock created (ephemeral node) by this failovercontroller is not cleared from ZK

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1496) Ephemeral node not getting cleared even after client has exited

Posted by "Rakesh R (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13444834#comment-13444834 ] 

Rakesh R commented on ZOOKEEPER-1496:
-------------------------------------

Thanks Mahadev for the interest. Could you please review the proposed fix, that I had made in the patch.
                
> Ephemeral node not getting cleared even after client has exited
> ---------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1496
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1496
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>            Reporter: suja s
>            Assignee: Rakesh R
>            Priority: Critical
>             Fix For: 3.4.4, 3.5.0
>
>         Attachments: Logs.rar, ZOOKEEPER-1496.1.patch, ZOOKEEPER-1496.2.patch, ZOOKEEPER-1496.patch
>
>
> In one of the tests we performed, came across a case where the ephemeral node was not getting cleared from zookeeper though the client exited.
> Zk version: 3.4.3
> Ephemeral node still exists in Zookeeper: 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # date 
> Tue Jun 26 16:07:04 IST 2012 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # ./zkCli.sh -server xx.xx.xx.55:2182 
> Connecting to xx.xx.xx.55:2182 
> Welcome to ZooKeeper! 
> JLine support is enabled 
> [zk: xx.xx.xx.55:2182(CONNECTING) 0] 
> WATCHER:: 
> WatchedEvent state:SyncConnected type:None path:null 
> [zk: xx.xx.xx.55:2182(CONNECTED) 0] get /hadoop-ha/hacluster/ActiveStandbyElectorLock 
> haclusternn2HOSt-xx-xx-xx-102 �� 
> cZxid = 0x200000075 
> ctime = Tue Jun 26 13:10:19 IST 2012 
> mZxid = 0x200000075 
> mtime = Tue Jun 26 13:10:19 IST 2012 
> pZxid = 0x200000075 
> cversion = 0 
> dataVersion = 0 
> aclVersion = 0 
> ephemeralOwner = 0x1382791d4e50004 
> dataLength = 42 
> numChildren = 0 
> [zk: xx.xx.xx.55:2182(CONNECTED) 1] 
> Grepped logs at ZK side for session "0x1382791d4e50004" - close session and later create coming before closesession processed. 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000074 
> 2012-06-26 13:10:18,834 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,892 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,919 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:20,608 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000075 
> 2012-06-26 13:10:19,893 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:19,920 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,278 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,752 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
>  Close session and create requests coming almost parallely. 
> Env:
> Hadoop setup.
> We were using Namenode HA with bookkeeper as shared storage and auto failover enabled.
> NN102 was active and NN55 was standby. 
> FailoverController at 102 got shut down due to ZK connection error. 
> The lock-ActiveStandbyElectorLock created (ephemeral node) by this failovercontroller is not cleared from ZK

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (ZOOKEEPER-1496) Ephemeral node not getting cleared even after client has exited

Posted by "suja s (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

suja s updated ZOOKEEPER-1496:
------------------------------

    Attachment: Logs.rar

Attaching logs
                
> Ephemeral node not getting cleared even after client has exited
> ---------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1496
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1496
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: suja s
>         Attachments: Logs.rar
>
>
> In one of the tests we performed, came across a case where the ephemeral node was not getting cleared from zookeeper though the client exited.
> Zk version: 3.4.3
> Ephemeral node still exists in Zookeeper: 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # date 
> Tue Jun 26 16:07:04 IST 2012 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # ./zkCli.sh -server xx.xx.xx.55:2182 
> Connecting to xx.xx.xx.55:2182 
> Welcome to ZooKeeper! 
> JLine support is enabled 
> [zk: xx.xx.xx.55:2182(CONNECTING) 0] 
> WATCHER:: 
> WatchedEvent state:SyncConnected type:None path:null 
> [zk: xx.xx.xx.55:2182(CONNECTED) 0] get /hadoop-ha/hacluster/ActiveStandbyElectorLock 
> haclusternn2HOST-10-18-40-102 �� 
> cZxid = 0x200000075 
> ctime = Tue Jun 26 13:10:19 IST 2012 
> mZxid = 0x200000075 
> mtime = Tue Jun 26 13:10:19 IST 2012 
> pZxid = 0x200000075 
> cversion = 0 
> dataVersion = 0 
> aclVersion = 0 
> ephemeralOwner = 0x1382791d4e50004 
> dataLength = 42 
> numChildren = 0 
> [zk: xx.xx.xx.55:2182(CONNECTED) 1] 
> Grepped logs at ZK side for session "0x1382791d4e50004" - close session and later create coming before closesession processed. 
> HOST-10-18-40-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000074 
> 2012-06-26 13:10:18,834 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,892 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,919 [myid:3] - DEBUG [LearnerHandler-/10.18.40.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:20,608 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> HOST-10-18-40-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000075 
> 2012-06-26 13:10:19,893 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:19,920 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,278 [myid:3] - DEBUG [LearnerHandler-/10.18.40.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,752 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
>  Close session and create requests coming almost parallely. 
> Env:
> Hadoop setup.
> We were using Namenode HA with bookkeeper as shared storage and auto failover enabled.
> NN102 was active and NN55 was standby. 
> FailoverController at 102 got shut down due to ZK connection error. 
> The lock-ActiveStandbyElectorLock created (ephemeral node) by this failovercontroller is not cleared from ZK

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Commented] (ZOOKEEPER-1496) Ephemeral node not getting cleared even after client has exited

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13456939#comment-13456939 ] 

Hudson commented on ZOOKEEPER-1496:
-----------------------------------

Integrated in ZooKeeper-trunk #1682 (See [https://builds.apache.org/job/ZooKeeper-trunk/1682/])
    ZOOKEEPER-1496. Ephemeral node not getting cleared even after client has exited. (Rakesh R via mahadev) (Revision 1386496)

     Result = SUCCESS
mahadev : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1386496
Files : 
* /zookeeper/trunk/CHANGES.txt
* /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/SessionTrackerImpl.java
* /zookeeper/trunk/src/java/test/org/apache/zookeeper/server/SessionTrackerTest.java

                
> Ephemeral node not getting cleared even after client has exited
> ---------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1496
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1496
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>            Reporter: suja s
>            Assignee: Rakesh R
>            Priority: Critical
>             Fix For: 3.4.4, 3.5.0
>
>         Attachments: Logs.rar, ZOOKEEPER-1496.1.patch, ZOOKEEPER-1496.2.patch, ZOOKEEPER-1496.3.patch, ZOOKEEPER-1496.patch
>
>
> In one of the tests we performed, came across a case where the ephemeral node was not getting cleared from zookeeper though the client exited.
> Zk version: 3.4.3
> Ephemeral node still exists in Zookeeper: 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # date 
> Tue Jun 26 16:07:04 IST 2012 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # ./zkCli.sh -server xx.xx.xx.55:2182 
> Connecting to xx.xx.xx.55:2182 
> Welcome to ZooKeeper! 
> JLine support is enabled 
> [zk: xx.xx.xx.55:2182(CONNECTING) 0] 
> WATCHER:: 
> WatchedEvent state:SyncConnected type:None path:null 
> [zk: xx.xx.xx.55:2182(CONNECTED) 0] get /hadoop-ha/hacluster/ActiveStandbyElectorLock 
> haclusternn2HOSt-xx-xx-xx-102 �� 
> cZxid = 0x200000075 
> ctime = Tue Jun 26 13:10:19 IST 2012 
> mZxid = 0x200000075 
> mtime = Tue Jun 26 13:10:19 IST 2012 
> pZxid = 0x200000075 
> cversion = 0 
> dataVersion = 0 
> aclVersion = 0 
> ephemeralOwner = 0x1382791d4e50004 
> dataLength = 42 
> numChildren = 0 
> [zk: xx.xx.xx.55:2182(CONNECTED) 1] 
> Grepped logs at ZK side for session "0x1382791d4e50004" - close session and later create coming before closesession processed. 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000074 
> 2012-06-26 13:10:18,834 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,892 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,919 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:20,608 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000075 
> 2012-06-26 13:10:19,893 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:19,920 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,278 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,752 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
>  Close session and create requests coming almost parallely. 
> Env:
> Hadoop setup.
> We were using Namenode HA with bookkeeper as shared storage and auto failover enabled.
> NN102 was active and NN55 was standby. 
> FailoverController at 102 got shut down due to ZK connection error. 
> The lock-ActiveStandbyElectorLock created (ephemeral node) by this failovercontroller is not cleared from ZK

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1496) Ephemeral node not getting cleared even after client has exited

Posted by "Rakesh R (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403864#comment-13403864 ] 

Rakesh R commented on ZOOKEEPER-1496:
-------------------------------------

Thanks Camille for the interest. I've attached an initial draft for reviewing the problem and proposed solution.
                
> Ephemeral node not getting cleared even after client has exited
> ---------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1496
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1496
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: suja s
>         Attachments: Logs.rar, ZOOKEEPER-1496.1.patch, ZOOKEEPER-1496.patch
>
>
> In one of the tests we performed, came across a case where the ephemeral node was not getting cleared from zookeeper though the client exited.
> Zk version: 3.4.3
> Ephemeral node still exists in Zookeeper: 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # date 
> Tue Jun 26 16:07:04 IST 2012 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # ./zkCli.sh -server xx.xx.xx.55:2182 
> Connecting to xx.xx.xx.55:2182 
> Welcome to ZooKeeper! 
> JLine support is enabled 
> [zk: xx.xx.xx.55:2182(CONNECTING) 0] 
> WATCHER:: 
> WatchedEvent state:SyncConnected type:None path:null 
> [zk: xx.xx.xx.55:2182(CONNECTED) 0] get /hadoop-ha/hacluster/ActiveStandbyElectorLock 
> haclusternn2HOSt-xx-xx-xx-102 �� 
> cZxid = 0x200000075 
> ctime = Tue Jun 26 13:10:19 IST 2012 
> mZxid = 0x200000075 
> mtime = Tue Jun 26 13:10:19 IST 2012 
> pZxid = 0x200000075 
> cversion = 0 
> dataVersion = 0 
> aclVersion = 0 
> ephemeralOwner = 0x1382791d4e50004 
> dataLength = 42 
> numChildren = 0 
> [zk: xx.xx.xx.55:2182(CONNECTED) 1] 
> Grepped logs at ZK side for session "0x1382791d4e50004" - close session and later create coming before closesession processed. 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000074 
> 2012-06-26 13:10:18,834 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,892 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,919 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:20,608 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000075 
> 2012-06-26 13:10:19,893 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:19,920 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,278 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,752 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
>  Close session and create requests coming almost parallely. 
> Env:
> Hadoop setup.
> We were using Namenode HA with bookkeeper as shared storage and auto failover enabled.
> NN102 was active and NN55 was standby. 
> FailoverController at 102 got shut down due to ZK connection error. 
> The lock-ActiveStandbyElectorLock created (ephemeral node) by this failovercontroller is not cleared from ZK

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Updated] (ZOOKEEPER-1496) Ephemeral node not getting cleared even after client has exited

Posted by "Rakesh R (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rakesh R updated ZOOKEEPER-1496:
--------------------------------

    Attachment: ZOOKEEPER-1496.3.patch
    
> Ephemeral node not getting cleared even after client has exited
> ---------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1496
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1496
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>            Reporter: suja s
>            Assignee: Rakesh R
>            Priority: Critical
>             Fix For: 3.4.4, 3.5.0
>
>         Attachments: Logs.rar, ZOOKEEPER-1496.1.patch, ZOOKEEPER-1496.2.patch, ZOOKEEPER-1496.3.patch, ZOOKEEPER-1496.patch
>
>
> In one of the tests we performed, came across a case where the ephemeral node was not getting cleared from zookeeper though the client exited.
> Zk version: 3.4.3
> Ephemeral node still exists in Zookeeper: 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # date 
> Tue Jun 26 16:07:04 IST 2012 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # ./zkCli.sh -server xx.xx.xx.55:2182 
> Connecting to xx.xx.xx.55:2182 
> Welcome to ZooKeeper! 
> JLine support is enabled 
> [zk: xx.xx.xx.55:2182(CONNECTING) 0] 
> WATCHER:: 
> WatchedEvent state:SyncConnected type:None path:null 
> [zk: xx.xx.xx.55:2182(CONNECTED) 0] get /hadoop-ha/hacluster/ActiveStandbyElectorLock 
> haclusternn2HOSt-xx-xx-xx-102 �� 
> cZxid = 0x200000075 
> ctime = Tue Jun 26 13:10:19 IST 2012 
> mZxid = 0x200000075 
> mtime = Tue Jun 26 13:10:19 IST 2012 
> pZxid = 0x200000075 
> cversion = 0 
> dataVersion = 0 
> aclVersion = 0 
> ephemeralOwner = 0x1382791d4e50004 
> dataLength = 42 
> numChildren = 0 
> [zk: xx.xx.xx.55:2182(CONNECTED) 1] 
> Grepped logs at ZK side for session "0x1382791d4e50004" - close session and later create coming before closesession processed. 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000074 
> 2012-06-26 13:10:18,834 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,892 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,919 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:20,608 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000075 
> 2012-06-26 13:10:19,893 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:19,920 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,278 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,752 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
>  Close session and create requests coming almost parallely. 
> Env:
> Hadoop setup.
> We were using Namenode HA with bookkeeper as shared storage and auto failover enabled.
> NN102 was active and NN55 was standby. 
> FailoverController at 102 got shut down due to ZK connection error. 
> The lock-ActiveStandbyElectorLock created (ephemeral node) by this failovercontroller is not cleared from ZK

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (ZOOKEEPER-1496) Ephemeral node not getting cleared even after client has exited

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated ZOOKEEPER-1496:
-------------------------------------

    Component/s: server
    
> Ephemeral node not getting cleared even after client has exited
> ---------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1496
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1496
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>            Reporter: suja s
>            Assignee: Rakesh R
>            Priority: Critical
>             Fix For: 3.4.4, 3.5.0
>
>         Attachments: Logs.rar, ZOOKEEPER-1496.1.patch, ZOOKEEPER-1496.2.patch, ZOOKEEPER-1496.patch
>
>
> In one of the tests we performed, came across a case where the ephemeral node was not getting cleared from zookeeper though the client exited.
> Zk version: 3.4.3
> Ephemeral node still exists in Zookeeper: 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # date 
> Tue Jun 26 16:07:04 IST 2012 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # ./zkCli.sh -server xx.xx.xx.55:2182 
> Connecting to xx.xx.xx.55:2182 
> Welcome to ZooKeeper! 
> JLine support is enabled 
> [zk: xx.xx.xx.55:2182(CONNECTING) 0] 
> WATCHER:: 
> WatchedEvent state:SyncConnected type:None path:null 
> [zk: xx.xx.xx.55:2182(CONNECTED) 0] get /hadoop-ha/hacluster/ActiveStandbyElectorLock 
> haclusternn2HOSt-xx-xx-xx-102 �� 
> cZxid = 0x200000075 
> ctime = Tue Jun 26 13:10:19 IST 2012 
> mZxid = 0x200000075 
> mtime = Tue Jun 26 13:10:19 IST 2012 
> pZxid = 0x200000075 
> cversion = 0 
> dataVersion = 0 
> aclVersion = 0 
> ephemeralOwner = 0x1382791d4e50004 
> dataLength = 42 
> numChildren = 0 
> [zk: xx.xx.xx.55:2182(CONNECTED) 1] 
> Grepped logs at ZK side for session "0x1382791d4e50004" - close session and later create coming before closesession processed. 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000074 
> 2012-06-26 13:10:18,834 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,892 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,919 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:20,608 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000075 
> 2012-06-26 13:10:19,893 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:19,920 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,278 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,752 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
>  Close session and create requests coming almost parallely. 
> Env:
> Hadoop setup.
> We were using Namenode HA with bookkeeper as shared storage and auto failover enabled.
> NN102 was active and NN55 was standby. 
> FailoverController at 102 got shut down due to ZK connection error. 
> The lock-ActiveStandbyElectorLock created (ephemeral node) by this failovercontroller is not cleared from ZK

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (ZOOKEEPER-1496) Ephemeral node not getting cleared even after client has exited

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13412655#comment-13412655 ] 

Hadoop QA commented on ZOOKEEPER-1496:
--------------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12536199/ZOOKEEPER-1496.2.patch
  against trunk revision 1357711.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1138//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1138//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1138//console

This message is automatically generated.
                
> Ephemeral node not getting cleared even after client has exited
> ---------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1496
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1496
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: suja s
>            Assignee: Rakesh R
>         Attachments: Logs.rar, ZOOKEEPER-1496.1.patch, ZOOKEEPER-1496.2.patch, ZOOKEEPER-1496.patch
>
>
> In one of the tests we performed, came across a case where the ephemeral node was not getting cleared from zookeeper though the client exited.
> Zk version: 3.4.3
> Ephemeral node still exists in Zookeeper: 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # date 
> Tue Jun 26 16:07:04 IST 2012 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # ./zkCli.sh -server xx.xx.xx.55:2182 
> Connecting to xx.xx.xx.55:2182 
> Welcome to ZooKeeper! 
> JLine support is enabled 
> [zk: xx.xx.xx.55:2182(CONNECTING) 0] 
> WATCHER:: 
> WatchedEvent state:SyncConnected type:None path:null 
> [zk: xx.xx.xx.55:2182(CONNECTED) 0] get /hadoop-ha/hacluster/ActiveStandbyElectorLock 
> haclusternn2HOSt-xx-xx-xx-102 �� 
> cZxid = 0x200000075 
> ctime = Tue Jun 26 13:10:19 IST 2012 
> mZxid = 0x200000075 
> mtime = Tue Jun 26 13:10:19 IST 2012 
> pZxid = 0x200000075 
> cversion = 0 
> dataVersion = 0 
> aclVersion = 0 
> ephemeralOwner = 0x1382791d4e50004 
> dataLength = 42 
> numChildren = 0 
> [zk: xx.xx.xx.55:2182(CONNECTED) 1] 
> Grepped logs at ZK side for session "0x1382791d4e50004" - close session and later create coming before closesession processed. 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000074 
> 2012-06-26 13:10:18,834 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,892 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,919 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:20,608 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000075 
> 2012-06-26 13:10:19,893 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:19,920 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,278 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,752 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
>  Close session and create requests coming almost parallely. 
> Env:
> Hadoop setup.
> We were using Namenode HA with bookkeeper as shared storage and auto failover enabled.
> NN102 was active and NN55 was standby. 
> FailoverController at 102 got shut down due to ZK connection error. 
> The lock-ActiveStandbyElectorLock created (ephemeral node) by this failovercontroller is not cleared from ZK

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Updated] (ZOOKEEPER-1496) Ephemeral node not getting cleared even after client has exited

Posted by "suja s (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

suja s updated ZOOKEEPER-1496:
------------------------------

    Description: 
In one of the tests we performed, came across a case where the ephemeral node was not getting cleared from zookeeper though the client exited.

Zk version: 3.4.3

Ephemeral node still exists in Zookeeper: 

HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # date 

Tue Jun 26 16:07:04 IST 2012 
HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # ./zkCli.sh -server xx.xx.xx.55:2182 
Connecting to xx.xx.xx.55:2182 
Welcome to ZooKeeper! 
JLine support is enabled 
[zk: xx.xx.xx.55:2182(CONNECTING) 0] 
WATCHER:: 

WatchedEvent state:SyncConnected type:None path:null 

[zk: xx.xx.xx.55:2182(CONNECTED) 0] get /hadoop-ha/hacluster/ActiveStandbyElectorLock 

haclusternn2HOSt-xx-xx-xx-102 �� 
cZxid = 0x200000075 
ctime = Tue Jun 26 13:10:19 IST 2012 
mZxid = 0x200000075 
mtime = Tue Jun 26 13:10:19 IST 2012 
pZxid = 0x200000075 
cversion = 0 
dataVersion = 0 
aclVersion = 0 
ephemeralOwner = 0x1382791d4e50004 
dataLength = 42 
numChildren = 0 
[zk: xx.xx.xx.55:2182(CONNECTED) 1] 

Grepped logs at ZK side for session "0x1382791d4e50004" - close session and later create coming before closesession processed. 

HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000074 
2012-06-26 13:10:18,834 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
2012-06-26 13:10:19,892 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
2012-06-26 13:10:19,919 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
2012-06-26 13:10:20,608 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 

HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000075 
2012-06-26 13:10:19,893 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
2012-06-26 13:10:19,920 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
2012-06-26 13:10:20,278 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
2012-06-26 13:10:20,752 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 


 Close session and create requests coming almost parallely. 


Env:
Hadoop setup.
We were using Namenode HA with bookkeeper as shared storage and auto failover enabled.
NN102 was active and NN55 was standby. 
FailoverController at 102 got shut down due to ZK connection error. 
The lock-ActiveStandbyElectorLock created (ephemeral node) by this failovercontroller is not cleared from ZK


  was:
In one of the tests we performed, came across a case where the ephemeral node was not getting cleared from zookeeper though the client exited.

Zk version: 3.4.3

Ephemeral node still exists in Zookeeper: 

HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # date 

Tue Jun 26 16:07:04 IST 2012 
HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # ./zkCli.sh -server xx.xx.xx.55:2182 
Connecting to xx.xx.xx.55:2182 
Welcome to ZooKeeper! 
JLine support is enabled 
[zk: xx.xx.xx.55:2182(CONNECTING) 0] 
WATCHER:: 

WatchedEvent state:SyncConnected type:None path:null 

[zk: xx.xx.xx.55:2182(CONNECTED) 0] get /hadoop-ha/hacluster/ActiveStandbyElectorLock 

haclusternn2HOST-10-18-40-102 �� 
cZxid = 0x200000075 
ctime = Tue Jun 26 13:10:19 IST 2012 
mZxid = 0x200000075 
mtime = Tue Jun 26 13:10:19 IST 2012 
pZxid = 0x200000075 
cversion = 0 
dataVersion = 0 
aclVersion = 0 
ephemeralOwner = 0x1382791d4e50004 
dataLength = 42 
numChildren = 0 
[zk: xx.xx.xx.55:2182(CONNECTED) 1] 

Grepped logs at ZK side for session "0x1382791d4e50004" - close session and later create coming before closesession processed. 

HOST-10-18-40-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000074 
2012-06-26 13:10:18,834 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
2012-06-26 13:10:19,892 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
2012-06-26 13:10:19,919 [myid:3] - DEBUG [LearnerHandler-/10.18.40.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
2012-06-26 13:10:20,608 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 

HOST-10-18-40-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000075 
2012-06-26 13:10:19,893 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
2012-06-26 13:10:19,920 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
2012-06-26 13:10:20,278 [myid:3] - DEBUG [LearnerHandler-/10.18.40.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
2012-06-26 13:10:20,752 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 


 Close session and create requests coming almost parallely. 


Env:
Hadoop setup.
We were using Namenode HA with bookkeeper as shared storage and auto failover enabled.
NN102 was active and NN55 was standby. 
FailoverController at 102 got shut down due to ZK connection error. 
The lock-ActiveStandbyElectorLock created (ephemeral node) by this failovercontroller is not cleared from ZK


    
> Ephemeral node not getting cleared even after client has exited
> ---------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1496
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1496
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: suja s
>         Attachments: Logs.rar
>
>
> In one of the tests we performed, came across a case where the ephemeral node was not getting cleared from zookeeper though the client exited.
> Zk version: 3.4.3
> Ephemeral node still exists in Zookeeper: 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # date 
> Tue Jun 26 16:07:04 IST 2012 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # ./zkCli.sh -server xx.xx.xx.55:2182 
> Connecting to xx.xx.xx.55:2182 
> Welcome to ZooKeeper! 
> JLine support is enabled 
> [zk: xx.xx.xx.55:2182(CONNECTING) 0] 
> WATCHER:: 
> WatchedEvent state:SyncConnected type:None path:null 
> [zk: xx.xx.xx.55:2182(CONNECTED) 0] get /hadoop-ha/hacluster/ActiveStandbyElectorLock 
> haclusternn2HOSt-xx-xx-xx-102 �� 
> cZxid = 0x200000075 
> ctime = Tue Jun 26 13:10:19 IST 2012 
> mZxid = 0x200000075 
> mtime = Tue Jun 26 13:10:19 IST 2012 
> pZxid = 0x200000075 
> cversion = 0 
> dataVersion = 0 
> aclVersion = 0 
> ephemeralOwner = 0x1382791d4e50004 
> dataLength = 42 
> numChildren = 0 
> [zk: xx.xx.xx.55:2182(CONNECTED) 1] 
> Grepped logs at ZK side for session "0x1382791d4e50004" - close session and later create coming before closesession processed. 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000074 
> 2012-06-26 13:10:18,834 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,892 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,919 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:20,608 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000075 
> 2012-06-26 13:10:19,893 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:19,920 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,278 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,752 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
>  Close session and create requests coming almost parallely. 
> Env:
> Hadoop setup.
> We were using Namenode HA with bookkeeper as shared storage and auto failover enabled.
> NN102 was active and NN55 was standby. 
> FailoverController at 102 got shut down due to ZK connection error. 
> The lock-ActiveStandbyElectorLock created (ephemeral node) by this failovercontroller is not cleared from ZK

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Updated] (ZOOKEEPER-1496) Ephemeral node not getting cleared even after client has exited

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated ZOOKEEPER-1496:
-------------------------------------

    Priority: Critical  (was: Major)
    
> Ephemeral node not getting cleared even after client has exited
> ---------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1496
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1496
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>            Reporter: suja s
>            Assignee: Rakesh R
>            Priority: Critical
>             Fix For: 3.4.4, 3.5.0
>
>         Attachments: Logs.rar, ZOOKEEPER-1496.1.patch, ZOOKEEPER-1496.2.patch, ZOOKEEPER-1496.patch
>
>
> In one of the tests we performed, came across a case where the ephemeral node was not getting cleared from zookeeper though the client exited.
> Zk version: 3.4.3
> Ephemeral node still exists in Zookeeper: 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # date 
> Tue Jun 26 16:07:04 IST 2012 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # ./zkCli.sh -server xx.xx.xx.55:2182 
> Connecting to xx.xx.xx.55:2182 
> Welcome to ZooKeeper! 
> JLine support is enabled 
> [zk: xx.xx.xx.55:2182(CONNECTING) 0] 
> WATCHER:: 
> WatchedEvent state:SyncConnected type:None path:null 
> [zk: xx.xx.xx.55:2182(CONNECTED) 0] get /hadoop-ha/hacluster/ActiveStandbyElectorLock 
> haclusternn2HOSt-xx-xx-xx-102 �� 
> cZxid = 0x200000075 
> ctime = Tue Jun 26 13:10:19 IST 2012 
> mZxid = 0x200000075 
> mtime = Tue Jun 26 13:10:19 IST 2012 
> pZxid = 0x200000075 
> cversion = 0 
> dataVersion = 0 
> aclVersion = 0 
> ephemeralOwner = 0x1382791d4e50004 
> dataLength = 42 
> numChildren = 0 
> [zk: xx.xx.xx.55:2182(CONNECTED) 1] 
> Grepped logs at ZK side for session "0x1382791d4e50004" - close session and later create coming before closesession processed. 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000074 
> 2012-06-26 13:10:18,834 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,892 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,919 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:20,608 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000075 
> 2012-06-26 13:10:19,893 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:19,920 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,278 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,752 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
>  Close session and create requests coming almost parallely. 
> Env:
> Hadoop setup.
> We were using Namenode HA with bookkeeper as shared storage and auto failover enabled.
> NN102 was active and NN55 was standby. 
> FailoverController at 102 got shut down due to ZK connection error. 
> The lock-ActiveStandbyElectorLock created (ephemeral node) by this failovercontroller is not cleared from ZK

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (ZOOKEEPER-1496) Ephemeral node not getting cleared even after client has exited

Posted by "Rakesh R (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/ZOOKEEPER-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rakesh R updated ZOOKEEPER-1496:
--------------------------------

    Attachment: ZOOKEEPER-1496.2.patch
    
> Ephemeral node not getting cleared even after client has exited
> ---------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1496
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1496
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: suja s
>            Assignee: Rakesh R
>         Attachments: Logs.rar, ZOOKEEPER-1496.1.patch, ZOOKEEPER-1496.2.patch, ZOOKEEPER-1496.patch
>
>
> In one of the tests we performed, came across a case where the ephemeral node was not getting cleared from zookeeper though the client exited.
> Zk version: 3.4.3
> Ephemeral node still exists in Zookeeper: 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # date 
> Tue Jun 26 16:07:04 IST 2012 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # ./zkCli.sh -server xx.xx.xx.55:2182 
> Connecting to xx.xx.xx.55:2182 
> Welcome to ZooKeeper! 
> JLine support is enabled 
> [zk: xx.xx.xx.55:2182(CONNECTING) 0] 
> WATCHER:: 
> WatchedEvent state:SyncConnected type:None path:null 
> [zk: xx.xx.xx.55:2182(CONNECTED) 0] get /hadoop-ha/hacluster/ActiveStandbyElectorLock 
> haclusternn2HOSt-xx-xx-xx-102 �� 
> cZxid = 0x200000075 
> ctime = Tue Jun 26 13:10:19 IST 2012 
> mZxid = 0x200000075 
> mtime = Tue Jun 26 13:10:19 IST 2012 
> pZxid = 0x200000075 
> cversion = 0 
> dataVersion = 0 
> aclVersion = 0 
> ephemeralOwner = 0x1382791d4e50004 
> dataLength = 42 
> numChildren = 0 
> [zk: xx.xx.xx.55:2182(CONNECTED) 1] 
> Grepped logs at ZK side for session "0x1382791d4e50004" - close session and later create coming before closesession processed. 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000074 
> 2012-06-26 13:10:18,834 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,892 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,919 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:20,608 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000075 
> 2012-06-26 13:10:19,893 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:19,920 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,278 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,752 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
>  Close session and create requests coming almost parallely. 
> Env:
> Hadoop setup.
> We were using Namenode HA with bookkeeper as shared storage and auto failover enabled.
> NN102 was active and NN55 was standby. 
> FailoverController at 102 got shut down due to ZK connection error. 
> The lock-ActiveStandbyElectorLock created (ephemeral node) by this failovercontroller is not cleared from ZK

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira