You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zookeeper.apache.org by "Rakesh R (JIRA)" <ji...@apache.org> on 2012/06/28 12:49:43 UTC

[jira] [Commented] (ZOOKEEPER-1496) Ephemeral node not getting cleared even after client has exited

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402997#comment-13402997 ] 

Rakesh R commented on ZOOKEEPER-1496:
-------------------------------------

I've gone through the issue and attaching the analysis. Apologies in advance for the length of this comment - I'm trying to cover all the details.

*Environment details:*
ZK-91 -> Leader
ZK-55 -> Follower
ZK-102 -> Follower
ZKClient session timeout -> 5seconds
ZKClient side logic:- What I observed is, zclient is not waiting for any SyncConnected event before creating ephemeral znode 'ActiveStandbyElectorLock'. Also, on every connectionloss it is creating the ephemeral znode 'ActiveStandbyElectorLock'.


*Root Cause:* I'm suspecting there is a race condition between session expiry and FinalRequestProcessor. I could see, the expiry logic is removing the SessionImpl and is raising the 'closesession' request, other side FinalRequestProcessor is creating new SessionImpl into the SessionTracker for the 'createsession' request.

This case will come only if FinalRequestProcessor has some delay in processing the 'createsession' and in between the same session is getting expired.
----

I've done the analysis in perspective of Leader ZooKeeper server(ZK-91) and is as follows:

*Stage-1)* Client has sent connection request and the transaction id is '0x200000070'
Following log shows, the server has received connection request and after successful proposal, send committing request to the Follower. But the Leader side the transaction is in progress, as the request not reaches the FinalRequestProcessor.
{noformat}
2012-06-26 13:10:12,566 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:createSession cxid:0x0 zxid:0x200000070 txntype:-10 reqpath:n/a

2012-06-26 13:10:13,001 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:createSession cxid:0x0 zxid:0x200000070 txntype:-10 reqpath:n/a

2012-06-26 13:10:13,202 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:createSession cxid:0x0 zxid:0x200000070 txntype:-10 reqpath:n/a
{noformat}
----

*Stage-2)* Following log shows the Leader has'nt seen any pings from 0x1382791d4e50004 and expiring the session by creating 'closesession' request.
 If we see the SessionTrackerImpl, it is removing the SessionImpl corresponding to the 0x1382791d4e50004 before raising the expiry request. 
{code}
     sessionsById.remove(s.sessionId);
     expirer.expire(s);
{code}
At the same time, zkclient has sent 'create' command to the server - 0x200000072 transactionid, now the sessionTracker.checkSession(sessionId, owner) will not have the SessionImpl and get null then throwing the session expiry exception.
{code}
        SessionImpl session = sessionsById.get(sessionId);
        if (session == null || session.isClosing()) {
            throw new KeeperException.SessionExpiredException();
        }
{code}
{noformat}
2012-06-26 13:10:18,653 [myid:3] - INFO  [SessionTracker:ZooKeeperServer@325] - Expiring session 0x1382791d4e50004, timeout of 5000ms exceeded
2012-06-26 13:10:18,803 [myid:3] - INFO  [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@627] - Got user-level KeeperException when processing sessionid:0x1382791d4e50004 type:create cxid:0x1 zxid:0x200000072 txntype:-1 reqpath:/hadoop-ha/hacluster/ActiveStandbyElectorLock Error Path:null Error:KeeperErrorCode = Session expired
{noformat}
----

*Stage-3)* Following message shows, 'closesession' 0x200000074 has got enough Ack and committing. But it didn't reaches the FinalRequestProcessor, so the session info will exists in the sessionTracker except the 'SessionImpl' object(as session is expired). 
{noformat}
2012-06-26 13:10:18,834 [myid:3] - INFO  [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@476] - Processed session termination for sessionid: 0x1382791d4e50004
2012-06-26 13:10:18,834 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a
{noformat}
----

*Stage-4)* Following log shows, 'createSession' 0x200000070 has reached FinalRequestProcessor and same time received a 'create' request from the client.
Now, the FinalRequestProcessor is processing the transaction and adding the SessionImpl back to the SessionTrackerImpl. The 'create' request is seeing this new SessionImpl and will continue successfully to next processor.
{code}
FinalRequestProcessor.java
rc = zks.processTxn(hdr, txn);       
{code}

ZooKeeperServer.java
{code}
    public ProcessTxnResult processTxn(TxnHeader hdr, Record txn) {
        ProcessTxnResult rc;
        int opCode = hdr.getType();
        long sessionId = hdr.getClientId();
        rc = getZKDatabase().processTxn(hdr, txn);
        if (opCode == OpCode.createSession) {
            if (txn instanceof CreateSessionTxn) {
                CreateSessionTxn cst = (CreateSessionTxn) txn;
                sessionTracker.addSession(sessionId, cst
                        .getTimeOut());
            } else {
                LOG.warn("*****>>>>> Got "
                        + txn.getClass() + " "
                        + txn.toString());
            }
        } else if (opCode == OpCode.closeSession) {
            sessionTracker.removeSession(sessionId);
        }
        return rc;
    }
{code}

{noformat}
2012-06-26 13:10:19,886 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:createSession cxid:0x0 zxid:0x200000070 txntype:-10 reqpath:n/a
{noformat}
----

*Stage-5)* 'closesession' 0x200000074 request has come to the FinalRequestProcessor and do the session closure
{noformat}
2012-06-26 13:10:20,608 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a
{noformat}
----

*Stage-6)* Now the 'create' 0x200000075 command, reaches in FinalRequestProcessor and resulting the orphan ephmeral znode.
{noformat}
2012-06-26 13:10:19,893 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a
2012-06-26 13:10:20,278 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a
2012-06-26 13:10:20,752 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a
{noformat}


-Rakesh
                
> Ephemeral node not getting cleared even after client has exited
> ---------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1496
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1496
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: suja s
>         Attachments: Logs.rar
>
>
> In one of the tests we performed, came across a case where the ephemeral node was not getting cleared from zookeeper though the client exited.
> Zk version: 3.4.3
> Ephemeral node still exists in Zookeeper: 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # date 
> Tue Jun 26 16:07:04 IST 2012 
> HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # ./zkCli.sh -server xx.xx.xx.55:2182 
> Connecting to xx.xx.xx.55:2182 
> Welcome to ZooKeeper! 
> JLine support is enabled 
> [zk: xx.xx.xx.55:2182(CONNECTING) 0] 
> WATCHER:: 
> WatchedEvent state:SyncConnected type:None path:null 
> [zk: xx.xx.xx.55:2182(CONNECTED) 0] get /hadoop-ha/hacluster/ActiveStandbyElectorLock 
> haclusternn2HOSt-xx-xx-xx-102 �� 
> cZxid = 0x200000075 
> ctime = Tue Jun 26 13:10:19 IST 2012 
> mZxid = 0x200000075 
> mtime = Tue Jun 26 13:10:19 IST 2012 
> pZxid = 0x200000075 
> cversion = 0 
> dataVersion = 0 
> aclVersion = 0 
> ephemeralOwner = 0x1382791d4e50004 
> dataLength = 42 
> numChildren = 0 
> [zk: xx.xx.xx.55:2182(CONNECTED) 1] 
> Grepped logs at ZK side for session "0x1382791d4e50004" - close session and later create coming before closesession processed. 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000074 
> 2012-06-26 13:10:18,834 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,892 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:19,919 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> 2012-06-26 13:10:20,608 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 
> HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000075 
> 2012-06-26 13:10:19,893 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:19,920 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,278 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
> 2012-06-26 13:10:20,752 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 
>  Close session and create requests coming almost parallely. 
> Env:
> Hadoop setup.
> We were using Namenode HA with bookkeeper as shared storage and auto failover enabled.
> NN102 was active and NN55 was standby. 
> FailoverController at 102 got shut down due to ZK connection error. 
> The lock-ActiveStandbyElectorLock created (ephemeral node) by this failovercontroller is not cleared from ZK

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira