You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zookeeper.apache.org by "Daniel C (JIRA)" <ji...@apache.org> on 2018/04/04 20:10:00 UTC

[jira] [Created] (ZOOKEEPER-3018) Ephemeral node not deleted after session is gone

Daniel C created ZOOKEEPER-3018:
-----------------------------------

             Summary: Ephemeral node not deleted after session is gone
                 Key: ZOOKEEPER-3018
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3018
             Project: ZooKeeper
          Issue Type: Bug
          Components: server
    Affects Versions: 3.4.6
         Environment: Linux 4.1.12-112.14.10.el6uek.x86_64 #2 SMP x86_64 GNU/Linux
            Reporter: Daniel C


We have a live Zookeeper environment (quorum size is 2) and observed a strange behavior:
 * Kafka created 2 ephemeral nodes /brokers/ids/822712429 and /brokers/ids/707577499 on 2018-03-12 03:30:36.933
 * The Kafka clients were long gone but as of today (20+ days after), the two ephemeral nodes are still present

 

Troubleshooting:

1) Lists the outstanding sessions and ephemeral nodes

 
{noformat}
$ echo dump | nc $SERVER1 2181
SessionTracker dump:
org.apache.zookeeper.server.quorum.LearnerSessionTracker@6d7fd863
ephemeral nodes dump:
Sessions with Ephemerals (2):
0x162183ea9f70003:
               /brokers/ids/822712429
0x162183ea9f70002:
               /brokers/ids/707577499
               /controller
{noformat}
 

 

2) stat on /brokers/ids/822712429

 
{noformat}
zk> stat /brokers/ids/822712429
czxid: 4294967344
mzxid: 4294967344
pzxid: 4294967344
ctime: 1520825436933 (2018-03-11T20:30:36.933-0700)
mtime: 1520825436933 (2018-03-11T20:30:36.933-0700)
version: 0
cversion: 0
aversion: 0
owner: 99668799174148099
datalen: 102
children: 0
{noformat}
 

 

3) List full connection/session details for all clients connected

 
{noformat}
$ echo cons | nc $SERVER1 2181
 /10.247.114.70:30401[0](queued=0,recved=1,sent=0)
 /10.248.88.235:40430[1](queued=0,recved=345,sent=345,sid=0x162183ea9f70c22,lop=PING,est=1522713395028,to=40000,lcxid=0x12,lzxid=0xffffffffffffffff,lresp=1522717802117,llat=0,minlat=0,avglat=0,maxlat=31)
{noformat}
 

 

 
{noformat}
$ echo cons | nc $SERVER2 2181
 /10.196.18.61:28173[0](queued=0,recved=1,sent=0)
 /10.247.114.69:42679[1](queued=0,recved=73800,sent=73800,sid=0x262183eaa21da96,lop=PING,est=1522651352906,to=9000,lcxid=0xe49f,lzxid=0x10004683d,lresp=1522717854847,llat=0,minlat=0,avglat=0,maxlat=1235)
{noformat}
 

 

4) health

 
{noformat}
$ echo mntr | nc $SERVER1 2181
zk_version           3.4.6-1569965, built on 02/20/2014 09:09 GMT
zk_avg_latency  0
zk_max_latency 443
zk_min_latency  0
zk_packets_received       11158019
zk_packets_sent               11158244
zk_num_alive_connections           2
zk_outstanding_requests              0
zk_server_state follower
zk_znode_count               344
zk_watch_count               0
zk_ephemerals_count     3
zk_approximate_data_size          36654
zk_open_file_descriptor_count   33
zk_max_file_descriptor_count     65536
{noformat}
 

 

5) Server logs with related sessions:
{noformat}
Only found these logs from Server1 related to the sessions (0x162183ea9f70002 and 0x162183ea9f70003):

2018-03-12 03:28:35,127 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.196.18.60:26775

2018-03-12 03:28:35,131 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection request from old client /10.196.18.60:26775; will be dropped if server is in r-o mode

2018-03-12 03:28:35,131 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /10.196.18.60:26775

2018-03-12 03:28:35,137 [myid:1] - INFO  [CommitProcessor:1:ZooKeeperServer@617] - Established session 0x162183ea9f70002 with negotiated timeout 9000 for client /10.196.18.60:26775

 
2018-03-12 03:30:36,415 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.247.114.70:39260

2018-03-12 03:30:36,422 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection request from old client /10.247.114.70:39260; will be dropped if server is in r-o mode

2018-03-12 03:30:36,423 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /10.247.114.70:39260

2018-03-12 03:30:36,428 [myid:1] - INFO  [CommitProcessor:1:ZooKeeperServer@617] - Established session 0x162183ea9f70003 with negotiated timeout 9000 for client /10.247.114.70:39260

 
2018-03-31 01:29:58,865 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /10.247.114.70:39260 which had sessionid 0x162183ea9f70003{noformat}
6) Txn logs on the two ephemeral nodes /brokers/ids/707577499 and /brokers/ids/822712429:
{noformat}
3/11/18 8:28:35 PM PDT session 0x162183ea9f70002 cxid 0x6 zxid 0x10000001b create '/brokers/ids,,v{s{31,s{'world,'anyone}}},F,1

3/11/18 8:28:35 PM PDT session 0x162183ea9f70002 cxid 0x2c zxid 0x100000028 create '/brokers/ids/707577499,#7b226a6d785f706f7274223a31303130332c2274696d657374616d70223a2231353230383235333135363931222c22686f7374223a22736c6331336e79692e75732e6f7261636c652e636f6d222c2276657273696f6e223a312c22706f7274223a393039327d,v{s{31,s{'world,'anyone}}},T,1

3/11/18 8:30:36 PM PDT session 0x162183ea9f70003 cxid 0x14 zxid 0x100000030 create '/brokers/ids/822712429,#7b226a6d785f706f7274223a31303130332c2274696d657374616d70223a2231353230383235343336393139222c22686f7374223a22736c6331336e796a2e75732e6f7261636c652e636f6d222c2276657273696f6e223a312c22706f7274223a393039327d,v{s{31,s{'world,'anyone}}},T,2{noformat}
 

7) Additional questions from [~andorm]
{noformat}
1) Why is the session closed, the client closed it or the cluster expired it?

[Daniel Chan] in this case, the client got killed and we expect the session would be expired by the cluster

 
2) which server was the session attached to - the first (44sec max

lat) or one of the others? Which server was the leader?

[Daniel Chan] The sessions creating the ephemeral nodes were attached to Server1 (443 max latency) while Server2 is the leader

 
3) the znode exists on all 4 servers, is that right?

[Daniel Chan] The cluster has 2 members not 4, and the ephemeral nodes are present on both servers

 {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)