You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zookeeper.apache.org by "Andor Molnar (JIRA)" <ji...@apache.org> on 2018/04/05 06:15:00 UTC

[jira] [Commented] (ZOOKEEPER-3018) Ephemeral node not deleted after session is gone

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16426516#comment-16426516 ] 

Andor Molnar commented on ZOOKEEPER-3018:
-----------------------------------------

[~danielchan] Thanks for opening this jira. It's better to track the investigation here.

Are you able to verify the issue with the latest stable version (3.4.11 or 3.4.12) by any chance?

> Ephemeral node not deleted after session is gone
> ------------------------------------------------
>
>                 Key: ZOOKEEPER-3018
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3018
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.6
>         Environment: Linux 4.1.12-112.14.10.el6uek.x86_64 #2 SMP x86_64 GNU/Linux
>            Reporter: Daniel C
>            Priority: Major
>
> We have a live Zookeeper environment (quorum size is 2) and observed a strange behavior:
>  * Kafka created 2 ephemeral nodes /brokers/ids/822712429 and /brokers/ids/707577499 on 2018-03-12 03:30:36.933
>  * The Kafka clients were long gone but as of today (20+ days after), the two ephemeral nodes are still present
>  
> Troubleshooting:
> 1) Lists the outstanding sessions and ephemeral nodes
>  
> {noformat}
> $ echo dump | nc $SERVER1 2181
> SessionTracker dump:
> org.apache.zookeeper.server.quorum.LearnerSessionTracker@6d7fd863
> ephemeral nodes dump:
> Sessions with Ephemerals (2):
> 0x162183ea9f70003:
>                /brokers/ids/822712429
> 0x162183ea9f70002:
>                /brokers/ids/707577499
>                /controller
> {noformat}
>  
>  
> 2) stat on /brokers/ids/822712429
>  
> {noformat}
> zk> stat /brokers/ids/822712429
> czxid: 4294967344
> mzxid: 4294967344
> pzxid: 4294967344
> ctime: 1520825436933 (2018-03-11T20:30:36.933-0700)
> mtime: 1520825436933 (2018-03-11T20:30:36.933-0700)
> version: 0
> cversion: 0
> aversion: 0
> owner: 99668799174148099
> datalen: 102
> children: 0
> {noformat}
>  
>  
> 3) List full connection/session details for all clients connected
>  
> {noformat}
> $ echo cons | nc $SERVER1 2181
>  /10.247.114.70:30401[0](queued=0,recved=1,sent=0)
>  /10.248.88.235:40430[1](queued=0,recved=345,sent=345,sid=0x162183ea9f70c22,lop=PING,est=1522713395028,to=40000,lcxid=0x12,lzxid=0xffffffffffffffff,lresp=1522717802117,llat=0,minlat=0,avglat=0,maxlat=31)
> {noformat}
>  
>  
>  
> {noformat}
> $ echo cons | nc $SERVER2 2181
>  /10.196.18.61:28173[0](queued=0,recved=1,sent=0)
>  /10.247.114.69:42679[1](queued=0,recved=73800,sent=73800,sid=0x262183eaa21da96,lop=PING,est=1522651352906,to=9000,lcxid=0xe49f,lzxid=0x10004683d,lresp=1522717854847,llat=0,minlat=0,avglat=0,maxlat=1235)
> {noformat}
>  
>  
> 4) health
>  
> {noformat}
> $ echo mntr | nc $SERVER1 2181
> zk_version           3.4.6-1569965, built on 02/20/2014 09:09 GMT
> zk_avg_latency  0
> zk_max_latency 443
> zk_min_latency  0
> zk_packets_received       11158019
> zk_packets_sent               11158244
> zk_num_alive_connections           2
> zk_outstanding_requests              0
> zk_server_state follower
> zk_znode_count               344
> zk_watch_count               0
> zk_ephemerals_count     3
> zk_approximate_data_size          36654
> zk_open_file_descriptor_count   33
> zk_max_file_descriptor_count     65536
> {noformat}
>  
>  
> 5) Server logs with related sessions:
> {noformat}
> Only found these logs from Server1 related to the sessions (0x162183ea9f70002 and 0x162183ea9f70003):
> 2018-03-12 03:28:35,127 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.196.18.60:26775
> 2018-03-12 03:28:35,131 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection request from old client /10.196.18.60:26775; will be dropped if server is in r-o mode
> 2018-03-12 03:28:35,131 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /10.196.18.60:26775
> 2018-03-12 03:28:35,137 [myid:1] - INFO  [CommitProcessor:1:ZooKeeperServer@617] - Established session 0x162183ea9f70002 with negotiated timeout 9000 for client /10.196.18.60:26775
>  
> 2018-03-12 03:30:36,415 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.247.114.70:39260
> 2018-03-12 03:30:36,422 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection request from old client /10.247.114.70:39260; will be dropped if server is in r-o mode
> 2018-03-12 03:30:36,423 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /10.247.114.70:39260
> 2018-03-12 03:30:36,428 [myid:1] - INFO  [CommitProcessor:1:ZooKeeperServer@617] - Established session 0x162183ea9f70003 with negotiated timeout 9000 for client /10.247.114.70:39260
>  
> 2018-03-31 01:29:58,865 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /10.247.114.70:39260 which had sessionid 0x162183ea9f70003{noformat}
> 6) Txn logs on the two ephemeral nodes /brokers/ids/707577499 and /brokers/ids/822712429:
> {noformat}
> 3/11/18 8:28:35 PM PDT session 0x162183ea9f70002 cxid 0x6 zxid 0x10000001b create '/brokers/ids,,v{s{31,s{'world,'anyone}}},F,1
> 3/11/18 8:28:35 PM PDT session 0x162183ea9f70002 cxid 0x2c zxid 0x100000028 create '/brokers/ids/707577499,#7b226a6d785f706f7274223a31303130332c2274696d657374616d70223a2231353230383235333135363931222c22686f7374223a22736c6331336e79692e75732e6f7261636c652e636f6d222c2276657273696f6e223a312c22706f7274223a393039327d,v{s{31,s{'world,'anyone}}},T,1
> 3/11/18 8:30:36 PM PDT session 0x162183ea9f70003 cxid 0x14 zxid 0x100000030 create '/brokers/ids/822712429,#7b226a6d785f706f7274223a31303130332c2274696d657374616d70223a2231353230383235343336393139222c22686f7374223a22736c6331336e796a2e75732e6f7261636c652e636f6d222c2276657273696f6e223a312c22706f7274223a393039327d,v{s{31,s{'world,'anyone}}},T,2{noformat}
>  
> 7) Additional questions from [~andorm]
> {noformat}
> 1) Why is the session closed, the client closed it or the cluster expired it?
> [Daniel Chan] in this case, the client got killed and we expect the session would be expired by the cluster
>  
> 2) which server was the session attached to - the first (44sec max
> lat) or one of the others? Which server was the leader?
> [Daniel Chan] The sessions creating the ephemeral nodes were attached to Server1 (443 max latency) while Server2 is the leader
>  
> 3) the znode exists on all 4 servers, is that right?
> [Daniel Chan] The cluster has 2 members not 4, and the ephemeral nodes are present on both servers
>  {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)