You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zookeeper.apache.org by "Andor Molnar (JIRA)" <ji...@apache.org> on 2018/04/18 19:28:00 UTC

[jira] [Comment Edited] (ZOOKEEPER-3018) Ephemeral node not deleted after session is gone

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443067#comment-16443067 ] 

Andor Molnar edited comment on ZOOKEEPER-3018 at 4/18/18 7:27 PM:
------------------------------------------------------------------

[~danielchan]

Do you see the same behaviour on your servers? e.g. new leader election around session expiration

...or there's no sign of session expiration in the logs at all?

If possible, would you please attach the log files of all your Zk servers?


was (Author: andorm):
[~danielchan]

Do you see the same behaviour on your servers? e.g. new leader election around session expiration

> Ephemeral node not deleted after session is gone
> ------------------------------------------------
>
>                 Key: ZOOKEEPER-3018
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3018
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.6
>         Environment: Linux 4.1.12-112.14.10.el6uek.x86_64 #2 SMP x86_64 GNU/Linux
>            Reporter: Daniel C
>            Priority: Major
>
> We have a live Zookeeper environment (quorum size is 2) and observed a strange behavior:
>  * Kafka created 2 ephemeral nodes /brokers/ids/822712429 and /brokers/ids/707577499 on 2018-03-12 03:30:36.933
>  * The Kafka clients were long gone but as of today (20+ days after), the two ephemeral nodes are still present
>  
> Troubleshooting:
> 1) Lists the outstanding sessions and ephemeral nodes
>  
> {noformat}
> $ echo dump | nc $SERVER1 2181
> SessionTracker dump:
> org.apache.zookeeper.server.quorum.LearnerSessionTracker@6d7fd863
> ephemeral nodes dump:
> Sessions with Ephemerals (2):
> 0x162183ea9f70003:
>                /brokers/ids/822712429
> 0x162183ea9f70002:
>                /brokers/ids/707577499
>                /controller
> {noformat}
>  
>  
> 2) stat on /brokers/ids/822712429
>  
> {noformat}
> zk> stat /brokers/ids/822712429
> czxid: 4294967344
> mzxid: 4294967344
> pzxid: 4294967344
> ctime: 1520825436933 (2018-03-11T20:30:36.933-0700)
> mtime: 1520825436933 (2018-03-11T20:30:36.933-0700)
> version: 0
> cversion: 0
> aversion: 0
> owner: 99668799174148099
> datalen: 102
> children: 0
> {noformat}
>  
>  
> 3) List full connection/session details for all clients connected
>  
> {noformat}
> $ echo cons | nc $SERVER1 2181
>  /10.247.114.70:30401[0](queued=0,recved=1,sent=0)
>  /10.248.88.235:40430[1](queued=0,recved=345,sent=345,sid=0x162183ea9f70c22,lop=PING,est=1522713395028,to=40000,lcxid=0x12,lzxid=0xffffffffffffffff,lresp=1522717802117,llat=0,minlat=0,avglat=0,maxlat=31)
> {noformat}
>  
>  
>  
> {noformat}
> $ echo cons | nc $SERVER2 2181
>  /10.196.18.61:28173[0](queued=0,recved=1,sent=0)
>  /10.247.114.69:42679[1](queued=0,recved=73800,sent=73800,sid=0x262183eaa21da96,lop=PING,est=1522651352906,to=9000,lcxid=0xe49f,lzxid=0x10004683d,lresp=1522717854847,llat=0,minlat=0,avglat=0,maxlat=1235)
> {noformat}
>  
>  
> 4) health
>  
> {noformat}
> $ echo mntr | nc $SERVER1 2181
> zk_version           3.4.6-1569965, built on 02/20/2014 09:09 GMT
> zk_avg_latency  0
> zk_max_latency 443
> zk_min_latency  0
> zk_packets_received       11158019
> zk_packets_sent               11158244
> zk_num_alive_connections           2
> zk_outstanding_requests              0
> zk_server_state follower
> zk_znode_count               344
> zk_watch_count               0
> zk_ephemerals_count     3
> zk_approximate_data_size          36654
> zk_open_file_descriptor_count   33
> zk_max_file_descriptor_count     65536
> {noformat}
>  
>  
> 5) Server logs with related sessions:
> {noformat}
> Only found these logs from Server1 related to the sessions (0x162183ea9f70002 and 0x162183ea9f70003):
> 2018-03-12 03:28:35,127 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.196.18.60:26775
> 2018-03-12 03:28:35,131 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection request from old client /10.196.18.60:26775; will be dropped if server is in r-o mode
> 2018-03-12 03:28:35,131 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /10.196.18.60:26775
> 2018-03-12 03:28:35,137 [myid:1] - INFO  [CommitProcessor:1:ZooKeeperServer@617] - Established session 0x162183ea9f70002 with negotiated timeout 9000 for client /10.196.18.60:26775
>  
> 2018-03-12 03:30:36,415 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.247.114.70:39260
> 2018-03-12 03:30:36,422 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection request from old client /10.247.114.70:39260; will be dropped if server is in r-o mode
> 2018-03-12 03:30:36,423 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /10.247.114.70:39260
> 2018-03-12 03:30:36,428 [myid:1] - INFO  [CommitProcessor:1:ZooKeeperServer@617] - Established session 0x162183ea9f70003 with negotiated timeout 9000 for client /10.247.114.70:39260
>  
> 2018-03-31 01:29:58,865 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /10.247.114.70:39260 which had sessionid 0x162183ea9f70003{noformat}
> 6) Txn logs on the two ephemeral nodes /brokers/ids/707577499 and /brokers/ids/822712429:
> {noformat}
> 3/11/18 8:28:35 PM PDT session 0x162183ea9f70002 cxid 0x6 zxid 0x10000001b create '/brokers/ids,,v{s{31,s{'world,'anyone}}},F,1
> 3/11/18 8:28:35 PM PDT session 0x162183ea9f70002 cxid 0x2c zxid 0x100000028 create '/brokers/ids/707577499,#7b226a6d785f706f7274223a31303130332c2274696d657374616d70223a2231353230383235333135363931222c22686f7374223a22736c6331336e79692e75732e6f7261636c652e636f6d222c2276657273696f6e223a312c22706f7274223a393039327d,v{s{31,s{'world,'anyone}}},T,1
> 3/11/18 8:30:36 PM PDT session 0x162183ea9f70003 cxid 0x14 zxid 0x100000030 create '/brokers/ids/822712429,#7b226a6d785f706f7274223a31303130332c2274696d657374616d70223a2231353230383235343336393139222c22686f7374223a22736c6331336e796a2e75732e6f7261636c652e636f6d222c2276657273696f6e223a312c22706f7274223a393039327d,v{s{31,s{'world,'anyone}}},T,2{noformat}
>  
> 7) Additional questions from [~andorm]
> {noformat}
> 1) Why is the session closed, the client closed it or the cluster expired it?
> [Daniel Chan] in this case, the client got killed and we expect the session would be expired by the cluster
>  
> 2) which server was the session attached to - the first (44sec max
> lat) or one of the others? Which server was the leader?
> [Daniel Chan] The sessions creating the ephemeral nodes were attached to Server1 (443 max latency) while Server2 is the leader
>  
> 3) the znode exists on all 4 servers, is that right?
> [Daniel Chan] The cluster has 2 members not 4, and the ephemeral nodes are present on both servers
>  {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)