You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@zookeeper.apache.org by Daniel Chan <da...@oracle.com> on 2018/04/03 01:22:56 UTC

[3.4.6] Ephemeral node not deleted after session is gone

We have a live Zookeeper environment (quorum size is 2) and observed a strange behavior:
Kafka created 2 ephemeral nodes /brokers/ids/822712429 and /brokers/ids/707577499 on 2018-03-12 03:30:36.933
The Kafka clients were long gone but as of today, the two ephemeral nodes are still present

Troubleshooting:
1) Lists the outstanding sessions and ephemeral nodes
$ echo dump | nc $SERVER1 2181
SessionTracker dump:
org.apache.zookeeper.server.quorum.LearnerSessionTracker@6d7fd863
ephemeral nodes dump:
Sessions with Ephemerals (2):
0x162183ea9f70003:
	/brokers/ids/822712429
0x162183ea9f70002:
	/brokers/ids/707577499
	/controller

2) stat on /brokers/ids/822712429
zk> stat /brokers/ids/822712429
czxid: 4294967344
mzxid: 4294967344
pzxid: 4294967344
ctime: 1520825436933 (2018-03-11T20:30:36.933-0700)
mtime: 1520825436933 (2018-03-11T20:30:36.933-0700)
version: 0
cversion: 0
aversion: 0
owner: 99668799174148099
datalen: 102
children: 0

3) List full connection/session details for all clients connected
$ echo cons | nc $SERVER1 2181
 /10.247.114.70:30401[0](queued=0,recved=1,sent=0)
 /10.248.88.235:40430[1](queued=0,recved=345,sent=345,sid=0x162183ea9f70c22,lop=PING,est=1522713395028,to=40000,lcxid=0x12,lzxid=0xffffffffffffffff,lresp=1522717802117,llat=0,minlat=0,avglat=0,maxlat=31)

$ echo cons | nc $SERVER2 2181
 /10.196.18.61:28173[0](queued=0,recved=1,sent=0)
 /10.247.114.69:42679[1](queued=0,recved=73800,sent=73800,sid=0x262183eaa21da96,lop=PING,est=1522651352906,to=9000,lcxid=0xe49f,lzxid=0x10004683d,lresp=1522717854847,llat=0,minlat=0,avglat=0,maxlat=1235)

4) health
$ echo mntr | nc $SERVER1 2181
zk_version	3.4.6-1569965, built on 02/20/2014 09:09 GMT
zk_avg_latency	0
zk_max_latency	443
zk_min_latency	0
zk_packets_received	11158019
zk_packets_sent	11158244
zk_num_alive_connections	2
zk_outstanding_requests	0
zk_server_state	follower
zk_znode_count	344
zk_watch_count	0
zk_ephemerals_count	3
zk_approximate_data_size	36654
zk_open_file_descriptor_count	33
zk_max_file_descriptor_count	65536

5) Could not find any special exception from zookeeper logs about the two sessions

Is this a known bug in version 3.4.6? what could be the potential cause of the issue?

Thanks,
Daniel

RE: [3.4.6] Ephemeral node not deleted after session is gone

Posted by Daniel Chan <da...@oracle.com>.
Filed a bug at https://issues.apache.org/jira/browse/ZOOKEEPER-3018

Thanks,
Daniel

-----Original Message-----
From: Daniel Chan 
Sent: Tuesday, April 3, 2018 11:49 AM
To: user@zookeeper.apache.org
Subject: RE: [3.4.6] Ephemeral node not deleted after session is gone

Hi Andor,

Please see my replies and requested information inline.

Thanks,
Daniel

-----Original Message-----
From: Andor Molnar [mailto:andor@cloudera.com]
Sent: Tuesday, April 3, 2018 2:26 AM
To: user@zookeeper.apache.org
Subject: Re: [3.4.6] Ephemeral node not deleted after session is gone

There're a few questions on the original thread which might be useful to answer here as well:

1) Why is the session closed, the client closed it or the cluster expired it?
[Daniel Chan] in this case, the client got killed and we expect the session would be expired by the cluster

2) which server was the session attached to - the first (44sec max
lat) or one of the others? Which server was the leader?
[Daniel Chan] The sessions creating the ephemeral nodes were attached to Server1 (443 max latency) while Server2 is the leader

3) the znode exists on all 4 servers, is that right?
[Daniel Chan] The cluster has 2 members not 4, and the ephemeral nodes are present on both servers

Would also be useful to attach server logs related to the session expiration as well as LogFormatter output of txn log files about the nodes.
[Daniel Chan] Only found these logs from Server1 related to the sessions (0x162183ea9f70002 and 0x162183ea9f70003):
2018-03-12 03:28:35,127 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.196.18.60:26775
2018-03-12 03:28:35,131 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection request from old client /10.196.18.60:26775; will be dropped if server is in r-o mode
2018-03-12 03:28:35,131 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /10.196.18.60:26775
2018-03-12 03:28:35,137 [myid:1] - INFO  [CommitProcessor:1:ZooKeeperServer@617] - Established session 0x162183ea9f70002 with negotiated timeout 9000 for client /10.196.18.60:26775

2018-03-12 03:30:36,415 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.247.114.70:39260
2018-03-12 03:30:36,422 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection request from old client /10.247.114.70:39260; will be dropped if server is in r-o mode
2018-03-12 03:30:36,423 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /10.247.114.70:39260
2018-03-12 03:30:36,428 [myid:1] - INFO  [CommitProcessor:1:ZooKeeperServer@617] - Established session 0x162183ea9f70003 with negotiated timeout 9000 for client /10.247.114.70:39260

2018-03-31 01:29:58,865 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /10.247.114.70:39260 which had sessionid 0x162183ea9f70003

Txn logs on the two ephemeral nodes /brokers/ids/707577499 and /brokers/ids/822712429:
3/11/18 8:28:35 PM PDT session 0x162183ea9f70002 cxid 0x6 zxid 0x10000001b create '/brokers/ids,,v{s{31,s{'world,'anyone}}},F,1
3/11/18 8:28:35 PM PDT session 0x162183ea9f70002 cxid 0x2c zxid 0x100000028 create '/brokers/ids/707577499,#7b226a6d785f706f7274223a31303130332c2274696d657374616d70223a2231353230383235333135363931222c22686f7374223a22736c6331336e79692e75732e6f7261636c652e636f6d222c2276657273696f6e223a312c22706f7274223a393039327d,v{s{31,s{'world,'anyone}}},T,1
3/11/18 8:30:36 PM PDT session 0x162183ea9f70003 cxid 0x14 zxid 0x100000030 create '/brokers/ids/822712429,#7b226a6d785f706f7274223a31303130332c2274696d657374616d70223a2231353230383235343336393139222c22686f7374223a22736c6331336e796a2e75732e6f7261636c652e636f6d222c2276657273696f6e223a312c22706f7274223a393039327d,v{s{31,s{'world,'anyone}}},T,2

Regards,
Andor


On Tue, Apr 3, 2018 at 10:34 AM, Andor Molnar <an...@cloudera.com> wrote:

> Hi Daniel,
>
> Thanks for the bugreport.
> Interesting that this issue should have been fixed already by ages:
> https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org
> _jira_browse_ZOOKEEPER-2D1208&d=DwIBaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qI
> rMUB65eapI_JnE&r=JE3yjNS4hXa8nS9n2uFCwEqMvv18hzzEnqunUhCoEns&m=eycsMys
> gttgbjNW3XhfWJ8TgkcWxEFjljV-TpzD5EFU&s=ryVABxZ1RLdjrc3D4I6M5ZpST_jU6GV
> QDWoE4AH83L0&e=
>
> Regards,
> Andor
>
>
> On Tue, Apr 3, 2018 at 3:22 AM, Daniel Chan 
> <da...@oracle.com>
> wrote:
>
>> We have a live Zookeeper environment (quorum size is 2) and observed 
>> a strange behavior:
>> Kafka created 2 ephemeral nodes /brokers/ids/822712429 and
>> /brokers/ids/707577499 on 2018-03-12 03:30:36.933 The Kafka clients 
>> were long gone but as of today, the two ephemeral nodes are still 
>> present
>>
>> Troubleshooting:
>> 1) Lists the outstanding sessions and ephemeral nodes $ echo dump | 
>> nc $SERVER1 2181 SessionTracker dump:
>> org.apache.zookeeper.server.quorum.LearnerSessionTracker@6d7fd863
>> ephemeral nodes dump:
>> Sessions with Ephemerals (2):
>> 0x162183ea9f70003:
>>         /brokers/ids/822712429
>> 0x162183ea9f70002:
>>         /brokers/ids/707577499
>>         /controller
>>
>> 2) stat on /brokers/ids/822712429
>> zk> stat /brokers/ids/822712429
>> czxid: 4294967344
>> mzxid: 4294967344
>> pzxid: 4294967344
>> ctime: 1520825436933 (2018-03-11T20:30:36.933-0700)
>> mtime: 1520825436933 (2018-03-11T20:30:36.933-0700)
>> version: 0
>> cversion: 0
>> aversion: 0
>> owner: 99668799174148099
>> datalen: 102
>> children: 0
>>
>> 3) List full connection/session details for all clients connected $ 
>> echo cons | nc $SERVER1 2181
>>  /10.247.114.70:30401[0](queued=0,recved=1,sent=0)
>>  /10.248.88.235:40430[1](queued=0,recved=345,sent=345,sid=
>> 0x162183ea9f70c22,lop=PING,est=1522713395028,to=40000,
>> lcxid=0x12,lzxid=0xffffffffffffffff,lresp=1522717802117,
>> llat=0,minlat=0,avglat=0,maxlat=31)
>>
>> $ echo cons | nc $SERVER2 2181
>>  /10.196.18.61:28173[0](queued=0,recved=1,sent=0)
>>  /10.247.114.69:42679[1](queued=0,recved=73800,sent=73800,
>> sid=0x262183eaa21da96,lop=PING,est=1522651352906,to=9000
>> ,lcxid=0xe49f,lzxid=0x10004683d,lresp=1522717854847,llat=0,
>> minlat=0,avglat=0,maxlat=1235)
>>
>> 4) health
>> $ echo mntr | nc $SERVER1 2181
>> zk_version      3.4.6-1569965, built on 02/20/2014 09:09 GMT
>> zk_avg_latency  0
>> zk_max_latency  443
>> zk_min_latency  0
>> zk_packets_received     11158019
>> zk_packets_sent 11158244
>> zk_num_alive_connections        2
>> zk_outstanding_requests 0
>> zk_server_state follower
>> zk_znode_count  344
>> zk_watch_count  0
>> zk_ephemerals_count     3
>> zk_approximate_data_size        36654
>> zk_open_file_descriptor_count   33
>> zk_max_file_descriptor_count    65536
>>
>> 5) Could not find any special exception from zookeeper logs about the 
>> two sessions
>>
>> Is this a known bug in version 3.4.6? what could be the potential 
>> cause of the issue?
>>
>> Thanks,
>> Daniel
>>
>
>

RE: [3.4.6] Ephemeral node not deleted after session is gone

Posted by Daniel Chan <da...@oracle.com>.
Hi Andor,

Please see my replies and requested information inline.

Thanks,
Daniel

-----Original Message-----
From: Andor Molnar [mailto:andor@cloudera.com] 
Sent: Tuesday, April 3, 2018 2:26 AM
To: user@zookeeper.apache.org
Subject: Re: [3.4.6] Ephemeral node not deleted after session is gone

There're a few questions on the original thread which might be useful to answer here as well:

1) Why is the session closed, the client closed it or the cluster expired it?
[Daniel Chan] in this case, the client got killed and we expect the session would be expired by the cluster

2) which server was the session attached to - the first (44sec max
lat) or one of the others? Which server was the leader?
[Daniel Chan] The sessions creating the ephemeral nodes were attached to Server1 (443 max latency) while Server2 is the leader

3) the znode exists on all 4 servers, is that right?
[Daniel Chan] The cluster has 2 members not 4, and the ephemeral nodes are present on both servers

Would also be useful to attach server logs related to the session expiration as well as LogFormatter output of txn log files about the nodes.
[Daniel Chan] Only found these logs from Server1 related to the sessions (0x162183ea9f70002 and 0x162183ea9f70003):
2018-03-12 03:28:35,127 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.196.18.60:26775
2018-03-12 03:28:35,131 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection request from old client /10.196.18.60:26775; will be dropped if server is in r-o mode
2018-03-12 03:28:35,131 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /10.196.18.60:26775
2018-03-12 03:28:35,137 [myid:1] - INFO  [CommitProcessor:1:ZooKeeperServer@617] - Established session 0x162183ea9f70002 with negotiated timeout 9000 for client /10.196.18.60:26775

2018-03-12 03:30:36,415 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.247.114.70:39260
2018-03-12 03:30:36,422 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection request from old client /10.247.114.70:39260; will be dropped if server is in r-o mode
2018-03-12 03:30:36,423 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /10.247.114.70:39260
2018-03-12 03:30:36,428 [myid:1] - INFO  [CommitProcessor:1:ZooKeeperServer@617] - Established session 0x162183ea9f70003 with negotiated timeout 9000 for client /10.247.114.70:39260

2018-03-31 01:29:58,865 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /10.247.114.70:39260 which had sessionid 0x162183ea9f70003

Txn logs on the two ephemeral nodes /brokers/ids/707577499 and /brokers/ids/822712429:
3/11/18 8:28:35 PM PDT session 0x162183ea9f70002 cxid 0x6 zxid 0x10000001b create '/brokers/ids,,v{s{31,s{'world,'anyone}}},F,1
3/11/18 8:28:35 PM PDT session 0x162183ea9f70002 cxid 0x2c zxid 0x100000028 create '/brokers/ids/707577499,#7b226a6d785f706f7274223a31303130332c2274696d657374616d70223a2231353230383235333135363931222c22686f7374223a22736c6331336e79692e75732e6f7261636c652e636f6d222c2276657273696f6e223a312c22706f7274223a393039327d,v{s{31,s{'world,'anyone}}},T,1
3/11/18 8:30:36 PM PDT session 0x162183ea9f70003 cxid 0x14 zxid 0x100000030 create '/brokers/ids/822712429,#7b226a6d785f706f7274223a31303130332c2274696d657374616d70223a2231353230383235343336393139222c22686f7374223a22736c6331336e796a2e75732e6f7261636c652e636f6d222c2276657273696f6e223a312c22706f7274223a393039327d,v{s{31,s{'world,'anyone}}},T,2

Regards,
Andor


On Tue, Apr 3, 2018 at 10:34 AM, Andor Molnar <an...@cloudera.com> wrote:

> Hi Daniel,
>
> Thanks for the bugreport.
> Interesting that this issue should have been fixed already by ages:
> https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org
> _jira_browse_ZOOKEEPER-2D1208&d=DwIBaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qI
> rMUB65eapI_JnE&r=JE3yjNS4hXa8nS9n2uFCwEqMvv18hzzEnqunUhCoEns&m=eycsMys
> gttgbjNW3XhfWJ8TgkcWxEFjljV-TpzD5EFU&s=ryVABxZ1RLdjrc3D4I6M5ZpST_jU6GV
> QDWoE4AH83L0&e=
>
> Regards,
> Andor
>
>
> On Tue, Apr 3, 2018 at 3:22 AM, Daniel Chan 
> <da...@oracle.com>
> wrote:
>
>> We have a live Zookeeper environment (quorum size is 2) and observed 
>> a strange behavior:
>> Kafka created 2 ephemeral nodes /brokers/ids/822712429 and
>> /brokers/ids/707577499 on 2018-03-12 03:30:36.933 The Kafka clients 
>> were long gone but as of today, the two ephemeral nodes are still 
>> present
>>
>> Troubleshooting:
>> 1) Lists the outstanding sessions and ephemeral nodes $ echo dump | 
>> nc $SERVER1 2181 SessionTracker dump:
>> org.apache.zookeeper.server.quorum.LearnerSessionTracker@6d7fd863
>> ephemeral nodes dump:
>> Sessions with Ephemerals (2):
>> 0x162183ea9f70003:
>>         /brokers/ids/822712429
>> 0x162183ea9f70002:
>>         /brokers/ids/707577499
>>         /controller
>>
>> 2) stat on /brokers/ids/822712429
>> zk> stat /brokers/ids/822712429
>> czxid: 4294967344
>> mzxid: 4294967344
>> pzxid: 4294967344
>> ctime: 1520825436933 (2018-03-11T20:30:36.933-0700)
>> mtime: 1520825436933 (2018-03-11T20:30:36.933-0700)
>> version: 0
>> cversion: 0
>> aversion: 0
>> owner: 99668799174148099
>> datalen: 102
>> children: 0
>>
>> 3) List full connection/session details for all clients connected $ 
>> echo cons | nc $SERVER1 2181
>>  /10.247.114.70:30401[0](queued=0,recved=1,sent=0)
>>  /10.248.88.235:40430[1](queued=0,recved=345,sent=345,sid=
>> 0x162183ea9f70c22,lop=PING,est=1522713395028,to=40000,
>> lcxid=0x12,lzxid=0xffffffffffffffff,lresp=1522717802117,
>> llat=0,minlat=0,avglat=0,maxlat=31)
>>
>> $ echo cons | nc $SERVER2 2181
>>  /10.196.18.61:28173[0](queued=0,recved=1,sent=0)
>>  /10.247.114.69:42679[1](queued=0,recved=73800,sent=73800,
>> sid=0x262183eaa21da96,lop=PING,est=1522651352906,to=9000
>> ,lcxid=0xe49f,lzxid=0x10004683d,lresp=1522717854847,llat=0,
>> minlat=0,avglat=0,maxlat=1235)
>>
>> 4) health
>> $ echo mntr | nc $SERVER1 2181
>> zk_version      3.4.6-1569965, built on 02/20/2014 09:09 GMT
>> zk_avg_latency  0
>> zk_max_latency  443
>> zk_min_latency  0
>> zk_packets_received     11158019
>> zk_packets_sent 11158244
>> zk_num_alive_connections        2
>> zk_outstanding_requests 0
>> zk_server_state follower
>> zk_znode_count  344
>> zk_watch_count  0
>> zk_ephemerals_count     3
>> zk_approximate_data_size        36654
>> zk_open_file_descriptor_count   33
>> zk_max_file_descriptor_count    65536
>>
>> 5) Could not find any special exception from zookeeper logs about the 
>> two sessions
>>
>> Is this a known bug in version 3.4.6? what could be the potential 
>> cause of the issue?
>>
>> Thanks,
>> Daniel
>>
>
>

Re: [3.4.6] Ephemeral node not deleted after session is gone

Posted by Andor Molnar <an...@cloudera.com>.
There're a few questions on the original thread which might be useful to
answer here as well:

1) Why is the session closed, the client closed it or the cluster expired
it?

2) which server was the session attached to - the first (44sec max
lat) or one of the others? Which server was the leader?

3) the znode exists on all 4 servers, is that right?

Would also be useful to attach server logs related to the session
expiration as well as LogFormatter output of txn log files about the nodes.

Regards,
Andor


On Tue, Apr 3, 2018 at 10:34 AM, Andor Molnar <an...@cloudera.com> wrote:

> Hi Daniel,
>
> Thanks for the bugreport.
> Interesting that this issue should have been fixed already by ages:
> https://issues.apache.org/jira/browse/ZOOKEEPER-1208
>
> Regards,
> Andor
>
>
> On Tue, Apr 3, 2018 at 3:22 AM, Daniel Chan <da...@oracle.com>
> wrote:
>
>> We have a live Zookeeper environment (quorum size is 2) and observed a
>> strange behavior:
>> Kafka created 2 ephemeral nodes /brokers/ids/822712429 and
>> /brokers/ids/707577499 on 2018-03-12 03:30:36.933
>> The Kafka clients were long gone but as of today, the two ephemeral nodes
>> are still present
>>
>> Troubleshooting:
>> 1) Lists the outstanding sessions and ephemeral nodes
>> $ echo dump | nc $SERVER1 2181
>> SessionTracker dump:
>> org.apache.zookeeper.server.quorum.LearnerSessionTracker@6d7fd863
>> ephemeral nodes dump:
>> Sessions with Ephemerals (2):
>> 0x162183ea9f70003:
>>         /brokers/ids/822712429
>> 0x162183ea9f70002:
>>         /brokers/ids/707577499
>>         /controller
>>
>> 2) stat on /brokers/ids/822712429
>> zk> stat /brokers/ids/822712429
>> czxid: 4294967344
>> mzxid: 4294967344
>> pzxid: 4294967344
>> ctime: 1520825436933 (2018-03-11T20:30:36.933-0700)
>> mtime: 1520825436933 (2018-03-11T20:30:36.933-0700)
>> version: 0
>> cversion: 0
>> aversion: 0
>> owner: 99668799174148099
>> datalen: 102
>> children: 0
>>
>> 3) List full connection/session details for all clients connected
>> $ echo cons | nc $SERVER1 2181
>>  /10.247.114.70:30401[0](queued=0,recved=1,sent=0)
>>  /10.248.88.235:40430[1](queued=0,recved=345,sent=345,sid=
>> 0x162183ea9f70c22,lop=PING,est=1522713395028,to=40000,
>> lcxid=0x12,lzxid=0xffffffffffffffff,lresp=1522717802117,
>> llat=0,minlat=0,avglat=0,maxlat=31)
>>
>> $ echo cons | nc $SERVER2 2181
>>  /10.196.18.61:28173[0](queued=0,recved=1,sent=0)
>>  /10.247.114.69:42679[1](queued=0,recved=73800,sent=73800,
>> sid=0x262183eaa21da96,lop=PING,est=1522651352906,to=9000
>> ,lcxid=0xe49f,lzxid=0x10004683d,lresp=1522717854847,llat=0,
>> minlat=0,avglat=0,maxlat=1235)
>>
>> 4) health
>> $ echo mntr | nc $SERVER1 2181
>> zk_version      3.4.6-1569965, built on 02/20/2014 09:09 GMT
>> zk_avg_latency  0
>> zk_max_latency  443
>> zk_min_latency  0
>> zk_packets_received     11158019
>> zk_packets_sent 11158244
>> zk_num_alive_connections        2
>> zk_outstanding_requests 0
>> zk_server_state follower
>> zk_znode_count  344
>> zk_watch_count  0
>> zk_ephemerals_count     3
>> zk_approximate_data_size        36654
>> zk_open_file_descriptor_count   33
>> zk_max_file_descriptor_count    65536
>>
>> 5) Could not find any special exception from zookeeper logs about the two
>> sessions
>>
>> Is this a known bug in version 3.4.6? what could be the potential cause
>> of the issue?
>>
>> Thanks,
>> Daniel
>>
>
>

Re: [3.4.6] Ephemeral node not deleted after session is gone

Posted by Andor Molnar <an...@cloudera.com>.
Hi Daniel,

Thanks for the bugreport.
Interesting that this issue should have been fixed already by ages:
https://issues.apache.org/jira/browse/ZOOKEEPER-1208

Regards,
Andor


On Tue, Apr 3, 2018 at 3:22 AM, Daniel Chan <da...@oracle.com>
wrote:

> We have a live Zookeeper environment (quorum size is 2) and observed a
> strange behavior:
> Kafka created 2 ephemeral nodes /brokers/ids/822712429 and
> /brokers/ids/707577499 on 2018-03-12 03:30:36.933
> The Kafka clients were long gone but as of today, the two ephemeral nodes
> are still present
>
> Troubleshooting:
> 1) Lists the outstanding sessions and ephemeral nodes
> $ echo dump | nc $SERVER1 2181
> SessionTracker dump:
> org.apache.zookeeper.server.quorum.LearnerSessionTracker@6d7fd863
> ephemeral nodes dump:
> Sessions with Ephemerals (2):
> 0x162183ea9f70003:
>         /brokers/ids/822712429
> 0x162183ea9f70002:
>         /brokers/ids/707577499
>         /controller
>
> 2) stat on /brokers/ids/822712429
> zk> stat /brokers/ids/822712429
> czxid: 4294967344
> mzxid: 4294967344
> pzxid: 4294967344
> ctime: 1520825436933 (2018-03-11T20:30:36.933-0700)
> mtime: 1520825436933 (2018-03-11T20:30:36.933-0700)
> version: 0
> cversion: 0
> aversion: 0
> owner: 99668799174148099
> datalen: 102
> children: 0
>
> 3) List full connection/session details for all clients connected
> $ echo cons | nc $SERVER1 2181
>  /10.247.114.70:30401[0](queued=0,recved=1,sent=0)
>  /10.248.88.235:40430[1](queued=0,recved=345,sent=345,
> sid=0x162183ea9f70c22,lop=PING,est=1522713395028,to=
> 40000,lcxid=0x12,lzxid=0xffffffffffffffff,lresp=
> 1522717802117,llat=0,minlat=0,avglat=0,maxlat=31)
>
> $ echo cons | nc $SERVER2 2181
>  /10.196.18.61:28173[0](queued=0,recved=1,sent=0)
>  /10.247.114.69:42679[1](queued=0,recved=73800,sent=
> 73800,sid=0x262183eaa21da96,lop=PING,est=1522651352906,to=
> 9000,lcxid=0xe49f,lzxid=0x10004683d,lresp=1522717854847,llat=0,minlat=0,
> avglat=0,maxlat=1235)
>
> 4) health
> $ echo mntr | nc $SERVER1 2181
> zk_version      3.4.6-1569965, built on 02/20/2014 09:09 GMT
> zk_avg_latency  0
> zk_max_latency  443
> zk_min_latency  0
> zk_packets_received     11158019
> zk_packets_sent 11158244
> zk_num_alive_connections        2
> zk_outstanding_requests 0
> zk_server_state follower
> zk_znode_count  344
> zk_watch_count  0
> zk_ephemerals_count     3
> zk_approximate_data_size        36654
> zk_open_file_descriptor_count   33
> zk_max_file_descriptor_count    65536
>
> 5) Could not find any special exception from zookeeper logs about the two
> sessions
>
> Is this a known bug in version 3.4.6? what could be the potential cause of
> the issue?
>
> Thanks,
> Daniel
>