You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by "Maharajan Shunmuga Sundaram (JIRA)" <ji...@apache.org> on 2017/01/15 16:48:26 UTC

[jira] [Commented] (KAFKA-4634) Issue of one kafka brokers not listed in zookeeper

    [ https://issues.apache.org/jira/browse/KAFKA-4634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15823181#comment-15823181 ] 

Maharajan Shunmuga Sundaram commented on KAFKA-4634:
----------------------------------------------------

Here is the history of node 211 (ez6) connected to zookeeper

zk.log.1:[2017-01-13 00:23:03,076] INFO Session establishment complete on server cz1/10.254.2.19:2181, sessionid = 0x159963f352b0004, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)
zk.log.1:[2017-01-13 00:28:32,492] INFO Session establishment complete on server cz1/10.254.2.19:2181, sessionid = 0x159963f352b0004, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)
zk.log.1:[2017-01-13 00:29:15,234] INFO Session establishment complete on server cz2/10.254.2.29:2181, sessionid = 0x159963f352b0004, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)
zk.log.1:[2017-01-13 00:29:54,491] INFO Session establishment complete on server cz/10.254.2.1:2182, sessionid = 0x159963f352b0004, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)
zk.log.1:[2017-01-13 00:52:24,093] INFO Session establishment complete on server cz1/10.254.2.19:2181, sessionid = 0x159964e1e060013, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)
zk.log.1:[2017-01-13 00:57:31,991] INFO Session establishment complete on server cz1/10.254.2.19:2181, sessionid = 0x159966708470000, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)
zk.log.1:[2017-01-13 00:57:43,908] INFO Session establishment complete on server cz/10.254.2.1:2182, sessionid = 0x159966708470000, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)
zk.log.1:[2017-01-13 01:05:21,130] INFO Session establishment complete on server cz2/10.254.2.29:2181, sessionid = 0x159966708470000, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)

zk.log.1:[2017-01-13 01:05:28,513] INFO Session establishment complete on server cz1/10.254.2.19:2181, sessionid = 0x159966e47760000, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)
zk.log.1:[2017-01-13 01:35:52,068] INFO Session establishment complete on server cz/10.254.2.1:2182, sessionid = 0x159966e47760000, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)
zk.log.1:[2017-01-13 01:40:22,562] INFO Session establishment complete on server cz2/10.254.2.29:2181, sessionid = 0x159966e47760000, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)

For the last session 0x159966e47760000, ez6 (Node: 211) has connected three times with different zookeepers.

If I see stats on zookeeper, I do see stats for connected 211.
 /10.254.2.211:45260[1](queued=0,recved=186033,sent=186033)

In CZ2,
>>> netstat -tulpan | grep 45260
tcp6       0      0 10.254.2.29:2181       10.254.2.211:45260     ESTABLISHED 3585/java

In EZ6,
>>>  netstat -tulpan | grep 45260
tcp6       0      0 10.254.2.211:45260     10.254.2.29:2181       ESTABLISHED 3684/java

-------

If I see connection
 /10.254.2.211:45260[1](queued=0,recved=186457,sent=186457,sid=0x159966e47760000,lop=PING,est=1484289622561,to=6000,lcxid=0x14740,lzxid=0x70001440d,lresp=1484496803926,llat=1,minlat=0,avglat=0,maxlat=32)

It lists session established time which matches broker log.
1484289622561 - Friday, January 13, 2017 1:40:22 AM

Last response time matches recent time
1484496803926 - Sunday, January 15, 2017 11:13:23 AM

But the session doesn't show up with broker id 211. Not sure what is the issue. Any help is appreciated. 

ls /brokers/ids
[212, 210, 112, 113, 213, 19, 110, 111, 29]

>> echo dump | nc cz2 2181
SessionTracker dump:
Session Sets (4):
0 expire at Sun Jan 15 11:22:40 EST 2017:
0 expire at Sun Jan 15 11:22:42 EST 2017:
8 expire at Sun Jan 15 11:22:44 EST 2017:
        0x259968e41e30000
        0x35996670d5d0000
        0x159968e41dd0000
        0x159966708470004
        0x159966e47760000
        0x159966708470003
        0x2599672df260000
        0x259966708550000
2 expire at Sun Jan 15 11:22:46 EST 2017:
        0x35996670d5d0001
        0x259966708550001
ephemeral nodes dump:
Sessions with Ephemerals (9):
0x259966708550000:
        /brokers/ids/112
0x259968e41e30000:
        /brokers/ids/213
0x159968e41dd0000:
        /brokers/ids/19
0x159966708470003:
        /brokers/ids/110
0x35996670d5d0000:
        /brokers/ids/113
        /controller
0x259966708550001:
        /brokers/ids/111
0x159966708470004:
        /brokers/ids/212
0x2599672df260000:
        /brokers/ids/29
0x35996670d5d0001:
        /brokers/ids/210

> Issue of one kafka brokers not listed in zookeeper
> --------------------------------------------------
>
>                 Key: KAFKA-4634
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4634
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8.2.1
>         Environment: Zookeeper version: 3.4.6-1569965, built on 02/20/2014 09:09 GMT
> kafka_2.10-0.8.2.1
>            Reporter: Maharajan Shunmuga Sundaram
>
> Hi,
> We have incident that one of the 10 brokers not listed in brokers list of zookeeper.
> This is verified by running following command
> >> echo dump | nc cz2 2181
> SessionTracker dump:
> Session Sets (4):
> 0 expire at Fri Jan 13 22:32:14 EST 2017:
> 0 expire at Fri Jan 13 22:32:16 EST 2017:
> 7 expire at Fri Jan 13 22:32:18 EST 2017:
>         0x259968e41e30000
>         0x35996670d5d0001
>         0x35996670d5d0000
>         0x159966708470004
>         0x159966e47760000
>         0x159966708470003
>         0x2599672df260000
> 3 expire at Fri Jan 13 22:32:20 EST 2017:
>         0x159968e41dd0000
>         0x259966708550001
>         0x259966708550000
> ephemeral nodes dump:
> Sessions with Ephemerals (9):
> 0x259966708550000:
>         /brokers/ids/112
> 0x259968e41e30000:
>         /brokers/ids/213
> 0x159968e41dd0000:
>         /brokers/ids/19
> 0x159966708470003:
>         /brokers/ids/110
> 0x35996670d5d0000:
>         /brokers/ids/113
>         /controller
> 0x259966708550001:
>         /brokers/ids/111
> 0x159966708470004:
>         /brokers/ids/212
> 0x2599672df260000:
>         /brokers/ids/29
> 0x35996670d5d0001:
>         /brokers/ids/210
> ------
> There are 10 sessions, but only 9 sessions are listed with brokers.
> Broker with id 211 is not listed. Session 0x159966e47760000 is not shown with broker id 211.
> In the broker side log, I do see it is connected
> >> zgrep "0x159966e47760000" *log*
>  
> zk.log:[2017-01-13 01:05:28,513] INFO Session establishment complete on server cz1/10.254.2.19:2181, sessionid = 0x159966e47760000, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)
> zk.log:[2017-01-13 01:35:38,163] INFO Unable to read additional data from server sessionid 0x159966e47760000, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> zk.log:[2017-01-13 01:35:39,101] WARN Session 0x159966e47760000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> zk.log:[2017-01-13 01:35:40,121] INFO Unable to read additional data from server sessionid 0x159966e47760000, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> zk.log:[2017-01-13 01:35:41,770] WARN Session 0x159966e47760000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> zk.log:[2017-01-13 01:35:42,439] WARN Session 0x159966e47760000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> zk.log:[2017-01-13 01:35:43,235] INFO Unable to read additional data from server sessionid 0x159966e47760000, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> zk.log:[2017-01-13 01:35:44,950] WARN Session 0x159966e47760000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> zk.log:[2017-01-13 01:35:45,837] INFO Unable to read additional data from server sessionid 0x159966e47760000, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> .
> .
> .
> .
> zk.log:[2017-01-13 01:40:14,818] INFO Unable to read additional data from server sessionid 0x159966e47760000, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> zk.log:[2017-01-13 01:40:15,916] WARN Session 0x159966e47760000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> zk.log:[2017-01-13 01:40:19,692] INFO Client session timed out, have not heard from server in 3676ms for sessionid 0x159966e47760000, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> zk.log:[2017-01-13 01:40:20,632] INFO Unable to read additional data from server sessionid 0x159966e47760000, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> zk.log:[2017-01-13 01:40:20,814] WARN Session 0x159966e47760000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> zk.log:[2017-01-13 01:40:22,089] WARN Session 0x159966e47760000 for server null, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> zk.log:[2017-01-13 01:40:22,562] INFO Session establishment complete on server cz2/10.254.2.29:2181, sessionid = 0x159966e47760000, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)
> After several tries, broker connected with zookeeper cz2:2181. I am not sure how to debug this issue. It would be helpful if someone provides way to debug this issue.
> Regards,
> Maharajan S



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)