Posted to dev@zookeeper.apache.org by "karl (Jira)" <ji...@apache.org> on 2021/04/29 03:43:00 UTC
[jira] [Created] (ZOOKEEPER-4288) zkServer ruok responds 'imok' but any request (including zkCli.sh) responds 'client session timed out'
karl created ZOOKEEPER-4288:
-------------------------------
Summary: zkServer ruok responds 'imok' but any request (including zkCli.sh) responds 'client session timed out'
Key: ZOOKEEPER-4288
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4288
Project: ZooKeeper
Issue Type: Bug
Components: server
Affects Versions: 3.5.5
Reporter: karl
Attachments: itoa-zookeeper-dp1-5bc799c97f-t7wpz.jstack.txt, itoa-zookeeper-dp1-5bc799c97f-t7wpz.log, itoa-zookeeper-dp2-6d89756876-k2ds2.log
I'm running ZooKeeper from the 3.5.5 Docker image in k8s. The 3 pods are located on three nodes, which communicate with each other through k8s ClusterIP services. I did the following: I shut down node 2 for 3 hours and then rejoined it to the cluster. After that, I found that ZooKeeper could not respond to *{color:#FF0000}any request (including zkCli.sh){color}*. I tested the servers with the ruok and mntr commands and found only one anomaly: zk_outstanding_requests is abnormally high and keeps growing (I have a Kafka cluster using this ensemble).
my cluster:
{code:bash}
# my zookeeper pods
[root@node1 ~]# kubectl get po -nsa -o wide | grep zookeeper-dp
itoa-zookeeper-dp1-5bc799c97f-t7wpz 1/1 Running 0 17h 177.177.166.244 node1 <none> <none>
itoa-zookeeper-dp2-6d89756876-k2ds2 1/1 Running 0 13h 177.177.104.35 node2 <none> <none>
itoa-zookeeper-dp3-b55ddd6c7-8h5p7 1/1 Running 0 16h 177.177.135.54 node3 <none> <none>
# my zookeeper cluster svc
[root@node1 ~]# kubectl get svc -nsa -o wide | grep zookeeper-dp
itoa-zookeeper-service1 ClusterIP 10.96.235.75 <none> 2181/TCP,2888/TCP,3888/TCP 44h node=itoa-zookeeper-dp1
itoa-zookeeper-service2 ClusterIP 10.96.161.122 <none> 2181/TCP,2888/TCP,3888/TCP 44h node=itoa-zookeeper-dp2
itoa-zookeeper-service3 ClusterIP 10.96.67.1 <none> 2181/TCP,2888/TCP,3888/TCP 44h node=itoa-zookeeper-dp3{code}
my cluster conf:
{code:bash}
[root@node1 ~]# echo conf | nc 177.177.166.244 2181
clientPort=2181
secureClientPort=-1
dataDir=/opt/zookeeper/data/version-2
dataDirSize=135540411
dataLogDir=/opt/zookeeper/data/version-2
dataLogSize=135540411
tickTime=2000
maxClientCnxns=60
minSessionTimeout=4000
maxSessionTimeout=40000
serverId=1
initLimit=10
syncLimit=5
electionAlg=3
electionPort=3888
quorumPort=2888
peerType=0
membership:
server.1=0.0.0.0:2888:3888:participant
server.2=itoa-zookeeper-service2:2888:3888:participant
server.3=itoa-zookeeper-service3:2888:3888:participant
version=0
{code}
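Since ruok, mntr, and conf all respond, the srvr four-letter command can also be used to sanity-check each server's role and last zxid (a minimal sketch; the ClusterIP addresses are taken from the svc list above):
{code:bash}
# Sketch: query each server's role and last zxid via the srvr 4lw command.
# ClusterIP addresses are from the svc list above; on 3.5.x the command must
# be permitted by 4lw.commands.whitelist, as ruok/mntr/conf evidently are here.
for ip in 10.96.235.75 10.96.161.122 10.96.67.1; do
  echo "== $ip =="
  echo srvr | nc "$ip" 2181 | grep -E 'Mode|Zxid'
done
{code}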
ruok & mntr responses:
{code:bash}
# 177.177.166.244
[root@node1 ~]# echo ruok | nc 177.177.166.244 2181 && echo mntr | nc 177.177.166.244 2181
imok
zk_version 3.5.5-390fe37ea45dee01bf87dc1c042b5e3dcce88653, built on 05/03/2019 12:07 GMT
zk_avg_latency 2
zk_max_latency 2
zk_min_latency 2
zk_packets_received 740
zk_packets_sent 105
zk_num_alive_connections 1
zk_outstanding_requests 634
zk_server_state follower
zk_znode_count 8092
zk_watch_count 0
zk_ephemerals_count 5
zk_approximate_data_size 708815
zk_open_file_descriptor_count 100
zk_max_file_descriptor_count 1048576
# 177.177.135.54
[root@node1 ~]# echo ruok | nc 177.177.135.54 2181 && echo mntr | nc 177.177.135.54 2181
imok
zk_version 3.5.5-390fe37ea45dee01bf87dc1c042b5e3dcce88653, built on 05/03/2019 12:07 GMT
zk_avg_latency 0
zk_max_latency 0
zk_min_latency 0
zk_packets_received 727
zk_packets_sent 107
zk_num_alive_connections 1
zk_outstanding_requests 619
zk_server_state leader
zk_znode_count 8092
zk_watch_count 0
zk_ephemerals_count 5
zk_approximate_data_size 708815
zk_open_file_descriptor_count 101
zk_max_file_descriptor_count 1048576
zk_followers 1
zk_synced_followers 1
zk_pending_syncs 0
zk_last_proposal_size 32
zk_max_proposal_size 162
zk_min_proposal_size 32
{code}
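To confirm that zk_outstanding_requests keeps growing rather than being a one-off spike, a simple polling loop can be used (a minimal sketch; the pod IPs are the ones above and the 10-second interval is arbitrary):
{code:bash}
# Sketch: print zk_outstanding_requests for each pod every 10 seconds
while true; do
  for ip in 177.177.166.244 177.177.104.35 177.177.135.54; do
    printf '%s %s ' "$(date +%T)" "$ip"
    echo mntr | nc "$ip" 2181 | grep zk_outstanding_requests
  done
  sleep 10
done
{code}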
zkCli.sh responds 'client session timed out':
{code:bash}
[zk: itoa-zookeeper-service1:2181,itoa-zookeeper-service2:2181,itoa-zookeeper-service3:2181(CONNECTING) 0] ls /
2021-04-29 11:08:22,459 [myid:itoa-zookeeper-service3:2181] - WARN [main-SendThread(itoa-zookeeper-service3:2181):ClientCnxn$SendThread@1190] - Client session timed out, have not heard from server in 10007ms for sessionid 0x0
2021-04-29 11:08:22,459 [myid:itoa-zookeeper-service3:2181] - INFO [main-SendThread(itoa-zookeeper-service3:2181):ClientCnxn$SendThread@1238] - Client session timed out, have not heard from server in 10007ms for sessionid 0x0, closing socket connection and attempting reconnect
KeeperErrorCode = ConnectionLoss for /
[zk: itoa-zookeeper-service1:2181,itoa-zookeeper-service2:2181,itoa-zookeeper-service3:2181(CONNECTING) 1] 2021-04-29 11:08:22,743 [myid:itoa-zookeeper-service2:2181] - INFO [main-SendThread(itoa-zookeeper-service2:2181):ClientCnxn$SendThread@1112] - Opening socket connection to server itoa-zookeeper-service2/10.96.161.122:2181. Will not attempt to authenticate using SASL (unknown error)
2021-04-29 11:08:22,744 [myid:itoa-zookeeper-service2:2181] - INFO [main-SendThread(itoa-zookeeper-service2:2181):ClientCnxn$SendThread@959] - Socket connection established, initiating session, client: /177.177.166.244:45456, server: itoa-zookeeper-service2/10.96.161.122:2181
2021-04-29 11:08:22,747 [myid:itoa-zookeeper-service2:2181] - INFO [main-SendThread(itoa-zookeeper-service2:2181):ClientCnxn$SendThread@1240] - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
2021-04-29 11:08:23,761 [myid:itoa-zookeeper-service1:2181] - INFO [main-SendThread(itoa-zookeeper-service1:2181):ClientCnxn$SendThread@1112] - Opening socket connection to server itoa-zookeeper-service1/10.96.235.75:2181. Will not attempt to authenticate using SASL (unknown error)
2021-04-29 11:08:23,762 [myid:itoa-zookeeper-service1:2181] - INFO [main-SendThread(itoa-zookeeper-service1:2181):ClientCnxn$SendThread@959] - Socket connection established, initiating session, client: /177.177.166.244:42428, server: itoa-zookeeper-service1/10.96.235.75:2181
{code}
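To rule out a too-small session timeout, zkCli.sh can also be started with an explicit one (a sketch; the 30000 ms value is arbitrary but lies within the minSessionTimeout/maxSessionTimeout range from the conf output above):
{code:bash}
# Sketch: retry the same read with an explicit 30s session timeout
bin/zkCli.sh -timeout 30000 -server itoa-zookeeper-service1:2181 ls /
{code}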
I shut down node2 at 2021-04-28 17:12 and brought it back up at 20:40. My pod logs are in the attachments.
Other information I collected:
* The TCP connection information of one of the pods (no abnormalities in any of the 3 pods):
{code:bash}
[root@node1 ~]# nsenter -n -t $(docker inspect -f '{{.State.Pid}}' $(docker ps | grep zookeeper-dp1 | grep -v pause | awk '{print $1}')) netstat -plan
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:35883 0.0.0.0:* LISTEN 109187/java
tcp 0 0 0.0.0.0:3888 0.0.0.0:* LISTEN 109187/java
tcp 0 0 0.0.0.0:8080 0.0.0.0:* LISTEN 109187/java
tcp 0 0 0.0.0.0:2181 0.0.0.0:* LISTEN 109187/java
tcp 0 0 177.177.166.244:3888 177.177.104.35:57886 ESTABLISHED 109187/java
tcp 0 0 177.177.166.244:42204 10.96.67.1:2888 ESTABLISHED 109187/java
tcp 0 0 177.177.166.244:3888 177.177.135.54:44462 ESTABLISHED 109187/java
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags Type State I-Node PID/Program name Path
unix 2 [ ACC ] STREAM LISTENING 2342381971 109187/java /tmp/.java_pid1.tmp
unix 2 [ ] STREAM CONNECTED 2243039212 109187/java
unix 2 [ ] STREAM CONNECTED 2243034965 109187/java
{code}
* jstack information of one of the pods (see attachment; the other two pods have similar information, collected roughly as sketched below). I suspect the large number of threads waiting on condition is the cause.
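For reference, a thread dump can be collected from a pod roughly like this (a sketch; it assumes jstack is available in the image and that the ZooKeeper JVM is PID 1 inside the container, which matches the /tmp/.java_pid1.tmp attach socket in the netstat output above):
{code:bash}
# Sketch: dump the ZooKeeper JVM's threads from inside the pod
kubectl exec -nsa itoa-zookeeper-dp1-5bc799c97f-t7wpz -- jstack 1 \
  > itoa-zookeeper-dp1-5bc799c97f-t7wpz.jstack.txt
{code}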