Posted to users@kafka.apache.org by "Denis SALGON (contractor)" <De...@amadeus.com> on 2018/10/04 12:16:31 UTC

kafka/zookeeper disconnection/reconnection


Hi Everybody,

I'm facing a problem for which I can't find an explanation.
I have an OpenShift pod running Kafka 0.10.2.1 and ZooKeeper 3.4.9. We have only one topic, fed by a Kafka connector running in another pod that sends around 21 messages per second.
For the moment, for various reasons, the topic has only one partition and there is only one broker. The 3 topics specific to the Kafka connector were created correctly, except for their replication factor, which is constrained by the number of brokers.

In the Kafka logs we often (around 15 times a day) see this kind of incident, where Kafka seems to lose its connection to ZooKeeper:
[2018-10-02 06:56:03,404] WARN Client session timed out, have not heard from server in 4036ms for sessionid 0x165fc77d1ae002e (org.apache.zookeeper.ClientCnxn)
[2018-10-02 06:56:03,431] INFO Client session timed out, have not heard from server in 4036ms for sessionid 0x165fc77d1ae002e, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2018-10-02 06:56:05,555] INFO Session establishment complete on server zookeeper-inval-cluster.default.svc.cluster.local/10.224.32.63:2181, sessionid = 0x165fc77d1ae002e, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)
[2018-10-02 09:44:58,831] WARN Client session timed out, have not heard from server in 7070ms for sessionid 0x165fc77d1ae002e (org.apache.zookeeper.ClientCnxn)
[2018-10-02 09:44:58,832] INFO Client session timed out, have not heard from server in 7070ms for sessionid 0x165fc77d1ae002e, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2018-10-02 09:45:00,277] WARN Unable to reconnect to ZooKeeper service, session 0x165fc77d1ae002e has expired (org.apache.zookeeper.ClientCnxn)
[2018-10-02 09:45:00,277] INFO Unable to reconnect to ZooKeeper service, session 0x165fc77d1ae002e has expired, closing socket connection (org.apache.zookeeper.ClientCnxn)
[2018-10-02 09:45:01,844] INFO EventThread shut down for session: 0x165fc77d1ae002e (org.apache.zookeeper.ClientCnxn)
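
To get an overview of how long these silences last compared with the negotiated timeout, the numbers could be pulled out of the broker log with a small script. This is only a rough sketch: the names `summarize` and `SAMPLE` are made up for illustration, and the regular expressions assume exactly the line format shown above.

```python
import re

# Matches only the WARN variant so each incident is counted once
# (the INFO line repeats the same message). "\[\s*" tolerates a stray
# space after the opening bracket.
TIMED_OUT = re.compile(
    r"\[\s*(?P<ts>[\d-]+ [\d:,]+)\] WARN Client session timed out, "
    r"have not heard from server in (?P<ms>\d+)ms "
    r"for sessionid (?P<sid>0x[0-9a-f]+)"
)
NEGOTIATED = re.compile(r"negotiated timeout = (?P<ms>\d+)")


def summarize(lines):
    """Return (events, negotiated_ms) extracted from Kafka broker log lines.

    events is a list of (timestamp, silence_ms, session_id) tuples;
    negotiated_ms is the last negotiated timeout seen, or None.
    """
    events = []
    negotiated = None
    for line in lines:
        match = TIMED_OUT.search(line)
        if match:
            events.append(
                (match.group("ts"), int(match.group("ms")), match.group("sid"))
            )
            continue
        match = NEGOTIATED.search(line)
        if match:
            negotiated = int(match.group("ms"))
    return events, negotiated


# Three lines taken from the log excerpt above.
SAMPLE = [
    "[2018-10-02 06:56:03,404] WARN Client session timed out, have not heard "
    "from server in 4036ms for sessionid 0x165fc77d1ae002e "
    "(org.apache.zookeeper.ClientCnxn)",
    "[2018-10-02 06:56:05,555] INFO Session establishment complete on server "
    "zookeeper-inval-cluster.default.svc.cluster.local/10.224.32.63:2181, "
    "sessionid = 0x165fc77d1ae002e, negotiated timeout = 6000 "
    "(org.apache.zookeeper.ClientCnxn)",
    "[2018-10-02 09:44:58,831] WARN Client session timed out, have not heard "
    "from server in 7070ms for sessionid 0x165fc77d1ae002e "
    "(org.apache.zookeeper.ClientCnxn)",
]

if __name__ == "__main__":
    events, negotiated = summarize(SAMPLE)
    for ts, silence_ms, sid in events:
        # Flag silences longer than the negotiated session timeout.
        over = negotiated is not None and silence_ms > negotiated
        print(ts, sid, f"{silence_ms}ms", "over timeout" if over else "under timeout")
```

On the excerpt above, the 4036 ms silence is below the negotiated 6000 ms and the session was renewed, while the 7070 ms silence is above it and the session was reported as expired. Running the same function over a full day of logs would show whether the roughly 15 daily incidents cluster at particular times.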

On the ZooKeeper side, we have these logs:
2018-10-02 06:56:03,487 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x165fc77d1ae002e, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
        at java.lang.Thread.run(Thread.java:748)
2018-10-02 06:56:03,515 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /10.225.11.1:37874 which had sessionid 0x165fc77d1ae002e
2018-10-02 06:56:05,518 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket connection from /10.225.11.1:34696
2018-10-02 06:56:05,545 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@921] - Client attempting to renew session 0x165fc77d1ae002e at /10.225.11.1:34696
2018-10-02 06:56:05,545 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@673] - Established session 0x165fc77d1ae002e with negotiated timeout 6000 for client /10.225.11.1:34696

OK, the reconnection works fine and, externally, everything is fine for our application.
But I'm wondering whether it could be a sign that something is undersized or going wrong. I googled the problem, of course, but didn't find any satisfying explanation or solution.
Has anybody faced this kind of behaviour, or can anyone suggest something to investigate further?

Anyway, thank you very much for any help or suggestion.

Denis.