Posted to dev@zookeeper.apache.org by "Aishwarya Soni (JIRA)" <ji...@apache.org> on 2018/10/11 03:42:00 UTC

[jira] [Comment Edited] (ZOOKEEPER-3036) Unexpected exception in zookeeper

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16645913#comment-16645913 ] 

Aishwarya Soni edited comment on ZOOKEEPER-3036 at 10/11/18 3:41 AM:
---------------------------------------------------------------------

We hit the same issue a couple of days ago. We run ZooKeeper in a containerized AWS environment, and we had to restart the affected container to resolve it. The issue occurs when the port binding fails: when the container becomes unhealthy, it does not release the port, so when ZooKeeper later tries to bind to that port to rejoin the quorum, the port is still in use and it throws the exception *Unexpected exception causing shutdown while sock still open*.
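The failure mode described above (a port that is still held when a process tries to bind it again) can be reproduced with plain java.net sockets. This is a minimal sketch, not ZooKeeper code; the class and method names are hypothetical. It only shows that a second listener on a port still held by a live listener is rejected by the OS, which Java reports as java.net.BindException.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.BindException;
import java.net.ServerSocket;

// Hypothetical demo class, not part of ZooKeeper.
public class PortConflictDemo {

    // Binds a listener to a free port, then tries to bind a second listener to
    // the same port while the first is still open. Returns true if the second
    // bind is rejected with BindException ("Address already in use").
    static boolean secondBindFails() {
        // Port 0 asks the OS for any free ephemeral port.
        try (ServerSocket holder = new ServerSocket(0)) {
            int port = holder.getLocalPort();
            // 'holder' has not been closed, so the port has not been released.
            try (ServerSocket second = new ServerSocket(port)) {
                return false; // the OS unexpectedly allowed a duplicate bind
            } catch (BindException e) {
                System.out.println("Bind to port " + port + " failed: " + e.getMessage());
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("second bind fails: " + secondBindFails());
    }
}
```

Restarting the container releases the port because the process holding the listener exits, which in this sketch corresponds to closing 'holder'.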

This is where the binding happens, in ZooKeeper's QuorumCnxManager class:
*ss.socket().bind(new InetSocketAddress(port));*
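The quoted line binds the socket of what looks like a java.nio ServerSocketChannel named ss. The sketch below reuses that call shape in a self-contained form, assuming ss is a ServerSocketChannel; it is not the ZooKeeper implementation. The helper tryBind is hypothetical and returns an empty Optional instead of letting BindException propagate, which makes an "address already in use" condition easy to detect and log.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.BindException;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.util.Optional;

// Hypothetical sketch; assumes 'ss' is a ServerSocketChannel as in the quoted line.
public class ChannelBindSketch {

    // Opens a channel and binds its socket to 'port' with the same call shape
    // as ss.socket().bind(new InetSocketAddress(port)). Returns empty if the
    // port is already in use; other I/O errors are rethrown unchecked.
    static Optional<ServerSocketChannel> tryBind(int port) {
        ServerSocketChannel ss = null;
        try {
            ss = ServerSocketChannel.open();
            ss.socket().bind(new InetSocketAddress(port));
            return Optional.of(ss);
        } catch (BindException e) {
            closeQuietly(ss);
            System.err.println("Port " + port + " already in use: " + e.getMessage());
            return Optional.empty();
        } catch (IOException e) {
            closeQuietly(ss);
            throw new UncheckedIOException(e);
        }
    }

    // Closes the channel if it was opened, ignoring errors during cleanup.
    private static void closeQuietly(ServerSocketChannel ch) {
        if (ch == null) return;
        try { ch.close(); } catch (IOException ignored) { }
    }

    public static void main(String[] args) throws IOException {
        // Bind to any free port, then try to bind the same port again.
        try (ServerSocketChannel first = tryBind(0).get()) {
            int port = first.socket().getLocalPort();
            System.out.println("first bind on port " + port + " ok");
            System.out.println("second bind succeeded: " + tryBind(port).isPresent());
        }
    }
}
```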

In the LearnerHandler.java class, it tries to access the port, and because the port is still in use, it throws the exception:

*if (sock != null && !sock.isClosed()) { LOG.error("Unexpected exception causing shutdown while sock " + "still open", e); }*

In most cases, sock (the socket on that port) will not be null and will still be open, so this branch is taken.
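The stack trace in the original report shows java.net.SocketTimeoutException thrown from DataInputStream.readInt. A read timeout does not close the socket, so the condition guarding the LOG.error call above is true right after such a timeout. The minimal sketch below reproduces that situation with a local peer that never sends data; the class name, method name, and the 200 ms timeout are hypothetical choices, and this is not LearnerHandler's code.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class, not part of ZooKeeper.
public class ReadTimeoutDemo {

    // Connects to a local server whose accepted peer never writes, then blocks
    // in readInt() with SO_TIMEOUT set. Returns true if the read timed out while
    // the socket was still non-null and open (the quoted guard condition).
    static boolean timeoutLeavesSocketOpen(int timeoutMs) {
        try (ServerSocket server = new ServerSocket(0);
             Socket sock = new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort());
             Socket peer = server.accept()) {          // peer stays silent
            sock.setSoTimeout(timeoutMs);
            DataInputStream in = new DataInputStream(sock.getInputStream());
            try {
                in.readInt();                           // blocks: nothing is ever sent
                return false;
            } catch (SocketTimeoutException e) {
                // Same exception type as in the quoted stack trace; the socket stays valid.
                return sock != null && !sock.isClosed();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("timed out with socket still open: " + timeoutLeavesSocketOpen(200));
    }
}
```

The Java documentation for Socket.setSoTimeout states that when the timeout expires the socket is still valid, which is why the check on sock.isClosed() does not filter this case out.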


> Unexpected exception in zookeeper
> ---------------------------------
>
>                 Key: ZOOKEEPER-3036
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3036
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum, server
>    Affects Versions: 3.4.10
>         Environment: 3 Zookeepers, 5 kafka servers
>            Reporter: Oded
>            Priority: Critical
>
> We had an issue with one of the ZooKeepers (the leader) that caused the entire Kafka cluster to fail:
> 2018-05-09 02:29:01,730 [myid:3] - ERROR [LearnerHandler-/192.168.0.91:42490:LearnerHandler@648] - Unexpected exception causing shutdown while sock still open
> java.net.SocketTimeoutException: Read timed out
>         at java.net.SocketInputStream.socketRead0(Native Method)
>         at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>         at java.net.SocketInputStream.read(SocketInputStream.java:171)
>         at java.net.SocketInputStream.read(SocketInputStream.java:141)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
>         at java.io.DataInputStream.readInt(DataInputStream.java:387)
>         at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>         at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
>         at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
>         at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:559)
> 2018-05-09 02:29:01,730 [myid:3] - WARN  [LearnerHandler-/192.168.0.91:42490:LearnerHandler@661] - ******* GOODBYE /192.168.0.91:42490 ********
>  
> We expected ZooKeeper to elect another leader and the Kafka cluster to keep working, but that was not the case.
>  
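The reporter's expectation rests on ZooKeeper's majority-quorum rule: an ensemble of n voting servers needs floor(n/2) + 1 of them to elect a leader, so the 3-server ensemble listed in the Environment field can lose one server, including the leader, and still elect a new one. The arithmetic is shown below as a small sketch; the class and method names are hypothetical, and it says nothing about how Kafka reacts while an election is in progress.

```java
// Hypothetical helper illustrating majority-quorum arithmetic.
public class QuorumMath {

    // Minimum number of voting servers that form a majority of n.
    static int quorumSize(int n) {
        return n / 2 + 1;
    }

    // Number of servers that can fail while a majority can still be formed.
    static int toleratedFailures(int n) {
        return n - quorumSize(n);
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 4, 5}) {
            System.out.printf("ensemble=%d quorum=%d tolerated failures=%d%n",
                    n, quorumSize(n), toleratedFailures(n));
        }
    }
}
```

For the 3-server ensemble, quorumSize(3) is 2 and toleratedFailures(3) is 1; a 4-server ensemble still tolerates only one failure, which is why odd ensemble sizes are the usual choice.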



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)