Posted to user@zookeeper.apache.org by Qian Zhang <zh...@gmail.com> on 2019/05/19 10:15:39 UTC

Why does ZooKeeper follower shutdown itself when it can not read from leader

Hi,

I have a ZooKeeper cluster with 5 nodes. Today the leader became
unreachable due to a hardware issue, and I then found that the 4 followers
had just shut down. Here are the logs:

> May 18 15:34:28 MD001076 java[29148]: [myid:1] WARN  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when following the leader
> java.net.SocketTimeoutException: Read timed out
>     at java.net.SocketInputStream.socketRead0(Native Method)
>     at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>     at java.net.SocketInputStream.read(SocketInputStream.java:171)
>     at java.net.SocketInputStream.read(SocketInputStream.java:141)
>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
>     at java.io.DataInputStream.readInt(DataInputStream.java:387)
>     at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>     at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
>     at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
>     at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
>     at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:937)
> May 18 15:34:28 MD001076 java[29148]: [myid:1] INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket connection from /10.249.255.10:42306
> May 18 15:34:28 MD001076 java[29148]: [myid:1] WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@896] - Connection request from old client /10.249.255.10:42306; will be dropped if server is in r-o mode
> May 18 15:34:28 MD001076 java[29148]: [myid:1] INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@942] - Client attempting to establish new session at /10.249.255.10:42306
> May 18 15:34:28 MD001076 java[29148]: [myid:1] ERROR [FollowerRequestProcessor:1:ZooKeeperCriticalThread@49] - Severe unrecoverable error, from thread : FollowerRequestProcessor:1
> java.net.SocketException: Socket closed
>     at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:118)
>     at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
>     at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>     at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
>     at org.apache.zookeeper.server.quorum.Learner.writePacket(Learner.java:139)
>     at org.apache.zookeeper.server.quorum.Learner.request(Learner.java:188)
>     at org.apache.zookeeper.server.quorum.FollowerRequestProcessor.run(FollowerRequestProcessor.java:90)
> May 18 15:34:28 MD001076 java[29148]: [myid:1] INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called
> java.lang.Exception: shutdown Follower
>     at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
>     at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:941)


I am confused about why all the followers shut down in this case, which makes
the whole ZooKeeper ensemble unusable for a short period. Shouldn't they elect
a new leader instead? Thanks!


Regards,
Qian Zhang
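
For reference, the arithmetic behind this question can be sketched as
follows. This is a minimal illustration that assumes ZooKeeper's default
majority quorum; the function names are hypothetical and are not part of
any ZooKeeper API.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def can_elect(ensemble_size, reachable):
    """True if the reachable servers are enough to elect a leader."""
    return reachable >= quorum_size(ensemble_size)


if __name__ == "__main__":
    # 5-node ensemble with the leader unreachable: 4 servers remain.
    print(quorum_size(5), can_elect(5, 4))
```

With 5 nodes the majority is 3, so the 4 remaining followers are enough to
form a quorum, which is why an election would be the expected outcome.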

Re: Why does ZooKeeper follower shutdown itself when it can not read from leader

Posted by Andor Molnar <an...@cloudera.com.INVALID>.
Hi Qian,

Which version of ZooKeeper are you using?
Would you please share the config files and leader logs too?
Also, it looks like you're trying to connect with an older client:
>>> Connection request from old client /10.249.255.10:42306; will be dropped if server is in r-o mode

Andor



On Wed, May 22, 2019 at 2:52 AM Qian Zhang <zh...@gmail.com> wrote:

> Does anyone have any ideas?
>
> Regards,
> Qian Zhang

Re: Why does ZooKeeper follower shutdown itself when it can not read from leader

Posted by Qian Zhang <zh...@gmail.com>.
I see, thank you Patrick!


Regards,
Qian Zhang


On Thu, May 23, 2019 at 9:26 AM Qian Zhang <zh...@gmail.com> wrote:

> Hi Andor,
>
> I am using ZooKeeper release 3.4.10.
>
> I checked the code: if the follower fails to read from the leader (e.g., a
> read timeout), it closes the socket; see
> https://github.com/apache/zookeeper/blob/release-3.4.10/src/java/main/org/apache/zookeeper/server/quorum/Follower.java#L91:L85 for
> details. Once the socket is closed, the follower's next write fails
> (I guess the same socket is used here), which is treated as a severe
> unrecoverable error and shuts down the follower; see
> https://github.com/apache/zookeeper/blob/release-3.4.10/src/java/main/org/apache/zookeeper/server/quorum/FollowerRequestProcessor.java#L90:L95
>  and
> https://github.com/apache/zookeeper/blob/release-3.4.10/src/java/main/org/apache/zookeeper/server/ZooKeeperCriticalThread.java#L48:L51
> .
>
> So it seems that shutting down the follower when it cannot read from the
> leader is the designed behavior? If my understanding is wrong, could you
> please let me know the designed behavior in this case? Thanks!
>
>
> Regards,
> Qian Zhang
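
The sequence described in the quoted message (a read timeout, then a closed
socket, then a failed write on that same socket) can be reproduced in
isolation. The sketch below is a simplified simulation in Python, not
ZooKeeper code: the function names are hypothetical, a local socket pair
stands in for the follower-to-leader connection, and the two steps run
sequentially here rather than on separate threads as in the log.

```python
import socket


def follow_leader(sock, timeout_s):
    """Reader side: wait for a packet; on timeout, close the shared socket."""
    sock.settimeout(timeout_s)
    try:
        return sock.recv(4)
    except socket.timeout:
        # Mirrors the reported behavior: give up on the leader and close.
        sock.close()
        return None


def forward_request(sock, payload):
    """Writer side: return None on success, or the error raised by the write."""
    try:
        sock.sendall(payload)
        return None
    except OSError as exc:
        return exc


def simulate():
    follower_sock, leader_sock = socket.socketpair()
    try:
        # The "leader" never sends anything, so the read times out.
        packet = follow_leader(follower_sock, 0.05)
        # A later write on the already-closed socket fails.
        error = forward_request(follower_sock, b"ping")
        return packet, error
    finally:
        leader_sock.close()


if __name__ == "__main__":
    print(simulate())
```

The second step fails only because the first step closed the socket, which
matches the ordering of the WARN and ERROR entries in the log.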

Re: Why does ZooKeeper follower shutdown itself when it can not read from leader

Posted by Patrick Hunt <ph...@apache.org>.
That was/is the original intent. ZK was built to "fail fast" when it
didn't know how to handle a particular case, or when handling that case
would be error-prone. The expectation is that the parent process will
restart the ZK server process when it fails.

Patrick
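
The "parent restarts the server" expectation can be sketched as a small
supervisor loop. This is only an illustration under stated assumptions: the
function name, the restart limit, and the backoff are hypothetical, and a
real deployment would more likely rely on an existing process supervisor
than on a hand-written loop.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts, backoff_s=0.0):
    """Run cmd; restart it after a non-zero exit, up to max_restarts times.

    Returns the list of exit codes observed, one per run.
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        code = subprocess.call(cmd)
        exit_codes.append(code)
        if code == 0:
            break  # clean exit: nothing to restart
        if attempt < max_restarts:
            time.sleep(backoff_s)  # brief pause before the next start
    return exit_codes


if __name__ == "__main__":
    # Stand-in for a server process that fails fast with exit code 3.
    failing = [sys.executable, "-c", "import sys; sys.exit(3)"]
    print(supervise(failing, max_restarts=2))
```

The point of the design is that the server does not try to recover from a
state it cannot reason about; it exits, and recovery is delegated to
whatever started it.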

On Wed, May 22, 2019 at 6:27 PM Qian Zhang <zh...@gmail.com> wrote:

> Hi Andor,
>
> I am using ZooKeeper release 3.4.10.
>
> I checked the code: if the follower fails to read from the leader (e.g., a
> read timeout), it closes the socket; see
>
> https://github.com/apache/zookeeper/blob/release-3.4.10/src/java/main/org/apache/zookeeper/server/quorum/Follower.java#L91:L85
> for
> details. Once the socket is closed, the follower's next write fails
> (I guess the same socket is used here), which is treated as a severe
> unrecoverable error and shuts down the follower; see
>
> https://github.com/apache/zookeeper/blob/release-3.4.10/src/java/main/org/apache/zookeeper/server/quorum/FollowerRequestProcessor.java#L90:L95
>  and
>
> https://github.com/apache/zookeeper/blob/release-3.4.10/src/java/main/org/apache/zookeeper/server/ZooKeeperCriticalThread.java#L48:L51
> .
>
> So it seems that shutting down the follower when it cannot read from the
> leader is the designed behavior? If my understanding is wrong, could you
> please let me know the designed behavior in this case? Thanks!
>
>
> Regards,
> Qian Zhang

Re: Why does ZooKeeper follower shutdown itself when it can not read from leader

Posted by Patrick Hunt <ph...@apache.org>.
That was/is the original intent.  ZK was built to "fail fast" when it
didn't know how to handle a particular case, or that case might be error
prone to handle. The expectation is that the parent will restart the ZK
server process when it fails.

Patrick

On Wed, May 22, 2019 at 6:27 PM Qian Zhang <zh...@gmail.com> wrote:

> Hi Andor,
>
> I am using ZooKeeper release 3.4.10.
>
> I checked the code, if follower fails to read from leader (e.g., read
> timeout), it will close the socket, see
>
> https://github.com/apache/zookeeper/blob/release-3.4.10/src/java/main/org/apache/zookeeper/server/quorum/Follower.java#L91:L85
> for
> details. And once the socket is close, it will make follower fails to write
> (I guess same socket is used here) which will be treated as an severe
> unrecoverable error, and then shutdown follower, see
>
> https://github.com/apache/zookeeper/blob/release-3.4.10/src/java/main/org/apache/zookeeper/server/quorum/FollowerRequestProcessor.java#L90:L95
>  and
>
> https://github.com/apache/zookeeper/blob/release-3.4.10/src/java/main/org/apache/zookeeper/server/ZooKeeperCriticalThread.java#L48:L51
> .
>
> So it seems shutting down follower when it cannot read from leader is the
> design behavior? Or if my understanding is wrong can you please let me know
> the design behavior in this case? Thanks!
>
>
> Regards,
> Qian Zhang
>
>
> On Wed, May 22, 2019 at 8:52 AM Qian Zhang <zh...@gmail.com> wrote:
>
> > Anyone has any ideas?
> >
> > Regards,
> > Qian Zhang
> >
> >
> > On Sun, May 19, 2019 at 6:15 PM Qian Zhang <zh...@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> I have a ZooKeeper cluster which has 5 nodes. Today the leader cannot be
> >> connected due to a hardware issue, and then I found the 4 followers just
> >> shutdown, here is the logs:
> >>
> >>> May 18 15:34:28 MD001076 java[29148]: [myid:1] WARN
> >>> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when
> >>> following the leader
> >>>                                       java.net.SocketTimeoutException:
> >>> Read timed out
> >>>                                         at
> >>> java.net.SocketInputStream.socketRead0(Native Method)
> >>>                                         at
> >>> java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> >>>                                         at
> >>> java.net.SocketInputStream.read(SocketInputStream.java:171)
> >>>                                         at
> >>> java.net.SocketInputStream.read(SocketInputStream.java:141)
> >>>                                         at
> >>> java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
> >>>                                         at
> >>> java.io.BufferedInputStream.read(BufferedInputStream.java:265)
> >>>                                         at
> >>> java.io.DataInputStream.readInt(DataInputStream.java:387)
> >>>                                         at
> >>> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> >>>                                         at
> >>>
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
> >>>                                         at
> >>>
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
> >>>                                         at
> >>> org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
> >>>                                         at
> >>>
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
> >>>                                         at
> >>> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:937)
> >>> May 18 15:34:28 MD001076 java[29148]: [myid:1] INFO
> >>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] -
> >>> Accepted socket connectio
> >>> n from /10.249.255.10:42306
> >>> May 18 15:34:28 MD001076 java[29148]: [myid:1] WARN
> >>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@896] -
> >>> Connection request from old cl
> >>> ient /10.249.255.10:42306; will be dropped if server is in r-o mode
> >>> May 18 15:34:28 MD001076 java[29148]: [myid:1] INFO
> >>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@942] -
> >>> Client attempting to establish
> >>>  new session at /10.249.255.10:42306
> >>> May 18 15:34:28 MD001076 java[29148]: [myid:1] ERROR
> >>> [FollowerRequestProcessor:1:ZooKeeperCriticalThread@49] - Severe
> >>> unrecoverable error, from threa
> >>> d : FollowerRequestProcessor:1
> >>>                                       java.net.SocketException: Socket
> >>> closed
> >>>                                         at
> >>> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:118)
> >>>                                         at
> >>> java.net.SocketOutputStream.write(SocketOutputStream.java:155)
> >>>                                         at
> >>> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
> >>>                                         at
> >>> java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
> >>>                                         at
> >>>
> org.apache.zookeeper.server.quorum.Learner.writePacket(Learner.java:139)
> >>>                                         at
> >>> org.apache.zookeeper.server.quorum.Learner.request(Learner.java:188)
> >>>                                         at
> >>>
> org.apache.zookeeper.server.quorum.FollowerRequestProcessor.run(FollowerRequestProcessor.java:90)
> >>> May 18 15:34:28 MD001076 java[29148]: [myid:1] INFO
> >>> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown
> called
> >>>                                       java.lang.Exception: shutdown
> >>> Follower
> >>>                                         at
> >>> org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
> >>>                                         at
> >>> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:941)
> >>
> >>
> >> I am confused why all followers shutdown in this case which makes the
> >> whole ZooKeeper unusable for a short period, shouldn't they elect a new
> >> leader instead? Thanks!
> >>
> >>
> >> Regards,
> >> Qian Zhang
> >>
> >
>

Re: Why does ZooKeeper follower shutdown itself when it can not read from leader

Posted by Qian Zhang <zh...@gmail.com>.
I see, thank you Patrick!


Regards,
Qian Zhang


On Thu, May 23, 2019 at 9:26 AM Qian Zhang <zh...@gmail.com> wrote:

> Hi Andor,
>
> I am using ZooKeeper release 3.4.10.
>
> I checked the code, if follower fails to read from leader (e.g., read
> timeout), it will close the socket, see
> https://github.com/apache/zookeeper/blob/release-3.4.10/src/java/main/org/apache/zookeeper/server/quorum/Follower.java#L91:L85 for
> details. And once the socket is close, it will make follower fails to write
> (I guess same socket is used here) which will be treated as an severe
> unrecoverable error, and then shutdown follower, see
> https://github.com/apache/zookeeper/blob/release-3.4.10/src/java/main/org/apache/zookeeper/server/quorum/FollowerRequestProcessor.java#L90:L95
>  and
> https://github.com/apache/zookeeper/blob/release-3.4.10/src/java/main/org/apache/zookeeper/server/ZooKeeperCriticalThread.java#L48:L51
> .
>
> So it seems shutting down follower when it cannot read from leader is the
> design behavior? Or if my understanding is wrong can you please let me know
> the design behavior in this case? Thanks!
>
>
> Regards,
> Qian Zhang
>
>
> On Wed, May 22, 2019 at 8:52 AM Qian Zhang <zh...@gmail.com> wrote:
>
>> Anyone has any ideas?
>>
>> Regards,
>> Qian Zhang
>>
>>
>> On Sun, May 19, 2019 at 6:15 PM Qian Zhang <zh...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I have a ZooKeeper cluster which has 5 nodes. Today the leader cannot be
>>> connected due to a hardware issue, and then I found the 4 followers just
>>> shutdown, here is the logs:
>>>
>>>> May 18 15:34:28 MD001076 java[29148]: [myid:1] WARN
>>>> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when
>>>> following the leader
>>>>                                       java.net.SocketTimeoutException:
>>>> Read timed out
>>>>                                         at
>>>> java.net.SocketInputStream.socketRead0(Native Method)
>>>>                                         at
>>>> java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>>>>                                         at
>>>> java.net.SocketInputStream.read(SocketInputStream.java:171)
>>>>                                         at
>>>> java.net.SocketInputStream.read(SocketInputStream.java:141)
>>>>                                         at
>>>> java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>>>>                                         at
>>>> java.io.BufferedInputStream.read(BufferedInputStream.java:265)
>>>>                                         at
>>>> java.io.DataInputStream.readInt(DataInputStream.java:387)
>>>>                                         at
>>>> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>>>>                                         at
>>>> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
>>>>                                         at
>>>> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
>>>>                                         at
>>>> org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
>>>>                                         at
>>>> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
>>>>                                         at
>>>> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:937)
>>>> May 18 15:34:28 MD001076 java[29148]: [myid:1] INFO
>>>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] -
>>>> Accepted socket connection from /10.249.255.10:42306
>>>> May 18 15:34:28 MD001076 java[29148]: [myid:1] WARN
>>>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@896] -
>>>> Connection request from old client /10.249.255.10:42306; will be dropped
>>>> if server is in r-o mode
>>>> May 18 15:34:28 MD001076 java[29148]: [myid:1] INFO
>>>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@942] -
>>>> Client attempting to establish new session at /10.249.255.10:42306
>>>> May 18 15:34:28 MD001076 java[29148]: [myid:1] ERROR
>>>> [FollowerRequestProcessor:1:ZooKeeperCriticalThread@49] - Severe
>>>> unrecoverable error, from thread : FollowerRequestProcessor:1
>>>>                                       java.net.SocketException: Socket
>>>> closed
>>>>                                         at
>>>> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:118)
>>>>                                         at
>>>> java.net.SocketOutputStream.write(SocketOutputStream.java:155)
>>>>                                         at
>>>> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>>>>                                         at
>>>> java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
>>>>                                         at
>>>> org.apache.zookeeper.server.quorum.Learner.writePacket(Learner.java:139)
>>>>                                         at
>>>> org.apache.zookeeper.server.quorum.Learner.request(Learner.java:188)
>>>>                                         at
>>>> org.apache.zookeeper.server.quorum.FollowerRequestProcessor.run(FollowerRequestProcessor.java:90)
>>>> May 18 15:34:28 MD001076 java[29148]: [myid:1] INFO
>>>> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown
>>>> called
>>>>                                       java.lang.Exception: shutdown
>>>> Follower
>>>>                                         at
>>>> org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
>>>>                                         at
>>>> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:941)
>>>
>>>
>>> I am confused about why all followers shut down in this case, which makes
>>> the whole ZooKeeper cluster unusable for a short period. Shouldn't they
>>> elect a new leader instead? Thanks!
>>>
>>>
>>> Regards,
>>> Qian Zhang
>>>
>>

Re: Why does ZooKeeper follower shutdown itself when it can not read from leader

Posted by Qian Zhang <zh...@gmail.com>.
Hi Andor,

I am using ZooKeeper release 3.4.10.

I checked the code: if the follower fails to read from the leader (e.g., a
read timeout), it closes the socket, see
https://github.com/apache/zookeeper/blob/release-3.4.10/src/java/main/org/apache/zookeeper/server/quorum/Follower.java#L91:L85
for
details. Once the socket is closed, the follower's next write fails
(I guess the same socket is used here), which is treated as a severe
unrecoverable error, and the follower then shuts down, see
https://github.com/apache/zookeeper/blob/release-3.4.10/src/java/main/org/apache/zookeeper/server/quorum/FollowerRequestProcessor.java#L90:L95
 and
https://github.com/apache/zookeeper/blob/release-3.4.10/src/java/main/org/apache/zookeeper/server/ZooKeeperCriticalThread.java#L48:L51
.
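
The failure mode guessed at above can be reproduced outside ZooKeeper with a
minimal Java sketch. It assumes, as the guess does, that the reader and the
writer share one socket. The class and method names are hypothetical and this
is not ZooKeeper code; it only shows that writing through a stream obtained
before the socket was closed raises a SocketException, which matches the
"Socket closed" entry in the log.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Illustrative only: hypothetical names, not part of ZooKeeper.
final class ClosedSocketWriteDemo {

    /**
     * Opens a loopback connection, keeps a buffered output stream for it,
     * closes the socket (standing in for the reader side giving up after a
     * timeout), and then writes through the stale stream.
     *
     * @return the exception raised by the write, or null if it succeeded
     */
    static IOException writeAfterClose() {
        try (ServerSocket server =
                 new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Socket sock = new Socket(server.getInetAddress(), server.getLocalPort());
            // The writer obtains its stream while the socket is still open.
            OutputStream out = new BufferedOutputStream(sock.getOutputStream());
            // The reader side closes the shared socket.
            sock.close();
            try {
                out.write(new byte[] {1, 2, 3, 4});
                out.flush(); // the buffered bytes reach the closed socket here
                return null;
            } catch (IOException writeFailure) {
                return writeFailure;
            }
        } catch (IOException setupFailure) {
            // Failing to set up the loopback connection is not the case under study.
            throw new UncheckedIOException(setupFailure);
        }
    }

    public static void main(String[] args) {
        IOException e = writeAfterClose();
        System.out.println(e == null
            ? "write succeeded"
            : e.getClass().getName() + ": " + e.getMessage());
    }
}
```

In the sketch the caller simply receives the exception. Whether a server
treats such a write failure as fatal is a separate design decision made by the
thread that performs the write.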

So it seems that shutting down the follower when it cannot read from the
leader is the designed behavior? If my understanding is wrong, can you please
let me know the designed behavior in this case? Thanks!


Regards,
Qian Zhang


On Wed, May 22, 2019 at 8:52 AM Qian Zhang <zh...@gmail.com> wrote:

> Does anyone have any ideas?
>
> Regards,
> Qian Zhang
>
>
> On Sun, May 19, 2019 at 6:15 PM Qian Zhang <zh...@gmail.com> wrote:
>
>> Hi,
>>
>> I have a ZooKeeper cluster which has 5 nodes. Today the leader could not be
>> reached due to a hardware issue, and then I found that the 4 followers just
>> shut down. Here are the logs:
>>
>>> May 18 15:34:28 MD001076 java[29148]: [myid:1] WARN
>>> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when
>>> following the leader
>>>                                       java.net.SocketTimeoutException:
>>> Read timed out
>>>                                         at
>>> java.net.SocketInputStream.socketRead0(Native Method)
>>>                                         at
>>> java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>>>                                         at
>>> java.net.SocketInputStream.read(SocketInputStream.java:171)
>>>                                         at
>>> java.net.SocketInputStream.read(SocketInputStream.java:141)
>>>                                         at
>>> java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>>>                                         at
>>> java.io.BufferedInputStream.read(BufferedInputStream.java:265)
>>>                                         at
>>> java.io.DataInputStream.readInt(DataInputStream.java:387)
>>>                                         at
>>> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>>>                                         at
>>> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
>>>                                         at
>>> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
>>>                                         at
>>> org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
>>>                                         at
>>> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
>>>                                         at
>>> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:937)
>>> May 18 15:34:28 MD001076 java[29148]: [myid:1] INFO
>>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] -
>>> Accepted socket connection from /10.249.255.10:42306
>>> May 18 15:34:28 MD001076 java[29148]: [myid:1] WARN
>>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@896] -
>>> Connection request from old client /10.249.255.10:42306; will be dropped
>>> if server is in r-o mode
>>> May 18 15:34:28 MD001076 java[29148]: [myid:1] INFO
>>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@942] -
>>> Client attempting to establish new session at /10.249.255.10:42306
>>> May 18 15:34:28 MD001076 java[29148]: [myid:1] ERROR
>>> [FollowerRequestProcessor:1:ZooKeeperCriticalThread@49] - Severe
>>> unrecoverable error, from thread : FollowerRequestProcessor:1
>>>                                       java.net.SocketException: Socket
>>> closed
>>>                                         at
>>> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:118)
>>>                                         at
>>> java.net.SocketOutputStream.write(SocketOutputStream.java:155)
>>>                                         at
>>> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>>>                                         at
>>> java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
>>>                                         at
>>> org.apache.zookeeper.server.quorum.Learner.writePacket(Learner.java:139)
>>>                                         at
>>> org.apache.zookeeper.server.quorum.Learner.request(Learner.java:188)
>>>                                         at
>>> org.apache.zookeeper.server.quorum.FollowerRequestProcessor.run(FollowerRequestProcessor.java:90)
>>> May 18 15:34:28 MD001076 java[29148]: [myid:1] INFO
>>> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called
>>>                                       java.lang.Exception: shutdown
>>> Follower
>>>                                         at
>>> org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
>>>                                         at
>>> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:941)
>>
>>
>> I am confused about why all followers shut down in this case, which makes
>> the whole ZooKeeper cluster unusable for a short period. Shouldn't they
>> elect a new leader instead? Thanks!
>>
>>
>> Regards,
>> Qian Zhang
>>
>

Re: Why does ZooKeeper follower shutdown itself when it can not read from leader

Posted by Qian Zhang <zh...@gmail.com>.
Does anyone have any ideas?

Regards,
Qian Zhang


On Sun, May 19, 2019 at 6:15 PM Qian Zhang <zh...@gmail.com> wrote:

> Hi,
>
> I have a ZooKeeper cluster which has 5 nodes. Today the leader could not be
> reached due to a hardware issue, and then I found that the 4 followers just
> shut down. Here are the logs:
>
>> May 18 15:34:28 MD001076 java[29148]: [myid:1] WARN
>> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when
>> following the leader
>>                                       java.net.SocketTimeoutException:
>> Read timed out
>>                                         at
>> java.net.SocketInputStream.socketRead0(Native Method)
>>                                         at
>> java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>>                                         at
>> java.net.SocketInputStream.read(SocketInputStream.java:171)
>>                                         at
>> java.net.SocketInputStream.read(SocketInputStream.java:141)
>>                                         at
>> java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>>                                         at
>> java.io.BufferedInputStream.read(BufferedInputStream.java:265)
>>                                         at
>> java.io.DataInputStream.readInt(DataInputStream.java:387)
>>                                         at
>> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>>                                         at
>> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
>>                                         at
>> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
>>                                         at
>> org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
>>                                         at
>> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
>>                                         at
>> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:937)
>> May 18 15:34:28 MD001076 java[29148]: [myid:1] INFO
>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] -
>> Accepted socket connection from /10.249.255.10:42306
>> May 18 15:34:28 MD001076 java[29148]: [myid:1] WARN
>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@896] -
>> Connection request from old client /10.249.255.10:42306; will be dropped
>> if server is in r-o mode
>> May 18 15:34:28 MD001076 java[29148]: [myid:1] INFO
>> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@942] - Client
>> attempting to establish new session at /10.249.255.10:42306
>> May 18 15:34:28 MD001076 java[29148]: [myid:1] ERROR
>> [FollowerRequestProcessor:1:ZooKeeperCriticalThread@49] - Severe
>> unrecoverable error, from thread : FollowerRequestProcessor:1
>>                                       java.net.SocketException: Socket
>> closed
>>                                         at
>> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:118)
>>                                         at
>> java.net.SocketOutputStream.write(SocketOutputStream.java:155)
>>                                         at
>> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>>                                         at
>> java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
>>                                         at
>> org.apache.zookeeper.server.quorum.Learner.writePacket(Learner.java:139)
>>                                         at
>> org.apache.zookeeper.server.quorum.Learner.request(Learner.java:188)
>>                                         at
>> org.apache.zookeeper.server.quorum.FollowerRequestProcessor.run(FollowerRequestProcessor.java:90)
>> May 18 15:34:28 MD001076 java[29148]: [myid:1] INFO
>> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called
>>                                       java.lang.Exception: shutdown
>> Follower
>>                                         at
>> org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
>>                                         at
>> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:941)
>
>
> I am confused about why all followers shut down in this case, which makes
> the whole ZooKeeper cluster unusable for a short period. Shouldn't they
> elect a new leader instead? Thanks!
>
>
> Regards,
> Qian Zhang
>
