Posted to user@zookeeper.apache.org by zoo_js <sa...@cyber-itus.com> on 2019/02/27 07:05:17 UTC

Zookeeper crashes with EOF Exception

Hi all, 

We have a 3-node ZooKeeper cluster used by Vault for HA. Starting a few
days ago, the entire cluster crashes a few times per day, with all nodes
going down at exactly the same time. We are running a load test that uses
Vault for data encryption. The test generates 1,000 unique keys per minute,
and the issue started at around 270,000 keys.

The following exception is from the syslog. I am not sure what is causing
the crash. Any help on how to proceed would be appreciated.

2019-02-26 22:35:18,831 [myid:1] - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@90] - Exception when following the leader
java.io.EOFException
       at java.base/java.io.DataInputStream.readFully(DataInputStream.java:202)
       at java.base/java.io.DataInputStream.readFully(DataInputStream.java:170)
       at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:94)
       at org.apache.zookeeper.server.DataNode.deserialize(DataNode.java:165)
       at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
       at org.apache.zookeeper.server.DataTree.deserialize(DataTree.java:1076)
       at org.apache.zookeeper.server.util.SerializeUtils.deserializeSnapshot(SerializeUtils.java:130)
       at org.apache.zookeeper.server.ZKDatabase.deserializeSnapshot(ZKDatabase.java:452)
       at org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:340)
       at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:83)
       at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:981)
2019-02-26 22:35:19,349 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@169] - shutdown called
java.lang.Exception: shutdown Follower
       at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:169)

thanks 
JS

  




--
Sent from: http://zookeeper-user.578899.n2.nabble.com/

Re: Zookeeper crashes with EOF Exception

Posted by zoo_js <sa...@cyber-itus.com>.
Feb 28 12:35:15 ip-172-26-5-140 vault[17903]: 2019-02-28T12:35:15.210Z [ERROR] storage.zookeeper: failed to release distributed lock: error="zk: node does not exist"
Feb 28 12:35:15 ip-172-26-5-140 vault[17903]: 2019-02-28T12:35:15.210Z [INFO]  storage.zookeeper: launching automated distributed lock release





Re: Zookeeper crashes with EOF Exception

Posted by zoo_js <sa...@cyber-itus.com>.
I deleted the snapshots and logs, ran the script again, and got the
following error at 257530:

ZooKeeper crashes with [ERROR] storage.zookeeper: failed to release
distributed lock




Re: Zookeeper crashes with EOF Exception

Posted by Bob  Sheehan <bs...@vmware.com.INVALID>.
unsubscribe


    


Re: Zookeeper crashes with EOF Exception

Posted by zoo_js <sa...@cyber-itus.com>.
Thanks for your response.

ZooKeeper crashes when trying to create the 84,151st key. I will delete the
snapshots and logs and run the whole load test again.

I am running version 3.4.13 on AWS Lightsail, on an Ubuntu system with 2 GB
RAM, a 60 GB SSD, and 1 CPU.
Is it a good idea to delete the snapshots and logs periodically? I tried
./zkCleanup.sh -n 3, but it does not seem to delete any logs or snapshots.
Please help me out.
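
One possible reason the cleanup appeared to do nothing, sketched under assumptions: the count passed with -n is the number of most recent snapshots to keep, so a data directory holding only three snapshots has nothing left to purge. The Python sketch below models that selection only. The function names and file names are hypothetical, and it assumes snapshot files are named snapshot.<hexadecimal zxid>; it is not the actual cleanup code.

```python
def zxid_of(name):
    """Parse the hexadecimal zxid suffix of a name such as 'snapshot.1a2b'."""
    return int(name.rsplit(".", 1)[1], 16)


def snapshots_to_purge(snapshot_names, retain_count):
    """Return the snapshots outside the newest `retain_count`, oldest first."""
    newest_first = sorted(snapshot_names, key=zxid_of, reverse=True)
    return sorted(newest_first[retain_count:], key=zxid_of)


if __name__ == "__main__":
    # Hypothetical directory contents, not taken from the reporter's machine.
    three = ["snapshot.100", "snapshot.2ff", "snapshot.a0"]
    four = three + ["snapshot.10"]
    print(snapshots_to_purge(three, 3))  # nothing is old enough to remove
    print(snapshots_to_purge(four, 3))   # only the oldest snapshot is selected
```

For periodic cleanup without a manual script, ZooKeeper 3.4 also offers the autopurge.snapRetainCount and autopurge.purgeInterval settings in zoo.cfg.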





Re: Zookeeper crashes with EOF Exception

Posted by Norbert Kalmar <nk...@cloudera.com.INVALID>.
It sounds like your snapshot is corrupted. But you said ZK runs fine for
some time and then crashes?
Maybe it's an invalid PROPOSE message.
By the way, this sounds a bit similar to this issue:
https://issues.apache.org/jira/browse/ZOOKEEPER-1955

If possible, delete the snapshots and txn logs from the data dir (you will
lose your data!) and restart the cluster.
Which version of ZK are you using?


Re: Zookeeper crashes with EOF Exception

Posted by zoo_js <sa...@cyber-itus.com>.
There are 3 snapshot files totaling 1.01 GB, each around 330 MB in size.
I have 56 GB of hard disk space available.




Re: Zookeeper crashes with EOF Exception

Posted by zoo_js <sa...@cyber-itus.com>.
I am not sure how to read the snapshot. Can you help me with the command or
steps to do so? I did not run out of disk space; the machine has 60 GB of
space.

thanks 
JS
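
As a rough sketch of one way to examine a snapshot on disk, the Python function below checks whether a file looks like a finished snapshot. It rests on assumptions about the 3.4 on-disk format that should be verified against your version: that the file starts with the 4-byte magic "ZKSN" and that a completely written snapshot ends with a 4-byte big-endian length of 1 followed by the byte "/". The function name is hypothetical, and the check says nothing about whether the data in between is valid.

```python
import struct


def snapshot_looks_complete(path):
    """Return True if the file has the assumed "ZKSN" magic at the start and
    the assumed 5-byte trailer (int 1, then "/") at the end."""
    with open(path, "rb") as handle:
        if handle.read(4) != b"ZKSN":
            return False
        handle.seek(0, 2)  # move to the end of the file to learn its size
        size = handle.tell()
        if size < 4 + 5:
            return False  # too short to hold both magic and trailer
        handle.seek(size - 5)
        length, last_byte = struct.unpack(">iB", handle.read(5))
    return length == 1 and last_byte == ord("/")
```

The snapshots are typically found under the version-2 subdirectory of dataDir; a file that fails this check was probably truncated while being written.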




Re: Zookeeper crashes with EOF Exception

Posted by Norbert Kalmar <nk...@cloudera.com.INVALID>.
Hi JS,

It looks like there was a leader election, and during the sync phase
(syncWithLeader) the follower tried to deserialize the snapshot, but it is
an incomplete file, hence the EOFException.
How big is your snapshot? Did you run out of disk space?
It is also worth checking the log for fsync warnings or errors.
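
The fsync check suggested above could be done roughly like this. The Python sketch scans server log lines for slow-fsync warnings. It assumes the warning contains "fsync-ing the write ahead log" and reports the duration as "took <N>ms", which is the wording I recall from 3.4.x and may differ in other versions; the function name, threshold, and sample lines are made up for illustration.

```python
import re

# Assumed wording of the slow-fsync warning; adjust if your version differs.
FSYNC_PATTERN = re.compile(r"fsync-ing the write ahead log.*?took (\d+)ms")


def find_slow_fsyncs(lines, threshold_ms=1000):
    """Return (line_number, duration_ms) for fsync warnings at or above threshold_ms."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = FSYNC_PATTERN.search(line)
        if match and int(match.group(1)) >= threshold_ms:
            hits.append((number, int(match.group(1))))
    return hits


if __name__ == "__main__":
    # Hypothetical log excerpt, not taken from the reporter's machine.
    sample = [
        "2019-02-26 22:30:01,100 [myid:1] - WARN [SyncThread:1:FileTxnLog@338] - "
        "fsync-ing the write ahead log in SyncThread:1 took 4521ms which will "
        "adversely effect operation latency.",
        "2019-02-26 22:30:05,200 [myid:1] - INFO [main:QuorumPeer@100] - unrelated line",
    ]
    for number, duration in find_slow_fsyncs(sample):
        print(f"line {number}: fsync took {duration} ms")
```

To run it against a real log, pass an open file object instead of the sample list, since iterating over a file yields its lines.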

Hope this helps.

Regards,
Norbert
