Posted to user@zookeeper.apache.org by Gunnar Wagenknecht <gu...@wagenknecht.org> on 2012/09/05 09:43:01 UTC

ZooKeeper Cluster Crash resulted in not loadable database

Hi,

I'm investigating a crash of a ZooKeeper 3.3.4 cluster. It seems that
the cause of the crash was an issue in the networking layer. All the ZK
servers suddenly lost their connections to clients as well as to each
other. Only a few seconds later, all ZooKeeper servers had issues
loading their database because of the following exception.

ERROR [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FileTxnSnapLog@224]
Failed to increment parent cversion for: /a/b/c
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /a/b/c
    at DataTree.incrementCversion(DataTree.java:1218)
    at FileTxnSnapLog.processTransaction(FileTxnSnapLog.java:222)
    at FileTxnSnapLog.restore(FileTxnSnapLog.java:150)
    at ZKDatabase.loadDataBase(ZKDatabase.java:222)
    at QuorumPeer.getLastLoggedZxid(QuorumPeer.java:493)
    at FastLeaderElection.getInitLastLoggedZxid(FastLeaderElection.java:632)
    at FastLeaderElection.lookForLeader(FastLeaderElection.java:660)
    at QuorumPeer.run(QuorumPeer.java:622)

WARN  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:QuorumPeer@497]
Unable to load database

Note that the path "/a/b/c" was different on all servers. Thus, each
server tried to restore a different transaction.
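
To compare which path failed on which server, the error message quoted above can be pulled out of each server's log. The following is only a minimal sketch: it assumes the message text is exactly "Failed to increment parent cversion for: <path>" as shown in the excerpt, and the function name failed_paths is made up for illustration.

```python
import re

# Message text taken from the log excerpt above; the path is assumed to
# run up to the next whitespace character.
FAILED_CVERSION = re.compile(r"Failed to increment parent cversion for: (\S+)")


def failed_paths(log_lines):
    """Return the distinct znode paths named in cversion errors, in first-seen order."""
    seen = []
    for line in log_lines:
        match = FAILED_CVERSION.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen
```

Calling failed_paths(open(path)) once per server log file (the file locations depend on the local log4j setup) gives one list per server, which makes it easy to see whether the servers really stumbled over different znodes.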

The only way I was able to bring the cluster back online was to delete
all the transaction logs on all servers and start with the latest snapshot.

I have all the logs and snapshots available for investigation. Are there
any tools to help an investigation? I'd like to find out how such a
network outage could possibly cause such an inconsistent/unstable state
in the system. I noticed a few stability fixes in 3.3.5/3.3.6. Thus, an
upgrade is already scheduled.

Any help is appreciated.

-Gunnar



-- 
Gunnar Wagenknecht
gunnar@wagenknecht.org
http://wagenknecht.org/


Re: ZooKeeper Cluster Crash resulted in not loadable database

Posted by Camille Fournier <ca...@apache.org>.
You can try running them through org.apache.zookeeper.server.LogFormatter
and see what comes out. That's where I would start.
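
LogFormatter takes a single transaction log file as its argument and is started with the ZooKeeper jar and log4j on the classpath; the exact jar names depend on the installation. Once its output is saved as text, a small filter helps to find the transactions that touch the znode from the error message. This is only a sketch: it assumes each transaction is printed on one line that contains the znode path, and the helper name lines_mentioning is made up for illustration.

```python
import re


def lines_mentioning(lines, path):
    """Yield (line_number, line) for lines that mention `path` or one of its children.

    The path must be followed by '/', by the end of the line, or by a
    character that cannot continue a simple znode name, so that "/a/b/c"
    does not also match "/a/b/cd".
    """
    pattern = re.compile(re.escape(path) + r"(?=/|[^\w.\-]|$)")
    for number, line in enumerate(lines, start=1):
        if pattern.search(line):
            yield number, line.rstrip("\n")
```

For example, iterating over lines_mentioning(open("formatted.txt"), "/a/b/c") (the file name is a placeholder) lists every formatted transaction that involves that znode, which shows whether the node was ever created or deleted in the log that failed to replay.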

C
