Posted to dev@zookeeper.apache.org by "Laxman (JIRA)" <ji...@apache.org> on 2011/06/24 06:58:47 UTC

[jira] [Commented] (ZOOKEEPER-1109) Zookeeper service is down when SyncRequestProcessor meets any exception.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054239#comment-13054239 ] 

Laxman commented on ZOOKEEPER-1109:
-----------------------------------

Reposting the comments and analysis.

I've also gone through Ted's earlier response on the disk-full scenario:
http://www.google.co.in/url?sa=t&source=web&cd=3&ved=0CCAQFjAC&url=http%3A%2F%2Fmail-archives.apache.org%2Fmod_mbox%2Fzookeeper-user%2F201106.mbox%2F%253CBANLkTimzQjXZvDKnP6xQLF9jHfsaz6JstA%40mail.gmail.com%253E&ei=FBQETvPWIcLNrQfk75yaDA&usg=AFQjCNFTkguyxTligpz1TZBmkqe9Osz-uw

We feel that even when the disk of one cluster member is full, the service provided by the rest of the cluster should not be interrupted.

*Thread dumps*

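The following thread dump shows the QuorumPeerMain shutdown hook thread ("Thread-2") waiting indefinitely inside SyncRequestProcessor.shutdown(), while "SyncThread:2" is blocked in System.exit() waiting for that hook to finish.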

{noformat}
"Thread-2" prio=10 tid=0x0810a400 nid=0x1695 in Object.wait() [0xac783000] 
   java.lang.Thread.State: WAITING (on object monitor) 
        at java.lang.Object.wait(Native Method) 
        - waiting on <0xb804f5e8> (a org.apache.zookeeper.server.SyncRequestProcessor) 
        at java.lang.Thread.join(Thread.java:1143) 
        - locked <0xb804f5e8> (a org.apache.zookeeper.server.SyncRequestProcessor) 
        at java.lang.Thread.join(Thread.java:1196) 
        at org.apache.zookeeper.server.SyncRequestProcessor.shutdown(SyncRequestProcessor.java:171) 
        at org.apache.zookeeper.server.quorum.ProposalRequestProcessor.shutdown(ProposalRequestProcessor.java:79) 
        at org.apache.zookeeper.server.PrepRequestProcessor.shutdown(PrepRequestProcessor.java:513) 
        at org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:413) 
        at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:411) 
        at org.apache.zookeeper.server.quorum.QuorumPeer.shutdown(QuorumPeer.java:694) 
        at org.apache.zookeeper.server.quorum.QuorumPeerMain$1.run(QuorumPeerMain.java:126) 

"SyncThread:2" prio=10 tid=0xad7fd800 nid=0x4acb in Object.wait() [0xac9ba000] 
   java.lang.Thread.State: WAITING (on object monitor) 
        at java.lang.Object.wait(Native Method) 
        - waiting on <0xb8030d00> (a org.apache.zookeeper.server.quorum.QuorumPeerMain$1) 
        at java.lang.Thread.join(Thread.java:1143) 
        - locked <0xb8030d00> (a org.apache.zookeeper.server.quorum.QuorumPeerMain$1) 
        at java.lang.Thread.join(Thread.java:1196) 
        at java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:79) 
        at java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:24) 
        at java.lang.Shutdown.runHooks(Shutdown.java:79) 
        at java.lang.Shutdown.sequence(Shutdown.java:123) 
        at java.lang.Shutdown.exit(Shutdown.java:168) 
        - locked <0xf01ff3b0> (a java.lang.Class for java.lang.Shutdown) 
        at java.lang.Runtime.exit(Runtime.java:90) 
        at java.lang.System.exit(System.java:904) 
        at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:149)
{noformat}
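
The circular wait in the dump above can be modelled in a few lines. This is only a sketch and not ZooKeeper code: the class and method names are hypothetical, System.exit() is replaced by a plain join so the program terminates, and the hook uses a bounded join purely to make the hang observable. The "worker" plays the role of SyncThread (it waits for the hook, as Shutdown.runHooks does), and the "hook" plays the role of Thread-2 (it waits for the worker).

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class ShutdownJoinCycle {

    /**
     * Returns true if the hook's bounded join timed out while the worker was
     * still alive, i.e. an unbounded join() in the hook would never return.
     */
    static boolean hookJoinTimesOut(long timeoutMillis) throws InterruptedException {
        AtomicBoolean timedOut = new AtomicBoolean(false);
        Thread[] workerRef = new Thread[1];

        // Stand-in for the shutdown hook: waits for the worker to finish.
        Thread hook = new Thread(() -> {
            try {
                workerRef[0].join(timeoutMillis);
                timedOut.set(workerRef[0].isAlive());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "hook");

        // Stand-in for the thread that called System.exit(): it starts the
        // hook and then waits for it, so it cannot finish before the hook does.
        Thread worker = new Thread(() -> {
            hook.start();
            try {
                hook.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "worker");

        workerRef[0] = worker;
        worker.start();
        worker.join();
        return timedOut.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("hook join timed out: " + hookJoinTimesOut(200));
    }
}
```

With an unbounded join() in the hook, as in the dump, neither thread would ever make progress.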


*Logs*

{noformat}
2011-06-21 10:09:59,730 - FATAL [SyncThread:2:SyncRequestProcessor@148] - Severe unrecoverable error, exiting 
java.io.IOException: No space left on device 
        at java.io.FileOutputStream.writeBytes(Native Method) 
        at java.io.FileOutputStream.write(FileOutputStream.java:260) 
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65) 
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123) 
        at org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:305) 
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:324) 
        at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484) 
        at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:158) 
        at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:98) 
2011-06-21 10:09:59,732 - INFO  [Thread-2:QuorumPeer@691] - The Quorum server is going for shutdown 
2011-06-21 10:09:59,732 - INFO  [Thread-2:Leader@393] - Shutdown called 
java.lang.Exception: shutdown Leader! reason: quorum Peer shutdown 
        at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:393) 
        at org.apache.zookeeper.server.quorum.QuorumPeer.shutdown(QuorumPeer.java:694) 
        at org.apache.zookeeper.server.quorum.QuorumPeerMain$1.run(QuorumPeerMain.java:126) 
2011-06-21 10:09:59,733 - INFO  [Thread-6:Leader$LearnerCnxAcceptor@243] - exception while shutting down acceptor: java.net.SocketException: Socket closed 
2011-06-21 10:09:59,758 - INFO  [ProcessThread:-1:PrepRequestProcessor@120] - PrepRequestProcessor exited loop! 
2011-06-21 10:09:59,758 - INFO  [CommitProcessor:2:CommitProcessor@150] - CommitProcessor exited loop! 
2011-06-21 10:09:59,759 - INFO  [Thread-2:FinalRequestProcessor@379] - shutdown of request processor complete 
2011-06-21 10:10:00,000 - INFO  [SessionTracker:SessionTrackerImpl@165] - SessionTrackerImpl exited loop! 
{noformat}


> Zookeeper service is down when SyncRequestProcessor meets any exception.
> ------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1109
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1109
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.3.0, 3.3.1, 3.3.2, 3.3.3
>            Reporter: Laxman
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> *Problem* ZooKeeper does not shut down completely when the dataDir disk is full, and the ZK cluster goes into an unserviceable state.
>  
> *Scenario*
> If the disk of the leader ZooKeeper server becomes full, the server tries to shut down, but it waits indefinitely while shutting down the SyncRequestProcessor thread.
> *Root Cause* 
> The shutdown hook invokes this.join() on the same thread that triggered System.exit(11).
> When the disk fills up, the SyncRequestProcessor thread gets the exception 'No space left on device' and invokes System.exit(11) (the logs show the same). Before the JVM exits, it runs the shutdown hook of QuorumPeerMain, and the flow reaches SyncRequestProcessor.shutdown(). There, this.join() waits for the SyncRequestProcessor thread to finish, but that thread is the one blocked inside System.exit(11) waiting for the shutdown hook to complete, so neither can proceed.
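
One possible way to avoid the hang described in the root cause, sketched under assumptions: record which thread initiated the exit, and skip the join when shutdown() is asked to wait for that same thread. The names ExitInitiatorGuard, markExiting and joinUnlessExitInitiator are hypothetical and do not exist in ZooKeeper; this is an illustration of the idea, not a proposed patch.

```java
import java.util.concurrent.atomic.AtomicReference;

final class ExitInitiatorGuard {

    // Thread that started the JVM exit, if any (hypothetical bookkeeping).
    private static final AtomicReference<Thread> EXIT_INITIATOR = new AtomicReference<>();

    /** To be called on the exiting thread immediately before System.exit(...). */
    static void markExiting() {
        EXIT_INITIATOR.compareAndSet(null, Thread.currentThread());
    }

    /**
     * Joins the worker unless the worker is the thread that initiated the
     * exit; that thread cannot finish before the shutdown hooks do, so
     * joining it from a hook would wait forever.
     * Returns true if a join was performed, false if it was skipped.
     */
    static boolean joinUnlessExitInitiator(Thread worker) throws InterruptedException {
        if (worker == EXIT_INITIATOR.get()) {
            return false;
        }
        worker.join();
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        // A worker that ends normally is joined as usual.
        Thread worker = new Thread(() -> { }, "worker");
        worker.start();
        System.out.println("joined: " + joinUnlessExitInitiator(worker));
    }
}
```

In this sketch the failing thread would call markExiting() right before System.exit(11), and the shutdown path would call joinUnlessExitInitiator(this) instead of this.join(). A bounded join(timeout) would be another option with different trade-offs.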
