Posted to dev@zookeeper.apache.org by Rakesh <ra...@huawei.com> on 2011/06/21 13:42:34 UTC

Zookeeper Open Source defect : Zookeeper is not shut down completely when dataDir disk space is full

Dear All,
 
 
We found an issue while testing the disk-full scenario. Please go through
my observations and let me know whether a similar issue has already been
fixed in the community. If not, we can contribute a fix.
 
 
Defect: ZooKeeper does not shut down completely when the dataDir disk is
full, and no service is available.
-----------
 
 
Scenario: 
-------------- 
If the disk holding the leader's dataDir fills up, ZooKeeper tries to shut
down, but the shutdown waits indefinitely on the SyncRequestProcessor
thread. Even after disk space is freed later, clients cannot re-establish a
connection to ZK.


Root Cause: this.join() is invoked on the same thread that invoked
System.exit(11).
-----------------
When the disk fills up, the SyncRequestProcessor thread gets the exception
'No space left on device' and invokes System.exit(11) (the following log
shows this). Before the JVM exits, it runs the ShutdownHook of
QuorumPeerMain, and the flow reaches SyncRequestProcessor.shutdown(). There,
this.join() waits for the very thread that invoked System.exit(11), while
that thread is itself blocked inside System.exit() until the shutdown hook
finishes. Neither side can proceed.

2011-06-21 10:09:59,730 - FATAL [SyncThread:2:SyncRequestProcessor@148] - Severe unrecoverable error, exiting
java.io.IOException: No space left on device
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:260)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
        at org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:305)
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:324)
        at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
        at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:158)
        at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:98)
2011-06-21 10:09:59,732 - INFO  [Thread-2:ZooKeeperPurger@108] - Shutting down zookeeper purger.
2011-06-21 10:09:59,732 - INFO  [Thread-2:QuorumPeer@691] - The Quorum server is going for shutdown
2011-06-21 10:09:59,732 - INFO  [Thread-2:Leader@393] - Shutdown called
java.lang.Exception: shutdown Leader! reason: quorum Peer shutdown
        at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:393)
        at org.apache.zookeeper.server.quorum.QuorumPeer.shutdown(QuorumPeer.java:694)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain$1.run(QuorumPeerMain.java:126)
2011-06-21 10:09:59,733 - INFO  [Thread-6:Leader$LearnerCnxAcceptor@243] - exception while shutting down acceptor: java.net.SocketException: Socket closed
2011-06-21 10:09:59,758 - INFO  [ProcessThread:-1:PrepRequestProcessor@120] - PrepRequestProcessor exited loop!
2011-06-21 10:09:59,758 - INFO  [CommitProcessor:2:CommitProcessor@150] - CommitProcessor exited loop!
2011-06-21 10:09:59,759 - INFO  [Thread-2:FinalRequestProcessor@379] - shutdown of request processor complete
2011-06-21 10:10:00,000 - INFO  [SessionTracker:SessionTrackerImpl@165] - SessionTrackerImpl exited loop!



The following thread dump shows that the QuorumPeerMain shutdown thread
(Thread-2) is waiting indefinitely inside SyncRequestProcessor.

"Thread-2" prio=10 tid=0x0810a400 nid=0x1695 in Object.wait() [0xac783000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xb804f5e8> (a org.apache.zookeeper.server.SyncRequestProcessor)
        at java.lang.Thread.join(Thread.java:1143)
        - locked <0xb804f5e8> (a org.apache.zookeeper.server.SyncRequestProcessor)
        at java.lang.Thread.join(Thread.java:1196)
        at org.apache.zookeeper.server.SyncRequestProcessor.shutdown(SyncRequestProcessor.java:171)
        at org.apache.zookeeper.server.quorum.ProposalRequestProcessor.shutdown(ProposalRequestProcessor.java:79)
        at org.apache.zookeeper.server.PrepRequestProcessor.shutdown(PrepRequestProcessor.java:513)
        at org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:413)
        at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:411)
        at org.apache.zookeeper.server.quorum.QuorumPeer.shutdown(QuorumPeer.java:694)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain$1.run(QuorumPeerMain.java:126)
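
The wait cycle in the thread dump above can be reproduced in miniature
without touching ZooKeeper. The following is only a sketch: the class and
method names are hypothetical, and a CountDownLatch stands in for the way
System.exit() blocks until all shutdown hooks have finished. A bounded
join is used so that the demo terminates, whereas the real code path
calls join() with no timeout.

```java
import java.util.concurrent.CountDownLatch;

public class ExitJoinCycleSketch {

    /** Returns true if the simulated hook was still blocked in join() at timeout. */
    static boolean hookGetsStuck(long joinTimeoutMs) {
        // Stands in for "all shutdown hooks are done", which System.exit() waits for.
        CountDownLatch hooksDone = new CountDownLatch(1);

        // Stands in for the sync thread after it has called System.exit(11):
        // it cannot finish until the hooks are done.
        Thread syncThread = new Thread(() -> {
            try {
                hooksDone.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "SyncThread-sim");
        syncThread.setDaemon(true);
        syncThread.start();

        try {
            // The calling thread plays the shutdown hook and joins the sync thread.
            syncThread.join(joinTimeoutMs);
            boolean stuck = syncThread.isAlive();

            // Break the cycle by hand so the demo can exit cleanly.
            hooksDone.countDown();
            syncThread.join();
            return stuck;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("hook stuck in join: " + hookGetsStuck(200));
    }
}
```

With an unbounded join() and a real System.exit(), nothing breaks the
cycle, which matches the WAITING state of Thread-2 shown above.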


Suggestion: 
------------------
Since the SyncRequestProcessor thread itself initiates System.exit(11),
shutdown() does not need to call this.join() in that case.
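
One possible shape for this suggestion is sketched here. It is not the
actual ZooKeeper code and not a proposed patch: the class name, the
exitRequestedHere flag and the latches are all assumptions made for
illustration, and a latch again stands in for System.exit() so the sketch
can run to completion. The idea is that the thread records that it is the
one initiating the exit, and shutdown() skips the join when that flag is
set.

```java
import java.util.concurrent.CountDownLatch;

public class SelfExitAwareProcessor extends Thread {
    // Hypothetical flag: set by this thread just before it would call System.exit().
    private volatile boolean exitRequestedHere = false;
    // Demo plumbing: signals that the simulated exit call has started.
    private final CountDownLatch exitCalled = new CountDownLatch(1);
    // Stands in for System.exit() blocking until the shutdown hooks finish.
    private final CountDownLatch hooksDone = new CountDownLatch(1);

    @Override
    public void run() {
        // Simulated fatal-error path (the "No space left on device" case).
        exitRequestedHere = true;
        exitCalled.countDown();
        try {
            hooksDone.await(); // real code would call System.exit(11) here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Called from the hook thread. Returns true if it joined, false if it skipped. */
    public boolean shutdown() {
        if (exitRequestedHere) {
            return false; // this thread is exiting the JVM itself, so do not wait for it
        }
        try {
            join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return true;
    }

    /** Demo driver: returns true if shutdown() skipped the join. */
    static boolean shutdownSkipsJoin() {
        SelfExitAwareProcessor p = new SelfExitAwareProcessor();
        p.setDaemon(true);
        p.start();
        try {
            p.exitCalled.await();          // hooks only run after exit was called
            boolean joined = p.shutdown(); // returns immediately instead of hanging
            p.hooksDone.countDown();       // "hooks finished", simulated exit proceeds
            p.join();
            return !joined;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Checking Thread.currentThread() == this inside shutdown() would not be
enough on its own, because shutdown() runs on the hook thread (Thread-2 in
the dump), not on the sync thread.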


Rakesh R
HUAWEI TECHNOLOGIES CO.,LTD.


Address: Solitaire Building, Domlur
Bangalore
Karnataka, India
www.huawei.com