Posted to user@zookeeper.apache.org by Ji Zhang <zh...@gmail.com> on 2013/01/04 04:17:52 UTC

ZooKeeper Doesn't Quit When OOM Occurs

Hi,

I'm using ZooKeeper 3.4.3, and yesterday one of the nodes went down due to an
OutOfMemoryError:

2013-01-03 18:36:58,566 [myid:3] - ERROR
[CommitProcessor:3:CommitProcessor@148] - Unexpected exception causing
CommitProcessor to exit
java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2367)
2013-01-03 18:36:58,754 [myid:3] - INFO
 [CommitProcessor:3:CommitProcessor@150] - CommitProcessor exited loop!
2013-01-03 18:37:01,276 [myid:3] - ERROR [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory$1@49] - Thread
Thread[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181,5,main] died
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.util.HashMap.newKeyIterator(HashMap.java:853)
Exception: java.lang.OutOfMemoryError thrown from the
UncaughtExceptionHandler in thread "SyncThread:3"
Exception: java.lang.OutOfMemoryError thrown from the
UncaughtExceptionHandler in thread "QuorumPeer[myid=3]/0.0.0.0:2181"
Exception: java.lang.OutOfMemoryError thrown from the
UncaughtExceptionHandler in thread "main"
2013-01-03 18:37:03,343 [myid:3] - ERROR
[SyncThread:3:SyncRequestProcessor@151] - Severe unrecoverable error,
exiting
2013-01-03 18:37:04,465 [myid:3] - ERROR
[SyncThread:3:NIOServerCnxnFactory$1@49] - Thread
Thread[SyncThread:3,5,main] died
2013-01-03 18:41:43,477 [myid:3] - INFO
 [PurgeTask:DatadirCleanupManager$PurgeTask@138] - Purge task started.
Removing file: Jan 3, 2013 5:21:21 AM
/var/zookeeper/version-2/log.4001d7bd9
Removing file: Jan 3, 2013 10:36:02 AM
 /var/zookeeper/version-2/log.4001ee156
Exception: java.lang.OutOfMemoryError thrown from the
UncaughtExceptionHandler in thread "PurgeTask"

Actually, a lot of other things are running on this server, so I don't
blame it for throwing an OOM. What bothers me is that after encountering
the OOME, the ZooKeeper process doesn't quit. I'm using supervisord to
monitor the zk process, so if ZooKeeper followed a fail-fast strategy and
exited, supervisord would restart it afterwards.
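For reference, the fail-fast behaviour described above can be sketched in Java. This is a minimal, hypothetical sketch, not ZooKeeper's code: the class FailFastOnOom and its methods are invented for illustration. It installs a default uncaught-exception handler that calls Runtime.halt() when an OutOfMemoryError (possibly wrapped as a cause) escapes a thread, so a supervisor such as supervisord sees the process exit and can restart it. It uses halt() rather than exit() because exit() runs shutdown hooks that may need heap, and the log above shows even the UncaughtExceptionHandler failing with OOM. One caveat, assuming ZooKeeper's threads behave as the log suggests: a default handler only sees throwables that no catch block or per-thread handler has already consumed, and CommitProcessor logs the error and exits its loop on its own. A JVM-level hook such as HotSpot's -XX:OnOutOfMemoryError="kill -9 %p" may therefore be the more dependable lever.

```java
// Hypothetical fail-fast helper: NOT part of ZooKeeper's API.
public final class FailFastOnOom {

    // Guard against pathological cause cycles (A -> B -> A).
    private static final int MAX_CAUSE_DEPTH = 16;

    private FailFastOnOom() {}

    /** True if t, or anything in its cause chain, is an OutOfMemoryError. */
    static boolean shouldHalt(Throwable t) {
        Throwable cur = t;
        for (int depth = 0; cur != null && depth < MAX_CAUSE_DEPTH; depth++) {
            if (cur instanceof OutOfMemoryError) {
                return true;
            }
            cur = cur.getCause(); // no allocation needed to walk the chain
        }
        return false;
    }

    /** Install a JVM-wide default handler that halts on OOM. */
    static void install(int exitCode) {
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> {
            if (shouldHalt(error)) {
                // halt() skips shutdown hooks and finalizers, which may need
                // heap we no longer have; the supervisor restarts the process.
                Runtime.getRuntime().halt(exitCode);
            } else {
                // Otherwise report roughly the way the JVM's default does.
                System.err.print("Exception in thread \"" + thread.getName() + "\" ");
                error.printStackTrace();
            }
        });
    }

    public static void main(String[] args) {
        install(1);
        System.out.println(shouldHalt(new OutOfMemoryError("Java heap space")));
        System.out.println(shouldHalt(new RuntimeException(new OutOfMemoryError())));
        System.out.println(shouldHalt(new IllegalStateException()));
    }
}
```

In a real deployment install() would be called at the very start of main, before other threads start. The exit code 1 is an arbitrary choice; whether supervisord restarts the process depends on its autorestart and exitcodes settings.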

Any explanation for this?

Thanks.