You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zookeeper.apache.org by "Stephen McCants (JIRA)" <ji...@apache.org> on 2010/09/03 17:08:32 UTC

[jira] Created: (ZOOKEEPER-865) Runaway thread

Runaway thread
--------------

                 Key: ZOOKEEPER-865
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-865
             Project: Zookeeper
          Issue Type: Bug
    Affects Versions: 3.3.1, 3.3.0
         Environment: Linux; Java 1.6; x86;
            Reporter: Stephen McCants
            Priority: Critical


I'm starting a standalone Zookeeper server (v3.3.1).  That starts normally and does not have a runaway thread.

Next, I start an based Eclipse application that is using ZK 3.3.0 to register itself with the ZooKeeper server (3.3.1).  The Eclipse application using the following arguments to Eclipse:

-Dzoodiscovery.autoStart=true
-Dzoodiscovery.flavor=zoodiscovery.flavor.centralized=smccants.austin.ibm.com

When the Eclipse application starts, the ZK server prints out:

2010-09-03 09:59:46,006 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@250] - Accepted socket connection from /9.53.189.11:42271
2010-09-03 09:59:46,039 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@776] - Client attempting to establish new session at /9.53.189.11:42271
2010-09-03 09:59:46,045 - INFO  [SyncThread:0:NIOServerCnxn@1579] - Established session 0x12ad81b90000002 with negotiated timeout 4000 for client /9.53.189.11:42271
2010-09-03 09:59:46,046 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@250] - Accepted socket connection from /9.53.189.11:42272
2010-09-03 09:59:46,078 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@776] - Client attempting to establish new session at /9.53.189.11:42272
2010-09-03 09:59:46,080 - INFO  [SyncThread:0:NIOServerCnxn@1579] - Established session 0x12ad81b90000003 with negotiated timeout 4000 for client /9.53.189.11:42272

Then both the Eclipse application and the ZK server go into runaway states and consume 100% of the CPU.

Here is a view from top:

  PID USER        PR  NI  VIRT    RES  SHR S %CPU %MEM    TIME+  COMMAND
4949 smccants  15   0  597m  78m 5964 S    66.2      1.0      1:03.14 autosubmitter
4876 smccants  17   0  554m  27m 6688 S    30.9       0.3     0:34.74 java

PID 4949 (autosubmitter) is the Eclipse application and is using more than twice the CPU of PID 4876 (java) which is the ZK server.  They will continue in this state indefinitely.

I can attach a debugger to the Eclipse application and if I stop the thread named "pool-1-thread-2-SendThread(smccants.austin.ibm.com:2181)" and the runaway condition stops on both the application and ZK server.  However the ZK server reports:

2010-09-03 10:03:38,001 - INFO  [SessionTracker:ZooKeeperServer@315] - Expiring session 0x12ad81b90000003, timeout of 4000ms exceeded
2010-09-03 10:03:38,002 - INFO  [ProcessThread:-1:PrepRequestProcessor@208] - Processed session termination for sessionid: 0x12ad81b90000003
2010-09-03 10:03:38,005 - INFO  [SyncThread:0:NIOServerCnxn@1434] - Closed socket connection for client /9.53.189.11:42272 which had sessionid 0x12ad81b90000003

Here is the stack trace from the suspended thread:

EPollArrayWrapper.epollWait(long, int, long, int) line: not available [native method]	
EPollArrayWrapper.poll(long) line: 215	
EPollSelectorImpl.doSelect(long) line: 77	
EPollSelectorImpl(SelectorImpl).lockAndDoSelect(long) line: 69	
EPollSelectorImpl(SelectorImpl).select(long) line: 80	
ClientCnxn$SendThread.run() line: 1066	

Any ideas what might be going wrong?

Thanks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.