Posted to user@hbase.apache.org by Jeff Whiting <je...@qualtrics.com> on 2012/09/05 17:36:14 UTC

RegionServer not shutting down in a timely manner

I had to stop all the region servers in my cluster, but they get stuck for a long time during 
shutdown.  They do eventually shut down.

I think this may be the reason for the long shutdown time:

2012-09-05 09:26:31,436 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: 
Encountered a SocketTimeoutException. Since thecall to the remote cluster timed out, which is 
usually caused by a machine failure or a massive slowdown, sleeping 1000 times 100
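
For a sense of scale, here is a rough sketch of what that message would imply if the two numbers are 
a base sleep in milliseconds and a multiplier that get multiplied together. That is only my reading 
of the log text; I have not confirmed it against the source. The class and method names below are 
made up for illustration and are not HBase code.

```java
import java.util.concurrent.TimeUnit;

public class RetrySleepEstimate {
    // Assumption: "sleeping 1000 times 100" means baseMillis * multiplier milliseconds.
    static long totalSleepMillis(long baseMillis, int multiplier) {
        return baseMillis * multiplier;
    }

    public static void main(String[] args) {
        long millis = totalSleepMillis(1000L, 100);
        // Convert to seconds so the number is easier to compare with the shutdown delay.
        System.out.println(millis + " ms = " + TimeUnit.MILLISECONDS.toSeconds(millis) + " s");
    }
}
```

If that reading is right, a single sleep would last 100 seconds, which is in the same range as the 
delay I am seeing.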

I've included some logs from the shutdown as well as a jstack from when it is waiting for shutdown.
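
In the jstack, the "regionserver60020" thread is in a timed Thread.join on the ReplicationSource 
while the replicationSource thread is in Thread.sleep. The sketch below is a minimal stand-alone 
reproduction of that pattern, not HBase code: one thread sleeps, another joins on it with a timeout, 
and the wait is short only if the sleeper gets interrupted. All names and durations are hypothetical, 
and whether HBase interrupts the sleeping thread on shutdown is exactly what I am unsure about.

```java
public class JoinOnSleepingThread {
    // Returns roughly how many milliseconds the caller waited for the worker to exit.
    static long waitForWorker(final long workerSleepMillis, long joinTimeoutMillis, boolean interrupt) {
        Thread worker = new Thread(new Runnable() {
            public void run() {
                try {
                    Thread.sleep(workerSleepMillis); // stands in for a long retry sleep
                } catch (InterruptedException e) {
                    // Interrupted: fall through so the thread can exit right away.
                }
            }
        }, "sleeping-worker");
        worker.setDaemon(true); // do not keep the JVM alive for the demo
        worker.start();

        long start = System.nanoTime();
        if (interrupt) {
            worker.interrupt(); // wakes the sleep (or makes it throw immediately)
        }
        try {
            worker.join(joinTimeoutMillis); // waits until the worker exits or the timeout passes
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the caller's interrupt status
        }
        return (System.nanoTime() - start) / 1000000L;
    }

    public static void main(String[] args) {
        System.out.println("without interrupt: ~" + waitForWorker(2000, 500, false) + " ms");
        System.out.println("with interrupt:    ~" + waitForWorker(2000, 500, true) + " ms");
    }
}
```

Without the interrupt, the join waits for its full timeout because the worker is still asleep; with 
it, the join returns almost immediately.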

Thanks,
~Jeff


2012-09-05 09:25:59,584 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: 
regionserver:60020-0x33959ee4a000331 Received ZooKeeper Event, type=NodeChildrenChanged, 
state=SyncConnected, path=/ut1.h1/rs
2012-09-05 09:25:59,587 INFO org.apache.zookeeper.ZooKeeper: Session: 0x33959ee4a000331 closed
2012-09-05 09:25:59,587 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server 
ds7.h1.ut1.qprod.net,60020,1346857766978; zookeeper connection closed.
2012-09-05 09:25:59,587 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2012-09-05 09:25:59,587 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Waiting for 
Split Thread to finish...
2012-09-05 09:25:59,590 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Waiting for 
Large Compaction Thread to finish...
2012-09-05 09:25:59,590 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Waiting for 
Small Compaction Thread to finish...
2012-09-05 09:25:59,590 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: 
Closing source 1 because: Region server is closing
2012-09-05 09:26:00,609 INFO org.apache.hadoop.hbase.regionserver.Leases: 
regionserver60020.leaseChecker closing leases
2012-09-05 09:26:00,617 INFO org.apache.hadoop.hbase.regionserver.Leases: 
regionserver60020.leaseChecker closed leases
2012-09-05 09:26:31,436 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: 
Encountered a SocketTimeoutException. Since thecall to the remote cluster timed out, which is 
usually caused by a machine failure or a massive slowdown, sleeping 1000 times 100




[prod.ut1.vw][root@ds7 ~]# sudo -u hbase /usr/java/default/bin/jstack 14870
2012-09-05 09:27:32
Full thread dump Java HotSpot(TM) 64-Bit Server VM (17.0-b16 mixed mode):

"Attach Listener" daemon prio=10 tid=0x000000004a86c000 nid=0x4f9e waiting on condition 
[0x0000000000000000]
    java.lang.Thread.State: RUNNABLE

"Thread-8" prio=10 tid=0x000000004a7a4000 nid=0x4d5d in Object.wait() [0x000000004c371000]
    java.lang.Thread.State: WAITING (on object monitor)
     at java.lang.Object.wait(Native Method)
     at java.lang.Thread.join(Thread.java:1143)
     - locked <0x00002aaab3543090> (a java.lang.Thread)
     at org.apache.hadoop.hbase.util.Threads.shutdown(Threads.java:93)
     at org.apache.hadoop.hbase.util.Threads.shutdown(Threads.java:81)
     at org.apache.hadoop.hbase.regionserver.ShutdownHook$ShutdownHookThread.run(ShutdownHook.java:114)
     at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)

"SIGTERM handler" daemon prio=10 tid=0x000000004a9d1000 nid=0x4d5b in Object.wait() [0x000000004c16f000]
    java.lang.Thread.State: WAITING (on object monitor)
     at java.lang.Object.wait(Native Method)
     at java.lang.Thread.join(Thread.java:1143)
     - locked <0x00002aaab51d9e50> (a org.apache.hadoop.util.ShutdownHookManager$1)
     at java.lang.Thread.join(Thread.java:1196)
     at java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:79)
     at java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:24)
     at java.lang.Shutdown.runHooks(Shutdown.java:79)
     at java.lang.Shutdown.sequence(Shutdown.java:123)
     at java.lang.Shutdown.exit(Shutdown.java:168)
     - locked <0x00002aad1f5da530> (a java.lang.Class for java.lang.Shutdown)
     at java.lang.Terminator$1.handle(Terminator.java:35)
     at sun.misc.Signal$1.run(Signal.java:195)
     at java.lang.Thread.run(Thread.java:619)

"regionserver60020.replicationSource,1" daemon prio=10 tid=0x00002aad3c0a6800 nid=0x3a96 waiting on 
condition [0x000000004023d000]
    java.lang.Thread.State: TIMED_WAITING (sleeping)
     at java.lang.Thread.sleep(Native Method)
     at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.sleepForRetries(ReplicationSource.java:550)
     at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:631)
     at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:368)

"Timer thread for monitoring jvm" daemon prio=10 tid=0x00002aad3c1a4800 nid=0x3a8e in Object.wait() 
[0x0000000043560000]
    java.lang.Thread.State: TIMED_WAITING (on object monitor)
     at java.lang.Object.wait(Native Method)
     at java.util.TimerThread.mainLoop(Timer.java:509)
     - locked <0x00002aaab525ad40> (a java.util.TaskQueue)
     at java.util.TimerThread.run(Timer.java:462)

"Timer thread for monitoring hbase" daemon prio=10 tid=0x00002aad3c18a000 nid=0x3a8d in 
Object.wait() [0x000000004345f000]
    java.lang.Thread.State: TIMED_WAITING (on object monitor)
     at java.lang.Object.wait(Native Method)
     at java.util.TimerThread.mainLoop(Timer.java:509)
     - locked <0x00002aaab525ad60> (a java.util.TaskQueue)
     at java.util.TimerThread.run(Timer.java:462)

"regionserver60020.decayingSampleTick.1" daemon prio=10 tid=0x00002aad3c1a2000 nid=0x3a8c waiting on 
condition [0x000000004335e000]
    java.lang.Thread.State: TIMED_WAITING (parking)
     at sun.misc.Unsafe.park(Native Method)
     - parking to wait for  <0x00002aaab5252ff8> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
     at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
     at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025)
     at java.util.concurrent.DelayQueue.take(DelayQueue.java:164)
     at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:583)
     at 
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:576)
     at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
     at java.lang.Thread.run(Thread.java:619)

"regionserver60020-EventThread" daemon prio=10 tid=0x00002aad3c1a8000 nid=0x3a88 waiting on 
condition [0x00000000414e7000]
    java.lang.Thread.State: WAITING (parking)
     at sun.misc.Unsafe.park(Native Method)
     - parking to wait for  <0x00002aaab5253048> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
     at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
     at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:493)

"regionserver60020-SendThread(zk2.ut1.qprod.net:2181)" daemon prio=10 tid=0x00002aad3c1ad800 
nid=0x3a87 runnable [0x0000000040b0e000]
    java.lang.Thread.State: RUNNABLE
     at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
     at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
     at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
     at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
     - locked <0x00002aaab5252608> (a sun.nio.ch.Util$1)
     - locked <0x00002aaab52525f0> (a java.util.Collections$UnmodifiableSet)
     - locked <0x00002aaab524fe00> (a sun.nio.ch.EPollSelectorImpl)
     at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:274)
     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1039)

"DestroyJavaVM" prio=10 tid=0x00002aad38b04800 nid=0x3a61 waiting on condition [0x0000000000000000]
    java.lang.Thread.State: RUNNABLE

"regionserver60020-EventThread" daemon prio=10 tid=0x00002aad3c395800 nid=0x3a84 waiting on 
condition [0x0000000040ebf000]
    java.lang.Thread.State: WAITING (parking)
     at sun.misc.Unsafe.park(Native Method)
     - parking to wait for  <0x00002aaab36ca1e8> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
     at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
     at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:493)

"regionserver60020-SendThread(zk5.ut1.qprod.net:2181)" daemon prio=10 tid=0x00002aad3c394800 
nid=0x3a83 runnable [0x0000000040c63000]
    java.lang.Thread.State: RUNNABLE
     at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
     at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
     at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
     at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
     - locked <0x00002aaab35aa670> (a sun.nio.ch.Util$1)
     - locked <0x00002aaab35aa688> (a java.util.Collections$UnmodifiableSet)
     - locked <0x00002aaab35b4bd8> (a sun.nio.ch.EPollSelectorImpl)
     at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:274)
     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1039)

"HftpFileSystem-DelegationTokenRenewer" daemon prio=10 tid=0x00002aad38ab6000 nid=0x3a82 waiting on 
condition [0x000000004053d000]
    java.lang.Thread.State: WAITING (parking)
     at sun.misc.Unsafe.park(Native Method)
     - parking to wait for  <0x00002aaab36efb78> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
     at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
     at java.util.concurrent.DelayQueue.take(DelayQueue.java:160)
     at 
org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenRenewer.run(DelegationTokenRenewer.java:152)

"regionserver60020" prio=10 tid=0x00002aad38a9a800 nid=0x3a7f in Object.wait() [0x000000004315c000]
    java.lang.Thread.State: TIMED_WAITING (on object monitor)
     at java.lang.Object.wait(Native Method)
     at java.lang.Thread.join(Thread.java:1151)
     - locked <0x00002aaab51dd798> (a 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource)
     at org.apache.hadoop.hbase.util.Threads.shutdown(Threads.java:93)
     at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.terminate(ReplicationSource.java:708)
     at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.terminate(ReplicationSource.java:695)
     at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.join(ReplicationSourceManager.java:237)
     at org.apache.hadoop.hbase.replication.regionserver.Replication.join(Replication.java:98)
     at org.apache.hadoop.hbase.regionserver.HRegionServer.join(HRegionServer.java:1607)
     at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:765)
     at java.lang.Thread.run(Thread.java:619)

"Timer thread for monitoring rpc" daemon prio=10 tid=0x00002aad38a58800 nid=0x3a7c in Object.wait() 
[0x0000000040dbe000]
    java.lang.Thread.State: TIMED_WAITING (on object monitor)
     at java.lang.Object.wait(Native Method)
     at java.util.TimerThread.mainLoop(Timer.java:509)
     - locked <0x00002aaab36d9338> (a java.util.TaskQueue)
     at java.util.TimerThread.run(Timer.java:462)

"Low Memory Detector" daemon prio=10 tid=0x00002aad38022800 nid=0x3a6e runnable [0x0000000000000000]
    java.lang.Thread.State: RUNNABLE

"CompilerThread1" daemon prio=10 tid=0x00002aad38020000 nid=0x3a6d waiting on condition 
[0x0000000000000000]
    java.lang.Thread.State: RUNNABLE

"CompilerThread0" daemon prio=10 tid=0x00002aad3801e000 nid=0x3a6c waiting on condition 
[0x0000000000000000]
    java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x00002aad3801b800 nid=0x3a6b runnable [0x0000000000000000]
    java.lang.Thread.State: RUNNABLE

"Surrogate Locker Thread (CMS)" daemon prio=10 tid=0x00002aad38019800 nid=0x3a6a waiting on 
condition [0x0000000000000000]
    java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0x000000004a5ed000 nid=0x3a69 in Object.wait() [0x000000004204b000]
    java.lang.Thread.State: WAITING (on object monitor)
     at java.lang.Object.wait(Native Method)
     at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
     - locked <0x00002aaab35439c0> (a java.lang.ref.ReferenceQueue$Lock)
     at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
     at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=10 tid=0x000000004a5eb000 nid=0x3a68 in Object.wait() 
[0x0000000041d4f000]
    java.lang.Thread.State: WAITING (on object monitor)
     at java.lang.Object.wait(Native Method)
     at java.lang.Object.wait(Object.java:485)
     at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
     - locked <0x00002aaab3543b40> (a java.lang.ref.Reference$Lock)

"VM Thread" prio=10 tid=0x000000004a5e7000 nid=0x3a67 runnable

"Gang worker#0 (Parallel GC Threads)" prio=10 tid=0x000000004a4d2000 nid=0x3a62 runnable

"Gang worker#1 (Parallel GC Threads)" prio=10 tid=0x000000004a4d4000 nid=0x3a63 runnable

"Gang worker#2 (Parallel GC Threads)" prio=10 tid=0x000000004a4d6000 nid=0x3a64 runnable

"Gang worker#3 (Parallel GC Threads)" prio=10 tid=0x000000004a4d7800 nid=0x3a65 runnable

"Concurrent Mark-Sweep GC Thread" prio=10 tid=0x000000004a566800 nid=0x3a66 runnable

"VM Periodic Task Thread" prio=10 tid=0x00002aad3802e000 nid=0x3a6f waiting on condition

JNI global references: 1558

-- 
Jeff Whiting
Qualtrics Senior Software Engineer
jeffw@qualtrics.com