You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by chenchun <ch...@gmail.com> on 2013/10/12 17:32:12 UTC

Hive ZooKeeper Connection Closed during unlock

Hi, 
Yesterday, we encountered a problem that our serveral hive processes stack on the unlockWithRetry method for a long time. 
We did a research on the dumped memory of hive and found out that there are over 400 ZooKeeperHiveLock objects, but the state of ZooKeeper Object is Closed. With 400 locks, the hive process need 1*10*400=4000 minutes to release the whole locks with default configure hive.lock.sleep.between.retries=60 hive.unlock.numretries=10. 
With a little track, I find https://issues.apache.org/jira/browse/HIVE-3913  fixes reconnect zookeeper issue only in ZooKeeperHiveLockManager#lock but not in ZooKeeperHiveLockManager#unlockWithRetry. The attachment is a imitate method of HIVE-3913 to resolve the issue. But I wonder if it is appropriate enough, or we can use cubator ? http://curator.incubator.apache.org/getting-started.html

Below is the thread and heap dump of the hive process.

Full thread dump Java HotSpot(TM) 64-Bit Server VM (23.21-b01 mixed mode):

"Attach Listener" daemon prio=10 tid=0x000000000683f000 nid=0x34d0 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
- None

"LeaseChecker" daemon prio=10 tid=0x0000000006693800 nid=0x2713 waiting on condition [0x0000000042af7000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.run(DFSClient.java:1376)
at java.lang.Thread.run(Thread.java:722)

   Locked ownable synchronizers:
- None

"Service Thread" daemon prio=10 tid=0x00002aaab8001000 nid=0x2651 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
- None

"C2 CompilerThread1" daemon prio=10 tid=0x0000000005c7c800 nid=0x2650 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
- None

"C2 CompilerThread0" daemon prio=10 tid=0x0000000005c71000 nid=0x264f waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
- None

"Signal Dispatcher" daemon prio=10 tid=0x0000000005c6f000 nid=0x264e runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
- None

"Finalizer" daemon prio=10 tid=0x0000000005c22000 nid=0x264d in Object.wait() [0x00000000427f4000]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
- locked <0x000000078324b110> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:189)

   Locked ownable synchronizers:
- None

"Reference Handler" daemon prio=10 tid=0x0000000005c1a000 nid=0x264c in Object.wait() [0x0000000041900000]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:503)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
- locked <0x000000078328fbc0> (a java.lang.ref.Reference$Lock)

   Locked ownable synchronizers:
- None

"main" prio=10 tid=0x0000000005b76800 nid=0x263d waiting on condition [0x0000000040f46000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager.unlockWithRetry(ZooKeeperHiveLockManager.java:426)
at org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager.unlock(ZooKeeperHiveLockManager.java:415)
at org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager.releaseLocks(ZooKeeperHiveLockManager.java:257)
at org.apache.hadoop.hive.ql.Driver.releaseLocks(Driver.java:864)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:953)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:348)
at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:446)
at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:456)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:712)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

   Locked ownable synchronizers:
- None

"VM Thread" prio=10 tid=0x0000000005c12800 nid=0x264b runnable 

"GC task thread#0 (ParallelGC)" prio=10 tid=0x0000000005b84800 nid=0x263e runnable 

"GC task thread#1 (ParallelGC)" prio=10 tid=0x0000000005b86000 nid=0x263f runnable 

"GC task thread#2 (ParallelGC)" prio=10 tid=0x0000000005b88000 nid=0x2640 runnable 

"GC task thread#3 (ParallelGC)" prio=10 tid=0x0000000005b8a000 nid=0x2641 runnable 

"GC task thread#4 (ParallelGC)" prio=10 tid=0x0000000005b8b800 nid=0x2642 runnable 

"GC task thread#5 (ParallelGC)" prio=10 tid=0x0000000005b8d800 nid=0x2643 runnable 

"GC task thread#6 (ParallelGC)" prio=10 tid=0x0000000005b8f800 nid=0x2644 runnable 

"GC task thread#7 (ParallelGC)" prio=10 tid=0x0000000005b91000 nid=0x2645 runnable 

"GC task thread#8 (ParallelGC)" prio=10 tid=0x0000000005b93000 nid=0x2646 runnable 

"GC task thread#9 (ParallelGC)" prio=10 tid=0x0000000005b95000 nid=0x2647 runnable 

"GC task thread#10 (ParallelGC)" prio=10 tid=0x0000000005b96800 nid=0x2648 runnable 

"GC task thread#11 (ParallelGC)" prio=10 tid=0x0000000005b98800 nid=0x2649 runnable 

"GC task thread#12 (ParallelGC)" prio=10 tid=0x0000000005b9a800 nid=0x264a runnable 

"VM Periodic Task Thread" prio=10 tid=0x00002aaab800c000 nid=0x2652 waiting on condition 

JNI global references: 294







-- 
chenchun