Posted to issues@hive.apache.org by "Chris Nauroth (Jira)" <ji...@apache.org> on 2022/10/26 17:57:00 UTC

[jira] [Commented] (HIVE-26669) Hive Metastore become unresponsive

    [ https://issues.apache.org/jira/browse/HIVE-26669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17624650#comment-17624650 ] 

Chris Nauroth commented on HIVE-26669:
--------------------------------------

It appears these threads are stuck initializing the RawStore (ObjectStore), blocked while trying to acquire the lock that guards configuration updates:

https://github.com/apache/hive/blob/rel/release-3.1.0/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L344

There must be some other thread already holding the lock (0x00007f9ad0795c48). Do you have a full thread dump? Finding the thread that already holds the lock would be the next best step for troubleshooting. The other thread could be holding the lock for a long time for numerous reasons (hanging socket connection to the database, spinning in a loop due to some bug, etc.).
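
One way to search a full dump for the owner, as a rough Python sketch. It assumes the dump was taken with "jstack -l", so that each thread block ends with a "Locked ownable synchronizers" list (the stack quoted below has one), and that the owning thread lists the lock address there. The function name find_lock_owners is made up for illustration.

```python
import re

# A thread block in a jstack dump starts with the quoted thread name.
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')


def find_lock_owners(dump_text, lock_address):
    """Return names of threads that list lock_address under
    'Locked ownable synchronizers' in a `jstack -l` style dump."""
    owners = []
    current = None      # name of the thread block being scanned
    in_owned = False    # True while inside the ownable-synchronizers list
    needle = "<" + lock_address + ">"
    for raw in dump_text.splitlines():
        # Tolerate dumps pasted into mail, where lines start with "> ".
        line = raw.lstrip("> ").strip()
        header = THREAD_HEADER.match(line)
        if header:
            current = header.group("name")
            in_owned = False
        elif line.startswith("Locked ownable synchronizers"):
            in_owned = True
        elif in_owned and current and line.startswith("-") and needle in line:
            owners.append(current)
    return owners
```

For example, find_lock_owners(open("metastore-jstack.txt").read(), "0x00007f9ad0795c48"), where the file name is hypothetical. An empty result may mean the dump was taken without -l or is truncated.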

BTW, the line numbers in the stack trace don't seem to line up exactly with version 3.1.0, which you indicated in the Affects Version field, so I wonder if this is really a different version or perhaps something with custom patches.

> Hive Metastore become unresponsive
> ----------------------------------
>
>                 Key: HIVE-26669
>                 URL: https://issues.apache.org/jira/browse/HIVE-26669
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>    Affects Versions: 3.1.0
>            Reporter: Sandeep Gade
>            Priority: Critical
>
> We are experiencing issues with Hive Metastore where it becomes unresponsive. Initial investigation shows thousands of threads in the WAITING (parking) state, as shown below:
>       1    java.lang.Thread.State: BLOCKED (on object monitor)
>     772    java.lang.Thread.State: RUNNABLE
>       2    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>      13    java.lang.Thread.State: TIMED_WAITING (parking)
>       5    java.lang.Thread.State: TIMED_WAITING (sleeping)
>       3    java.lang.Thread.State: WAITING (on object monitor)
>   14308    java.lang.Thread.State: WAITING (parking)
> ==============
> Almost all of the threads are stuck at 'parking to wait for  <0x00007f9ad0795c48> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)'
>  
>      15         - parking to wait for  <0x00007f9ad06c9c10> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>   14288         - parking to wait for  <0x00007f9ad0795c48> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
>       1         - parking to wait for  <0x00007f9ad0a161f8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>       1         - parking to wait for  <0x00007f9ad0a39248> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>       1         - parking to wait for  <0x00007f9ad0adb0a0> (a java.util.concurrent.SynchronousQueue$TransferQueue)
>       5         - parking to wait for  <0x00007f9ad0b12278> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>       1         - parking to wait for  <0x00007f9ad0b12518> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>       1         - parking to wait for  <0x00007f9ad0b44878> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>       1         - parking to wait for  <0x00007f9ad0cbe8f0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>       1         - parking to wait for  <0x00007f9ad1318d60> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>       1         - parking to wait for  <0x00007f9ad1478c10> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>       5         - parking to wait for  <0x00007f9ad1494ff8> (a java.util.concurrent.SynchronousQueue$TransferQueue)
> ======================
> Complete stack of one of the parked threads:
> "pool-8-thread-62238" #3582305 prio=5 os_prio=0 tid=0x00007f977bfc9800 nid=0x62011 waiting on condition [0x00007f959d917000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00007f9ad0795c48> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>         at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
>         at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
>         at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:351)
>         at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:77)
>         at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:137)
>         at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:59)
>         at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67)
>         at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStoreForConf(HiveMetaStore.java:750)
>         at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:718)
>         at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:712)
>         at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database_core(HiveMetaStore.java:1488)
>         at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database(HiveMetaStore.java:1470)
>         at sun.reflect.GeneratedMethodAccessor40.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
>         at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
>         at com.sun.proxy.$Proxy30.get_database(Unknown Source)
>         at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_database.getResult(ThriftHiveMetastore.java:15014)
>         at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_database.getResult(ThriftHiveMetastore.java:14998)
>         at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>         at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>         at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:636)
>         at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:631)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>         at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:631)
>         at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:750)
>    Locked ownable synchronizers:
>         - <0x00007fae9f0d8c20> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> ======================
> Looking at the Linux process limits, Hive exhausts its 'Max processes' limit while the issue is happening. The limit is set to:
> Max processes             16000                16000                processes
> As a workaround, we restart the Metastores, after which they work fine for a few days.
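
The per-state and per-lock counts in the description can be reproduced from a saved thread dump with a short script. This is a minimal Python sketch, assuming the dump uses the same line formats as the stack quoted above. The function name summarize_dump is hypothetical.

```python
import re
from collections import Counter

# Matches e.g. "java.lang.Thread.State: WAITING (parking)"
STATE = re.compile(r'java\.lang\.Thread\.State: (.+?)\s*$')
# Matches e.g. "- parking to wait for  <0x...> (a some.Class$Name)"
PARKED = re.compile(r'- parking to wait for\s+<(0x[0-9a-f]+)> \(a ([^)]+)\)')


def summarize_dump(dump_text):
    """Count threads per state and per (address, class) they are parked on."""
    states = Counter()
    parked_on = Counter()
    for line in dump_text.splitlines():
        state = STATE.search(line)
        if state:
            states[state.group(1)] += 1
            continue
        parked = PARKED.search(line)
        if parked:
            parked_on[(parked.group(1), parked.group(2))] += 1
    return states, parked_on
```

Calling parked_on.most_common(1) on the result gives the most contended address, which is the one to look for among the "Locked ownable synchronizers" entries of other threads.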


