Posted to dev@hive.apache.org by "Eugene Chung (Jira)" <ji...@apache.org> on 2021/10/30 03:00:00 UTC

[jira] [Created] (HIVE-25663) Need to modify table/partition lock acquisition retry for Zookeeper option

Eugene Chung created HIVE-25663:
-----------------------------------

             Summary: Need to modify table/partition lock acquisition retry for Zookeeper option
                 Key: HIVE-25663
                 URL: https://issues.apache.org/jira/browse/HIVE-25663
             Project: Hive
          Issue Type: Improvement
          Components: Locking
            Reporter: Eugene Chung
            Assignee: Eugene Chung
         Attachments: image-2021-10-30-11-54-42-164.png

 
{code:java}
LOCK TABLE default.my_table PARTITION (log_date='2021-10-30') EXCLUSIVE;
SET hive.query.timeout.seconds=5;
SELECT * FROM default.my_table WHERE log_date='2021-10-30' LIMIT 10;
{code}
 

If you execute the three SQL statements above in the same session, the last SELECT is cancelled with a timeout error. The problem is that, if you are using ZooKeeperHiveLockManager, 'show locks' will show that a SHARED lock on default.my_table remains for 100 minutes.

!image-2021-10-30-11-54-42-164.png|width=873,height=411!

I will explain the problem step by step.

 

The SELECT SQL which gets some data from a partitioned table 

 
{code:java}
SELECT * FROM my_table WHERE log_date='2021-10-30' LIMIT 10{code}
 

needs two SHARED locks in order. The two SHARED locks are
 * default.my_table
 * default.my_table@log_date=2021-10-30
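The mapping from a partitioned read to these two lock objects can be sketched as follows. This is illustrative only, not the actual Hive API; the names simply mirror what 'show locks' prints.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch: the two SHARED lock objects a partitioned SELECT needs,
// in acquisition order (table first, then partition). Not the actual Hive code.
public class LockObjectsSketch {
    static List<String> sharedLockObjects(String db, String table, String partSpec) {
        return Arrays.asList(
            db + "." + table,                  // e.g. default.my_table
            db + "." + table + "@" + partSpec  // e.g. default.my_table@log_date=2021-10-30
        );
    }

    public static void main(String[] args) {
        System.out.println(sharedLockObjects("default", "my_table", "log_date=2021-10-30"));
    }
}
```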

Suppose that, before the SQL is executed, an EXCLUSIVE lock on the partition already exists. It can easily be simulated with a DDL like the one below:

 
{code:java}
LOCK TABLE default.my_table PARTITION (log_date='2021-10-30') EXCLUSIVE{code}
 

The SELECT SQL cannot acquire the SHARED lock on the partition, so it retries the acquisition as specified by two configurations. With the default values, it retries for 100 minutes.
 * hive.lock.sleep.between.retries=60s
 * hive.lock.numretries=100
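With these defaults the retry window is 100 x 60 s = 100 minutes. The shape of the problem can be shown with a simplified stand-in for the retry loop (names and structure are illustrative, not copied from the Hive source); note how the catch block only logs the interrupt and keeps going:

```java
// Simplified, illustrative sketch of a retry loop that swallows interrupts.
// None of these names come from the actual Hive source; only the shape matters.
public class SwallowedInterruptSketch {

    static final int NUM_RETRIES = 100;        // hive.lock.numretries
    static final long SLEEP_MILLIS = 60_000L;  // hive.lock.sleep.between.retries (60s)

    // Returns the number of attempts made before the lock was "acquired",
    // or NUM_RETRIES if it never was.
    static int lockWithRetry(java.util.function.IntPredicate acquired) {
        for (int tryNum = 0; tryNum < NUM_RETRIES; tryNum++) {
            if (acquired.test(tryNum)) {
                return tryNum;
            }
            try {
                Thread.sleep(1); // stands in for the 60 s sleep between retries
            } catch (InterruptedException e) {
                // The problematic pattern: the interrupt (e.g. from a query
                // timeout) is only logged, and the loop keeps retrying.
            }
        }
        return NUM_RETRIES;
    }

    public static void main(String[] args) {
        // Even if the thread is interrupted, the loop runs to the retry cap.
        Thread.currentThread().interrupt();
        int tries = lockWithRetry(n -> false); // lock never becomes available
        System.out.println("retries=" + tries);
        System.out.println("totalWindowMinutes=" + (NUM_RETRIES * SLEEP_MILLIS / 60_000));
    }
}
```

Because the interrupt is swallowed, cancelling the query has no effect on the loop; the thread keeps sleeping and retrying until the cap is reached.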

 

If hive.query.timeout.seconds is set to 5, the SELECT SQL is cancelled 5 seconds later and the client returns with a timeout error. But the SHARED lock on my_table still remains, because [the current ZooKeeperHiveLockManager just logs InterruptedException|https://github.com/apache/hive/blob/8a8e03d02003aa3543f46f595b4425fd8c156ad9/ql/src/java/org/apache/hadoop/hive/ql/lockmgr/zookeeper/ZooKeeperHiveLockManager.java#L326] and goes on retrying the lock. This also means that the SQL processing thread keeps working for 100 minutes (by default) even though the SQL was cancelled. If the same SQL is executed 3 times, you can see 3 such threads, each with a thread dump like the one below:

 
{code:java}
"HiveServer2-Background-Pool: Thread-154" #154 prio=5 os_prio=0 tid=0x00007f0ac91cb000 nid=0x13d25 waiting on condition [0x00007f0aa2ce2000]
 java.lang.Thread.State: TIMED_WAITING (sleeping)
 at java.lang.Thread.sleep(Native Method)
 at org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager.lock(ZooKeeperHiveLockManager.java:303)
 at org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager.lock(ZooKeeperHiveLockManager.java:207)
 at org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager.acquireLocks(DummyTxnManager.java:199)
 at org.apache.hadoop.hive.ql.Driver.acquireLocks(Driver.java:1610)
 at org.apache.hadoop.hive.ql.Driver.lockAndRespond(Driver.java:1796)
 at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1966)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1710)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1704)
 at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157)
 at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:217)
 at org.apache.hive.service.cli.operation.SQLOperation.access$500(SQLOperation.java:87)
 at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:309)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
 at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:322)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748){code}
 

 

I think ZooKeeperHiveLockManager should not swallow unexpected exceptions such as InterruptedException. It should retry only on the expected ones.
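A hedged sketch of the direction such a fix could take (illustrative only; the class, method, and exception names here are hypothetical, not the actual Hive API): on InterruptedException, restore the thread's interrupt flag and abort the retry loop, so the background thread stops together with the cancelled query instead of retrying for 100 minutes.

```java
// Illustrative sketch only: a retry loop that propagates interruption instead
// of swallowing it. All names here are hypothetical stand-ins.
public class PropagateInterruptSketch {

    static final int NUM_RETRIES = 100;

    /** Hypothetical stand-in for Hive's lock-manager exception type. */
    static class LockAcquireException extends Exception {
        LockAcquireException(String msg, Throwable cause) { super(msg, cause); }
    }

    static int lockWithRetry(java.util.function.IntPredicate acquired)
            throws LockAcquireException {
        for (int tryNum = 0; tryNum < NUM_RETRIES; tryNum++) {
            if (acquired.test(tryNum)) {
                return tryNum;
            }
            try {
                Thread.sleep(1); // stands in for the 60 s sleep between retries
            } catch (InterruptedException e) {
                // Restore the flag so callers up the stack also see the
                // interrupt, then abort the retry loop instead of logging
                // the exception and continuing.
                Thread.currentThread().interrupt();
                throw new LockAcquireException("interrupted while waiting for lock", e);
            }
        }
        return NUM_RETRIES;
    }

    public static void main(String[] args) {
        Thread.currentThread().interrupt();
        try {
            lockWithRetry(n -> false); // lock never becomes available
            System.out.println("acquired");
        } catch (LockAcquireException e) {
            System.out.println("aborted"); // the interrupt ends the retry loop
        }
    }
}
```

With this shape, a query timeout that interrupts the background thread makes lock acquisition fail fast, and the caller can then release any locks that were already acquired.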



--
This message was sent by Atlassian Jira
(v8.3.4#803005)