You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by guangyy <gi...@git.apache.org> on 2018/11/02 23:29:04 UTC
[GitHub] hive pull request #484: HIVE-16839: Fix a race condidtion during concurrent ...
GitHub user guangyy opened a pull request:
https://github.com/apache/hive/pull/484
HIVE-16839: Fix a race condidtion during concurrent partition drops
We have seen a leaked lock on hive metastore DB which caused all
PARTITION insertion failed on timeout waiting for lock until the
metastore service is restarted.
A transaction dump on the DB shows there is a thread that is Sleep which
potentiall holds the the lock, like:
```
trx_id: 33603171058
trx_state: RUNNING
trx_started: 2018-10-23 06:43:22
trx_requested_lock_id: NULL
trx_wait_started: NULL
trx_weight: 70298
trx_mysql_thread_id: 275402202
trx_query: NULL
trx_operation_state: NULL
trx_tables_in_use: 0
trx_tables_locked: 0
trx_lock_structs: 21286
trx_lock_memory_bytes: 2881064
trx_rows_locked: 98810
trx_rows_modified: 49012
trx_concurrency_tickets: 0
trx_isolation_level: READ COMMITTED
trx_unique_checks: 1
trx_foreign_key_checks: 1
trx_last_foreign_key_error: NULL
trx_adaptive_hash_latched: 0
trx_adaptive_hash_timeout: 0
trx_is_read_only: 0
trx_autocommit_non_locking: 0
ID: 275402202
USER: metastore_gold
HOST: 10.37.182.82:36684
DB: metastoregold
COMMAND: Sleep
TIME: 1
STATE:
INFO: NULL
duration: 1316
Given the HOST ip, we trace back to the hive metastore instance and found the following exceptions:
No such database row
org.datanucleus.exceptions.NucleusObjectNotFoundException: No such database row
at org.datanucleus.store.rdbms.request.FetchRequest.execute(FetchRequest.java:357)
at org.datanucleus.store.rdbms.RDBMSPersistenceHandler.fetchObject(RDBMSPersistenceHandler.java:324)
at org.datanucleus.state.AbstractStateManager.loadFieldsFromDatastore(AbstractStateManager.java:1120)
at org.datanucleus.state.JDOStateManager.loadSpecifiedFields(JDOStateManager.java:2916)
at org.datanucleus.state.JDOStateManager.isLoaded(JDOStateManager.java:3219)
```
The problem is that the caller expects a NULL if the partition does not exist, however, the convertToPart function would throw
an exception which lead to the leak.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/guangyy/hive HIVE-16839
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/hive/pull/484.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #484
----
commit 5137027ee658990dd1503c09c13a73e2848d8deb
Author: Guang Yang <gu...@...>
Date: 2018-11-02T23:21:35Z
HIVE-16839: Fix a race condidtion during concurrent partition drops
----
---