Posted to issues@hbase.apache.org by "Matteo Bertozzi (JIRA)" <ji...@apache.org> on 2015/07/03 02:46:05 UTC
[jira] [Resolved] (HBASE-14016) Procedure V2: NPE in a delete table follow by create table closely
[ https://issues.apache.org/jira/browse/HBASE-14016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matteo Bertozzi resolved HBASE-14016.
-------------------------------------
Resolution: Duplicate
Sorry, closing as a duplicate of HBASE-14017
(we don't need a full lock).
> Procedure V2: NPE in a delete table follow by create table closely
> ------------------------------------------------------------------
>
> Key: HBASE-14016
> URL: https://issues.apache.org/jira/browse/HBASE-14016
> Project: HBase
> Issue Type: Bug
> Components: proc-v2
> Affects Versions: 2.0.0, 1.2.0, 1.1.1, 1.3.0
> Reporter: Stephen Yuan Jiang
> Assignee: Stephen Yuan Jiang
>
> In our internal testing of HBase 1.1, we found a race condition: a delete table followed closely by a create table can leak the ZK lock because of an NPE in ProcedureFairRunQueues.
> {noformat}
> Exception in thread "ProcedureExecutorThread-0" java.lang.NullPointerException
> at org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.releaseTableWrite(MasterProcedureQueue.java:279)
> at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.releaseLock(CreateTableProcedure.java:280)
> at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.releaseLock(CreateTableProcedure.java:58)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:674)
> {noformat}
> Here is the code that causes the race condition:
> {code}
> protected boolean markTableAsDeleted(final TableName table) {
>   TableRunQueue queue = getRunQueue(table);
>   if (queue != null) {
>     ...
>     if (queue.isEmpty() && !queue.isLocked()) {
>       fairq.remove(table);
>       ...
> }
> public boolean tryWrite(final TableLockManager lockManager,
>     final TableName tableName, final String purpose) {
>   ...
>   tableLock = lockManager.writeLock(tableName, purpose);
>   try {
>     tableLock.acquire();
>     ...
>     wlock = true;
>     ...
> }
> {code}
> The root cause: wlock is set too late, so it does not protect the queue from being deleted.
> - Thread 1: create table is running and the queue is empty; tryWrite() acquires the lock (wlock is still false)
> - Thread 2: markTableAsDeleted() sees the queue empty and wlock = false
> - Thread 1: sets wlock = true, too late
> - Thread 2: deletes the queue
> - Thread 1: can never release the lock; it hits an NPE trying to get the queue
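
For anyone who wants to see the interleaving above in isolation, here is a rough single-threaded replay of it. This is only a sketch under assumptions: ModelQueue, the static fairq map, and both replay methods are hypothetical stand-ins written for illustration, not the HBase classes, and the second replay is not the change made in HBASE-14017. It only shows that the NPE depends on whether wlock is visible before the "empty and not locked" check runs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the reported race; not the HBase code.
class RunQueueRaceSketch {

    /** Stand-in for TableRunQueue: only tracks the write-lock flag. */
    static final class ModelQueue {
        boolean wlock = false;                 // mirrors the wlock field in the report
        boolean isEmpty() { return true; }     // no procedures are queued in this replay
        boolean isLocked() { return wlock; }
    }

    /** Stand-in for the fairq structure, keyed by table name. */
    static final Map<String, ModelQueue> fairq = new HashMap<>();

    /** Mirrors markTableAsDeleted: drop the queue if it is empty and not locked. */
    static boolean markTableAsDeleted(String table) {
        ModelQueue queue = fairq.get(table);
        if (queue != null && queue.isEmpty() && !queue.isLocked()) {
            fairq.remove(table);
            return true;
        }
        return false;
    }

    /** Mirrors the release path: looks the queue up again and dereferences it. */
    static void releaseTableWrite(String table) {
        ModelQueue queue = fairq.get(table);
        queue.wlock = false;                   // throws NPE if the queue was removed
    }

    /** Replays the five steps from the report in order; returns true if the NPE occurs. */
    static boolean replayReportedOrder() {
        fairq.clear();
        ModelQueue queue = new ModelQueue();
        fairq.put("t1", queue);
        // Thread 1: lock acquired inside tryWrite(), wlock is still false.
        // Thread 2: sees the queue empty and unlocked.
        boolean canRemove = queue.isEmpty() && !queue.isLocked();
        // Thread 1: sets wlock, too late for the check above.
        queue.wlock = true;
        // Thread 2: deletes the queue based on the stale check.
        if (canRemove) {
            fairq.remove("t1");
        }
        // Thread 1: tries to release and cannot find the queue.
        try {
            releaseTableWrite("t1");
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    /** Same actors, but wlock is set before the check runs; returns true if the queue survives. */
    static boolean replayWithFlagSetFirst() {
        fairq.clear();
        ModelQueue queue = new ModelQueue();
        fairq.put("t1", queue);
        queue.wlock = true;                            // Thread 1 marks the queue locked first
        boolean removed = markTableAsDeleted("t1");    // Thread 2 now backs off
        releaseTableWrite("t1");                       // queue is still present, no NPE
        return !removed && fairq.containsKey("t1");
    }

    public static void main(String[] args) {
        System.out.println("reported order hits NPE: " + replayReportedOrder());
        System.out.println("flag set first keeps queue: " + replayWithFlagSetFirst());
    }
}
```

In real multi-threaded code, reordering the assignment alone would presumably not be enough; the flag update and the check-and-remove would also need to be atomic with respect to each other (for example, by running under the same monitor), which this single-threaded replay does not model.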