Posted to dev@hbase.apache.org by "Stephen Yuan Jiang (JIRA)" <ji...@apache.org> on 2015/07/03 02:37:04 UTC

[jira] [Created] (HBASE-14016) Procedure V2: NPE in a delete table follow by create table closely

Stephen Yuan Jiang created HBASE-14016:
------------------------------------------

             Summary: Procedure V2: NPE in a delete table follow by create table closely
                 Key: HBASE-14016
                 URL: https://issues.apache.org/jira/browse/HBASE-14016
             Project: HBase
          Issue Type: Bug
          Components: proc-v2
    Affects Versions: 1.1.1, 2.0.0, 1.2.0, 1.3.0
            Reporter: Stephen Yuan Jiang
            Assignee: Stephen Yuan Jiang


In our internal testing of HBase 1.1, we found a race condition: a delete table followed closely by a create table can leak the ZooKeeper (ZK) table lock because of a NullPointerException (NPE) after the table's run queue has been removed from ProcedureFairRunQueues:
{noformat}
Exception in thread "ProcedureExecutorThread-0" java.lang.NullPointerException
	at org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.releaseTableWrite(MasterProcedureQueue.java:279)
	at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.releaseLock(CreateTableProcedure.java:280)
	at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.releaseLock(CreateTableProcedure.java:58)
	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:674)
{noformat}

Here is the code that causes the race condition:
{code}
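protected boolean markTableAsDeleted(final TableName table) {
  TableRunQueue queue = getRunQueue(table);
  if (queue != null) {
    ...
    // Check-then-act: the queue is removed only if it looks idle
    // (no queued procedures and not locked).
    if (queue.isEmpty() && !queue.isLocked()) {
      fairq.remove(table);
  ...
}

public boolean tryWrite(final TableLockManager lockManager,
    final TableName tableName, final String purpose) {
  ...
  tableLock = lockManager.writeLock(tableName, purpose);
  try {
    // The ZK table lock is acquired here while wlock is still false.
    tableLock.acquire();
  ...
  // Only from this point on does the queue report itself as locked.
  wlock = true;
  ...
}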
{code}
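To make the window between tableLock.acquire() and wlock = true concrete, here is a minimal single-threaded model that replays one bad interleaving deterministically. It is a sketch under simplifying assumptions: BuggyTableQueue, acquireZkLock(), setWlock() and the boolean zkLockHeld (standing in for the ZK table lock) are hypothetical names, not HBase's actual MasterProcedureQueue API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the excerpt above; names are illustrative
// and do not match HBase's real MasterProcedureQueue.
public class BuggyTableQueue {
  static final class TableRunQueue {
    int pending = 0;        // number of queued procedures
    boolean wlock = false;  // in-memory write-lock flag
    boolean isEmpty() { return pending == 0; }
    boolean isLocked() { return wlock; }
  }

  final Map<String, TableRunQueue> fairq = new HashMap<>();
  boolean zkLockHeld = false;  // stands in for the ZooKeeper table lock

  TableRunQueue getRunQueue(String table) { return fairq.get(table); }

  // Same check as markTableAsDeleted(): remove the queue if it looks idle.
  boolean markTableAsDeleted(String table) {
    TableRunQueue queue = getRunQueue(table);
    if (queue != null && queue.isEmpty() && !queue.isLocked()) {
      fairq.remove(table);
      return true;
    }
    return false;
  }

  // Step 1 of tryWrite(): the ZK lock is taken while wlock is still false.
  TableRunQueue acquireZkLock(String table) {
    TableRunQueue queue = getRunQueue(table);
    zkLockHeld = true;
    return queue;
  }

  // Step 2 of tryWrite(): the flag is set on the queue object already held.
  void setWlock(TableRunQueue queue) { queue.wlock = true; }

  // Like releaseTableWrite(): looks the queue up again and dereferences it.
  void releaseTableWrite(String table) {
    TableRunQueue queue = getRunQueue(table);
    queue.wlock = false;  // NullPointerException if the queue was removed
    zkLockHeld = false;   // never reached in that case, so the ZK lock leaks
  }

  // Returns true if the release fails with an NPE and the ZK lock stays held.
  public static boolean replayRace() {
    BuggyTableQueue q = new BuggyTableQueue();
    q.fairq.put("t1", new TableRunQueue());
    TableRunQueue held = q.acquireZkLock("t1"); // thread 1: ZK lock, wlock false
    q.markTableAsDeleted("t1");                 // thread 2: queue looks idle, removed
    q.setWlock(held);                           // thread 1: flag set on orphaned queue
    try {
      q.releaseTableWrite("t1");                // thread 1: lookup returns null
      return false;
    } catch (NullPointerException e) {
      return q.zkLockHeld;                      // lock was never released
    }
  }

  public static void main(String[] args) {
    System.out.println("NPE with leaked ZK lock: " + replayRace());
  }
}
```

Running main prints `NPE with leaked ZK lock: true`: the setWlock() call succeeds, but only on a queue object that is no longer reachable through fairq.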

The root cause is that wlock is set too late to protect the queue from being deleted:
- Thread 1: create table is running and the queue is empty; tryWrite() acquires the ZK lock (wlock is still false).
- Thread 2: markTableAsDeleted() sees that the queue is empty and wlock is false.
- Thread 1: sets wlock = true, but too late.
- Thread 2: removes the queue.
- Thread 1: can never release the lock; releaseTableWrite() throws an NPE when it tries to get the queue.
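Since the problem is the ordering, one possible remedy is to claim the in-memory flag before the ZK lock is taken, under the same monitor that markTableAsDeleted() uses, and to roll the flag back if acquisition fails. The sketch below is hypothetical (SaferTableQueue, reserveWrite(), acquireZkLock() and the simulated zkAcquireSucceeds parameter are illustrative names) and is not necessarily the change made for HBASE-14016.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: set wlock BEFORE acquiring the ZK lock so that
// markTableAsDeleted() never sees an empty, unlocked queue mid-tryWrite().
public class SaferTableQueue {
  static final class TableRunQueue {
    int pending = 0;
    boolean wlock = false;
    boolean isEmpty() { return pending == 0; }
    boolean isLocked() { return wlock; }
  }

  private final Map<String, TableRunQueue> fairq = new HashMap<>();
  private boolean zkLockHeld = false;  // stands in for the ZooKeeper table lock

  synchronized void addTable(String table) { fairq.putIfAbsent(table, new TableRunQueue()); }
  synchronized boolean hasQueue(String table) { return fairq.containsKey(table); }
  synchronized boolean isZkLockHeld() { return zkLockHeld; }

  synchronized boolean markTableAsDeleted(String table) {
    TableRunQueue queue = fairq.get(table);
    if (queue != null && queue.isEmpty() && !queue.isLocked()) {
      fairq.remove(table);
      return true;
    }
    return false;
  }

  // Step 1: claim the flag under the monitor shared with markTableAsDeleted().
  synchronized boolean reserveWrite(String table) {
    TableRunQueue queue = fairq.get(table);
    if (queue == null || queue.isLocked()) return false;
    queue.wlock = true;
    return true;
  }

  // Step 2: simulated ZK acquire; on failure the reservation is rolled back.
  // A real implementation should not block on ZooKeeper while holding this
  // monitor; it is synchronized here only to keep the model simple.
  synchronized boolean acquireZkLock(String table, boolean zkAcquireSucceeds) {
    if (!zkAcquireSucceeds) {
      fairq.get(table).wlock = false;  // queue is still present: it was locked
      return false;
    }
    zkLockHeld = true;
    return true;
  }

  synchronized void releaseTableWrite(String table) {
    TableRunQueue queue = fairq.get(table);  // deletion was refused, so non-null
    queue.wlock = false;
    zkLockHeld = false;
  }

  // Same interleaving as above, but with the flag set first.
  public static boolean replayWithEarlyFlag() {
    SaferTableQueue q = new SaferTableQueue();
    q.addTable("t1");
    q.reserveWrite("t1");                          // thread 1: wlock = true
    boolean removed = q.markTableAsDeleted("t1");  // thread 2: queue is locked
    q.acquireZkLock("t1", true);                   // thread 1: ZK lock
    q.releaseTableWrite("t1");                     // thread 1: queue still there
    return !removed && q.hasQueue("t1") && !q.isZkLockHeld();
  }

  // If the ZK acquire fails, the flag is cleared and deletion can proceed.
  public static boolean replayFailedAcquire() {
    SaferTableQueue q = new SaferTableQueue();
    q.addTable("t1");
    q.reserveWrite("t1");
    q.acquireZkLock("t1", false);
    return q.markTableAsDeleted("t1");
  }

  public static void main(String[] args) {
    System.out.println("early flag keeps queue: " + replayWithEarlyFlag());
    System.out.println("rollback allows delete: " + replayFailedAcquire());
  }
}
```

With this ordering the delete simply declines to remove a locked queue; the table's queue can be removed on a later attempt once the writer has released it.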



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)