Posted to issues@hbase.apache.org by "Matteo Bertozzi (JIRA)" <ji...@apache.org> on 2015/07/03 02:46:05 UTC

[jira] [Resolved] (HBASE-14016) Procedure V2: NPE in a delete table follow by create table closely

     [ https://issues.apache.org/jira/browse/HBASE-14016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matteo Bertozzi resolved HBASE-14016.
-------------------------------------
    Resolution: Duplicate

Sorry, closing as a duplicate of HBASE-14017
(we don't need a full lock).

> Procedure V2: NPE in a delete table follow by create table closely
> ------------------------------------------------------------------
>
>                 Key: HBASE-14016
>                 URL: https://issues.apache.org/jira/browse/HBASE-14016
>             Project: HBase
>          Issue Type: Bug
>          Components: proc-v2
>    Affects Versions: 2.0.0, 1.2.0, 1.1.1, 1.3.0
>            Reporter: Stephen Yuan Jiang
>            Assignee: Stephen Yuan Jiang
>
> In our internal testing of HBase 1.1, we found a race condition: a delete table followed closely by a create table can leak the ZooKeeper (zk) table lock because of an NPE in ProcedureFairRunQueues.
> {noformat}
> Exception in thread "ProcedureExecutorThread-0" java.lang.NullPointerException
> 	at org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.releaseTableWrite(MasterProcedureQueue.java:279)
> 	at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.releaseLock(CreateTableProcedure.java:280)
> 	at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.releaseLock(CreateTableProcedure.java:58)
> 	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:674)
> {noformat}
> Here is the code that causes the race condition:
> {code}
> protected boolean markTableAsDeleted(final TableName table) {
>     TableRunQueue queue = getRunQueue(table);
>     if (queue != null) {
>         ...
>         if (queue.isEmpty() && !queue.isLocked()) {
>           fairq.remove(table);
>     ...
> }
> public boolean tryWrite(final TableLockManager lockManager,
>         final TableName tableName, final String purpose) {
>         ...
>         tableLock = lockManager.writeLock(tableName, purpose);
>         try {
>           tableLock.acquire();
>       ...
>         wlock = true;
>     ...
> }
> {code}
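To make the unsafe window in the excerpt above concrete, here is a minimal, hypothetical Java model of the two methods. It is not HBase code: the class names (`BuggyTableQueue`, `BuggyRunQueues`, `BuggyReplay`) and the boolean `externalLockHeld` (standing in for the ZooKeeper table lock) are invented for illustration, and `tryWrite()` is split into its two steps so that one bad interleaving can be replayed deterministically on a single thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, stripped-down model of a per-table run queue; not HBase code.
class BuggyTableQueue {
    boolean wlock = false;            // in-memory "write locked" flag
    boolean externalLockHeld = false; // stands in for the ZooKeeper table lock
    int pending = 0;                  // queued procedures (0 in this replay)

    boolean isEmpty() { return pending == 0; }
    boolean isLocked() { return wlock; }

    // tryWrite() split in two so the interleaving can be replayed step by step:
    void acquireExternalLock() { externalLockHeld = true; } // slow ZK acquire
    void setWriteLocked() { wlock = true; }                 // flag set afterwards

    void releaseWrite() {
        wlock = false;
        externalLockHeld = false;
    }
}

class BuggyRunQueues {
    private final Map<String, BuggyTableQueue> fairq = new HashMap<>();

    synchronized void addTable(String table) { fairq.putIfAbsent(table, new BuggyTableQueue()); }

    synchronized BuggyTableQueue getRunQueue(String table) { return fairq.get(table); }

    // Mirrors markTableAsDeleted(): drops the queue if it looks idle.
    synchronized boolean markTableAsDeleted(String table) {
        BuggyTableQueue queue = fairq.get(table);
        if (queue != null && queue.isEmpty() && !queue.isLocked()) {
            fairq.remove(table);
            return true;
        }
        return false;
    }

    // Mirrors releaseTableWrite(): looks the queue up again by table name.
    void releaseTableWrite(String table) {
        getRunQueue(table).releaseWrite(); // NPE if the queue was removed
    }
}

class BuggyReplay {
    public static void main(String[] args) {
        BuggyRunQueues queues = new BuggyRunQueues();
        queues.addTable("t1");

        BuggyTableQueue queue = queues.getRunQueue("t1");
        queue.acquireExternalLock();                       // Thread 1: lock taken, wlock still false
        boolean removed = queues.markTableAsDeleted("t1"); // Thread 2: queue looks idle, removed
        queue.setWriteLocked();                            // Thread 1: too late, queue is orphaned
        try {
            queues.releaseTableWrite("t1");                // Thread 1: lookup returns null
        } catch (NullPointerException e) {
            System.out.println("NPE on release");
        }
        System.out.println("queue removed: " + removed
                + ", external lock leaked: " + queue.externalLockHeld);
    }
}
```

Running `BuggyReplay` prints `NPE on release` followed by `queue removed: true, external lock leaked: true`: the release path never reaches the code that would drop the external lock.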
> The root cause is that wlock is set too late, so it does not protect the queue from being deleted:
> - Thread 1: create table is running and the queue is empty; tryWrite() acquires the lock (wlock is still false)
> - Thread 2: markTableAsDeleted() sees the queue empty and wlock == false
> - Thread 1: sets wlock = true, too late
> - Thread 2: deletes the queue
> - Thread 1: can never release the lock; it hits an NPE when trying to get the queue
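One way to close the window described above is to reserve the queue (set wlock) under the same monitor that markTableAsDeleted() uses before starting the slow external acquire, and to roll the flag back if the acquire fails. The sketch below is hypothetical: it is not the HBASE-14017 patch, and the names `GuardedRunQueues`, `GuardedTableQueue`, `GuardedReplay` and the `BooleanSupplier` parameter (standing in for the ZooKeeper lock acquire) are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Hypothetical model; not the HBASE-14017 patch.
class GuardedTableQueue {
    boolean wlock = false; // guarded by the owning GuardedRunQueues monitor
    int pending = 0;       // queued procedures (0 in this example)

    boolean isEmpty() { return pending == 0; }
    boolean isLocked() { return wlock; }
}

class GuardedRunQueues {
    private final Map<String, GuardedTableQueue> fairq = new HashMap<>();

    synchronized void addTable(String table) { fairq.putIfAbsent(table, new GuardedTableQueue()); }

    synchronized GuardedTableQueue getRunQueue(String table) { return fairq.get(table); }

    synchronized boolean markTableAsDeleted(String table) {
        GuardedTableQueue queue = fairq.get(table);
        if (queue != null && queue.isEmpty() && !queue.isLocked()) {
            fairq.remove(table);
            return true;
        }
        return false;
    }

    // Reserve first, acquire second: wlock becomes visible to markTableAsDeleted()
    // before the slow external acquire starts, and is rolled back if the acquire
    // returns false or throws.
    boolean tryWrite(String table, BooleanSupplier acquireExternalLock) {
        GuardedTableQueue queue;
        synchronized (this) {
            queue = fairq.get(table);
            if (queue == null || queue.isLocked()) {
                return false;
            }
            queue.wlock = true;
        }
        boolean acquired = false;
        try {
            acquired = acquireExternalLock.getAsBoolean(); // runs without holding the monitor
        } finally {
            if (!acquired) {
                synchronized (this) {
                    queue.wlock = false;
                }
            }
        }
        return acquired;
    }

    synchronized void releaseTableWrite(String table) {
        GuardedTableQueue queue = fairq.get(table);
        if (queue == null) {
            throw new IllegalStateException("run queue for " + table + " vanished while write-locked");
        }
        queue.wlock = false;
    }
}

class GuardedReplay {
    public static void main(String[] args) {
        GuardedRunQueues queues = new GuardedRunQueues();
        queues.addTable("t1");
        boolean[] removed = {false};
        // Simulate Thread 2's markTableAsDeleted() landing while Thread 1 is inside the acquire.
        boolean locked = queues.tryWrite("t1", () -> {
            removed[0] = queues.markTableAsDeleted("t1");
            return true;
        });
        queues.releaseTableWrite("t1");
        System.out.println("locked: " + locked + ", removed mid-acquire: " + removed[0]);
        System.out.println("deleted after release: " + queues.markTableAsDeleted("t1"));
    }
}
```

Keeping the slow acquire outside the monitor avoids blocking every other queue operation while ZooKeeper responds; the price is the explicit rollback in the `finally` block. Running `GuardedReplay` prints `locked: true, removed mid-acquire: false` and then `deleted after release: true`.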


