Posted to dev@hbase.apache.org by "Duo Zhang (JIRA)" <ji...@apache.org> on 2018/10/25 03:47:00 UTC

[jira] [Reopened] (HBASE-21364) Procedure holds the lock should put to front of the queue after restart

     [ https://issues.apache.org/jira/browse/HBASE-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Duo Zhang reopened HBASE-21364:
-------------------------------

Reopen to push to master and branch-2, as HBASE-21384 needs the changes here.

> Procedure holds the lock should put to front of the queue after restart
> -----------------------------------------------------------------------
>
>                 Key: HBASE-21364
>                 URL: https://issues.apache.org/jira/browse/HBASE-21364
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.1.0, 2.0.2
>            Reporter: Allan Yang
>            Assignee: Allan Yang
>            Priority: Blocker
>             Fix For: 2.1.1, 2.0.3
>
>         Attachments: HBASE-21364.branch-2.0.001.patch, HBASE-21364.branch-2.0.002.patch
>
>
> After restoring the procedures from the Procedure WALs, we put the runnable procedures back into the queue to execute. Before HBASE-20846 the order did not matter, since the first procedure to execute would acquire the lock itself. But since HBASE-20846, the locks are restored as well. If a procedure without the lock is executed before a procedure that holds the lock in the same queue, there is a race condition in which we may never be able to execute any of the procedures in that queue.
> The race condition is:
> 1. A procedure that needs the table's exclusive lock is put into the table's queue, but the table's shared lock is held by a Region Procedure. Since no one holds the exclusive lock, the queue is added to the run queue. But soon the worker thread sees that the procedure can't execute because it doesn't hold the lock, so it stops executing and removes the queue from the run queue.
> 2. At the same time, the Region Procedure, which holds the table's shared lock and the region's exclusive lock, is put into the table's queue. But since the queue has already been added to the run queue, it is not added again.
> 3. As a result of step 1, the table's queue is removed from the run queue.
> 4. After that, no one puts the table's queue back, so no worker will execute the procedures inside it.
> A test case in the patch shows how.
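
The interleaving in steps 1 to 4 can be replayed with a small, single-threaded toy model. This is only a sketch under stated assumptions: `RestoreOrderSketch`, `Proc`, and `simulate` are hypothetical names, not HBase classes, and the model tracks just one table queue, one shared-lock count, and an "in run queue" flag. It is not the test case from the patch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

/** Toy replay of the race; none of these names are HBase APIs. */
final class RestoreOrderSketch {

    /** A restored procedure and whether it already holds the table's shared lock. */
    static final class Proc {
        final String name;
        final boolean holdsLock;
        Proc(String name, boolean holdsLock) {
            this.name = name;
            this.holdsLock = holdsLock;
        }
    }

    /**
     * Replays one interleaving and returns the names of the procedures that ran.
     * Assumes a non-empty list. With lockHoldersFirst, lock holders are moved
     * to the front of the queue before anything is scheduled.
     */
    static List<String> simulate(List<Proc> restored, boolean lockHoldersFirst) {
        List<Proc> order = new ArrayList<>(restored);
        if (lockHoldersFirst) {
            // Stable sort: false (holds the lock) sorts before true (does not).
            order.sort(Comparator.comparing((Proc p) -> !p.holdsLock));
        }

        // Locks are restored together with the procedures.
        int sharedLocks = 0;
        for (Proc p : restored) {
            if (p.holdsLock) sharedLocks++;
        }

        Deque<Proc> tableQueue = new ArrayDeque<>();
        List<String> executed = new ArrayList<>();
        Iterator<Proc> it = order.iterator();

        // Step 1: the first procedure is enqueued and the queue joins the run queue.
        tableQueue.addLast(it.next());
        boolean inRunQueue = true;

        // The worker inspects the head and decides whether it can run.
        Proc head = tableQueue.peekFirst();
        boolean headRunnable = head.holdsLock || sharedLocks == 0;

        // Step 2: the remaining procedures arrive; the queue is already in
        // the run queue, so it is not added again.
        while (it.hasNext()) {
            tableQueue.addLast(it.next());
        }

        // Steps 3 and 4: the worker removes the queue and nobody re-adds it.
        if (!headRunnable) {
            inRunQueue = false;
            return executed;
        }

        // Otherwise the worker drains the queue while the head stays runnable.
        while (inRunQueue && !tableQueue.isEmpty()) {
            Proc next = tableQueue.peekFirst();
            if (!next.holdsLock && sharedLocks > 0) {
                inRunQueue = false;
                break;
            }
            tableQueue.pollFirst();
            executed.add(next.name);
            if (next.holdsLock) sharedLocks--; // lock released on completion
        }
        return executed;
    }

    public static void main(String[] args) {
        List<Proc> procs = Arrays.asList(
            new Proc("table-proc", false),   // needs the exclusive lock
            new Proc("region-proc", true));  // holds the shared lock
        System.out.println(simulate(procs, false)); // []
        System.out.println(simulate(procs, true));  // [region-proc, table-proc]
    }
}
```

In this model, scheduling the lock holder first means the worker always finds a runnable head, which is the intent expressed in the issue title.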



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)