Posted to issues@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2018/03/08 06:52:00 UTC
[jira] [Updated] (HBASE-20152) [AMv2] DisableTableProcedure versus ServerCrashProcedure
[ https://issues.apache.org/jira/browse/HBASE-20152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-20152:
--------------------------
Description:
Seeing a small spate of issues where disabled tables/regions are being assigned. Usually they happen when a DisableTableProcedure is running concurrently with a ServerCrashProcedure (SCP). See below. See the associated HBASE-20131. This is the umbrella issue for fixing them.
h3. Deadlock
From HBASE-20137, 'TestRSGroups is Flakey', https://issues.apache.org/jira/browse/HBASE-20137?focusedCommentId=16390325&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16390325
{code}
* SCP is running because a server was aborted in the test.
* SCP starts AssignProcedure of region X from the crashed server.
* DisableTableProcedure runs because the test has finished and we're doing table delete. It queues an UnassignProcedure for region X.
* Disable Unassign gets the lock on region X first.
* SCP AssignProcedure tries to get the lock and waits on it.
* DisableTableProcedure UnassignProcedure RPC fails because the server is down (that's why the SCP).
* Unassign tries to expire the server it failed the RPC against. This fails (the server is currently being SCP'd).
* DisableTableProcedure Unassign is suspended, with the lock on region X still held.
* SCP can't run because the lock on X is held.
* Test times out.
{code}
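The sequence above can be replayed in a small, self-contained sketch. This is a toy model, not HBase code: `RegionLockSketch`, `unassignThenSuspend`, and `assignTryLock` are hypothetical names, and a single-permit `Semaphore` stands in for the exclusive lock on region X (a semaphore is used because, like a procedure lock, it is not tied to the thread that took it). The second run, where the lock is given up before suspending, is only there to show that the held lock is what blocks the SCP side; it is not presented as the fix for this issue.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical model, not HBase code: one permit stands in for the lock on region X.
final class RegionLockSketch {
    private final Semaphore regionX = new Semaphore(1);

    // Disable-side Unassign: takes the region lock, fails its RPC, then suspends.
    // holdLockWhileSuspended=true reproduces the behavior described in the list above.
    void unassignThenSuspend(boolean holdLockWhileSuspended) {
        regionX.acquireUninterruptibly();
        // The RPC to the crashed server fails here, and expiring the server fails too.
        if (!holdLockWhileSuspended) {
            regionX.release(); // assumed alternative, for contrast only
        }
        // The procedure is now suspended; nothing wakes it up in this model.
    }

    // SCP-side Assign: waits a bounded time for the region lock.
    // Returns true if it obtained the lock and could make progress.
    boolean assignTryLock(long timeoutMillis) {
        try {
            boolean locked = regionX.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
            if (locked) {
                regionX.release();
            }
            return locked;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        RegionLockSketch stuck = new RegionLockSketch();
        stuck.unassignThenSuspend(true);
        System.out.println("lock held while suspended, assign progressed: "
                + stuck.assignTryLock(100));

        RegionLockSketch released = new RegionLockSketch();
        released.unassignThenSuspend(false);
        System.out.println("lock released before suspend, assign progressed: "
                + released.assignTryLock(100));
    }
}
```

In the first run the assign never gets the lock within its timeout, which corresponds to the test timing out; in the second it proceeds immediately.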
h3. Delete of online Regions
Saw this in nightly failure #452 for branch-2 in TestSplitTransactionOnCluster.org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster
{code}
* DisableTableProcedure is queued before SCP.
* DisableTableProcedure Unassign fails because it can't RPC to the crashed server and can't expire it.
* Unassign is stuck in suspend.
* SCP runs and cleans up the suspended Disable Unassign.
* SCP completes, which includes assign of the Disable Unassign region.
* Disable Unassign completes.
* Disable completes.
* A scheduled DropTableProcedure runs (it's the end of the test).
* It succeeds in deleting regions that are actually assigned (see above, where SCP assigned the region).
{code}
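The end state of this second sequence, a table marked disabled while one of its regions is still open, can be shown with a toy model. Everything here is hypothetical and not taken from HBase: `DropTableGuardSketch`, its methods, and the two-value `RegionState` enum are illustration only. The region-level check in `dropCheckingRegions` is an assumed guard that shows what the drop step fails to notice; it is not claimed to be how this issue is or should be fixed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model, not HBase code.
final class DropTableGuardSketch {
    enum RegionState { OPEN, CLOSED }

    private final Map<String, RegionState> regions = new LinkedHashMap<>();
    private boolean tableDisabled;

    // SCP finishes and, as part of that, assigns the region.
    void scpAssign(String region) {
        regions.put(region, RegionState.OPEN);
    }

    // Disable finishes: the table is flagged disabled, but region state is untouched,
    // mirroring the Unassign that "completes" after SCP has cleaned it up.
    void disableCompletes() {
        tableDisabled = true;
    }

    long openRegions() {
        return regions.values().stream().filter(s -> s == RegionState.OPEN).count();
    }

    // Drop that trusts only the table-level flag, as in the failure above.
    boolean dropTrustingTableFlag() {
        if (!tableDisabled) {
            return false;
        }
        regions.clear();
        return true;
    }

    // Assumed guard: also refuse while any region is still open.
    boolean dropCheckingRegions() {
        if (!tableDisabled || openRegions() > 0) {
            return false;
        }
        regions.clear();
        return true;
    }

    public static void main(String[] args) {
        DropTableGuardSketch table = new DropTableGuardSketch();
        table.scpAssign("X");
        table.disableCompletes();
        System.out.println("open regions on disabled table: " + table.openRegions());
        System.out.println("guarded drop ran: " + table.dropCheckingRegions());
        System.out.println("unguarded drop ran: " + table.dropTrustingTableFlag());
    }
}
```

Replaying the steps leaves one open region on a disabled table; the guarded drop refuses, while the drop that looks only at the table flag goes ahead and deletes a region that is still assigned.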
was:
The same description text as above, except that the two headings were written with malformed markup as ".h2 Deadlock" and ".h2 Delete of online Regions".
> [AMv2] DisableTableProcedure versus ServerCrashProcedure
> --------------------------------------------------------
>
> Key: HBASE-20152
> URL: https://issues.apache.org/jira/browse/HBASE-20152
> Project: HBase
> Issue Type: Bug
> Components: amv2
> Reporter: stack
> Assignee: stack
> Priority: Major
>