Posted to issues@hbase.apache.org by "Duo Zhang (Jira)" <ji...@apache.org> on 2020/06/08 09:23:00 UTC

[jira] [Updated] (HBASE-24117) Shutdown AssignmentManager before ProcedureExecutor may cause SCP to accidentally skip assigning a region

     [ https://issues.apache.org/jira/browse/HBASE-24117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Duo Zhang updated HBASE-24117:
------------------------------
    Summary: Shutdown AssignmentManager before ProcedureExecutor may cause SCP to accidentally skip assigning a region  (was: If move target RS crashes, move fails if concurrent master crash)

> Shutdown AssignmentManager before ProcedureExecutor may cause SCP to accidentally skip assigning a region
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-24117
>                 URL: https://issues.apache.org/jira/browse/HBASE-24117
>             Project: HBase
>          Issue Type: Bug
>          Components: proc-v2
>            Reporter: Michael Stack
>            Assignee: Duo Zhang
>            Priority: Critical
>             Fix For: 3.0.0-alpha-1, 2.3.0, 2.2.6
>
>         Attachments: org.apache.hadoop.hbase.master.assignment.TestCloseRegionWhileRSCrash-output.txt
>
>
> I saw this on TestCloseRegionWhileRSCrash. The Region 788a516d1f86af98e0a16bcc1afe4fa1 was being moved to RS example.com,62652,1586032098445 just after that RS was killed. The Move's Close step fails because the RS has no node in the Master. The Move then tries to 'confirm' the close, but that fails too because there is no remote RS. We are then stuck waiting in this state until an operator or some other procedure intervenes to 'fix' it. Normally a ServerCrashProcedure would do the job, but in this test the Master is restarted after the RS is killed, a condition we do not accommodate.
> Let me attach the test log.
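
For readers who want the failure sequence in concrete form, here is a rough toy model of the description above. It is only a sketch under stated assumptions: the class, enum, and method names (StuckRegionSketch, Master, dispatchClose, confirmClosed, handleServerCrash) and the server names are hypothetical and are not HBase APIs. It only illustrates how a region can stay parked mid-move when no crash handling runs for the dead server.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only: none of these names are HBase classes or methods.
class StuckRegionSketch {
    enum RegionState { OPEN, CLOSING, CLOSED, OPEN_ELSEWHERE }

    /** Hypothetical stand-in for the master's in-memory view of the cluster. */
    static final class Master {
        final Set<String> serverNodes = new HashSet<>(); // servers the master knows about
        RegionState regionState = RegionState.OPEN;
        String server;                                   // server the close is sent to

        /** Move step 1: ask the server to close the region. */
        boolean dispatchClose() {
            regionState = RegionState.CLOSING;
            // The close cannot be dispatched if the server has no node in the master.
            return serverNodes.contains(server);
        }

        /** Move step 2: confirm the close; not possible without a remote server. */
        boolean confirmClosed() {
            if (!serverNodes.contains(server)) {
                return false; // the move parks here, waiting for outside help
            }
            regionState = RegionState.CLOSED;
            return true;
        }

        /** Hypothetical crash handling: reassign a region stuck on the dead server. */
        void handleServerCrash(String deadServer) {
            if (deadServer.equals(server) && regionState == RegionState.CLOSING) {
                regionState = RegionState.OPEN_ELSEWHERE;
                server = "rs-survivor";
            }
        }
    }

    /** Runs the move against a killed server, with or without crash handling. */
    static RegionState simulate(boolean crashHandlingRuns) {
        Master master = new Master();
        master.server = "rs-dead"; // never added to serverNodes: it was just killed
        boolean closed = master.dispatchClose() && master.confirmClosed();
        if (!closed && crashHandlingRuns) {
            master.handleServerCrash("rs-dead");
        }
        return master.regionState;
    }

    public static void main(String[] args) {
        System.out.println("with crash handling:    " + simulate(true));
        System.out.println("without crash handling: " + simulate(false));
    }
}
```

In this model the region only leaves the CLOSING state when the crash-handling step runs, which mirrors the report: if nothing plays the ServerCrashProcedure role after the master restart, the move waits indefinitely.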


