Posted to issues@hbase.apache.org by "Sanjeet Nishad (Jira)" <ji...@apache.org> on 2020/12/10 07:23:00 UTC

[jira] [Commented] (HBASE-22404) Open/Close region request may be executed twice when master restart

    [ https://issues.apache.org/jira/browse/HBASE-22404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247055#comment-17247055 ] 

Sanjeet Nishad commented on HBASE-22404:
----------------------------------------

Hi [~zhangduo] & [~zghao], we are using HBase 2.2.3. We recently hit a similar problem: a RegionServer ignored a procedure (CloseRegionProcedure) because of a duplicate pid, which left the region stuck in RIT.

Analysis:
1. After the HMaster failover, the master's in-memory proc-id was reset.
2. On a new DisableTable client request, the master dispatched a CloseRegionProcedure to the RS and suspended the procedure.
3. The RS ignored this CloseRegionProcedure request without doing anything, because it had already executed a procedure with the same id.

Because no UnAssignRegionHandler was created in step 3, the RS never sent a reportRegionStateTransition to the HMaster. On the HMaster side the procedure remained suspended, because a suspended procedure is woken up on reportRegionStateTransition. The region therefore stayed stuck in RIT until we restarted the HM and RS.
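
To make the sequence above concrete, here is a minimal Java sketch of it. It is only a toy model of the analysis, not HBase code: the class names Master and RegionServer, the method names, and the assumption that the proc-id counter restarts from the same value after failover are all hypothetical and simply mirror the steps described above.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of the analysis above. All names are hypothetical; none of
// this is taken from the HBase code base.
class DuplicatePidSketch {

    // Stand-in for an RS that remembers which procedure ids it has handled.
    static class RegionServer {
        private final Set<Long> seenProcIds = new HashSet<>();

        // Returns true if the request is accepted (a handler would run and a
        // reportRegionStateTransition would be sent back); returns false if
        // the id was already seen and the request is silently ignored.
        boolean submitCloseRegion(long procId) {
            return seenProcIds.add(procId);
        }
    }

    // Stand-in for a master whose proc-id counter lives only in memory.
    static class Master {
        private long nextProcId = 1; // assumption: restarts here after failover

        long newProcId() {
            return nextProcId++;
        }
    }

    // Returns {firstAccepted, secondAccepted}.
    static boolean[] simulate() {
        RegionServer rs = new RegionServer();

        Master masterA = new Master();
        boolean first = rs.submitCloseRegion(masterA.newProcId());

        // Failover: the new master starts with a fresh in-memory counter,
        // so the next procedure reuses an id the RS has already seen.
        Master masterB = new Master();
        boolean second = rs.submitCloseRegion(masterB.newProcId());

        return new boolean[] {first, second};
    }

    public static void main(String[] args) {
        boolean[] result = simulate();
        System.out.println("request before failover accepted: " + result[0]);
        System.out.println("request after failover accepted: " + result[1]);
    }
}
```

In this model the second request is dropped, so nothing would ever report back to the master, and the master-side procedure would stay suspended. That matches the stuck RIT we observed.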

> Open/Close region request may be executed twice when master restart
> -------------------------------------------------------------------
>
>                 Key: HBASE-22404
>                 URL: https://issues.apache.org/jira/browse/HBASE-22404
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 3.0.0-alpha-1, 2.2.0, 2.3.0
>            Reporter: Guanghao Zhang
>            Assignee: Guanghao Zhang
>            Priority: Major
>             Fix For: 3.0.0-alpha-1, 2.2.0, 2.3.0
>
>
> We found this problem when running ITBLL on our internal branch, which is based on branch-2.2.
>  # Master A schedules a TRSP that will reopen region1. This TRSP first schedules a remote sub-procedure, CloseRegionProcedure, and sends the close region request to the RS.
>  # Master A shuts down and Master B becomes the new active master. It restores this TRSP and the remote procedure CloseRegionProcedure.
>  # The RS reports to the new Master B and the CloseRegionProcedure finishes. Then the TRSP schedules a new OpenRegionProcedure and sends an open region request to the RS.
>  # {color:#FF0000}But meanwhile Master B sends the close region request to the RS again{color}.
>  # The open region request finishes first and its report to the master succeeds. The master believes the region is open on the RS. But the RS then executes the close region request again and closes region1.
>  # The master believes the region is open, but the RS has closed it. The new TRSP is then stuck forever.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)