Posted to issues@hbase.apache.org by "Mohammad Arshad (Jira)" <ji...@apache.org> on 2020/07/09 20:37:00 UTC

[jira] [Commented] (HBASE-24676) Meta region assignment is blocked when all RS in meta table group are restarted.

    [ https://issues.apache.org/jira/browse/HBASE-24676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17154912#comment-17154912 ] 

Mohammad Arshad commented on HBASE-24676:
-----------------------------------------

When a region server in the default rsgroup is stopped or started, the default group's server list is updated both in the hbase:rsgroup table and in memory. But when the last region server is stopped, the update to the hbase:rsgroup table hangs, because hbase:meta is offline: there is no region server left to host the meta region.

AssignmentManager keeps trying to assign meta, but because no region server is available, the assignment keeps failing.

Now, when a region server of the default rsgroup starts, it is not added to the default group's server list, because the previous update call is still hung. So the server list update is waiting for meta to come online, while AssignmentManager is waiting for new servers to be added to the default rsgroup server list. Each is waiting on the other, so neither can make progress.
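
To make the circular wait concrete, here is a minimal, self-contained Java sketch. It is not HBase code: the class name, method name and the two latches are hypothetical stand-ins for the rsgroup server list update and for AssignmentManager, and the timeout exists only so that the demo terminates. The updateMemoryBeforeTableWrite flag is likewise an assumption made for illustration, to show that the cycle disappears once one side no longer waits on the other; it is not meant to describe the actual fix for this issue.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class CircularWaitSketch {

    // Returns true if the simulated meta region came online before the timeout.
    public static boolean metaComesOnline(boolean updateMemoryBeforeTableWrite, long timeoutMs)
            throws InterruptedException {
        // Released once the restarted server is visible in the default group list.
        CountDownLatch serverInDefaultGroup = new CountDownLatch(1);
        // Released once the meta region has been assigned.
        CountDownLatch metaOnline = new CountDownLatch(1);

        // Stand-in for the thread that refreshes the default group server list.
        Thread groupUpdater = new Thread(() -> {
            try {
                if (updateMemoryBeforeTableWrite) {
                    // Hypothetical ordering: publish in memory before the table write.
                    serverInDefaultGroup.countDown();
                }
                // Stand-in for the write to hbase:rsgroup, which needs meta online.
                boolean written = metaOnline.await(timeoutMs, TimeUnit.MILLISECONDS);
                if (written) {
                    serverInDefaultGroup.countDown();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Stand-in for AssignmentManager, which needs a server in the group.
        Thread assigner = new Thread(() -> {
            try {
                if (serverInDefaultGroup.await(timeoutMs, TimeUnit.MILLISECONDS)) {
                    metaOnline.countDown();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        groupUpdater.start();
        assigner.start();
        groupUpdater.join();
        assigner.join();
        return metaOnline.getCount() == 0;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("table write first: meta online = " + metaComesOnline(false, 500));
        System.out.println("memory update first: meta online = " + metaComesOnline(true, 500));
    }
}
```

With the first ordering both threads time out and meta never comes online, which mirrors the hang described above. With the second ordering the assigner proceeds immediately and the pending table write then completes.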

> Meta region assignment is blocked when all RS in meta table group are restarted.
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-24676
>                 URL: https://issues.apache.org/jira/browse/HBASE-24676
>             Project: HBase
>          Issue Type: Bug
>          Components: rsgroup
>    Affects Versions: 2.2.3
>            Reporter: Mohammad Arshad
>            Assignee: Mohammad Arshad
>            Priority: Major
>
> This issue occurred in a test cluster. It does not reproduce easily, but it can be reproduced by setting breakpoints in the code.
> Steps to reproduce:
> # Install an HBase cluster with three RS (rs1, rs2 and rs3) and one Master
> # Create two rsgroups, r1 and r2, and move rs1 to r1 and rs2 to r2
> {code}
> add_rsgroup 'r1';add_rsgroup 'r2';move_servers_rsgroup 'r1',['rs1Host:16020'];move_servers_rsgroup 'r2',['rs2Host:16020']
> {code}
> # Create a table t1
> {code}create 't1','f1','f2';put't1','r1','f1:c1','v1'{code}
> # Start debugging the master and put a breakpoint in the while loop of the {code}org.apache.hadoop.hbase.rsgroup.RSGroupInfoManagerImpl.ServerEventsListenerThread#run{code} method.
> # Stop rs3
> # When the breakpoint is hit, wait around 30 seconds to let meta go offline, then resume execution. By now meta will be offline because rs3 is stopped. The HMaster UI will hang because meta is offline.
> # Now start rs3. Once it is up, meta should come back online and the Master UI should open.
> # If the Master UI still hangs, you have reproduced the issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)