Posted to issues@hbase.apache.org by "Viraj Jasani (Jira)" <ji...@apache.org> on 2021/11/08 17:49:00 UTC

[jira] [Created] (HBASE-26433) Rollback from ZK-less to ZK-based assignment could produce inconsistent state - doubly assigned regions

Viraj Jasani created HBASE-26433:
------------------------------------

             Summary: Rollback from ZK-less to ZK-based assignment could produce inconsistent state - doubly assigned regions
                 Key: HBASE-26433
                 URL: https://issues.apache.org/jira/browse/HBASE-26433
             Project: HBase
          Issue Type: Bug
    Affects Versions: 1.7.1
            Reporter: Viraj Jasani
            Assignee: Viraj Jasani
             Fix For: 1.7.2


By enabling the config hbase.assignment.usezk.migrating, we initiate the transition of an HBase 1.x cluster from the default ZK-based region assignment to ZK-less region assignment. Once the migration is enabled, every subsequent region transition adds two additional CQs to meta: info:sn and info:state. The workflow that adds new CQs to meta should be the only workflow that reads them (unless coordination among multiple workflows is required), but that is not the case here. Reading info:sn and info:state to rebuild user region states in the RegionStateStore data structure is a hidden bug, because the read is not restricted to ZK-less region assignment.

What are the effects?

If we enable ZK-less migration and later revert it, info:state and info:sn are not reverted. Moreover, a new active master rebuilds the region states in memory and uses this info. If all regions have consistent info:sn values (i.e. consistent with info:server and info:serverstartcode), nothing goes wrong, and this is the likely outcome when we revert the config with a rolling restart of masters. However, if any region moves after the config revert, only info:server and info:serverstartcode are updated, while info:sn and info:state keep their old values. Because the guarding condition is missing, a subsequent active master restart rebuilds the region states and tries to assign regions as per info:sn, but those regions are already OPEN on the server recorded in info:server, hence we get doubly assigned regions.
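
The sequence above can be illustrated with a small, self-contained simulation. This is only a sketch, not HBase code: the meta row is modelled as a plain map, the class and method names (StaleMetaColumnsSketch, locationUnguarded, moveWithZkAssignment) and the host names are hypothetical, and the assumption that info:sn holds a "host,port,startcode" string is made purely for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class StaleMetaColumnsSketch {

    /** Hypothetical meta row as written while ZK-less migration was enabled. */
    static Map<String, String> rowWrittenDuringMigration() {
        Map<String, String> row = new HashMap<>();
        row.put("info:server", "rs1.example.org:16020");
        row.put("info:serverstartcode", "1000");
        row.put("info:sn", "rs1.example.org,16020,1000");
        row.put("info:state", "OPEN");
        return row;
    }

    /** Location derived only from info:server and info:serverstartcode. */
    static String locationFromServerColumns(Map<String, String> row) {
        return row.get("info:server").replace(':', ',') + "," + row.get("info:serverstartcode");
    }

    /** Unguarded read: uses info:sn whenever it exists, whatever the assignment mode. */
    static String locationUnguarded(Map<String, String> row) {
        String sn = row.get("info:sn");
        return sn != null ? sn : locationFromServerColumns(row);
    }

    /** Region move after the revert: only the two ZK-based columns are updated. */
    static void moveWithZkAssignment(Map<String, String> row, String hostPort, String startCode) {
        row.put("info:server", hostPort);
        row.put("info:serverstartcode", startCode);
        // info:sn and info:state are left untouched, so they become stale.
    }

    public static void main(String[] args) {
        Map<String, String> row = rowWrittenDuringMigration();
        moveWithZkAssignment(row, "rs2.example.org:16020", "2000");
        System.out.println("region is open on:      " + locationFromServerColumns(row));
        System.out.println("restarted master reads: " + locationUnguarded(row));
    }
}
```

In this model the two printed locations differ after the move, which corresponds to the master assigning a region to one server while it is already open on another.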

We need a two-part fix for this:
 # Guard reading of info:sn and info:state with proper conditions.
 # Once active master init is complete, if ZK-based region assignment is enabled and redundant CQs are available in meta (info:sn and info:state), delete them all.
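
The two parts of the fix could look roughly like the sketch below. It is not the actual patch: the meta rows are again modelled as plain maps, all names (GuardAndCleanupSketch, locationGuarded, deleteRedundantColumns) are hypothetical, and the exact guard condition is an assumption, reduced here to a single boolean that says whether ZK-based assignment is in effect.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GuardAndCleanupSketch {

    /** Hypothetical row whose info:sn went stale after a move under ZK-based assignment. */
    static Map<String, String> staleRow() {
        Map<String, String> row = new HashMap<>();
        row.put("info:server", "rs2.example.org:16020");
        row.put("info:serverstartcode", "2000");
        row.put("info:sn", "rs1.example.org,16020,1000");
        row.put("info:state", "OPEN");
        return row;
    }

    /** Location derived only from info:server and info:serverstartcode. */
    static String locationFromServerColumns(Map<String, String> row) {
        return row.get("info:server").replace(':', ',') + "," + row.get("info:serverstartcode");
    }

    /** Part 1: consult info:sn only when ZK-less assignment is in effect (assumed condition). */
    static String locationGuarded(Map<String, String> row, boolean zkBasedAssignment) {
        if (!zkBasedAssignment) {
            String sn = row.get("info:sn");
            if (sn != null) {
                return sn;
            }
        }
        return locationFromServerColumns(row);
    }

    /**
     * Part 2: once master init is complete, drop info:sn and info:state from
     * every row if ZK-based assignment is enabled. Returns the number of rows changed.
     */
    static int deleteRedundantColumns(List<Map<String, String>> metaRows, boolean zkBasedAssignment) {
        if (!zkBasedAssignment) {
            return 0; // the columns are still in use under ZK-less assignment
        }
        int cleaned = 0;
        for (Map<String, String> row : metaRows) {
            boolean hadSn = row.remove("info:sn") != null;
            boolean hadState = row.remove("info:state") != null;
            if (hadSn || hadState) {
                cleaned++;
            }
        }
        return cleaned;
    }

    public static void main(String[] args) {
        List<Map<String, String>> meta = new ArrayList<>();
        meta.add(staleRow());
        System.out.println("guarded read:  " + locationGuarded(meta.get(0), true));
        System.out.println("rows cleaned:  " + deleteRedundantColumns(meta, true));
        System.out.println("columns left:  " + meta.get(0).keySet());
    }
}
```

The guard alone already prevents the stale read; the cleanup additionally removes the leftover columns so that later code paths cannot pick them up by accident.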



--
This message was sent by Atlassian Jira
(v8.20.1#820001)