Posted to issues@hbase.apache.org by "Daniel Roudnitsky (Jira)" <ji...@apache.org> on 2024/04/17 15:43:00 UTC

[jira] [Created] (HBASE-28533) Region split failure due to region quota limit leaves Hmaster's in memory state for the region in SPLITTING after procedure rollback

Daniel Roudnitsky created HBASE-28533:
-----------------------------------------

             Summary: Region split failure due to region quota limit leaves Hmaster's in memory state for the region in SPLITTING after procedure rollback
                 Key: HBASE-28533
                 URL: https://issues.apache.org/jira/browse/HBASE-28533
             Project: HBase
          Issue Type: Bug
          Components: Region Assignment
    Affects Versions: 2.5.8
         Environment: HBase Version 2.5.8, r37444de6531b1bdabf2e445c83d0268ab1a6f919, Thu Feb 29 15:37:32 PST 2024
            Reporter: Daniel Roudnitsky


When a SplitTableRegionProcedure runs for a region whose namespace is already at its maximum region quota, the split procedure fails and rolls back, but HMaster's in-memory RegionStateNode for the region is left in the SPLITTING state. HMaster then refuses to start any subsequent merge, split, or move procedure for that region because it believes the region is not OPEN. This persists until HMaster is restarted (or fails over) and its in-memory record of region states is rebuilt.

The first step of the split procedure, SPLIT_TABLE_REGION_PREPARE, sets the parent region's RegionStateNode state to SPLITTING; this transition is not written to the meta table. In the next step, SPLIT_TABLE_REGION_PRE_OPERATION, the region quota check throws QuotaExceededException, and the procedure ends in the ROLLEDBACK state without reverting the RegionStateNode to OPEN. As a result, HMaster's in-memory RegionStates report the region as SPLITTING, while the region is still online on its assigned region server and is still OPEN according to meta.
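To make the failure mode concrete, below is a small self-contained Java model of the two steps described above. It is a sketch under stated assumptions, not HBase code: every class, field, and method name is hypothetical (nothing here is an HBase API). Only PREPARE and PRE_OPERATION are modeled. The quota check is simplified to "a split replaces one region with two, so it fails if that would exceed the limit". With restoreOnRollback set to false, the model reproduces the reported behavior. Setting it to true shows the state one would expect after a correct rollback.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of HMaster's in-memory region states; NOT an HBase API.
public class SplitRollbackModel {
    enum State { OPEN, SPLITTING }

    static class QuotaExceededException extends Exception {
        QuotaExceededException(String msg) { super(msg); }
    }

    // Stand-in for the in-memory RegionStates; nothing here is written to "meta".
    private final Map<String, State> states = new HashMap<>();
    // Simplified stand-in for the namespace limit (hbase.namespace.quota.maxregions).
    private final int maxRegions;
    // false reproduces the reported bug; true restores OPEN on rollback.
    private final boolean restoreOnRollback;

    SplitRollbackModel(int maxRegions, boolean restoreOnRollback, String... regions) {
        this.maxRegions = maxRegions;
        this.restoreOnRollback = restoreOnRollback;
        for (String r : regions) {
            states.put(r, State.OPEN);
        }
    }

    State stateOf(String region) {
        return states.get(region);
    }

    /** Returns true if the modeled steps pass, false if the procedure is rolled back. */
    boolean split(String parent) {
        // Step 1 (PREPARE): mark the parent SPLITTING in memory only.
        states.put(parent, State.SPLITTING);
        try {
            // Step 2 (PRE_OPERATION): simplified quota check; the split adds one region net.
            if (states.size() + 1 > maxRegions) {
                throw new QuotaExceededException("Region split not possible for :" + parent);
            }
            return true; // later split steps are out of scope for this model
        } catch (QuotaExceededException e) {
            // Rollback: without the restore, the parent stays SPLITTING in memory.
            if (restoreOnRollback) {
                states.put(parent, State.OPEN);
            }
            return false;
        }
    }

    /** Models the "is not OPEN" precondition that rejects the merge. */
    boolean canMerge(String a, String b) {
        return states.get(a) == State.OPEN && states.get(b) == State.OPEN;
    }

    public static void main(String[] args) {
        SplitRollbackModel buggy = new SplitRollbackModel(2, false, "region_a", "region_b");
        System.out.println("split ok? " + buggy.split("region_a"));
        System.out.println("region_a state: " + buggy.stateOf("region_a"));
        System.out.println("merge allowed? " + buggy.canMerge("region_a", "region_b"));

        SplitRollbackModel fixed = new SplitRollbackModel(2, true, "region_a", "region_b");
        fixed.split("region_a");
        System.out.println("with restore, merge allowed? " + fixed.canMerge("region_a", "region_b"));
    }
}
{code}

Running main prints that the split is rolled back (false), that region_a is left in SPLITTING, and that the merge is refused. With the restore enabled, the merge precondition passes. This mirrors the shell reproduction that follows.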

To reproduce in the HBase shell:

{code:java}
> create_namespace 'test_ns', {'hbase.namespace.quota.maxregions'=> 2}
> create 'test_ns:test_table', 'f1', {NUMREGIONS => 2, SPLITALGO => 'UniformSplit'}
> region_a = <first region from list_regions 'test_ns:test_table'>
> region_b = <second region from list_regions 'test_ns:test_table'>

> split region_a, 'x'
# HMaster will report: 
pid=405, state=ROLLEDBACK, exception=org.apache.hadoop.hbase.quotas.QuotaExceededException via master-split-regions:org.apache.hadoop.hbase.quotas.QuotaExceededException: Region split not possible for :<region_a> as quota limits are exceeded ; SplitTableRegionProcedure table=test_ns:test_table, parent=...

> merge_region region_a, region_b
ERROR: org.apache.hadoop.hbase.exceptions.MergeRegionException: org.apache.hadoop.hbase.client.DoNotRetryRegionException: <region_a> is not OPEN; state=SPLITTING

> stop_master # trigger HMaster failover
> merge_region region_a, region_b # merge now succeeds {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)