Posted to issues@hbase.apache.org by "Enis Soztutar (JIRA)" <ji...@apache.org> on 2015/11/21 04:16:11 UTC

[jira] [Commented] (HBASE-14867) SimpleRegionNormalizer needs to have better heuristics to trigger merge operation

    [ https://issues.apache.org/jira/browse/HBASE-14867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15020207#comment-15020207 ] 

Enis Soztutar commented on HBASE-14867:
---------------------------------------

SimpleRN is too simple :) 
I think we should introduce a more sophisticated RN. 
 - Run every 5 min by default rather than every 30 min. This will be similar to the balancer.
 - SRN computes only 1 action per run. This clearly will not work with 10K-region tables. We should be able to compute a batch of normalization plans in each run.
 - RN should look at the best possible actions in terms of splits or merges, not only at the smallest or largest regions. In a single pass, we should be able to calculate whether to split or merge for every pair of neighbors. 
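
To make the last two points concrete, here is a rough sketch of a single pass that emits a batch of plans. It is not the HBase normalizer API: the class BatchNormalizerSketch, the Plan type and computePlans are hypothetical names, regions are reduced to an array of sizes in key order, and the thresholds (split above twice the average, merge neighbors whose combined size is below the average) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not the HBase RegionNormalizer API. */
final class BatchNormalizerSketch {

    /** A hypothetical plan: SPLIT of one region, or MERGE of two neighbors. */
    static final class Plan {
        final String type;  // "SPLIT" or "MERGE"
        final int first;    // region to split, or left region of a merge
        final int second;   // right region of a merge, -1 for a split

        Plan(String type, int first, int second) {
            this.type = type;
            this.first = first;
            this.second = second;
        }

        @Override
        public String toString() {
            return second < 0
                ? type + "(" + first + ")"
                : type + "(" + first + "," + second + ")";
        }
    }

    /**
     * One left-to-right pass over region sizes given in key order.
     * Assumed thresholds: split above 2x the average size, merge two
     * neighbors whose combined size is below the average size.
     */
    static List<Plan> computePlans(long[] sizes) {
        List<Plan> plans = new ArrayList<>();
        if (sizes.length == 0) {
            return plans;
        }
        long total = 0;
        for (long s : sizes) {
            total += s;
        }
        double avg = (double) total / sizes.length;

        int i = 0;
        while (i < sizes.length) {
            if (sizes[i] > 2 * avg) {
                plans.add(new Plan("SPLIT", i, -1));
                i++;
            } else if (i + 1 < sizes.length && sizes[i] + sizes[i + 1] < avg) {
                plans.add(new Plan("MERGE", i, i + 1));
                i += 2; // each region takes part in at most one plan per run
            } else {
                i++;
            }
        }
        return plans;
    }

    public static void main(String[] args) {
        // Sizes in thousands of rows for r1..r6 from the issue description
        // (assuming about 27K rows in each of r5 and r6).
        long[] sizes = {0, 100, 100, 100, 27, 27};
        System.out.println(computePlans(sizes));
    }
}
```

For the table in the issue description this produces a single merge plan for indices 4 and 5, i.e. r5 and r6. The pass is greedy, so it does not necessarily find the best set of plans; it only shows that one scan over neighbors is enough to produce a batch.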

> SimpleRegionNormalizer needs to have better heuristics to trigger merge operation
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-14867
>                 URL: https://issues.apache.org/jira/browse/HBASE-14867
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 1.2.0
>            Reporter: Romil Choksi
>
> SimpleRegionNormalizer needs to have better heuristics to trigger merge operations. SimpleRegionNormalizer is not able to trigger a merge action if the table's smallest region has neighboring regions that are larger than the table's average region size, even when there are other small regions whose combined size is less than the average region size. 
> For example, 
> - Consider a table with six regions, say r1 to r6. 
> - Keep r1 empty and create some data, say 100K rows, for each of the regions r2, r3 and r4. Create a smaller amount of data for regions r5 and r6, say about 27K rows.
> - Run the normalizer. Verify the number of regions for that table, and also check the master log to see whether any merge action was triggered as a result of normalization. 
> In such a scenario, it would be better to have a merge action triggered for the two smaller regions r5 and r6, even though neither of them is the smallest one.
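
The scenario above can be checked with a small sketch of a smallest-region-only merge heuristic. This models the behavior as the description states it, not the actual SimpleRegionNormalizer code: the class SmallestRegionOnlySketch and the method mergeCandidate are hypothetical, row counts (in thousands) stand in for region sizes, and 27K rows is assumed for each of r5 and r6.

```java
/** Hypothetical model of a smallest-region-only merge heuristic. */
final class SmallestRegionOnlySketch {

    /**
     * Returns the indices {left, right} of the pair to merge, or null when
     * the smallest region plus its smaller neighbor is not below the average.
     */
    static int[] mergeCandidate(long[] sizes) {
        if (sizes.length < 2) {
            return null;
        }
        long total = 0;
        int smallest = 0;
        for (int i = 0; i < sizes.length; i++) {
            total += sizes[i];
            if (sizes[i] < sizes[smallest]) {
                smallest = i;
            }
        }
        double avg = (double) total / sizes.length;

        // Only the neighbors of the single smallest region are considered.
        int neighbor;
        if (smallest == 0) {
            neighbor = 1;
        } else if (smallest == sizes.length - 1) {
            neighbor = smallest - 1;
        } else {
            neighbor = sizes[smallest - 1] <= sizes[smallest + 1]
                ? smallest - 1
                : smallest + 1;
        }

        if (sizes[smallest] + sizes[neighbor] < avg) {
            return new int[] {Math.min(smallest, neighbor), Math.max(smallest, neighbor)};
        }
        return null;
    }

    public static void main(String[] args) {
        long[] sizes = {0, 100, 100, 100, 27, 27}; // r1..r6, thousands of rows
        int[] pair = mergeCandidate(sizes);
        System.out.println(pair == null ? "no merge" : pair[0] + "," + pair[1]);
    }
}
```

With these numbers the average is 59K rows. The smallest region is r1, and r1 plus r2 is 100K, which is above the average, so no merge is proposed. The pair r5 and r6 totals 54K, below the average, but it is never examined.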



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)