Posted to issues@hbase.apache.org by "Xu Cang (JIRA)" <ji...@apache.org> on 2019/06/28 22:56:00 UTC

[jira] [Comment Edited] (HBASE-22349) Stochastic Load Balancer skips balancing when node is replaced in cluster

    [ https://issues.apache.org/jira/browse/HBASE-22349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16875277#comment-16875277 ] 

Xu Cang edited comment on HBASE-22349 at 6/28/19 10:55 PM:
-----------------------------------------------------------

This is a very good observation. One of my co-workers observed and debugged a similar issue in our environment.

Obviously we don't want an RS to hold 0 regions while the LB still thinks the cluster is 'balanced'. Besides tweaking 'minCostNeedBalance', maybe we can introduce a rule that when an RS holds 0 regions, balancing is still triggered regardless.
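A minimal sketch of what such a rule could look like (the class and method names here are illustrative, not actual HBase APIs): before comparing the normalized cost against minCostNeedBalance, the balancer could first check whether any RegionServer hosts zero regions and, if so, proceed with balancing unconditionally.

```java
import java.util.Map;

public class ZeroRegionCheck {
    // Returns true if any RegionServer hosts zero regions, meaning the
    // minCostNeedBalance short-circuit should be bypassed and a balance
    // run forced regardless of the computed cost.
    static boolean hasEmptyServer(Map<String, Integer> regionCountsByServer) {
        return regionCountsByServer.values().stream().anyMatch(c -> c == 0);
    }
}
```

In StochasticLoadBalancer terms, a check like this would be consulted in the needs-balance decision ahead of the cost-threshold comparison, so a freshly replaced (empty) node always receives regions.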

 


> Stochastic Load Balancer skips balancing when node is replaced in cluster
> -------------------------------------------------------------------------
>
>                 Key: HBASE-22349
>                 URL: https://issues.apache.org/jira/browse/HBASE-22349
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.4.4
>            Reporter: Suthan Phillips
>            Priority: Major
>         Attachments: Hbase-22349.pdf
>
>
> In EMR cluster, whenever I replace one of the nodes, the regions never get rebalanced.
> The default minCostNeedBalance set to 0.05 is too high.
> The region count on the servers were: 21, 21, 20, 20, 20, 20, 21, 20, 20, 20 = 203
> Once a node(region server) got replaced with a new node (terminated and EMR recreated a node), the region count on the servers became: 23, 0, 23, 22, 22, 22, 22, 23, 23, 23 = 203
> From the hbase-master logs, I can see the below messages, which indicate that the default minCostNeedBalance does not hold well for this scenario.
> ##
> 2019-04-29 09:31:37,027 WARN  [ip-172-31-35-122.ec2.internal,16000,1556524892897_ChoreService_1] cleaner.CleanerChore: WALs outstanding under hdfs://ip-172-31-35-122.ec2.internal:8020/user/hbase/oldWALs
> 2019-04-29 09:31:42,920 INFO  [ip-172-31-35-122.ec2.internal,16000,1556524892897_ChoreService_1] balancer.StochasticLoadBalancer: Skipping load balancing because balanced cluster; total cost is 52.041826194833405, sum multiplier is 1102.0 min cost which need balance is 0.05
> ##
> To mitigate this, I had to lower the default minCostNeedBalance to a value like 0.01f and restart the Region Servers and HBase Master. After modifying this value to 0.01f, I could see the regions getting rebalanced.
> This has led me to the following questions, which I would like to get answered by the HBase experts.
> 1) What are the factors that affect the value of total cost and sum multiplier? How can we determine the right minCostNeedBalance value for any cluster?
> 2) How did HBase arrive at the default value of 0.05f? Is it an optimal value? If yes, then what is the recommended way to mitigate this scenario?
> Attached: Steps to reproduce
>  
> Note: The HBASE-17565 patch is already applied.
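The skip decision in the quoted log line comes down to a simple ratio: the balancer normalizes the total cost by the sum of the cost-function multipliers and compares the result against minCostNeedBalance. A quick check with the logged numbers (a sketch of the comparison, not HBase's actual code path):

```java
public class BalanceSkipCheck {
    // Returns true when the normalized cost falls below the threshold,
    // i.e. the StochasticLoadBalancer would skip balancing.
    static boolean skipsBalancing(double totalCost, double sumMultiplier,
                                  double minCostNeedBalance) {
        return totalCost / sumMultiplier < minCostNeedBalance;
    }

    public static void main(String[] args) {
        // Numbers from the quoted log: 52.0418... / 1102.0 ~= 0.0472, which
        // is just below the default 0.05, so balancing is skipped even
        // though one server holds zero regions.
        System.out.println(skipsBalancing(52.041826194833405, 1102.0, 0.05));
        // With the lowered threshold of 0.01 from the report, the same
        // normalized cost is above the threshold, so balancing proceeds.
        System.out.println(skipsBalancing(52.041826194833405, 1102.0, 0.01));
    }
}
```

This also illustrates why the reporter's workaround behaves as described: the cluster with an empty server scores around 0.047, narrowly under the 0.05 default but comfortably over 0.01.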



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)