Posted to issues@hbase.apache.org by "Charlie Qiangeng Xu (JIRA)" <ji...@apache.org> on 2016/11/24 13:18:58 UTC
[jira] [Issue Comment Deleted] (HBASE-17110) Add an "Overall Strategy" option(balanced both on table level and server level) to SimpleLoadBalancer

     [ https://issues.apache.org/jira/browse/HBASE-17110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Charlie Qiangeng Xu updated HBASE-17110:
----------------------------------------
    Comment: was deleted

(was: Hi Anoop, ideally I shouldn't add extra code for SLB, but reusing only the original code flow in HMaster can't satisfy
this new requirement.
Irrespective of the balancer type, the original flow works as follows:
Step 1: HMaster:balance() calls RegionStates:getAssignmentsByTable().
Step 2: RegionStates:getAssignmentsByTable() checks whether balancing is bytable or not:
   (1) if bytable: it returns a Map<TableName, Map<ServerName, List<HRegionInfo>>> covering the whole cluster;
   (2) if not bytable: it returns the same Map<TableName, Map<ServerName, List<HRegionInfo>>> type, but every region is filed under the single TableName "hbase:ensemble".
Step 3: A for loop generates the plans:
  for (Entry<TableName, Map<ServerName, List<HRegionInfo>>> e : assignmentsByTable.entrySet()) {
    // generate the balance plan for the table e.getKey() and add it to the overall plan list
  }
So with bytable, each iteration has the balancer generate a plan for one specific table.
Without bytable, however, there is only one "table", "hbase:ensemble", so the loop body runs once and balances all regions together.
For this enhancement we want to balance every table separately while still being aware of the server loads across the whole cluster. So for SLB, I added logic in HMaster to get the load of every server.

If we decide not to go this way, there are two cases:

(1) The user's configuration is not bytable: SLB will get a Map<TableName, Map<ServerName, List<HRegionInfo>>> with only one table name, "hbase:ensemble".
We first need to compute the load of each server from this map. Then, with the logic of getAssignmentsByTable copied into SLB, we can split the map into entries keyed by the real table names and loop through that new map.

(2) The user's configuration is bytable: I will call RegionStates:getAssignmentsByTable() in the first loop iteration to get the server load info.
Another problem is that I would have to check the user's configuration to decide whether to execute (1) or (2).

 

)
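
To make the difference between the two modes in Steps 2 and 3 concrete, here is a simplified, hypothetical model of the grouping. It is not HBase code: plain strings stand in for TableName, ServerName and HRegionInfo, and the names AssignmentGroupingSketch, Region and groupAssignments are made up for illustration. As a simplifying assumption, the sketch also lists servers that host no regions, so idle servers stay visible to whatever consumes the map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the Step 2 grouping; strings replace HBase's
// TableName, ServerName and HRegionInfo. Not taken from HBase itself.
final class AssignmentGroupingSketch {
  static final String ENSEMBLE = "hbase:ensemble";

  /** One region: its table, its name, and the server currently hosting it. */
  static final class Region {
    final String table;
    final String name;
    final String server;

    Region(String table, String name, String server) {
      this.table = table;
      this.name = name;
      this.server = server;
    }
  }

  /**
   * Returns table -> (server -> region names). When byTable is false, every
   * region is filed under the single pseudo-table "hbase:ensemble".
   */
  static Map<String, Map<String, List<String>>> groupAssignments(
      List<Region> regions, List<String> servers, boolean byTable) {
    Map<String, Map<String, List<String>>> result = new TreeMap<>();
    for (Region r : regions) {
      String key = byTable ? r.table : ENSEMBLE;
      Map<String, List<String>> perServer = result.computeIfAbsent(key, k -> {
        // Pre-populate every live server so empty servers show up with an empty list.
        Map<String, List<String>> m = new TreeMap<>();
        for (String s : servers) {
          m.put(s, new ArrayList<>());
        }
        return m;
      });
      perServer.computeIfAbsent(r.server, s -> new ArrayList<>()).add(r.name);
    }
    return result;
  }
}
```

The size of the returned map is the number of iterations of the Step 3 loop: one per table with bytable, and exactly one ("hbase:ensemble") otherwise, which is why the non-bytable case balances everything in a single pass.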

> Add an "Overall Strategy" option(balanced both on table level and server level) to SimpleLoadBalancer
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-17110
>                 URL: https://issues.apache.org/jira/browse/HBASE-17110
>             Project: HBase
>          Issue Type: Improvement
>          Components: Balancer
>    Affects Versions: 2.0.0, 1.2.4
>            Reporter: Charlie Qiangeng Xu
>            Assignee: Charlie Qiangeng Xu
>         Attachments: HBASE-17110-V2.patch, HBASE-17110-V3.patch, HBASE-17110-V4.patch, HBASE-17110-V5.patch, HBASE-17110.patch
>
>
> This jira is about an enhancement of SimpleLoadBalancer. Here we introduce a new strategy, "bytableOverall", which can be enabled by adding:
> {noformat}
> <property>
>   <name>hbase.master.loadbalance.bytableOverall</name>
>   <value>true</value>
> </property>
> {noformat}
> We have been using the strategy on our largest cluster for several months. It has proven very helpful and stable, and the improvement is clearly visible to users.
> Here is the reason why it's helpful:
> When operating large-scale clusters (our case), some companies still prefer to use {{SimpleLoadBalancer}} for its simplicity, quick balance plan generation, etc. The current SimpleLoadBalancer has two modes:
> 1. byTable, which only guarantees that the regions of each table are uniformly distributed.
> 2. byCluster, which ignores the distribution within tables and balances all regions together.
> If the load differs between tables, byTable is the preferable option in most cases. However, it sacrifices cluster-level balance and can leave some servers with a significantly higher load, e.g. 242 regions on server A but 417 regions on server B (real-world stats).
> Consider this case: a cluster has 3 tables and 4 servers:
> {noformat}
>   server A has 3 regions: table1:1, table2:1, table3:1
>   server B has 3 regions: table1:2, table2:2, table3:2
>   server C has 3 regions: table1:3, table2:3, table3:3
>   server D has 0 regions.
> {noformat}
> From the byTable strategy's perspective, the cluster is already perfectly balanced at the table level. But the ideal state would look like:
> {noformat}
>   server A has 2 regions: table2:1, table3:1
>   server B has 2 regions: table1:2, table3:2
>   server C has 3 regions: table1:3, table2:3, table3:3
>   server D has 2 regions: table1:1, table2:2
> {noformat}
> The server loads change from 3,3,3,0 to 2,2,3,2, while table1, table2 and table3 all remain balanced.
> This is what the new "byTableOverall" mode achieves.
> Two unit tests have been added as well; the second one demonstrates the advantage of the new strategy.
> Also, an onConfigurationChange method has been implemented so that the "slop" value can be changed on the fly.
>  
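
The goal described above (keep every table balanced while narrowing the gap between servers) can be illustrated with a small greedy pass. This is a hypothetical sketch, not the algorithm in the attached patches: strings stand in for ServerName, TableName and HRegionInfo, the names ByTableOverallSketch, Move, load and balance are made up, and it assumes every table already sits within its per-server [floor, ceil] band, e.g. after a byTable pass.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical greedy sketch only; not the HBASE-17110 patch.
final class ByTableOverallSketch {

  /** One region move from one server to another. */
  static final class Move {
    final String region;
    final String from;
    final String to;

    Move(String region, String from, String to) {
      this.region = region;
      this.from = from;
      this.to = to;
    }
  }

  /** Number of regions (of all tables) hosted by one server. */
  static int load(Map<String, List<String>> tablesOnServer) {
    int n = 0;
    for (List<String> regions : tablesOnServer.values()) {
      n += regions.size();
    }
    return n;
  }

  /** placement: server -> (table -> mutable region list); updated in place. */
  static List<Move> balance(Map<String, Map<String, List<String>>> placement) {
    int servers = placement.size();
    Map<String, Integer> tableSizes = new TreeMap<>(); // regions per table, cluster-wide
    for (Map<String, List<String>> tables : placement.values()) {
      for (Map.Entry<String, List<String>> e : tables.entrySet()) {
        tableSizes.merge(e.getKey(), e.getValue().size(), Integer::sum);
      }
    }
    // Ties are broken by server name so the result is deterministic.
    Comparator<String> lightestFirst = (a, b) -> {
      int c = Integer.compare(load(placement.get(a)), load(placement.get(b)));
      return c != 0 ? c : a.compareTo(b);
    };
    Comparator<String> heaviestFirst = (a, b) -> {
      int c = Integer.compare(load(placement.get(b)), load(placement.get(a)));
      return c != 0 ? c : a.compareTo(b);
    };
    List<Move> moves = new ArrayList<>();
    boolean moved = true;
    while (moved) {
      moved = false;
      List<String> sources = new ArrayList<>(placement.keySet());
      sources.sort(heaviestFirst);
      List<String> targets = new ArrayList<>(placement.keySet());
      targets.sort(lightestFirst);
      search:
      for (String src : sources) {
        for (String dst : targets) {
          // Only move when it strictly narrows the server-level gap.
          if (load(placement.get(src)) - load(placement.get(dst)) < 2) {
            continue;
          }
          for (Map.Entry<String, Integer> t : tableSizes.entrySet()) {
            int floor = t.getValue() / servers;
            int ceil = (t.getValue() + servers - 1) / servers;
            List<String> srcRegions = placement.get(src).getOrDefault(t.getKey(), new ArrayList<>());
            int dstCount = placement.get(dst).getOrDefault(t.getKey(), new ArrayList<>()).size();
            // Table-level constraint: src stays >= floor, dst stays <= ceil.
            if (srcRegions.size() > floor && dstCount < ceil) {
              String region = srcRegions.remove(0);
              placement.get(dst).computeIfAbsent(t.getKey(), k -> new ArrayList<>()).add(region);
              moves.add(new Move(region, src, dst));
              moved = true;
              break search;
            }
          }
        }
      }
    }
    return moves;
  }
}
```

On the 3-table, 4-server example from the description, balance() makes two moves, table1:1 from A to D and table2:2 from B to D, which reproduces the 2,2,3,2 layout. Every move goes from a server holding at least two more regions than the destination, so the sum of squared server loads drops with each move and the loop terminates; being greedy, it may stop short of the best achievable spread on other layouts.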



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)