Posted to issues@hbase.apache.org by "Pierre Zemb (JIRA)" <ji...@apache.org> on 2019/06/22 17:20:00 UTC
[jira] [Updated] (HBASE-22618) Provide a way to have Heterogeneous deployment

     [ https://issues.apache.org/jira/browse/HBASE-22618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pierre Zemb updated HBASE-22618:
--------------------------------
    Description: 
Hi,

We would like to open the discussion about supporting heterogeneous deployments, i.e. HBase clusters running on different kinds of hardware.
h2. Why?
 * Cloud deployments mean that we may not be able to keep the same hardware over the years
 * Some tables may have special requirements such as SSDs, whereas others should use hard drives
 * *In our use case* (single table, dedicated HBase and Hadoop tuned for our use case, good key distribution), *the number of regions per RS was the real limit for us*.

h2. Our use case

We found out that *in our use case* (single table, dedicated HBase and Hadoop tuned for our use case, good key distribution), *the number of regions per RS was the real limit for us*.

Over the years, for historical reasons and because we needed to benchmark new machines, we ended up with different groups of hardware: some servers can handle only 180 regions, whereas the biggest can handle more than 900. Because of this difference, we had to disable load balancing to avoid the {{roundRobinAssignment}}. We developed internal tooling that is responsible for balancing regions across RegionServers. That was 1.5 years ago.
h2. Our Proof-of-concept

We worked on a proof of concept [here|https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/HeterogeneousBalancer.java], with some early tests [here|https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/HeterogeneousBalancer.java], [here|https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestHeterogeneousBalancerBalance.java], and [here|https://github.com/PierreZ/hbase/blob/dev/hbase14/balancer/hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestHeterogeneousBalancerRules.java]. We wrote the balancer for our own use case, which means it assumes that:
 * there is one table
 * there are no region replicas
 * there is good key dispersion
 * there are no regions on the master

A rule file is loaded before balancing. It contains lines of rules. A rule is composed of a regexp for hostname, and a limit. For example, we could have:

 
{quote}rs[0-9] 200
rs1[0-9] 50
{quote}
 

RegionServers whose hostname matches the first rule will have a limit of 200, and those matching the second rule a limit of 50. If no rule matches, a default limit is used.

Thanks to the rules, we have two pieces of information: the maximum number of regions for the cluster, and the limit for each server. {{HeterogeneousBalancer}} will try to balance regions according to each server's capacity.
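
As an illustration, the rule lookup described above could be sketched in Java as follows. This is a minimal sketch, not the proof-of-concept code: the class name {{HeterogeneousRulesSketch}}, the default limit of 100, whole-hostname matching, and the first-match-wins order are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class HeterogeneousRulesSketch {
    // Hypothetical default used when no rule matches a hostname.
    static final int DEFAULT_LIMIT = 100;

    /** One line of the rule file: a hostname regexp and a region limit. */
    static final class Rule {
        final Pattern hostname;
        final int limit;

        Rule(Pattern hostname, int limit) {
            this.hostname = hostname;
            this.limit = limit;
        }
    }

    /** Parses lines of the form "<regexp> <limit>", skipping blank lines. */
    static List<Rule> parse(List<String> lines) {
        List<Rule> rules = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] parts = trimmed.split("\\s+");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Malformed rule: " + line);
            }
            rules.add(new Rule(Pattern.compile(parts[0]), Integer.parseInt(parts[1])));
        }
        return rules;
    }

    /** Returns the limit of the first rule matching the whole hostname, else the default. */
    static int limitFor(String hostname, List<Rule> rules) {
        for (Rule rule : rules) {
            if (rule.hostname.matcher(hostname).matches()) {
                return rule.limit;
            }
        }
        return DEFAULT_LIMIT;
    }

    public static void main(String[] args) {
        List<Rule> rules = parse(Arrays.asList("rs[0-9] 200", "rs1[0-9] 50"));
        System.out.println(limitFor("rs3", rules));     // 200
        System.out.println(limitFor("rs14", rules));    // 50
        System.out.println(limitFor("unknown", rules)); // 100 (the assumed default)
    }
}
```

Matching the whole hostname matters here: with a partial match, {{rs[0-9]}} would also match {{rs14}}, and the second rule would never apply.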

Let's take an example. Let's say that we have 20 RS:
 * 10 RS, named {{rs0}} through {{rs9}}, loaded with 60 regions each, each able to handle 200 regions.
 * 10 RS, named {{rs10}} through {{rs19}}, loaded with 60 regions each, each able to handle 50 regions.

Based on the following rules:

 
{quote}rs[0-9] 200
rs1[0-9] 50
{quote}
 

The second group is overloaded, whereas the first group has plenty of space.

We know that we can handle at most *2500 regions* (200*10 + 50*10) and we currently have *1200 regions* (60*20). {{HeterogeneousBalancer}} will understand that the cluster is *full at 48.0%* (1200/2500). Based on this information, we will then *try to bring all the RegionServers to ~48% of load according to the rules.* In this case, it will move regions from the second group to the first.
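
The arithmetic of this example can be checked with a few lines of Java. This is only a sketch of the calculation; the class and method names are made up for the illustration and do not come from the proof of concept.

```java
import java.util.Locale;

final class ClusterLoadSketch {
    /** Cluster-wide load: total hosted regions divided by total capacity. */
    static double clusterLoad(int[] regions, int[] limits) {
        long totalRegions = 0;
        long totalCapacity = 0;
        for (int i = 0; i < regions.length; i++) {
            totalRegions += regions[i];
            totalCapacity += limits[i];
        }
        return (double) totalRegions / totalCapacity;
    }

    /** Load of a single RegionServer relative to its own limit. */
    static double serverLoad(int regions, int limit) {
        return (double) regions / limit;
    }

    /** Formats a ratio as a percentage with one decimal, independent of the JVM locale. */
    static String percent(double ratio) {
        return String.format(Locale.ROOT, "%.1f%%", ratio * 100);
    }

    public static void main(String[] args) {
        int[] regions = new int[20];
        int[] limits = new int[20];
        for (int i = 0; i < 20; i++) {
            regions[i] = 60;                // every RS hosts 60 regions
            limits[i] = i < 10 ? 200 : 50;  // rs0..rs9: 200, rs10..rs19: 50
        }
        System.out.println(percent(clusterLoad(regions, limits))); // 48.0%
        System.out.println(percent(serverLoad(60, 200)));          // 30.0%
        System.out.println(percent(serverLoad(60, 50)));           // 120.0%
    }
}
```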

The balancer will:
 * compute how many regions need to be moved. In our example, by moving 36 regions off rs10, we could go from 120.0% to 48.0%
 * select the regions with the lowest data locality
 * try to find an appropriate RS for each region. We will take the available RS with the lowest load.
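
The first and last steps could look roughly like the Java sketch below. It is an illustration under assumptions, not the proof-of-concept implementation: the names are hypothetical, the per-server target is rounded down with integer arithmetic, "lowest available RS" is read as "lowest load among servers still under their limit", and the data-locality selection is left out because it depends on cluster metrics.

```java
import java.util.ArrayList;
import java.util.List;

final class BalancerStepsSketch {
    static final class Server {
        final String name;
        final int limit;
        int regions;

        Server(String name, int limit, int regions) {
            this.name = name;
            this.limit = limit;
            this.regions = regions;
        }

        double load() {
            return (double) regions / limit;
        }
    }

    /** Builds the example cluster: rs0..rs9 (limit 200) and rs10..rs19 (limit 50), 60 regions each. */
    static List<Server> exampleCluster() {
        List<Server> servers = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            servers.add(new Server("rs" + i, i < 10 ? 200 : 50, 60));
        }
        return servers;
    }

    /** Regions a server must give away to reach the cluster-wide load (0 if it is at or below it). */
    static int regionsToMove(Server server, long totalRegions, long totalCapacity) {
        long target = (long) server.limit * totalRegions / totalCapacity; // rounded down
        return (int) Math.max(0, server.regions - target);
    }

    /** Least-loaded server that is still under its limit, ignoring the source server. */
    static Server leastLoaded(List<Server> servers, Server source) {
        Server best = null;
        for (Server candidate : servers) {
            if (candidate == source || candidate.regions >= candidate.limit) {
                continue;
            }
            if (best == null || candidate.load() < best.load()) {
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Server> servers = exampleCluster();
        long totalRegions = 0;
        long totalCapacity = 0;
        for (Server s : servers) {
            totalRegions += s.regions;
            totalCapacity += s.limit;
        }
        Server source = servers.get(10); // rs10, loaded at 120%
        int toMove = regionsToMove(source, totalRegions, totalCapacity);
        for (int i = 0; i < toMove; i++) {
            Server target = leastLoaded(servers, source);
            if (target == null) {
                break; // no server has room left
            }
            source.regions--;
            target.regions++;
        }
        // rs10 gives away 36 regions and ends at 24, i.e. (60 - 36) / 50 = 48%.
        System.out.println(source.name + " moved " + toMove + ", now hosts " + source.regions);
    }
}
```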

h2. Other implementations and ideas

Clay Baenziger proposed this idea on the dev ML:
{quote}Could it work to have the stochastic load balancer use [pluggable cost functions instead of this static list of cost functions|https://github.com/apache/hbase/blob/baf3ae80f5588ee848176adefc9f56818458a387/hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java#L198]? Then, could this type of a load balancer be implemented simply as a new cost function which folks could choose to load and mix with the others?
{quote}
I think this could be an interesting way to include user-defined functions in the mix. Since you know your hardware and your access patterns, you can easily tell which metrics matter for balancing. For us, it would only be the number of regions, but we could mix it with incoming writes!
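
To make the idea more concrete, here is a rough Java sketch of what a capacity-aware cost function might compute. The {{ClusterCost}} interface is hypothetical and is not the HBase {{StochasticLoadBalancer}} API; the cost formula (mean absolute deviation of each server's load from the cluster-wide load, capped at 1) and the weighted sum are assumptions chosen for the example.

```java
final class CostFunctionSketch {
    /** Hypothetical plug-in point: 0 means perfectly balanced, 1 is the worst case. */
    interface ClusterCost {
        double cost(int[] regions, int[] limits);
    }

    /** Penalizes servers whose load (regions / limit) differs from the cluster-wide load. */
    static final class CapacityAwareRegionCountCost implements ClusterCost {
        @Override
        public double cost(int[] regions, int[] limits) {
            long totalRegions = 0;
            long totalCapacity = 0;
            for (int i = 0; i < regions.length; i++) {
                totalRegions += regions[i];
                totalCapacity += limits[i];
            }
            double clusterLoad = (double) totalRegions / totalCapacity;
            double deviation = 0;
            for (int i = 0; i < regions.length; i++) {
                deviation += Math.abs((double) regions[i] / limits[i] - clusterLoad);
            }
            return Math.min(1.0, deviation / regions.length);
        }
    }

    /** Mixes several cost functions with user-chosen weights. */
    static double weightedCost(ClusterCost[] functions, double[] weights,
                               int[] regions, int[] limits) {
        double total = 0;
        for (int i = 0; i < functions.length; i++) {
            total += weights[i] * functions[i].cost(regions, limits);
        }
        return total;
    }

    public static void main(String[] args) {
        int[] limits = new int[20];
        int[] before = new int[20];
        int[] after = new int[20];
        for (int i = 0; i < 20; i++) {
            limits[i] = i < 10 ? 200 : 50;
            before[i] = 60;               // every RS hosts 60 regions
            after[i] = i < 10 ? 96 : 24;  // every RS at 48% of its limit
        }
        ClusterCost cost = new CapacityAwareRegionCountCost();
        System.out.println("before: " + cost.cost(before, limits)); // about 0.45
        System.out.println("after:  " + cost.cost(after, limits));  // 0.0
    }
}
```

With such a function, a balancer that minimizes the total cost would prefer the layout where every server sits at the same fraction of its own limit, and other costs (for example incoming writes) could be mixed in through the weights.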

 

bhupendra.jain also proposed the idea of "labels":

 
{quote}Internally, we are also having discussion to develop similar solution. In our approach, We were also thinking of adding "RS Label" Feature similar to Hadoop Node Label feature.

Each RS can have a label to denote its capabilities / resources. When user create table, there can be extra attributes with its descriptor. The balancer can decide to host region of table based on RS label and these attributes further.
With RS label feature, Balancer can be more intelligent. Example tables with high read load needs more cache backed by SSDs, So such table regions should be hosted on RS having SSDs ...
{quote}
I love the idea, but I think Clay's idea is a better fit for a quicker first set of commits on the subject. What do you think?



> Provide a way to have Heterogeneous deployment
> ----------------------------------------------
>
>                 Key: HBASE-22618
>                 URL: https://issues.apache.org/jira/browse/HBASE-22618
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 2.1.6, 1.4.11
>            Reporter: Pierre Zemb
>            Priority: Major
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)