Posted to issues@hbase.apache.org by "Elliott Clark (JIRA)" <ji...@apache.org> on 2013/05/11 02:25:15 UTC
[jira] [Comment Edited] (HBASE-8517) Stochastic Loadbalancer isn't finding steady state on real clusters

    [ https://issues.apache.org/jira/browse/HBASE-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13655067#comment-13655067 ] 

Elliott Clark edited comment on HBASE-8517 at 5/11/13 12:23 AM:
----------------------------------------------------------------

bq.How was the constant 600 determined ?
That was the default before adding the scaling.

bq.Should it be related to cluster size (number of servers, number of regions) ?
Possibly, but this works well. On a well-run cluster the number of servers should be a proxy for the number of regions, so it isn't really needed in the computation.

bq.Looks like the return value from computeCost() is ignored.
Yes, it is only used to output a useful debug message.

bq.Is the addition of 9 needed when cluster.numMovedMetaRegions is 0 ?
Yes, because the possibility is there.  You want to scale based upon the highest possible cost.
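
A minimal sketch of the scaling idea described above, not the code from the attached patches. The class name, method names, and the assumption that a moved meta region carries an extra weight of 9 on top of its normal move cost are all hypothetical; only the "add 9 to the upper bound" detail comes from the discussion. The point it illustrates is that the upper bound is fixed at the worst case (every region moved, including a meta region), so the scaled cost stays comparable between runs whether or not a meta region actually moved.

```java
public class MoveCostSketch {
    // Hypothetical: extra weight added for a moved meta region,
    // on top of the 1 it already contributes as an ordinary moved region.
    static final int META_EXTRA_WEIGHT = 9;

    // Map value from [min, max] onto [0, 1], clamping at both ends.
    static double scale(double min, double max, double value) {
        if (max <= min || value <= min) {
            return 0.0;
        }
        return Math.min(1.0, (value - min) / (max - min));
    }

    // Scaled cost of a plan that moves numMovedRegions regions,
    // numMovedMetaRegions of which are meta regions.
    static double moveCost(int numRegions, int numMovedRegions, int numMovedMetaRegions) {
        double raw = numMovedRegions + META_EXTRA_WEIGHT * numMovedMetaRegions;
        // The bound always includes the meta weight, even when no meta
        // region moved, because it represents the highest possible cost.
        double worstCase = numRegions + META_EXTRA_WEIGHT;
        return scale(0, worstCase, raw);
    }

    public static void main(String[] args) {
        // 37 ordinary moves on a hypothetical 1000-region cluster.
        System.out.println(moveCost(1000, 37, 0));
        // Every region moved, including one meta region: the worst case.
        System.out.println(moveCost(1000, 1000, 1));
    }
}
```

If the 9 were added to the bound only when a meta region had moved, two plans with the same number of ordinary moves would be scaled against different maxima, and their costs would no longer be directly comparable.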
                
      was (Author: eclark):
    bq.How was the constant 600 determined ?
That was the default before adding the scaling.

bq.Should it be related to cluster size (number of servers, number of regions) ?
Possibly but this works well, and number of servers should on a well run cluster be a proxy for number of regions.

bq.Looks like the return value from computeCost() is ignored.
Yes it is just used to output a good debug message.

bq.Is the addition of 9 needed when cluster.numMovedMetaRegions is 0 ?
Yes, because the possibility is there.  You want to scale based upon the highest possible cost.
                  
> Stochastic Loadbalancer isn't finding steady state on real clusters
> -------------------------------------------------------------------
>
>                 Key: HBASE-8517
>                 URL: https://issues.apache.org/jira/browse/HBASE-8517
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.95.0
>            Reporter: Elliott Clark
>            Assignee: Elliott Clark
>         Attachments: HBASE-8517-0.patch, HBASE-8517-1.patch
>
>
> I have a cluster that runs IT tests.  Last night after all tests were done I noticed that the balancer was thrashing regions around.
> The number of regions on each machine is not correct.
> The balancer seems to value the cost of moving a region way too little.
> {code}
> 2013-05-09 16:34:58,920 DEBUG [IPC Server handler 4 on 60000] org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: Finished computing new load balance plan.  Computation took 5367ms to try 8910 different iterations.  Found a solution that moves 37 regions; Going from a computed cost of 56.50254222730425 to a new cost of 11.214035466575254
> 2013-05-09 16:37:48,715 DEBUG [IPC Server handler 7 on 60000] org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: Finished computing new load balance plan.  Computation took 4735ms to try 8910 different iterations.  Found a solution that moves 38 regions; Going from a computed cost of 56.612624531830996 to a new cost of 11.275763861636982
> 2013-05-09 16:38:11,398 DEBUG [IPC Server handler 6 on 60000] org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: Finished computing new load balance plan.  Computation took 4502ms to try 8910 different iterations.  Found a solution that moves 39 regions; Going from a computed cost of 56.50048461413552 to a new cost of 11.225352339003237
> {code}
> Each of those balancer runs was triggered when there was no load on the cluster.
