You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "Ted Yu (JIRA)" <ji...@apache.org> on 2011/07/01 15:58:30 UTC
[jira] [Commented] (HBASE-4031) An imbalance result calculated by LoadBalancer

    [ https://issues.apache.org/jira/browse/HBASE-4031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058555#comment-13058555 ] 

Ted Yu commented on HBASE-4031:
-------------------------------

I think HBASE-4053 reveals the cause for this issue.

> An imbalance result calculated by LoadBalancer
> ----------------------------------------------
>
>                 Key: HBASE-4031
>                 URL: https://issues.apache.org/jira/browse/HBASE-4031
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.3
>            Reporter: Jieshan Bean
>             Fix For: 0.90.4
>
>         Attachments: HMaster222.rar, HRegionServer222.rar
>
>
>   I found the problem while the cluster couldn't balance(Around time of 2011-05-24 11:28).One node's regions count is the double of the other nodes. And it didn't move regions anymore:
>    Address Start Code Load
> 158-1-101-202:20030 1306205409671 requests=0, regions=2593, usedHeap=114, maxHeap=8165 158-1-101-222:20030 1306205940117 requests=0, regions=5841, usedHeap=80, maxHeap=8165 158-1-101-52:20030 1306205417261 requests=0, regions=2622, usedHeap=76, maxHeap=8165 158-1-101-82:20030 1306205415714 requests=0, regions=2633, usedHeap=69, maxHeap=8165 
> Total:  servers: 4   requests=0, regions=13689
>   HBASE-3985-"Same Region could be picked out twice in LoadBalancer" was found by my analysis on this problem.
>   But I'm afraid it's not the main cause of the problem.
>   There's one active master, one standby master, four regionservers in our cluster.
> >>10:57:41, the standby hamster 222 becomes the active one.
> 2011-05-24 10:57:41,314 INFO org.apache.hadoop.hbase.master.HMaster: Master startup proceeding: master failover
> >>4 regionservers was registered in 222 one by one. Only one regionserver seemed some time late.
> 2011-05-24 10:57:37,533 INFO : Registering server=158-1-101-82,20020,1306205415714, regionCount=3388, userLoad=true
> 2011-05-24 10:57:37,537 INFO : Registering server=158-1-101-202,20020,1306205409671, regionCount=3453, userLoad=true
> 2011-05-24 10:57:37,598 INFO : Registering server=158-1-101-52,20020,1306205417261, regionCount=3411, userLoad=true
> 2011-05-24 10:59:00,408 INFO : Registering server=158-1-101-222,20020,1306205940117, regionCount=0, userLoad=false
> >>13134 regions needed to move after rebuildUserRegions(13689 regions in the cluster during the time).
> 2011-05-24 10:58:47,534 INFO org.apache.hadoop.hbase.master.AssignmentManager: Failed-over master needs to process 13134 regions in transition
> >>All the 13134 regions were opened, regions opened count in each server:
> 158-1-101-222,20020,1306205940117    Count: 834
> 158-1-101-82,20020,1306205415714    Count: 4093
> 158-1-101-202,20020,1306205409671    Count: 4118
> 158-1-101-52,20020,1306205417261    Count: 4089
> >>The nearest balancer calculate results:
> 2011-05-24 11:12:11,076 INFO org.apache.hadoop.hbase.master.LoadBalancer: Calculated a load balance in 19ms. Moving 5012 regions off of 3 overloaded servers onto 1 less loaded servers
> "5012" is an unimaginable number here, for it is larger than the average number "3424.5"

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira