Posted to issues@hbase.apache.org by "feng xu (JIRA)" <ji...@apache.org> on 2011/07/25 03:06:09 UTC

[jira] [Created] (HBASE-4134) the bug about the region nums when hbck tool to fix the region multi deployed problem.

the bug about the region nums when hbck tool  to fix the region multi deployed problem.
---------------------------------------------------------------------------------------

                 Key: HBASE-4134
                 URL: https://issues.apache.org/jira/browse/HBASE-4134
             Project: HBase
          Issue Type: Bug
    Affects Versions: 0.90.3
            Reporter: feng xu
             Fix For: 0.90.4


I ran ./hbase hbck to check my cluster's health.
The result is:
.......
ERROR: Region test1,230778,1311216270050.fff783529fcd983043610eaa1cc5c2fe. is listed in META on region server 158-1-91-101:20020 but is multiply assigned to region servers 158-1-91-101:20020, 158-1-91-105:20020 
ERROR: Region test1,252103,1311216293671.fff9ed2cb69bdce535451a07686c0db5. is listed in META on region server 158-1-91-101:20020 but is multiply assigned to region servers 158-1-91-101:20020, 158-1-91-105:20020 
ERROR: Region test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0. is listed in META on region server 158-1-91-103:20020 but is multiply assigned to region servers 158-1-91-103:20020, 158-1-91-105:20020 
Summary: 
  -ROOT- is okay. 
    Number of regions: 1 
    Deployed on: 158-1-91-105:20020 
  .META. is okay. 
    Number of regions: 1 
    Deployed on: 158-1-91-103:20020 
  test1 is okay. 
    Number of regions: 25297 
    Deployed on: 158-1-91-101:20020 158-1-91-103:20020 158-1-91-105:20020 
14829 inconsistencies detected. 
Status: INCONSISTENT 

After running ./hbase hbck -fix, the problem was fixed, but I found that the total region count had also increased even though no data had been put.
.......
Line 105: 2011-07-22 02:19:02,866 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing.  servers=3 regions=25299 average=8433.0 mostloaded=8433 
Line 185253: 2011-07-22 03:06:11,832 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing.  servers=3 regions=35029 average=11676.333 mostloaded=11677 leastloaded=11676
....

I checked the region test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0. to find out what happened to it while the hbck tool was running.
I found that the region was closed silently without its ZK node being updated. If it is then opened again on another
regionserver, the region is counted twice in the AssignmentManager.servers list.

The open log is:
	Line 178967: 2011-07-22 02:47:51,247 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /hbase/unassigned/ffff52782c0241a598b3e37ca8729da0 (region=test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0., server=HBCKServerName, state=M_ZK_REGION_OFFLINE)
	Line 178968: 2011-07-22 02:47:51,247 INFO org.apache.hadoop.hbase.master.AssignmentManager: Handling HBCK triggered transition=M_ZK_REGION_OFFLINE, server=HBCKServerName, region=ffff52782c0241a598b3e37ca8729da0
	Line 178969: 2011-07-22 02:47:51,248 INFO org.apache.hadoop.hbase.master.AssignmentManager: HBCK repair is triggering assignment of region=test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0.
	Line 178970: 2011-07-22 02:47:51,248 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0. so generated a random one; hri=test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0., src=, dest=158-1-91-101,20020,1311231878544; 3 (online=3, exclude=null) available servers
	Line 178971: 2011-07-22 02:47:51,248 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0. to 158-1-91-101,20020,1311231878544
	Line 178983: 2011-07-22 02:47:51,285 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=158-1-91-101,20020,1311231878544, region=ffff52782c0241a598b3e37ca8729da0
	Line 179001: 2011-07-22 02:47:51,318 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=158-1-91-101,20020,1311231878544, region=ffff52782c0241a598b3e37ca8729da0
	Line 179002: 2011-07-22 02:47:51,319 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for ffff52782c0241a598b3e37ca8729da0; deleting unassigned node
	Line 179003: 2011-07-22 02:47:51,319 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x1314ac5addb0042-0x1314ac5addb0042 Deleting existing unassigned node for ffff52782c0241a598b3e37ca8729da0 that is in expected state RS_ZK_REGION_OPENED
	Line 179007: 2011-07-22 02:47:51,326 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x1314ac5addb0042-0x1314ac5addb0042 Successfully deleted unassigned node for region ffff52782c0241a598b3e37ca8729da0 in expected state RS_ZK_REGION_OPENED
	Line 179011: 2011-07-22 02:47:51,335 WARN org.apache.hadoop.hbase.master.AssignmentManager: Overwriting ffff52782c0241a598b3e37ca8729da0 on serverName=158-1-91-103,20020,1311232056655, load=(requests=0, regions=0, usedHeap=0, maxHeap=0)
	Line 179012: 2011-07-22 02:47:51,335 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0. on 158-1-91-101,20020,1311231878544
	

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-4134) The total number of regions was more than the actual region count after the hbck fix

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4134:
--------------------------

    Description: 
1. I found the problem (some regions were multiply assigned) while running hbck to check the cluster's health. Here's the result:
{noformat}
ERROR: Region test1,230778,1311216270050.fff783529fcd983043610eaa1cc5c2fe. is listed in META on region server 158-1-91-101:20020 but is multiply assigned to region servers 158-1-91-101:20020, 158-1-91-105:20020 
ERROR: Region test1,252103,1311216293671.fff9ed2cb69bdce535451a07686c0db5. is listed in META on region server 158-1-91-101:20020 but is multiply assigned to region servers 158-1-91-101:20020, 158-1-91-105:20020 
ERROR: Region test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0. is listed in META on region server 158-1-91-103:20020 but is multiply assigned to region servers 158-1-91-103:20020, 158-1-91-105:20020 
Summary: 
  -ROOT- is okay. 
    Number of regions: 1 
    Deployed on: 158-1-91-105:20020 
  .META. is okay. 
    Number of regions: 1 
    Deployed on: 158-1-91-103:20020 
  test1 is okay. 
    Number of regions: 25297 
    Deployed on: 158-1-91-101:20020 158-1-91-103:20020 158-1-91-105:20020 
14829 inconsistencies detected. 
Status: INCONSISTENT 
{noformat}

2. Then I tried to use "hbck -fix" to fix the problem. Everything seemed OK, but I found that the total number of regions reported by the load balancer (35029) was greater than the actual region count (25299) after the fix.
Here's the related log snippet:
{noformat}
2011-07-22 02:19:02,866 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing.  servers=3 regions=25299 average=8433.0 mostloaded=8433 
2011-07-22 03:06:11,832 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing.  servers=3 regions=35029 average=11676.333 mostloaded=11677 leastloaded=11676
{noformat}
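
As a quick arithmetic check on the two LoadBalancer lines above, here is a minimal sketch. The class and method names are hypothetical; the numbers are copied from the log. It shows how many extra entries the balancer reported and that the logged mostloaded/leastloaded values add up to the inflated total. The expected figure of 25299 also happens to equal the hbck summary counts added together (25297 for test1, plus 1 each for -ROOT- and .META.).

```java
public class RegionCountCheck {
    /** Number of entries the balancer reports beyond the expected region count. */
    public static int excess(int expected, int reported) {
        return reported - expected;
    }

    /** Average regions per server, as in the "average=" field of the log line. */
    public static double average(int regions, int servers) {
        return (double) regions / servers;
    }

    public static void main(String[] args) {
        int servers = 3;
        int before = 25299; // regions= value logged at 02:19:02
        int after = 35029;  // regions= value logged at 03:06:11

        System.out.println("excess entries: " + excess(before, after));
        System.out.println("average before: " + average(before, servers));
        System.out.println("average after:  " + average(after, servers));

        // One server at mostloaded plus two at leastloaded should give the total.
        System.out.println("11677 + 2 * 11676 = " + (11677 + 2 * 11676));
    }
}
```

The difference is 9730 entries, so the balancer is working from a total that no amount of real data could explain, given that nothing was written between the two log lines.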

3. I tracked one region's behavior during that time, taking the region "test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0." as an example:
(1) It was assigned to "158-1-91-103" at first. 
(2) HBCK sent a close request to the RegionServer, and the RegionServer closed the region silently without notifying the HMaster.
(3) As far as the HMaster knew, the region was still carried by RS "158-1-91-103".
(4) HBCK then triggered a new assignment.

As a result, the region was assigned again, but the old assignment information still remained in AM#regions and AM#servers.

That's why the reported region count was larger than the actual number.
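
The double counting described above can be illustrated with a minimal, self-contained sketch. This is not the HBase AssignmentManager code: the class AssignmentBookkeeping, its two maps (standing in for AM#regions and AM#servers), and all method names are hypothetical. It also assumes that the reported total is the sum of the per-server region sets, which is one way a stale entry could inflate the count.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class AssignmentBookkeeping {
    // region name -> server believed to host it (stand-in for AM#regions)
    private final Map<String, String> regions = new HashMap<>();
    // server name -> regions believed to be hosted there (stand-in for AM#servers)
    private final Map<String, Set<String>> servers = new HashMap<>();

    /** Records an assignment without clearing the previous one (the suspected faulty path). */
    public void assignWithoutCleanup(String region, String server) {
        regions.put(region, server);
        servers.computeIfAbsent(server, s -> new HashSet<>()).add(region);
    }

    /** Records an assignment after removing the region from its previous server. */
    public void assignWithCleanup(String region, String server) {
        String old = regions.get(region);
        if (old != null && !old.equals(server)) {
            Set<String> hosted = servers.get(old);
            if (hosted != null) {
                hosted.remove(region);
            }
        }
        assignWithoutCleanup(region, server);
    }

    /** Total as a per-server summary would see it: the sum of all per-server set sizes. */
    public int totalRegionsByServer() {
        int total = 0;
        for (Set<String> hosted : servers.values()) {
            total += hosted.size();
        }
        return total;
    }

    /** Number of distinct regions known to the bookkeeping. */
    public int distinctRegions() {
        return regions.size();
    }

    public static void main(String[] args) {
        String region = "ffff52782c0241a598b3e37ca8729da0";

        AssignmentBookkeeping stale = new AssignmentBookkeeping();
        stale.assignWithoutCleanup(region, "158-1-91-103");
        stale.assignWithoutCleanup(region, "158-1-91-101"); // old entry on 103 is left behind
        System.out.println("without cleanup: " + stale.distinctRegions()
                + " distinct, " + stale.totalRegionsByServer() + " counted");

        AssignmentBookkeeping clean = new AssignmentBookkeeping();
        clean.assignWithCleanup(region, "158-1-91-103");
        clean.assignWithCleanup(region, "158-1-91-101"); // entry on 103 is removed first
        System.out.println("with cleanup:    " + clean.distinctRegions()
                + " distinct, " + clean.totalRegionsByServer() + " counted");
    }
}
```

In the sketch, the region-to-server map is overwritten on reassignment, so it stays correct by itself; only the server-to-regions map keeps the stale entry. That matches the "Overwriting ... on serverName=158-1-91-103" warning in the log below, followed by the region being opened on 158-1-91-101.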

{noformat}
Line 178967: 2011-07-22 02:47:51,247 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /hbase/unassigned/ffff52782c0241a598b3e37ca8729da0 (region=test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0., server=HBCKServerName, state=M_ZK_REGION_OFFLINE)
Line 178968: 2011-07-22 02:47:51,247 INFO org.apache.hadoop.hbase.master.AssignmentManager: Handling HBCK triggered transition=M_ZK_REGION_OFFLINE, server=HBCKServerName, region=ffff52782c0241a598b3e37ca8729da0
Line 178969: 2011-07-22 02:47:51,248 INFO org.apache.hadoop.hbase.master.AssignmentManager: HBCK repair is triggering assignment of region=test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0.
Line 178970: 2011-07-22 02:47:51,248 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0. so generated a random one; hri=test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0., src=, dest=158-1-91-101,20020,1311231878544; 3 (online=3, exclude=null) available servers
Line 178971: 2011-07-22 02:47:51,248 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0. to 158-1-91-101,20020,1311231878544
Line 178983: 2011-07-22 02:47:51,285 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=158-1-91-101,20020,1311231878544, region=ffff52782c0241a598b3e37ca8729da0
Line 179001: 2011-07-22 02:47:51,318 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=158-1-91-101,20020,1311231878544, region=ffff52782c0241a598b3e37ca8729da0
Line 179002: 2011-07-22 02:47:51,319 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for ffff52782c0241a598b3e37ca8729da0; deleting unassigned node
Line 179003: 2011-07-22 02:47:51,319 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x1314ac5addb0042-0x1314ac5addb0042 Deleting existing unassigned node for ffff52782c0241a598b3e37ca8729da0 that is in expected state RS_ZK_REGION_OPENED
Line 179007: 2011-07-22 02:47:51,326 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x1314ac5addb0042-0x1314ac5addb0042 Successfully deleted unassigned node for region ffff52782c0241a598b3e37ca8729da0 in expected state RS_ZK_REGION_OPENED
Line 179011: 2011-07-22 02:47:51,335 WARN org.apache.hadoop.hbase.master.AssignmentManager: Overwriting ffff52782c0241a598b3e37ca8729da0 on serverName=158-1-91-103,20020,1311232056655, load=(requests=0, regions=0, usedHeap=0, maxHeap=0)
Line 179012: 2011-07-22 02:47:51,335 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0. on 158-1-91-101,20020,1311231878544
{noformat}

        Summary: The total number of regions was more than the actual region count after the hbck fix  (was: the total numbers of Regions was more than the actual regions count while balancing after the hbck fixing.)



        

[jira] [Updated] (HBASE-4134) the total numbers of Regions was more than the actual regions count while balancing after the hbck fixing.

Posted by "feng xu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

feng xu updated HBASE-4134:
---------------------------

    Description: 
1. I found the problem (some regions were multiply assigned) while running hbck to check the cluster's health. Here's the result:
{noformat}
ERROR: Region test1,230778,1311216270050.fff783529fcd983043610eaa1cc5c2fe. is listed in META on region server 158-1-91-101:20020 but is multiply assigned to region servers 158-1-91-101:20020, 158-1-91-105:20020 
ERROR: Region test1,252103,1311216293671.fff9ed2cb69bdce535451a07686c0db5. is listed in META on region server 158-1-91-101:20020 but is multiply assigned to region servers 158-1-91-101:20020, 158-1-91-105:20020 
ERROR: Region test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0. is listed in META on region server 158-1-91-103:20020 but is multiply assigned to region servers 158-1-91-103:20020, 158-1-91-105:20020 
Summary: 
  -ROOT- is okay. 
    Number of regions: 1 
    Deployed on: 158-1-91-105:20020 
  .META. is okay. 
    Number of regions: 1 
    Deployed on: 158-1-91-103:20020 
  test1 is okay. 
    Number of regions: 25297 
    Deployed on: 158-1-91-101:20020 158-1-91-103:20020 158-1-91-105:20020 
14829 inconsistencies detected. 
Status: INCONSISTENT 
{noformat}

2. Then I tried to use "hbck -fix" to fix the problems, and everything seemed OK. But I found that the total number of regions (35029) was greater than the actual region count (25299) during balancing after the fix.
Here's the related log snippet:
{noformat}
2011-07-22 02:19:02,866 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing.  servers=3 regions=25299 average=8433.0 mostloaded=8433 
2011-07-22 03:06:11,832 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing.  servers=3 regions=35029 average=11676.333 mostloaded=11677 leastloaded=11676
{noformat}

3. I tracked one region's behavior during that time, taking the region "test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0." as an example:
(1) It was assigned to "158-1-91-103" at first. 
(2) HBCK sent a close request to the RegionServer, and the RegionServer closed the region silently without notifying the HMaster.
(3) As far as the HMaster knew, the region was still carried by RS "158-1-91-103".
(4) HBCK then triggered a new assignment.

As a result, the region was assigned again, but the old assignment information still remained in AM#regions and AM#servers.

That's why the reported region count was larger than the actual number.

{noformat}
Line 178967: 2011-07-22 02:47:51,247 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /hbase/unassigned/ffff52782c0241a598b3e37ca8729da0 (region=test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0., server=HBCKServerName, state=M_ZK_REGION_OFFLINE)
Line 178968: 2011-07-22 02:47:51,247 INFO org.apache.hadoop.hbase.master.AssignmentManager: Handling HBCK triggered transition=M_ZK_REGION_OFFLINE, server=HBCKServerName, region=ffff52782c0241a598b3e37ca8729da0
Line 178969: 2011-07-22 02:47:51,248 INFO org.apache.hadoop.hbase.master.AssignmentManager: HBCK repair is triggering assignment of region=test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0.
Line 178970: 2011-07-22 02:47:51,248 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0. so generated a random one; hri=test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0., src=, dest=158-1-91-101,20020,1311231878544; 3 (online=3, exclude=null) available servers
Line 178971: 2011-07-22 02:47:51,248 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0. to 158-1-91-101,20020,1311231878544
Line 178983: 2011-07-22 02:47:51,285 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=158-1-91-101,20020,1311231878544, region=ffff52782c0241a598b3e37ca8729da0
Line 179001: 2011-07-22 02:47:51,318 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=158-1-91-101,20020,1311231878544, region=ffff52782c0241a598b3e37ca8729da0
Line 179002: 2011-07-22 02:47:51,319 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for ffff52782c0241a598b3e37ca8729da0; deleting unassigned node
Line 179003: 2011-07-22 02:47:51,319 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x1314ac5addb0042-0x1314ac5addb0042 Deleting existing unassigned node for ffff52782c0241a598b3e37ca8729da0 that is in expected state RS_ZK_REGION_OPENED
Line 179007: 2011-07-22 02:47:51,326 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x1314ac5addb0042-0x1314ac5addb0042 Successfully deleted unassigned node for region ffff52782c0241a598b3e37ca8729da0 in expected state RS_ZK_REGION_OPENED
Line 179011: 2011-07-22 02:47:51,335 WARN org.apache.hadoop.hbase.master.AssignmentManager: Overwriting ffff52782c0241a598b3e37ca8729da0 on serverName=158-1-91-103,20020,1311232056655, load=(requests=0, regions=0, usedHeap=0, maxHeap=0)
Line 179012: 2011-07-22 02:47:51,335 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0. on 158-1-91-101,20020,1311231878544
{noformat}

  was:
I run ./hbase hbck to check my cluster healthly.
the result is:
.......
ERROR: Region test1,230778,1311216270050.fff783529fcd983043610eaa1cc5c2fe. is listed in META on region server 158-1-91-101:20020 but is multiply assigned to region servers 158-1-91-101:20020, 158-1-91-105:20020 
ERROR: Region test1,252103,1311216293671.fff9ed2cb69bdce535451a07686c0db5. is listed in META on region server 158-1-91-101:20020 but is multiply assigned to region servers 158-1-91-101:20020, 158-1-91-105:20020 
ERROR: Region test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0. is listed in META on region server 158-1-91-103:20020 but is multiply assigned to region servers 158-1-91-103:20020, 158-1-91-105:20020 
Summary: 
  -ROOT- is okay. 
    Number of regions: 1 
    Deployed on: 158-1-91-105:20020 
  .META. is okay. 
    Number of regions: 1 
    Deployed on: 158-1-91-103:20020 
  test1 is okay. 
    Number of regions: 25297 
    Deployed on: 158-1-91-101:20020 158-1-91-103:20020 158-1-91-105:20020 
14829 inconsistencies detected. 
Status: INCONSISTENT 

after run the ./hbase hbck -fix,the problem be fixed but I found that total region nums  also be added when no data be putted.
.......
Line 105: 2011-07-22 02:19:02,866 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing.  servers=3 regions=25299 average=8433.0 mostloaded=8433 
Line 185253: 2011-07-22 03:06:11,832 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing.  servers=3 regions=35029 average=11676.333 mostloaded=11677 leastloaded=11676
....

I check the test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0. region to find what happend on this region when execute the hbck tool.
I found that the test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0. be close silently whitout update zk node. if be opened again on another 
regionserver this region will be count twice AssignmentManager.servers list.

the open log is:
	Line 178967: 2011-07-22 02:47:51,247 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /hbase/unassigned/ffff52782c0241a598b3e37ca8729da0 (region=test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0., server=HBCKServerName, state=M_ZK_REGION_OFFLINE)
	Line 178968: 2011-07-22 02:47:51,247 INFO org.apache.hadoop.hbase.master.AssignmentManager: Handling HBCK triggered transition=M_ZK_REGION_OFFLINE, server=HBCKServerName, region=ffff52782c0241a598b3e37ca8729da0
	Line 178969: 2011-07-22 02:47:51,248 INFO org.apache.hadoop.hbase.master.AssignmentManager: HBCK repair is triggering assignment of region=test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0.
	Line 178970: 2011-07-22 02:47:51,248 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0. so generated a random one; hri=test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0., src=, dest=158-1-91-101,20020,1311231878544; 3 (online=3, exclude=null) available servers
	Line 178971: 2011-07-22 02:47:51,248 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0. to 158-1-91-101,20020,1311231878544
	Line 178983: 2011-07-22 02:47:51,285 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=158-1-91-101,20020,1311231878544, region=ffff52782c0241a598b3e37ca8729da0
	Line 179001: 2011-07-22 02:47:51,318 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=158-1-91-101,20020,1311231878544, region=ffff52782c0241a598b3e37ca8729da0
	Line 179002: 2011-07-22 02:47:51,319 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for ffff52782c0241a598b3e37ca8729da0; deleting unassigned node
	Line 179003: 2011-07-22 02:47:51,319 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x1314ac5addb0042-0x1314ac5addb0042 Deleting existing unassigned node for ffff52782c0241a598b3e37ca8729da0 that is in expected state RS_ZK_REGION_OPENED
	Line 179007: 2011-07-22 02:47:51,326 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x1314ac5addb0042-0x1314ac5addb0042 Successfully deleted unassigned node for region ffff52782c0241a598b3e37ca8729da0 in expected state RS_ZK_REGION_OPENED
	Line 179011: 2011-07-22 02:47:51,335 WARN org.apache.hadoop.hbase.master.AssignmentManager: Overwriting ffff52782c0241a598b3e37ca8729da0 on serverName=158-1-91-103,20020,1311232056655, load=(requests=0, regions=0, usedHeap=0, maxHeap=0)
	Line 179012: 2011-07-22 02:47:51,335 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0. on 158-1-91-101,20020,1311231878544
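
The sequence in the log above can be modeled with a minimal sketch, assuming the master keeps one map from region to server and one map from server to its set of regions. This is a simplified, hypothetical model and not the AssignmentManager code: the class, field, and method names are invented for illustration, and the "fixed" variant is only one possible remedy, not the actual patch.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class StaleAssignmentModel {
    // region -> server believed to host it (stand-in for AM#regions)
    private final Map<String, String> regions = new HashMap<>();
    // server -> regions believed to be on it (stand-in for AM#servers)
    private final Map<String, Set<String>> servers = new HashMap<>();

    /** Records an open but leaves the region in the previous server's set. */
    public void regionOnlineBuggy(String region, String server) {
        regions.put(region, server);
        servers.computeIfAbsent(server, k -> new HashSet<>()).add(region);
    }

    /** Records an open and removes the region from the previous server's set. */
    public void regionOnlineFixed(String region, String server) {
        String old = regions.put(region, server);
        if (old != null && !old.equals(server)) {
            Set<String> oldSet = servers.get(old);
            if (oldSet != null) {
                oldSet.remove(region);
            }
        }
        servers.computeIfAbsent(server, k -> new HashSet<>()).add(region);
    }

    /** Sums the per-server set sizes, as a load count would. */
    public int countedRegions() {
        int total = 0;
        for (Set<String> set : servers.values()) {
            total += set.size();
        }
        return total;
    }

    public static void main(String[] args) {
        String region = "ffff52782c0241a598b3e37ca8729da0";

        StaleAssignmentModel buggy = new StaleAssignmentModel();
        buggy.regionOnlineBuggy(region, "158-1-91-103"); // owner known to the master
        buggy.regionOnlineBuggy(region, "158-1-91-101"); // reassignment after hbck
        System.out.println(buggy.countedRegions()); // 2: one region counted twice

        StaleAssignmentModel fixed = new StaleAssignmentModel();
        fixed.regionOnlineFixed(region, "158-1-91-103");
        fixed.regionOnlineFixed(region, "158-1-91-101");
        System.out.println(fixed.countedRegions()); // 1
    }
}
```

In this model a per-server set prevents a region from being counted twice on the same server, but nothing prevents it from appearing in the sets of two different servers unless the old entry is removed on reassignment.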
	

        Summary: the total numbers of Regions was more than the actual regions count while balancing after the hbck fixing.  (was: the bug about the region nums when hbck tool  to fix the region multi deployed problem.)

> the total numbers of Regions was more than the actual regions count while balancing after the hbck fixing.
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-4134
>                 URL: https://issues.apache.org/jira/browse/HBASE-4134
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: feng xu
>             Fix For: 0.90.4
>
>
> 1. I found the problem (some regions were multiply assigned) while running hbck to check the cluster's health. Here's the result:
> {noformat}
> ERROR: Region test1,230778,1311216270050.fff783529fcd983043610eaa1cc5c2fe. is listed in META on region server 158-1-91-101:20020 but is multiply assigned to region servers 158-1-91-101:20020, 158-1-91-105:20020 
> ERROR: Region test1,252103,1311216293671.fff9ed2cb69bdce535451a07686c0db5. is listed in META on region server 158-1-91-101:20020 but is multiply assigned to region servers 158-1-91-101:20020, 158-1-91-105:20020 
> ERROR: Region test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0. is listed in META on region server 158-1-91-103:20020 but is multiply assigned to region servers 158-1-91-103:20020, 158-1-91-105:20020 
> Summary: 
>   -ROOT- is okay. 
>     Number of regions: 1 
>     Deployed on: 158-1-91-105:20020 
>   .META. is okay. 
>     Number of regions: 1 
>     Deployed on: 158-1-91-103:20020 
>   test1 is okay. 
>     Number of regions: 25297 
>     Deployed on: 158-1-91-101:20020 158-1-91-103:20020 158-1-91-105:20020 
> 14829 inconsistencies detected. 
> Status: INCONSISTENT 
> {noformat}
> 2. Then I tried to use "hbck -fix" to fix the problems, and everything seemed OK. But I found that the total number of regions reported while balancing (35029) was more than the actual region count (25299) after the fix.
> Here's the related log snippet:
> {noformat}
> 2011-07-22 02:19:02,866 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing.  servers=3 regions=25299 average=8433.0 mostloaded=8433 
> 2011-07-22 03:06:11,832 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing.  servers=3 regions=35029 average=11676.333 mostloaded=11677 leastloaded=11676
> {noformat}
> 3. I tracked one region's behavior during that time, taking the region "test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0." as an example:
> (1) It was assigned to "158-1-91-101" at first.
> (2) HBCK sent a close request to the RegionServer, and the RegionServer closed the region silently without notifying the HMaster.
> (3) The region was still carried by RS "158-1-91-103", which was known to the HMaster.
> (4) HBCK triggered a new assignment.
> The fact is, the region was assigned again, but the old assignment information still remained in AM#regions and AM#servers.
> That is why the reported region count was larger than the actual number.
> {noformat}
> Line 178967: 2011-07-22 02:47:51,247 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /hbase/unassigned/ffff52782c0241a598b3e37ca8729da0 (region=test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0., server=HBCKServerName, state=M_ZK_REGION_OFFLINE)
> Line 178968: 2011-07-22 02:47:51,247 INFO org.apache.hadoop.hbase.master.AssignmentManager: Handling HBCK triggered transition=M_ZK_REGION_OFFLINE, server=HBCKServerName, region=ffff52782c0241a598b3e37ca8729da0
> Line 178969: 2011-07-22 02:47:51,248 INFO org.apache.hadoop.hbase.master.AssignmentManager: HBCK repair is triggering assignment of region=test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0.
> Line 178970: 2011-07-22 02:47:51,248 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0. so generated a random one; hri=test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0., src=, dest=158-1-91-101,20020,1311231878544; 3 (online=3, exclude=null) available servers
> Line 178971: 2011-07-22 02:47:51,248 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0. to 158-1-91-101,20020,1311231878544
> Line 178983: 2011-07-22 02:47:51,285 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=158-1-91-101,20020,1311231878544, region=ffff52782c0241a598b3e37ca8729da0
> Line 179001: 2011-07-22 02:47:51,318 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=158-1-91-101,20020,1311231878544, region=ffff52782c0241a598b3e37ca8729da0
> Line 179002: 2011-07-22 02:47:51,319 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for ffff52782c0241a598b3e37ca8729da0; deleting unassigned node
> Line 179003: 2011-07-22 02:47:51,319 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x1314ac5addb0042-0x1314ac5addb0042 Deleting existing unassigned node for ffff52782c0241a598b3e37ca8729da0 that is in expected state RS_ZK_REGION_OPENED
> Line 179007: 2011-07-22 02:47:51,326 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x1314ac5addb0042-0x1314ac5addb0042 Successfully deleted unassigned node for region ffff52782c0241a598b3e37ca8729da0 in expected state RS_ZK_REGION_OPENED
> Line 179011: 2011-07-22 02:47:51,335 WARN org.apache.hadoop.hbase.master.AssignmentManager: Overwriting ffff52782c0241a598b3e37ca8729da0 on serverName=158-1-91-103,20020,1311232056655, load=(requests=0, regions=0, usedHeap=0, maxHeap=0)
> Line 179012: 2011-07-22 02:47:51,335 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0. on 158-1-91-101,20020,1311231878544
> {noformat}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-4134) The total number of regions was more than the actual region count after the hbck fix

Posted by "Lars Hofhansl (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-4134:
---------------------------------

    Fix Version/s:     (was: 0.94.0)
                   0.96.0

Pushing to 0.96. Feel free to move back if you disagree.
                
> The total number of regions was more than the actual region count after the hbck fix
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-4134
>                 URL: https://issues.apache.org/jira/browse/HBASE-4134
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: feng xu
>             Fix For: 0.96.0
>
>
> 1. I found the problem (some regions were multiply assigned) while running hbck to check the cluster's health. Here's the result:
> {noformat}
> ERROR: Region test1,230778,1311216270050.fff783529fcd983043610eaa1cc5c2fe. is listed in META on region server 158-1-91-101:20020 but is multiply assigned to region servers 158-1-91-101:20020, 158-1-91-105:20020 
> ERROR: Region test1,252103,1311216293671.fff9ed2cb69bdce535451a07686c0db5. is listed in META on region server 158-1-91-101:20020 but is multiply assigned to region servers 158-1-91-101:20020, 158-1-91-105:20020 
> ERROR: Region test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0. is listed in META on region server 158-1-91-103:20020 but is multiply assigned to region servers 158-1-91-103:20020, 158-1-91-105:20020 
> Summary: 
>   -ROOT- is okay. 
>     Number of regions: 1 
>     Deployed on: 158-1-91-105:20020 
>   .META. is okay. 
>     Number of regions: 1 
>     Deployed on: 158-1-91-103:20020 
>   test1 is okay. 
>     Number of regions: 25297 
>     Deployed on: 158-1-91-101:20020 158-1-91-103:20020 158-1-91-105:20020 
> 14829 inconsistencies detected. 
> Status: INCONSISTENT 
> {noformat}
> 2. Then I tried to use "hbck -fix" to fix the problem. Everything seemed OK, but I found that the total number of regions reported by the load balancer (35029) was more than the actual region count (25299) after the fix.
> Here's the related log snippet:
> {noformat}
> 2011-07-22 02:19:02,866 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing.  servers=3 regions=25299 average=8433.0 mostloaded=8433 
> 2011-07-22 03:06:11,832 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing.  servers=3 regions=35029 average=11676.333 mostloaded=11677 leastloaded=11676
> {noformat}
> 3. I tracked one region's behavior during the time. Taking the region of "test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0." as example:
> (1) It was assigned to "158-1-91-101" at first. 
> (2) HBCK sent closing request to RegionServer. And RegionServer closed it silently without notice to HMaster.
> (3) The region was still carried by RS "158-1-91-103" which was known to HMaster.
> (4) HBCK will trigger a new assignment.
> The fact is, the region was assigned again, but the old assignment information still remained in AM#regions,AM#servers.
> That's why the problem of "region count was larger than the actual number" occurred.  
> {noformat}
> Line 178967: 2011-07-22 02:47:51,247 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /hbase/unassigned/ffff52782c0241a598b3e37ca8729da0 (region=test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0., server=HBCKServerName, state=M_ZK_REGION_OFFLINE)
> Line 178968: 2011-07-22 02:47:51,247 INFO org.apache.hadoop.hbase.master.AssignmentManager: Handling HBCK triggered transition=M_ZK_REGION_OFFLINE, server=HBCKServerName, region=ffff52782c0241a598b3e37ca8729da0
> Line 178969: 2011-07-22 02:47:51,248 INFO org.apache.hadoop.hbase.master.AssignmentManager: HBCK repair is triggering assignment of region=test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0.
> Line 178970: 2011-07-22 02:47:51,248 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0. so generated a random one; hri=test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0., src=, dest=158-1-91-101,20020,1311231878544; 3 (online=3, exclude=null) available servers
> Line 178971: 2011-07-22 02:47:51,248 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0. to 158-1-91-101,20020,1311231878544
> Line 178983: 2011-07-22 02:47:51,285 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=158-1-91-101,20020,1311231878544, region=ffff52782c0241a598b3e37ca8729da0
> Line 179001: 2011-07-22 02:47:51,318 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=158-1-91-101,20020,1311231878544, region=ffff52782c0241a598b3e37ca8729da0
> Line 179002: 2011-07-22 02:47:51,319 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for ffff52782c0241a598b3e37ca8729da0; deleting unassigned node
> Line 179003: 2011-07-22 02:47:51,319 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x1314ac5addb0042-0x1314ac5addb0042 Deleting existing unassigned node for ffff52782c0241a598b3e37ca8729da0 that is in expected state RS_ZK_REGION_OPENED
> Line 179007: 2011-07-22 02:47:51,326 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x1314ac5addb0042-0x1314ac5addb0042 Successfully deleted unassigned node for region ffff52782c0241a598b3e37ca8729da0 in expected state RS_ZK_REGION_OPENED
> Line 179011: 2011-07-22 02:47:51,335 WARN org.apache.hadoop.hbase.master.AssignmentManager: Overwriting ffff52782c0241a598b3e37ca8729da0 on serverName=158-1-91-103,20020,1311232056655, load=(requests=0, regions=0, usedHeap=0, maxHeap=0)
> Line 179012: 2011-07-22 02:47:51,335 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0. on 158-1-91-101,20020,1311231878544
> {noformat}


        

[jira] [Updated] (HBASE-4134) The total number of regions was more than the actual region count after the hbck fix

Posted by "feng xu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

feng xu updated HBASE-4134:
---------------------------

    Fix Version/s:     (was: 0.94.0)
                   0.92.0

> The total number of regions was more than the actual region count after the hbck fix
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-4134
>                 URL: https://issues.apache.org/jira/browse/HBASE-4134
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: feng xu
>             Fix For: 0.92.0


        

[jira] [Updated] (HBASE-4134) The total number of regions was more than the actual region count after the hbck fix

Posted by "feng xu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

feng xu updated HBASE-4134:
---------------------------

    Fix Version/s:     (was: 0.90.4)
                   0.94.0

> The total number of regions was more than the actual region count after the hbck fix
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-4134
>                 URL: https://issues.apache.org/jira/browse/HBASE-4134
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: feng xu
>             Fix For: 0.94.0


        

[jira] [Commented] (HBASE-4134) The total number of regions was more than the actual region count after the hbck fix

Posted by "feng xu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070954#comment-13070954 ] 

feng xu commented on HBASE-4134:
--------------------------------

To Ted Yu:
The HBASE-4053 patch had already been integrated before this issue occurred in my test cluster.
I think this issue is unrelated to HBASE-4053.
The HBASE-4053 patch ensures that a region is not double counted within one regionserver,
but in this issue the region was carried by two (maybe more) regionservers.

> The total number of regions was more than the actual region count after the hbck fix
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-4134
>                 URL: https://issues.apache.org/jira/browse/HBASE-4134
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: feng xu
>             Fix For: 0.94.0

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4134) The total number of regions was more than the actual region count after the hbck fix

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070498#comment-13070498 ] 

Ted Yu commented on HBASE-4134:
-------------------------------

https://issues.apache.org/jira/browse/HBASE-4053 is in 0.90.4 RC1
Do you want to try out RC1 to see whether the double counting has improved?

> The total number of regions was more than the actual region count after the hbck fix
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-4134
>                 URL: https://issues.apache.org/jira/browse/HBASE-4134
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.3
>            Reporter: feng xu
>             Fix For: 0.90.4
>
>
> 1. I found the problem (some regions were multiply assigned) while running hbck to check the cluster's health. Here's the result:
> {noformat}
> ERROR: Region test1,230778,1311216270050.fff783529fcd983043610eaa1cc5c2fe. is listed in META on region server 158-1-91-101:20020 but is multiply assigned to region servers 158-1-91-101:20020, 158-1-91-105:20020 
> ERROR: Region test1,252103,1311216293671.fff9ed2cb69bdce535451a07686c0db5. is listed in META on region server 158-1-91-101:20020 but is multiply assigned to region servers 158-1-91-101:20020, 158-1-91-105:20020 
> ERROR: Region test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0. is listed in META on region server 158-1-91-103:20020 but is multiply assigned to region servers 158-1-91-103:20020, 158-1-91-105:20020 
> Summary: 
>   -ROOT- is okay. 
>     Number of regions: 1 
>     Deployed on: 158-1-91-105:20020 
>   .META. is okay. 
>     Number of regions: 1 
>     Deployed on: 158-1-91-103:20020 
>   test1 is okay. 
>     Number of regions: 25297 
>     Deployed on: 158-1-91-101:20020 158-1-91-103:20020 158-1-91-105:20020 
> 14829 inconsistencies detected. 
> Status: INCONSISTENT 
> {noformat}
> 2. Then I ran "hbck -fix" to fix the problem. Everything seemed OK, but after the fix the total number of regions reported by the load balancer (35029) was larger than the actual region count (25299).
> Here's the related log snippet:
> {noformat}
> 2011-07-22 02:19:02,866 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing.  servers=3 regions=25299 average=8433.0 mostloaded=8433 
> 2011-07-22 03:06:11,832 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing.  servers=3 regions=35029 average=11676.333 mostloaded=11677 leastloaded=11676
> {noformat}
> 3. I tracked one region's behavior during that time, taking the region "test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0." as an example:
> (1) It was assigned to "158-1-91-101" at first. 
> (2) HBCK sent a close request to the RegionServer, and the RegionServer closed the region silently without notifying HMaster.
> (3) As far as HMaster knew, the region was still carried by RS "158-1-91-103".
> (4) HBCK then triggered a new assignment.
> The region was assigned again, but the old assignment information remained in AM#regions and AM#servers.
> That is why the reported region count was larger than the actual number.
> {noformat}
> Line 178967: 2011-07-22 02:47:51,247 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /hbase/unassigned/ffff52782c0241a598b3e37ca8729da0 (region=test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0., server=HBCKServerName, state=M_ZK_REGION_OFFLINE)
> Line 178968: 2011-07-22 02:47:51,247 INFO org.apache.hadoop.hbase.master.AssignmentManager: Handling HBCK triggered transition=M_ZK_REGION_OFFLINE, server=HBCKServerName, region=ffff52782c0241a598b3e37ca8729da0
> Line 178969: 2011-07-22 02:47:51,248 INFO org.apache.hadoop.hbase.master.AssignmentManager: HBCK repair is triggering assignment of region=test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0.
> Line 178970: 2011-07-22 02:47:51,248 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0. so generated a random one; hri=test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0., src=, dest=158-1-91-101,20020,1311231878544; 3 (online=3, exclude=null) available servers
> Line 178971: 2011-07-22 02:47:51,248 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0. to 158-1-91-101,20020,1311231878544
> Line 178983: 2011-07-22 02:47:51,285 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=158-1-91-101,20020,1311231878544, region=ffff52782c0241a598b3e37ca8729da0
> Line 179001: 2011-07-22 02:47:51,318 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=158-1-91-101,20020,1311231878544, region=ffff52782c0241a598b3e37ca8729da0
> Line 179002: 2011-07-22 02:47:51,319 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for ffff52782c0241a598b3e37ca8729da0; deleting unassigned node
> Line 179003: 2011-07-22 02:47:51,319 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x1314ac5addb0042-0x1314ac5addb0042 Deleting existing unassigned node for ffff52782c0241a598b3e37ca8729da0 that is in expected state RS_ZK_REGION_OPENED
> Line 179007: 2011-07-22 02:47:51,326 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x1314ac5addb0042-0x1314ac5addb0042 Successfully deleted unassigned node for region ffff52782c0241a598b3e37ca8729da0 in expected state RS_ZK_REGION_OPENED
> Line 179011: 2011-07-22 02:47:51,335 WARN org.apache.hadoop.hbase.master.AssignmentManager: Overwriting ffff52782c0241a598b3e37ca8729da0 on serverName=158-1-91-103,20020,1311232056655, load=(requests=0, regions=0, usedHeap=0, maxHeap=0)
> Line 179012: 2011-07-22 02:47:51,335 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region test1,282187,1311216322104.ffff52782c0241a598b3e37ca8729da0. on 158-1-91-101,20020,1311231878544
> {noformat}
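
The double counting described in the quoted report can be illustrated with a small, self-contained sketch. This is not HBase's AssignmentManager code: the class, its two maps, and the method names are hypothetical stand-ins for "AM#regions" and "AM#servers", and the server names are shortened. It only shows how a per-server total grows when a region is recorded on a new server while the old server's entry is left behind. In the logs above, that gap is 35029 - 25299 = 9730 extra entries.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the master's bookkeeping; not HBase code.
public class StaleAssignmentSketch {
    // region -> hosting server (stand-in for "AM#regions")
    final Map<String, String> regions = new HashMap<>();
    // server -> regions it is believed to host (stand-in for "AM#servers")
    final Map<String, Set<String>> servers = new HashMap<>();

    // Records the new location but never removes the region from the old server.
    void regionOnlineWithoutCleanup(String region, String server) {
        regions.put(region, server);
        servers.computeIfAbsent(server, s -> new HashSet<>()).add(region);
    }

    // Removes the region from its previous server before recording the new one.
    void regionOnlineWithCleanup(String region, String server) {
        String old = regions.put(region, server);
        if (old != null && !old.equals(server)) {
            Set<String> hosted = servers.get(old);
            if (hosted != null) {
                hosted.remove(region);
            }
        }
        servers.computeIfAbsent(server, s -> new HashSet<>()).add(region);
    }

    // Balancer-style total: the sum of the per-server region counts.
    int totalRegionsByServer() {
        int total = 0;
        for (Set<String> hosted : servers.values()) {
            total += hosted.size();
        }
        return total;
    }

    public static void main(String[] args) {
        String region = "ffff52782c0241a598b3e37ca8729da0";

        StaleAssignmentSketch stale = new StaleAssignmentSketch();
        stale.regionOnlineWithoutCleanup(region, "rs-103");
        stale.regionOnlineWithoutCleanup(region, "rs-101"); // reassignment after the fix
        System.out.println("without cleanup: counted=" + stale.totalRegionsByServer()
                + " actual=" + stale.regions.size());

        StaleAssignmentSketch clean = new StaleAssignmentSketch();
        clean.regionOnlineWithCleanup(region, "rs-103");
        clean.regionOnlineWithCleanup(region, "rs-101");
        System.out.println("with cleanup: counted=" + clean.totalRegionsByServer()
                + " actual=" + clean.regions.size());
    }
}
```

In the first case the single region is counted once per server that ever held it. Whether the real fix belongs in the reassignment path or in the close path is a question for the patch, not something this sketch settles.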


        

[jira] [Updated] (HBASE-4134) The total number of regions was more than the actual region count after the hbck fix

Posted by "gaojinchao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

gaojinchao updated HBASE-4134:
------------------------------

    Fix Version/s:     (was: 0.92.0)
                   0.94.0


        

[jira] [Commented] (HBASE-4134) The total number of regions was more than the actual region count after the hbck fix

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071507#comment-13071507 ] 

stack commented on HBASE-4134:
------------------------------

Do you want to bring it back into 0.92, Feng Xu?


        

[jira] [Commented] (HBASE-4134) The total number of regions was more than the actual region count after the hbck fix

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070834#comment-13070834 ] 

stack commented on HBASE-4134:
------------------------------

@feng nice debugging
