You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "Jieshan Bean (JIRA)" <ji...@apache.org> on 2011/07/01 11:04:28 UTC

[jira] [Created] (HBASE-4053) Most of the regions were added into AssignmentManager#servers twice

Most of the regions were added into AssignmentManager#servers twice
-------------------------------------------------------------------

                 Key: HBASE-4053
                 URL: https://issues.apache.org/jira/browse/HBASE-4053
             Project: HBase
          Issue Type: Bug
          Components: master
    Affects Versions: 0.90.3
            Reporter: Jieshan Bean
             Fix For: 0.90.4


Here's the scenario of how did the problem happened:

1. When HMaster start, all regionservers checkin ok, and count of regions out on cluster is 10083, which is the actual region number count.
2. Then OpenedRegionHandler#process received zookeeper's events, and added 9923 regions to the hris list.
   but the 9923 regions already exists, force added.
3. The LoadBalancer get the wrong Region numbers of 20006 (10083 + 9923).

AssignmentManager#addToServers method:
private void addToServers(final HServerInfo hsi, final HRegionInfo hri) {
  List<HRegionInfo> hris = servers.get(hsi);
  if (hris == null) {
    hris = new ArrayList<HRegionInfo>();
    servers.put(hsi, hris);
  }
  hris.add(hri); // Same region was double added here
}

logs:
2011-06-27 16:13:06,845 INFO org.apache.hadoop.hbase.master.ServerManager: Exiting wait on regionserver(s) to checkin; count=3, stopped=false, count of regions out on cluster=10083
2011-06-27 16:13:17,334 INFO org.apache.hadoop.hbase.master.AssignmentManager: Failed-over master needs to process 9923 regions in transition
2011-06-27 16:21:45,135 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Balance parameter: numRegions=20006, numServers=3, max=6669, min=6668

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4053) Most of the regions were added into AssignmentManager#servers twice

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060754#comment-13060754 ] 

jiraposter@reviews.apache.org commented on HBASE-4053:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/782/#review975
-----------------------------------------------------------

Ship it!


- Andrew


On 2011-07-06 18:27:32, Ted Yu wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/782/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-07-06 18:27:32)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  When master fails over, we should check whether hris contains the region addToServers() is trying to add.
bq.  But ArrayList is not the best data structure to perform search of specific HRegionInfo. Maybe we should consider replacing it with e.g. ConcurrentSkipListSet
bq.  
bq.  Also removes bulkAssignUserRegions() which is no longer called.
bq.  
bq.  
bq.  This addresses bug HBASE-4053.
bq.      https://issues.apache.org/jira/browse/HBASE-4053
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    /src/main/java/org/apache/hadoop/hbase/HRegionInfo.java 1143478 
bq.    /src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 1143478 
bq.  
bq.  Diff: https://reviews.apache.org/r/782/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Ran test suite.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Ted
bq.  
bq.



> Most of the regions were added into AssignmentManager#servers twice
> -------------------------------------------------------------------
>
>                 Key: HBASE-4053
>                 URL: https://issues.apache.org/jira/browse/HBASE-4053
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.3
>            Reporter: Jieshan Bean
>             Fix For: 0.90.4
>
>         Attachments: 4053.txt, HBase-4053-90.patch, surefire-report.html
>
>
> Here's the scenario of how did the problem happened:
> 1. When HMaster start, all regionservers checkin ok, and count of regions out on cluster is 10083, which is the actual region number count.
> 2. Then OpenedRegionHandler#process received zookeeper's events, and added 9923 regions to the hris list.
>    but the 9923 regions already exists, force added.
> 3. The LoadBalancer get the wrong Region numbers of 20006 (10083 + 9923).
> AssignmentManager#addToServers method:
> private void addToServers(final HServerInfo hsi, final HRegionInfo hri) {
>   List<HRegionInfo> hris = servers.get(hsi);
>   if (hris == null) {
>     hris = new ArrayList<HRegionInfo>();
>     servers.put(hsi, hris);
>   }
>   hris.add(hri); // Same region was double added here
> }
> logs:
> 2011-06-27 16:13:06,845 INFO org.apache.hadoop.hbase.master.ServerManager: Exiting wait on regionserver(s) to checkin; count=3, stopped=false, count of regions out on cluster=10083
> 2011-06-27 16:13:17,334 INFO org.apache.hadoop.hbase.master.AssignmentManager: Failed-over master needs to process 9923 regions in transition
> 2011-06-27 16:21:45,135 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Balance parameter: numRegions=20006, numServers=3, max=6669, min=6668

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Assigned] (HBASE-4053) Most of the regions were added into AssignmentManager#servers twice

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu reassigned HBASE-4053:
-----------------------------

    Assignee: Ted Yu

> Most of the regions were added into AssignmentManager#servers twice
> -------------------------------------------------------------------
>
>                 Key: HBASE-4053
>                 URL: https://issues.apache.org/jira/browse/HBASE-4053
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.3
>            Reporter: Jieshan Bean
>            Assignee: Ted Yu
>             Fix For: 0.90.4
>
>         Attachments: 4053.txt, HBase-4053-90.patch, surefire-report.html
>
>
> Here's the scenario of how did the problem happened:
> 1. When HMaster start, all regionservers checkin ok, and count of regions out on cluster is 10083, which is the actual region number count.
> 2. Then OpenedRegionHandler#process received zookeeper's events, and added 9923 regions to the hris list.
>    but the 9923 regions already exists, force added.
> 3. The LoadBalancer get the wrong Region numbers of 20006 (10083 + 9923).
> AssignmentManager#addToServers method:
> private void addToServers(final HServerInfo hsi, final HRegionInfo hri) {
>   List<HRegionInfo> hris = servers.get(hsi);
>   if (hris == null) {
>     hris = new ArrayList<HRegionInfo>();
>     servers.put(hsi, hris);
>   }
>   hris.add(hri); // Same region was double added here
> }
> logs:
> 2011-06-27 16:13:06,845 INFO org.apache.hadoop.hbase.master.ServerManager: Exiting wait on regionserver(s) to checkin; count=3, stopped=false, count of regions out on cluster=10083
> 2011-06-27 16:13:17,334 INFO org.apache.hadoop.hbase.master.AssignmentManager: Failed-over master needs to process 9923 regions in transition
> 2011-06-27 16:21:45,135 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Balance parameter: numRegions=20006, numServers=3, max=6669, min=6668

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4053) Most of the regions were added into AssignmentManager#servers twice

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058550#comment-13058550 ] 

Ted Yu commented on HBASE-4053:
-------------------------------

ConcurrentSkipListSet would be more appropriate than ConcurrentSkipListMap.

> Most of the regions were added into AssignmentManager#servers twice
> -------------------------------------------------------------------
>
>                 Key: HBASE-4053
>                 URL: https://issues.apache.org/jira/browse/HBASE-4053
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.3
>            Reporter: Jieshan Bean
>             Fix For: 0.90.4
>
>
> Here's the scenario of how did the problem happened:
> 1. When HMaster start, all regionservers checkin ok, and count of regions out on cluster is 10083, which is the actual region number count.
> 2. Then OpenedRegionHandler#process received zookeeper's events, and added 9923 regions to the hris list.
>    but the 9923 regions already exists, force added.
> 3. The LoadBalancer get the wrong Region numbers of 20006 (10083 + 9923).
> AssignmentManager#addToServers method:
> private void addToServers(final HServerInfo hsi, final HRegionInfo hri) {
>   List<HRegionInfo> hris = servers.get(hsi);
>   if (hris == null) {
>     hris = new ArrayList<HRegionInfo>();
>     servers.put(hsi, hris);
>   }
>   hris.add(hri); // Same region was double added here
> }
> logs:
> 2011-06-27 16:13:06,845 INFO org.apache.hadoop.hbase.master.ServerManager: Exiting wait on regionserver(s) to checkin; count=3, stopped=false, count of regions out on cluster=10083
> 2011-06-27 16:13:17,334 INFO org.apache.hadoop.hbase.master.AssignmentManager: Failed-over master needs to process 9923 regions in transition
> 2011-06-27 16:21:45,135 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Balance parameter: numRegions=20006, numServers=3, max=6669, min=6668

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-4053) Most of the regions were added into AssignmentManager#servers twice

Posted by "Jieshan Bean (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jieshan Bean updated HBASE-4053:
--------------------------------

    Attachment: surefire-report.html
                HBase-4053-90.patch

I have take some tests according to the modification of Ted's patch. All the tests were successed(But only test the version of 0.90..).





> Most of the regions were added into AssignmentManager#servers twice
> -------------------------------------------------------------------
>
>                 Key: HBASE-4053
>                 URL: https://issues.apache.org/jira/browse/HBASE-4053
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.3
>            Reporter: Jieshan Bean
>             Fix For: 0.90.4
>
>         Attachments: 4053.txt, HBase-4053-90.patch, surefire-report.html
>
>
> Here's the scenario of how did the problem happened:
> 1. When HMaster start, all regionservers checkin ok, and count of regions out on cluster is 10083, which is the actual region number count.
> 2. Then OpenedRegionHandler#process received zookeeper's events, and added 9923 regions to the hris list.
>    but the 9923 regions already exists, force added.
> 3. The LoadBalancer get the wrong Region numbers of 20006 (10083 + 9923).
> AssignmentManager#addToServers method:
> private void addToServers(final HServerInfo hsi, final HRegionInfo hri) {
>   List<HRegionInfo> hris = servers.get(hsi);
>   if (hris == null) {
>     hris = new ArrayList<HRegionInfo>();
>     servers.put(hsi, hris);
>   }
>   hris.add(hri); // Same region was double added here
> }
> logs:
> 2011-06-27 16:13:06,845 INFO org.apache.hadoop.hbase.master.ServerManager: Exiting wait on regionserver(s) to checkin; count=3, stopped=false, count of regions out on cluster=10083
> 2011-06-27 16:13:17,334 INFO org.apache.hadoop.hbase.master.AssignmentManager: Failed-over master needs to process 9923 regions in transition
> 2011-06-27 16:21:45,135 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Balance parameter: numRegions=20006, numServers=3, max=6669, min=6668

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4053) Most of the regions were added into AssignmentManager#servers twice

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060743#comment-13060743 ] 

jiraposter@reviews.apache.org commented on HBASE-4053:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/782/
-----------------------------------------------------------

(Updated 2011-07-06 18:27:32.068243)


Review request for hbase.


Changes
-------

Modified HRegionInfo.compareTo() to include comparison of offLine.


Summary
-------

When master fails over, we should check whether hris contains the region addToServers() is trying to add.
But ArrayList is not the best data structure to perform search of specific HRegionInfo. Maybe we should consider replacing it with e.g. ConcurrentSkipListSet

Also removes bulkAssignUserRegions() which is no longer called.


This addresses bug HBASE-4053.
    https://issues.apache.org/jira/browse/HBASE-4053


Diffs (updated)
-----

  /src/main/java/org/apache/hadoop/hbase/HRegionInfo.java 1143478 
  /src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 1143478 

Diff: https://reviews.apache.org/r/782/diff


Testing
-------

Ran test suite.


Thanks,

Ted



> Most of the regions were added into AssignmentManager#servers twice
> -------------------------------------------------------------------
>
>                 Key: HBASE-4053
>                 URL: https://issues.apache.org/jira/browse/HBASE-4053
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.3
>            Reporter: Jieshan Bean
>             Fix For: 0.90.4
>
>         Attachments: 4053.txt, HBase-4053-90.patch, surefire-report.html
>
>
> Here's the scenario of how did the problem happened:
> 1. When HMaster start, all regionservers checkin ok, and count of regions out on cluster is 10083, which is the actual region number count.
> 2. Then OpenedRegionHandler#process received zookeeper's events, and added 9923 regions to the hris list.
>    but the 9923 regions already exists, force added.
> 3. The LoadBalancer get the wrong Region numbers of 20006 (10083 + 9923).
> AssignmentManager#addToServers method:
> private void addToServers(final HServerInfo hsi, final HRegionInfo hri) {
>   List<HRegionInfo> hris = servers.get(hsi);
>   if (hris == null) {
>     hris = new ArrayList<HRegionInfo>();
>     servers.put(hsi, hris);
>   }
>   hris.add(hri); // Same region was double added here
> }
> logs:
> 2011-06-27 16:13:06,845 INFO org.apache.hadoop.hbase.master.ServerManager: Exiting wait on regionserver(s) to checkin; count=3, stopped=false, count of regions out on cluster=10083
> 2011-06-27 16:13:17,334 INFO org.apache.hadoop.hbase.master.AssignmentManager: Failed-over master needs to process 9923 regions in transition
> 2011-06-27 16:21:45,135 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Balance parameter: numRegions=20006, numServers=3, max=6669, min=6668

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4053) Most of the regions were added into AssignmentManager#servers twice

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060603#comment-13060603 ] 

jiraposter@reviews.apache.org commented on HBASE-4053:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/782/
-----------------------------------------------------------

Review request for hbase.


Summary
-------

When master fails over, we should check whether hris contains the region addToServers() is trying to add.
But ArrayList is not the best data structure to perform search of specific HRegionInfo. Maybe we should consider replacing it with e.g. ConcurrentSkipListSet


This addresses bug HBASE-4053.
    https://issues.apache.org/jira/browse/HBASE-4053


Diffs
-----

  /src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 1142537 

Diff: https://reviews.apache.org/r/782/diff


Testing
-------

Ran test suite.


Thanks,

Ted



> Most of the regions were added into AssignmentManager#servers twice
> -------------------------------------------------------------------
>
>                 Key: HBASE-4053
>                 URL: https://issues.apache.org/jira/browse/HBASE-4053
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.3
>            Reporter: Jieshan Bean
>             Fix For: 0.90.4
>
>         Attachments: 4053.txt, HBase-4053-90.patch, surefire-report.html
>
>
> Here's the scenario of how did the problem happened:
> 1. When HMaster start, all regionservers checkin ok, and count of regions out on cluster is 10083, which is the actual region number count.
> 2. Then OpenedRegionHandler#process received zookeeper's events, and added 9923 regions to the hris list.
>    but the 9923 regions already exists, force added.
> 3. The LoadBalancer get the wrong Region numbers of 20006 (10083 + 9923).
> AssignmentManager#addToServers method:
> private void addToServers(final HServerInfo hsi, final HRegionInfo hri) {
>   List<HRegionInfo> hris = servers.get(hsi);
>   if (hris == null) {
>     hris = new ArrayList<HRegionInfo>();
>     servers.put(hsi, hris);
>   }
>   hris.add(hri); // Same region was double added here
> }
> logs:
> 2011-06-27 16:13:06,845 INFO org.apache.hadoop.hbase.master.ServerManager: Exiting wait on regionserver(s) to checkin; count=3, stopped=false, count of regions out on cluster=10083
> 2011-06-27 16:13:17,334 INFO org.apache.hadoop.hbase.master.AssignmentManager: Failed-over master needs to process 9923 regions in transition
> 2011-06-27 16:21:45,135 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Balance parameter: numRegions=20006, numServers=3, max=6669, min=6668

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4053) Most of the regions were added into AssignmentManager#servers twice

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060605#comment-13060605 ] 

jiraposter@reviews.apache.org commented on HBASE-4053:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/782/
-----------------------------------------------------------

(Updated 2011-07-06 14:27:36.750538)


Review request for hbase.


Summary (updated)
-------

When master fails over, we should check whether hris contains the region addToServers() is trying to add.
But ArrayList is not the best data structure to perform search of specific HRegionInfo. Maybe we should consider replacing it with e.g. ConcurrentSkipListSet

Also removes bulkAssignUserRegions() which is no longer called.


This addresses bug HBASE-4053.
    https://issues.apache.org/jira/browse/HBASE-4053


Diffs
-----

  /src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 1142537 

Diff: https://reviews.apache.org/r/782/diff


Testing
-------

Ran test suite.


Thanks,

Ted



> Most of the regions were added into AssignmentManager#servers twice
> -------------------------------------------------------------------
>
>                 Key: HBASE-4053
>                 URL: https://issues.apache.org/jira/browse/HBASE-4053
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.3
>            Reporter: Jieshan Bean
>             Fix For: 0.90.4
>
>         Attachments: 4053.txt, HBase-4053-90.patch, surefire-report.html
>
>
> Here's the scenario of how did the problem happened:
> 1. When HMaster start, all regionservers checkin ok, and count of regions out on cluster is 10083, which is the actual region number count.
> 2. Then OpenedRegionHandler#process received zookeeper's events, and added 9923 regions to the hris list.
>    but the 9923 regions already exists, force added.
> 3. The LoadBalancer get the wrong Region numbers of 20006 (10083 + 9923).
> AssignmentManager#addToServers method:
> private void addToServers(final HServerInfo hsi, final HRegionInfo hri) {
>   List<HRegionInfo> hris = servers.get(hsi);
>   if (hris == null) {
>     hris = new ArrayList<HRegionInfo>();
>     servers.put(hsi, hris);
>   }
>   hris.add(hri); // Same region was double added here
> }
> logs:
> 2011-06-27 16:13:06,845 INFO org.apache.hadoop.hbase.master.ServerManager: Exiting wait on regionserver(s) to checkin; count=3, stopped=false, count of regions out on cluster=10083
> 2011-06-27 16:13:17,334 INFO org.apache.hadoop.hbase.master.AssignmentManager: Failed-over master needs to process 9923 regions in transition
> 2011-06-27 16:21:45,135 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Balance parameter: numRegions=20006, numServers=3, max=6669, min=6668

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4053) Most of the regions were added into AssignmentManager#servers twice

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060772#comment-13060772 ] 

Ted Yu commented on HBASE-4053:
-------------------------------

Integrated to branch and TRUNK.

Thanks for the review Andrew.

> Most of the regions were added into AssignmentManager#servers twice
> -------------------------------------------------------------------
>
>                 Key: HBASE-4053
>                 URL: https://issues.apache.org/jira/browse/HBASE-4053
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.3
>            Reporter: Jieshan Bean
>             Fix For: 0.90.4
>
>         Attachments: 4053.txt, HBase-4053-90.patch, surefire-report.html
>
>
> Here's the scenario of how did the problem happened:
> 1. When HMaster start, all regionservers checkin ok, and count of regions out on cluster is 10083, which is the actual region number count.
> 2. Then OpenedRegionHandler#process received zookeeper's events, and added 9923 regions to the hris list.
>    but the 9923 regions already exists, force added.
> 3. The LoadBalancer get the wrong Region numbers of 20006 (10083 + 9923).
> AssignmentManager#addToServers method:
> private void addToServers(final HServerInfo hsi, final HRegionInfo hri) {
>   List<HRegionInfo> hris = servers.get(hsi);
>   if (hris == null) {
>     hris = new ArrayList<HRegionInfo>();
>     servers.put(hsi, hris);
>   }
>   hris.add(hri); // Same region was double added here
> }
> logs:
> 2011-06-27 16:13:06,845 INFO org.apache.hadoop.hbase.master.ServerManager: Exiting wait on regionserver(s) to checkin; count=3, stopped=false, count of regions out on cluster=10083
> 2011-06-27 16:13:17,334 INFO org.apache.hadoop.hbase.master.AssignmentManager: Failed-over master needs to process 9923 regions in transition
> 2011-06-27 16:21:45,135 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Balance parameter: numRegions=20006, numServers=3, max=6669, min=6668

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-4053) Most of the regions were added into AssignmentManager#servers twice

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4053:
--------------------------

    Attachment: 4053.txt

This patch replaces List<HRegionInfo> of servers with Set<HRegionInfo>

> Most of the regions were added into AssignmentManager#servers twice
> -------------------------------------------------------------------
>
>                 Key: HBASE-4053
>                 URL: https://issues.apache.org/jira/browse/HBASE-4053
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.3
>            Reporter: Jieshan Bean
>             Fix For: 0.90.4
>
>         Attachments: 4053.txt
>
>
> Here's the scenario of how did the problem happened:
> 1. When HMaster start, all regionservers checkin ok, and count of regions out on cluster is 10083, which is the actual region number count.
> 2. Then OpenedRegionHandler#process received zookeeper's events, and added 9923 regions to the hris list.
>    but the 9923 regions already exists, force added.
> 3. The LoadBalancer get the wrong Region numbers of 20006 (10083 + 9923).
> AssignmentManager#addToServers method:
> private void addToServers(final HServerInfo hsi, final HRegionInfo hri) {
>   List<HRegionInfo> hris = servers.get(hsi);
>   if (hris == null) {
>     hris = new ArrayList<HRegionInfo>();
>     servers.put(hsi, hris);
>   }
>   hris.add(hri); // Same region was double added here
> }
> logs:
> 2011-06-27 16:13:06,845 INFO org.apache.hadoop.hbase.master.ServerManager: Exiting wait on regionserver(s) to checkin; count=3, stopped=false, count of regions out on cluster=10083
> 2011-06-27 16:13:17,334 INFO org.apache.hadoop.hbase.master.AssignmentManager: Failed-over master needs to process 9923 regions in transition
> 2011-06-27 16:21:45,135 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Balance parameter: numRegions=20006, numServers=3, max=6669, min=6668

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4053) Most of the regions were added into AssignmentManager#servers twice

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060696#comment-13060696 ] 

jiraposter@reviews.apache.org commented on HBASE-4053:
------------------------------------------------------



bq.  On 2011-07-06 16:32:42, Andrew Purtell wrote:
bq.  > This change seems reasonable. One concern I have regarding the change from List to Set is HRegionInfo's hashcode (used by Object.hashCode) takes into account the value of its 'offLine' member but its compareTo() method (used by Object.equals) does not. Perhaps you can address this inconsistency as part of this change?

For equals(), is distinguishing between offline and online states for two regions with same start key and end key for finding multiply assigned region(s) ?


- Ted


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/782/#review969
-----------------------------------------------------------


On 2011-07-06 14:27:36, Ted Yu wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/782/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-07-06 14:27:36)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  When master fails over, we should check whether hris contains the region addToServers() is trying to add.
bq.  But ArrayList is not the best data structure to perform search of specific HRegionInfo. Maybe we should consider replacing it with e.g. ConcurrentSkipListSet
bq.  
bq.  Also removes bulkAssignUserRegions() which is no longer called.
bq.  
bq.  
bq.  This addresses bug HBASE-4053.
bq.      https://issues.apache.org/jira/browse/HBASE-4053
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    /src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 1142537 
bq.  
bq.  Diff: https://reviews.apache.org/r/782/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Ran test suite.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Ted
bq.  
bq.



> Most of the regions were added into AssignmentManager#servers twice
> -------------------------------------------------------------------
>
>                 Key: HBASE-4053
>                 URL: https://issues.apache.org/jira/browse/HBASE-4053
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.3
>            Reporter: Jieshan Bean
>             Fix For: 0.90.4
>
>         Attachments: 4053.txt, HBase-4053-90.patch, surefire-report.html
>
>
> Here's the scenario of how did the problem happened:
> 1. When HMaster start, all regionservers checkin ok, and count of regions out on cluster is 10083, which is the actual region number count.
> 2. Then OpenedRegionHandler#process received zookeeper's events, and added 9923 regions to the hris list.
>    but the 9923 regions already exists, force added.
> 3. The LoadBalancer get the wrong Region numbers of 20006 (10083 + 9923).
> AssignmentManager#addToServers method:
> private void addToServers(final HServerInfo hsi, final HRegionInfo hri) {
>   List<HRegionInfo> hris = servers.get(hsi);
>   if (hris == null) {
>     hris = new ArrayList<HRegionInfo>();
>     servers.put(hsi, hris);
>   }
>   hris.add(hri); // Same region was double added here
> }
> logs:
> 2011-06-27 16:13:06,845 INFO org.apache.hadoop.hbase.master.ServerManager: Exiting wait on regionserver(s) to checkin; count=3, stopped=false, count of regions out on cluster=10083
> 2011-06-27 16:13:17,334 INFO org.apache.hadoop.hbase.master.AssignmentManager: Failed-over master needs to process 9923 regions in transition
> 2011-06-27 16:21:45,135 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Balance parameter: numRegions=20006, numServers=3, max=6669, min=6668

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Resolved] (HBASE-4053) Most of the regions were added into AssignmentManager#servers twice

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu resolved HBASE-4053.
---------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]

> Most of the regions were added into AssignmentManager#servers twice
> -------------------------------------------------------------------
>
>                 Key: HBASE-4053
>                 URL: https://issues.apache.org/jira/browse/HBASE-4053
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.3
>            Reporter: Jieshan Bean
>             Fix For: 0.90.4
>
>         Attachments: 4053.txt, HBase-4053-90.patch, surefire-report.html
>
>
> Here's the scenario of how did the problem happened:
> 1. When HMaster start, all regionservers checkin ok, and count of regions out on cluster is 10083, which is the actual region number count.
> 2. Then OpenedRegionHandler#process received zookeeper's events, and added 9923 regions to the hris list.
>    but the 9923 regions already exists, force added.
> 3. The LoadBalancer get the wrong Region numbers of 20006 (10083 + 9923).
> AssignmentManager#addToServers method:
> private void addToServers(final HServerInfo hsi, final HRegionInfo hri) {
>   List<HRegionInfo> hris = servers.get(hsi);
>   if (hris == null) {
>     hris = new ArrayList<HRegionInfo>();
>     servers.put(hsi, hris);
>   }
>   hris.add(hri); // Same region was double added here
> }
> logs:
> 2011-06-27 16:13:06,845 INFO org.apache.hadoop.hbase.master.ServerManager: Exiting wait on regionserver(s) to checkin; count=3, stopped=false, count of regions out on cluster=10083
> 2011-06-27 16:13:17,334 INFO org.apache.hadoop.hbase.master.AssignmentManager: Failed-over master needs to process 9923 regions in transition
> 2011-06-27 16:21:45,135 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Balance parameter: numRegions=20006, numServers=3, max=6669, min=6668

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4053) Most of the regions were added into AssignmentManager#servers twice

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060673#comment-13060673 ] 

jiraposter@reviews.apache.org commented on HBASE-4053:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/782/#review969
-----------------------------------------------------------


This change seems reasonable. One concern I have regarding the change from List to Set is HRegionInfo's hashcode (used by Object.hashCode) takes into account the value of its 'offLine' member but its compareTo() method (used by Object.equals) does not. Perhaps you can address this inconsistency as part of this change?

- Andrew


On 2011-07-06 14:27:36, Ted Yu wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/782/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-07-06 14:27:36)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  When master fails over, we should check whether hris contains the region addToServers() is trying to add.
bq.  But ArrayList is not the best data structure to perform search of specific HRegionInfo. Maybe we should consider replacing it with e.g. ConcurrentSkipListSet
bq.  
bq.  Also removes bulkAssignUserRegions() which is no longer called.
bq.  
bq.  
bq.  This addresses bug HBASE-4053.
bq.      https://issues.apache.org/jira/browse/HBASE-4053
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    /src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 1142537 
bq.  
bq.  Diff: https://reviews.apache.org/r/782/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Ran test suite.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Ted
bq.  
bq.



> Most of the regions were added into AssignmentManager#servers twice
> -------------------------------------------------------------------
>
>                 Key: HBASE-4053
>                 URL: https://issues.apache.org/jira/browse/HBASE-4053
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.3
>            Reporter: Jieshan Bean
>             Fix For: 0.90.4
>
>         Attachments: 4053.txt, HBase-4053-90.patch, surefire-report.html
>
>
> Here's the scenario of how did the problem happened:
> 1. When HMaster start, all regionservers checkin ok, and count of regions out on cluster is 10083, which is the actual region number count.
> 2. Then OpenedRegionHandler#process received zookeeper's events, and added 9923 regions to the hris list.
>    but the 9923 regions already exists, force added.
> 3. The LoadBalancer get the wrong Region numbers of 20006 (10083 + 9923).
> AssignmentManager#addToServers method:
> private void addToServers(final HServerInfo hsi, final HRegionInfo hri) {
>   List<HRegionInfo> hris = servers.get(hsi);
>   if (hris == null) {
>     hris = new ArrayList<HRegionInfo>();
>     servers.put(hsi, hris);
>   }
>   hris.add(hri); // Same region was double added here
> }
> logs:
> 2011-06-27 16:13:06,845 INFO org.apache.hadoop.hbase.master.ServerManager: Exiting wait on regionserver(s) to checkin; count=3, stopped=false, count of regions out on cluster=10083
> 2011-06-27 16:13:17,334 INFO org.apache.hadoop.hbase.master.AssignmentManager: Failed-over master needs to process 9923 regions in transition
> 2011-06-27 16:21:45,135 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Balance parameter: numRegions=20006, numServers=3, max=6669, min=6668

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4053) Most of the regions were added into AssignmentManager#servers twice

Posted by "Jieshan Bean (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059306#comment-13059306 ] 

Jieshan Bean commented on HBASE-4053:
-------------------------------------

if addToServers() check whether the set contains the region, it will be a very time-consuming work.
I aggree to the idea of changing the data structure.

> Most of the regions were added into AssignmentManager#servers twice
> -------------------------------------------------------------------
>
>                 Key: HBASE-4053
>                 URL: https://issues.apache.org/jira/browse/HBASE-4053
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.3
>            Reporter: Jieshan Bean
>             Fix For: 0.90.4
>
>
> Here's the scenario of how did the problem happened:
> 1. When HMaster start, all regionservers checkin ok, and count of regions out on cluster is 10083, which is the actual region number count.
> 2. Then OpenedRegionHandler#process received zookeeper's events, and added 9923 regions to the hris list.
>    but the 9923 regions already exists, force added.
> 3. The LoadBalancer get the wrong Region numbers of 20006 (10083 + 9923).
> AssignmentManager#addToServers method:
> private void addToServers(final HServerInfo hsi, final HRegionInfo hri) {
>   List<HRegionInfo> hris = servers.get(hsi);
>   if (hris == null) {
>     hris = new ArrayList<HRegionInfo>();
>     servers.put(hsi, hris);
>   }
>   hris.add(hri); // Same region was double added here
> }
> logs:
> 2011-06-27 16:13:06,845 INFO org.apache.hadoop.hbase.master.ServerManager: Exiting wait on regionserver(s) to checkin; count=3, stopped=false, count of regions out on cluster=10083
> 2011-06-27 16:13:17,334 INFO org.apache.hadoop.hbase.master.AssignmentManager: Failed-over master needs to process 9923 regions in transition
> 2011-06-27 16:21:45,135 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Balance parameter: numRegions=20006, numServers=3, max=6669, min=6668

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4053) Most of the regions were added into AssignmentManager#servers twice

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060272#comment-13060272 ] 

Ted Yu commented on HBASE-4053:
-------------------------------

Thanks for the update.
The patch for 0.90 seems to include some Chinese characters.

> Most of the regions were added into AssignmentManager#servers twice
> -------------------------------------------------------------------
>
>                 Key: HBASE-4053
>                 URL: https://issues.apache.org/jira/browse/HBASE-4053
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.3
>            Reporter: Jieshan Bean
>             Fix For: 0.90.4
>
>         Attachments: 4053.txt, HBase-4053-90.patch, surefire-report.html
>
>
> Here's the scenario of how did the problem happened:
> 1. When HMaster start, all regionservers checkin ok, and count of regions out on cluster is 10083, which is the actual region number count.
> 2. Then OpenedRegionHandler#process received zookeeper's events, and added 9923 regions to the hris list.
>    but the 9923 regions already exists, force added.
> 3. The LoadBalancer get the wrong Region numbers of 20006 (10083 + 9923).
> AssignmentManager#addToServers method:
> private void addToServers(final HServerInfo hsi, final HRegionInfo hri) {
>   List<HRegionInfo> hris = servers.get(hsi);
>   if (hris == null) {
>     hris = new ArrayList<HRegionInfo>();
>     servers.put(hsi, hris);
>   }
>   hris.add(hri); // Same region was double added here
> }
> logs:
> 2011-06-27 16:13:06,845 INFO org.apache.hadoop.hbase.master.ServerManager: Exiting wait on regionserver(s) to checkin; count=3, stopped=false, count of regions out on cluster=10083
> 2011-06-27 16:13:17,334 INFO org.apache.hadoop.hbase.master.AssignmentManager: Failed-over master needs to process 9923 regions in transition
> 2011-06-27 16:21:45,135 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Balance parameter: numRegions=20006, numServers=3, max=6669, min=6668

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4053) Most of the regions were added into AssignmentManager#servers twice

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060810#comment-13060810 ] 

Hudson commented on HBASE-4053:
-------------------------------

Integrated in HBase-TRUNK #2007 (See [https://builds.apache.org/job/HBase-TRUNK/2007/])
    HBASE-4053  Most of the regions were added into AssignmentManager#servers twice

tedyu : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/HRegionInfo.java
* /hbase/trunk/CHANGES.txt


> Most of the regions were added into AssignmentManager#servers twice
> -------------------------------------------------------------------
>
>                 Key: HBASE-4053
>                 URL: https://issues.apache.org/jira/browse/HBASE-4053
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.3
>            Reporter: Jieshan Bean
>             Fix For: 0.90.4
>
>         Attachments: 4053.txt, HBase-4053-90.patch, surefire-report.html
>
>
> Here's the scenario of how did the problem happened:
> 1. When HMaster start, all regionservers checkin ok, and count of regions out on cluster is 10083, which is the actual region number count.
> 2. Then OpenedRegionHandler#process received zookeeper's events, and added 9923 regions to the hris list.
>    but the 9923 regions already exists, force added.
> 3. The LoadBalancer get the wrong Region numbers of 20006 (10083 + 9923).
> AssignmentManager#addToServers method:
> private void addToServers(final HServerInfo hsi, final HRegionInfo hri) {
>   List<HRegionInfo> hris = servers.get(hsi);
>   if (hris == null) {
>     hris = new ArrayList<HRegionInfo>();
>     servers.put(hsi, hris);
>   }
>   hris.add(hri); // Same region was double added here
> }
> logs:
> 2011-06-27 16:13:06,845 INFO org.apache.hadoop.hbase.master.ServerManager: Exiting wait on regionserver(s) to checkin; count=3, stopped=false, count of regions out on cluster=10083
> 2011-06-27 16:13:17,334 INFO org.apache.hadoop.hbase.master.AssignmentManager: Failed-over master needs to process 9923 regions in transition
> 2011-06-27 16:21:45,135 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Balance parameter: numRegions=20006, numServers=3, max=6669, min=6668

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4053) Most of the regions were added into AssignmentManager#servers twice

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058546#comment-13058546 ] 

Ted Yu commented on HBASE-4053:
-------------------------------

When master fails over, we should check whether hris contains the region addToServers() is trying to add.
But ArrayList is not the best data structure to perform search of specific HRegionInfo. Maybe we should consider replacing it with e.g. ConcurrentSkipListMap

> Most of the regions were added into AssignmentManager#servers twice
> -------------------------------------------------------------------
>
>                 Key: HBASE-4053
>                 URL: https://issues.apache.org/jira/browse/HBASE-4053
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.3
>            Reporter: Jieshan Bean
>             Fix For: 0.90.4
>
>
> Here's the scenario of how did the problem happened:
> 1. When HMaster start, all regionservers checkin ok, and count of regions out on cluster is 10083, which is the actual region number count.
> 2. Then OpenedRegionHandler#process received zookeeper's events, and added 9923 regions to the hris list.
>    but the 9923 regions already exists, force added.
> 3. The LoadBalancer get the wrong Region numbers of 20006 (10083 + 9923).
> AssignmentManager#addToServers method:
> private void addToServers(final HServerInfo hsi, final HRegionInfo hri) {
>   List<HRegionInfo> hris = servers.get(hsi);
>   if (hris == null) {
>     hris = new ArrayList<HRegionInfo>();
>     servers.put(hsi, hris);
>   }
>   hris.add(hri); // Same region was double added here
> }
> logs:
> 2011-06-27 16:13:06,845 INFO org.apache.hadoop.hbase.master.ServerManager: Exiting wait on regionserver(s) to checkin; count=3, stopped=false, count of regions out on cluster=10083
> 2011-06-27 16:13:17,334 INFO org.apache.hadoop.hbase.master.AssignmentManager: Failed-over master needs to process 9923 regions in transition
> 2011-06-27 16:21:45,135 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Balance parameter: numRegions=20006, numServers=3, max=6669, min=6668

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4053) Most of the regions were added into AssignmentManager#servers twice

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058837#comment-13058837 ] 

stack commented on HBASE-4053:
------------------------------

@Ted That makes sense.

> Most of the regions were added into AssignmentManager#servers twice
> -------------------------------------------------------------------
>
>                 Key: HBASE-4053
>                 URL: https://issues.apache.org/jira/browse/HBASE-4053
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.3
>            Reporter: Jieshan Bean
>             Fix For: 0.90.4
>
>
> Here's the scenario of how did the problem happened:
> 1. When HMaster start, all regionservers checkin ok, and count of regions out on cluster is 10083, which is the actual region number count.
> 2. Then OpenedRegionHandler#process received zookeeper's events, and added 9923 regions to the hris list.
>    but the 9923 regions already exists, force added.
> 3. The LoadBalancer get the wrong Region numbers of 20006 (10083 + 9923).
> AssignmentManager#addToServers method:
> private void addToServers(final HServerInfo hsi, final HRegionInfo hri) {
>   List<HRegionInfo> hris = servers.get(hsi);
>   if (hris == null) {
>     hris = new ArrayList<HRegionInfo>();
>     servers.put(hsi, hris);
>   }
>   hris.add(hri); // Same region was double added here
> }
> logs:
> 2011-06-27 16:13:06,845 INFO org.apache.hadoop.hbase.master.ServerManager: Exiting wait on regionserver(s) to checkin; count=3, stopped=false, count of regions out on cluster=10083
> 2011-06-27 16:13:17,334 INFO org.apache.hadoop.hbase.master.AssignmentManager: Failed-over master needs to process 9923 regions in transition
> 2011-06-27 16:21:45,135 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Balance parameter: numRegions=20006, numServers=3, max=6669, min=6668

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira