You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "Ted Yu (JIRA)" <ji...@apache.org> on 2011/07/01 15:23:28 UTC

[jira] [Commented] (HBASE-4053) Most of the regions were added into AssignmentManager#servers twice

    [ https://issues.apache.org/jira/browse/HBASE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058546#comment-13058546 ] 

Ted Yu commented on HBASE-4053:
-------------------------------

When master fails over, we should check whether hris contains the region addToServers() is trying to add.
But ArrayList is not the best data structure to perform search of specific HRegionInfo. Maybe we should consider replacing it with e.g. ConcurrentSkipListMap

> Most of the regions were added into AssignmentManager#servers twice
> -------------------------------------------------------------------
>
>                 Key: HBASE-4053
>                 URL: https://issues.apache.org/jira/browse/HBASE-4053
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.3
>            Reporter: Jieshan Bean
>             Fix For: 0.90.4
>
>
> Here's the scenario of how did the problem happened:
> 1. When HMaster start, all regionservers checkin ok, and count of regions out on cluster is 10083, which is the actual region number count.
> 2. Then OpenedRegionHandler#process received zookeeper's events, and added 9923 regions to the hris list.
>    but the 9923 regions already exists, force added.
> 3. The LoadBalancer get the wrong Region numbers of 20006 (10083 + 9923).
> AssignmentManager#addToServers method:
> private void addToServers(final HServerInfo hsi, final HRegionInfo hri) {
>   List<HRegionInfo> hris = servers.get(hsi);
>   if (hris == null) {
>     hris = new ArrayList<HRegionInfo>();
>     servers.put(hsi, hris);
>   }
>   hris.add(hri); // Same region was double added here
> }
> logs:
> 2011-06-27 16:13:06,845 INFO org.apache.hadoop.hbase.master.ServerManager: Exiting wait on regionserver(s) to checkin; count=3, stopped=false, count of regions out on cluster=10083
> 2011-06-27 16:13:17,334 INFO org.apache.hadoop.hbase.master.AssignmentManager: Failed-over master needs to process 9923 regions in transition
> 2011-06-27 16:21:45,135 DEBUG org.apache.hadoop.hbase.master.LoadBalancer: Balance parameter: numRegions=20006, numServers=3, max=6669, min=6668

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira