You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "chenxu (JIRA)" <ji...@apache.org> on 2017/06/20 06:19:00 UTC

[jira] [Comment Edited] (HBASE-18215) some advises about refactoring of rsgroup

    [ https://issues.apache.org/jira/browse/HBASE-18215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16055201#comment-16055201 ] 

chenxu edited comment on HBASE-18215 at 6/20/17 6:18 AM:
---------------------------------------------------------

bq. #5 Can you provide a specific scenario? When the rsgroup patch was written if I remember correctly the reverse was true. AM cannot handle null results when calling randomAssignment().

there is a scenario

{code:title=RSGroupBasedLoadBalancer.java|borderStyle=solid}
public List<RegionPlan> balanceCluster(Map<ServerName, List<HRegionInfo>> clusterState)
    throws HBaseIOException {
    ...
    List<HRegionInfo> misplacedRegions = correctedState.get(LoadBalancer.BOGUS_SERVER_NAME);
    for (HRegionInfo regionInfo : misplacedRegions) {
      regionPlans.add(new RegionPlan(regionInfo, null, null));
    }
   ...
{code}
if region misplaced, RegionPlan’s dest is null, and the following will happen
{code:title=AssignmentManager.java|borderStyle=solid}
private RegionPlan getRegionPlan(final HRegionInfo region,
    final boolean forceNewPlan) throws HBaseIOException {
    ...
   if (forceNewPlan
          || existingPlan == null
          || existingPlan.getDestination() == null
          || !destServers.contains(existingPlan.getDestination())) {
        newPlan = true;
        try {
          randomPlan = new RegionPlan(region, null,
              balancer.randomAssignment(region, destServers));
        } catch (IOException ex) {
          LOG.warn("Failed to create new plan.",ex);
          return null;
        }
        this.regionPlans.put(encodedName, randomPlan);
      }
    }
    if (newPlan) {
      if (randomPlan.getDestination() == null) {
        LOG.warn("Can't find a destination for " + encodedName);
        return null;
      }
      ...
      return randomPlan;
    }
    ...
}
{code}
if balancer.randomAssignment return null, getRegionPlan will null, AM will handle like this
{code:title=AssignmentManager.java|borderStyle=solid}
private void assign(RegionState state, boolean forceNewPlan) {
    ...
    if (plan == null) {
      LOG.warn("Unable to determine a plan to assign " + region);
      // For meta region, we have to keep retrying until succeeding
      if (region.isMetaRegion()) {
        ....
      }
      regionStates.updateRegionState(region, State.FAILED_OPEN);
      return;
    }
    ....
}
{code}
the target region will be transition to FAILED_OPEN status.
but if balancer.randomAssignment return BOGUS_SERVER_NAME, AM can’t handle this.
an OPEN RPC will be sent to the BOGUS_SERVER_NAME, this should not happen


was (Author: javaman_chen):
bq. Can you provide a specific scenario?

there is a scenario

{code:title=RSGroupBasedLoadBalancer.java|borderStyle=solid}
public List<RegionPlan> balanceCluster(Map<ServerName, List<HRegionInfo>> clusterState)
    throws HBaseIOException {
    ...
    List<HRegionInfo> misplacedRegions = correctedState.get(LoadBalancer.BOGUS_SERVER_NAME);
    for (HRegionInfo regionInfo : misplacedRegions) {
      regionPlans.add(new RegionPlan(regionInfo, null, null));
    }
   ...
{code}
if region misplaced, RegionPlan’s dest is null, and the following will happen
{code:title=AssignmentManager.java|borderStyle=solid}
private RegionPlan getRegionPlan(final HRegionInfo region,
    final boolean forceNewPlan) throws HBaseIOException {
    ...
   if (forceNewPlan
          || existingPlan == null
          || existingPlan.getDestination() == null
          || !destServers.contains(existingPlan.getDestination())) {
        newPlan = true;
        try {
          randomPlan = new RegionPlan(region, null,
              balancer.randomAssignment(region, destServers));
        } catch (IOException ex) {
          LOG.warn("Failed to create new plan.",ex);
          return null;
        }
        this.regionPlans.put(encodedName, randomPlan);
      }
    }
    if (newPlan) {
      if (randomPlan.getDestination() == null) {
        LOG.warn("Can't find a destination for " + encodedName);
        return null;
      }
      ...
      return randomPlan;
    }
    ...
}
{code}
if balancer.randomAssignment return null, getRegionPlan will null, AM will handle like this
{code:title=AssignmentManager.java|borderStyle=solid}
private void assign(RegionState state, boolean forceNewPlan) {
    ...
    if (plan == null) {
      LOG.warn("Unable to determine a plan to assign " + region);
      // For meta region, we have to keep retrying until succeeding
      if (region.isMetaRegion()) {
        ....
      }
      regionStates.updateRegionState(region, State.FAILED_OPEN);
      return;
    }
    ....
}
{code}
the target region will be transition to FAILED_OPEN status.
but if balancer.randomAssignment return BOGUS_SERVER_NAME, AM can’t handle this.
an OPEN RPC will be sent to the BOGUS_SERVER_NAME, this should not happen

> some advises about refactoring of rsgroup
> -----------------------------------------
>
>                 Key: HBASE-18215
>                 URL: https://issues.apache.org/jira/browse/HBASE-18215
>             Project: HBase
>          Issue Type: Improvement
>          Components: Balancer
>            Reporter: chenxu
>         Attachments: HBASE-18215-1.2.4-v1.patch
>
>
> recently we have Integrated rsgroup into our cluster,  after Integrated, found some refactoring points. maybe the points were not right, but i think there is a need to share with you guys.
> # when hbase.balancer.tablesOnMaster configured, RSGroupBasedLoadBalancer should consider masterServer assignment first in balanceCluster, roundRobinAssignment, retainAssignment and randomAssignment
>   do the same thing as BaseLoadBalancer
> # why not use a local file as the persistence layer instead of rsgroup table. 
> in our implementation, we first modify the local rsgroup file, then load the group info into memory, after that execute the balancer command, everything is OK.
> when loading do some sanity check:
> (1) one server can not be owned by multi group
> (2) one table can not be owned by multi group
> (3) if group has table, it must also has servers
> (4) default group must has servers in it
> if sanity check can’t pass, give up the following process.work as this, it can greatly reduce the complexity of rsgroup implementation, there is no need to wait for the rsgroup table to be online, and methods like moveServers, moveTables, addRSGroup, removeRSGroup, moveServersAndTables can be removed from RSGroupAdminService.only a refresh method is need(modify persistence layer first and refresh the memory)
> # we should add some group informations on master web UI
> to do this, RSGroupBasedLoadBalancer should move to hbase-server module, because MasterStatusTmpl.jamon depends on it
> # there may be some issues about RSGroupBasedLoadBalancer.roundRobinAssignment
> if two groups both include BOGUS_SERVER_NAME, assignments.putAll will overwrite the previous data
> # there may be some issues about RSGroupBasedLoadBalancer.randomAssignment
> when the return value is BOGUS_SERVER_NAME, AM can not handle this case. we should return null value instead of BOGUS_SERVER_NAME.
> # when RSGroupBasedLoadBalancer.balanceCluster execute, groups are balanced one by one, if there are two many groups, we can do this in parallel.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)