Posted to dev@hbase.apache.org by "cuijianwei (JIRA)" <ji...@apache.org> on 2015/01/09 12:20:34 UTC

[jira] [Created] (HBASE-12829) Request count in RegionLoad may not accurate to compute the region load cost

cuijianwei created HBASE-12829:
----------------------------------

             Summary: Request count in RegionLoad may not accurate to compute the region load cost
                 Key: HBASE-12829
                 URL: https://issues.apache.org/jira/browse/HBASE-12829
             Project: HBase
          Issue Type: Improvement
          Components: Balancer
    Affects Versions: 0.99.2
            Reporter: cuijianwei
            Priority: Minor


StochasticLoadBalancer#RequestCostFunction (ReadRequestCostFunction and WriteRequestCostFunction) computes the load cost for a region from a number of remembered region loads. Each region load records the total read/write request count since the region was opened, as of the report time. However, the request count is reset when the region is moved, so the newly reported count no longer represents the total requests. For example, if a region has high write throughput, the WriteRequest in its region load will be very large after it has been online for a long time; if the region is then moved, the new WriteRequest will be much smaller, so the region contributes much less to the cost of the region server hosting it. We may need to take the region open time into account to get a more accurate region load.
Alternatively, how about using the read/write request count for each time slot instead of the total request count? With the total count, older read/write throughput contributes more to the cost computed by CostFromRegionLoadFunction#getRegionLoadCost:
{code}
    protected double getRegionLoadCost(Collection<RegionLoad> regionLoadList) {
      double cost = 0;

      for (RegionLoad rl : regionLoadList) {
        double toAdd = getCostFromRl(rl);

        if (cost == 0) {
          cost = toAdd;
        } else {
          cost = (.5 * cost) + (.5 * toAdd);
        }
      }
      return cost;
    }
{code}
For example, assume the balancer remembers three loads for a region at times t1, t2, t3 (t1 < t2 < t3), and the write request counts are w1, w2, w3 for the time slots [0, t1), [t1, t2), [t2, t3) respectively. The WriteRequest in the region loads at t1, t2, t3 will then be w1, w1 + w2, w1 + w2 + w3, and the WriteRequest cost will be:
{code}
    0.5 * (w1 + w2 + w3) + 0.25 * (w1 + w2) + 0.25 * w1 = w1 + 0.75 * w2 + 0.5 * w3
{code}
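To check the arithmetic, here is a minimal stand-alone sketch that applies the same fold as getRegionLoadCost to plain numbers. The class name CumulativeCostDemo and the sample values w1 = 100, w2 = 200, w3 = 400 are made up for illustration and are not part of HBase.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone demo; not part of HBase.
final class CumulativeCostDemo {

  // Same fold as the quoted getRegionLoadCost, applied to plain numbers
  // instead of RegionLoad objects.
  static double smoothedCost(List<Double> samples) {
    double cost = 0;
    for (double toAdd : samples) {
      if (cost == 0) {
        cost = toAdd;
      } else {
        cost = (.5 * cost) + (.5 * toAdd);
      }
    }
    return cost;
  }

  public static void main(String[] args) {
    // Assumed per-slot write counts, chosen only for illustration.
    double w1 = 100, w2 = 200, w3 = 400;

    // What the region loads report today: cumulative totals at t1, t2, t3.
    List<Double> cumulative = Arrays.asList(w1, w1 + w2, w1 + w2 + w3);

    double fromFold = smoothedCost(cumulative);
    double fromFormula = w1 + 0.75 * w2 + 0.5 * w3;

    // Both values are 450.0 for these inputs.
    System.out.println(fromFold + " " + fromFormula);
  }
}
```

With these numbers the oldest slot (w1 = 100) is counted in full, while the newest slot (w3 = 400) is only counted at half weight.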
In the cumulative form above, w1 contributes more to the cost than w2 and w3. Intuitively, however, I think recent read/write throughput represents the current load of the region better than older throughput. Therefore, how about using w1, w2 and w3 directly in the computation? The cost would then become:
{code}
    0.25 * w1 + 0.25 * w2 + 0.5 * w3
{code}
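One possible way to get per-slot counts is to difference consecutive cumulative counts before applying the same fold. The sketch below is only an illustration under stated assumptions: the class PerSlotCostDemo and its methods are hypothetical, not HBase code, and the rule "a count lower than the previous one means the counter was reset by a region move" is an assumed heuristic, not something the balancer does today.

```java
// Hypothetical stand-alone demo; not part of HBase.
final class PerSlotCostDemo {

  // Converts cumulative request counts into per-slot counts.
  // Assumption: if a count is lower than the previous one, the counter was
  // reset (for example because the region moved), so the new value is taken
  // as the count for that slot.
  static double[] toPerSlot(long[] cumulative) {
    double[] slots = new double[cumulative.length];
    long previous = 0;
    for (int i = 0; i < cumulative.length; i++) {
      long current = cumulative[i];
      slots[i] = current >= previous ? current - previous : current;
      previous = current;
    }
    return slots;
  }

  // Same fold as the quoted getRegionLoadCost, applied to plain numbers.
  static double smoothedCost(double[] samples) {
    double cost = 0;
    for (double toAdd : samples) {
      if (cost == 0) {
        cost = toAdd;
      } else {
        cost = (.5 * cost) + (.5 * toAdd);
      }
    }
    return cost;
  }

  public static void main(String[] args) {
    // Cumulative counts for assumed slots w1 = 100, w2 = 200, w3 = 400.
    long[] steady = {100, 300, 700};
    // 0.25 * 100 + 0.25 * 200 + 0.5 * 400 = 275.0
    System.out.println(smoothedCost(toPerSlot(steady)));

    // The counter drops from 300 to 50, as it would after a region move.
    long[] afterMove = {100, 300, 50};
    // Slots become 100, 200, 50, so the cost is 100.0
    System.out.println(smoothedCost(toPerSlot(afterMove)));
  }
}
```

Note that the first slot [0, t1) may cover a longer period than the later ones, so a real implementation would probably also need the report interval or the region open time to turn these counts into comparable rates.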


