Posted to issues@hbase.apache.org by GitBox <gi...@apache.org> on 2021/03/05 07:46:36 UTC

[GitHub] [hbase] d-c-manning commented on a change in pull request #3011: HBASE-25625 StochasticBalancer CostFunctions needs a better way to ev…

d-c-manning commented on a change in pull request #3011:
URL: https://github.com/apache/hbase/pull/3011#discussion_r587954225



##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/BaseLoadBalancer.java
##########
@@ -513,6 +515,75 @@ private void registerRegion(RegionInfo region, int regionIndex,
       }
     }
 
+    private void reComputeRegionStDevPerTable() {
+      for (int i = 0; i < numTables; i++) {
+        regionStDevPerTable[i] = 0;
+      }
+
+      for (int[] aNumRegionsPerServerPerTable : numRegionsPerServerPerTable) {
+        for (int tableIndex = 0; tableIndex < aNumRegionsPerServerPerTable.length; tableIndex++) {
+          double deviation = aNumRegionsPerServerPerTable[tableIndex]
+            - regionsPerTable[tableIndex] / numServers;
+          regionStDevPerTable[tableIndex] += deviation * deviation;
+        }
+      }
+
+      for (int i = 0; i < numTables; i++) {
+        regionStDevPerTable[i] = scale(getMinStDev(regionsPerTable[i], numServers),
+          getMaxStDev(regionsPerTable[i], numServers),
+          Math.sqrt(regionStDevPerTable[i] / numServers));
+      }
+    }
+
+    /**
+     * Return the max standard deviation of distribution of regions
+     * Compute max as if all region servers had 0 and one had the sum of all costs.  This must be
+     * a zero sum cost for this to make sense.
+    */
+    public double getMaxStDev(double total, double numServers) {

Review comment:
       can this be static?
   ```suggestion
       public static double getMaxStDev(double total, double numServers) {
   ```

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/BaseLoadBalancer.java
##########
@@ -837,19 +908,8 @@ void regionMoved(int region, int oldServer, int newServer) {
       }
       numRegionsPerServerPerTable[newServer][tableIndex]++;
 
-      //check whether this caused maxRegionsPerTable in the new Server to be updated
-      if (numRegionsPerServerPerTable[newServer][tableIndex] > numMaxRegionsPerTable[tableIndex]) {
-        numMaxRegionsPerTable[tableIndex] = numRegionsPerServerPerTable[newServer][tableIndex];
-      } else if (oldServer >= 0 && (numRegionsPerServerPerTable[oldServer][tableIndex] + 1)
-          == numMaxRegionsPerTable[tableIndex]) {
-        //recompute maxRegionsPerTable since the previous value was coming from the old server
-        numMaxRegionsPerTable[tableIndex] = 0;
-        for (int[] aNumRegionsPerServerPerTable : numRegionsPerServerPerTable) {
-          if (aNumRegionsPerServerPerTable[tableIndex] > numMaxRegionsPerTable[tableIndex]) {
-            numMaxRegionsPerTable[tableIndex] = aNumRegionsPerServerPerTable[tableIndex];
-          }
-        }
-      }
+      // recalculate stdev
+      reComputeRegionStDevPerTable();

Review comment:
       It seems like here we should only be recomputing the stdev for the affected `tableIndex`. I don't want to over-optimize if it makes the code less readable... but in clusters with large numbers of tables, we would be doing unnecessary work.
   
   I'm thinking in the context of [HBASE-22300](https://issues.apache.org/jira/browse/HBASE-22300) where large clusters no longer run through enough iterations of the balancer. In that case, the issue is number of regions, which will always be larger than number of tables... and likely much larger. So it may not be a major issue here, but I'm bringing it up just in case.
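
   For illustration, a per-table variant might look like the following rough standalone sketch (`stDevForTable` is my name, not from the PR, and it returns the raw stdev before the `scale(...)` step). Note the explicit cast to `double`: in the PR hunk, `regionsPerTable[tableIndex] / numServers` truncates if both operands are `int`s.

   ```java
   public class PerTableStDev {
       /** Raw (unscaled) stdev of one table's region counts across servers. */
       public static double stDevForTable(int[][] numRegionsPerServerPerTable,
               int regionsForTable, int numServers, int tableIndex) {
           // Explicit double division; an int/int here would truncate the mean.
           double mean = (double) regionsForTable / numServers;
           double sumSquares = 0;
           for (int[] perServer : numRegionsPerServerPerTable) {
               double deviation = perServer[tableIndex] - mean;
               sumSquares += deviation * deviation;
           }
           return Math.sqrt(sumSquares / numServers);
       }

       public static void main(String[] args) {
           // 3 servers, one table with 6 regions spread 2/2/2 -> stdev 0
           int[][] even = { { 2 }, { 2 }, { 2 } };
           System.out.println(stDevForTable(even, 6, 3, 0));
           // Spread 6/0/0 -> stdev = sqrt(((6-2)^2 + 2*(0-2)^2) / 3) = sqrt(8)
           int[][] skewed = { { 6 }, { 0 }, { 0 } };
           System.out.println(stDevForTable(skewed, 6, 3, 0));
       }
   }
   ```

   `regionMoved` could then update only `regionStDevPerTable[tableIndex]` for the one affected table instead of looping over every table.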

##########
File path: hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestStochasticLoadBalancerLargeCluster.java
##########
@@ -38,6 +38,8 @@ public void testLargeCluster() {
     int numRegionsPerServer = 80; // all servers except one
     int numTables = 100;
     int replication = 1;
-    testWithCluster(numNodes, numRegions, numRegionsPerServer, replication, numTables, true, true);
+
+    // we need to capture the outlier and generate a move

Review comment:
       I don't understand this comment.

##########
File path: hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestStochasticLoadBalancerLargeCluster.java
##########
@@ -38,6 +38,8 @@ public void testLargeCluster() {
     int numRegionsPerServer = 80; // all servers except one
     int numTables = 100;
     int replication = 1;
-    testWithCluster(numNodes, numRegions, numRegionsPerServer, replication, numTables, true, true);
+
+    // we need to capture the outlier and generate a move
+    testWithCluster(numNodes, numRegions, numRegionsPerServer, replication, numTables, false, false);

Review comment:
       This code change suggests that the balancer will no longer fully balance the cluster in one run. Why is that? Are we making the balancer less effective, even in this test of a large cluster with one outlier node? I thought this is exactly the type of case the fix is supposed to improve.
   
       My reading of [HBASE-22300](https://issues.apache.org/jira/browse/HBASE-22300) is that the fix there capped the balancer's work per run, so I'd like to be sure this change doesn't compound that by also needing more runs to converge.

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/BaseLoadBalancer.java
##########
@@ -168,7 +168,9 @@ private void createRegionFinder() {
     int[]   initialRegionIndexToServerIndex;    //regionIndex -> serverIndex (initial cluster state)
     int[]   regionIndexToTableIndex;     //regionIndex -> tableIndex
     int[][] numRegionsPerServerPerTable; //serverIndex -> tableIndex -> # regions
-    int[]   numMaxRegionsPerTable;       //tableIndex -> max number of regions in a single RS
+    int[] regionsPerTable;              // count of regions per table
+    double[] regionStDevPerTable;       //tableIndex -> standard deviation of region distribution

Review comment:
       nit: should we rename to mention that this value is already scaled?
   ```suggestion
       double[] regionStDevPerTableScaled;       //tableIndex -> scaled standard deviation of region distribution
   ```

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/BaseLoadBalancer.java
##########
@@ -383,20 +385,20 @@ protected Cluster(
         }
       }
 
+      regionsPerTable = new int[numTables];
+      for (int i = 0; i < numTables; i++) {
+          regionsPerTable[i] = 0;
+      }
+

Review comment:
       nit: maybe it's nice to be explicit like the code above, but default values are already 0
   ```suggestion
   ```

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/BaseLoadBalancer.java
##########
@@ -513,6 +515,75 @@ private void registerRegion(RegionInfo region, int regionIndex,
       }
     }
 
+    private void reComputeRegionStDevPerTable() {
+      for (int i = 0; i < numTables; i++) {
+        regionStDevPerTable[i] = 0;
+      }
+
+      for (int[] aNumRegionsPerServerPerTable : numRegionsPerServerPerTable) {
+        for (int tableIndex = 0; tableIndex < aNumRegionsPerServerPerTable.length; tableIndex++) {
+          double deviation = aNumRegionsPerServerPerTable[tableIndex]
+            - regionsPerTable[tableIndex] / numServers;
+          regionStDevPerTable[tableIndex] += deviation * deviation;
+        }
+      }
+
+      for (int i = 0; i < numTables; i++) {
+        regionStDevPerTable[i] = scale(getMinStDev(regionsPerTable[i], numServers),
+          getMaxStDev(regionsPerTable[i], numServers),
+          Math.sqrt(regionStDevPerTable[i] / numServers));
+      }
+    }
+
+    /**
+     * Return the max standard deviation of distribution of regions

Review comment:
       ```suggestion
        * Return the max standard deviation of distribution of costs
   ```

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/BaseLoadBalancer.java
##########
@@ -513,6 +515,75 @@ private void registerRegion(RegionInfo region, int regionIndex,
       }
     }
 
+    private void reComputeRegionStDevPerTable() {
+      for (int i = 0; i < numTables; i++) {
+        regionStDevPerTable[i] = 0;
+      }
+
+      for (int[] aNumRegionsPerServerPerTable : numRegionsPerServerPerTable) {
+        for (int tableIndex = 0; tableIndex < aNumRegionsPerServerPerTable.length; tableIndex++) {
+          double deviation = aNumRegionsPerServerPerTable[tableIndex]
+            - regionsPerTable[tableIndex] / numServers;
+          regionStDevPerTable[tableIndex] += deviation * deviation;
+        }
+      }
+
+      for (int i = 0; i < numTables; i++) {
+        regionStDevPerTable[i] = scale(getMinStDev(regionsPerTable[i], numServers),
+          getMaxStDev(regionsPerTable[i], numServers),
+          Math.sqrt(regionStDevPerTable[i] / numServers));
+      }
+    }
+
+    /**
+     * Return the max standard deviation of distribution of regions
+     * Compute max as if all region servers had 0 and one had the sum of all costs.  This must be
+     * a zero sum cost for this to make sense.
+    */
+    public double getMaxStDev(double total, double numServers) {
+      double mean = total / numServers;
+      return Math.sqrt(((total - mean) * (total - mean)
+      + (numServers - 1) * mean * mean)
+      / numServers);
+    }
+
+    /**
+     * Return the min standard deviation of distribution of regions
+    */
+    public double getMinStDev(double total, double numServers) {

Review comment:
       can this be static?
   ```suggestion
       public static double getMinStDev(double total, double numServers) {
   ```

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/BaseLoadBalancer.java
##########
@@ -513,6 +515,75 @@ private void registerRegion(RegionInfo region, int regionIndex,
       }
     }
 
+    private void reComputeRegionStDevPerTable() {
+      for (int i = 0; i < numTables; i++) {
+        regionStDevPerTable[i] = 0;
+      }
+
+      for (int[] aNumRegionsPerServerPerTable : numRegionsPerServerPerTable) {
+        for (int tableIndex = 0; tableIndex < aNumRegionsPerServerPerTable.length; tableIndex++) {
+          double deviation = aNumRegionsPerServerPerTable[tableIndex]
+            - regionsPerTable[tableIndex] / numServers;
+          regionStDevPerTable[tableIndex] += deviation * deviation;
+        }
+      }
+
+      for (int i = 0; i < numTables; i++) {
+        regionStDevPerTable[i] = scale(getMinStDev(regionsPerTable[i], numServers),
+          getMaxStDev(regionsPerTable[i], numServers),
+          Math.sqrt(regionStDevPerTable[i] / numServers));
+      }
+    }
+
+    /**
+     * Return the max standard deviation of distribution of regions
+     * Compute max as if all region servers had 0 and one had the sum of all costs.  This must be
+     * a zero sum cost for this to make sense.
+    */
+    public double getMaxStDev(double total, double numServers) {
+      double mean = total / numServers;
+      return Math.sqrt(((total - mean) * (total - mean)
+      + (numServers - 1) * mean * mean)
+      / numServers);
+    }
+
+    /**
+     * Return the min standard deviation of distribution of regions

Review comment:
       ```suggestion
        * Return the min standard deviation of distribution of costs
   ```

##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/BaseLoadBalancer.java
##########
@@ -513,6 +515,75 @@ private void registerRegion(RegionInfo region, int regionIndex,
       }
     }
 
+    private void reComputeRegionStDevPerTable() {
+      for (int i = 0; i < numTables; i++) {
+        regionStDevPerTable[i] = 0;
+      }
+
+      for (int[] aNumRegionsPerServerPerTable : numRegionsPerServerPerTable) {
+        for (int tableIndex = 0; tableIndex < aNumRegionsPerServerPerTable.length; tableIndex++) {
+          double deviation = aNumRegionsPerServerPerTable[tableIndex]
+            - regionsPerTable[tableIndex] / numServers;
+          regionStDevPerTable[tableIndex] += deviation * deviation;
+        }
+      }
+
+      for (int i = 0; i < numTables; i++) {
+        regionStDevPerTable[i] = scale(getMinStDev(regionsPerTable[i], numServers),
+          getMaxStDev(regionsPerTable[i], numServers),
+          Math.sqrt(regionStDevPerTable[i] / numServers));
+      }
+    }
+
+    /**
+     * Return the max standard deviation of distribution of regions
+     * Compute max as if all region servers had 0 and one had the sum of all costs.  This must be
+     * a zero sum cost for this to make sense.
+    */
+    public double getMaxStDev(double total, double numServers) {
+      double mean = total / numServers;
+      return Math.sqrt(((total - mean) * (total - mean)
+      + (numServers - 1) * mean * mean)
+      / numServers);
+    }
+
+    /**
+     * Return the min standard deviation of distribution of regions
+    */
+    public double getMinStDev(double total, double numServers) {
+      double mean = total / numServers;
+    // It's possible that there aren't enough regions to go around
+      double min;
+      if (numServers > total) {
+        min = Math.sqrt(((numServers - total) * mean * mean
+          + (1 - mean) * (1 - mean) * total) / numServers);
+      } else {
+        // Some will have 1 more than everything else.
+        int numHigh = (int) (total - (Math.floor(mean) * numServers));
+        int numLow = (int) (numServers - numHigh);
+        min = Math.sqrt(((numHigh * (Math.ceil(mean) - mean) * (Math.ceil(mean) - mean))
+          + (numLow * (mean - Math.floor(mean)) * (mean - Math.floor(mean)))) / numServers);
+      }
+      return min;
+    }
+
+    /**
+     * Scale the value between 0 and 1.
+     *
+     * @param min   Min value
+     * @param max   The Max value
+     * @param value The value to be scaled.
+     * @return The scaled value.
+    */
+    public double scale(double min, double max, double value) {

Review comment:
       can this be static?
   ```suggestion
       public static double scale(double min, double max, double value) {
   ```
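
   As statics these become easy to sanity-check in isolation. A standalone sketch (the `getMaxStDev`/`getMinStDev` bodies are copied from the PR hunk; the `scale` body is my assumption of a clamped linear min-max normalization, since the hunk only shows its javadoc):

   ```java
   public class StDevBounds {
       /** From the PR: max stdev, as if one server held everything. */
       public static double getMaxStDev(double total, double numServers) {
           double mean = total / numServers;
           return Math.sqrt(((total - mean) * (total - mean)
               + (numServers - 1) * mean * mean) / numServers);
       }

       /** From the PR: min stdev, the most even integer distribution. */
       public static double getMinStDev(double total, double numServers) {
           double mean = total / numServers;
           double min;
           if (numServers > total) {
               // Not enough regions: `total` servers get 1, the rest get 0.
               min = Math.sqrt(((numServers - total) * mean * mean
                   + (1 - mean) * (1 - mean) * total) / numServers);
           } else {
               // Some servers get ceil(mean), the rest floor(mean).
               int numHigh = (int) (total - (Math.floor(mean) * numServers));
               int numLow = (int) (numServers - numHigh);
               min = Math.sqrt(((numHigh * (Math.ceil(mean) - mean) * (Math.ceil(mean) - mean))
                   + (numLow * (mean - Math.floor(mean)) * (mean - Math.floor(mean)))) / numServers);
           }
           return min;
       }

       /** Assumed: clamped linear scaling of value into [0, 1]. */
       public static double scale(double min, double max, double value) {
           if (max <= min) {
               return 0;
           }
           return Math.max(0, Math.min(1, (value - min) / (max - min)));
       }

       public static void main(String[] args) {
           System.out.println(getMaxStDev(10, 5)); // 4.0: one server with 10, four with 0
           System.out.println(getMinStDev(10, 5)); // 0.0: 2 regions on every server
           System.out.println(getMinStDev(3, 5));  // sqrt(0.24): three servers with 1, two with 0
           System.out.println(scale(0, 4, 2));     // 0.5
       }
   }
   ```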

##########
File path: hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestBaseLoadBalancer.java
##########
@@ -391,7 +391,7 @@ public void testRegionAvailabilityWithRegionMoves() throws Exception {
     // now move region1 from servers[0] to servers[2]
     cluster.doAction(new MoveRegionAction(0, 0, 2));
     // check that the numMaxRegionsPerTable for "table" has increased to 2

Review comment:
       comment no longer applies




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org