You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@helix.apache.org by "desaikomal (via GitHub)" <gi...@apache.org> on 2023/06/26 13:55:03 UTC

[GitHub] [helix] desaikomal opened a new pull request, #2546: WAGED - Part 2 - Fix the core calculation for n - n+1 issue.

desaikomal opened a new pull request, #2546:
URL: https://github.com/apache/helix/pull/2546

   
   
   ### Issues
   
   - [ ] My PR addresses the following Helix issues and references them in the PR description:
   
   (#200 - Link your issue number here: You can write "Fixes #XXX". Please use the proper keyword so that the issue gets closed automatically. See https://docs.github.com/en/github/managing-your-work-on-github/linking-a-pull-request-to-an-issue
   Any of the following keywords can be used: close, closes, closed, fix, fixes, fixed, resolve, resolves, resolved)
   
   ### Description
   
   - [ ] Here are some details about my PR, including screenshots of any UI changes:
   This change is focused on changes to core logic and has not done any additional code cleanup, de-dup etc.
   
   There are 3 more changes planned for the change once core logic is checked-in. Change-1:  code dedup across computeBestPossibleStateForPartition Change-2: moving the message procesing state to 'DataProcess' stage of the pipeline. Change-3: This is may be: to add metrics to indicate that we diverted from 'preferred list'
   
   The core logic is as follows:
   Pre-Compute stage
   ------------------
   - in WAGED rebalancer: computeNewIdealState, we create 'WagedInstanceCapacity' which tracks instance capacity and WeightProvider for partitions. These changes were introduced in Part-1.
   - We process the pending messages for the resources where if we are moving out partition but not yet moved, we reduce the capacity.
   
   Once we compute the ideal state, we go through preference list sorting. There are no changes to the pure computational change.
   
   Post-compute stage
   ------------------
   We look at the current-state as well as ideal state, we combine the list and do the sorting of the nodes. We then pick up only the required numReplica nodes.
   
   This is where we look at the nodes which are "NEW" (ie. not in currentState) and check if it has the required capacity to hold the new partition. If not, we remove it from combined list.
   
   
   ### Tests
   
   - [ ] The following tests are written for this issue:
   
   (List the names of added unit/integration tests)
   
   - The following is the result of the "mvn test" command on the appropriate module:
   Testing done:
   Have run through the mvn test (10 times)
   In another change, I have added a new test case and verified that it works. [ERROR] Failures:
   [ERROR]   TestNoThrottleDisabledPartitions.testDisablingTopStateReplicaByDisablingInstance:98 expected:<false> but was:<true>
   [ERROR]   TestClusterMaintenanceMode.testMaintenanceHistory:412 expected:<EXIT> but was:<ENTER>
   [ERROR]   TestParticipantFreeze.testUnfreezeParticipant:228 expected:<true> but was:<false>
   [ERROR]   TestRecurringJobQueue.testDeletingRecurrentQueueWithHistory:298 expected:<true> but was:<false>
   [ERROR] Tests run: 1335, Failures: 4, Errors: 0, Skipped: 0
   
   (If CI test fails due to known issue, please specify the issue and test PR locally. Then copy & paste the result of "mvn test" to here.)
   
   ### Changes that Break Backward Compatibility (Optional)
   
   - My PR contains changes that break backward compatibility or previous assumptions for certain methods or API. They include:
   
   (Consider including all behavior changes for public methods or API. Also include these changes in merge description so that other developers are aware of these changes. This allows them to make relevant code changes in feature branches accounting for the new method/API behavior.)
   
   ### Documentation (Optional)
   
   - In case of new functionality, my PR adds documentation in the following wiki page:
   
   (Link the GitHub wiki you added)
   
   ### Commits
   
   - My commits all reference appropriate Apache Helix GitHub issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
     1. Subject is separated from body by a blank line
     1. Subject is limited to 50 characters (not including Jira issue reference)
     1. Subject does not end with a period
     1. Subject uses the imperative mood ("add", not "adding")
     1. Body wraps at 72 characters
     1. Body explains "what" and "why", not "how"
   
   ### Code Quality
   
   - My diff has been formatted using helix-style.xml 
   (helix-style-intellij.xml if IntelliJ IDE is used)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] desaikomal commented on a diff in pull request #2546: WAGED - Part 2 - Fix the core calculation for n - n+1 issue.

Posted by "desaikomal (via GitHub)" <gi...@apache.org>.
desaikomal commented on code in PR #2546:
URL: https://github.com/apache/helix/pull/2546#discussion_r1248924960


##########
helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/WagedInstanceCapacity.java:
##########
@@ -42,79 +49,102 @@ public class WagedInstanceCapacity implements InstanceCapacityDataProvider {
 
   // Available Capacity per Instance
   private final Map<String, Map<String, Integer>> _instanceCapacityMap;
-  private final ResourceControllerDataProvider _cache;
+  private final Map<String, Map<String, Set<String>>> _allocatedPartitionsMap;
 
   public WagedInstanceCapacity(ResourceControllerDataProvider clusterData) {
-    _cache = clusterData;
     _instanceCapacityMap = new HashMap<>();
-
-    ClusterConfig clusterConfig = _cache.getClusterConfig();
-    for (InstanceConfig instanceConfig : _cache.getInstanceConfigMap().values()) {
-      Map<String, Integer> instanceCapacity =
-        WagedValidationUtil.validateAndGetInstanceCapacity(clusterConfig, instanceConfig);
+    _allocatedPartitionsMap = new HashMap<>();
+    ClusterConfig clusterConfig = clusterData.getClusterConfig();
+    for (InstanceConfig instanceConfig : clusterData.getInstanceConfigMap().values()) {
+      Map<String, Integer> instanceCapacity = WagedValidationUtil.validateAndGetInstanceCapacity(clusterConfig, instanceConfig);
       _instanceCapacityMap.put(instanceConfig.getInstanceName(), instanceCapacity);
+
+      _allocatedPartitionsMap.put(instanceConfig.getInstanceName(), new HashMap<>());
     }
   }
 
-  /**
-   * Create Default Capacity Map.
-   * This is a utility method to create a default capacity map matching instance capacity map for participants.
-   * This is required as non-WAGED partitions will be placed on same instance and we don't know their actual capacity.
-   * This will generate default values of 0 for all the capacity keys.
-   */
-  private Map<String, Integer> createDefaultParticipantWeight() {
-    // copy the value of first Instance capacity.
-    Map<String, Integer> partCapacity = new HashMap<>(_instanceCapacityMap.values().iterator().next());
+  // Helper methods.
+  private boolean isPartitionInAllocatedMap(String instance, String resource, String partition) {
+    return _allocatedPartitionsMap.get(instance).containsKey(resource)
+        && _allocatedPartitionsMap.get(instance).get(resource).contains(partition);
+  }
 
-    // Set the value of all capacity to -1.
-    for (String key : partCapacity.keySet()) {
-      partCapacity.put(key, -1);
-    }
-    return partCapacity;
+  public void process(ResourceControllerDataProvider cache, CurrentStateOutput currentStateOutput,
+      Map<String, Resource> resourceMap, WagedResourceWeightsProvider weightProvider) {
+    processPendingMessages(cache, currentStateOutput, resourceMap, weightProvider);
+    processCurrentState(cache, currentStateOutput, resourceMap, weightProvider);

Review Comment:
   sure, will do so.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] desaikomal commented on a diff in pull request #2546: WAGED - Part 2 - Fix the core calculation for n - n+1 issue.

Posted by "desaikomal (via GitHub)" <gi...@apache.org>.
desaikomal commented on code in PR #2546:
URL: https://github.com/apache/helix/pull/2546#discussion_r1248370152


##########
helix-core/src/main/java/org/apache/helix/controller/rebalancer/DelayedAutoRebalancer.java:
##########
@@ -274,59 +276,208 @@ public ResourceAssignment computeBestPossiblePartitionState(ResourceControllerDa
     return partitionMapping;
   }
 
+
   @Override
   protected Map<String, String> computeBestPossibleStateForPartition(Set<String> liveInstances,

Review Comment:
   thanks for your suggetions. i have re-organized the code and it looks much much better. thanks



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] desaikomal commented on a diff in pull request #2546: WAGED - Part 2 - Fix the core calculation for n - n+1 issue.

Posted by "desaikomal (via GitHub)" <gi...@apache.org>.
desaikomal commented on code in PR #2546:
URL: https://github.com/apache/helix/pull/2546#discussion_r1248137255


##########
helix-core/src/main/java/org/apache/helix/controller/rebalancer/DelayedAutoRebalancer.java:
##########
@@ -274,59 +276,208 @@ public ResourceAssignment computeBestPossiblePartitionState(ResourceControllerDa
     return partitionMapping;
   }
 
+
   @Override
   protected Map<String, String> computeBestPossibleStateForPartition(Set<String> liveInstances,

Review Comment:
   I would love it if i can change this protected API directly.  it is the right fix as well. Please let me know as this will simplify code a lot.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] desaikomal commented on a diff in pull request #2546: WAGED - Part 2 - Fix the core calculation for n - n+1 issue.

Posted by "desaikomal (via GitHub)" <gi...@apache.org>.
desaikomal commented on code in PR #2546:
URL: https://github.com/apache/helix/pull/2546#discussion_r1244318312


##########
helix-core/src/main/java/org/apache/helix/controller/rebalancer/DelayedAutoRebalancer.java:
##########
@@ -408,4 +420,129 @@ private int getNumExtraReplicas(ClusterConfig clusterConfig) {
     }
     return numExtraReplicas;
   }
+
+
+  private Map<String, String> computeBestPossibleStateForPartition(ResourceControllerDataProvider cache,

Review Comment:
   As I wrote in my change description, I will make the dedup effort as next change. Trying to make code easy to read and reason through. There are 3 more changes planned (but this is the core-logic). Other 3 changes are: 1. dedup code. 2. move the process() logic to DataProcessing stage  and (3) add the metrics. 
   
   So can you please review for "CORRECTNESS"
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] junkaixue commented on a diff in pull request #2546: WAGED - Part 2 - Fix the core calculation for n - n+1 issue.

Posted by "junkaixue (via GitHub)" <gi...@apache.org>.
junkaixue commented on code in PR #2546:
URL: https://github.com/apache/helix/pull/2546#discussion_r1248673105


##########
helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/WagedInstanceCapacity.java:
##########
@@ -42,79 +49,102 @@ public class WagedInstanceCapacity implements InstanceCapacityDataProvider {
 
   // Available Capacity per Instance
   private final Map<String, Map<String, Integer>> _instanceCapacityMap;
-  private final ResourceControllerDataProvider _cache;
+  private final Map<String, Map<String, Set<String>>> _allocatedPartitionsMap;
 
   public WagedInstanceCapacity(ResourceControllerDataProvider clusterData) {
-    _cache = clusterData;
     _instanceCapacityMap = new HashMap<>();
-
-    ClusterConfig clusterConfig = _cache.getClusterConfig();
-    for (InstanceConfig instanceConfig : _cache.getInstanceConfigMap().values()) {
-      Map<String, Integer> instanceCapacity =
-        WagedValidationUtil.validateAndGetInstanceCapacity(clusterConfig, instanceConfig);
+    _allocatedPartitionsMap = new HashMap<>();
+    ClusterConfig clusterConfig = clusterData.getClusterConfig();
+    for (InstanceConfig instanceConfig : clusterData.getInstanceConfigMap().values()) {
+      Map<String, Integer> instanceCapacity = WagedValidationUtil.validateAndGetInstanceCapacity(clusterConfig, instanceConfig);
       _instanceCapacityMap.put(instanceConfig.getInstanceName(), instanceCapacity);
+
+      _allocatedPartitionsMap.put(instanceConfig.getInstanceName(), new HashMap<>());
     }
   }
 
-  /**
-   * Create Default Capacity Map.
-   * This is a utility method to create a default capacity map matching instance capacity map for participants.
-   * This is required as non-WAGED partitions will be placed on same instance and we don't know their actual capacity.
-   * This will generate default values of 0 for all the capacity keys.
-   */
-  private Map<String, Integer> createDefaultParticipantWeight() {
-    // copy the value of first Instance capacity.
-    Map<String, Integer> partCapacity = new HashMap<>(_instanceCapacityMap.values().iterator().next());
+  // Helper methods.
+  private boolean isPartitionInAllocatedMap(String instance, String resource, String partition) {
+    return _allocatedPartitionsMap.get(instance).containsKey(resource)
+        && _allocatedPartitionsMap.get(instance).get(resource).contains(partition);
+  }
 
-    // Set the value of all capacity to -1.
-    for (String key : partCapacity.keySet()) {
-      partCapacity.put(key, -1);
-    }
-    return partCapacity;
+  public void process(ResourceControllerDataProvider cache, CurrentStateOutput currentStateOutput,
+      Map<String, Resource> resourceMap, WagedResourceWeightsProvider weightProvider) {
+    processPendingMessages(cache, currentStateOutput, resourceMap, weightProvider);
+    processCurrentState(cache, currentStateOutput, resourceMap, weightProvider);
   }
 
   /**
    * Process the pending messages based on the Current states
    * @param currentState - Current state of the resources.
    */
-  public void processPendingMessages(CurrentStateOutput currentState) {
-    Map<String, Map<Partition, Map<String, Message>>> pendingMsgs = currentState.getPendingMessages();
-
-    for (String resource : pendingMsgs.keySet()) {
-      Map<Partition, Map<String, Message>> partitionMsgs = pendingMsgs.get(resource);
+  public void processPendingMessages(ResourceControllerDataProvider cache,
+      CurrentStateOutput currentState, Map<String, Resource> resourceMap,
+      WagedResourceWeightsProvider weightProvider) {
+
+    for (Map.Entry<String, Resource> resourceEntry : resourceMap.entrySet()) {
+      String resName = resourceEntry.getKey();
+      Resource resource = resourceEntry.getValue();
+      // list of partitions in the resource
+      Collection<Partition> partitions = resource.getPartitions();
+      // State model definition for the resource
+      StateModelDefinition stateModelDef = cache.getStateModelDef(resource.getStateModelDefRef());
+      if (stateModelDef == null) {
+        LOG.warn("State Model Definition for resource: " + resName + " is null");
+        continue;
+      }
+      Map<String, Integer> statePriorityMap = stateModelDef.getStatePriorityMap();
 
-      for (Partition partition : partitionMsgs.keySet()) {
+      for (Partition partition : partitions) {
         String partitionName = partition.getPartitionName();
-
         // Get Partition Weight
-        Map<String, Integer> partCapacity = getPartitionCapacity(resource, partitionName);
-
-        // TODO - check
-        Map<String, Message> msgs = partitionMsgs.get(partition);
-        // TODO - Check
-        for (String instance : msgs.keySet()) {
-           reduceAvailableInstanceCapacity(instance, partCapacity);
+        Map<String, Integer> partCapacity = weightProvider.getPartitionWeights(resName, partitionName);
+
+        // Get the pending messages for the partition
+        Map<String, Message> pendingMessages = currentState.getPendingMessageMap(resName, partition);
+        if (pendingMessages != null && !pendingMessages.isEmpty()) {
+          for (Map.Entry<String, Message> entry :  pendingMessages.entrySet()) {
+            String instance = entry.getKey();
+            if (isPartitionInAllocatedMap(instance, resName, partitionName)) {
+              continue;
+            }
+            Message msg = entry.getValue();
+            if (statePriorityMap.get(msg.getFromState()) < statePriorityMap.get(msg.getToState())
+                && msg.getToState().equals(stateModelDef.getInitialState())

Review Comment:
   I cannot remember but not sure whether this can cause double counting. Because when there is a OFFLINE -> STANDBY message, then the current state is already OFFLINE. So the capacity will be reduced by pending message and current states both.
   
   Because I remember the order of creating state transition with state model with OFFLINE state. But I cannot remember when we write it back to ZK...



##########
helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/WagedInstanceCapacity.java:
##########
@@ -42,79 +49,102 @@ public class WagedInstanceCapacity implements InstanceCapacityDataProvider {
 
   // Available Capacity per Instance
   private final Map<String, Map<String, Integer>> _instanceCapacityMap;
-  private final ResourceControllerDataProvider _cache;
+  private final Map<String, Map<String, Set<String>>> _allocatedPartitionsMap;
 
   public WagedInstanceCapacity(ResourceControllerDataProvider clusterData) {
-    _cache = clusterData;
     _instanceCapacityMap = new HashMap<>();
-
-    ClusterConfig clusterConfig = _cache.getClusterConfig();
-    for (InstanceConfig instanceConfig : _cache.getInstanceConfigMap().values()) {
-      Map<String, Integer> instanceCapacity =
-        WagedValidationUtil.validateAndGetInstanceCapacity(clusterConfig, instanceConfig);
+    _allocatedPartitionsMap = new HashMap<>();
+    ClusterConfig clusterConfig = clusterData.getClusterConfig();
+    for (InstanceConfig instanceConfig : clusterData.getInstanceConfigMap().values()) {
+      Map<String, Integer> instanceCapacity = WagedValidationUtil.validateAndGetInstanceCapacity(clusterConfig, instanceConfig);
       _instanceCapacityMap.put(instanceConfig.getInstanceName(), instanceCapacity);
+
+      _allocatedPartitionsMap.put(instanceConfig.getInstanceName(), new HashMap<>());
     }
   }
 
-  /**
-   * Create Default Capacity Map.
-   * This is a utility method to create a default capacity map matching instance capacity map for participants.
-   * This is required as non-WAGED partitions will be placed on same instance and we don't know their actual capacity.
-   * This will generate default values of 0 for all the capacity keys.
-   */
-  private Map<String, Integer> createDefaultParticipantWeight() {
-    // copy the value of first Instance capacity.
-    Map<String, Integer> partCapacity = new HashMap<>(_instanceCapacityMap.values().iterator().next());
+  // Helper methods.
+  private boolean isPartitionInAllocatedMap(String instance, String resource, String partition) {
+    return _allocatedPartitionsMap.get(instance).containsKey(resource)
+        && _allocatedPartitionsMap.get(instance).get(resource).contains(partition);
+  }
 
-    // Set the value of all capacity to -1.
-    for (String key : partCapacity.keySet()) {
-      partCapacity.put(key, -1);
-    }
-    return partCapacity;
+  public void process(ResourceControllerDataProvider cache, CurrentStateOutput currentStateOutput,
+      Map<String, Resource> resourceMap, WagedResourceWeightsProvider weightProvider) {
+    processPendingMessages(cache, currentStateOutput, resourceMap, weightProvider);
+    processCurrentState(cache, currentStateOutput, resourceMap, weightProvider);

Review Comment:
   Would suggest change the order. Because current states represent the partitions already taken capacity. Logically pending message is ongoing taking capacity.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] junkaixue commented on a diff in pull request #2546: WAGED - Part 2 - Fix the core calculation for n - n+1 issue.

Posted by "junkaixue (via GitHub)" <gi...@apache.org>.
junkaixue commented on code in PR #2546:
URL: https://github.com/apache/helix/pull/2546#discussion_r1242678664


##########
helix-core/src/main/java/org/apache/helix/controller/dataproviders/InstanceCapacityDataProvider.java:
##########
@@ -42,18 +42,10 @@ public interface InstanceCapacityDataProvider {
   /**
    * Check if partition can be placed on the instance.
    *
-   * @param instanceName - instance name 
+   * @param instanceName - instance name
    * @param partitionCapacity - Partition capacity expresed in capacity map.
    * @return boolean - True if the partition can be placed, False otherwise
    */
   public boolean isInstanceCapacityAvailable(String instanceName, Map<String, Integer> partitionCapacity);
 
-  /**
-   * Reduce the available capacity by specified Partition Capacity Map.
-   *
-   * @param instanceName - instance name 
-   * @param partitionCapacity - Partition capacity expresed in capacity map.
-   * @returns boolean - True if successfully updated partition capacity, false otherwise.
-   */
-  public boolean reduceAvailableInstanceCapacity(String instanceName, Map<String, Integer> partitionCapacity);

Review Comment:
   Question: this is the one just added before, right? If it already exists in last release, we cannot delete it.



##########
helix-core/src/main/java/org/apache/helix/controller/rebalancer/DelayedAutoRebalancer.java:
##########
@@ -408,4 +420,129 @@ private int getNumExtraReplicas(ClusterConfig clusterConfig) {
     }
     return numExtraReplicas;
   }
+
+
+  private Map<String, String> computeBestPossibleStateForPartition(ResourceControllerDataProvider cache,

Review Comment:
   This function has a lot of redundant code with existing logic. I would suggest to simplify it.
   
   



##########
helix-core/src/main/java/org/apache/helix/controller/rebalancer/DelayedAutoRebalancer.java:
##########
@@ -408,4 +420,129 @@ private int getNumExtraReplicas(ClusterConfig clusterConfig) {
     }
     return numExtraReplicas;
   }
+
+
+  private Map<String, String> computeBestPossibleStateForPartition(ResourceControllerDataProvider cache,
+      StateModelDefinition stateModelDef, List<String> preferenceList,
+      CurrentStateOutput currentStateOutput, Set<String> disabledInstancesForPartition,
+      IdealState idealState, ClusterConfig clusterConfig, Partition partition,
+      MonitoredAbnormalResolver monitoredResolver) {
+
+    Optional<Map<String, String>> optionalOverwrittenStates =
+        computeStatesOverwriteForPartition(stateModelDef, preferenceList, currentStateOutput,
+            idealState, partition, monitoredResolver);
+    if (optionalOverwrittenStates.isPresent()) {
+      return optionalOverwrittenStates.get();
+    }
+
+    Map<String, String> currentStateMap = new HashMap<>(
+        currentStateOutput.getCurrentStateMap(idealState.getResourceName(), partition));
+    // Instances not in preference list but still have active replica, retain to avoid zero replica during movement
+    List<String> currentInstances = new ArrayList<>(currentStateMap.keySet());
+    Collections.sort(currentInstances);
+    Map<String, String> pendingStates =
+        new HashMap<>(currentStateOutput.getPendingStateMap(idealState.getResourceName(), partition));
+    for (String instance : pendingStates.keySet()) {
+      if (!currentStateMap.containsKey(instance)) {
+        currentStateMap.put(instance, stateModelDef.getInitialState());
+        currentInstances.add(instance);
+      }
+    }
+
+    Set<String> instancesToDrop = new HashSet<>();
+    Iterator<String> it = currentInstances.iterator();
+    while (it.hasNext()) {
+      String instance = it.next();
+      String state = currentStateMap.get(instance);
+      if (state == null) {
+        it.remove();
+        instancesToDrop.add(instance); // These instances should be set to DROPPED after we get bestPossibleStateMap;
+      }
+    }
+
+    // Sort the instancesToMove by their current partition state.
+    // Reason: because the states are assigned to instances in the order appeared in preferenceList, if we have
+    // [node1:Slave, node2:Master], we want to keep it that way, instead of assigning Master to node1.
+
+    if (preferenceList == null) {
+      preferenceList = Collections.emptyList();
+    }
+
+    boolean isPreferenceListEmpty = preferenceList.isEmpty();
+
+    int numExtraReplicas = getNumExtraReplicas(clusterConfig);
+
+    // TODO : Keep the behavior consistent with existing state count, change back to read from idealstate
+    // replicas
+    int numReplicas = preferenceList.size();
+    List<String> instanceToAdd = new ArrayList<>(preferenceList);
+    instanceToAdd.removeAll(currentInstances);
+
+    List<String> combinedPreferenceList = new ArrayList<>();
+
+    if (currentInstances.size() <= numReplicas
+        && numReplicas + numExtraReplicas - currentInstances.size() > 0) {
+      int subListSize = numReplicas + numExtraReplicas - currentInstances.size();
+      combinedPreferenceList.addAll(instanceToAdd
+          .subList(0, Math.min(subListSize, instanceToAdd.size())));
+    }
+
+    // Make all initial state instance not in preference list to be dropped.
+    Map<String, String> currentMapWithPreferenceList = new HashMap<>(currentStateMap);
+    currentMapWithPreferenceList.keySet().retainAll(preferenceList);
+
+    combinedPreferenceList.addAll(currentInstances);
+    combinedPreferenceList.sort(new PreferenceListNodeComparator(currentStateMap, stateModelDef, preferenceList));
+
+    // if preference list is not empty, and we do have new intanceToAdd, we should check if it has capacity to hold the partition.
+    // if (!isPreferenceListEmpty && combinedPreferenceList.size() > numReplicas && instanceToAdd.size() > 0) {

Review Comment:
   Remove this if it is not required.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] desaikomal commented on a diff in pull request #2546: WAGED - Part 2 - Fix the core calculation for n - n+1 issue.

Posted by "desaikomal (via GitHub)" <gi...@apache.org>.
desaikomal commented on code in PR #2546:
URL: https://github.com/apache/helix/pull/2546#discussion_r1244316338


##########
helix-core/src/main/java/org/apache/helix/controller/rebalancer/DelayedAutoRebalancer.java:
##########
@@ -408,4 +420,129 @@ private int getNumExtraReplicas(ClusterConfig clusterConfig) {
     }
     return numExtraReplicas;
   }
+
+
+  private Map<String, String> computeBestPossibleStateForPartition(ResourceControllerDataProvider cache,
+      StateModelDefinition stateModelDef, List<String> preferenceList,
+      CurrentStateOutput currentStateOutput, Set<String> disabledInstancesForPartition,
+      IdealState idealState, ClusterConfig clusterConfig, Partition partition,
+      MonitoredAbnormalResolver monitoredResolver) {
+
+    Optional<Map<String, String>> optionalOverwrittenStates =
+        computeStatesOverwriteForPartition(stateModelDef, preferenceList, currentStateOutput,
+            idealState, partition, monitoredResolver);
+    if (optionalOverwrittenStates.isPresent()) {
+      return optionalOverwrittenStates.get();
+    }
+
+    Map<String, String> currentStateMap = new HashMap<>(
+        currentStateOutput.getCurrentStateMap(idealState.getResourceName(), partition));
+    // Instances not in preference list but still have active replica, retain to avoid zero replica during movement
+    List<String> currentInstances = new ArrayList<>(currentStateMap.keySet());
+    Collections.sort(currentInstances);
+    Map<String, String> pendingStates =
+        new HashMap<>(currentStateOutput.getPendingStateMap(idealState.getResourceName(), partition));
+    for (String instance : pendingStates.keySet()) {
+      if (!currentStateMap.containsKey(instance)) {
+        currentStateMap.put(instance, stateModelDef.getInitialState());
+        currentInstances.add(instance);
+      }
+    }
+
+    Set<String> instancesToDrop = new HashSet<>();
+    Iterator<String> it = currentInstances.iterator();
+    while (it.hasNext()) {
+      String instance = it.next();
+      String state = currentStateMap.get(instance);
+      if (state == null) {
+        it.remove();
+        instancesToDrop.add(instance); // These instances should be set to DROPPED after we get bestPossibleStateMap;
+      }
+    }
+
+    // Sort the instancesToMove by their current partition state.
+    // Reason: because the states are assigned to instances in the order appeared in preferenceList, if we have
+    // [node1:Slave, node2:Master], we want to keep it that way, instead of assigning Master to node1.
+
+    if (preferenceList == null) {
+      preferenceList = Collections.emptyList();
+    }
+
+    boolean isPreferenceListEmpty = preferenceList.isEmpty();
+
+    int numExtraReplicas = getNumExtraReplicas(clusterConfig);
+
+    // TODO : Keep the behavior consistent with existing state count, change back to read from idealstate
+    // replicas
+    int numReplicas = preferenceList.size();
+    List<String> instanceToAdd = new ArrayList<>(preferenceList);
+    instanceToAdd.removeAll(currentInstances);
+
+    List<String> combinedPreferenceList = new ArrayList<>();
+
+    if (currentInstances.size() <= numReplicas
+        && numReplicas + numExtraReplicas - currentInstances.size() > 0) {
+      int subListSize = numReplicas + numExtraReplicas - currentInstances.size();
+      combinedPreferenceList.addAll(instanceToAdd
+          .subList(0, Math.min(subListSize, instanceToAdd.size())));
+    }
+
+    // Make all initial state instance not in preference list to be dropped.
+    Map<String, String> currentMapWithPreferenceList = new HashMap<>(currentStateMap);
+    currentMapWithPreferenceList.keySet().retainAll(preferenceList);
+
+    combinedPreferenceList.addAll(currentInstances);
+    combinedPreferenceList.sort(new PreferenceListNodeComparator(currentStateMap, stateModelDef, preferenceList));
+
+    // if preference list is not empty, and we do have new intanceToAdd, we should check if it has capacity to hold the partition.
+    // if (!isPreferenceListEmpty && combinedPreferenceList.size() > numReplicas && instanceToAdd.size() > 0) {

Review Comment:
   Sure,



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] junkaixue commented on a diff in pull request #2546: WAGED - Part 2 - Fix the core calculation for n - n+1 issue.

Posted by "junkaixue (via GitHub)" <gi...@apache.org>.
junkaixue commented on code in PR #2546:
URL: https://github.com/apache/helix/pull/2546#discussion_r1244433767


##########
helix-core/src/main/java/org/apache/helix/controller/dataproviders/ResourceControllerDataProvider.java:
##########
@@ -475,4 +481,24 @@ final void refreshStablePartitionList(Map<String, IdealState> idealStateMap) {
       }
     }
   }
+
+  public void setWagedCapacityProviders(WagedInstanceCapacity capacityProvider, WagedResourceWeightsProvider resourceWeightProvider) {

Review Comment:
   NIT: java doc.



##########
helix-core/src/main/java/org/apache/helix/controller/dataproviders/ResourceControllerDataProvider.java:
##########
@@ -475,4 +481,24 @@ final void refreshStablePartitionList(Map<String, IdealState> idealStateMap) {
       }
     }
   }
+
+  public void setWagedCapacityProviders(WagedInstanceCapacity capacityProvider, WagedResourceWeightsProvider resourceWeightProvider) {
+    // WAGED specific capacity / weight provider
+    _wagedInstanceCapacity = capacityProvider;
+    _wagedPartitionWeightProvider = resourceWeightProvider;
+  }
+
+  public boolean checkAndReduceCapacity(String instance, String resourceName, String partition) {

Review Comment:
   NIT: java doc.



##########
helix-core/src/main/java/org/apache/helix/controller/rebalancer/DelayedAutoRebalancer.java:
##########
@@ -274,59 +276,208 @@ public ResourceAssignment computeBestPossiblePartitionState(ResourceControllerDa
     return partitionMapping;
   }
 
+
   @Override
   protected Map<String, String> computeBestPossibleStateForPartition(Set<String> liveInstances,

Review Comment:
   Since this is not public API, can we directly change liveInstances to cache? And refactor the abstract rebalancer part?
   
   If not, I would suggest to add cache as the last argument to form a new function. Then we can have old function call pass null value to cache to skip the WAGED check. That can reduce more redundant code.



##########
helix-core/src/main/java/org/apache/helix/controller/rebalancer/AbstractRebalancer.java:
##########
@@ -106,15 +106,26 @@ public ResourceAssignment computeBestPossiblePartitionState(
       List<String> preferenceList = getPreferenceList(partition, idealState,
           Collections.unmodifiableSet(cache.getLiveInstances().keySet()));
       Map<String, String> bestStateForPartition =
-          computeBestPossibleStateForPartition(cache.getLiveInstances().keySet(), stateModelDef,
-              preferenceList, currentStateOutput, disabledInstancesForPartition, idealState,
-              cache.getClusterConfig(), partition,
-              cache.getAbnormalStateResolver(stateModelDefName));
+          computeBestPossibleStateForPartition(cache, stateModelDef, preferenceList,
+              currentStateOutput, disabledInstancesForPartition, idealState,
+              partition);
       partitionMapping.addReplicaMap(partition, bestStateForPartition);
     }
     return partitionMapping;
   }
 
+  protected Map<String, String> computeBestPossibleStateForPartition(T cache,
+      StateModelDefinition stateModelDef, List<String> preferenceList,
+      CurrentStateOutput currentStateOutput, Set<String> disabledInstancesForPartition,
+      IdealState idealState, Partition partition) {
+
+    String stateModelDefName = idealState.getStateModelDefRef();

Review Comment:
   NIT: this variable just used one, can you make it inline call instead of defining a variable.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] desaikomal commented on a diff in pull request #2546: WAGED - Part 2 - Fix the core calculation for n - n+1 issue.

Posted by "desaikomal (via GitHub)" <gi...@apache.org>.
desaikomal commented on code in PR #2546:
URL: https://github.com/apache/helix/pull/2546#discussion_r1248924909


##########
helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/WagedInstanceCapacity.java:
##########
@@ -42,79 +49,102 @@ public class WagedInstanceCapacity implements InstanceCapacityDataProvider {
 
   // Available Capacity per Instance
   private final Map<String, Map<String, Integer>> _instanceCapacityMap;
-  private final ResourceControllerDataProvider _cache;
+  private final Map<String, Map<String, Set<String>>> _allocatedPartitionsMap;
 
   public WagedInstanceCapacity(ResourceControllerDataProvider clusterData) {
-    _cache = clusterData;
     _instanceCapacityMap = new HashMap<>();
-
-    ClusterConfig clusterConfig = _cache.getClusterConfig();
-    for (InstanceConfig instanceConfig : _cache.getInstanceConfigMap().values()) {
-      Map<String, Integer> instanceCapacity =
-        WagedValidationUtil.validateAndGetInstanceCapacity(clusterConfig, instanceConfig);
+    _allocatedPartitionsMap = new HashMap<>();
+    ClusterConfig clusterConfig = clusterData.getClusterConfig();
+    for (InstanceConfig instanceConfig : clusterData.getInstanceConfigMap().values()) {
+      Map<String, Integer> instanceCapacity = WagedValidationUtil.validateAndGetInstanceCapacity(clusterConfig, instanceConfig);
       _instanceCapacityMap.put(instanceConfig.getInstanceName(), instanceCapacity);
+
+      _allocatedPartitionsMap.put(instanceConfig.getInstanceName(), new HashMap<>());
     }
   }
 
-  /**
-   * Create Default Capacity Map.
-   * This is a utility method to create a default capacity map matching instance capacity map for participants.
-   * This is required as non-WAGED partitions will be placed on same instance and we don't know their actual capacity.
-   * This will generate default values of 0 for all the capacity keys.
-   */
-  private Map<String, Integer> createDefaultParticipantWeight() {
-    // copy the value of first Instance capacity.
-    Map<String, Integer> partCapacity = new HashMap<>(_instanceCapacityMap.values().iterator().next());
+  // Helper methods.
+  private boolean isPartitionInAllocatedMap(String instance, String resource, String partition) {
+    return _allocatedPartitionsMap.get(instance).containsKey(resource)
+        && _allocatedPartitionsMap.get(instance).get(resource).contains(partition);
+  }
 
-    // Set the value of all capacity to -1.
-    for (String key : partCapacity.keySet()) {
-      partCapacity.put(key, -1);
-    }
-    return partCapacity;
+  public void process(ResourceControllerDataProvider cache, CurrentStateOutput currentStateOutput,
+      Map<String, Resource> resourceMap, WagedResourceWeightsProvider weightProvider) {
+    processPendingMessages(cache, currentStateOutput, resourceMap, weightProvider);
+    processCurrentState(cache, currentStateOutput, resourceMap, weightProvider);
   }
 
   /**
    * Process the pending messages based on the Current states
    * @param currentState - Current state of the resources.
    */
-  public void processPendingMessages(CurrentStateOutput currentState) {
-    Map<String, Map<Partition, Map<String, Message>>> pendingMsgs = currentState.getPendingMessages();
-
-    for (String resource : pendingMsgs.keySet()) {
-      Map<Partition, Map<String, Message>> partitionMsgs = pendingMsgs.get(resource);
+  public void processPendingMessages(ResourceControllerDataProvider cache,
+      CurrentStateOutput currentState, Map<String, Resource> resourceMap,
+      WagedResourceWeightsProvider weightProvider) {
+
+    for (Map.Entry<String, Resource> resourceEntry : resourceMap.entrySet()) {
+      String resName = resourceEntry.getKey();
+      Resource resource = resourceEntry.getValue();
+      // list of partitions in the resource
+      Collection<Partition> partitions = resource.getPartitions();
+      // State model definition for the resource
+      StateModelDefinition stateModelDef = cache.getStateModelDef(resource.getStateModelDefRef());
+      if (stateModelDef == null) {
+        LOG.warn("State Model Definition for resource: " + resName + " is null");
+        continue;
+      }
+      Map<String, Integer> statePriorityMap = stateModelDef.getStatePriorityMap();
 
-      for (Partition partition : partitionMsgs.keySet()) {
+      for (Partition partition : partitions) {
         String partitionName = partition.getPartitionName();
-
         // Get Partition Weight
-        Map<String, Integer> partCapacity = getPartitionCapacity(resource, partitionName);
-
-        // TODO - check
-        Map<String, Message> msgs = partitionMsgs.get(partition);
-        // TODO - Check
-        for (String instance : msgs.keySet()) {
-           reduceAvailableInstanceCapacity(instance, partCapacity);
+        Map<String, Integer> partCapacity = weightProvider.getPartitionWeights(resName, partitionName);
+
+        // Get the pending messages for the partition
+        Map<String, Message> pendingMessages = currentState.getPendingMessageMap(resName, partition);
+        if (pendingMessages != null && !pendingMessages.isEmpty()) {
+          for (Map.Entry<String, Message> entry :  pendingMessages.entrySet()) {
+            String instance = entry.getKey();
+            if (isPartitionInAllocatedMap(instance, resName, partitionName)) {
+              continue;
+            }
+            Message msg = entry.getValue();
+            if (statePriorityMap.get(msg.getFromState()) < statePriorityMap.get(msg.getToState())
+                && msg.getToState().equals(stateModelDef.getInitialState())

Review Comment:
   I was worried about double-counting as a result, we maintain 
   private final Map<String, Map<String, Set<String>>> _allocatedPartitionsMap;
   As long as for same instance, same resource/partition, if we have either pending message and/or currentState, we will account for only once as we maintain this additional metadata.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] junkaixue commented on a diff in pull request #2546: WAGED - Part 2 - Fix the core calculation for n - n+1 issue.

Posted by "junkaixue (via GitHub)" <gi...@apache.org>.
junkaixue commented on code in PR #2546:
URL: https://github.com/apache/helix/pull/2546#discussion_r1248139890


##########
helix-core/src/main/java/org/apache/helix/controller/rebalancer/DelayedAutoRebalancer.java:
##########
@@ -274,59 +276,208 @@ public ResourceAssignment computeBestPossiblePartitionState(ResourceControllerDa
     return partitionMapping;
   }
 
+
   @Override
   protected Map<String, String> computeBestPossibleStateForPartition(Set<String> liveInstances,

Review Comment:
   My bad. This method cannot be changed because AbstractRebalancer has been extended by user. Change method signature would lead build break...
   
    



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] junkaixue commented on a diff in pull request #2546: WAGED - Part 2 - Fix the core calculation for n - n+1 issue.

Posted by "junkaixue (via GitHub)" <gi...@apache.org>.
junkaixue commented on code in PR #2546:
URL: https://github.com/apache/helix/pull/2546#discussion_r1259101535


##########
helix-core/src/main/java/org/apache/helix/controller/rebalancer/util/WagedRebalanceUtil.java:
##########
@@ -61,7 +62,11 @@ public static Map<String, Integer> fetchCapacityUsage(String partitionName,
       ResourceConfig resourceConfig, ClusterConfig clusterConfig) {
     Map<String, Map<String, Integer>> capacityMap;
     try {
-      capacityMap = resourceConfig.getPartitionCapacityMap();
+      if (resourceConfig != null) {
+        capacityMap = resourceConfig.getPartitionCapacityMap();
+      } else {
+        capacityMap = new HashMap<>();
+        }

Review Comment:
   NIT: can be simplified as to one-liner
   
   capacityMap = resourceConfig == null ? new HashMap<>() : resourceConfig.getPartitionCapacityMap();
   



##########
helix-core/src/main/java/org/apache/helix/controller/rebalancer/waged/WagedInstanceCapacity.java:
##########
@@ -42,79 +49,125 @@ public class WagedInstanceCapacity implements InstanceCapacityDataProvider {
 
   // Available Capacity per Instance
   private final Map<String, Map<String, Integer>> _instanceCapacityMap;
-  private final ResourceControllerDataProvider _cache;
+  private final Map<String, Map<String, Set<String>>> _allocatedPartitionsMap;
 
   public WagedInstanceCapacity(ResourceControllerDataProvider clusterData) {
-    _cache = clusterData;
     _instanceCapacityMap = new HashMap<>();
-
-    ClusterConfig clusterConfig = _cache.getClusterConfig();
-    for (InstanceConfig instanceConfig : _cache.getInstanceConfigMap().values()) {
-      Map<String, Integer> instanceCapacity =
-        WagedValidationUtil.validateAndGetInstanceCapacity(clusterConfig, instanceConfig);
+    _allocatedPartitionsMap = new HashMap<>();
+    ClusterConfig clusterConfig = clusterData.getClusterConfig();
+    if (clusterConfig == null) {
+      LOG.error("Cluster config is null, cannot initialize instance capacity map.");
+      return;
+    }
+    for (InstanceConfig instanceConfig : clusterData.getInstanceConfigMap().values()) {
+      Map<String, Integer> instanceCapacity = WagedValidationUtil.validateAndGetInstanceCapacity(clusterConfig, instanceConfig);
       _instanceCapacityMap.put(instanceConfig.getInstanceName(), instanceCapacity);
+      _allocatedPartitionsMap.put(instanceConfig.getInstanceName(), new HashMap<>());
     }
   }
 
-  /**
-   * Create Default Capacity Map.
-   * This is a utility method to create a default capacity map matching instance capacity map for participants.
-   * This is required as non-WAGED partitions will be placed on same instance and we don't know their actual capacity.
-   * This will generate default values of 0 for all the capacity keys.
-   */
-  private Map<String, Integer> createDefaultParticipantWeight() {
-    // copy the value of first Instance capacity.
-    Map<String, Integer> partCapacity = new HashMap<>(_instanceCapacityMap.values().iterator().next());
-
-    // Set the value of all capacity to -1.
-    for (String key : partCapacity.keySet()) {
-      partCapacity.put(key, -1);
+  // Helper methods.
+  private boolean isPartitionInAllocatedMap(String instance, String resource, String partition) {

Review Comment:
   I am OK with this as temporary protective method. Long run, we should based on the idea behavior construct the logic. Im not sure whether we have the scenario need to handle double counting. Suggest to have a TODO mark here for future clean up the logic.
   
   NIT: would be a better naming with "hasPartitionChargedForCapacity"
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] desaikomal commented on pull request #2546: WAGED - Part 2 - Fix the core calculation for n - n+1 issue.

Posted by "desaikomal (via GitHub)" <gi...@apache.org>.
desaikomal commented on PR #2546:
URL: https://github.com/apache/helix/pull/2546#issuecomment-1631263161

   Thanks @junkaixue for your patience with the review. Thanks @qqu0127 , @junkaixue for review. This change is reviewed and approved.
   Commit message: WAGED - Fix the intermediate hard-constraint violation during n=n+1 state.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] xyuanlu merged pull request #2546: WAGED - Part 2 - Fix the core calculation for n - n+1 issue.

Posted by "xyuanlu (via GitHub)" <gi...@apache.org>.
xyuanlu merged PR #2546:
URL: https://github.com/apache/helix/pull/2546


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] desaikomal commented on a diff in pull request #2546: WAGED - Part 2 - Fix the core calculation for n - n+1 issue.

Posted by "desaikomal (via GitHub)" <gi...@apache.org>.
desaikomal commented on code in PR #2546:
URL: https://github.com/apache/helix/pull/2546#discussion_r1244315900


##########
helix-core/src/main/java/org/apache/helix/controller/dataproviders/InstanceCapacityDataProvider.java:
##########
@@ -42,18 +42,10 @@ public interface InstanceCapacityDataProvider {
   /**
    * Check if partition can be placed on the instance.
    *
-   * @param instanceName - instance name 
+   * @param instanceName - instance name
    * @param partitionCapacity - Partition capacity expresed in capacity map.
    * @return boolean - True if the partition can be placed, False otherwise
    */
   public boolean isInstanceCapacityAvailable(String instanceName, Map<String, Integer> partitionCapacity);
 
-  /**
-   * Reduce the available capacity by specified Partition Capacity Map.
-   *
-   * @param instanceName - instance name 
-   * @param partitionCapacity - Partition capacity expresed in capacity map.
-   * @returns boolean - True if successfully updated partition capacity, false otherwise.
-   */
-  public boolean reduceAvailableInstanceCapacity(String instanceName, Map<String, Integer> partitionCapacity);

Review Comment:
   This is not in our release yet. It was checkedin as part-1 change. We haven't had releae with it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] desaikomal commented on a diff in pull request #2546: WAGED - Part 2 - Fix the core calculation for n - n+1 issue.

Posted by "desaikomal (via GitHub)" <gi...@apache.org>.
desaikomal commented on code in PR #2546:
URL: https://github.com/apache/helix/pull/2546#discussion_r1248019491


##########
helix-core/src/main/java/org/apache/helix/controller/rebalancer/DelayedAutoRebalancer.java:
##########
@@ -408,4 +420,129 @@ private int getNumExtraReplicas(ClusterConfig clusterConfig) {
     }
     return numExtraReplicas;
   }
+
+
+  private Map<String, String> computeBestPossibleStateForPartition(ResourceControllerDataProvider cache,

Review Comment:
   As we talked offline, I have done the code cleanup/consolidation with this change. the line of code change have increased but between the 2 versions, you can review core logic + code consolidation. thanks,



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org