You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@helix.apache.org by mkscrg <gi...@git.apache.org> on 2016/11/03 16:29:43 UTC

[GitHub] helix pull request #59: helix-core: AutoRebalanceStrategy should sort by par...

GitHub user mkscrg opened a pull request:

    https://github.com/apache/helix/pull/59

    helix-core: AutoRebalanceStrategy should sort by partition count and state priority

    This is an alternative fix to #58. Here `AutoRebalanceStrategy` uses state priorities in addition to partition count to sort live nodes, rather than dropping no-priority states from the current mapping.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mkscrg/helix auto-sort-with-priority

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/helix/pull/59.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #59
    
----

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] helix issue #59: helix-core: AutoRebalanceStrategy should sort by partition ...

Posted by kishoreg <gi...@git.apache.org>.
Github user kishoreg commented on the issue:

    https://github.com/apache/helix/pull/59
  
    Maybe we should keep the partition count map in place.I am thinking about the case where new nodes get added, will the new change stop redistributing partitions.
    
    A simple test case to try is 
    
    2 nodes (n1,n2) with 6 partitions 
    add a node(n3) 
    
     we should see one partition from n1,n2 move to n3. We already have that test case.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] helix issue #59: helix-core: AutoRebalanceStrategy should sort by partition ...

Posted by lei-xia <gi...@git.apache.org>.
Github user lei-xia commented on the issue:

    https://github.com/apache/helix/pull/59
  
    @mkscrg  we already generated the 0.6.6 release,  will include this fix into our next release.  We will have more frequent releases from now on,  and next release is expected to be out in another 1 month or so.  If you really need this to be included in our official release and can not wait another month, please let us know and we can discuss internally to see whether we can build a hot release for you.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] helix pull request #59: helix-core: AutoRebalanceStrategy should sort by par...

Posted by lei-xia <gi...@git.apache.org>.
Github user lei-xia commented on a diff in the pull request:

    https://github.com/apache/helix/pull/59#discussion_r88552224
  
    --- Diff: helix-core/src/main/java/org/apache/helix/controller/rebalancer/strategy/AutoRebalanceStrategy.java ---
    @@ -791,43 +795,61 @@ public String getLocation(int partitionId, int replicaId, int numPartitions, int
       }
     
       /**
    -   * Sorter for live nodes that sorts firstly according to the number of partitions currently
    -   * registered against a node (more partitions means sort earlier), then by node name.
    +   * Sorter for live nodes that sorts
    +   * - first by the sum of "state scores" of partitions currently registered to a node
    +   *   (more higher-priority partitions means sort earlier)
    +   * - then by node name.
        * This prevents unnecessarily moving partitions due to the capacity assignment
        * unnecessarily reducing the capacity of lower down elements.
        */
       private static class CurrentStateNodeComparator implements Comparator<String> {
     
         /**
    -     * The number of partitions that are active for each participant.
    +     * The sum of state scores for partitions on each . (Map<ParticipantId, Integer>)
    +     * State scores are the reverse of state priorities, s.t. states with lower priority numbers have higher scores.
          */
    -    private final Map<String, Integer> partitionCounts;
    +    private final Map<String, Integer> participantStatePriorities;
     
         /**
          * Create it.
          * @param currentMapping The current mapping of partitions to participants.
    +     * @param stateModelDef The resource's associated state model.
          */
    -    public CurrentStateNodeComparator(Map<String, Map<String, String>> currentMapping) {
    -      partitionCounts = new HashMap<String, Integer>();
    -      for (Entry<String, Map<String, String>> entry : currentMapping.entrySet()) {
    -        for (String participantId : entry.getValue().keySet()) {
    -          Integer existing = partitionCounts.get(participantId);
    -          partitionCounts.put(participantId, existing != null ? existing + 1 : 1);
    +    public CurrentStateNodeComparator(Map<String, Map<String, String>> currentMapping,
    +                                      StateModelDefinition stateModelDef) {
    +      Map<String, Integer> stateScores = getStateScores(stateModelDef);
    +
    +      participantStatePriorities = new HashMap<String, Integer>();
    +      for (Map<String, String> participantStates : currentMapping.values()) {
    +        for (String participantId : participantStates.keySet()) {
    +          int existing = participantStatePriorities.getOrDefault(participantId, 0);
    +          participantStatePriorities.put(participantId, existing + stateScores.getOrDefault(participantId, 0));
    --- End diff --
    
    same here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] helix issue #59: helix-core: AutoRebalanceStrategy should sort by partition ...

Posted by mkscrg <gi...@git.apache.org>.
Github user mkscrg commented on the issue:

    https://github.com/apache/helix/pull/59
  
    @lei-xia OK sounds good. Will keep in touch :+1:


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] helix pull request #59: helix-core: AutoRebalanceStrategy should sort by par...

Posted by lei-xia <gi...@git.apache.org>.
Github user lei-xia commented on a diff in the pull request:

    https://github.com/apache/helix/pull/59#discussion_r88552181
  
    --- Diff: helix-core/src/main/java/org/apache/helix/controller/rebalancer/strategy/AutoRebalanceStrategy.java ---
    @@ -791,43 +795,61 @@ public String getLocation(int partitionId, int replicaId, int numPartitions, int
       }
     
       /**
    -   * Sorter for live nodes that sorts firstly according to the number of partitions currently
    -   * registered against a node (more partitions means sort earlier), then by node name.
    +   * Sorter for live nodes that sorts
    +   * - first by the sum of "state scores" of partitions currently registered to a node
    +   *   (more higher-priority partitions means sort earlier)
    +   * - then by node name.
        * This prevents unnecessarily moving partitions due to the capacity assignment
        * unnecessarily reducing the capacity of lower down elements.
        */
       private static class CurrentStateNodeComparator implements Comparator<String> {
     
         /**
    -     * The number of partitions that are active for each participant.
    +     * The sum of state scores for partitions on each . (Map<ParticipantId, Integer>)
    +     * State scores are the reverse of state priorities, s.t. states with lower priority numbers have higher scores.
          */
    -    private final Map<String, Integer> partitionCounts;
    +    private final Map<String, Integer> participantStatePriorities;
     
         /**
          * Create it.
          * @param currentMapping The current mapping of partitions to participants.
    +     * @param stateModelDef The resource's associated state model.
          */
    -    public CurrentStateNodeComparator(Map<String, Map<String, String>> currentMapping) {
    -      partitionCounts = new HashMap<String, Integer>();
    -      for (Entry<String, Map<String, String>> entry : currentMapping.entrySet()) {
    -        for (String participantId : entry.getValue().keySet()) {
    -          Integer existing = partitionCounts.get(participantId);
    -          partitionCounts.put(participantId, existing != null ? existing + 1 : 1);
    +    public CurrentStateNodeComparator(Map<String, Map<String, String>> currentMapping,
    +                                      StateModelDefinition stateModelDef) {
    +      Map<String, Integer> stateScores = getStateScores(stateModelDef);
    +
    +      participantStatePriorities = new HashMap<String, Integer>();
    +      for (Map<String, String> participantStates : currentMapping.values()) {
    +        for (String participantId : participantStates.keySet()) {
    +          int existing = participantStatePriorities.getOrDefault(participantId, 0);
    --- End diff --
    
    Also, you are using Java 8 feature, Helix still use Java 6 now, please refactor this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] helix issue #59: helix-core: AutoRebalanceStrategy should sort by partition ...

Posted by lei-xia <gi...@git.apache.org>.
Github user lei-xia commented on the issue:

    https://github.com/apache/helix/pull/59
  
    Looks good to me.  Could you please include a test?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] helix issue #59: helix-core: AutoRebalanceStrategy should sort by partition ...

Posted by mkscrg <gi...@git.apache.org>.
Github user mkscrg commented on the issue:

    https://github.com/apache/helix/pull/59
  
    @kishoreg confirmed I can run individual tests with e.g. `mvn clean test -Dtest=TestAutoRebalance -DfailIfNoTests=false`. The full `helix-core` test suite hits OOM errors on my machine, but that may be a local config issue.
    
    @lei-xia can we get this into the 0.6.6 release?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] helix issue #59: helix-core: AutoRebalanceStrategy should sort by partition ...

Posted by kishoreg <gi...@git.apache.org>.
Github user kishoreg commented on the issue:

    https://github.com/apache/helix/pull/59
  
    LGTM. @lei-xia can you also take a look at this. 
    
    Regarding test case, standard mvn command to run a specific test should work. mvn clean install should work. What is the failure you are seeing? maybe we should discuss that on mailing list


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] helix issue #59: helix-core: AutoRebalanceStrategy should sort by partition ...

Posted by mkscrg <gi...@git.apache.org>.
Github user mkscrg commented on the issue:

    https://github.com/apache/helix/pull/59
  
    I think this strategy accounts for partition count and priority:
    - it assigns each state a "score", which is just a reverse numbering of the priority-ordered state list
    - it ranks nodes according to the sum of the state scores of the partitions on the node
    
    I modified my test code to your 2/6 scenario, and it worked as expected:
    
    ```
    Setting up cluster THE_CLUSTER
    Starting CONTROLLER
    Starting NODE_0
    Starting NODE_1
    Adding resource R
    Rebalancing resource R
    Transition: NODE_0 OFFLINE to ONLINE for R_0
    Transition: NODE_1 OFFLINE to ONLINE for R_3
    Transition: NODE_1 OFFLINE to ONLINE for R_5
    Transition: NODE_1 OFFLINE to ONLINE for R_1
    Transition: NODE_0 OFFLINE to ONLINE for R_4
    Transition: NODE_0 OFFLINE to ONLINE for R_2
    Cluster state after setup:
                    NODE_0  NODE_1
            R_0     ONLINE  null
            R_1     null    ONLINE
            R_2     ONLINE  null
            R_3     null    ONLINE
            R_4     ONLINE  null
            R_5     null    ONLINE
    ------------------------------------------------------------
    Starting NODE_2
    Transition: NODE_0 ONLINE to OFFLINE for R_2
    Transition: NODE_1 ONLINE to OFFLINE for R_5
    Transition: NODE_0 OFFLINE to DROPPED for R_2
    Transition: NODE_1 OFFLINE to DROPPED for R_5
    Transition: NODE_2 OFFLINE to ONLINE for R_5
    Transition: NODE_2 OFFLINE to ONLINE for R_2
    Cluster state after starting third node:
                    NODE_0  NODE_1  NODE_2
            R_0     ONLINE  null    null
            R_1     null    ONLINE  null
            R_2     null    null    ONLINE
            R_3     null    ONLINE  null
            R_4     ONLINE  null    null
            R_5     null    null    ONLINE
    ------------------------------------------------------------
    ```
    
    I can gist the test code if helpful.
    
    You mentioned 
    > We already have that test case.
    How do you run the test cases? I've noticed the `build` script has them disabled, and a simple `mvn clean install` fails.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] helix issue #59: helix-core: AutoRebalanceStrategy should sort by partition ...

Posted by lei-xia <gi...@git.apache.org>.
Github user lei-xia commented on the issue:

    https://github.com/apache/helix/pull/59
  
    Could you please also rebase to the HEAD of the branch?  Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] helix issue #59: helix-core: AutoRebalanceStrategy should sort by partition ...

Posted by kishoreg <gi...@git.apache.org>.
Github user kishoreg commented on the issue:

    https://github.com/apache/helix/pull/59
  
    Any update on this PR. I think we are seeing similar issues in one of our projects


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---