
Posted to commits@helix.apache.org by "Zhen Zhang (JIRA)" <ji...@apache.org> on 2014/11/13 04:34:03 UTC

[jira] [Created] (HELIX-547) AutoRebalancer may not converge in some rare situation

Zhen Zhang created HELIX-547:
--------------------------------

             Summary: AutoRebalancer may not converge in some rare situation
                 Key: HELIX-547
                 URL: https://issues.apache.org/jira/browse/HELIX-547
             Project: Apache Helix
          Issue Type: Bug
            Reporter: Zhen Zhang


We discovered that AutoRebalancer may, in some rare situations, fail to converge to a stable mapping. Assume a DB with 1024 partitions, the LeaderStandby state model, 1 replica, and 6 nodes, all of them alive. The current mapping is:
{noformat}
...
MyDB_873={localhost_5=LEADER}
...
{noformat}

Given:
{noformat}
allNodes=allLiveNodes={localhost_0, ..., localhost_5}
stateCountMap: {LEADER=1, STANDBY=0}
capacity: 2147483647
{noformat}
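With 1 replica, each partition has exactly one LEADER, so 1024 replicas are spread over 6 live nodes. The capacity value 2147483647 equals Java's Integer.MAX_VALUE, so it places no practical limit on any node. If the strategy aims for an even spread (an assumption; this report does not describe the balancing rule), each node ends up with 170 or 171 partitions. The sketch below works out that arithmetic. EvenSpread is a hypothetical helper, not a Helix class.

```java
import java.util.Arrays;

// Hypothetical helper (not part of Helix): per-node partition counts under a
// perfectly even spread, assuming an even spread is what the strategy targets.
public class EvenSpread {
    public static int[] targetCounts(int partitions, int nodes) {
        int[] counts = new int[nodes];
        int base = partitions / nodes;   // 1024 / 6 = 170
        int extra = partitions % nodes;  // 1024 % 6 = 4 leftover partitions
        for (int i = 0; i < nodes; i++) {
            // The first `extra` nodes each absorb one leftover partition.
            counts[i] = base + (i < extra ? 1 : 0);
        }
        return counts;
    }

    public static void main(String[] args) {
        // The reported capacity is Integer.MAX_VALUE, i.e. effectively unbounded.
        System.out.println(2147483647 == Integer.MAX_VALUE);
        System.out.println(Arrays.toString(targetCounts(1024, 6)));
    }
}
```

This prints `true`, then `[171, 171, 171, 171, 170, 170]`.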

AutoRebalanceStrategy#computePartitionAssignment outputs the new mapping:
{noformat}
...
MyDB_873={localhost_1=LEADER}
...
{noformat}

The Helix controller then sends LEADER->STANDBY to localhost_5 and OFFLINE->STANDBY to localhost_1. The next time the auto rebalancer is triggered, the current mapping is:
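The controller does not jump straight to the target. In the LeaderStandby model a replica passes through STANDBY on its way to LEADER, and at most one LEADER is allowed. For that reason the old leader is demoted and the new replica is brought up to STANDBY within the same round. The sketch below illustrates this one-hop-per-round behaviour for a single partition. NextStepSketch and its simplified rules are hypothetical; this is not Helix's actual message-generation code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch, NOT Helix's implementation: computes one round of
// LeaderStandby transitions for a single partition, one hop per replica.
public class NextStepSketch {
    public static List<String> transitions(Map<String, String> current, Map<String, String> target) {
        List<String> msgs = new ArrayList<>();
        boolean leaderExists = current.containsValue("LEADER");
        // Visit every instance named in either map, in a stable (sorted) order.
        TreeSet<String> instances = new TreeSet<>(current.keySet());
        instances.addAll(target.keySet());
        for (String inst : instances) {
            String from = current.getOrDefault(inst, "OFFLINE");
            String to = target.getOrDefault(inst, "OFFLINE");
            if (from.equals(to)) continue;
            if (from.equals("LEADER")) {
                msgs.add(inst + ": LEADER->STANDBY");   // demote before anything else
            } else if (from.equals("OFFLINE")) {
                msgs.add(inst + ": OFFLINE->STANDBY");  // must pass through STANDBY first
            } else if (to.equals("LEADER")) {
                // from is STANDBY: promote only if no other replica is LEADER yet
                if (!leaderExists) msgs.add(inst + ": STANDBY->LEADER");
            } else {
                msgs.add(inst + ": STANDBY->OFFLINE");
            }
        }
        return msgs;
    }

    public static void main(String[] args) {
        // Round from the report: current {localhost_5=LEADER}, target {localhost_1=LEADER}.
        System.out.println(transitions(Map.of("localhost_5", "LEADER"), Map.of("localhost_1", "LEADER")));
    }
}
```

This prints `[localhost_1: OFFLINE->STANDBY, localhost_5: LEADER->STANDBY]`, which matches the two messages described above.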
{noformat}
...
MyDB_873={localhost_5=STANDBY, localhost_1=STANDBY}
...
{noformat}

In this case, AutoRebalanceStrategy#computePartitionAssignment outputs the new mapping:
{noformat}
...
MyDB_873={localhost_5=LEADER}
...
{noformat}

As a result, AutoRebalanceStrategy#computePartitionAssignment keeps assigning MyDB_873 to localhost_1 and localhost_5 alternately and never converges to a stable mapping.
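One way to catch this kind of non-convergence is to treat "rebalance, then apply the controller's transitions" as a step function on the mapping and look for a repeated state. A period of 1 means a fixed point (converged). A period of 2 is the alternation described here. The sketch below uses that idea. observedRound is a hypothetical toy that hard-codes the two rounds from this report. It assumes each round lands exactly on the intermediate mapping shown above, and it treats every other mapping as stable. It is not Helix code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch, NOT Helix code: detects whether iterating a
// rebalance round reaches a fixed point or cycles between mappings.
public class OscillationCheck {
    // Returns the cycle length (1 = converged), or -1 if no state repeats within maxRounds.
    public static <S> int detectPeriod(S start, UnaryOperator<S> step, int maxRounds) {
        Map<S, Integer> seen = new HashMap<>();
        S state = start;
        for (int round = 0; round <= maxRounds; round++) {
            Integer firstSeen = seen.putIfAbsent(state, round);
            if (firstSeen != null) return round - firstSeen;
            state = step.apply(state);
        }
        return -1;
    }

    // Toy model of one rebalance + controller round for MyDB_873, hard-coding
    // the behaviour reported in HELIX-547; any other mapping is treated as stable.
    public static Map<String, String> observedRound(Map<String, String> current) {
        if (current.equals(Map.of("localhost_5", "LEADER"))) {
            // Rebalancer proposes {localhost_1=LEADER}; controller demotes 5, brings up 1.
            return Map.of("localhost_5", "STANDBY", "localhost_1", "STANDBY");
        }
        if (current.equals(Map.of("localhost_5", "STANDBY", "localhost_1", "STANDBY"))) {
            // Rebalancer proposes {localhost_5=LEADER}; assumed to be reached in one round.
            return Map.of("localhost_5", "LEADER");
        }
        return current;
    }

    public static void main(String[] args) {
        System.out.println(detectPeriod(Map.of("localhost_5", "LEADER"), OscillationCheck::observedRound, 10));
    }
}
```

This prints `2`, flagging the alternation. A mapping the toy treats as stable, such as `{localhost_0=LEADER}`, yields `1`.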
