Posted to dev@helix.apache.org by Geoffroy Fouquier <ge...@exensa.com> on 2020/03/11 09:35:53 UTC

User-defined rebalancer

Hello,

I am trying, without success, to set up a user-defined rebalancer, and I
don't understand what the problem could be.

I set up a standalone cluster with 3 instances and added a resource with 20
partitions (in a nutshell, I followed the user-defined rebalancer
tutorial). With the same code, when I used the semi-auto rebalancer, I got
this ideal state:

IdealState for crawlDB:
{
   "id" : "crawlDB",
   "mapFields" : {
     "crawlDB_0" : { "standalone-1.localhost" : "MASTER" },
     "crawlDB_1" : { "standalone-1.localhost" : "MASTER" },
     "crawlDB_10" : { "standalone-1.localhost" : "MASTER" },
     "crawlDB_11" : { "standalone-1.localhost" : "MASTER" },
     "crawlDB_12" : { "standalone-1.localhost" : "MASTER" },
     "crawlDB_13" : { "standalone-1.localhost" : "MASTER" },
     "crawlDB_14" : { "standalone-1.localhost" : "MASTER" },
     "crawlDB_15" : { "standalone-2.localhost" : "MASTER" },
     "crawlDB_16" : { "standalone-2.localhost" : "MASTER" },
     "crawlDB_17" : { "standalone-3.localhost" : "MASTER" },
     "crawlDB_18" : { "standalone-2.localhost" : "MASTER" },
     "crawlDB_19" : { "standalone-2.localhost" : "MASTER" },
     "crawlDB_2" : { "standalone-3.localhost" : "MASTER" },
     "crawlDB_3" : { "standalone-2.localhost" : "MASTER" },
     "crawlDB_4" : { "standalone-3.localhost" : "MASTER" },
     "crawlDB_5" : { "standalone-3.localhost" : "MASTER" },
     "crawlDB_6" : { "standalone-3.localhost" : "MASTER" },
     "crawlDB_7" : { "standalone-2.localhost" : "MASTER" },
     "crawlDB_8" : { "standalone-3.localhost" : "MASTER" },
     "crawlDB_9" : { "standalone-2.localhost" : "MASTER" }
   },
   "listFields" : {
     "crawlDB_0" : [ "standalone-1.localhost" ],
     "crawlDB_1" : [ "standalone-1.localhost" ],
     "crawlDB_10" : [ "standalone-1.localhost" ],
     "crawlDB_11" : [ "standalone-1.localhost" ],
     "crawlDB_12" : [ "standalone-1.localhost" ],
     "crawlDB_13" : [ "standalone-1.localhost" ],
     "crawlDB_14" : [ "standalone-1.localhost" ],
     "crawlDB_15" : [ "standalone-2.localhost" ],
     "crawlDB_16" : [ "standalone-2.localhost" ],
     "crawlDB_17" : [ "standalone-3.localhost" ],
     "crawlDB_18" : [ "standalone-2.localhost" ],
     "crawlDB_19" : [ "standalone-2.localhost" ],
     "crawlDB_2" : [ "standalone-3.localhost" ],
     "crawlDB_3" : [ "standalone-2.localhost" ],
     "crawlDB_4" : [ "standalone-3.localhost" ],
     "crawlDB_5" : [ "standalone-3.localhost" ],
     "crawlDB_6" : [ "standalone-3.localhost" ],
     "crawlDB_7" : [ "standalone-2.localhost" ],
     "crawlDB_8" : [ "standalone-3.localhost" ],
     "crawlDB_9" : [ "standalone-2.localhost" ]
   },
   "simpleFields" : {
     "IDEAL_STATE_MODE" : "AUTO",
     "NUM_PARTITIONS" : "20",
     "REBALANCE_MODE" : "SEMI_AUTO",
     "REBALANCE_STRATEGY" : "DEFAULT",
     "REPLICAS" : "1",
     "STATE_MODEL_DEF_REF" : "MasterSlave",
     "STATE_MODEL_FACTORY_NAME" : "DEFAULT"
   }
}

which is correct. Now, if I use my own rebalancer (based on a simple
modulo to compute the preference lists and the state map), the generated
mapping remains empty. Just for testing purposes, I tried using the
SemiAutoRebalancer class as a user-defined rebalancer, and I got the same
result:

IdealState for crawlDB:
{
   "id" : "crawlDB",
   "mapFields" : {
     "crawlDB_0" : { },
     "crawlDB_1" : { },
     "crawlDB_10" : { },
     "crawlDB_11" : { },
     "crawlDB_12" : { },
     "crawlDB_13" : { },
     "crawlDB_14" : { },
     "crawlDB_15" : { },
     "crawlDB_16" : { },
     "crawlDB_17" : { },
     "crawlDB_18" : { },
     "crawlDB_19" : { },
     "crawlDB_2" : { },
     "crawlDB_3" : { },
     "crawlDB_4" : { },
     "crawlDB_5" : { },
     "crawlDB_6" : { },
     "crawlDB_7" : { },
     "crawlDB_8" : { },
     "crawlDB_9" : { }
   },
   "listFields" : {
     "crawlDB_0" : [ ],
     "crawlDB_1" : [ ],
     "crawlDB_10" : [ ],
     "crawlDB_11" : [ ],
     "crawlDB_12" : [ ],
     "crawlDB_13" : [ ],
     "crawlDB_14" : [ ],
     "crawlDB_15" : [ ],
     "crawlDB_16" : [ ],
     "crawlDB_17" : [ ],
     "crawlDB_18" : [ ],
     "crawlDB_19" : [ ],
     "crawlDB_2" : [ ],
     "crawlDB_3" : [ ],
     "crawlDB_4" : [ ],
     "crawlDB_5" : [ ],
     "crawlDB_6" : [ ],
     "crawlDB_7" : [ ],
     "crawlDB_8" : [ ],
     "crawlDB_9" : [ ]
   },
   "simpleFields" : {
     "IDEAL_STATE_MODE" : "AUTO",
     "NUM_PARTITIONS" : "20",
     "REBALANCER_CLASS_NAME" : "org.apache.helix.controller.rebalancer.SemiAutoRebalancer",
     "REBALANCE_MODE" : "USER_DEFINED",
     "REBALANCE_STRATEGY" : "DEFAULT",
     "REPLICAS" : "1",
     "STATE_MODEL_DEF_REF" : "MasterSlave",
     "STATE_MODEL_FACTORY_NAME" : "DEFAULT"
   }
}
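For reference, the modulo-based assignment mentioned above can be sketched in plain Java. This is a minimal, hypothetical sketch, not the rebalancer used here and not Helix code: the class ModuloAssignment and its methods are illustrative names, it assumes REPLICAS = 1 (so the single preferred instance becomes MASTER), and it only computes the two structures that appear as "listFields" and "mapFields". It does not implement Helix's Rebalancer interface.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Helix: assigns each partition to one
// instance by partition index modulo the number of instances (REPLICAS = 1).
public class ModuloAssignment {

    // Preference list per partition, analogous to the "listFields" section.
    public static Map<String, List<String>> preferenceLists(String resource, int numPartitions,
                                                            List<String> instances) {
        Map<String, List<String>> lists = new LinkedHashMap<>();
        for (int p = 0; p < numPartitions; p++) {
            List<String> prefs = new ArrayList<>();
            if (!instances.isEmpty()) {
                prefs.add(instances.get(p % instances.size()));
            }
            lists.put(resource + "_" + p, prefs);
        }
        return lists;
    }

    // State map per partition, analogous to the "mapFields" section:
    // every instance in the (single-entry) preference list is made MASTER.
    public static Map<String, Map<String, String>> stateMaps(Map<String, List<String>> lists) {
        Map<String, Map<String, String>> maps = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : lists.entrySet()) {
            Map<String, String> states = new LinkedHashMap<>();
            for (String instance : e.getValue()) {
                states.put(instance, "MASTER");
            }
            maps.put(e.getKey(), states);
        }
        return maps;
    }

    public static void main(String[] args) {
        List<String> instances = List.of("standalone-1.localhost",
                "standalone-2.localhost", "standalone-3.localhost");
        Map<String, List<String>> lists = preferenceLists("crawlDB", 20, instances);
        System.out.println(stateMaps(lists).get("crawlDB_4")); // {standalone-2.localhost=MASTER}
        // With no instances at all, every preference list and state map is empty.
        System.out.println(stateMaps(preferenceLists("crawlDB", 20, List.of())).get("crawlDB_4")); // {}
    }
}
```

With the three instances above, crawlDB_4 lands on standalone-2.localhost (4 % 3 = 1); with an empty instance list, every entry is empty, which has the same shape as the empty ideal state shown above.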

I also tried changing the rebalance strategy, since the default one (auto)
doesn't seem to compute a mapping if no live instances are present. But
the same strategy is used with the semi-auto rebalancer above, and it
works. Any clue?
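The live-instance observation can be illustrated with a small, hypothetical helper (plain Java, not a Helix API; InstanceSelection, candidates and fallbackToConfigured are illustrative names). If a custom rebalancer picks candidate instances only from the live set, an empty live set leaves nothing to assign. Falling back to the configured instances is shown as one possible workaround under that assumption, not a confirmed fix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not Helix code: chooses the instances a custom
// rebalancer would assign partitions to.
public class InstanceSelection {

    // Keeps configured instances that are live; if none are live and
    // fallbackToConfigured is set, returns all configured instances instead.
    public static List<String> candidates(List<String> configured, Set<String> live,
                                          boolean fallbackToConfigured) {
        List<String> result = new ArrayList<>();
        for (String instance : configured) {
            if (live.contains(instance)) {
                result.add(instance);
            }
        }
        if (result.isEmpty() && fallbackToConfigured) {
            result.addAll(configured);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> configured = List.of("standalone-1.localhost",
                "standalone-2.localhost", "standalone-3.localhost");
        // No participant connected: nothing to assign without a fallback.
        System.out.println(candidates(configured, Set.of(), false)); // []
        // With the fallback, the configured instances are used instead.
        System.out.println(candidates(configured, Set.of(), true));
    }
}
```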

Thanks!