Posted to user@helix.apache.org by Grainier Perera <gr...@apache.org> on 2022/06/18 03:24:03 UTC

Some resources won't become online

Hi Devs,

I'm trying to add several resources to the cluster using the following
configurations[1]. However, only some will become `ONLINE`. What could be
the reason? Is there a way to guarantee every resource will become `ONLINE`
if WAGED capacity constraints are met?

You can see that, with the same IdealState, "_mm:root:_system:cron3" has
mapFields and is ONLINE, while "_mm:root:_system:cron2" is not. Furthermore, I
see this behavior more often when the replica count is set to 1.

ResourceInfo:
1. "_mm:root:_system:cron2"

IdealState for _mm:root:_system:cron2:
{
  "id" : "_mm:root:_system:cron2",
  "simpleFields" : {
    "DELAY_REBALANCE_ENABLED" : "true",
    "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
    "MAX_PARTITIONS_PER_INSTANCE" : "1",
    "NUM_PARTITIONS" : "1",
    "REBALANCER_CLASS_NAME" : "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
    "REBALANCE_DELAY" : "10000",
    "REBALANCE_MODE" : "FULL_AUTO",
    "REPLICAS" : "1",
    "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
  },
  "mapFields" : {
    "_mm:root:_system:cron2_0" : { }
  },
  "listFields" : {
    "_mm:root:_system:cron2_0" : [ ]
  }
}


ExternalView for _mm:root:_system:cron2:
{
  "id" : "_mm:root:_system:cron2",
  "simpleFields" : {
    "BUCKET_SIZE" : "0",
    "DELAY_REBALANCE_ENABLED" : "true",
    "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
    "MAX_PARTITIONS_PER_INSTANCE" : "1",
    "NUM_PARTITIONS" : "1",
    "REBALANCER_CLASS_NAME" : "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
    "REBALANCE_DELAY" : "10000",
    "REBALANCE_MODE" : "FULL_AUTO",
    "REPLICAS" : "1",
    "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
  },
  *"mapFields" : { },*
  "listFields" : { }
}


2. "_mm:root:_system:cron3"

IdealState for _mm:root:_system:cron3:
{
  "id" : "_mm:root:_system:cron3",
  "simpleFields" : {
    "DELAY_REBALANCE_ENABLED" : "true",
    "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
    "MAX_PARTITIONS_PER_INSTANCE" : "1",
    "NUM_PARTITIONS" : "1",
    "REBALANCER_CLASS_NAME" : "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
    "REBALANCE_DELAY" : "10000",
    "REBALANCE_MODE" : "FULL_AUTO",
    "REPLICAS" : "1",
    "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
  },
  "mapFields" : {
    "_mm:root:_system:cron3_0" : { }
  },
  "listFields" : {
    "_mm:root:_system:cron3_0" : [ ]
  }
}


ExternalView for _mm:root:_system:cron3:
{
  "id" : "_mm:root:_system:cron3",
  "simpleFields" : {
    "BUCKET_SIZE" : "0",
    "DELAY_REBALANCE_ENABLED" : "true",
    "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
    "MAX_PARTITIONS_PER_INSTANCE" : "1",
    "NUM_PARTITIONS" : "1",
    "REBALANCER_CLASS_NAME" : "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
    "REBALANCE_DELAY" : "10000",
    "REBALANCE_MODE" : "FULL_AUTO",
    "REPLICAS" : "1",
    "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
  },
  *"mapFields" : {*
*    "_mm:root:_system:cron3_0" : {*
*      "c8cep-0.c8cep.c8.svc.cluster.local_12000" : "ONLINE"*
*    }*
*  },*
  "listFields" : { }
}


[1]: https://gist.github.com/grainier/aa1c0b279ea99f88d74c1e94d79f5cdb

Thank you.
Grainier Perera.
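
The symptom described in this message (a partition listed in the IdealState but missing from the ExternalView `mapFields`) can be checked mechanically. The sketch below is plain Java with no Helix dependency: the class `ExternalViewCheck`, the method `partitionsWithoutOnlineReplica`, and the use of plain maps in place of Helix records are all assumptions made for illustration, not Helix API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Helix: it works on plain maps shaped like
// the "mapFields" sections of the znodes shown in this thread.
class ExternalViewCheck {

    // Returns the partitions named in the IdealState that have no replica
    // reported as ONLINE in the ExternalView mapFields.
    static List<String> partitionsWithoutOnlineReplica(
            Set<String> idealStatePartitions,
            Map<String, Map<String, String>> externalViewMapFields) {
        List<String> missing = new ArrayList<>();
        for (String partition : idealStatePartitions) {
            Map<String, String> replicas =
                    externalViewMapFields.getOrDefault(partition, Map.of());
            if (!replicas.containsValue("ONLINE")) {
                missing.add(partition);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // cron2: the ExternalView mapFields is empty, so the partition is reported.
        System.out.println(partitionsWithoutOnlineReplica(
                Set.of("_mm:root:_system:cron2_0"), Map.of()));
        // cron3: one ONLINE replica, so nothing is reported.
        System.out.println(partitionsWithoutOnlineReplica(
                Set.of("_mm:root:_system:cron3_0"),
                Map.of("_mm:root:_system:cron3_0",
                        Map.of("c8cep-0.c8cep.c8.svc.cluster.local_12000", "ONLINE"))));
    }
}
```

A check like this only reports the symptom; it does not say why the controller left the partition unassigned.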

Re: Some resources won't become online

Posted by Grainier Perera <gr...@apache.org>.
Hi Junkai,

Thank you so much for the explanation!

Thanks & Regards,
Grainier Perera.


On Wed, 22 Jun 2022 at 00:54, Junkai Xue <jx...@apache.org> wrote:

> "STANDALONE" means the controller you started is dedicated to managing
> that one cluster.
>
> In production, to guarantee high availability of the controllers, we
> usually create a separate cluster called a "super cluster". Controllers
> join that cluster as CONTROLLER_PARTICIPANT, and the super cluster decides
> which controller is the leader of which real application cluster.
>
> We will have a tutorial for that later. It should be in the open source
> docs, but I cannot find it right now.
>
> Best,
>
> Junkai
>
>
> On Sun, Jun 19, 2022 at 10:14 PM Grainier Perera <gr...@apache.org>
> wrote:
>
>> Hi Junkai,
>>
>> Thank you so much. It worked. I've set the controller mode to
>> `STANDALONE` and now everything seems to be working as expected.
>>
>> One small question: does `STANDALONE` mean it's using an embedded
>> controller? And is having a `STANDALONE` controller per instance a
>> good idea?
>>
>> Thank you,
>> Grainier Perera.
>>
>>
>> On Mon, 20 Jun 2022 at 00:08, Junkai Xue <ju...@gmail.com> wrote:
>>
>>> Ah. I found the problem. I suggest enabling this entry in the cluster
>>> config: "PERSIST_INTERMEDIATE_ASSIGNMENT":"true"
>>>
>>> It persists the assignment Helix computes for FULL_AUTO resources into the
>>> IdealState, so once it is enabled you can see which instance each resource
>>> is assigned to. With that, it is very clear that your code adds the
>>> controller instance as a participant.
>>>
>>> As far as I can see, resource4 is assigned to the controller, which does
>>> not accept partition bootstrap:
>>>
>>> {
>>>   "id" : "resource4",
>>>   "simpleFields" : {
>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>     "NUM_PARTITIONS" : "1",
>>>     "REBALANCER_CLASS_NAME" : "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>     "REBALANCE_DELAY" : "1000",
>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>     "REPLICAS" : "1",
>>>     "STATE_MODEL_DEF_REF" : "OnlineOffline"
>>>   },
>>>   "mapFields" : {
>>>     "resource4_0" : {
>>>       "CEPControllerName-16e8ca90-df6f-4252-9ce8-3efdcce24f4a" : "ONLINE"
>>>     }
>>>   },
>>>   "listFields" : {
>>>     "resource4_0" : [ "CEPControllerName-16e8ca90-df6f-4252-9ce8-3efdcce24f4a" ]
>>>   }
>>> }
>>>
>>> Please try this on your side, and do not add the controller as a
>>> participant of that cluster.
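
The diagnosis described here (persist the computed assignment, then look at which instance each partition landed on) can be sketched as a small audit. This is plain Java, not Helix API: the class `AssignmentAudit` and the method `assignedToNonParticipants` are hypothetical, and the participant names are assumed from the table in the reproduction report.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical audit helper, not part of Helix.
class AssignmentAudit {

    // Returns partition -> instance for replicas whose assigned instance is not
    // one of the expected participants. If a partition has several such
    // replicas, only one of them is kept; that is enough for a quick check.
    static Map<String, String> assignedToNonParticipants(
            Map<String, Map<String, String>> idealStateMapFields,
            Set<String> participants) {
        Map<String, String> suspicious = new TreeMap<>();
        idealStateMapFields.forEach((partition, replicas) ->
                replicas.keySet().forEach(instance -> {
                    if (!participants.contains(instance)) {
                        suspicious.put(partition, instance);
                    }
                }));
        return suspicious;
    }

    public static void main(String[] args) {
        // Persisted assignment for resource4, copied from the IdealState above.
        Map<String, Map<String, String>> mapFields = Map.of(
                "resource4_0",
                Map.of("CEPControllerName-16e8ca90-df6f-4252-9ce8-3efdcce24f4a", "ONLINE"));
        // Instances that actually host a state model (assumed names).
        Set<String> participants = Set.of(
                "c8cep_on_localhost_12000",
                "c8cep_on_localhost_12001",
                "c8cep_on_localhost_12002");
        System.out.println(assignedToNonParticipants(mapFields, participants));
    }
}
```

Any entry in the result points at an instance that joined the cluster as a participant without being able to serve partitions, which is the situation described for the controller here.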
>>>
>>> best,
>>>
>>> Junkai
>>>
>>> On Sat, Jun 18, 2022 at 9:49 PM Grainier Perera <gr...@apache.org>
>>> wrote:
>>>
>>>> Hi Junkai,
>>>>
>>>> This is reproducible. Please find the sample code [1]. With this sample:
>>>>
>>>>    - Initially, I'm creating a cluster with 3 instances (using the OOTB
>>>>    `OnlineOfflineStateModelFactory` and the WAGED rebalancer...)
>>>>    - Step 1: Adds 6 different resources to the cluster with 1
>>>>    partition and 1 replica each.
>>>>    - Step 2: Adds an additional instance to the cluster.
>>>>    - Step 3: Removes an existing instance from the cluster.
>>>>    - Step 4: Removes all resources.
>>>>
>>>> However, after Step 1, you can see that resource1 and resource2 are not
>>>> getting assigned to any instance.
>>>>
>>>>            c8cep_on_localhost_12000  c8cep_on_localhost_12001  c8cep_on_localhost_12002
>>>> resource1  -                         -                         -
>>>> resource2  ONLINE                    -                         -
>>>> resource3  -                         ONLINE
>>>> resource4  -                         -                         ONLINE
>>>> resource5  -                         -                         -
>>>> resource6  ONLINE                    -                         -
>>>>
>>>> After the other steps as well, not every resource gets rebalanced
>>>> properly.
>>>>
>>>> [1] https://gist.github.com/grainier/055511179d8b4a4f0c678f17889ed853
>>>>
>>>> Thanks,
>>>> Grainier Perera.
>>>>
>>>>
>>>> On Sun, 19 Jun 2022 at 08:32, Junkai Xue <jx...@apache.org> wrote:
>>>>
>>>>> BTW, have you set up proper capacity in the InstanceConfig of the only
>>>>> instance?
>>>>>
>>>>> Best,
>>>>>
>>>>> Junkai
>>>>>
>>>>> On Sat, Jun 18, 2022 at 7:10 PM Junkai Xue <jx...@apache.org> wrote:
>>>>>
>>>>>> Interesting. Is this reproducible? We can give it a try with your data.
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Junkai
>>>>>>
>>>>>> On Sat, Jun 18, 2022 at 4:31 AM Grainier Perera <gr...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Junkai,
>>>>>>>
>>>>>>> I tried removing `MAX_PARTITIONS_PER_INSTANCE`, but it's still the
>>>>>>> same. What's weird is that when I add a few resources, some of them still
>>>>>>> don't get into the `ONLINE` state. In the sample below, you can see that only
>>>>>>> the 2nd and 4th resources have proper `mapFields`, whereas the 1st and 3rd
>>>>>>> resources don't seem to have any mapping (all of them have the
>>>>>>> same IdealState). However, after a restart, this can change so that 1 & 3
>>>>>>> become `ONLINE` and 2 & 4 lose their mapping. But the pattern
>>>>>>> remains... I cannot understand why.
>>>>>>>
>>>>>>>
>>>>>>> ExternalView for _mm:root:_system:cron1:
>>>>>>> {
>>>>>>>   "id" : "_mm:root:_system:cron1",
>>>>>>>   "simpleFields" : {
>>>>>>>     "BUCKET_SIZE" : "0",
>>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>>     "REBALANCER_CLASS_NAME" : "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>>     "REPLICAS" : "1",
>>>>>>>     "STATE_MODEL_DEF_REF" : "NewC8CEPStateModel"
>>>>>>>   },
>>>>>>>   *"mapFields" : { },*
>>>>>>>   "listFields" : { }
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> ExternalView for _mm:root:_system:cron2:
>>>>>>> {
>>>>>>>   "id" : "_mm:root:_system:cron2",
>>>>>>>   "simpleFields" : {
>>>>>>>     "BUCKET_SIZE" : "0",
>>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>>     "REBALANCER_CLASS_NAME" : "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>>     "REPLICAS" : "1",
>>>>>>>     "STATE_MODEL_DEF_REF" : "NewC8CEPStateModel"
>>>>>>>   },
>>>>>>>   *"mapFields" : {*
>>>>>>> *    "_mm:root:_system:cron2_0" : {*
>>>>>>> *      "c8cep-0.c8cep.c8.svc.cluster.local_12000" : "ONLINE"*
>>>>>>> *    }*
>>>>>>> *  },*
>>>>>>>   "listFields" : { }
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> ExternalView for _mm:root:_system:cron3:
>>>>>>> {
>>>>>>>   "id" : "_mm:root:_system:cron3",
>>>>>>>   "simpleFields" : {
>>>>>>>     "BUCKET_SIZE" : "0",
>>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>>     "REBALANCER_CLASS_NAME" : "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>>     "REPLICAS" : "1",
>>>>>>>     "STATE_MODEL_DEF_REF" : "NewC8CEPStateModel"
>>>>>>>   },
>>>>>>>   *"mapFields" : { },*
>>>>>>>   "listFields" : { }
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> ExternalView for _mm:root:_system:cron4:
>>>>>>> {
>>>>>>>   "id" : "_mm:root:_system:cron4",
>>>>>>>   "simpleFields" : {
>>>>>>>     "BUCKET_SIZE" : "0",
>>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>>     "REBALANCER_CLASS_NAME" : "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>>     "REPLICAS" : "1",
>>>>>>>     "STATE_MODEL_DEF_REF" : "NewC8CEPStateModel"
>>>>>>>   },
>>>>>>>   *"mapFields" : {*
>>>>>>> *    "_mm:root:_system:cron4_0" : {*
>>>>>>> *      "c8cep-0.c8cep.c8.svc.cluster.local_12000" : "ONLINE"*
>>>>>>> *    }*
>>>>>>> *  },*
>>>>>>>   "listFields" : { }
>>>>>>> }
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Grainier Perera.
>>>>>>>
>>>>>>>
>>>>>>> On Sat, 18 Jun 2022 at 13:21, Junkai Xue <ju...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Then most likely it is caused by this config entry:
>>>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>>> Usually, we never set this config. It restricts how many partitions
>>>>>>>> can be assigned to an instance. Since you already have one partition,
>>>>>>>> 3_0, assigned, no other partition can be assigned.
>>>>>>>>
>>>>>>>> So either removing this config entry or adding more instances may
>>>>>>>> help.
>>>>>>>>
>>>>>>>> Please let us know if you have further questions.
>>>>>>>>
>>>>>>>> best,
>>>>>>>>
>>>>>>>> Junkai
>>>>>>>>
>>>>>>>> On Fri, Jun 17, 2022 at 11:38 PM Grainier Perera <
>>>>>>>> grainier@apache.org> wrote:
>>>>>>>>
>>>>>>>>> Hi Junkai,
>>>>>>>>>
>>>>>>>>> - Correct. I haven't added any rack-aware information.
>>>>>>>>> - I'm connecting 1 instance at startup and then expanding
>>>>>>>>> on demand (I've set ALLOW_PARTICIPANT_AUTO_JOIN to true).
>>>>>>>>> - I've checked the live instances and other znodes in ZooKeeper.
>>>>>>>>> Everything looks OK, except that
>>>>>>>>> /C8CEPCluster/EXTERNALVIEW/_mm:root:_system:cron2 has empty
>>>>>>>>> `mapFields` while
>>>>>>>>> /C8CEPCluster/EXTERNALVIEW/_mm:root:_system:cron3 has `mapFields`
>>>>>>>>> with an ONLINE record. I still cannot understand why, or what I'm
>>>>>>>>> doing wrong :(
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> [zk: localhost:2181(CONNECTED) 18] get /C8CEPCluster/CONFIGS/CLUSTER/C8CEPCluster
>>>>>>>>> {
>>>>>>>>>   "id" : "C8CEPCluster",
>>>>>>>>>   "simpleFields" : {
>>>>>>>>>     "allowParticipantAutoJoin" : "true"
>>>>>>>>>   },
>>>>>>>>>   "mapFields" : {
>>>>>>>>>     "DEFAULT_INSTANCE_CAPACITY_MAP" : {
>>>>>>>>>       "MEMORY" : "100",
>>>>>>>>>       "CPU" : "100"
>>>>>>>>>     },
>>>>>>>>>     "DEFAULT_PARTITION_WEIGHT_MAP" : {
>>>>>>>>>       "MEMORY" : "5",
>>>>>>>>>       "CPU" : "5"
>>>>>>>>>     }
>>>>>>>>>   },
>>>>>>>>>   "listFields" : {
>>>>>>>>>     "INSTANCE_CAPACITY_KEYS" : [ "CPU", "MEMORY" ]
>>>>>>>>>   }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> [zk: localhost:2181(CONNECTED) 8] get /C8CEPCluster/LIVEINSTANCES/c8cep-0.c8cep.c8.svc.cluster.local_12000
>>>>>>>>> {
>>>>>>>>>   "id" : "c8cep-0.c8cep.c8.svc.cluster.local_12000",
>>>>>>>>>   "simpleFields" : {
>>>>>>>>>     "CURRENT_TASK_THREAD_POOL_SIZE" : "40",
>>>>>>>>>     "HELIX_VERSION" : "1.0.4",
>>>>>>>>>     "LIVE_INSTANCE" : "1@c8cep-0.c8cep.c8.svc.cluster.local",
>>>>>>>>>     "SESSION_ID" : "106a30539a8003e"
>>>>>>>>>   },
>>>>>>>>>   "mapFields" : { },
>>>>>>>>>   "listFields" : { }
>>>>>>>>> }
>>>>>>>>> [zk: localhost:2181(CONNECTED) 26] get /C8CEPCluster/CONFIGS/RESOURCE/_mm:root:_system:cron2
>>>>>>>>> {
>>>>>>>>>   "id" : "_mm:root:_system:cron2",
>>>>>>>>>   "simpleFields" : { },
>>>>>>>>>   "mapFields" : {
>>>>>>>>>     "PARTITION_CAPACITY_MAP" : {
>>>>>>>>>       "DEFAULT" : "{\"CPU\":\"10\",\"MEMORY\":\"10\"}"
>>>>>>>>>     }
>>>>>>>>>   },
>>>>>>>>>   "listFields" : { }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> [zk: localhost:2181(CONNECTED) 27] get /C8CEPCluster/CONFIGS/RESOURCE/_mm:root:_system:cron3
>>>>>>>>> {
>>>>>>>>>   "id" : "_mm:root:_system:cron3",
>>>>>>>>>   "simpleFields" : { },
>>>>>>>>>   "mapFields" : {
>>>>>>>>>     "PARTITION_CAPACITY_MAP" : {
>>>>>>>>>       "DEFAULT" : "{\"CPU\":\"10\",\"MEMORY\":\"10\"}"
>>>>>>>>>     }
>>>>>>>>>   },
>>>>>>>>>   "listFields" : { }
>>>>>>>>> }
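
A quick arithmetic check on the values dumped above suggests that capacity alone should not be the limiting factor: with an instance capacity of CPU=100 and MEMORY=100 and a per-partition weight of CPU=10 and MEMORY=10, about ten such partitions should fit on one instance. The sketch below is plain Java; the class `CapacityCheck` and the method `maxPartitionsThatFit` are hypothetical, and it does not reproduce how the WAGED rebalancer actually evaluates its constraints.

```java
import java.util.Map;

// Hypothetical arithmetic check, not Helix code: estimates how many partitions
// of a given weight fit on one instance under per-key capacity limits.
class CapacityCheck {

    // For each capacity key, divide the instance capacity by the partition
    // weight and keep the smallest quotient. Returns Integer.MAX_VALUE when no
    // key has a positive weight, meaning nothing limits placement.
    static int maxPartitionsThatFit(Map<String, Integer> instanceCapacity,
                                    Map<String, Integer> partitionWeight) {
        int fit = Integer.MAX_VALUE;
        for (Map.Entry<String, Integer> e : partitionWeight.entrySet()) {
            int weight = e.getValue();
            if (weight <= 0) {
                continue; // a zero weight never limits placement
            }
            int capacity = instanceCapacity.getOrDefault(e.getKey(), 0);
            fit = Math.min(fit, capacity / weight);
        }
        return fit;
    }

    public static void main(String[] args) {
        // Values taken from the cluster config and resource config above.
        Map<String, Integer> capacity = Map.of("CPU", 100, "MEMORY", 100);
        Map<String, Integer> weight = Map.of("CPU", 10, "MEMORY", 10);
        System.out.println(maxPartitionsThatFit(capacity, weight));
    }
}
```

With only a handful of single-partition resources in play, this points away from a capacity shortage and toward some other cause.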
>>>>>>>>>
>>>>>>>>> [zk: localhost:2181(CONNECTED) 38] get /C8CEPCluster/IDEALSTATES/_mm:root:_system:cron2
>>>>>>>>> {
>>>>>>>>>   "id" : "_mm:root:_system:cron2",
>>>>>>>>>   "simpleFields" : {
>>>>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>>>>     "REBALANCER_CLASS_NAME" : "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>>>>     "REPLICAS" : "1",
>>>>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>>>>   },
>>>>>>>>>   "mapFields" : {
>>>>>>>>>     "_mm:root:_system:cron2_0" : { }
>>>>>>>>>   },
>>>>>>>>>   "listFields" : {
>>>>>>>>>     "_mm:root:_system:cron2_0" : [ ]
>>>>>>>>>   }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> [zk: localhost:2181(CONNECTED) 39] get /C8CEPCluster/IDEALSTATES/_mm:root:_system:cron3
>>>>>>>>> {
>>>>>>>>>   "id" : "_mm:root:_system:cron3",
>>>>>>>>>   "simpleFields" : {
>>>>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>>>>     "REBALANCER_CLASS_NAME" : "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>>>>     "REPLICAS" : "1",
>>>>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>>>>   },
>>>>>>>>>   "mapFields" : {
>>>>>>>>>     "_mm:root:_system:cron3_0" : { }
>>>>>>>>>   },
>>>>>>>>>   "listFields" : {
>>>>>>>>>     "_mm:root:_system:cron3_0" : [ ]
>>>>>>>>>   }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> [zk: localhost:2181(CONNECTED) 42] get /C8CEPCluster/EXTERNALVIEW/_mm:root:_system:cron2
>>>>>>>>> {
>>>>>>>>>   "id" : "_mm:root:_system:cron2",
>>>>>>>>>   "simpleFields" : {
>>>>>>>>>     "BUCKET_SIZE" : "0",
>>>>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>>>>     "REBALANCER_CLASS_NAME" : "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>>>>     "REPLICAS" : "1",
>>>>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>>>>   },
>>>>>>>>>   *"mapFields" : { },*
>>>>>>>>>   "listFields" : { }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> [zk: localhost:2181(CONNECTED) 43] get /C8CEPCluster/EXTERNALVIEW/_mm:root:_system:cron3
>>>>>>>>> {
>>>>>>>>>   "id" : "_mm:root:_system:cron3",
>>>>>>>>>   "simpleFields" : {
>>>>>>>>>     "BUCKET_SIZE" : "0",
>>>>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>>>>     "REBALANCER_CLASS_NAME" : "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>>>>     "REPLICAS" : "1",
>>>>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>>>>   },
>>>>>>>>>   *"mapFields" : {*
>>>>>>>>> *    "_mm:root:_system:cron3_0" : {*
>>>>>>>>> *      "c8cep-0.c8cep.c8.svc.cluster.local_12000" : "ONLINE"*
>>>>>>>>> *    }*
>>>>>>>>> *  },*
>>>>>>>>>   "listFields" : { }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> Thank you.
>>>>>>>>> Grainier Perera.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sat, 18 Jun 2022 at 10:45, Junkai Xue <jx...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> OK. So you didn't add any rack-aware information. Then how many
>>>>>>>>>> instances do you have connected to that cluster? Please double-check
>>>>>>>>>> the live instances in ZooKeeper as well.
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>>
>>>>>>>>>> Junkai
>>>>>>>>>>
>>>>>>>>>> On Fri, Jun 17, 2022 at 10:01 PM Grainier Perera <
>>>>>>>>>> grainier@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Junkai,
>>>>>>>>>>>
>>>>>>>>>>> I've added the cluster init code to the gist [1]. Apart from that,
>>>>>>>>>>> ClusterConfig is configured like this:
>>>>>>>>>>>
>>>>>>>>>>>             ClusterConfig clusterConfig = configAccessor.getClusterConfig(CLUSTER_NAME);
>>>>>>>>>>>             // Configure the capacity keys in the cluster config. For example, MEMORY.
>>>>>>>>>>>             clusterConfig.setInstanceCapacityKeys(INSTANCE_CAPACITY_KEYS);
>>>>>>>>>>>             // Configure the default instance capacity in the cluster config. For example, MEMORY = 100.
>>>>>>>>>>>             clusterConfig.setDefaultInstanceCapacityMap(INSTANCE_CAPACITY);
>>>>>>>>>>>             // Configure the default partition weight in the cluster config. For example, MEMORY = 5.
>>>>>>>>>>>             clusterConfig.setDefaultPartitionWeightMap(DEFAULT_RESOURCE_USAGE);
>>>>>>>>>>>             configAccessor.setClusterConfig(CLUSTER_NAME, clusterConfig);
>>>>>>>>>>>
>>>>>>>>>>> [1]
>>>>>>>>>>> https://gist.github.com/grainier/aa1c0b279ea99f88d74c1e94d79f5cdb#file-clustersetup-java
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Grainier Perera.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Sat, 18 Jun 2022 at 10:00, Junkai Xue <jx...@apache.org>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Could you please share your cluster config as well?
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>>
>>>>>>>>>>>> Junkai
>>>>>>>>>>>>

Re: Some resources won't become online

Posted by Junkai Xue <jx...@apache.org>.
"STANDALONE" means the controller you started is dedicated to managing that
one cluster.

In production, to guarantee high availability of the controllers, we usually
create a separate cluster called a "super cluster". Controllers join that
cluster as CONTROLLER_PARTICIPANT, and the super cluster decides which
controller is the leader of which real application cluster.

We will have a tutorial for that later. It should be in the open source docs,
but I cannot find it right now.

Best,

Junkai


On Sun, Jun 19, 2022 at 10:14 PM Grainier Perera <gr...@apache.org>
wrote:

> Hi Junkai,
>
> Thank you so much. It worked. I've set the controller mode to `STANDALONE`
> and now everything seems to be working as expected.
>
> One small question, does `STANDALONE` means it's using an embedded
> controller? And is having a `STANDALONE` controller per instance a
> good idea?
>
> Thank you,
> Grainier Perera.
>
>
> On Mon, 20 Jun 2022 at 00:08, Junkai Xue <ju...@gmail.com> wrote:
>
>> Ah. I found the problem. I would suggest you to enable this entry for
>> cluster config. "PERSIST_INTERMEDIATE_ASSIGNMENT":"true"
>>
>> It will give you how Helix assignment for FULL_AUTO in IdealState. Once
>> you enable, you will get which instance it should assign for the resource.
>> Now it is very clear that, you add you controller instance in your code
>> as a participant:
>>
>> To me, resource 4 is assigned to controller, which does not accept
>> partition bootstrap:
>>
>> {
>>   "id" : "resource4",
>>   "simpleFields" : {
>>     "DELAY_REBALANCE_ENABLED" : "true",
>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>     "NUM_PARTITIONS" : "1",
>>     "REBALANCER_CLASS_NAME" :
>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>     "REBALANCE_DELAY" : "1000",
>>     "REBALANCE_MODE" : "FULL_AUTO",
>>     "REPLICAS" : "1",
>>     "STATE_MODEL_DEF_REF" : "OnlineOffline"
>>   },
>>   "mapFields" : {
>>     "resource4_0" : {
>>       "CEPControllerName-16e8ca90-df6f-4252-9ce8-3efdcce24f4a" : "ONLINE"
>>     }
>>   },
>>   "listFields" : {
>>     "resource4_0" : [
>> "CEPControllerName-16e8ca90-df6f-4252-9ce8-3efdcce24f4a" ]
>>   }
>> }
>>
>> Have a try on your side and do not make the controller as a participant
>> for that cluster.
>>
>> best,
>>
>> Junkai
>>
>> On Sat, Jun 18, 2022 at 9:49 PM Grainier Perera <gr...@apache.org>
>> wrote:
>>
>>> Hi Junkai,
>>>
>>> This is reproducible. Please find the sample code [1]. With this sample;
>>>
>>>    - Initially, I'm creating a cluster with 3 instances (Using OOTB
>>>    `OnlineOfflineStateModelFactory` and WAGED rebalancer...)
>>>    - Step 1: Adds 6 different resources to the cluster with 1 partition
>>>    and 1 replica each.
>>>    - Step 2: Adds an additional instance to the cluster.
>>>    - Step 3: Removes an existing instance from the cluster.
>>>    - Step 4: Remove all resources.
>>>
>>> However, after Step 1, you can see resource1 and resource2 is not
>>> getting assigned to any Instance.
>>> c8cep_on_localhost_12000 c8cep_on_localhost_12001
>>> c8cep_on_localhost_12002
>>> resource1 - - -
>>> resource2 ONLINE - -
>>> resource3 - ONLINE
>>> resource4 - - ONLINE
>>> resource5 - - -
>>> resource6 ONLINE - -
>>> After other steps also, not every resource is getting rebalanced
>>> properly.
>>>
>>> [1] https://gist.github.com/grainier/055511179d8b4a4f0c678f17889ed853
>>>
>>> Thanks,
>>> Grainier Perera.
>>>
>>>
>>> On Sun, 19 Jun 2022 at 08:32, Junkai Xue <jx...@apache.org> wrote:
>>>
>>>> BTW, have you setup proper capacity in InstanceConfig of the only
>>>> instance?
>>>>
>>>> Best,
>>>>
>>>> Junkai
>>>>
>>>> On Sat, Jun 18, 2022 at 7:10 PM Junkai Xue <jx...@apache.org> wrote:
>>>>
>>>>> Interesting. Is this reproducible? We can have a try on your data.
>>>>>
>>>>> Best,
>>>>>
>>>>> Junkai
>>>>>
>>>>> On Sat, Jun 18, 2022 at 4:31 AM Grainier Perera <gr...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Hi Junkai,
>>>>>>
>>>>>> I tried removing `MAX_PARTITIONS_PER_INSTANCE`. But it's still the
>>>>>> same. What's weird is, when I add a few resources, I see some of them still
>>>>>> not getting into the `ONLINE` state. In the below sample, you can see only
>>>>>> the 2nd and 4th resources have proper `mapFields`, whereas the 1st and 3rd
>>>>>> resources don't seem to have any mapping (all of them have the
>>>>>> same IdealState). However, after a restart, this can change to 1 & 3
>>>>>> becomes `ONLINE` and 2 & 3 may lose their mapping. But the pattern
>>>>>> remains... cannot understand why.
>>>>>>
>>>>>>
>>>>>> *ExternalView for _mm:root:_system:cron1:*{
>>>>>>   "id" : "_mm:root:_system:cron1",
>>>>>>   "simpleFields" : {
>>>>>>     "BUCKET_SIZE" : "0",
>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>     "REBALANCER_CLASS_NAME" :
>>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>     "REPLICAS" : "1",
>>>>>>     "STATE_MODEL_DEF_REF" : "NewC8CEPStateModel"
>>>>>>   },
>>>>>>   *"mapFields" : { },*
>>>>>>   "listFields" : { }
>>>>>> }
>>>>>>
>>>>>>
>>>>>> *ExternalView for _mm:root:_system:cron2:*{
>>>>>>   "id" : "_mm:root:_system:cron2",
>>>>>>   "simpleFields" : {
>>>>>>     "BUCKET_SIZE" : "0",
>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>     "REBALANCER_CLASS_NAME" :
>>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>     "REPLICAS" : "1",
>>>>>>     "STATE_MODEL_DEF_REF" : "NewC8CEPStateModel"
>>>>>>   },
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> *  "mapFields" : {    "_mm:root:_system:cron2_0" : {
>>>>>> "c8cep-0.c8cep.c8.svc.cluster.local_12000" : "ONLINE"    }  },*
>>>>>>   "listFields" : { }
>>>>>> }
>>>>>>
>>>>>>
>>>>>> *ExternalView for _mm:root:_system:cron3:*{
>>>>>>   "id" : "_mm:root:_system:cron3",
>>>>>>   "simpleFields" : {
>>>>>>     "BUCKET_SIZE" : "0",
>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>     "REBALANCER_CLASS_NAME" :
>>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>     "REPLICAS" : "1",
>>>>>>     "STATE_MODEL_DEF_REF" : "NewC8CEPStateModel"
>>>>>>   },
>>>>>>   *"mapFields" : { },*
>>>>>>   "listFields" : { }
>>>>>> }
>>>>>>
>>>>>>
>>>>>> *ExternalView for _mm:root:_system:cron4:*{
>>>>>>   "id" : "_mm:root:_system:cron4",
>>>>>>   "simpleFields" : {
>>>>>>     "BUCKET_SIZE" : "0",
>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>     "REBALANCER_CLASS_NAME" :
>>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>     "REPLICAS" : "1",
>>>>>>     "STATE_MODEL_DEF_REF" : "NewC8CEPStateModel"
>>>>>>   },
>>>>>>   *"mapFields" : {*
>>>>>> *    "_mm:root:_system:cron4_0" : {*
>>>>>> *      "c8cep-0.c8cep.c8.svc.cluster.local_12000" : "ONLINE"*
>>>>>> *    }*
>>>>>> *  },*
>>>>>>   "listFields" : { }
>>>>>> }
>>>>>>
>>>>>> Thanks,
>>>>>> Grainier Perera.
>>>>>>
>>>>>>
>>>>>> On Sat, 18 Jun 2022 at 13:21, Junkai Xue <ju...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Then most likely it is caused by this config entry:
>>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>> We usually never set this config. It restricts how many partitions can
>>>>>>> be assigned to an instance. You already have one partition, cron3_0,
>>>>>>> assigned, so no other partition can be assigned.
>>>>>>>
>>>>>>> So either removing this config entry or adding more instances may
>>>>>>> help.
>>>>>>>
>>>>>>> Please let us know if you have further questions.
>>>>>>>
>>>>>>> best,
>>>>>>>
>>>>>>> Junkai
>>>>>>>
>>>>>>> On Fri, Jun 17, 2022 at 11:38 PM Grainier Perera <
>>>>>>> grainier@apache.org> wrote:
>>>>>>>
>>>>>>>> Hi Junkai,
>>>>>>>>
>>>>>>>> - Correct. I haven't added any rack-aware information.
>>>>>>>> - I'm connecting 1 instance at startup and then expanding
>>>>>>>> on-demand (I've set ALLOW_PARTICIPANT_AUTO_JOIN to true).
>>>>>>>> - I've checked the live instances and other znodes in ZooKeeper.
>>>>>>>> Everything looks OK, except that
>>>>>>>> /C8CEPCluster/EXTERNALVIEW/_mm:root:_system:cron2 has empty
>>>>>>>> `mapFields` while /C8CEPCluster/EXTERNALVIEW/_mm:root:_system:cron3
>>>>>>>> has `mapFields` with an ONLINE record. I still cannot understand why,
>>>>>>>> or what I'm doing wrong :(
>>>>>>>>
>>>>>>>>
>>>>>>>> *[zk: localhost:2181(CONNECTED) 18] get
>>>>>>>> /C8CEPCluster/CONFIGS/CLUSTER/C8CEPCluster*
>>>>>>>> {
>>>>>>>>   "id" : "C8CEPCluster",
>>>>>>>>   "simpleFields" : {
>>>>>>>>     "allowParticipantAutoJoin" : "true"
>>>>>>>>   },
>>>>>>>>   "mapFields" : {
>>>>>>>>     "DEFAULT_INSTANCE_CAPACITY_MAP" : {
>>>>>>>>       "MEMORY" : "100",
>>>>>>>>       "CPU" : "100"
>>>>>>>>     },
>>>>>>>>     "DEFAULT_PARTITION_WEIGHT_MAP" : {
>>>>>>>>       "MEMORY" : "5",
>>>>>>>>       "CPU" : "5"
>>>>>>>>     }
>>>>>>>>   },
>>>>>>>>   "listFields" : {
>>>>>>>>     "INSTANCE_CAPACITY_KEYS" : [ "CPU", "MEMORY" ]
>>>>>>>>   }
>>>>>>>> }
>>>>>>>>
>>>>>>>> *[zk: localhost:2181(CONNECTED) 8] get
>>>>>>>> /C8CEPCluster/LIVEINSTANCES/c8cep-0.c8cep.c8.svc.cluster.local_12000*
>>>>>>>> {
>>>>>>>>   "id" : "c8cep-0.c8cep.c8.svc.cluster.local_12000",
>>>>>>>>   "simpleFields" : {
>>>>>>>>     "CURRENT_TASK_THREAD_POOL_SIZE" : "40",
>>>>>>>>     "HELIX_VERSION" : "1.0.4",
>>>>>>>>     "LIVE_INSTANCE" : "1@c8cep-0.c8cep.c8.svc.cluster.local",
>>>>>>>>     "SESSION_ID" : "106a30539a8003e"
>>>>>>>>   },
>>>>>>>>   "mapFields" : { },
>>>>>>>>   "listFields" : { }
>>>>>>>> }
>>>>>>>> [zk: localhost:2181(CONNECTED) 26] get
>>>>>>>> /C8CEPCluster/CONFIGS/RESOURCE/_mm:root:_system:cron2
>>>>>>>> {
>>>>>>>>   "id" : "_mm:root:_system:cron2",
>>>>>>>>   "simpleFields" : { },
>>>>>>>>   "mapFields" : {
>>>>>>>>     "PARTITION_CAPACITY_MAP" : {
>>>>>>>>       "DEFAULT" : "{\"CPU\":\"10\",\"MEMORY\":\"10\"}"
>>>>>>>>     }
>>>>>>>>   },
>>>>>>>>   "listFields" : { }
>>>>>>>> }
>>>>>>>>
>>>>>>>> *[zk: localhost:2181(CONNECTED) 27] get
>>>>>>>> /C8CEPCluster/CONFIGS/RESOURCE/_mm:root:_system:cron3*
>>>>>>>> {
>>>>>>>>   "id" : "_mm:root:_system:cron3",
>>>>>>>>   "simpleFields" : { },
>>>>>>>>   "mapFields" : {
>>>>>>>>     "PARTITION_CAPACITY_MAP" : {
>>>>>>>>       "DEFAULT" : "{\"CPU\":\"10\",\"MEMORY\":\"10\"}"
>>>>>>>>     }
>>>>>>>>   },
>>>>>>>>   "listFields" : { }
>>>>>>>> }
>>>>>>>>
>>>>>>>> *[zk: localhost:2181(CONNECTED) 38] get
>>>>>>>> /C8CEPCluster/IDEALSTATES/_mm:root:_system:cron2*
>>>>>>>> {
>>>>>>>>   "id" : "_mm:root:_system:cron2",
>>>>>>>>   "simpleFields" : {
>>>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>>>     "REBALANCER_CLASS_NAME" :
>>>>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>>>     "REPLICAS" : "1",
>>>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>>>   },
>>>>>>>>   "mapFields" : {
>>>>>>>>     "_mm:root:_system:cron2_0" : { }
>>>>>>>>   },
>>>>>>>>   "listFields" : {
>>>>>>>>     "_mm:root:_system:cron2_0" : [ ]
>>>>>>>>   }
>>>>>>>> }
>>>>>>>>
>>>>>>>> *[zk: localhost:2181(CONNECTED) 39] get
>>>>>>>> /C8CEPCluster/IDEALSTATES/_mm:root:_system:cron3*
>>>>>>>> {
>>>>>>>>   "id" : "_mm:root:_system:cron3",
>>>>>>>>   "simpleFields" : {
>>>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>>>     "REBALANCER_CLASS_NAME" :
>>>>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>>>     "REPLICAS" : "1",
>>>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>>>   },
>>>>>>>>   "mapFields" : {
>>>>>>>>     "_mm:root:_system:cron3_0" : { }
>>>>>>>>   },
>>>>>>>>   "listFields" : {
>>>>>>>>     "_mm:root:_system:cron3_0" : [ ]
>>>>>>>>   }
>>>>>>>> }
>>>>>>>>
>>>>>>>> *[zk: localhost:2181(CONNECTED) 42] get
>>>>>>>> /C8CEPCluster/EXTERNALVIEW/_mm:root:_system:cron2*
>>>>>>>> {
>>>>>>>>   "id" : "_mm:root:_system:cron2",
>>>>>>>>   "simpleFields" : {
>>>>>>>>     "BUCKET_SIZE" : "0",
>>>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>>>     "REBALANCER_CLASS_NAME" :
>>>>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>>>     "REPLICAS" : "1",
>>>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>>>   },
>>>>>>>>   *"mapFields" : { },*
>>>>>>>>   "listFields" : { }
>>>>>>>> }
>>>>>>>>
>>>>>>>> *[zk: localhost:2181(CONNECTED) 43] get
>>>>>>>> /C8CEPCluster/EXTERNALVIEW/_mm:root:_system:cron3*
>>>>>>>> {
>>>>>>>>   "id" : "_mm:root:_system:cron3",
>>>>>>>>   "simpleFields" : {
>>>>>>>>     "BUCKET_SIZE" : "0",
>>>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>>>     "REBALANCER_CLASS_NAME" :
>>>>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>>>     "REPLICAS" : "1",
>>>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>>>   },
>>>>>>>>   *"mapFields" : {*
>>>>>>>> *    "_mm:root:_system:cron3_0" : {*
>>>>>>>> *      "c8cep-0.c8cep.c8.svc.cluster.local_12000" : "ONLINE"*
>>>>>>>> *    }*
>>>>>>>> *  },*
>>>>>>>>   "listFields" : { }
>>>>>>>> }
>>>>>>>>
>>>>>>>> Thank you.
>>>>>>>> Grainier Perera.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sat, 18 Jun 2022 at 10:45, Junkai Xue <jx...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> OK, so you don't set any rack-aware information. Then how many
>>>>>>>>> instances do you have connected to that cluster? Please double-check
>>>>>>>>> the live instances in ZooKeeper as well.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>>
>>>>>>>>> Junkai
>>>>>>>>>
>>>>>>>>> On Fri, Jun 17, 2022 at 10:01 PM Grainier Perera <
>>>>>>>>> grainier@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Junkai,
>>>>>>>>>>
>>>>>>>>>> I've added cluster init code to the gist [1]. Apart from that,
>>>>>>>>>> ClusterConfig is configured like this:
>>>>>>>>>>
>>>>>>>>>>     ClusterConfig clusterConfig = configAccessor.getClusterConfig(CLUSTER_NAME);
>>>>>>>>>>     // Capacity keys, set in the ClusterConfig. For example, MEMORY.
>>>>>>>>>>     clusterConfig.setInstanceCapacityKeys(INSTANCE_CAPACITY_KEYS);
>>>>>>>>>>     // Default instance capacity, set in the ClusterConfig. For example, MEMORY = 100.
>>>>>>>>>>     clusterConfig.setDefaultInstanceCapacityMap(INSTANCE_CAPACITY);
>>>>>>>>>>     // Default partition weight, set in the ClusterConfig. For example, MEMORY = 5.
>>>>>>>>>>     clusterConfig.setDefaultPartitionWeightMap(DEFAULT_RESOURCE_USAGE);
>>>>>>>>>>     configAccessor.setClusterConfig(CLUSTER_NAME, clusterConfig);
>>>>>>>>>>
>>>>>>>>>> [1]
>>>>>>>>>> https://gist.github.com/grainier/aa1c0b279ea99f88d74c1e94d79f5cdb#file-clustersetup-java
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Grainier Perera.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Sat, 18 Jun 2022 at 10:00, Junkai Xue <jx...@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> Could you please share your cluster config as well?
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>>
>>>>>>>>>>> Junkai
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jun 17, 2022 at 8:24 PM Grainier Perera <
>>>>>>>>>>> grainier@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Devs,
>>>>>>>>>>>>
>>>>>>>>>>>> I'm trying to add several resources to the cluster using the
>>>>>>>>>>>> following configurations[1]. However, only some will become `ONLINE`. What
>>>>>>>>>>>> could be the reason? Is there a way to guarantee every resource will become
>>>>>>>>>>>> `ONLINE` if WAGED capacity constraints are met?
>>>>>>>>>>>>
>>>>>>>>>>>> You can see with the same IdealState, "_mm:root:_system:cron3"
>>>>>>>>>>>> has mapFields and it is ONLINE, and "_mm:root:_system:cron2"
>>>>>>>>>>>> is not. Furthermore, I see this behavior more often when the replicas count
>>>>>>>>>>>> is set to 1.
>>>>>>>>>>>>
>>>>>>>>>>>> ResourceInfo:
>>>>>>>>>>>> 1. "_mm:root:_system:cron2"
>>>>>>>>>>>>
>>>>>>>>>>>> IdealState for _mm:root:_system:cron2:
>>>>>>>>>>>> {
>>>>>>>>>>>>   "id" : "_mm:root:_system:cron2",
>>>>>>>>>>>>   "simpleFields" : {
>>>>>>>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>>>>>>>     "REBALANCER_CLASS_NAME" :
>>>>>>>>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>>>>>>>     "REPLICAS" : "1",
>>>>>>>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>>>>>>>   },
>>>>>>>>>>>>   "mapFields" : {
>>>>>>>>>>>>     "_mm:root:_system:cron2_0" : { }
>>>>>>>>>>>>   },
>>>>>>>>>>>>   "listFields" : {
>>>>>>>>>>>>     "_mm:root:_system:cron2_0" : [ ]
>>>>>>>>>>>>   }
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ExternalView for _mm:root:_system:cron2:
>>>>>>>>>>>> {
>>>>>>>>>>>>   "id" : "_mm:root:_system:cron2",
>>>>>>>>>>>>   "simpleFields" : {
>>>>>>>>>>>>     "BUCKET_SIZE" : "0",
>>>>>>>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>>>>>>>     "REBALANCER_CLASS_NAME" :
>>>>>>>>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>>>>>>>     "REPLICAS" : "1",
>>>>>>>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>>>>>>>   },
>>>>>>>>>>>>   *"mapFields" : { },*
>>>>>>>>>>>>   "listFields" : { }
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 2. "_mm:root:_system:cron3"
>>>>>>>>>>>>
>>>>>>>>>>>> IdealState for _mm:root:_system:cron3:
>>>>>>>>>>>> {
>>>>>>>>>>>>   "id" : "_mm:root:_system:cron3",
>>>>>>>>>>>>   "simpleFields" : {
>>>>>>>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>>>>>>>     "REBALANCER_CLASS_NAME" :
>>>>>>>>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>>>>>>>     "REPLICAS" : "1",
>>>>>>>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>>>>>>>   },
>>>>>>>>>>>>   "mapFields" : {
>>>>>>>>>>>>     "_mm:root:_system:cron3_0" : { }
>>>>>>>>>>>>   },
>>>>>>>>>>>>   "listFields" : {
>>>>>>>>>>>>     "_mm:root:_system:cron3_0" : [ ]
>>>>>>>>>>>>   }
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ExternalView for _mm:root:_system:cron3:
>>>>>>>>>>>> {
>>>>>>>>>>>>   "id" : "_mm:root:_system:cron3",
>>>>>>>>>>>>   "simpleFields" : {
>>>>>>>>>>>>     "BUCKET_SIZE" : "0",
>>>>>>>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>>>>>>>     "REBALANCER_CLASS_NAME" :
>>>>>>>>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>>>>>>>     "REPLICAS" : "1",
>>>>>>>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>>>>>>>   },
>>>>>>>>>>>>   *"mapFields" : {*
>>>>>>>>>>>> *    "_mm:root:_system:cron3_0" : {*
>>>>>>>>>>>> *      "c8cep-0.c8cep.c8.svc.cluster.local_12000" : "ONLINE"*
>>>>>>>>>>>> *    }*
>>>>>>>>>>>> *  },*
>>>>>>>>>>>>   "listFields" : { }
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> [1]:
>>>>>>>>>>>> https://gist.github.com/grainier/aa1c0b279ea99f88d74c1e94d79f5cdb
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you.
>>>>>>>>>>>> Grainier Perera.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Junkai Xue
>>>>>>>
>>>>>>
>>
>> --
>> Junkai Xue
>>
>

Re: Some resources won't become online

Posted by Grainier Perera <gr...@apache.org>.
Hi Junkai,

Thank you so much. It worked. I've set the controller mode to `STANDALONE`
and now everything seems to be working as expected.

One small question: does `STANDALONE` mean it's using an embedded
controller? And is having a `STANDALONE` controller per instance a
good idea?

Thank you,
Grainier Perera.


On Mon, 20 Jun 2022 at 00:08, Junkai Xue <ju...@gmail.com> wrote:

> Ah, I found the problem. I would suggest enabling this entry in the
> cluster config: "PERSIST_INTERMEDIATE_ASSIGNMENT":"true"
>
> With it enabled, Helix persists the assignment it computes for FULL_AUTO
> resources into the IdealState, so you can see which instance each
> resource is supposed to be assigned to. From that it is clear that your
> code adds the controller instance as a participant.
>
> On my side, resource4 is assigned to the controller, which does not
> accept partition bootstrap:
>
> {
>   "id" : "resource4",
>   "simpleFields" : {
>     "DELAY_REBALANCE_ENABLED" : "true",
>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>     "NUM_PARTITIONS" : "1",
>     "REBALANCER_CLASS_NAME" :
> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>     "REBALANCE_DELAY" : "1000",
>     "REBALANCE_MODE" : "FULL_AUTO",
>     "REPLICAS" : "1",
>     "STATE_MODEL_DEF_REF" : "OnlineOffline"
>   },
>   "mapFields" : {
>     "resource4_0" : {
>       "CEPControllerName-16e8ca90-df6f-4252-9ce8-3efdcce24f4a" : "ONLINE"
>     }
>   },
>   "listFields" : {
>     "resource4_0" : [
> "CEPControllerName-16e8ca90-df6f-4252-9ce8-3efdcce24f4a" ]
>   }
> }
>
> Give it a try on your side, and do not register the controller as a
> participant of that cluster.
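
The check described above can be sketched in plain Java. This is only an
illustration under stated assumptions, not code from the thread: the class
and method names are hypothetical, no Helix API is used, and the persisted
IdealState `mapFields` are assumed to have already been read into a map of
partition -> (instance -> state). It flags every partition that is assigned
to an instance outside the known set of participants.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper; it works on plain maps, not on Helix objects.
final class AssignmentCheck {

    /**
     * Returns, for each partition, the assigned instances that are not in
     * the participant set. Partitions with only valid assignments are omitted.
     */
    static Map<String, List<String>> assignedToNonParticipants(
            Map<String, Map<String, String>> mapFields, Set<String> participants) {
        Map<String, List<String>> suspicious = new TreeMap<>();
        for (Map.Entry<String, Map<String, String>> partition : mapFields.entrySet()) {
            for (String instance : partition.getValue().keySet()) {
                if (!participants.contains(instance)) {
                    suspicious
                        .computeIfAbsent(partition.getKey(), k -> new ArrayList<>())
                        .add(instance);
                }
            }
        }
        return suspicious;
    }

    public static void main(String[] args) {
        // Values copied from the resource4 IdealState quoted in this thread.
        Map<String, Map<String, String>> mapFields = new HashMap<>();
        mapFields.put("resource4_0",
            Map.of("CEPControllerName-16e8ca90-df6f-4252-9ce8-3efdcce24f4a", "ONLINE"));
        Set<String> participants = Set.of(
            "c8cep_on_localhost_12000",
            "c8cep_on_localhost_12001",
            "c8cep_on_localhost_12002");
        System.out.println(assignedToNonParticipants(mapFields, participants));
    }
}
```

A non-empty result means the rebalancer picked an instance that will never
bring the partition ONLINE, which matches the symptom of an empty
ExternalView `mapFields`.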
>
> best,
>
> Junkai
>
> On Sat, Jun 18, 2022 at 9:49 PM Grainier Perera <gr...@apache.org>
> wrote:
>
>> Hi Junkai,
>>
>> This is reproducible. Please find the sample code [1]. With this sample:
>>
>>    - Initially, I create a cluster with 3 instances (using the OOTB
>>    `OnlineOfflineStateModelFactory` and the WAGED rebalancer...)
>>    - Step 1: Add 6 different resources to the cluster, with 1 partition
>>    and 1 replica each.
>>    - Step 2: Add an additional instance to the cluster.
>>    - Step 3: Remove an existing instance from the cluster.
>>    - Step 4: Remove all resources.
>>
>> However, after Step 1, you can see that resource1 and resource2 are not
>> getting assigned to any instance.
>>             c8cep_on_localhost_12000  c8cep_on_localhost_12001  c8cep_on_localhost_12002
>> resource1   -                         -                         -
>> resource2   ONLINE                    -                         -
>> resource3   -                         ONLINE
>> resource4   -                         -                         ONLINE
>> resource5   -                         -                         -
>> resource6   ONLINE                    -                         -
>> After the other steps as well, not every resource gets rebalanced properly.
>>
>> [1] https://gist.github.com/grainier/055511179d8b4a4f0c678f17889ed853
>>
>> Thanks,
>> Grainier Perera.
>>
>>
>> On Sun, 19 Jun 2022 at 08:32, Junkai Xue <jx...@apache.org> wrote:
>>
>>> BTW, have you set up the proper capacity in the InstanceConfig of the
>>> only instance?
>>>
>>> Best,
>>>
>>> Junkai
>>>
>>> On Sat, Jun 18, 2022 at 7:10 PM Junkai Xue <jx...@apache.org> wrote:
>>>
>>>> Interesting. Is this reproducible? We can give it a try with your data.
>>>>
>>>> Best,
>>>>
>>>> Junkai
>>>>
>>>> On Sat, Jun 18, 2022 at 4:31 AM Grainier Perera <gr...@apache.org>
>>>> wrote:
>>>>
>>>>> Hi Junkai,
>>>>>
>>>>> I tried removing `MAX_PARTITIONS_PER_INSTANCE`, but it's still the
>>>>> same. What's weird is that when I add a few resources, some of them
>>>>> still don't get into the `ONLINE` state. In the sample below, you can
>>>>> see that only the 2nd and 4th resources have proper `mapFields`, whereas
>>>>> the 1st and 3rd resources don't have any mapping (all of them have the
>>>>> same IdealState). However, after a restart this can flip: 1 & 3 become
>>>>> `ONLINE` and 2 & 4 may lose their mapping. The pattern remains, and I
>>>>> cannot understand why.
>>>>>
>>>>>
>>>>> *ExternalView for _mm:root:_system:cron1:*
>>>>> {
>>>>>   "id" : "_mm:root:_system:cron1",
>>>>>   "simpleFields" : {
>>>>>     "BUCKET_SIZE" : "0",
>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>     "NUM_PARTITIONS" : "1",
>>>>>     "REBALANCER_CLASS_NAME" :
>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>     "REPLICAS" : "1",
>>>>>     "STATE_MODEL_DEF_REF" : "NewC8CEPStateModel"
>>>>>   },
>>>>>   *"mapFields" : { },*
>>>>>   "listFields" : { }
>>>>> }
>>>>>
>>>>>
>>>>> *ExternalView for _mm:root:_system:cron2:*
>>>>> {
>>>>>   "id" : "_mm:root:_system:cron2",
>>>>>   "simpleFields" : {
>>>>>     "BUCKET_SIZE" : "0",
>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>     "NUM_PARTITIONS" : "1",
>>>>>     "REBALANCER_CLASS_NAME" :
>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>     "REPLICAS" : "1",
>>>>>     "STATE_MODEL_DEF_REF" : "NewC8CEPStateModel"
>>>>>   },
>>>>>   *"mapFields" : {*
>>>>> *    "_mm:root:_system:cron2_0" : {*
>>>>> *      "c8cep-0.c8cep.c8.svc.cluster.local_12000" : "ONLINE"*
>>>>> *    }*
>>>>> *  },*
>>>>>   "listFields" : { }
>>>>> }
>>>>>
>>>>>
>>>>> *ExternalView for _mm:root:_system:cron3:*
>>>>> {
>>>>>   "id" : "_mm:root:_system:cron3",
>>>>>   "simpleFields" : {
>>>>>     "BUCKET_SIZE" : "0",
>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>     "NUM_PARTITIONS" : "1",
>>>>>     "REBALANCER_CLASS_NAME" :
>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>     "REPLICAS" : "1",
>>>>>     "STATE_MODEL_DEF_REF" : "NewC8CEPStateModel"
>>>>>   },
>>>>>   *"mapFields" : { },*
>>>>>   "listFields" : { }
>>>>> }
>>>>>
>>>>>
>>>>> *ExternalView for _mm:root:_system:cron4:*
>>>>> {
>>>>>   "id" : "_mm:root:_system:cron4",
>>>>>   "simpleFields" : {
>>>>>     "BUCKET_SIZE" : "0",
>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>     "NUM_PARTITIONS" : "1",
>>>>>     "REBALANCER_CLASS_NAME" :
>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>     "REPLICAS" : "1",
>>>>>     "STATE_MODEL_DEF_REF" : "NewC8CEPStateModel"
>>>>>   },
>>>>>   *"mapFields" : {*
>>>>> *    "_mm:root:_system:cron4_0" : {*
>>>>> *      "c8cep-0.c8cep.c8.svc.cluster.local_12000" : "ONLINE"*
>>>>> *    }*
>>>>> *  },*
>>>>>   "listFields" : { }
>>>>> }
>>>>>
>>>>> Thanks,
>>>>> Grainier Perera.
>>>>>
>>>>>
>>>>> On Sat, 18 Jun 2022 at 13:21, Junkai Xue <ju...@gmail.com> wrote:
>>>>>
>>>>>> Then most likely it is caused by this config entry:
>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>> We usually never set this config. It restricts how many partitions can
>>>>>> be assigned to an instance. You already have one partition, cron3_0,
>>>>>> assigned, so no other partition can be assigned.
>>>>>>
>>>>>> So either removing this config entry or adding more instances may
>>>>>> help.
>>>>>>
>>>>>> Please let us know if you have further questions.
>>>>>>
>>>>>> best,
>>>>>>
>>>>>> Junkai
>>>>>>
>>>>>> On Fri, Jun 17, 2022 at 11:38 PM Grainier Perera <gr...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Junkai,
>>>>>>>
>>>>>>> - Correct. I haven't added any rack-aware information.
>>>>>>> - I'm connecting 1 instance at startup and then expanding
>>>>>>> on-demand (I've set ALLOW_PARTICIPANT_AUTO_JOIN to true).
>>>>>>> - I've checked the live instances and other znodes in ZooKeeper.
>>>>>>> Everything looks OK, except that
>>>>>>> /C8CEPCluster/EXTERNALVIEW/_mm:root:_system:cron2 has empty
>>>>>>> `mapFields` while /C8CEPCluster/EXTERNALVIEW/_mm:root:_system:cron3
>>>>>>> has `mapFields` with an ONLINE record. I still cannot understand why,
>>>>>>> or what I'm doing wrong :(
>>>>>>>
>>>>>>>
>>>>>>> *[zk: localhost:2181(CONNECTED) 18] get
>>>>>>> /C8CEPCluster/CONFIGS/CLUSTER/C8CEPCluster*
>>>>>>> {
>>>>>>>   "id" : "C8CEPCluster",
>>>>>>>   "simpleFields" : {
>>>>>>>     "allowParticipantAutoJoin" : "true"
>>>>>>>   },
>>>>>>>   "mapFields" : {
>>>>>>>     "DEFAULT_INSTANCE_CAPACITY_MAP" : {
>>>>>>>       "MEMORY" : "100",
>>>>>>>       "CPU" : "100"
>>>>>>>     },
>>>>>>>     "DEFAULT_PARTITION_WEIGHT_MAP" : {
>>>>>>>       "MEMORY" : "5",
>>>>>>>       "CPU" : "5"
>>>>>>>     }
>>>>>>>   },
>>>>>>>   "listFields" : {
>>>>>>>     "INSTANCE_CAPACITY_KEYS" : [ "CPU", "MEMORY" ]
>>>>>>>   }
>>>>>>> }
>>>>>>>
>>>>>>> *[zk: localhost:2181(CONNECTED) 8] get
>>>>>>> /C8CEPCluster/LIVEINSTANCES/c8cep-0.c8cep.c8.svc.cluster.local_12000*
>>>>>>> {
>>>>>>>   "id" : "c8cep-0.c8cep.c8.svc.cluster.local_12000",
>>>>>>>   "simpleFields" : {
>>>>>>>     "CURRENT_TASK_THREAD_POOL_SIZE" : "40",
>>>>>>>     "HELIX_VERSION" : "1.0.4",
>>>>>>>     "LIVE_INSTANCE" : "1@c8cep-0.c8cep.c8.svc.cluster.local",
>>>>>>>     "SESSION_ID" : "106a30539a8003e"
>>>>>>>   },
>>>>>>>   "mapFields" : { },
>>>>>>>   "listFields" : { }
>>>>>>> }
>>>>>>> [zk: localhost:2181(CONNECTED) 26] get
>>>>>>> /C8CEPCluster/CONFIGS/RESOURCE/_mm:root:_system:cron2
>>>>>>> {
>>>>>>>   "id" : "_mm:root:_system:cron2",
>>>>>>>   "simpleFields" : { },
>>>>>>>   "mapFields" : {
>>>>>>>     "PARTITION_CAPACITY_MAP" : {
>>>>>>>       "DEFAULT" : "{\"CPU\":\"10\",\"MEMORY\":\"10\"}"
>>>>>>>     }
>>>>>>>   },
>>>>>>>   "listFields" : { }
>>>>>>> }
>>>>>>>
>>>>>>> *[zk: localhost:2181(CONNECTED) 27] get
>>>>>>> /C8CEPCluster/CONFIGS/RESOURCE/_mm:root:_system:cron3*
>>>>>>> {
>>>>>>>   "id" : "_mm:root:_system:cron3",
>>>>>>>   "simpleFields" : { },
>>>>>>>   "mapFields" : {
>>>>>>>     "PARTITION_CAPACITY_MAP" : {
>>>>>>>       "DEFAULT" : "{\"CPU\":\"10\",\"MEMORY\":\"10\"}"
>>>>>>>     }
>>>>>>>   },
>>>>>>>   "listFields" : { }
>>>>>>> }
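
As a rough sanity check on the capacity figures quoted above (instance
capacity 100 for CPU and MEMORY, partition weight 10 for each), the sketch
below counts how many such partitions a single instance could hold. It
assumes the only capacity rule is that the summed partition weights must
stay within the instance capacity for every key; the class and method names
are hypothetical and no Helix API is used.

```java
import java.util.Map;

// Hypothetical helper for back-of-the-envelope capacity arithmetic.
final class CapacityHeadroom {

    /**
     * Largest number of equally weighted partitions that fit on one instance,
     * i.e. the minimum over all capacity keys of floor(capacity / weight).
     * A key missing from the instance capacity counts as capacity 0.
     * Returns Integer.MAX_VALUE if no key has a positive weight.
     */
    static int maxPartitionsThatFit(Map<String, Integer> instanceCapacity,
                                    Map<String, Integer> partitionWeight) {
        int fit = Integer.MAX_VALUE;
        for (Map.Entry<String, Integer> entry : partitionWeight.entrySet()) {
            int weight = entry.getValue();
            if (weight <= 0) {
                continue; // a zero weight does not consume this capacity key
            }
            int capacity = instanceCapacity.getOrDefault(entry.getKey(), 0);
            fit = Math.min(fit, capacity / weight);
        }
        return fit;
    }

    public static void main(String[] args) {
        // Values copied from the cluster and resource configs quoted above.
        int fit = maxPartitionsThatFit(
            Map.of("CPU", 100, "MEMORY", 100),
            Map.of("CPU", 10, "MEMORY", 10));
        System.out.println("partitions that fit on one instance: " + fit);
    }
}
```

With the quoted values this gives 10, so under that assumption capacity
alone would not explain partitions staying unassigned on a single instance.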
>>>>>>>
>>>>>>> *[zk: localhost:2181(CONNECTED) 38] get
>>>>>>> /C8CEPCluster/IDEALSTATES/_mm:root:_system:cron2*
>>>>>>> {
>>>>>>>   "id" : "_mm:root:_system:cron2",
>>>>>>>   "simpleFields" : {
>>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>>     "REBALANCER_CLASS_NAME" :
>>>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>>     "REPLICAS" : "1",
>>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>>   },
>>>>>>>   "mapFields" : {
>>>>>>>     "_mm:root:_system:cron2_0" : { }
>>>>>>>   },
>>>>>>>   "listFields" : {
>>>>>>>     "_mm:root:_system:cron2_0" : [ ]
>>>>>>>   }
>>>>>>> }
>>>>>>>
>>>>>>> *[zk: localhost:2181(CONNECTED) 39] get
>>>>>>> /C8CEPCluster/IDEALSTATES/_mm:root:_system:cron3*
>>>>>>> {
>>>>>>>   "id" : "_mm:root:_system:cron3",
>>>>>>>   "simpleFields" : {
>>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>>     "REBALANCER_CLASS_NAME" :
>>>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>>     "REPLICAS" : "1",
>>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>>   },
>>>>>>>   "mapFields" : {
>>>>>>>     "_mm:root:_system:cron3_0" : { }
>>>>>>>   },
>>>>>>>   "listFields" : {
>>>>>>>     "_mm:root:_system:cron3_0" : [ ]
>>>>>>>   }
>>>>>>> }
>>>>>>>
>>>>>>> *[zk: localhost:2181(CONNECTED) 42] get
>>>>>>> /C8CEPCluster/EXTERNALVIEW/_mm:root:_system:cron2*
>>>>>>> {
>>>>>>>   "id" : "_mm:root:_system:cron2",
>>>>>>>   "simpleFields" : {
>>>>>>>     "BUCKET_SIZE" : "0",
>>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>>     "REBALANCER_CLASS_NAME" :
>>>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>>     "REPLICAS" : "1",
>>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>>   },
>>>>>>>   *"mapFields" : { },*
>>>>>>>   "listFields" : { }
>>>>>>> }
>>>>>>>
>>>>>>> *[zk: localhost:2181(CONNECTED) 43] get
>>>>>>> /C8CEPCluster/EXTERNALVIEW/_mm:root:_system:cron3*
>>>>>>> {
>>>>>>>   "id" : "_mm:root:_system:cron3",
>>>>>>>   "simpleFields" : {
>>>>>>>     "BUCKET_SIZE" : "0",
>>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>>     "REBALANCER_CLASS_NAME" :
>>>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>>     "REPLICAS" : "1",
>>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>>   },
>>>>>>>   *"mapFields" : {*
>>>>>>> *    "_mm:root:_system:cron3_0" : {*
>>>>>>> *      "c8cep-0.c8cep.c8.svc.cluster.local_12000" : "ONLINE"*
>>>>>>> *    }*
>>>>>>> *  },*
>>>>>>>   "listFields" : { }
>>>>>>> }
>>>>>>>
>>>>>>> Thank you.
>>>>>>> Grainier Perera.
>>>>>>>
>>>>>>>
>>>>>>> On Sat, 18 Jun 2022 at 10:45, Junkai Xue <jx...@apache.org> wrote:
>>>>>>>
>>>>>>>> OK, so you don't set any rack-aware information. Then how many
>>>>>>>> instances do you have connected to that cluster? Please double-check
>>>>>>>> the live instances in ZooKeeper as well.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>>
>>>>>>>> Junkai
>>>>>>>>
>>>>>>>> On Fri, Jun 17, 2022 at 10:01 PM Grainier Perera <
>>>>>>>> grainier@apache.org> wrote:
>>>>>>>>
>>>>>>>>> Hi Junkai,
>>>>>>>>>
>>>>>>>>> I've added cluster init code to the gist [1]. Apart from that,
>>>>>>>>> ClusterConfig is configured like this:
>>>>>>>>>
>>>>>>>>>     ClusterConfig clusterConfig = configAccessor.getClusterConfig(CLUSTER_NAME);
>>>>>>>>>     // Capacity keys, set in the ClusterConfig. For example, MEMORY.
>>>>>>>>>     clusterConfig.setInstanceCapacityKeys(INSTANCE_CAPACITY_KEYS);
>>>>>>>>>     // Default instance capacity, set in the ClusterConfig. For example, MEMORY = 100.
>>>>>>>>>     clusterConfig.setDefaultInstanceCapacityMap(INSTANCE_CAPACITY);
>>>>>>>>>     // Default partition weight, set in the ClusterConfig. For example, MEMORY = 5.
>>>>>>>>>     clusterConfig.setDefaultPartitionWeightMap(DEFAULT_RESOURCE_USAGE);
>>>>>>>>>     configAccessor.setClusterConfig(CLUSTER_NAME, clusterConfig);
>>>>>>>>>
>>>>>>>>> [1]
>>>>>>>>> https://gist.github.com/grainier/aa1c0b279ea99f88d74c1e94d79f5cdb#file-clustersetup-java
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Grainier Perera.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sat, 18 Jun 2022 at 10:00, Junkai Xue <jx...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Could you please share your cluster config as well?
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>>
>>>>>>>>>> Junkai
>>>>>>>>>>
>>>>>>>>>> On Fri, Jun 17, 2022 at 8:24 PM Grainier Perera <
>>>>>>>>>> grainier@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Devs,
>>>>>>>>>>>
>>>>>>>>>>> I'm trying to add several resources to the cluster using the
>>>>>>>>>>> following configurations[1]. However, only some will become `ONLINE`. What
>>>>>>>>>>> could be the reason? Is there a way to guarantee every resource will become
>>>>>>>>>>> `ONLINE` if WAGED capacity constraints are met?
>>>>>>>>>>>
>>>>>>>>>>> You can see with the same IdealState, "_mm:root:_system:cron3"
>>>>>>>>>>> has mapFields and it is ONLINE, and "_mm:root:_system:cron2" is
>>>>>>>>>>> not. Furthermore, I see this behavior more often when the replicas count is
>>>>>>>>>>> set to 1.
>>>>>>>>>>>
>>>>>>>>>>> ResourceInfo:
>>>>>>>>>>> 1. "_mm:root:_system:cron2"
>>>>>>>>>>>
>>>>>>>>>>> IdealState for _mm:root:_system:cron2:
>>>>>>>>>>> {
>>>>>>>>>>>   "id" : "_mm:root:_system:cron2",
>>>>>>>>>>>   "simpleFields" : {
>>>>>>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>>>>>>     "REBALANCER_CLASS_NAME" :
>>>>>>>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>>>>>>     "REPLICAS" : "1",
>>>>>>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>>>>>>   },
>>>>>>>>>>>   "mapFields" : {
>>>>>>>>>>>     "_mm:root:_system:cron2_0" : { }
>>>>>>>>>>>   },
>>>>>>>>>>>   "listFields" : {
>>>>>>>>>>>     "_mm:root:_system:cron2_0" : [ ]
>>>>>>>>>>>   }
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ExternalView for _mm:root:_system:cron2:
>>>>>>>>>>> {
>>>>>>>>>>>   "id" : "_mm:root:_system:cron2",
>>>>>>>>>>>   "simpleFields" : {
>>>>>>>>>>>     "BUCKET_SIZE" : "0",
>>>>>>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>>>>>>     "REBALANCER_CLASS_NAME" :
>>>>>>>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>>>>>>     "REPLICAS" : "1",
>>>>>>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>>>>>>   },
>>>>>>>>>>>   *"mapFields" : { },*
>>>>>>>>>>>   "listFields" : { }
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 2. "_mm:root:_system:cron3"
>>>>>>>>>>>
>>>>>>>>>>> IdealState for _mm:root:_system:cron3:
>>>>>>>>>>> {
>>>>>>>>>>>   "id" : "_mm:root:_system:cron3",
>>>>>>>>>>>   "simpleFields" : {
>>>>>>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>>>>>>     "REBALANCER_CLASS_NAME" :
>>>>>>>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>>>>>>     "REPLICAS" : "1",
>>>>>>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>>>>>>   },
>>>>>>>>>>>   "mapFields" : {
>>>>>>>>>>>     "_mm:root:_system:cron3_0" : { }
>>>>>>>>>>>   },
>>>>>>>>>>>   "listFields" : {
>>>>>>>>>>>     "_mm:root:_system:cron3_0" : [ ]
>>>>>>>>>>>   }
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ExternalView for _mm:root:_system:cron3:
>>>>>>>>>>> {
>>>>>>>>>>>   "id" : "_mm:root:_system:cron3",
>>>>>>>>>>>   "simpleFields" : {
>>>>>>>>>>>     "BUCKET_SIZE" : "0",
>>>>>>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>>>>>>     "REBALANCER_CLASS_NAME" :
>>>>>>>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>>>>>>     "REPLICAS" : "1",
>>>>>>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>>>>>>   },
>>>>>>>>>>>   *"mapFields" : {*
>>>>>>>>>>> *    "_mm:root:_system:cron3_0" : {*
>>>>>>>>>>> *      "c8cep-0.c8cep.c8.svc.cluster.local_12000" : "ONLINE"*
>>>>>>>>>>> *    }*
>>>>>>>>>>> *  },*
>>>>>>>>>>>   "listFields" : { }
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> [1]:
>>>>>>>>>>> https://gist.github.com/grainier/aa1c0b279ea99f88d74c1e94d79f5cdb
>>>>>>>>>>>
>>>>>>>>>>> Thank you.
>>>>>>>>>>> Grainier Perera.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Junkai Xue
>>>>>>
>>>>>
>
> --
> Junkai Xue
>

Re: Some resources won't become online

Posted by Junkai Xue <ju...@gmail.com>.
Ah, I found the problem. I would suggest enabling this entry in the
cluster config: "PERSIST_INTERMEDIATE_ASSIGNMENT":"true"

It will show you, in the IdealState, the assignment Helix computes for
FULL_AUTO. Once you enable it, you can see which instance each resource
should be assigned to. With that, it is very clear that your code adds the
controller instance as a participant.

For me, resource4 is assigned to the controller, which does not accept
partition bootstrap:

{
  "id" : "resource4",
  "simpleFields" : {
    "DELAY_REBALANCE_ENABLED" : "true",
    "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
    "NUM_PARTITIONS" : "1",
    "REBALANCER_CLASS_NAME" :
"org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
    "REBALANCE_DELAY" : "1000",
    "REBALANCE_MODE" : "FULL_AUTO",
    "REPLICAS" : "1",
    "STATE_MODEL_DEF_REF" : "OnlineOffline"
  },
  "mapFields" : {
    "resource4_0" : {
      "CEPControllerName-16e8ca90-df6f-4252-9ce8-3efdcce24f4a" : "ONLINE"
    }
  },
  "listFields" : {
    "resource4_0" : [
"CEPControllerName-16e8ca90-df6f-4252-9ce8-3efdcce24f4a" ]
  }
}

Please try it on your side, and do not register the controller as a
participant in that cluster.
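
The check described above can be sketched in plain Java. This is only an
illustration under stated assumptions: it does not call Helix, the class
and method names are hypothetical, and it assumes the preference lists from
the persisted IdealState (`listFields`) and the names of the instances that
really host partitions have already been read into ordinary collections.
The resource3 entry is made-up input for contrast.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical helper for illustration; not part of Helix. */
final class MisassignedResourceFinder {

    /**
     * @param preferenceLists  resource name -> instances taken from the persisted IdealState
     * @param realParticipants instances that actually host partitions
     * @return resource name -> assigned instances that are not real participants
     */
    static Map<String, List<String>> findMisassigned(
            Map<String, List<String>> preferenceLists, Set<String> realParticipants) {
        Map<String, List<String>> result = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : preferenceLists.entrySet()) {
            List<String> bad = new ArrayList<>();
            for (String instance : entry.getValue()) {
                if (!realParticipants.contains(instance)) {
                    bad.add(instance);
                }
            }
            // Only report resources that have at least one bad assignment.
            if (!bad.isEmpty()) {
                result.put(entry.getKey(), bad);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative input: resource4 mirrors the IdealState shown above.
        Map<String, List<String>> preferenceLists = Map.of(
            "resource3", List.of("c8cep_on_localhost_12001"),
            "resource4", List.of("CEPControllerName-16e8ca90-df6f-4252-9ce8-3efdcce24f4a"));
        Set<String> participants = Set.of(
            "c8cep_on_localhost_12000",
            "c8cep_on_localhost_12001",
            "c8cep_on_localhost_12002");
        System.out.println(findMisassigned(preferenceLists, participants));
    }
}
```

With this input only resource4 is reported, because its assigned instance
is the controller name rather than one of the three participants.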

best,

Junkai

On Sat, Jun 18, 2022 at 9:49 PM Grainier Perera <gr...@apache.org> wrote:

> Hi Junkai,
>
> This is reproducible. Please find the sample code [1]. With this sample;
>
>    - Initially, I'm creating a cluster with 3 instances (Using OOTB
>    `OnlineOfflineStateModelFactory` and WAGED rebalancer...)
>    - Step 1: Adds 6 different resources to the cluster with 1 partition
>    and 1 replica each.
>    - Step 2: Adds an additional instance to the cluster.
>    - Step 3: Removes an existing instance from the cluster.
>    - Step 4: Remove all resources.
>
> However, after Step 1, you can see resource1 and resource2 is not getting
> assigned to any Instance.
>             c8cep_on_localhost_12000  c8cep_on_localhost_12001  c8cep_on_localhost_12002
> resource1   -                         -                         -
> resource2   ONLINE                    -                         -
> resource3   -                         ONLINE                    -
> resource4   -                         -                         ONLINE
> resource5   -                         -                         -
> resource6   ONLINE                    -                         -
> After other steps also, not every resource is getting rebalanced properly.
>
> [1] https://gist.github.com/grainier/055511179d8b4a4f0c678f17889ed853
>
> Thanks,
> Grainier Perera.
>
>
> On Sun, 19 Jun 2022 at 08:32, Junkai Xue <jx...@apache.org> wrote:
>
>> BTW, have you setup proper capacity in InstanceConfig of the only
>> instance?
>>
>> Best,
>>
>> Junkai
>>
>> On Sat, Jun 18, 2022 at 7:10 PM Junkai Xue <jx...@apache.org> wrote:
>>
>>> Interesting. Is this reproducible? We can have a try on your data.
>>>
>>> Best,
>>>
>>> Junkai
>>>
>>> On Sat, Jun 18, 2022 at 4:31 AM Grainier Perera <gr...@apache.org>
>>> wrote:
>>>
>>>> Hi Junkai,
>>>>
>>>> I tried removing `MAX_PARTITIONS_PER_INSTANCE`. But it's still the
>>>> same. What's weird is, when I add a few resources, I see some of them still
>>>> not getting into the `ONLINE` state. In the below sample, you can see only
>>>> the 2nd and 4th resources have proper `mapFields`, whereas the 1st and 3rd
>>>> resources don't seem to have any mapping (all of them have the
>>>> same IdealState). However, after a restart, this can change to 1 & 3
>>>> becomes `ONLINE` and 2 & 3 may lose their mapping. But the pattern
>>>> remains... cannot understand why.
>>>>
>>>>
>>>> *ExternalView for _mm:root:_system:cron1:*
>>>> {
>>>>   "id" : "_mm:root:_system:cron1",
>>>>   "simpleFields" : {
>>>>     "BUCKET_SIZE" : "0",
>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>     "NUM_PARTITIONS" : "1",
>>>>     "REBALANCER_CLASS_NAME" :
>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>     "REBALANCE_DELAY" : "10000",
>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>     "REPLICAS" : "1",
>>>>     "STATE_MODEL_DEF_REF" : "NewC8CEPStateModel"
>>>>   },
>>>>   *"mapFields" : { },*
>>>>   "listFields" : { }
>>>> }
>>>>
>>>>
>>>> *ExternalView for _mm:root:_system:cron2:*
>>>> {
>>>>   "id" : "_mm:root:_system:cron2",
>>>>   "simpleFields" : {
>>>>     "BUCKET_SIZE" : "0",
>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>     "NUM_PARTITIONS" : "1",
>>>>     "REBALANCER_CLASS_NAME" :
>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>     "REBALANCE_DELAY" : "10000",
>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>     "REPLICAS" : "1",
>>>>     "STATE_MODEL_DEF_REF" : "NewC8CEPStateModel"
>>>>   },
>>>>   *"mapFields" : {*
>>>> *    "_mm:root:_system:cron2_0" : {*
>>>> *      "c8cep-0.c8cep.c8.svc.cluster.local_12000" : "ONLINE"*
>>>> *    }*
>>>> *  },*
>>>>   "listFields" : { }
>>>> }
>>>>
>>>>
>>>> *ExternalView for _mm:root:_system:cron3:*
>>>> {
>>>>   "id" : "_mm:root:_system:cron3",
>>>>   "simpleFields" : {
>>>>     "BUCKET_SIZE" : "0",
>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>     "NUM_PARTITIONS" : "1",
>>>>     "REBALANCER_CLASS_NAME" :
>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>     "REBALANCE_DELAY" : "10000",
>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>     "REPLICAS" : "1",
>>>>     "STATE_MODEL_DEF_REF" : "NewC8CEPStateModel"
>>>>   },
>>>>   *"mapFields" : { },*
>>>>   "listFields" : { }
>>>> }
>>>>
>>>>
>>>> *ExternalView for _mm:root:_system:cron4:*
>>>> {
>>>>   "id" : "_mm:root:_system:cron4",
>>>>   "simpleFields" : {
>>>>     "BUCKET_SIZE" : "0",
>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>     "NUM_PARTITIONS" : "1",
>>>>     "REBALANCER_CLASS_NAME" :
>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>     "REBALANCE_DELAY" : "10000",
>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>     "REPLICAS" : "1",
>>>>     "STATE_MODEL_DEF_REF" : "NewC8CEPStateModel"
>>>>   },
>>>>   *"mapFields" : {*
>>>> *    "_mm:root:_system:cron4_0" : {*
>>>> *      "c8cep-0.c8cep.c8.svc.cluster.local_12000" : "ONLINE"*
>>>> *    }*
>>>> *  },*
>>>>   "listFields" : { }
>>>> }
>>>>
>>>> Thanks,
>>>> Grainier Perera.
>>>>
>>>>
>>>> On Sat, 18 Jun 2022 at 13:21, Junkai Xue <ju...@gmail.com> wrote:
>>>>
>>>>> Then most likely, it caused by this entry of config:
>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>> Usually, we never set this config up. It restricts the assignment for
>>>>> instance. So now you already have one partition from 3_0 assigned. No other
>>>>> partition can be assigned.
>>>>>
>>>>> So either you remove this entry of config setup or add more instances
>>>>> may help.
>>>>>
>>>>> Please let us know if you have further questions.
>>>>>
>>>>> best,
>>>>>
>>>>> Junkai
>>>>>
>>>>> On Fri, Jun 17, 2022 at 11:38 PM Grainier Perera <gr...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Hi Junkai,
>>>>>>
>>>>>> - Correct. I haven't added any rack-aware information.
>>>>>> - I'm connecting 1 instance at the startup and then expanding
>>>>>> on-demand (I've set ALLOW_PARTICIPANT_AUTO_JOIN to true).
>>>>>> - I've checked the live instances and other znodes in Zookeeper.
>>>>>> Everything looks ok, except
>>>>>> /C8CEPCluster/EXTERNALVIEW/_mm:root:_system:cron2 has empty
>>>>>> `mapFields` while /C8CEPCluster/EXTERNALVIEW/_mm:root:_system:cron3
>>>>>> has `mapFields` with a ONLINE record. I still cannot understand why? and
>>>>>> what I'm doing wrong :(
>>>>>>
>>>>>>
>>>>>> *[zk: localhost:2181(CONNECTED) 18] get
>>>>>> /C8CEPCluster/CONFIGS/CLUSTER/C8CEPCluster*
>>>>>> {
>>>>>>   "id" : "C8CEPCluster",
>>>>>>   "simpleFields" : {
>>>>>>     "allowParticipantAutoJoin" : "true"
>>>>>>   },
>>>>>>   "mapFields" : {
>>>>>>     "DEFAULT_INSTANCE_CAPACITY_MAP" : {
>>>>>>       "MEMORY" : "100",
>>>>>>       "CPU" : "100"
>>>>>>     },
>>>>>>     "DEFAULT_PARTITION_WEIGHT_MAP" : {
>>>>>>       "MEMORY" : "5",
>>>>>>       "CPU" : "5"
>>>>>>     }
>>>>>>   },
>>>>>>   "listFields" : {
>>>>>>     "INSTANCE_CAPACITY_KEYS" : [ "CPU", "MEMORY" ]
>>>>>>   }
>>>>>> }
>>>>>>
>>>>>> *[zk: localhost:2181(CONNECTED) 8] get
>>>>>> /C8CEPCluster/LIVEINSTANCES/c8cep-0.c8cep.c8.svc.cluster.local_12000*
>>>>>> {
>>>>>>   "id" : "c8cep-0.c8cep.c8.svc.cluster.local_12000",
>>>>>>   "simpleFields" : {
>>>>>>     "CURRENT_TASK_THREAD_POOL_SIZE" : "40",
>>>>>>     "HELIX_VERSION" : "1.0.4",
>>>>>>     "LIVE_INSTANCE" : "1@c8cep-0.c8cep.c8.svc.cluster.local",
>>>>>>     "SESSION_ID" : "106a30539a8003e"
>>>>>>   },
>>>>>>   "mapFields" : { },
>>>>>>   "listFields" : { }
>>>>>> }
>>>>>> [zk: localhost:2181(CONNECTED) 26] get
>>>>>> /C8CEPCluster/CONFIGS/RESOURCE/_mm:root:_system:cron2
>>>>>> {
>>>>>>   "id" : "_mm:root:_system:cron2",
>>>>>>   "simpleFields" : { },
>>>>>>   "mapFields" : {
>>>>>>     "PARTITION_CAPACITY_MAP" : {
>>>>>>       "DEFAULT" : "{\"CPU\":\"10\",\"MEMORY\":\"10\"}"
>>>>>>     }
>>>>>>   },
>>>>>>   "listFields" : { }
>>>>>> }
>>>>>>
>>>>>> *[zk: localhost:2181(CONNECTED) 27] get
>>>>>> /C8CEPCluster/CONFIGS/RESOURCE/_mm:root:_system:cron3*
>>>>>> {
>>>>>>   "id" : "_mm:root:_system:cron3",
>>>>>>   "simpleFields" : { },
>>>>>>   "mapFields" : {
>>>>>>     "PARTITION_CAPACITY_MAP" : {
>>>>>>       "DEFAULT" : "{\"CPU\":\"10\",\"MEMORY\":\"10\"}"
>>>>>>     }
>>>>>>   },
>>>>>>   "listFields" : { }
>>>>>> }
>>>>>>
>>>>>> *[zk: localhost:2181(CONNECTED) 38] get
>>>>>> /C8CEPCluster/IDEALSTATES/_mm:root:_system:cron2*
>>>>>> {
>>>>>>   "id" : "_mm:root:_system:cron2",
>>>>>>   "simpleFields" : {
>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>     "REBALANCER_CLASS_NAME" :
>>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>     "REPLICAS" : "1",
>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>   },
>>>>>>   "mapFields" : {
>>>>>>     "_mm:root:_system:cron2_0" : { }
>>>>>>   },
>>>>>>   "listFields" : {
>>>>>>     "_mm:root:_system:cron2_0" : [ ]
>>>>>>   }
>>>>>> }
>>>>>>
>>>>>> *[zk: localhost:2181(CONNECTED) 39] get
>>>>>> /C8CEPCluster/IDEALSTATES/_mm:root:_system:cron3*
>>>>>> {
>>>>>>   "id" : "_mm:root:_system:cron3",
>>>>>>   "simpleFields" : {
>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>     "REBALANCER_CLASS_NAME" :
>>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>     "REPLICAS" : "1",
>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>   },
>>>>>>   "mapFields" : {
>>>>>>     "_mm:root:_system:cron3_0" : { }
>>>>>>   },
>>>>>>   "listFields" : {
>>>>>>     "_mm:root:_system:cron3_0" : [ ]
>>>>>>   }
>>>>>> }
>>>>>>
>>>>>> *[zk: localhost:2181(CONNECTED) 42] get
>>>>>> /C8CEPCluster/EXTERNALVIEW/_mm:root:_system:cron2*
>>>>>> {
>>>>>>   "id" : "_mm:root:_system:cron2",
>>>>>>   "simpleFields" : {
>>>>>>     "BUCKET_SIZE" : "0",
>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>     "REBALANCER_CLASS_NAME" :
>>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>     "REPLICAS" : "1",
>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>   },
>>>>>>   *"mapFields" : { },*
>>>>>>   "listFields" : { }
>>>>>> }
>>>>>>
>>>>>> *[zk: localhost:2181(CONNECTED) 43] get
>>>>>> /C8CEPCluster/EXTERNALVIEW/_mm:root:_system:cron3*
>>>>>> {
>>>>>>   "id" : "_mm:root:_system:cron3",
>>>>>>   "simpleFields" : {
>>>>>>     "BUCKET_SIZE" : "0",
>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>     "REBALANCER_CLASS_NAME" :
>>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>     "REPLICAS" : "1",
>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>   },
>>>>>>   *"mapFields" : {*
>>>>>> *    "_mm:root:_system:cron3_0" : {*
>>>>>> *      "c8cep-0.c8cep.c8.svc.cluster.local_12000" : "ONLINE"*
>>>>>> *    }*
>>>>>> *  },*
>>>>>>   "listFields" : { }
>>>>>> }
>>>>>>
>>>>>> Thank you.
>>>>>> Grainier Perera.
>>>>>>
>>>>>>
>>>>>> On Sat, 18 Jun 2022 at 10:45, Junkai Xue <jx...@apache.org> wrote:
>>>>>>
>>>>>>> OK. So you dont put any rackaware information. Then how many
>>>>>>> instances do you have connecting to that cluster? Please double check the
>>>>>>> live instances in Zookeeper as well.
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Junkai
>>>>>>>
>>>>>>> On Fri, Jun 17, 2022 at 10:01 PM Grainier Perera <
>>>>>>> grainier@apache.org> wrote:
>>>>>>>
>>>>>>>> Hi Junkai,
>>>>>>>>
>>>>>>>> I've added cluster init code to the gist [1]. Apart from that,
>>>>>>>> ClusterConfig is configured like this;
>>>>>>>>
>>>>>>>>             ClusterConfig clusterConfig =
>>>>>>>> configAccessor.getClusterConfig(CLUSTER_NAME);
>>>>>>>>             // Configuring the capacity keys in the Cluster Config.
>>>>>>>> For example, MEMORY.
>>>>>>>>
>>>>>>>> clusterConfig.setInstanceCapacityKeys(INSTANCE_CAPACITY_KEYS);
>>>>>>>>             // Configuring the instance capacity in the Instance
>>>>>>>> Config. For example, MEMORY = 100.
>>>>>>>>
>>>>>>>> clusterConfig.setDefaultInstanceCapacityMap(INSTANCE_CAPACITY);
>>>>>>>>             // Configuring the partition weight in the Resource
>>>>>>>> Config. For example, MEMORY = 5.
>>>>>>>>
>>>>>>>> clusterConfig.setDefaultPartitionWeightMap(DEFAULT_RESOURCE_USAGE);
>>>>>>>>             configAccessor.setClusterConfig(CLUSTER_NAME,
>>>>>>>> clusterConfig);
>>>>>>>>
>>>>>>>> [1]
>>>>>>>> https://gist.github.com/grainier/aa1c0b279ea99f88d74c1e94d79f5cdb#file-clustersetup-java
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Grainier Perera.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sat, 18 Jun 2022 at 10:00, Junkai Xue <jx...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> Could you please share your cluster config as well?
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>>
>>>>>>>>> Junkai
>>>>>>>>>
>>>>>>>>> On Fri, Jun 17, 2022 at 8:24 PM Grainier Perera <
>>>>>>>>> grainier@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Devs,
>>>>>>>>>>
>>>>>>>>>> I'm trying to add several resources to the cluster using the
>>>>>>>>>> following configurations[1]. However, only some will become `ONLINE`. What
>>>>>>>>>> could be the reason? Is there a way to guarantee every resource will become
>>>>>>>>>> `ONLINE` if WAGED capacity constraints are met?
>>>>>>>>>>
>>>>>>>>>> You can see with the same IdealState, "_mm:root:_system:cron3"
>>>>>>>>>> has mapFields and it is ONLINE, and "_mm:root:_system:cron2" is
>>>>>>>>>> not. Furthermore, I see this behavior more often when the replicas count is
>>>>>>>>>> set to 1.
>>>>>>>>>>
>>>>>>>>>> ResourceInfo:
>>>>>>>>>> 1. "_mm:root:_system:cron2"
>>>>>>>>>>
>>>>>>>>>> IdealState for _mm:root:_system:cron2:
>>>>>>>>>> {
>>>>>>>>>>   "id" : "_mm:root:_system:cron2",
>>>>>>>>>>   "simpleFields" : {
>>>>>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>>>>>     "REBALANCER_CLASS_NAME" :
>>>>>>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>>>>>     "REPLICAS" : "1",
>>>>>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>>>>>   },
>>>>>>>>>>   "mapFields" : {
>>>>>>>>>>     "_mm:root:_system:cron2_0" : { }
>>>>>>>>>>   },
>>>>>>>>>>   "listFields" : {
>>>>>>>>>>     "_mm:root:_system:cron2_0" : [ ]
>>>>>>>>>>   }
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ExternalView for _mm:root:_system:cron2:
>>>>>>>>>> {
>>>>>>>>>>   "id" : "_mm:root:_system:cron2",
>>>>>>>>>>   "simpleFields" : {
>>>>>>>>>>     "BUCKET_SIZE" : "0",
>>>>>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>>>>>     "REBALANCER_CLASS_NAME" :
>>>>>>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>>>>>     "REPLICAS" : "1",
>>>>>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>>>>>   },
>>>>>>>>>>   *"mapFields" : { },*
>>>>>>>>>>   "listFields" : { }
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2. "_mm:root:_system:cron3"
>>>>>>>>>>
>>>>>>>>>> IdealState for _mm:root:_system:cron3:
>>>>>>>>>> {
>>>>>>>>>>   "id" : "_mm:root:_system:cron3",
>>>>>>>>>>   "simpleFields" : {
>>>>>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>>>>>     "REBALANCER_CLASS_NAME" :
>>>>>>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>>>>>     "REPLICAS" : "1",
>>>>>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>>>>>   },
>>>>>>>>>>   "mapFields" : {
>>>>>>>>>>     "_mm:root:_system:cron3_0" : { }
>>>>>>>>>>   },
>>>>>>>>>>   "listFields" : {
>>>>>>>>>>     "_mm:root:_system:cron3_0" : [ ]
>>>>>>>>>>   }
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ExternalView for _mm:root:_system:cron3:
>>>>>>>>>> {
>>>>>>>>>>   "id" : "_mm:root:_system:cron3",
>>>>>>>>>>   "simpleFields" : {
>>>>>>>>>>     "BUCKET_SIZE" : "0",
>>>>>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>>>>>     "REBALANCER_CLASS_NAME" :
>>>>>>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>>>>>     "REPLICAS" : "1",
>>>>>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>>>>>   },
>>>>>>>>>>   *"mapFields" : {*
>>>>>>>>>> *    "_mm:root:_system:cron3_0" : {*
>>>>>>>>>> *      "c8cep-0.c8cep.c8.svc.cluster.local_12000" : "ONLINE"*
>>>>>>>>>> *    }*
>>>>>>>>>> *  },*
>>>>>>>>>>   "listFields" : { }
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> [1]:
>>>>>>>>>> https://gist.github.com/grainier/aa1c0b279ea99f88d74c1e94d79f5cdb
>>>>>>>>>>
>>>>>>>>>> Thank you.
>>>>>>>>>> Grainier Perera.
>>>>>>>>>>
>>>>>>>>>
>>>>>
>>>>> --
>>>>> Junkai Xue
>>>>>
>>>>

-- 
Junkai Xue

Re: Some resources won't become online

Posted by Grainier Perera <gr...@apache.org>.
Hi Junkai,

This is reproducible. Please find the sample code at [1]. With this sample:

   - Initially, I'm creating a cluster with 3 instances (Using OOTB
   `OnlineOfflineStateModelFactory` and WAGED rebalancer...)
   - Step 1: Adds 6 different resources to the cluster with 1 partition and
   1 replica each.
   - Step 2: Adds an additional instance to the cluster.
   - Step 3: Removes an existing instance from the cluster.
   - Step 4: Removes all resources.

However, after Step 1, you can see that resource1 and resource2 are not
getting assigned to any instance:
            c8cep_on_localhost_12000  c8cep_on_localhost_12001  c8cep_on_localhost_12002
resource1   -                         -                         -
resource2   ONLINE                    -                         -
resource3   -                         ONLINE                    -
resource4   -                         -                         ONLINE
resource5   -                         -                         -
resource6   ONLINE                    -                         -
After the other steps as well, not every resource gets rebalanced properly.
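
A quick way to spot such resources is to scan the ExternalView `mapFields`
for entries with no `ONLINE` replica. The following is a minimal sketch in
plain Java, not Helix API: the class and method names are hypothetical, the
partition names are assumed to follow the `<resource>_0` pattern, and the
input simply encodes the table above instead of being read from ZooKeeper.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical helper for illustration; not part of Helix. */
final class UnassignedResourceReport {

    /**
     * @param externalViews resource -> (partition -> (instance -> state)),
     *                      i.e. the mapFields of each resource's ExternalView
     * @return sorted names of resources that have no replica in the ONLINE state
     */
    static List<String> resourcesWithoutOnlineReplica(
            Map<String, Map<String, Map<String, String>>> externalViews) {
        return externalViews.entrySet().stream()
            .filter(e -> e.getValue().values().stream()
                .flatMap(instanceStates -> instanceStates.values().stream())
                .noneMatch("ONLINE"::equals))
            .map(Map.Entry::getKey)
            .sorted()
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Input taken from the table above; an empty map means empty mapFields.
        Map<String, Map<String, Map<String, String>>> views = new TreeMap<>();
        views.put("resource1", Map.of());
        views.put("resource2", Map.of("resource2_0", Map.of("c8cep_on_localhost_12000", "ONLINE")));
        views.put("resource3", Map.of("resource3_0", Map.of("c8cep_on_localhost_12001", "ONLINE")));
        views.put("resource4", Map.of("resource4_0", Map.of("c8cep_on_localhost_12002", "ONLINE")));
        views.put("resource5", Map.of());
        views.put("resource6", Map.of("resource6_0", Map.of("c8cep_on_localhost_12000", "ONLINE")));
        System.out.println(resourcesWithoutOnlineReplica(views));
    }
}
```

The method returns every resource whose rows contain only "-" in the table,
so it can be rerun after each step to see which resources remain unassigned.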

[1] https://gist.github.com/grainier/055511179d8b4a4f0c678f17889ed853

Thanks,
Grainier Perera.


On Sun, 19 Jun 2022 at 08:32, Junkai Xue <jx...@apache.org> wrote:

> BTW, have you setup proper capacity in InstanceConfig of the only instance?
>
> Best,
>
> Junkai
>
> On Sat, Jun 18, 2022 at 7:10 PM Junkai Xue <jx...@apache.org> wrote:
>
>> Interesting. Is this reproducible? We can have a try on your data.
>>
>> Best,
>>
>> Junkai
>>
>> On Sat, Jun 18, 2022 at 4:31 AM Grainier Perera <gr...@apache.org>
>> wrote:
>>
>>> Hi Junkai,
>>>
>>> I tried removing `MAX_PARTITIONS_PER_INSTANCE`. But it's still the
>>> same. What's weird is, when I add a few resources, I see some of them still
>>> not getting into the `ONLINE` state. In the below sample, you can see only
>>> the 2nd and 4th resources have proper `mapFields`, whereas the 1st and 3rd
>>> resources don't seem to have any mapping (all of them have the
>>> same IdealState). However, after a restart, this can change to 1 & 3
>>> becomes `ONLINE` and 2 & 3 may lose their mapping. But the pattern
>>> remains... cannot understand why.
>>>
>>>
>>> *ExternalView for _mm:root:_system:cron1:*
>>> {
>>>   "id" : "_mm:root:_system:cron1",
>>>   "simpleFields" : {
>>>     "BUCKET_SIZE" : "0",
>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>     "NUM_PARTITIONS" : "1",
>>>     "REBALANCER_CLASS_NAME" :
>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>     "REBALANCE_DELAY" : "10000",
>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>     "REPLICAS" : "1",
>>>     "STATE_MODEL_DEF_REF" : "NewC8CEPStateModel"
>>>   },
>>>   *"mapFields" : { },*
>>>   "listFields" : { }
>>> }
>>>
>>>
>>> *ExternalView for _mm:root:_system:cron2:*
>>> {
>>>   "id" : "_mm:root:_system:cron2",
>>>   "simpleFields" : {
>>>     "BUCKET_SIZE" : "0",
>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>     "NUM_PARTITIONS" : "1",
>>>     "REBALANCER_CLASS_NAME" :
>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>     "REBALANCE_DELAY" : "10000",
>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>     "REPLICAS" : "1",
>>>     "STATE_MODEL_DEF_REF" : "NewC8CEPStateModel"
>>>   },
>>>   *"mapFields" : {*
>>> *    "_mm:root:_system:cron2_0" : {*
>>> *      "c8cep-0.c8cep.c8.svc.cluster.local_12000" : "ONLINE"*
>>> *    }*
>>> *  },*
>>>   "listFields" : { }
>>> }
>>>
>>>
>>> *ExternalView for _mm:root:_system:cron3:*
>>> {
>>>   "id" : "_mm:root:_system:cron3",
>>>   "simpleFields" : {
>>>     "BUCKET_SIZE" : "0",
>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>     "NUM_PARTITIONS" : "1",
>>>     "REBALANCER_CLASS_NAME" :
>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>     "REBALANCE_DELAY" : "10000",
>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>     "REPLICAS" : "1",
>>>     "STATE_MODEL_DEF_REF" : "NewC8CEPStateModel"
>>>   },
>>>   *"mapFields" : { },*
>>>   "listFields" : { }
>>> }
>>>
>>>
>>> *ExternalView for _mm:root:_system:cron4:*
>>> {
>>>   "id" : "_mm:root:_system:cron4",
>>>   "simpleFields" : {
>>>     "BUCKET_SIZE" : "0",
>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>     "NUM_PARTITIONS" : "1",
>>>     "REBALANCER_CLASS_NAME" :
>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>     "REBALANCE_DELAY" : "10000",
>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>     "REPLICAS" : "1",
>>>     "STATE_MODEL_DEF_REF" : "NewC8CEPStateModel"
>>>   },
>>>   *"mapFields" : {*
>>> *    "_mm:root:_system:cron4_0" : {*
>>> *      "c8cep-0.c8cep.c8.svc.cluster.local_12000" : "ONLINE"*
>>> *    }*
>>> *  },*
>>>   "listFields" : { }
>>> }
>>>
>>> Thanks,
>>> Grainier Perera.
>>>
>>>
>>> On Sat, 18 Jun 2022 at 13:21, Junkai Xue <ju...@gmail.com> wrote:
>>>
>>>> Then most likely, it caused by this entry of config:
>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>> Usually, we never set this config up. It restricts the assignment for
>>>> instance. So now you already have one partition from 3_0 assigned. No other
>>>> partition can be assigned.
>>>>
>>>> So either you remove this entry of config setup or add more instances
>>>> may help.
>>>>
>>>> Please let us know if you have further questions.
>>>>
>>>> best,
>>>>
>>>> Junkai
>>>>
>>>> On Fri, Jun 17, 2022 at 11:38 PM Grainier Perera <gr...@apache.org>
>>>> wrote:
>>>>
>>>>> Hi Junkai,
>>>>>
>>>>> - Correct. I haven't added any rack-aware information.
>>>>> - I'm connecting 1 instance at the startup and then expanding
>>>>> on-demand (I've set ALLOW_PARTICIPANT_AUTO_JOIN to true).
>>>>> - I've checked the live instances and other znodes in Zookeeper.
>>>>> Everything looks ok, except
>>>>> /C8CEPCluster/EXTERNALVIEW/_mm:root:_system:cron2 has empty
>>>>> `mapFields` while /C8CEPCluster/EXTERNALVIEW/_mm:root:_system:cron3
>>>>> has `mapFields` with a ONLINE record. I still cannot understand why? and
>>>>> what I'm doing wrong :(
>>>>>
>>>>>
>>>>> *[zk: localhost:2181(CONNECTED) 18] get
>>>>> /C8CEPCluster/CONFIGS/CLUSTER/C8CEPCluster*
>>>>> {
>>>>>   "id" : "C8CEPCluster",
>>>>>   "simpleFields" : {
>>>>>     "allowParticipantAutoJoin" : "true"
>>>>>   },
>>>>>   "mapFields" : {
>>>>>     "DEFAULT_INSTANCE_CAPACITY_MAP" : {
>>>>>       "MEMORY" : "100",
>>>>>       "CPU" : "100"
>>>>>     },
>>>>>     "DEFAULT_PARTITION_WEIGHT_MAP" : {
>>>>>       "MEMORY" : "5",
>>>>>       "CPU" : "5"
>>>>>     }
>>>>>   },
>>>>>   "listFields" : {
>>>>>     "INSTANCE_CAPACITY_KEYS" : [ "CPU", "MEMORY" ]
>>>>>   }
>>>>> }
>>>>>
>>>>> *[zk: localhost:2181(CONNECTED) 8] get
>>>>> /C8CEPCluster/LIVEINSTANCES/c8cep-0.c8cep.c8.svc.cluster.local_12000*
>>>>> {
>>>>>   "id" : "c8cep-0.c8cep.c8.svc.cluster.local_12000",
>>>>>   "simpleFields" : {
>>>>>     "CURRENT_TASK_THREAD_POOL_SIZE" : "40",
>>>>>     "HELIX_VERSION" : "1.0.4",
>>>>>     "LIVE_INSTANCE" : "1@c8cep-0.c8cep.c8.svc.cluster.local",
>>>>>     "SESSION_ID" : "106a30539a8003e"
>>>>>   },
>>>>>   "mapFields" : { },
>>>>>   "listFields" : { }
>>>>> }
>>>>> [zk: localhost:2181(CONNECTED) 26] get
>>>>> /C8CEPCluster/CONFIGS/RESOURCE/_mm:root:_system:cron2
>>>>> {
>>>>>   "id" : "_mm:root:_system:cron2",
>>>>>   "simpleFields" : { },
>>>>>   "mapFields" : {
>>>>>     "PARTITION_CAPACITY_MAP" : {
>>>>>       "DEFAULT" : "{\"CPU\":\"10\",\"MEMORY\":\"10\"}"
>>>>>     }
>>>>>   },
>>>>>   "listFields" : { }
>>>>> }
>>>>>
>>>>> *[zk: localhost:2181(CONNECTED) 27] get
>>>>> /C8CEPCluster/CONFIGS/RESOURCE/_mm:root:_system:cron3*
>>>>> {
>>>>>   "id" : "_mm:root:_system:cron3",
>>>>>   "simpleFields" : { },
>>>>>   "mapFields" : {
>>>>>     "PARTITION_CAPACITY_MAP" : {
>>>>>       "DEFAULT" : "{\"CPU\":\"10\",\"MEMORY\":\"10\"}"
>>>>>     }
>>>>>   },
>>>>>   "listFields" : { }
>>>>> }
>>>>>
>>>>> *[zk: localhost:2181(CONNECTED) 38] get
>>>>> /C8CEPCluster/IDEALSTATES/_mm:root:_system:cron2*
>>>>> {
>>>>>   "id" : "_mm:root:_system:cron2",
>>>>>   "simpleFields" : {
>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>     "NUM_PARTITIONS" : "1",
>>>>>     "REBALANCER_CLASS_NAME" :
>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>     "REPLICAS" : "1",
>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>   },
>>>>>   "mapFields" : {
>>>>>     "_mm:root:_system:cron2_0" : { }
>>>>>   },
>>>>>   "listFields" : {
>>>>>     "_mm:root:_system:cron2_0" : [ ]
>>>>>   }
>>>>> }
>>>>>
>>>>> *[zk: localhost:2181(CONNECTED) 39] get
>>>>> /C8CEPCluster/IDEALSTATES/_mm:root:_system:cron3*
>>>>> {
>>>>>   "id" : "_mm:root:_system:cron3",
>>>>>   "simpleFields" : {
>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>     "NUM_PARTITIONS" : "1",
>>>>>     "REBALANCER_CLASS_NAME" :
>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>     "REPLICAS" : "1",
>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>   },
>>>>>   "mapFields" : {
>>>>>     "_mm:root:_system:cron3_0" : { }
>>>>>   },
>>>>>   "listFields" : {
>>>>>     "_mm:root:_system:cron3_0" : [ ]
>>>>>   }
>>>>> }
>>>>>
>>>>> *[zk: localhost:2181(CONNECTED) 42] get
>>>>> /C8CEPCluster/EXTERNALVIEW/_mm:root:_system:cron2*
>>>>> {
>>>>>   "id" : "_mm:root:_system:cron2",
>>>>>   "simpleFields" : {
>>>>>     "BUCKET_SIZE" : "0",
>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>     "NUM_PARTITIONS" : "1",
>>>>>     "REBALANCER_CLASS_NAME" :
>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>     "REPLICAS" : "1",
>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>   },
>>>>>   *"mapFields" : { },*
>>>>>   "listFields" : { }
>>>>> }
>>>>>
>>>>> *[zk: localhost:2181(CONNECTED) 43] get
>>>>> /C8CEPCluster/EXTERNALVIEW/_mm:root:_system:cron3*
>>>>> {
>>>>>   "id" : "_mm:root:_system:cron3",
>>>>>   "simpleFields" : {
>>>>>     "BUCKET_SIZE" : "0",
>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>     "NUM_PARTITIONS" : "1",
>>>>>     "REBALANCER_CLASS_NAME" :
>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>     "REPLICAS" : "1",
>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>   },
>>>>>   *"mapFields" : {*
>>>>> *    "_mm:root:_system:cron3_0" : {*
>>>>> *      "c8cep-0.c8cep.c8.svc.cluster.local_12000" : "ONLINE"*
>>>>> *    }*
>>>>> *  },*
>>>>>   "listFields" : { }
>>>>> }
>>>>>
>>>>> Thank you.
>>>>> Grainier Perera.
>>>>>
>>>>>
>>>>> On Sat, 18 Jun 2022 at 10:45, Junkai Xue <jx...@apache.org> wrote:
>>>>>
>>>>>> OK. So you haven't set any rack-aware information. How many
>>>>>> instances are connected to that cluster? Please double-check the
>>>>>> live instances in ZooKeeper as well.
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Junkai
>>>>>>
>>>>>> On Fri, Jun 17, 2022 at 10:01 PM Grainier Perera <gr...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Junkai,
>>>>>>>
>>>>>>> I've added cluster init code to the gist [1]. Apart from that,
>>>>>>> ClusterConfig is configured like this;
>>>>>>>
>>>>>>>             ClusterConfig clusterConfig = configAccessor.getClusterConfig(CLUSTER_NAME);
>>>>>>>             // Configure the capacity keys in the ClusterConfig, for example MEMORY.
>>>>>>>             clusterConfig.setInstanceCapacityKeys(INSTANCE_CAPACITY_KEYS);
>>>>>>>             // Configure the cluster-wide default instance capacity, for example MEMORY = 100.
>>>>>>>             clusterConfig.setDefaultInstanceCapacityMap(INSTANCE_CAPACITY);
>>>>>>>             // Configure the cluster-wide default partition weight, for example MEMORY = 5.
>>>>>>>             clusterConfig.setDefaultPartitionWeightMap(DEFAULT_RESOURCE_USAGE);
>>>>>>>             configAccessor.setClusterConfig(CLUSTER_NAME, clusterConfig);
>>>>>>>
>>>>>>> [1]
>>>>>>> https://gist.github.com/grainier/aa1c0b279ea99f88d74c1e94d79f5cdb#file-clustersetup-java
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Grainier Perera.
>>>>>>>
>>>>>>>
>>>>>>> On Sat, 18 Jun 2022 at 10:00, Junkai Xue <jx...@apache.org> wrote:
>>>>>>>
>>>>>>>> Could you please share your cluster config as well?
>>>>>>>>
>>>>>>>> Best,
>>>>>>>>
>>>>>>>> Junkai
>>>>>>>>
>>>>>>>> On Fri, Jun 17, 2022 at 8:24 PM Grainier Perera <
>>>>>>>> grainier@apache.org> wrote:
>>>>>>>>
>>>>>>>>> Hi Devs,
>>>>>>>>>
>>>>>>>>> I'm trying to add several resources to the cluster using the
>>>>>>>>> following configurations[1]. However, only some will become `ONLINE`. What
>>>>>>>>> could be the reason? Is there a way to guarantee every resource will become
>>>>>>>>> `ONLINE` if WAGED capacity constraints are met?
>>>>>>>>>
>>>>>>>>> You can see with the same IdealState, "_mm:root:_system:cron3" has
>>>>>>>>> mapFields and it is ONLINE, and "_mm:root:_system:cron2" is not.
>>>>>>>>> Furthermore, I see this behavior more often when the replicas count is set
>>>>>>>>> to 1.
>>>>>>>>>
>>>>>>>>> ResourceInfo:
>>>>>>>>> 1. "_mm:root:_system:cron2"
>>>>>>>>>
>>>>>>>>> IdealState for _mm:root:_system:cron2:
>>>>>>>>> {
>>>>>>>>>   "id" : "_mm:root:_system:cron2",
>>>>>>>>>   "simpleFields" : {
>>>>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>>>>     "REBALANCER_CLASS_NAME" :
>>>>>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>>>>     "REPLICAS" : "1",
>>>>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>>>>   },
>>>>>>>>>   "mapFields" : {
>>>>>>>>>     "_mm:root:_system:cron2_0" : { }
>>>>>>>>>   },
>>>>>>>>>   "listFields" : {
>>>>>>>>>     "_mm:root:_system:cron2_0" : [ ]
>>>>>>>>>   }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ExternalView for _mm:root:_system:cron2:
>>>>>>>>> {
>>>>>>>>>   "id" : "_mm:root:_system:cron2",
>>>>>>>>>   "simpleFields" : {
>>>>>>>>>     "BUCKET_SIZE" : "0",
>>>>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>>>>     "REBALANCER_CLASS_NAME" :
>>>>>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>>>>     "REPLICAS" : "1",
>>>>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>>>>   },
>>>>>>>>>   *"mapFields" : { },*
>>>>>>>>>   "listFields" : { }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2. "_mm:root:_system:cron3"
>>>>>>>>>
>>>>>>>>> IdealState for _mm:root:_system:cron3:
>>>>>>>>> {
>>>>>>>>>   "id" : "_mm:root:_system:cron3",
>>>>>>>>>   "simpleFields" : {
>>>>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>>>>     "REBALANCER_CLASS_NAME" :
>>>>>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>>>>     "REPLICAS" : "1",
>>>>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>>>>   },
>>>>>>>>>   "mapFields" : {
>>>>>>>>>     "_mm:root:_system:cron3_0" : { }
>>>>>>>>>   },
>>>>>>>>>   "listFields" : {
>>>>>>>>>     "_mm:root:_system:cron3_0" : [ ]
>>>>>>>>>   }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ExternalView for _mm:root:_system:cron3:
>>>>>>>>> {
>>>>>>>>>   "id" : "_mm:root:_system:cron3",
>>>>>>>>>   "simpleFields" : {
>>>>>>>>>     "BUCKET_SIZE" : "0",
>>>>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>>>>     "REBALANCER_CLASS_NAME" :
>>>>>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>>>>     "REPLICAS" : "1",
>>>>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>>>>   },
>>>>>>>>>   *"mapFields" : {*
>>>>>>>>> *    "_mm:root:_system:cron3_0" : {*
>>>>>>>>> *      "c8cep-0.c8cep.c8.svc.cluster.local_12000" : "ONLINE"*
>>>>>>>>> *    }*
>>>>>>>>> *  },*
>>>>>>>>>   "listFields" : { }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> [1]:
>>>>>>>>> https://gist.github.com/grainier/aa1c0b279ea99f88d74c1e94d79f5cdb
>>>>>>>>>
>>>>>>>>> Thank you.
>>>>>>>>> Grainier Perera.
>>>>>>>>>
>>>>>>>>
>>>>
>>>> --
>>>> Junkai Xue
>>>>
>>>

Re: Some resources won't become online

Posted by Junkai Xue <jx...@apache.org>.
BTW, have you set up the proper capacity in the InstanceConfig of the only instance?

Best,

Junkai
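
To put rough numbers on the capacity question, here is a minimal sketch in
plain Java. It does not use the Helix API and does not model how the WAGED
rebalancer actually evaluates its constraints; it only checks the arithmetic.
The class and method names are hypothetical. The values are taken from the
configs quoted in this thread (instance capacity CPU=100 and MEMORY=100,
partition weight CPU=10 and MEMORY=10), and it assumes four resources with one
partition and one replica each, all with the same weight.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class WagedCapacityCheck {

    /** Per-key demand for partitionCount identical partitions with the given replica count. */
    public static Map<String, Integer> totalDemand(Map<String, Integer> partitionWeight,
                                                   int partitionCount, int replicas) {
        Map<String, Integer> demand = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : partitionWeight.entrySet()) {
            demand.put(e.getKey(), e.getValue() * partitionCount * replicas);
        }
        return demand;
    }

    /** True when the instance has enough capacity for every key in the demand map. */
    public static boolean fits(Map<String, Integer> capacity, Map<String, Integer> demand) {
        for (Map.Entry<String, Integer> e : demand.entrySet()) {
            Integer available = capacity.get(e.getKey());
            if (available == null || available < e.getValue()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Values as quoted in the thread's ClusterConfig and ResourceConfig dumps.
        Map<String, Integer> capacity = new LinkedHashMap<>();
        capacity.put("CPU", 100);
        capacity.put("MEMORY", 100);

        Map<String, Integer> weight = new LinkedHashMap<>();
        weight.put("CPU", 10);
        weight.put("MEMORY", 10);

        // Assumption: four resources, one partition each, one replica each.
        Map<String, Integer> demand = totalDemand(weight, 4, 1);
        System.out.println("demand=" + demand + " fits=" + fits(capacity, demand));
    }
}
```

Under these assumptions the total demand is 40 per key against a capacity of
100, so capacity alone would not explain the missing assignments on a single
instance.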


Re: Some resources won't become online

Posted by Junkai Xue <jx...@apache.org>.
Interesting. Is this reproducible? We can try it with your data.

Best,

Junkai

On Sat, Jun 18, 2022 at 4:31 AM Grainier Perera <gr...@apache.org> wrote:

> Hi Junkai,
>
> I tried removing `MAX_PARTITIONS_PER_INSTANCE`, but the result is the same.
> What's weird is that when I add a few resources, some of them still don't
> reach the `ONLINE` state. In the sample below, only the 2nd and 4th
> resources have proper `mapFields`, whereas the 1st and 3rd resources don't
> have any mapping (all of them have the same IdealState). After a restart,
> this can flip so that 1 & 3 become `ONLINE` and 2 & 4 lose their mapping.
> But the pattern remains, and I cannot understand why.
>
>
> *ExternalView for _mm:root:_system:cron1:*{
>   "id" : "_mm:root:_system:cron1",
>   "simpleFields" : {
>     "BUCKET_SIZE" : "0",
>     "DELAY_REBALANCE_ENABLED" : "true",
>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>     "NUM_PARTITIONS" : "1",
>     "REBALANCER_CLASS_NAME" :
> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>     "REBALANCE_DELAY" : "10000",
>     "REBALANCE_MODE" : "FULL_AUTO",
>     "REPLICAS" : "1",
>     "STATE_MODEL_DEF_REF" : "NewC8CEPStateModel"
>   },
>   *"mapFields" : { },*
>   "listFields" : { }
> }
>
>
> *ExternalView for _mm:root:_system:cron2:*{
>   "id" : "_mm:root:_system:cron2",
>   "simpleFields" : {
>     "BUCKET_SIZE" : "0",
>     "DELAY_REBALANCE_ENABLED" : "true",
>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>     "NUM_PARTITIONS" : "1",
>     "REBALANCER_CLASS_NAME" :
> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>     "REBALANCE_DELAY" : "10000",
>     "REBALANCE_MODE" : "FULL_AUTO",
>     "REPLICAS" : "1",
>     "STATE_MODEL_DEF_REF" : "NewC8CEPStateModel"
>   },
>   *"mapFields" : {*
> *    "_mm:root:_system:cron2_0" : {*
> *      "c8cep-0.c8cep.c8.svc.cluster.local_12000" : "ONLINE"*
> *    }*
> *  },*
>   "listFields" : { }
> }
>
>
> *ExternalView for _mm:root:_system:cron3:*{
>   "id" : "_mm:root:_system:cron3",
>   "simpleFields" : {
>     "BUCKET_SIZE" : "0",
>     "DELAY_REBALANCE_ENABLED" : "true",
>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>     "NUM_PARTITIONS" : "1",
>     "REBALANCER_CLASS_NAME" :
> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>     "REBALANCE_DELAY" : "10000",
>     "REBALANCE_MODE" : "FULL_AUTO",
>     "REPLICAS" : "1",
>     "STATE_MODEL_DEF_REF" : "NewC8CEPStateModel"
>   },
>   *"mapFields" : { },*
>   "listFields" : { }
> }
>
>
> *ExternalView for _mm:root:_system:cron4:*{
>   "id" : "_mm:root:_system:cron4",
>   "simpleFields" : {
>     "BUCKET_SIZE" : "0",
>     "DELAY_REBALANCE_ENABLED" : "true",
>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>     "NUM_PARTITIONS" : "1",
>     "REBALANCER_CLASS_NAME" :
> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>     "REBALANCE_DELAY" : "10000",
>     "REBALANCE_MODE" : "FULL_AUTO",
>     "REPLICAS" : "1",
>     "STATE_MODEL_DEF_REF" : "NewC8CEPStateModel"
>   },
>   *"mapFields" : {*
> *    "_mm:root:_system:cron4_0" : {*
> *      "c8cep-0.c8cep.c8.svc.cluster.local_12000" : "ONLINE"*
> *    }*
> *  },*
>   "listFields" : { }
> }
>
> Thanks,
> Grainier Perera.
>
>
> On Sat, 18 Jun 2022 at 13:21, Junkai Xue <ju...@gmail.com> wrote:
>
>> Then most likely it is caused by this config entry:
>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>> We usually don't set this config. It restricts how many partitions can be
>> assigned to an instance. Since partition cron3_0 is already assigned, no
>> other partition can be assigned.
>>
>> So either removing this config entry or adding more instances may help.
>>
>> Please let us know if you have further questions.
>>
>> best,
>>
>> Junkai
>>
>> On Fri, Jun 17, 2022 at 11:38 PM Grainier Perera <gr...@apache.org>
>> wrote:
>>
>>> Hi Junkai,
>>>
>>> - Correct. I haven't added any rack-aware information.
>>> - I'm connecting 1 instance at startup and then expanding on demand
>>> (I've set ALLOW_PARTICIPANT_AUTO_JOIN to true).
>>> - I've checked the live instances and other znodes in ZooKeeper.
>>> Everything looks OK, except that
>>> /C8CEPCluster/EXTERNALVIEW/_mm:root:_system:cron2 has empty `mapFields`
>>> while /C8CEPCluster/EXTERNALVIEW/_mm:root:_system:cron3 has `mapFields`
>>> with an ONLINE record. I still cannot understand why, or what I'm doing
>>> wrong :(
>>>
>>>
>>> [zk: localhost:2181(CONNECTED) 18] get /C8CEPCluster/CONFIGS/CLUSTER/C8CEPCluster
>>> {
>>>   "id" : "C8CEPCluster",
>>>   "simpleFields" : {
>>>     "allowParticipantAutoJoin" : "true"
>>>   },
>>>   "mapFields" : {
>>>     "DEFAULT_INSTANCE_CAPACITY_MAP" : {
>>>       "MEMORY" : "100",
>>>       "CPU" : "100"
>>>     },
>>>     "DEFAULT_PARTITION_WEIGHT_MAP" : {
>>>       "MEMORY" : "5",
>>>       "CPU" : "5"
>>>     }
>>>   },
>>>   "listFields" : {
>>>     "INSTANCE_CAPACITY_KEYS" : [ "CPU", "MEMORY" ]
>>>   }
>>> }
>>>
>>> [zk: localhost:2181(CONNECTED) 8] get /C8CEPCluster/LIVEINSTANCES/c8cep-0.c8cep.c8.svc.cluster.local_12000
>>> {
>>>   "id" : "c8cep-0.c8cep.c8.svc.cluster.local_12000",
>>>   "simpleFields" : {
>>>     "CURRENT_TASK_THREAD_POOL_SIZE" : "40",
>>>     "HELIX_VERSION" : "1.0.4",
>>>     "LIVE_INSTANCE" : "1@c8cep-0.c8cep.c8.svc.cluster.local",
>>>     "SESSION_ID" : "106a30539a8003e"
>>>   },
>>>   "mapFields" : { },
>>>   "listFields" : { }
>>> }
>>>
>>> [zk: localhost:2181(CONNECTED) 26] get /C8CEPCluster/CONFIGS/RESOURCE/_mm:root:_system:cron2
>>> {
>>>   "id" : "_mm:root:_system:cron2",
>>>   "simpleFields" : { },
>>>   "mapFields" : {
>>>     "PARTITION_CAPACITY_MAP" : {
>>>       "DEFAULT" : "{\"CPU\":\"10\",\"MEMORY\":\"10\"}"
>>>     }
>>>   },
>>>   "listFields" : { }
>>> }
>>>
>>> [zk: localhost:2181(CONNECTED) 27] get /C8CEPCluster/CONFIGS/RESOURCE/_mm:root:_system:cron3
>>> {
>>>   "id" : "_mm:root:_system:cron3",
>>>   "simpleFields" : { },
>>>   "mapFields" : {
>>>     "PARTITION_CAPACITY_MAP" : {
>>>       "DEFAULT" : "{\"CPU\":\"10\",\"MEMORY\":\"10\"}"
>>>     }
>>>   },
>>>   "listFields" : { }
>>> }
>>>
>>> [zk: localhost:2181(CONNECTED) 38] get /C8CEPCluster/IDEALSTATES/_mm:root:_system:cron2
>>> {
>>>   "id" : "_mm:root:_system:cron2",
>>>   "simpleFields" : {
>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>     "NUM_PARTITIONS" : "1",
>>>     "REBALANCER_CLASS_NAME" :
>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>     "REBALANCE_DELAY" : "10000",
>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>     "REPLICAS" : "1",
>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>   },
>>>   "mapFields" : {
>>>     "_mm:root:_system:cron2_0" : { }
>>>   },
>>>   "listFields" : {
>>>     "_mm:root:_system:cron2_0" : [ ]
>>>   }
>>> }
>>>
>>> [zk: localhost:2181(CONNECTED) 39] get /C8CEPCluster/IDEALSTATES/_mm:root:_system:cron3
>>> {
>>>   "id" : "_mm:root:_system:cron3",
>>>   "simpleFields" : {
>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>     "NUM_PARTITIONS" : "1",
>>>     "REBALANCER_CLASS_NAME" :
>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>     "REBALANCE_DELAY" : "10000",
>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>     "REPLICAS" : "1",
>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>   },
>>>   "mapFields" : {
>>>     "_mm:root:_system:cron3_0" : { }
>>>   },
>>>   "listFields" : {
>>>     "_mm:root:_system:cron3_0" : [ ]
>>>   }
>>> }
>>>
>>> [zk: localhost:2181(CONNECTED) 42] get /C8CEPCluster/EXTERNALVIEW/_mm:root:_system:cron2
>>> {
>>>   "id" : "_mm:root:_system:cron2",
>>>   "simpleFields" : {
>>>     "BUCKET_SIZE" : "0",
>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>     "NUM_PARTITIONS" : "1",
>>>     "REBALANCER_CLASS_NAME" :
>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>     "REBALANCE_DELAY" : "10000",
>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>     "REPLICAS" : "1",
>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>   },
>>>   *"mapFields" : { },*
>>>   "listFields" : { }
>>> }
>>>
>>> [zk: localhost:2181(CONNECTED) 43] get /C8CEPCluster/EXTERNALVIEW/_mm:root:_system:cron3
>>> {
>>>   "id" : "_mm:root:_system:cron3",
>>>   "simpleFields" : {
>>>     "BUCKET_SIZE" : "0",
>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>     "NUM_PARTITIONS" : "1",
>>>     "REBALANCER_CLASS_NAME" :
>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>     "REBALANCE_DELAY" : "10000",
>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>     "REPLICAS" : "1",
>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>   },
>>>   *"mapFields" : {*
>>> *    "_mm:root:_system:cron3_0" : {*
>>> *      "c8cep-0.c8cep.c8.svc.cluster.local_12000" : "ONLINE"*
>>> *    }*
>>> *  },*
>>>   "listFields" : { }
>>> }
>>>
>>> Thank you.
>>> Grainier Perera.
>>>
>>>
>>> On Sat, 18 Jun 2022 at 10:45, Junkai Xue <jx...@apache.org> wrote:
>>>
>>>> OK. So you haven't set any rack-aware information. How many instances
>>>> are connected to that cluster? Please double-check the live
>>>> instances in ZooKeeper as well.
>>>>
>>>> Best,
>>>>
>>>> Junkai
>>>>
>>>> On Fri, Jun 17, 2022 at 10:01 PM Grainier Perera <gr...@apache.org>
>>>> wrote:
>>>>
>>>>> Hi Junkai,
>>>>>
>>>>> I've added cluster init code to the gist [1]. Apart from that,
>>>>> ClusterConfig is configured like this;
>>>>>
>>>>>             ClusterConfig clusterConfig = configAccessor.getClusterConfig(CLUSTER_NAME);
>>>>>             // Configure the capacity keys in the ClusterConfig, for example MEMORY.
>>>>>             clusterConfig.setInstanceCapacityKeys(INSTANCE_CAPACITY_KEYS);
>>>>>             // Configure the cluster-wide default instance capacity, for example MEMORY = 100.
>>>>>             clusterConfig.setDefaultInstanceCapacityMap(INSTANCE_CAPACITY);
>>>>>             // Configure the cluster-wide default partition weight, for example MEMORY = 5.
>>>>>             clusterConfig.setDefaultPartitionWeightMap(DEFAULT_RESOURCE_USAGE);
>>>>>             configAccessor.setClusterConfig(CLUSTER_NAME, clusterConfig);
>>>>>
>>>>> [1]
>>>>> https://gist.github.com/grainier/aa1c0b279ea99f88d74c1e94d79f5cdb#file-clustersetup-java
>>>>>
>>>>> Thanks,
>>>>> Grainier Perera.
>>>>>
>>>>>
>>>>> On Sat, 18 Jun 2022 at 10:00, Junkai Xue <jx...@apache.org> wrote:
>>>>>
>>>>>> Could you please share your cluster config as well?
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Junkai
>>>>>>
>>>>>> On Fri, Jun 17, 2022 at 8:24 PM Grainier Perera <gr...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Devs,
>>>>>>>
>>>>>>> I'm trying to add several resources to the cluster using the
>>>>>>> following configurations[1]. However, only some will become `ONLINE`. What
>>>>>>> could be the reason? Is there a way to guarantee every resource will become
>>>>>>> `ONLINE` if WAGED capacity constraints are met?
>>>>>>>
>>>>>>> You can see with the same IdealState, "_mm:root:_system:cron3" has
>>>>>>> mapFields and it is ONLINE, and "_mm:root:_system:cron2" is not.
>>>>>>> Furthermore, I see this behavior more often when the replicas count is set
>>>>>>> to 1.
>>>>>>>
>>>>>>> ResourceInfo:
>>>>>>> 1. "_mm:root:_system:cron2"
>>>>>>>
>>>>>>> IdealState for _mm:root:_system:cron2:
>>>>>>> {
>>>>>>>   "id" : "_mm:root:_system:cron2",
>>>>>>>   "simpleFields" : {
>>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>>     "REBALANCER_CLASS_NAME" :
>>>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>>     "REPLICAS" : "1",
>>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>>   },
>>>>>>>   "mapFields" : {
>>>>>>>     "_mm:root:_system:cron2_0" : { }
>>>>>>>   },
>>>>>>>   "listFields" : {
>>>>>>>     "_mm:root:_system:cron2_0" : [ ]
>>>>>>>   }
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> ExternalView for _mm:root:_system:cron2:
>>>>>>> {
>>>>>>>   "id" : "_mm:root:_system:cron2",
>>>>>>>   "simpleFields" : {
>>>>>>>     "BUCKET_SIZE" : "0",
>>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>>     "REBALANCER_CLASS_NAME" :
>>>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>>     "REPLICAS" : "1",
>>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>>   },
>>>>>>>   *"mapFields" : { },*
>>>>>>>   "listFields" : { }
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> 2. "_mm:root:_system:cron3"
>>>>>>>
>>>>>>> IdealState for _mm:root:_system:cron3:
>>>>>>> {
>>>>>>>   "id" : "_mm:root:_system:cron3",
>>>>>>>   "simpleFields" : {
>>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>>     "REBALANCER_CLASS_NAME" :
>>>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>>     "REPLICAS" : "1",
>>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>>   },
>>>>>>>   "mapFields" : {
>>>>>>>     "_mm:root:_system:cron3_0" : { }
>>>>>>>   },
>>>>>>>   "listFields" : {
>>>>>>>     "_mm:root:_system:cron3_0" : [ ]
>>>>>>>   }
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> ExternalView for _mm:root:_system:cron3:
>>>>>>> {
>>>>>>>   "id" : "_mm:root:_system:cron3",
>>>>>>>   "simpleFields" : {
>>>>>>>     "BUCKET_SIZE" : "0",
>>>>>>>     "DELAY_REBALANCE_ENABLED" : "true",
>>>>>>>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>>>>>>>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>>>>>>>     "NUM_PARTITIONS" : "1",
>>>>>>>     "REBALANCER_CLASS_NAME" :
>>>>>>> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>>>>>>>     "REBALANCE_DELAY" : "10000",
>>>>>>>     "REBALANCE_MODE" : "FULL_AUTO",
>>>>>>>     "REPLICAS" : "1",
>>>>>>>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>>>>>>>   },
>>>>>>>   *"mapFields" : {*
>>>>>>> *    "_mm:root:_system:cron3_0" : {*
>>>>>>> *      "c8cep-0.c8cep.c8.svc.cluster.local_12000" : "ONLINE"*
>>>>>>> *    }*
>>>>>>> *  },*
>>>>>>>   "listFields" : { }
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> [1]:
>>>>>>> https://gist.github.com/grainier/aa1c0b279ea99f88d74c1e94d79f5cdb
>>>>>>>
>>>>>>> Thank you.
>>>>>>> Grainier Perera.
>>>>>>>
>>>>>>
>>
>> --
>> Junkai Xue
>>
>

Re: Some resources won't become online

Posted by Grainier Perera <gr...@apache.org>.
Hi Junkai,

I tried removing `MAX_PARTITIONS_PER_INSTANCE`, but the behavior is still the
same. What's weird is that when I add a few resources, some of them still
don't reach the `ONLINE` state. In the sample below, only the 2nd and 4th
resources have proper `mapFields`, whereas the 1st and 3rd resources have no
mapping at all (all of them have the same IdealState). After a restart, this
can flip: 1 & 3 become `ONLINE` and 2 & 4 may lose their mapping. But the
pattern remains, and I cannot understand why.


ExternalView for _mm:root:_system:cron1:
{
  "id" : "_mm:root:_system:cron1",
  "simpleFields" : {
    "BUCKET_SIZE" : "0",
    "DELAY_REBALANCE_ENABLED" : "true",
    "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
    "NUM_PARTITIONS" : "1",
    "REBALANCER_CLASS_NAME" :
"org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
    "REBALANCE_DELAY" : "10000",
    "REBALANCE_MODE" : "FULL_AUTO",
    "REPLICAS" : "1",
    "STATE_MODEL_DEF_REF" : "NewC8CEPStateModel"
  },
  *"mapFields" : { },*
  "listFields" : { }
}


ExternalView for _mm:root:_system:cron2:
{
  "id" : "_mm:root:_system:cron2",
  "simpleFields" : {
    "BUCKET_SIZE" : "0",
    "DELAY_REBALANCE_ENABLED" : "true",
    "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
    "NUM_PARTITIONS" : "1",
    "REBALANCER_CLASS_NAME" :
"org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
    "REBALANCE_DELAY" : "10000",
    "REBALANCE_MODE" : "FULL_AUTO",
    "REPLICAS" : "1",
    "STATE_MODEL_DEF_REF" : "NewC8CEPStateModel"
  },
  *"mapFields" : {*
*    "_mm:root:_system:cron2_0" : {*
*      "c8cep-0.c8cep.c8.svc.cluster.local_12000" : "ONLINE"*
*    }*
*  },*
  "listFields" : { }
}


ExternalView for _mm:root:_system:cron3:
{
  "id" : "_mm:root:_system:cron3",
  "simpleFields" : {
    "BUCKET_SIZE" : "0",
    "DELAY_REBALANCE_ENABLED" : "true",
    "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
    "NUM_PARTITIONS" : "1",
    "REBALANCER_CLASS_NAME" :
"org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
    "REBALANCE_DELAY" : "10000",
    "REBALANCE_MODE" : "FULL_AUTO",
    "REPLICAS" : "1",
    "STATE_MODEL_DEF_REF" : "NewC8CEPStateModel"
  },
  *"mapFields" : { },*
  "listFields" : { }
}


ExternalView for _mm:root:_system:cron4:
{
  "id" : "_mm:root:_system:cron4",
  "simpleFields" : {
    "BUCKET_SIZE" : "0",
    "DELAY_REBALANCE_ENABLED" : "true",
    "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
    "NUM_PARTITIONS" : "1",
    "REBALANCER_CLASS_NAME" :
"org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
    "REBALANCE_DELAY" : "10000",
    "REBALANCE_MODE" : "FULL_AUTO",
    "REPLICAS" : "1",
    "STATE_MODEL_DEF_REF" : "NewC8CEPStateModel"
  },
  *"mapFields" : {*
*    "_mm:root:_system:cron4_0" : {*
*      "c8cep-0.c8cep.c8.svc.cluster.local_12000" : "ONLINE"*
*    }*
*  },*
  "listFields" : { }
}
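
To make the pattern in the four ExternalViews above easier to see, here is a minimal Java sketch that lists the resources with no `ONLINE` replica. It does not use the Helix API: the nested map is a hand-copied stand-in for the `mapFields` shown above, and the class and method names (`ExternalViewCheck`, `resourcesWithoutOnline`, `sample`) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Helix: works on plain maps only.
final class ExternalViewCheck {

    // Returns the resources whose mapFields hold no partition that is
    // ONLINE on any instance. Shape: resource -> partition -> instance -> state.
    public static List<String> resourcesWithoutOnline(
            Map<String, Map<String, Map<String, String>>> mapFieldsByResource) {
        List<String> missing = new ArrayList<>();
        for (Map.Entry<String, Map<String, Map<String, String>>> entry
                : mapFieldsByResource.entrySet()) {
            boolean anyOnline = entry.getValue().values().stream()
                    .anyMatch(states -> states.containsValue("ONLINE"));
            if (!anyOnline) {
                missing.add(entry.getKey());
            }
        }
        return missing;
    }

    // Stand-in data copied by hand from the four ExternalViews in this message.
    public static Map<String, Map<String, Map<String, String>>> sample() {
        String instance = "c8cep-0.c8cep.c8.svc.cluster.local_12000";
        Map<String, Map<String, Map<String, String>>> views = new LinkedHashMap<>();
        views.put("_mm:root:_system:cron1", Map.of());
        views.put("_mm:root:_system:cron2",
                Map.of("_mm:root:_system:cron2_0", Map.of(instance, "ONLINE")));
        views.put("_mm:root:_system:cron3", Map.of());
        views.put("_mm:root:_system:cron4",
                Map.of("_mm:root:_system:cron4_0", Map.of(instance, "ONLINE")));
        return views;
    }

    public static void main(String[] args) {
        System.out.println(resourcesWithoutOnline(sample()));
    }
}
```

Run against the sample data, this reports cron1 and cron3 as unassigned, which matches the alternating pattern described above.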

Thanks,
Grainier Perera.


On Sat, 18 Jun 2022 at 13:21, Junkai Xue <ju...@gmail.com> wrote:

> Then most likely it is caused by this config entry:
>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
> Usually, we never set this config. It restricts how many partitions can be
> assigned to an instance. You already have one partition
> (_mm:root:_system:cron3_0) assigned to the instance, so no other partition
> can be assigned.
>
> So either removing this config entry or adding more instances may help.
>
> Please let us know if you have further questions.
>
> best,
>
> Junkai
>
> --
> Junkai Xue
>

Re: Some resources won't become online

Posted by Junkai Xue <ju...@gmail.com>.
Then most likely it is caused by this config entry:
    "MAX_PARTITIONS_PER_INSTANCE" : "1",
Usually, we never set this config. It restricts how many partitions can be
assigned to an instance. You already have one partition
(_mm:root:_system:cron3_0) assigned to the instance, so no other partition
can be assigned.

So either removing this config entry or adding more instances may help.
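
As a rough illustration of the first option, the sketch below drops the `MAX_PARTITIONS_PER_INSTANCE` entry from a copy of the IdealState `simpleFields`. It is a plain-Java stand-in and makes no Helix calls: the map is copied by hand from the IdealState posted in this thread, and `IdealStateFields` and `withoutPartitionCap` are hypothetical names. On a real cluster the change would have to be written back through the Helix admin tooling instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Helix: edits a plain map only.
final class IdealStateFields {

    public static final String MAX_PARTITIONS_KEY = "MAX_PARTITIONS_PER_INSTANCE";

    // Returns a copy of the given simpleFields without the per-instance
    // partition cap. The input map is left untouched.
    public static Map<String, String> withoutPartitionCap(Map<String, String> simpleFields) {
        Map<String, String> copy = new LinkedHashMap<>(simpleFields);
        copy.remove(MAX_PARTITIONS_KEY);
        return copy;
    }

    // simpleFields copied by hand from the IdealState posted in this thread.
    public static Map<String, String> sample() {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("DELAY_REBALANCE_ENABLED", "true");
        fields.put("IDEAL_STATE_MODE", "AUTO_REBALANCE");
        fields.put(MAX_PARTITIONS_KEY, "1");
        fields.put("NUM_PARTITIONS", "1");
        fields.put("REBALANCER_CLASS_NAME",
                "org.apache.helix.controller.rebalancer.waged.WagedRebalancer");
        fields.put("REBALANCE_DELAY", "10000");
        fields.put("REBALANCE_MODE", "FULL_AUTO");
        fields.put("REPLICAS", "1");
        fields.put("STATE_MODEL_DEF_REF", "C8CEPStateModel");
        return fields;
    }

    public static void main(String[] args) {
        // Print the remaining keys, one per line.
        withoutPartitionCap(sample()).keySet().forEach(System.out::println);
    }
}
```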

Please let us know if you have further questions.

best,

Junkai

On Fri, Jun 17, 2022 at 11:38 PM Grainier Perera <gr...@apache.org>
wrote:

> Hi Junkai,
>
> - Correct. I haven't added any rack-aware information.
> - I'm connecting 1 instance at startup and then expanding on demand
> (I've set ALLOW_PARTICIPANT_AUTO_JOIN to true).
> - I've checked the live instances and other znodes in ZooKeeper.
> Everything looks OK, except that
> /C8CEPCluster/EXTERNALVIEW/_mm:root:_system:cron2 has empty `mapFields`
> while /C8CEPCluster/EXTERNALVIEW/_mm:root:_system:cron3 has `mapFields`
> with an `ONLINE` record. I still cannot understand why, or what I'm doing
> wrong :(
>
>
> [zk: localhost:2181(CONNECTED) 18] get /C8CEPCluster/CONFIGS/CLUSTER/C8CEPCluster
> {
>   "id" : "C8CEPCluster",
>   "simpleFields" : {
>     "allowParticipantAutoJoin" : "true"
>   },
>   "mapFields" : {
>     "DEFAULT_INSTANCE_CAPACITY_MAP" : {
>       "MEMORY" : "100",
>       "CPU" : "100"
>     },
>     "DEFAULT_PARTITION_WEIGHT_MAP" : {
>       "MEMORY" : "5",
>       "CPU" : "5"
>     }
>   },
>   "listFields" : {
>     "INSTANCE_CAPACITY_KEYS" : [ "CPU", "MEMORY" ]
>   }
> }
>
> [zk: localhost:2181(CONNECTED) 8] get /C8CEPCluster/LIVEINSTANCES/c8cep-0.c8cep.c8.svc.cluster.local_12000
> {
>   "id" : "c8cep-0.c8cep.c8.svc.cluster.local_12000",
>   "simpleFields" : {
>     "CURRENT_TASK_THREAD_POOL_SIZE" : "40",
>     "HELIX_VERSION" : "1.0.4",
>     "LIVE_INSTANCE" : "1@c8cep-0.c8cep.c8.svc.cluster.local",
>     "SESSION_ID" : "106a30539a8003e"
>   },
>   "mapFields" : { },
>   "listFields" : { }
> }
>
> [zk: localhost:2181(CONNECTED) 26] get /C8CEPCluster/CONFIGS/RESOURCE/_mm:root:_system:cron2
> {
>   "id" : "_mm:root:_system:cron2",
>   "simpleFields" : { },
>   "mapFields" : {
>     "PARTITION_CAPACITY_MAP" : {
>       "DEFAULT" : "{\"CPU\":\"10\",\"MEMORY\":\"10\"}"
>     }
>   },
>   "listFields" : { }
> }
>
> [zk: localhost:2181(CONNECTED) 27] get /C8CEPCluster/CONFIGS/RESOURCE/_mm:root:_system:cron3
> {
>   "id" : "_mm:root:_system:cron3",
>   "simpleFields" : { },
>   "mapFields" : {
>     "PARTITION_CAPACITY_MAP" : {
>       "DEFAULT" : "{\"CPU\":\"10\",\"MEMORY\":\"10\"}"
>     }
>   },
>   "listFields" : { }
> }
>
> [zk: localhost:2181(CONNECTED) 38] get /C8CEPCluster/IDEALSTATES/_mm:root:_system:cron2
> {
>   "id" : "_mm:root:_system:cron2",
>   "simpleFields" : {
>     "DELAY_REBALANCE_ENABLED" : "true",
>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>     "NUM_PARTITIONS" : "1",
>     "REBALANCER_CLASS_NAME" :
> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>     "REBALANCE_DELAY" : "10000",
>     "REBALANCE_MODE" : "FULL_AUTO",
>     "REPLICAS" : "1",
>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>   },
>   "mapFields" : {
>     "_mm:root:_system:cron2_0" : { }
>   },
>   "listFields" : {
>     "_mm:root:_system:cron2_0" : [ ]
>   }
> }
>
> [zk: localhost:2181(CONNECTED) 39] get /C8CEPCluster/IDEALSTATES/_mm:root:_system:cron3
> {
>   "id" : "_mm:root:_system:cron3",
>   "simpleFields" : {
>     "DELAY_REBALANCE_ENABLED" : "true",
>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>     "NUM_PARTITIONS" : "1",
>     "REBALANCER_CLASS_NAME" :
> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>     "REBALANCE_DELAY" : "10000",
>     "REBALANCE_MODE" : "FULL_AUTO",
>     "REPLICAS" : "1",
>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>   },
>   "mapFields" : {
>     "_mm:root:_system:cron3_0" : { }
>   },
>   "listFields" : {
>     "_mm:root:_system:cron3_0" : [ ]
>   }
> }
>
> [zk: localhost:2181(CONNECTED) 42] get /C8CEPCluster/EXTERNALVIEW/_mm:root:_system:cron2
> {
>   "id" : "_mm:root:_system:cron2",
>   "simpleFields" : {
>     "BUCKET_SIZE" : "0",
>     "DELAY_REBALANCE_ENABLED" : "true",
>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>     "NUM_PARTITIONS" : "1",
>     "REBALANCER_CLASS_NAME" :
> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>     "REBALANCE_DELAY" : "10000",
>     "REBALANCE_MODE" : "FULL_AUTO",
>     "REPLICAS" : "1",
>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>   },
>   *"mapFields" : { },*
>   "listFields" : { }
> }
>
> [zk: localhost:2181(CONNECTED) 43] get /C8CEPCluster/EXTERNALVIEW/_mm:root:_system:cron3
> {
>   "id" : "_mm:root:_system:cron3",
>   "simpleFields" : {
>     "BUCKET_SIZE" : "0",
>     "DELAY_REBALANCE_ENABLED" : "true",
>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>     "NUM_PARTITIONS" : "1",
>     "REBALANCER_CLASS_NAME" :
> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>     "REBALANCE_DELAY" : "10000",
>     "REBALANCE_MODE" : "FULL_AUTO",
>     "REPLICAS" : "1",
>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>   },
>   *"mapFields" : {*
> *    "_mm:root:_system:cron3_0" : {*
> *      "c8cep-0.c8cep.c8.svc.cluster.local_12000" : "ONLINE"*
> *    }*
> *  },*
>   "listFields" : { }
> }
>
> Thank you.
> Grainier Perera.

-- 
Junkai Xue

Re: Some resources won't become online

Posted by Grainier Perera <gr...@apache.org>.
Hi Junkai,

- Correct. I haven't added any rack-aware information.
- I'm connecting 1 instance at startup and then expanding on demand
(I've set ALLOW_PARTICIPANT_AUTO_JOIN to true).
- I've checked the live instances and other znodes in ZooKeeper. Everything
looks OK, except that /C8CEPCluster/EXTERNALVIEW/_mm:root:_system:cron2 has
empty `mapFields` while /C8CEPCluster/EXTERNALVIEW/_mm:root:_system:cron3
has `mapFields` with an `ONLINE` record. I still cannot understand why, or
what I'm doing wrong :(


[zk: localhost:2181(CONNECTED) 18] get /C8CEPCluster/CONFIGS/CLUSTER/C8CEPCluster
{
  "id" : "C8CEPCluster",
  "simpleFields" : {
    "allowParticipantAutoJoin" : "true"
  },
  "mapFields" : {
    "DEFAULT_INSTANCE_CAPACITY_MAP" : {
      "MEMORY" : "100",
      "CPU" : "100"
    },
    "DEFAULT_PARTITION_WEIGHT_MAP" : {
      "MEMORY" : "5",
      "CPU" : "5"
    }
  },
  "listFields" : {
    "INSTANCE_CAPACITY_KEYS" : [ "CPU", "MEMORY" ]
  }
}

[zk: localhost:2181(CONNECTED) 8] get /C8CEPCluster/LIVEINSTANCES/c8cep-0.c8cep.c8.svc.cluster.local_12000
{
  "id" : "c8cep-0.c8cep.c8.svc.cluster.local_12000",
  "simpleFields" : {
    "CURRENT_TASK_THREAD_POOL_SIZE" : "40",
    "HELIX_VERSION" : "1.0.4",
    "LIVE_INSTANCE" : "1@c8cep-0.c8cep.c8.svc.cluster.local",
    "SESSION_ID" : "106a30539a8003e"
  },
  "mapFields" : { },
  "listFields" : { }
}

[zk: localhost:2181(CONNECTED) 26] get /C8CEPCluster/CONFIGS/RESOURCE/_mm:root:_system:cron2
{
  "id" : "_mm:root:_system:cron2",
  "simpleFields" : { },
  "mapFields" : {
    "PARTITION_CAPACITY_MAP" : {
      "DEFAULT" : "{\"CPU\":\"10\",\"MEMORY\":\"10\"}"
    }
  },
  "listFields" : { }
}

[zk: localhost:2181(CONNECTED) 27] get /C8CEPCluster/CONFIGS/RESOURCE/_mm:root:_system:cron3
{
  "id" : "_mm:root:_system:cron3",
  "simpleFields" : { },
  "mapFields" : {
    "PARTITION_CAPACITY_MAP" : {
      "DEFAULT" : "{\"CPU\":\"10\",\"MEMORY\":\"10\"}"
    }
  },
  "listFields" : { }
}
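
As a back-of-the-envelope check of the capacity numbers above, the following Java sketch computes how many partitions of a given weight fit on one instance. It is plain arithmetic, not the WAGED rebalancer's actual constraint evaluation, and `CapacityCheck` and `maxPartitionsThatFit` are hypothetical names. It assumes the per-resource `PARTITION_CAPACITY_MAP` weight of 10 takes precedence over the cluster default of 5, and it computes both cases since that is an assumption.

```java
import java.util.Map;

// Hypothetical helper, not part of Helix: simple integer arithmetic only.
final class CapacityCheck {

    // Largest number of equally weighted partitions that fit on one instance
    // without exceeding its capacity on any key.
    public static int maxPartitionsThatFit(Map<String, Integer> instanceCapacity,
                                           Map<String, Integer> partitionWeight) {
        int fit = Integer.MAX_VALUE;
        for (Map.Entry<String, Integer> entry : partitionWeight.entrySet()) {
            int weight = entry.getValue();
            if (weight <= 0) {
                continue; // a zero-weight key never limits placement
            }
            Integer capacity = instanceCapacity.get(entry.getKey());
            if (capacity == null) {
                throw new IllegalArgumentException("no capacity for key " + entry.getKey());
            }
            fit = Math.min(fit, capacity / weight);
        }
        return fit;
    }

    public static void main(String[] args) {
        // Values copied from the cluster and resource configs in this message.
        Map<String, Integer> capacity = Map.of("CPU", 100, "MEMORY", 100);
        Map<String, Integer> resourceWeight = Map.of("CPU", 10, "MEMORY", 10);
        Map<String, Integer> clusterDefaultWeight = Map.of("CPU", 5, "MEMORY", 5);

        System.out.println(maxPartitionsThatFit(capacity, resourceWeight));
        System.out.println(maxPartitionsThatFit(capacity, clusterDefaultWeight));
    }
}
```

With either weight, a single instance has room for at least ten single-partition resources, so under these assumptions the capacity keys alone would not explain why only some of a handful of resources are assigned.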

[zk: localhost:2181(CONNECTED) 38] get /C8CEPCluster/IDEALSTATES/_mm:root:_system:cron2
{
  "id" : "_mm:root:_system:cron2",
  "simpleFields" : {
    "DELAY_REBALANCE_ENABLED" : "true",
    "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
    "MAX_PARTITIONS_PER_INSTANCE" : "1",
    "NUM_PARTITIONS" : "1",
    "REBALANCER_CLASS_NAME" :
"org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
    "REBALANCE_DELAY" : "10000",
    "REBALANCE_MODE" : "FULL_AUTO",
    "REPLICAS" : "1",
    "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
  },
  "mapFields" : {
    "_mm:root:_system:cron2_0" : { }
  },
  "listFields" : {
    "_mm:root:_system:cron2_0" : [ ]
  }
}

[zk: localhost:2181(CONNECTED) 39] get
/C8CEPCluster/IDEALSTATES/_mm:root:_system:cron3
{
  "id" : "_mm:root:_system:cron3",
  "simpleFields" : {
    "DELAY_REBALANCE_ENABLED" : "true",
    "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
    "MAX_PARTITIONS_PER_INSTANCE" : "1",
    "NUM_PARTITIONS" : "1",
    "REBALANCER_CLASS_NAME" :
"org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
    "REBALANCE_DELAY" : "10000",
    "REBALANCE_MODE" : "FULL_AUTO",
    "REPLICAS" : "1",
    "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
  },
  "mapFields" : {
    "_mm:root:_system:cron3_0" : { }
  },
  "listFields" : {
    "_mm:root:_system:cron3_0" : [ ]
  }
}

[zk: localhost:2181(CONNECTED) 42] get
/C8CEPCluster/EXTERNALVIEW/_mm:root:_system:cron2
{
  "id" : "_mm:root:_system:cron2",
  "simpleFields" : {
    "BUCKET_SIZE" : "0",
    "DELAY_REBALANCE_ENABLED" : "true",
    "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
    "MAX_PARTITIONS_PER_INSTANCE" : "1",
    "NUM_PARTITIONS" : "1",
    "REBALANCER_CLASS_NAME" :
"org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
    "REBALANCE_DELAY" : "10000",
    "REBALANCE_MODE" : "FULL_AUTO",
    "REPLICAS" : "1",
    "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
  },
  *"mapFields" : { },*
  "listFields" : { }
}

[zk: localhost:2181(CONNECTED) 43] get
/C8CEPCluster/EXTERNALVIEW/_mm:root:_system:cron3
{
  "id" : "_mm:root:_system:cron3",
  "simpleFields" : {
    "BUCKET_SIZE" : "0",
    "DELAY_REBALANCE_ENABLED" : "true",
    "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
    "MAX_PARTITIONS_PER_INSTANCE" : "1",
    "NUM_PARTITIONS" : "1",
    "REBALANCER_CLASS_NAME" :
"org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
    "REBALANCE_DELAY" : "10000",
    "REBALANCE_MODE" : "FULL_AUTO",
    "REPLICAS" : "1",
    "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
  },
  *"mapFields" : {*
*    "_mm:root:_system:cron3_0" : {*
*      "c8cep-0.c8cep.c8.svc.cluster.local_12000" : "ONLINE"*
*    }*
*  },*
  "listFields" : { }
}
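
The difference between the two ExternalView records above can also be checked programmatically. The following is a minimal sketch that works on plain maps shaped like the `mapFields` section (partition -> instance -> state) and lists the partitions that have no replica in a wanted state. The class and method names are hypothetical, and it deliberately avoids the Helix API so that it runs on its own.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Helix.
class ExternalViewCheck {

    // Returns every expected partition that has no replica in wantedState.
    static List<String> partitionsMissingState(
            List<String> expectedPartitions,
            Map<String, Map<String, String>> mapFields,
            String wantedState) {
        List<String> missing = new ArrayList<>();
        for (String partition : expectedPartitions) {
            Map<String, String> replicas =
                    mapFields.getOrDefault(partition, Collections.<String, String>emptyMap());
            if (!replicas.containsValue(wantedState)) {
                missing.add(partition);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // mapFields of the cron2 ExternalView: empty.
        Map<String, Map<String, String>> cron2 = new HashMap<>();

        // mapFields of the cron3 ExternalView: one ONLINE replica.
        Map<String, Map<String, String>> cron3 = new HashMap<>();
        cron3.put("_mm:root:_system:cron3_0",
                Collections.singletonMap("c8cep-0.c8cep.c8.svc.cluster.local_12000", "ONLINE"));

        System.out.println(partitionsMissingState(
                Collections.singletonList("_mm:root:_system:cron2_0"), cron2, "ONLINE"));
        // prints [_mm:root:_system:cron2_0]
        System.out.println(partitionsMissingState(
                Collections.singletonList("_mm:root:_system:cron3_0"), cron3, "ONLINE"));
        // prints []
    }
}
```
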

Thank you.
Grainier Perera.


On Sat, 18 Jun 2022 at 10:45, Junkai Xue <jx...@apache.org> wrote:

> OK, so you don't set any rack-aware information. Then how many instances
> are connected to that cluster? Please double-check the live instances
> in ZooKeeper as well.
>
> Best,
>
> Junkai

Re: Some resources won't become online

Posted by Junkai Xue <jx...@apache.org>.
OK, so you don't set any rack-aware information. Then how many instances
are connected to that cluster? Please double-check the live instances
in ZooKeeper as well.

Best,

Junkai

On Fri, Jun 17, 2022 at 10:01 PM Grainier Perera <gr...@apache.org>
wrote:

> Hi Junkai,
>
> I've added cluster init code to the gist [1]. Apart from that,
> ClusterConfig is configured like this:
>
>     ClusterConfig clusterConfig = configAccessor.getClusterConfig(CLUSTER_NAME);
>     // Configure the capacity keys in the ClusterConfig, e.g. MEMORY.
>     clusterConfig.setInstanceCapacityKeys(INSTANCE_CAPACITY_KEYS);
>     // Configure the default instance capacity in the ClusterConfig, e.g. MEMORY = 100.
>     clusterConfig.setDefaultInstanceCapacityMap(INSTANCE_CAPACITY);
>     // Configure the default partition weight in the ClusterConfig, e.g. MEMORY = 5.
>     clusterConfig.setDefaultPartitionWeightMap(DEFAULT_RESOURCE_USAGE);
>     // Persist the updated ClusterConfig.
>     configAccessor.setClusterConfig(CLUSTER_NAME, clusterConfig);
>
> [1]
> https://gist.github.com/grainier/aa1c0b279ea99f88d74c1e94d79f5cdb#file-clustersetup-java
>
> Thanks,
> Grainier Perera.

Re: Some resources won't become online

Posted by Grainier Perera <gr...@apache.org>.
Hi Junkai,

I've added cluster init code to the gist [1]. Apart from that,
ClusterConfig is configured like this:

    ClusterConfig clusterConfig = configAccessor.getClusterConfig(CLUSTER_NAME);
    // Configure the capacity keys in the ClusterConfig, e.g. MEMORY.
    clusterConfig.setInstanceCapacityKeys(INSTANCE_CAPACITY_KEYS);
    // Configure the default instance capacity in the ClusterConfig, e.g. MEMORY = 100.
    clusterConfig.setDefaultInstanceCapacityMap(INSTANCE_CAPACITY);
    // Configure the default partition weight in the ClusterConfig, e.g. MEMORY = 5.
    clusterConfig.setDefaultPartitionWeightMap(DEFAULT_RESOURCE_USAGE);
    // Persist the updated ClusterConfig.
    configAccessor.setClusterConfig(CLUSTER_NAME, clusterConfig);
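
The snippet above refers to three constants that are not shown in this message. A minimal sketch of how they might be defined follows. The values (keys CPU and MEMORY, capacity 100, weight 5) are taken from the cluster config dumped from ZooKeeper elsewhere in this thread; the holder class name and the helper method are hypothetical, and the `List<String>` / `Map<String, Integer>` types are an assumption about what the ClusterConfig setters accept.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical holder class; the real definitions live in the gist and may differ.
class CapacityConstants {

    // Capacity dimensions tracked for every instance.
    static final List<String> INSTANCE_CAPACITY_KEYS =
            Collections.unmodifiableList(Arrays.asList("CPU", "MEMORY"));

    // Default capacity of each instance: 100 units per key.
    static final Map<String, Integer> INSTANCE_CAPACITY = uniform(100);

    // Default weight of each partition: 5 units per key.
    static final Map<String, Integer> DEFAULT_RESOURCE_USAGE = uniform(5);

    // Builds an unmodifiable map that assigns the same value to every capacity key.
    private static Map<String, Integer> uniform(int value) {
        Map<String, Integer> map = new LinkedHashMap<>();
        for (String key : INSTANCE_CAPACITY_KEYS) {
            map.put(key, value);
        }
        return Collections.unmodifiableMap(map);
    }

    public static void main(String[] args) {
        System.out.println(INSTANCE_CAPACITY_KEYS);
        System.out.println(INSTANCE_CAPACITY);
        System.out.println(DEFAULT_RESOURCE_USAGE);
    }
}
```

Keeping the key list as the single source for both maps avoids a mismatch between the declared capacity keys and the keys used in the capacity and weight maps.
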

[1]
https://gist.github.com/grainier/aa1c0b279ea99f88d74c1e94d79f5cdb#file-clustersetup-java

Thanks,
Grainier Perera.


On Sat, 18 Jun 2022 at 10:00, Junkai Xue <jx...@apache.org> wrote:

> Could you please share your cluster config as well?
>
> Best,
>
> Junkai

Re: Some resources won't become online

Posted by Junkai Xue <jx...@apache.org>.
Could you please share your cluster config as well?

Best,

Junkai

On Fri, Jun 17, 2022 at 8:24 PM Grainier Perera <gr...@apache.org> wrote:

> Hi Devs,
>
> I'm trying to add several resources to the cluster using the following
> configurations[1]. However, only some will become `ONLINE`. What could be
> the reason? Is there a way to guarantee every resource will become `ONLINE`
> if WAGED capacity constraints are met?
>
> You can see with the same IdealState, "_mm:root:_system:cron3" has
> mapFields and it is ONLINE, and "_mm:root:_system:cron2" is not.
> Furthermore, I see this behavior more often when the replicas count is set
> to 1.
>
> ResourceInfo:
> 1. "_mm:root:_system:cron2"
>
> IdealState for _mm:root:_system:cron2:
> {
>   "id" : "_mm:root:_system:cron2",
>   "simpleFields" : {
>     "DELAY_REBALANCE_ENABLED" : "true",
>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>     "NUM_PARTITIONS" : "1",
>     "REBALANCER_CLASS_NAME" :
> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>     "REBALANCE_DELAY" : "10000",
>     "REBALANCE_MODE" : "FULL_AUTO",
>     "REPLICAS" : "1",
>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>   },
>   "mapFields" : {
>     "_mm:root:_system:cron2_0" : { }
>   },
>   "listFields" : {
>     "_mm:root:_system:cron2_0" : [ ]
>   }
> }
>
>
> ExternalView for _mm:root:_system:cron2:
> {
>   "id" : "_mm:root:_system:cron2",
>   "simpleFields" : {
>     "BUCKET_SIZE" : "0",
>     "DELAY_REBALANCE_ENABLED" : "true",
>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>     "NUM_PARTITIONS" : "1",
>     "REBALANCER_CLASS_NAME" :
> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>     "REBALANCE_DELAY" : "10000",
>     "REBALANCE_MODE" : "FULL_AUTO",
>     "REPLICAS" : "1",
>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>   },
>   *"mapFields" : { },*
>   "listFields" : { }
> }
>
>
> 2. "_mm:root:_system:cron3"
>
> IdealState for _mm:root:_system:cron3:
> {
>   "id" : "_mm:root:_system:cron3",
>   "simpleFields" : {
>     "DELAY_REBALANCE_ENABLED" : "true",
>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>     "NUM_PARTITIONS" : "1",
>     "REBALANCER_CLASS_NAME" :
> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>     "REBALANCE_DELAY" : "10000",
>     "REBALANCE_MODE" : "FULL_AUTO",
>     "REPLICAS" : "1",
>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>   },
>   "mapFields" : {
>     "_mm:root:_system:cron3_0" : { }
>   },
>   "listFields" : {
>     "_mm:root:_system:cron3_0" : [ ]
>   }
> }
>
>
> ExternalView for _mm:root:_system:cron3:
> {
>   "id" : "_mm:root:_system:cron3",
>   "simpleFields" : {
>     "BUCKET_SIZE" : "0",
>     "DELAY_REBALANCE_ENABLED" : "true",
>     "IDEAL_STATE_MODE" : "AUTO_REBALANCE",
>     "MAX_PARTITIONS_PER_INSTANCE" : "1",
>     "NUM_PARTITIONS" : "1",
>     "REBALANCER_CLASS_NAME" :
> "org.apache.helix.controller.rebalancer.waged.WagedRebalancer",
>     "REBALANCE_DELAY" : "10000",
>     "REBALANCE_MODE" : "FULL_AUTO",
>     "REPLICAS" : "1",
>     "STATE_MODEL_DEF_REF" : "C8CEPStateModel"
>   },
>   *"mapFields" : {*
> *    "_mm:root:_system:cron3_0" : {*
> *      "c8cep-0.c8cep.c8.svc.cluster.local_12000" : "ONLINE"*
> *    }*
> *  },*
>   "listFields" : { }
> }
>
>
> [1]: https://gist.github.com/grainier/aa1c0b279ea99f88d74c1e94d79f5cdb
>
> Thank you.
> Grainier Perera.
>