Posted to user@helix.apache.org by Grainier Perera <gr...@apache.org> on 2022/06/24 09:09:36 UTC

Validating capacity availability of instances

Hi Devs,

Is there a way to validate the capacity availability of cluster instances
when adding a resource and rebalancing it with WAGED? The resource addition
seems to be processed in an event pipeline, so when it encounters the
"FAILED_TO_CALCULATE" exception, the failure doesn't seem to propagate to
the place where we add the resource. That makes it tricky to validate
capacity availability beforehand.

While looking into this I found [1], but I couldn't clearly understand the
usage of the "WAGED simulation API" mentioned there. What I've tried is
shown in the code at the end of this message.

So the questions are:
- Is this approach correct?
- If so, can encountering "getIdealAssignmentForWagedFullAuto():
Calculation failed: Failed to compute BestPossibleState!" be
considered "FAILED_TO_CALCULATE"?
- If so, is there a way to get the actual reason for the failure, like "Unable
to find any available candidate node for partition resource4_0; Fail
reasons: {resource4-resource4_0-ONLINE={c8cep_on_localhost_12002=[Node has
insufficient capacity]..."?
- Or is there a better way of doing this?

    try {
        IdealState newIS = getIdealState(resourceName);
        ResourceConfig newResourceConfig = new ResourceConfig(resourceName);
        // Set PARTITION_CAPACITY_MAP
        Map<String, String> capacityDataMap = ImmutableMap.of("CPU", "20", "MEMORY", "60");
        newResourceConfig.getRecord().setMapField(ResourceConfig.ResourceConfigProperty.PARTITION_CAPACITY_MAP.name(),
                Collections.singletonMap(ResourceConfig.DEFAULT_PARTITION_KEY, OBJECT_MAPPER.writeValueAsString(capacityDataMap)));

        // Read existing cluster/instances/resources info
        final ZKHelixDataAccessor dataAccessor = new ZKHelixDataAccessor(CLUSTER_NAME,
                new ZkBaseDataAccessor.Builder<ZNRecord>().setZkAddress(ZK_ADDRESS).build());
        ClusterConfig clusterConfig = dataAccessor.getProperty(dataAccessor.keyBuilder().clusterConfig());
        List<InstanceConfig> instanceConfigs = dataAccessor.getChildValues(dataAccessor.keyBuilder().instanceConfigs(), true);
        List<String> liveInstances = dataAccessor.getChildNames(dataAccessor.keyBuilder().liveInstances());
        List<IdealState> idealStates = dataAccessor.getChildValues(dataAccessor.keyBuilder().idealStates(), true);
        List<ResourceConfig> resourceConfigs = dataAccessor.getChildValues(dataAccessor.keyBuilder().resourceConfigs(), true);

        // Do we need to add these?
        idealStates.add(newIS);
        resourceConfigs.add(newResourceConfig);

        // Verify that utilResult contains the assignment for the resources added
        Map<String, ResourceAssignment> utilResult = HelixUtil
                .getTargetAssignmentForWagedFullAuto(ZK_ADDRESS, clusterConfig, instanceConfigs,
                        liveInstances, idealStates, resourceConfigs);

    } catch (HelixException e) {
        // Getting "getIdealAssignmentForWagedFullAuto(): Calculation failed: Failed to compute BestPossibleState!"
        // means not enough capacity?
    }

[1] https://github.com/apache/helix/pull/1701

Thank you,
Grainier Perera.
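
On the question of recovering the actual failure reason: the thread does not confirm whether the HelixException thrown by the simulation call carries the underlying rebalancer failure as its cause, so treat that as an assumption. If it does, the more detailed message (for example the "Node has insufficient capacity" text quoted above) may be reachable by walking the cause chain. A minimal sketch in plain Java, where FailureReasons is a hypothetical helper and not part of Helix:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (not a Helix class): collects every message in an
// exception's cause chain so a caller can log or inspect the innermost reason.
final class FailureReasons {
    private FailureReasons() {}

    /** Messages of the exception and all of its causes, outermost first. */
    static List<String> messages(Throwable error) {
        List<String> out = new ArrayList<>();
        // Identity-based set guards against cyclic cause chains.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable t = error; t != null && seen.add(t); t = t.getCause()) {
            if (t.getMessage() != null) {
                out.add(t.getMessage());
            }
        }
        return out;
    }

    /** True if any message in the chain contains the marker, ignoring case. */
    static boolean mentions(Throwable error, String marker) {
        String needle = marker.toLowerCase(Locale.ROOT);
        for (String message : messages(error)) {
            if (message.toLowerCase(Locale.ROOT).contains(needle)) {
                return true;
            }
        }
        return false;
    }
}
```

Inside the catch block of the snippet above, FailureReasons.mentions(e, "insufficient capacity") would then distinguish a capacity failure from other causes, but only if the detailed message is actually present in the chain. If only the top-level "Failed to compute BestPossibleState!" message is available, the helper returns false and the reason has to come from the logs.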

Re: Validating capacity availability of instances

Posted by Grainier Perera <gr...@apache.org>.

Hi Junkai,

Thank you for the explanation. I'll create an issue to improve the
exception message.

Thanks,
Grainier Perera.


On Sat, 25 Jun 2022 at 09:38, Junkai Xue <ju...@gmail.com> wrote:

> If you did not turn on rack awareness, it should be a capacity problem. You
> can file an issue on the Apache Helix GitHub to improve the exception or logs.
>
> Best,
>
> Junkai

Re: Validating capacity availability of instances

Posted by Junkai Xue <ju...@gmail.com>.

If you did not turn on rack awareness, it should be a capacity problem. You
can file an issue on the Apache Helix GitHub to improve the exception or logs.

Best,

Junkai



-- 
Junkai Xue
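
Since the answer above reduces the failure to a capacity problem when rack awareness is off, a coarse capacity pre-check can be run before adding the resource. The sketch below is plain Java and independent of Helix: CapacityPreCheck and its methods are hypothetical names, and how the free capacity per instance is obtained is left to the caller. It is not the WAGED algorithm. It only tests a necessary condition (enough instances can each hold at least one more replica), so a passing result does not guarantee that the rebalancer will find an assignment.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper (not a Helix class): a necessary-but-not-sufficient
// capacity check based on per-instance free capacity and per-replica demand.
final class CapacityPreCheck {
    private CapacityPreCheck() {}

    /**
     * Number of replicas a single instance can hold, limited by its scarcest
     * capacity key. Returns Long.MAX_VALUE if the demand has no positive entry.
     */
    static long replicasThatFit(Map<String, Integer> free, Map<String, Integer> demand) {
        long fit = Long.MAX_VALUE;
        for (Map.Entry<String, Integer> d : demand.entrySet()) {
            if (d.getValue() <= 0) {
                continue; // a key with no demand never limits placement
            }
            // A capacity key missing on the instance counts as zero capacity.
            int available = Math.max(0, free.getOrDefault(d.getKey(), 0));
            fit = Math.min(fit, available / d.getValue());
        }
        return fit;
    }

    /** True if the instances could, in total, hold the requested replicas. */
    static boolean canPossiblyFit(List<Map<String, Integer>> freePerInstance,
                                  Map<String, Integer> demandPerReplica,
                                  int totalReplicas) {
        long placeable = 0;
        for (Map<String, Integer> free : freePerInstance) {
            long fit = replicasThatFit(free, demandPerReplica);
            if (fit == Long.MAX_VALUE) {
                return true; // nothing is demanded, so capacity cannot be the limit
            }
            placeable += fit;
            if (placeable >= totalReplicas) {
                return true;
            }
        }
        return placeable >= totalReplicas;
    }
}
```

With the partition weights from the original snippet (CPU 20, MEMORY 60) and two instances that each have 100 CPU and 100 MEMORY free, each instance fits one replica because MEMORY is the limiting key, so a request for two replicas passes the check and a request for three does not. Other hard constraints, such as replicas of the same partition not sharing an instance, are ignored here and can still make the real calculation fail.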