Posted to user@helix.apache.org by Grainier Perera <gr...@apache.org> on 2022/06/24 09:09:36 UTC
Validating capacity availability of instances
Hi Devs,
Is there a way to validate the capacity availability of cluster instances
when adding a resource and rebalancing it with WAGED? The resource
addition seems to be processed in an event pipeline, so when it
hits the "FAILED_TO_CALCULATE" exception, the failure doesn't seem to
propagate back to the place where we add the resource. That makes it
tricky to validate capacity availability beforehand.
While looking into this I found [1], but I couldn't clearly understand how
to use the "WAGED simulation API" mentioned there. Here is what I've tried
(code at the end of this message).
So the questions are:
- Is it correct?
- If so, can encountering "getIdealAssignmentForWagedFullAuto():
Calculation failed: Failed to compute BestPossibleState!" be
considered "FAILED_TO_CALCULATE"?
- If so, is there a way to get the proper reason for the failure, such as "Unable
to find any available candidate node for partition resource4_0; Fail
reasons: {resource4-resource4_0-ONLINE={c8cep_on_localhost_12002=[Node has
insufficient capacity]..."?
- Or, is there a better way of doing this?
try {
  IdealState newIS = getIdealState(resourceName);
  ResourceConfig newResourceConfig = new ResourceConfig(resourceName);

  // Set PARTITION_CAPACITY_MAP
  Map<String, String> capacityDataMap = ImmutableMap.of("CPU", "20", "MEMORY", "60");
  newResourceConfig.getRecord().setMapField(
      ResourceConfig.ResourceConfigProperty.PARTITION_CAPACITY_MAP.name(),
      Collections.singletonMap(ResourceConfig.DEFAULT_PARTITION_KEY,
          OBJECT_MAPPER.writeValueAsString(capacityDataMap)));

  // Read existing cluster/instances/resources info
  final ZKHelixDataAccessor dataAccessor = new ZKHelixDataAccessor(CLUSTER_NAME,
      new ZkBaseDataAccessor.Builder<ZNRecord>().setZkAddress(ZK_ADDRESS).build());
  ClusterConfig clusterConfig =
      dataAccessor.getProperty(dataAccessor.keyBuilder().clusterConfig());
  List<InstanceConfig> instanceConfigs =
      dataAccessor.getChildValues(dataAccessor.keyBuilder().instanceConfigs(), true);
  List<String> liveInstances =
      dataAccessor.getChildNames(dataAccessor.keyBuilder().liveInstances());
  List<IdealState> idealStates =
      dataAccessor.getChildValues(dataAccessor.keyBuilder().idealStates(), true);
  List<ResourceConfig> resourceConfigs =
      dataAccessor.getChildValues(dataAccessor.keyBuilder().resourceConfigs(), true);

  // Do we need to add these?
  idealStates.add(newIS);
  resourceConfigs.add(newResourceConfig);

  // Verify that utilResult contains the assignment for the resources added
  Map<String, ResourceAssignment> utilResult = HelixUtil
      .getTargetAssignmentForWagedFullAuto(ZK_ADDRESS, clusterConfig, instanceConfigs,
          liveInstances, idealStates, resourceConfigs);
} catch (HelixException e) {
  // Getting "getIdealAssignmentForWagedFullAuto(): Calculation failed:
  // Failed to compute BestPossibleState!" means not enough capacity?
}
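On the question about getting a more specific failure reason, one generic thing to try is walking the caught exception's cause chain and collecting each message. The sketch below uses only the standard Throwable API. Whether Helix actually attaches the "insufficient capacity" detail as a cause of the HelixException is an assumption that would need checking against the Helix version in use, and the names FailureReasons and describeCauses are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class FailureReasons {
  /** Collects "ExceptionType: message" for the throwable and each of its causes, outermost first. */
  public static List<String> describeCauses(Throwable t) {
    List<String> reasons = new ArrayList<>();
    // Identity-based set guards against a cyclic cause chain.
    Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
    for (Throwable cur = t; cur != null && seen.add(cur); cur = cur.getCause()) {
      reasons.add(cur.getClass().getSimpleName() + ": " + cur.getMessage());
    }
    return reasons;
  }

  public static void main(String[] args) {
    // Stand-in exceptions for illustration; in the snippet above this would be
    // the caught HelixException (assumed, not verified, to carry a cause).
    Throwable inner = new IllegalStateException("Node has insufficient capacity");
    Throwable outer = new RuntimeException("Failed to compute BestPossibleState!", inner);
    describeCauses(outer).forEach(System.out::println);
  }
}
```

In the catch block above, this would be called as FailureReasons.describeCauses(e) and the result logged. If the list only ever contains the top-level message, the detailed reason is probably only available in the rebalancer logs.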
[1] https://github.com/apache/helix/pull/1701
Thank you,
Grainier Perera.
Re: Validating capacity availability of instances
Posted by Grainier Perera <gr...@apache.org>.
Hi Junkai,
Thank you for the explanation. I'll create an issue to improve the
exception message.
Thanks,
Grainier Perera.
On Sat, 25 Jun 2022 at 09:38, Junkai Xue <ju...@gmail.com> wrote:
> If you have not turned on rack awareness, it should be a capacity problem. You
> can file an issue on the Apache Helix GitHub to improve the exception or logs.
>
> Best,
>
> Junkai
Re: Validating capacity availability of instances
Posted by Junkai Xue <ju...@gmail.com>.
If you have not turned on rack awareness, it should be a capacity problem. You
can file an issue on the Apache Helix GitHub to improve the exception or logs.
Best,
Junkai
--
Junkai Xue
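As a rough illustration of what a "capacity problem" means here, the sketch below does a coarse pre-check: for each capacity key it compares the total remaining capacity across instances with the total demand of the new resource (weight per replica times number of replicas). This is not the WAGED algorithm and not a Helix API; the class CapacityPreCheck, the method shortfallKeys, and the remaining-capacity numbers are all hypothetical. Passing the check is only a necessary condition, since the rebalancer places each replica on a single node.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CapacityPreCheck {
  /**
   * Returns the capacity keys whose total remaining capacity across all instances
   * is smaller than the total demand (weight per replica * replica count).
   * An empty result only rules out an obvious shortfall; it does not prove
   * that a valid placement exists.
   */
  public static List<String> shortfallKeys(List<Map<String, Integer>> remainingPerInstance,
      Map<String, Integer> weightPerReplica, int replicaCount) {
    // Sum the remaining capacity per key over all instances.
    Map<String, Long> totalRemaining = new HashMap<>();
    for (Map<String, Integer> remaining : remainingPerInstance) {
      for (Map.Entry<String, Integer> e : remaining.entrySet()) {
        totalRemaining.merge(e.getKey(), (long) e.getValue(), Long::sum);
      }
    }
    // Compare the demand for each key against the summed remaining capacity.
    List<String> shortfalls = new ArrayList<>();
    for (Map.Entry<String, Integer> e : weightPerReplica.entrySet()) {
      long demand = (long) e.getValue() * replicaCount;
      if (totalRemaining.getOrDefault(e.getKey(), 0L) < demand) {
        shortfalls.add(e.getKey());
      }
    }
    return shortfalls;
  }

  public static void main(String[] args) {
    // Hypothetical remaining capacity on two instances.
    Map<String, Integer> node = new HashMap<>();
    node.put("CPU", 30);
    node.put("MEMORY", 50);
    List<Map<String, Integer>> instances = new ArrayList<>();
    instances.add(node);
    instances.add(new HashMap<>(node));

    // Partition weight taken from the thread: CPU=20, MEMORY=60.
    Map<String, Integer> weight = new HashMap<>();
    weight.put("CPU", 20);
    weight.put("MEMORY", 60);

    System.out.println(shortfallKeys(instances, weight, 2));
  }
}
```

With these made-up numbers the MEMORY demand (120) exceeds the total remaining (100), so MEMORY is reported. The simulation call in the original message remains the more faithful check, because it evaluates the actual placement constraints.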