Posted to notifications@jclouds.apache.org by "Ladislav Thon (JIRA)" <ji...@apache.org> on 2016/03/15 13:33:33 UTC

[jira] [Created] (JCLOUDS-1092) Azure: ComputeService.resumeNode spins in a timeout loop that doesn't have a chance to exit early

Ladislav Thon created JCLOUDS-1092:
--------------------------------------

             Summary: Azure: ComputeService.resumeNode spins in a timeout loop that doesn't have a chance to exit early
                 Key: JCLOUDS-1092
                 URL: https://issues.apache.org/jira/browse/JCLOUDS-1092
             Project: jclouds
          Issue Type: Bug
          Components: jclouds-labs
    Affects Versions: 1.9.2
            Reporter: Ladislav Thon


This is going to be a slightly longer text, so please bear with me.

Invoking {{ComputeService.resumeNode}} with the Azure provider goes through these layers:

- {{BaseComputeService.resumeNode}}
- {{AdaptingComputeServiceStrategies.resumeNode}}
- {{AzureComputeServiceAdapter.resumeNode}}

The problem manifests when traversing the call stack back up, so let's assume we got down to {{AzureComputeServiceAdapter.resumeNode}}. Also, the problem only appears for us when calling {{suspendNode}} and then {{resumeNode}} in rapid succession, but that's out of jclouds's control.

When the {{trackRequest}} method returns (https://github.com/jclouds/jclouds-labs/blob/fe24698d81/azurecompute/src/main/java/org/jclouds/azurecompute/compute/AzureComputeServiceAdapter.java#L383), it means that the asynchronous operation "start node" succeeded -- but that doesn't mean that the node is already running. In fact, it's only just starting -- I was able to confirm that in the debugger by calling {{api.getDeploymentApiForService(id).get(id)}} and inspecting the {{roleInstanceList}}.

When we get one layer back up, the {{AdaptingComputeServiceStrategies.resumeNode}} method calls {{getNode}} (see https://github.com/jclouds/jclouds/blob/b9322c583d/compute/src/main/java/org/jclouds/compute/strategy/impl/AdaptingComputeServiceStrategies.java#L164), which delegates to {{AzureComputeServiceAdapter.getNode}}.

{{AzureComputeServiceAdapter.getNode}} only returns a non-{{null}} value when all of the deployment's role instances are in a settled (non-transient) state, see https://github.com/jclouds/jclouds-labs/blob/fe24698d81/azurecompute/src/main/java/org/jclouds/azurecompute/compute/AzureComputeServiceAdapter.java#L269. So when the node is only just starting, {{AzureComputeServiceAdapter.getNode}} will return {{null}}.
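
To illustrate the behavior just described, here is a minimal, self-contained sketch. It is not the actual jclouds-labs code: the {{Status}} enum, its values, and the simplified {{getNode}} signature are hypothetical stand-ins chosen for this example. It only shows the "return {{null}} unless every role instance is settled" shape.

```java
import java.util.Arrays;
import java.util.List;

public class SettledStateSketch {
    // Hypothetical status values; the real Azure role instance statuses differ.
    enum Status { RUNNING, STOPPED, STARTING, STOPPING }

    // Assumption for this sketch: STARTING and STOPPING are the transient states.
    static boolean isTransient(Status status) {
        return status == Status.STARTING || status == Status.STOPPING;
    }

    // Returns a node description, or null if any role instance is still transient.
    static String getNode(String id, List<Status> roleInstances) {
        for (Status status : roleInstances) {
            if (isTransient(status)) {
                return null; // the caller cannot tell "still starting" from "not found"
            }
        }
        return "node " + id;
    }

    public static void main(String[] args) {
        // Right after a successful "start node" operation the instance is only starting.
        System.out.println(getNode("vm1", Arrays.asList(Status.STARTING)));
        // Once the instance has settled, a node is returned.
        System.out.println(getNode("vm1", Arrays.asList(Status.RUNNING)));
    }
}
```

In this sketch, a caller that invokes {{getNode}} immediately after the start operation completes gets {{null}}, which is the situation the following layers then have to deal with.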

Again one layer back up: {{AdaptingComputeServiceStrategies.getNode}} returns {{null}} and hence {{AdaptingComputeServiceStrategies.resumeNode}} also returns {{null}}.

One more layer back up: {{BaseComputeService.resumeNode}} will call the {{nodeRunning}} predicate with an {{AtomicReference}} of {{null}}, see https://github.com/jclouds/jclouds/blob/b9322c583d/compute/src/main/java/org/jclouds/compute/internal/BaseComputeService.java#L470

The predicate is a {{ComputeServiceTimeoutsModule.RetryablePredicateGuardingNull}} which delegates to {{Predicates2.RetryablePredicate}} and through that to {{AtomicNodeRunning}}. {{AtomicNodeRunning}} is a subclass of {{RefreshAndDoubleCheckOnFailUnlessStatusInvalid}}, which will always return {{false}} when the resource is {{null}}, see https://github.com/jclouds/jclouds/blob/b9322c583d/compute/src/main/java/org/jclouds/compute/predicates/internal/RefreshAndDoubleCheckOnFailUnlessStatusInvalid.java#L63. There's also some kind of status refreshing, but that will never happen if the resource (node, in this case) is {{null}} (there's nothing to refresh).

All in all, the {{Predicates2.RetryablePredicate}} will spin on and on, until it times out, because for {{null}}, there's no chance it will exit early.
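
The effect can be reproduced without jclouds. The following is a simplified sketch under stated assumptions, not the real {{Predicates2.RetryablePredicate}}: the {{retry}} helper, the {{Result}} class, and the string-based "node" are all hypothetical. It shows a retry loop around a predicate that always fails for a {{null}} reference, so the loop can only end by timing out.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Predicate;

public class NullRetrySketch {
    // Hypothetical result holder: whether the predicate passed, and how often it was tried.
    static final class Result {
        final boolean passed;
        final int attempts;
        Result(boolean passed, int attempts) {
            this.passed = passed;
            this.attempts = attempts;
        }
    }

    // Stand-in for a "node running" check; it can never succeed for a null reference.
    static final Predicate<AtomicReference<String>> NODE_RUNNING =
        ref -> ref.get() != null && ref.get().equals("RUNNING");

    // Retries the predicate until it passes or the timeout elapses.
    static Result retry(Predicate<AtomicReference<String>> predicate,
                        AtomicReference<String> ref,
                        long timeoutMillis, long periodMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        int attempts = 0;
        while (true) {
            attempts++;
            if (predicate.test(ref)) {
                return new Result(true, attempts);
            }
            if (System.nanoTime() >= deadline) {
                return new Result(false, attempts); // timed out
            }
            try {
                Thread.sleep(periodMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return new Result(false, attempts);
            }
        }
    }

    public static void main(String[] args) {
        // Nothing ever sets the reference, so every attempt fails until the timeout.
        Result r = retry(NODE_RUNNING, new AtomicReference<String>(), 200, 20);
        System.out.println("passed=" + r.passed + ", attempts=" + r.attempts);
    }
}
```

Because nothing in the loop ever replaces the {{null}} reference, every attempt is wasted and the full timeout is always consumed.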

After the timeout, {{BaseComputeService.resumeNode}} prints that resuming the node was not successful and returns. The problems are:

- the retrying predicate is spinning uselessly
- we actually have no idea about the status of the node when {{resumeNode}} returns



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)