Posted to dev@brooklyn.apache.org by "Aled Sage (JIRA)" <ji...@apache.org> on 2017/09/18 10:14:00 UTC

[jira] [Commented] (BROOKLYN-533) AWS VM deletion failed with "Request limit exceeded"

    [ https://issues.apache.org/jira/browse/BROOKLYN-533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16169836#comment-16169836 ] 

Aled Sage commented on BROOKLYN-533:
------------------------------------

Actually, that {{503 Service Unavailable}} response was a bad example! Looking at what the thread did, it sent a {{TerminateInstances}} request and got back a 200, then started polling with {{DescribeInstances}}. The first five polls returned 200, but the next one returned a 503. It then backed off exponentially, retrying a total of 6 times over roughly the next 50 seconds.

It then gave up, logging:
{noformat}
2017-09-15T17:34:09,965 ERROR 107 o.j.a.h.AWSServerErrorRetryHandler [r-VlI23lev-80548] Cannot retry after server error, command has exceeded retry limit 6: [method=org.jclouds.aws.ec2.features.AWSInstanceApi.public abstract java.util.Set org.jclouds.aws.ec2.features.AWSInstanceApi.describeInstancesInRegion(java.lang.String,java.lang.String[])[eu-west-1, [Ljava.lang.String;@3c2a630], request=POST https://ec2.eu-west-1.amazonaws.com/ HTTP/1.1]
2017-09-15T17:34:09,965 ERROR 127 o.a.b.l.j.JcloudsLocation [r-VlI23lev-80548] Problem releasing machine SshMachineLocation[34.252.178.167:aled@ec2-34-252-178-167.eu-west-1.compute.amazonaws.com/34.252.178.167:22(id=taep3uro9m)] in JcloudsLocation[AWS Dublin:AKIAIAGLWQ53TMPA5SDQ@a6b5yx6u15], instance id eu-west-1/i-0663997ccc85af459; ignoring and continuing, will throw subsequently: org.jclouds.aws.AWSResponseException: request POST https://ec2.eu-west-1.amazonaws.com/ HTTP/1.1 failed with code 503, error: AWSError{requestId='48ed534d-a788-43e2-aa97-2fce47716db2', requestToken='null', code='RequestLimitExceeded', message='Request limit exceeded.', context='{Response=, Errors=}'}
{noformat}
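
To make the timing concrete, here is a minimal sketch of a doubling backoff schedule. The 800 ms base delay and the doubling factor are assumptions picked so that six retries span about 50 seconds; they are not the actual jclouds settings, and {{BackoffSchedule}} is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: computes a doubling backoff schedule.
// The base delay and the doubling factor are assumptions, not jclouds' real settings.
class BackoffSchedule {

    /** Returns the delay in milliseconds before each retry: base, 2*base, 4*base, ... */
    static List<Long> delaysMillis(long baseMillis, int maxRetries) {
        List<Long> delays = new ArrayList<>();
        long delay = baseMillis;
        for (int i = 0; i < maxRetries; i++) {
            delays.add(delay);
            delay *= 2;
        }
        return delays;
    }

    /** Total time spent waiting across all retries. */
    static long totalMillis(List<Long> delays) {
        long total = 0;
        for (long d : delays) {
            total += d;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Long> delays = delaysMillis(800, 6);
        System.out.println(delays + " total=" + totalMillis(delays) + "ms");
    }
}
```

Under these assumed values the six waits add up to 50,400 ms. Once the last retry also gets a 503, the handler stops and the exception propagates, which matches the "exceeded retry limit 6" message in the log.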

---
However, in another thread I see the real problem.

Normally we call {{DescribeInstances}} once, and then we call {{TerminateInstances}}. But for some threads, the very first call to {{DescribeInstances}} failed with a 503, and all 6 retries failed as well. Those threads therefore never went on to make the {{TerminateInstances}} call.
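
The flow can be modelled roughly as below. This is a sketch under stated assumptions: {{DestroyFlow}}, the {{Ec2}} interface and its two methods are hypothetical stand-ins for the real calls, not the jclouds API, and retrying the whole operation with a doubling delay is only one possible way to handle it.

```java
import java.util.concurrent.Callable;

// Illustrative model of the two-step flow described above. Ec2 and its
// methods are hypothetical stand-ins, not the jclouds API.
class DestroyFlow {

    interface Ec2 {
        void describeInstance(String id) throws Exception;   // models DescribeInstances
        void terminateInstance(String id) throws Exception;  // models TerminateInstances
    }

    /** Describe first, then terminate. If describe throws, terminate is never reached. */
    static void destroy(Ec2 ec2, String id) throws Exception {
        ec2.describeInstance(id);
        ec2.terminateInstance(id);
    }

    /** Retries the whole operation with a doubling delay; rethrows the last failure. */
    static void withBackoff(Callable<Void> op, int maxAttempts, long baseMillis) throws Exception {
        long delay = baseMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                op.call();
                return;
            } catch (Exception e) {
                if (attempt >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(delay);
                delay *= 2;
            }
        }
    }
}
```

A caller could wrap the operation as {{DestroyFlow.withBackoff(() -> { DestroyFlow.destroy(ec2, id); return null; }, 5, 1000)}}. The attempt count and base delay here are arbitrary illustration values.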

> AWS VM deletion failed with "Request limit exceeded"
> ----------------------------------------------------
>
>                 Key: BROOKLYN-533
>                 URL: https://issues.apache.org/jira/browse/BROOKLYN-533
>             Project: Brooklyn
>          Issue Type: Bug
>    Affects Versions: 0.11.0
>            Reporter: Aled Sage
>
> I deployed an app with approx 100 VMs in AWS.
> I then stopped my app, thus terminating all the VMs. However, some requests failed with response {{503}}, {{RequestLimitExceeded}}. Those VMs were left running.
> My opinion is that jclouds should have done an exponential backoff, to retry the instance deletion.
> The propagated exception is shown below:
> {noformat}
> 2017-09-15T17:34:09,965 ERROR 127 o.a.b.l.j.JcloudsLocation [r-VlI23lev-80548] Problem releasing machine SshMachineLocation[34.252.178.167:aled@ec2-34-252-178-167.eu-west-1.compute.amazonaws.com/34.252.178.167:22(id=taep3uro9m)] in JcloudsLocation[AWS Dublin:xxxxxxxx@xxxxxxxx], instance id eu-west-1/i-0663997ccc85af459; ignoring and continuing, will throw subsequently: org.jclouds.aws.AWSResponseException: request POST https://ec2.eu-west-1.amazonaws.com/ HTTP/1.1 failed with code 503, error: AWSError{requestId='48ed534d-a788-43e2-aa97-2fce47716db2', requestToken='null', code='RequestLimitExceeded', message='Request limit exceeded.', context='{Response=, Errors=}'}
> org.jclouds.aws.AWSResponseException: request POST https://ec2.eu-west-1.amazonaws.com/ HTTP/1.1 failed with code 503, error: AWSError{requestId='48ed534d-a788-43e2-aa97-2fce47716db2', requestToken='null', code='RequestLimitExceeded', message='Request limit exceeded.', context='{Response=, Errors=}'}
>         at org.jclouds.aws.handlers.ParseAWSErrorFromXmlContent.handleError(ParseAWSErrorFromXmlContent.java:75) [259:sts:2.0.2]
>         at org.jclouds.http.handlers.DelegatingErrorHandler.handleError(DelegatingErrorHandler.java:67) [101:jclouds-core:2.0.2.2-20170712_1657]
>         at org.jclouds.http.internal.BaseHttpCommandExecutorService.shouldContinue(BaseHttpCommandExecutorService.java:140) [101:jclouds-core:2.0.2.2-20170712_1657]
>         at org.jclouds.http.internal.BaseHttpCommandExecutorService.invoke(BaseHttpCommandExecutorService.java:109) [101:jclouds-core:2.0.2.2-20170712_1657]
>         at org.jclouds.rest.internal.InvokeHttpMethod.invoke(InvokeHttpMethod.java:90) [101:jclouds-core:2.0.2.2-20170712_1657]
>         at org.jclouds.rest.internal.InvokeHttpMethod.apply(InvokeHttpMethod.java:73) [101:jclouds-core:2.0.2.2-20170712_1657]
>         at org.jclouds.rest.internal.InvokeHttpMethod.apply(InvokeHttpMethod.java:44) [101:jclouds-core:2.0.2.2-20170712_1657]
>         at org.jclouds.reflect.FunctionalReflection$FunctionalInvocationHandler.handleInvocation(FunctionalReflection.java:117) [101:jclouds-core:2.0.2.2-20170712_1657]
>         at com.google.common.reflect.AbstractInvocationHandler.invoke(AbstractInvocationHandler.java:87) [66:com.google.guava:18.0.0]
>         at com.sun.proxy.$Proxy179.describeInstancesInRegion(Unknown Source) [47:aws-ec2:2.0.2]
>         at org.jclouds.ec2.compute.strategy.EC2GetNodeMetadataStrategy.getRunningInstanceInRegion(EC2GetNodeMetadataStrategy.java:64) [77:ec2:2.0.2]
>         at org.jclouds.aws.ec2.compute.strategy.AWSEC2GetNodeMetadataStrategy.getRunningInstanceInRegion(AWSEC2GetNodeMetadataStrategy.java:52) [47:aws-ec2:2.0.2]
>         at org.jclouds.ec2.compute.strategy.EC2GetNodeMetadataStrategy.getNode(EC2GetNodeMetadataStrategy.java:56) [77:ec2:2.0.2]
>         at org.jclouds.compute.predicates.AtomicNodeTerminated.refreshOrNull(AtomicNodeTerminated.java:42) [100:jclouds-compute:2.0.2]
>         at org.jclouds.compute.predicates.AtomicNodeTerminated.refreshOrNull(AtomicNodeTerminated.java:28) [100:jclouds-compute:2.0.2]
>         at org.jclouds.compute.predicates.internal.TrueIfNullOrDeletedRefreshAndDoubleCheckOnFalse.apply(TrueIfNullOrDeletedRefreshAndDoubleCheckOnFalse.java:46) [100:jclouds-compute:2.0.2]
>         at org.jclouds.compute.predicates.internal.TrueIfNullOrDeletedRefreshAndDoubleCheckOnFalse.apply(TrueIfNullOrDeletedRefreshAndDoubleCheckOnFalse.java:31) [100:jclouds-compute:2.0.2]
>         at org.jclouds.util.Predicates2$RetryablePredicate.apply(Predicates2.java:117) [101:jclouds-core:2.0.2.2-20170712_1657]
>         at org.jclouds.compute.internal.BaseComputeService.doDestroyNode(BaseComputeService.java:309) [100:jclouds-compute:2.0.2]
>         at org.jclouds.compute.internal.BaseComputeService.destroyNode(BaseComputeService.java:250) [100:jclouds-compute:2.0.2]
>         at org.apache.brooklyn.location.jclouds.JcloudsLocation.releaseNode(JcloudsLocation.java:2189) [127:org.apache.brooklyn.locations-jclouds:0.12.0.SNAPSHOT]
>         at org.apache.brooklyn.location.jclouds.JcloudsLocation.release(JcloudsLocation.java:2141) [127:org.apache.brooklyn.locations-jclouds:0.12.0.SNAPSHOT]
>         at org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks.stopAnyProvisionedMachines(MachineLifecycleEffectorTasks.java:1033) [131:org.apache.brooklyn.software-base:0.12.0.SNAPSHOT]
>         at org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks$StopAnyProvisionedMachinesTask.call(MachineLifecycleEffectorTasks.java:883) [131:org.apache.brooklyn.software-base:0.12.0.SNAPSHOT]
>         at org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks$StopAnyProvisionedMachinesTask.call(MachineLifecycleEffectorTasks.java:880) [131:org.apache.brooklyn.software-base:0.12.0.SNAPSHOT]
>         at org.apache.brooklyn.util.core.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:363) [122:org.apache.brooklyn.core:0.12.0.SNAPSHOT]
>         at org.apache.brooklyn.util.core.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:529) [122:org.apache.brooklyn.core:0.12.0.SNAPSHOT]
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:?]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:?]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:?]
>         at java.lang.Thread.run(Thread.java:748) [?:?]
> {noformat}
> ---
> Searching the log, I also saw the error response shown below:
> {noformat}
> 2017-09-12T20:02:08,902 ERROR 127 o.a.b.l.j.JcloudsLocation [er-yR7aVpU0-1224] Problem releasing machine SshMachineLocation[52.211.119.194:amp@ec2-52-211-119-194.eu-west-1.compute.amazonaws.com/52.211.119.194:22(id=usug6ourid)] in JcloudsLocation[AWS Dublin:xxxxxxxx@xxxxxxxx], instance id eu-west-1/i-074d162dceaf06b4f; ignoring and continuing, will throw subsequently: org.jclouds.aws.AWSResponseException: request POST https://ec2.eu-west-1.amazonaws.com/ HTTP/1.1 failed with code 503, error: AWSError{requestId='7a0838cd-cbdc-49e1-95aa-6d3794b15839', requestToken='null', code='Unavailable', message='The service is unavailable. Please try again shortly.', context='{Response=, Errors=}'}
> org.jclouds.aws.AWSResponseException: request POST https://ec2.eu-west-1.amazonaws.com/ HTTP/1.1 failed with code 503, error: AWSError{requestId='7a0838cd-cbdc-49e1-95aa-6d3794b15839', requestToken='null', code='Unavailable', message='The service is unavailable. Please try again shortly.', context='{Response=, Errors=}'}
>         at org.jclouds.aws.handlers.ParseAWSErrorFromXmlContent.handleError(ParseAWSErrorFromXmlContent.java:75) [259:sts:2.0.2]
>         at org.jclouds.http.handlers.DelegatingErrorHandler.handleError(DelegatingErrorHandler.java:67) [101:jclouds-core:2.0.2.2-20170712_1657]
>         at org.jclouds.http.internal.BaseHttpCommandExecutorService.shouldContinue(BaseHttpCommandExecutorService.java:140) [101:jclouds-core:2.0.2.2-20170712_1657]
>         at org.jclouds.http.internal.BaseHttpCommandExecutorService.invoke(BaseHttpCommandExecutorService.java:109) [101:jclouds-core:2.0.2.2-20170712_1657]
>         at org.jclouds.rest.internal.InvokeHttpMethod.invoke(InvokeHttpMethod.java:90) [101:jclouds-core:2.0.2.2-20170712_1657]
>         at org.jclouds.rest.internal.InvokeHttpMethod.apply(InvokeHttpMethod.java:73) [101:jclouds-core:2.0.2.2-20170712_1657]
>         at org.jclouds.rest.internal.InvokeHttpMethod.apply(InvokeHttpMethod.java:44) [101:jclouds-core:2.0.2.2-20170712_1657]
>         at org.jclouds.reflect.FunctionalReflection$FunctionalInvocationHandler.handleInvocation(FunctionalReflection.java:117) [101:jclouds-core:2.0.2.2-20170712_1657]
>         at com.google.common.reflect.AbstractInvocationHandler.invoke(AbstractInvocationHandler.java:87) [66:com.google.guava:18.0.0]
>         at com.sun.proxy.$Proxy201.describeInstancesInRegion(Unknown Source) [47:aws-ec2:2.0.2]
>         at org.jclouds.ec2.compute.strategy.EC2GetNodeMetadataStrategy.getRunningInstanceInRegion(EC2GetNodeMetadataStrategy.java:64) [77:ec2:2.0.2]
>         at org.jclouds.aws.ec2.compute.strategy.AWSEC2GetNodeMetadataStrategy.getRunningInstanceInRegion(AWSEC2GetNodeMetadataStrategy.java:52) [47:aws-ec2:2.0.2]
>         at org.jclouds.ec2.compute.strategy.EC2GetNodeMetadataStrategy.getNode(EC2GetNodeMetadataStrategy.java:56) [77:ec2:2.0.2]
>         at org.jclouds.compute.predicates.AtomicNodeTerminated.refreshOrNull(AtomicNodeTerminated.java:42) [100:jclouds-compute:2.0.2]
>         at org.jclouds.compute.predicates.AtomicNodeTerminated.refreshOrNull(AtomicNodeTerminated.java:28) [100:jclouds-compute:2.0.2]
>         at org.jclouds.compute.predicates.internal.TrueIfNullOrDeletedRefreshAndDoubleCheckOnFalse.apply(TrueIfNullOrDeletedRefreshAndDoubleCheckOnFalse.java:46) [100:jclouds-compute:2.0.2]
>         at org.jclouds.compute.predicates.internal.TrueIfNullOrDeletedRefreshAndDoubleCheckOnFalse.apply(TrueIfNullOrDeletedRefreshAndDoubleCheckOnFalse.java:31) [100:jclouds-compute:2.0.2]
>         at org.jclouds.util.Predicates2$RetryablePredicate.apply(Predicates2.java:117) [101:jclouds-core:2.0.2.2-20170712_1657]
>         at org.jclouds.compute.internal.BaseComputeService.doDestroyNode(BaseComputeService.java:309) [100:jclouds-compute:2.0.2]
>         at org.jclouds.compute.internal.BaseComputeService.destroyNode(BaseComputeService.java:250) [100:jclouds-compute:2.0.2]
>         at org.apache.brooklyn.location.jclouds.JcloudsLocation.releaseNode(JcloudsLocation.java:2189) [127:org.apache.brooklyn.locations-jclouds:0.12.0.SNAPSHOT]
>         at org.apache.brooklyn.location.jclouds.JcloudsLocation.release(JcloudsLocation.java:2141) [127:org.apache.brooklyn.locations-jclouds:0.12.0.SNAPSHOT]
>         at org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks.stopAnyProvisionedMachines(MachineLifecycleEffectorTasks.java:1033) [131:org.apache.brooklyn.software-base:0.12.0.SNAPSHOT]
>         at org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks$StopAnyProvisionedMachinesTask.call(MachineLifecycleEffectorTasks.java:883) [131:org.apache.brooklyn.software-base:0.12.0.SNAPSHOT]
>         at org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks$StopAnyProvisionedMachinesTask.call(MachineLifecycleEffectorTasks.java:880) [131:org.apache.brooklyn.software-base:0.12.0.SNAPSHOT]
>         at org.apache.brooklyn.util.core.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:363) [122:org.apache.brooklyn.core:0.12.0.SNAPSHOT]
>         at org.apache.brooklyn.util.core.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:529) [122:org.apache.brooklyn.core:0.12.0.SNAPSHOT]
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:?]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:?]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:?]
>         at java.lang.Thread.run(Thread.java:748) [?:?]
> {noformat}
> What is the best thing for jclouds to do if it gets a 503 response with {{RequestLimitExceeded}}, or with {{code='Unavailable', message='The service is unavailable. Please try again shortly.'}}? Should it try again shortly (i.e. exponential backoff)? Or just propagate the exception? The first feels like a definite retry; the second should probably be retried as well, though it's unclear how long the service will be unavailable.
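
The question at the end of the issue description could be framed as a small retry policy. This is only a sketch: {{RetryPolicy}} and {{shouldRetry}} are hypothetical names, the two error codes are taken from the log excerpts in the description, and treating both as retryable is an assumption rather than confirmed jclouds behaviour.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical policy, not part of jclouds.
class RetryPolicy {

    // Error codes seen in the two log excerpts; treating both as retryable is an assumption.
    static final Set<String> RETRYABLE_CODES =
            new HashSet<>(Arrays.asList("RequestLimitExceeded", "Unavailable"));

    /** Retry only a 503 with a known transient error code, and only while attempts remain. */
    static boolean shouldRetry(int httpStatus, String awsErrorCode, int attemptsSoFar, int maxAttempts) {
        return httpStatus == 503
                && RETRYABLE_CODES.contains(awsErrorCode)
                && attemptsSoFar < maxAttempts;
    }
}
```

Any other status or error code would propagate immediately under this policy, and the attempt limit bounds how long a caller waits if the service stays unavailable.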


