Posted to dev@brooklyn.apache.org by aledsage <gi...@git.apache.org> on 2016/11/20 22:07:24 UTC

[GitHub] brooklyn-server pull request #448: BROOKLYN-394: increase jclouds retry/back...

GitHub user aledsage opened a pull request:

    https://github.com/apache/brooklyn-server/pull/448

    BROOKLYN-394: increase jclouds retry/backoff time

    Question: are 500ms and 6 retries sensible values? A large backoff seems right to me for API calls to a cloud. It might slow things down in some situations (e.g. after a transient connectivity problem), but those should be rare. In all the important cases I can think of, a larger backoff and retry time is desirable.
    
    When running `testCreateMany` to provision 20 VMs concurrently in AWS, I triggered rate-limiting on the `RunInstances` calls, getting back `503 Service Unavailable` for 6 of the 20 VMs:
    
    ```
    grep -E "JavaUrlHttpCommandExecutorService.*Receiving.* 503 Service Unavailable" brooklyn.debug.log 
    2016-11-20 21:41:07,014 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService [pool-3-thread-7]: Receiving response 305126632: HTTP/1.1 503 Service Unavailable
    2016-11-20 21:41:07,027 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService [pool-3-thread-17]: Receiving response -202425525: HTTP/1.1 503 Service Unavailable
    2016-11-20 21:41:07,181 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService [pool-3-thread-20]: Receiving response 1461817670: HTTP/1.1 503 Service Unavailable
    2016-11-20 21:41:07,902 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService [pool-3-thread-7]: Receiving response -412329992: HTTP/1.1 503 Service Unavailable
    2016-11-20 21:41:07,951 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService [pool-3-thread-17]: Receiving response -2106831550: HTTP/1.1 503 Service Unavailable
    2016-11-20 21:41:08,094 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService [pool-3-thread-20]: Receiving response -1404718861: HTTP/1.1 503 Service Unavailable
    2016-11-20 21:41:08,189 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService [pool-3-thread-13]: Receiving response -1425449702: HTTP/1.1 503 Service Unavailable
    2016-11-20 21:41:09,141 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService [pool-3-thread-13]: Receiving response -1388229651: HTTP/1.1 503 Service Unavailable
    2016-11-20 21:41:09,575 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService [pool-3-thread-11]: Receiving response 1776862310: HTTP/1.1 503 Service Unavailable
    2016-11-20 21:41:11,419 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService [pool-3-thread-15]: Receiving response 1334001839: HTTP/1.1 503 Service Unavailable
    2016-11-20 21:41:11,695 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService [pool-3-thread-13]: Receiving response 1602574625: HTTP/1.1 503 Service Unavailable
    ```
    
    Here's the output for one of them:
    ```
    2016-11-20 21:41:07,774 DEBUG o.j.r.i.InvokeHttpMethod [pool-3-thread-13]: >> invoking RunInstances
    2016-11-20 21:41:08,189 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService [pool-3-thread-13]: Receiving response -1425449702: HTTP/1.1 503 Service Unavailable
    2016-11-20 21:41:08,191 DEBUG o.j.a.h.AWSServerErrorRetryHandler [pool-3-thread-13]: Retry 1/6: delaying for 541 ms: server error: [method=org.jclouds.aws.ec2.features.AWSInstanceApi.public abstract org.jclouds.ec2.domain.Reservation org.jclouds.aws.ec2.features.AWSInstanceApi.runInstancesInRegion(java.lang.String,java.lang.String,java.lang.String,int,int,org.jclouds.ec2.options.RunInstancesOptions[])[us-east-1, null, ami-7d7bfc14, 1, 1, [Lorg.jclouds.ec2.options.RunInstancesOptions;@17ed1f23], request=POST https://ec2.us-east-1.amazonaws.com/ HTTP/1.1]
    2016-11-20 21:41:09,141 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService [pool-3-thread-13]: Receiving response -1388229651: HTTP/1.1 503 Service Unavailable
    2016-11-20 21:41:09,143 DEBUG o.j.a.h.AWSServerErrorRetryHandler [pool-3-thread-13]: Retry 2/6: delaying for 2143 ms: server error: [method=org.jclouds.aws.ec2.features.AWSInstanceApi.public abstract org.jclouds.ec2.domain.Reservation org.jclouds.aws.ec2.features.AWSInstanceApi.runInstancesInRegion(java.lang.String,java.lang.String,java.lang.String,int,int,org.jclouds.ec2.options.RunInstancesOptions[])[us-east-1, null, ami-7d7bfc14, 1, 1, [Lorg.jclouds.ec2.options.RunInstancesOptions;@17ed1f23], request=POST https://ec2.us-east-1.amazonaws.com/ HTTP/1.1]
    2016-11-20 21:41:11,695 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService [pool-3-thread-13]: Receiving response 1602574625: HTTP/1.1 503 Service Unavailable
    2016-11-20 21:41:11,697 DEBUG o.j.a.h.AWSServerErrorRetryHandler [pool-3-thread-13]: Retry 3/6: delaying for 4681 ms: server error: [method=org.jclouds.aws.ec2.features.AWSInstanceApi.public abstract org.jclouds.ec2.domain.Reservation org.jclouds.aws.ec2.features.AWSInstanceApi.runInstancesInRegion(java.lang.String,java.lang.String,java.lang.String,int,int,org.jclouds.ec2.options.RunInstancesOptions[])[us-east-1, null, ami-7d7bfc14, 1, 1, [Lorg.jclouds.ec2.options.RunInstancesOptions;@17ed1f23], request=POST https://ec2.us-east-1.amazonaws.com/ HTTP/1.1]
    2016-11-20 21:41:17,536 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService [pool-3-thread-13]: Receiving response 1803030217: HTTP/1.1 200 OK
    ```
    
    Note that some of the `RunInstances` calls did not succeed until we had backed off several times: the call above needed a 4.7 second backoff before it worked on the 4th attempt. I therefore suspect the old behaviour was actually making things *worse* by retrying after 50ms, 100ms, 200ms, 400ms and 800ms (e.g. making concurrent calls from other threads much more likely to fail, while not succeeding in any of the 5 retries).
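    
    To put rough numbers on that, here is a minimal sketch comparing the total wait of the old schedule, as listed above, with the backoff thread 13 actually needed in the log. The class and method names are hypothetical, and the doubling formula is only an assumption that reproduces the 50ms to 800ms sequence quoted above; it is not taken from the jclouds retry handler.
    
    ```java
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    
    // Hypothetical helper for illustration only; not part of jclouds or Brooklyn.
    class BackoffSchedule {
    
        // Delays in ms for an assumed doubling schedule: base, 2*base, 4*base, ...
        static List<Long> doublingDelays(long baseMs, int retries) {
            List<Long> delays = new ArrayList<>();
            long delay = baseMs;
            for (int i = 0; i < retries; i++) {
                delays.add(delay);
                delay *= 2;
            }
            return delays;
        }
    
        // Sum of all delays in ms.
        static long total(List<Long> delays) {
            long sum = 0;
            for (long d : delays) {
                sum += d;
            }
            return sum;
        }
    
        public static void main(String[] args) {
            // Old schedule as described above: 50, 100, 200, 400, 800.
            List<Long> oldDelays = doublingDelays(50, 5);
            // Delays logged for pool-3-thread-13 before the 200 OK.
            List<Long> observed = Arrays.asList(541L, 2143L, 4681L);
    
            System.out.println("old schedule: " + oldDelays + ", total ms = " + total(oldDelays));
            System.out.println("observed:     " + observed + ", total ms = " + total(observed));
        }
    }
    ```
    
    Under these assumptions the old schedule gives up after about 1.5 seconds of total backoff, whereas the call in the log needed over 7 seconds of backoff before AWS accepted it.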

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/aledsage/brooklyn-server BROOKLYN-394-retry-backoff-time

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/brooklyn-server/pull/448.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #448
    
----
commit 18cdc98d36f74da10d8987382dba77994de3b75d
Author: Aled Sage <al...@gmail.com>
Date:   2016-11-20T21:52:51Z

    BROOKLYN-394: increase jclouds retry/backoff time

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] brooklyn-server pull request #448: BROOKLYN-394: increase jclouds retry/back...

Posted by aledsage <gi...@git.apache.org>.
Github user aledsage commented on a diff in the pull request:

    https://github.com/apache/brooklyn-server/pull/448#discussion_r88812917
  
    --- Diff: locations/jclouds/src/test/java/org/apache/brooklyn/location/jclouds/JcloudsRateLimitedRetryLiveTest.java ---
    @@ -0,0 +1,131 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + */
    +package org.apache.brooklyn.location.jclouds;
    +
    +import java.util.List;
    +import java.util.concurrent.Executors;
    +
    +import org.apache.brooklyn.util.collections.MutableMap;
    +import org.apache.brooklyn.util.exceptions.Exceptions;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
    +import org.testng.annotations.AfterMethod;
    +import org.testng.annotations.BeforeMethod;
    +import org.testng.annotations.Test;
    +
    +import com.google.common.collect.Lists;
    +import com.google.common.util.concurrent.Futures;
    +import com.google.common.util.concurrent.ListenableFuture;
    +import com.google.common.util.concurrent.ListeningExecutorService;
    +import com.google.common.util.concurrent.MoreExecutors;
    +
    +/**
    + * Tests provisioning machines, where it causes a lot of activity (in an effort to be 
    + * rate-limited!). We expect the retry to do suitable exponential backoff that the retries
    + * eventually succeed, provisioning all the machines without error.
    + */
    +public class JcloudsRateLimitedRetryLiveTest extends AbstractJcloudsLiveTest {
    +
    +    private static final Logger LOG = LoggerFactory.getLogger(JcloudsRateLimitedRetryLiveTest.class);
    +
    +    public static final String LOCATION_SPEC = "jclouds:" + AWS_EC2_PROVIDER + ":" + AWS_EC2_USEAST_REGION_NAME;
    +    
    +    // Image: {id=us-east-1/ami-7d7bfc14, providerId=ami-7d7bfc14, name=RightImage_CentOS_6.3_x64_v5.8.8.5, location={scope=REGION, id=us-east-1, description=us-east-1, parent=aws-ec2, iso3166Codes=[US-VA]}, os={family=centos, arch=paravirtual, version=6.0, description=rightscale-us-east/RightImage_CentOS_6.3_x64_v5.8.8.5.manifest.xml, is64Bit=true}, description=rightscale-us-east/RightImage_CentOS_6.3_x64_v5.8.8.5.manifest.xml, version=5.8.8.5, status=AVAILABLE[available], loginUser=root, userMetadata={owner=411009282317, rootDeviceType=instance-store, virtualizationType=paravirtual, hypervisor=xen}}
    +    public static final String AWS_EC2_CENTOS_IMAGE_ID = "us-east-1/ami-7d7bfc14";
    +    
    +    protected ListeningExecutorService executor;
    +    
    +    @BeforeMethod(alwaysRun=true)
    +    @Override
    +    public void setUp() throws Exception {
    +        super.setUp();
    +        executor = MoreExecutors.listeningDecorator(Executors.newCachedThreadPool());
    +    }
    +    
    +    @AfterMethod(alwaysRun=true)
    +    @Override
    +    public void tearDown() throws Exception {
    +        try {
    +            super.tearDown();
    +        } finally {
    +            if (executor != null) executor.shutdownNow();
    +        }
    +    }
    +    
    +    @Test(groups = {"Live", "Acceptance"})
    +    public void testCreateOne() throws Exception {
    +        doMany(1);
    +    }
    +    
    +    @Test(groups = {"Live", "Acceptance"})
    --- End diff --
    
    I marked this as "Acceptance" as well because I really don't think we should run tests that deliberately induce rate-limiting very often, nor provision 20 VMs in a single test often.
    
    Longer term, we should revisit our testng groups/profiles to make them more useful.



[GitHub] brooklyn-server pull request #448: BROOKLYN-394: increase jclouds retry/back...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/brooklyn-server/pull/448



[GitHub] brooklyn-server issue #448: BROOKLYN-394: increase jclouds retry/backoff tim...

Posted by aledsage <gi...@git.apache.org>.
Github user aledsage commented on the issue:

    https://github.com/apache/brooklyn-server/pull/448
  
    Thanks @Graeme-Miller - merging now.



[GitHub] brooklyn-server issue #448: BROOKLYN-394: increase jclouds retry/backoff tim...

Posted by neykov <gi...@git.apache.org>.
Github user neykov commented on the issue:

    https://github.com/apache/brooklyn-server/pull/448
  
    +1

