Posted to dev@brooklyn.apache.org by neykov <gi...@git.apache.org> on 2015/03/25 11:54:59 UTC

[GitHub] incubator-brooklyn pull request: new policies: SshConnectionFailur...

GitHub user neykov opened a pull request:

    https://github.com/apache/incubator-brooklyn/pull/571

    new policies: SshConnectionFailure, ConditionalSuspendPolicy

      * SshConnectionFailure emits CONNECTION_FAILURE if it cannot establish an SSH connection to the entity's machine
      * ConditionalSuspendPolicy suspends a target policy when it receives a sensor event (CONNECTION_FAILURE by default)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/neykov/incubator-brooklyn ssh-connection-failure

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-brooklyn/pull/571.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #571
    
----
commit e323d67fa5be48ce2a8055a03248a785b1665d07
Author: Svetoslav Neykov <sv...@cloudsoftcorp.com>
Date:   2015-03-24T13:17:10Z

    new policies: SshConnectionFailure, ConditionalSuspendPolicy
    
      * SshConnectionFailure emits CONNECTION_FAILURE if it can't make ssh connection to the machine of the entity
      * ConditionalSuspendPolicy suspends a target policy if it receives a sensor event (CONNECTION_FAILURE by default)

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-brooklyn pull request: new policies: SshConnectionFailur...

Posted by aledsage <gi...@git.apache.org>.
Github user aledsage commented on the pull request:

    https://github.com/apache/incubator-brooklyn/pull/571#issuecomment-88638991
  
    Looks good (and I agree it looks very useful). Only some minor comments from me.



[GitHub] incubator-brooklyn pull request: new policies: SshConnectionFailur...

Posted by aledsage <gi...@git.apache.org>.
Github user aledsage commented on the pull request:

    https://github.com/apache/incubator-brooklyn/pull/571#issuecomment-92157994
  
    Looks good; merging.



[GitHub] incubator-brooklyn pull request: new policies: SshConnectionFailur...

Posted by neykov <gi...@git.apache.org>.
Github user neykov commented on a diff in the pull request:

    https://github.com/apache/incubator-brooklyn/pull/571#discussion_r27870258
  
    --- Diff: policy/src/test/java/brooklyn/policy/ha/ServiceFailureDetectorTest.java ---
    @@ -312,6 +316,54 @@ public void testReportsFailureWhenAlreadyOnFireOnRegisteringPolicy() throws Exce
             assertHasEventEventually(HASensors.ENTITY_FAILED, Predicates.<Object>equalTo(e1), null);
         }
         
    +    @Test(groups="Integration") // Has a 1.5 second wait
    +    public void testRepublishedFailure() throws Exception {
    +        Duration republishPeriod = Duration.millis(100);
    +
    +        e1.addEnricher(EnricherSpec.create(ServiceFailureDetector.class)
    +                .configure(ServiceFailureDetector.ENTITY_FAILED_REPUBLISH_TIME, republishPeriod));
    +            
    +        // Set the entity to healthy
    +        e1.setAttribute(TestEntity.SERVICE_UP, true);
    +        ServiceStateLogic.setExpectedState(e1, Lifecycle.RUNNING);
    +        EntityTestUtils.assertAttributeEqualsEventually(e1, Attributes.SERVICE_STATE_ACTUAL, Lifecycle.RUNNING);
    +        
    +        // Make the entity fail;
    +        ServiceStateLogic.ServiceProblemsLogic.updateProblemsIndicator(e1, "test", "foo");
    +        EntityTestUtils.assertAttributeEqualsEventually(e1, TestEntity.SERVICE_STATE_ACTUAL, Lifecycle.ON_FIRE);
    +        assertHasEventEventually(HASensors.ENTITY_FAILED, Predicates.<Object>equalTo(e1), null);
    +
    +        TimeUnit.SECONDS.sleep(1);
    +        
    +        // Now recover
    +        ServiceStateLogic.ServiceProblemsLogic.clearProblemsIndicator(e1, "test");
    +        EntityTestUtils.assertAttributeEqualsEventually(e1, TestEntity.SERVICE_STATE_ACTUAL, Lifecycle.RUNNING);
    +        assertHasEventEventually(HASensors.ENTITY_RECOVERED, Predicates.<Object>equalTo(e1), null);
    +        
    +        TimeUnit.MILLISECONDS.sleep(500);
    +
    +        //can't assert number of republish events due to jitter, warn below if it deviates from expectation
    +        assertTrue(events.size() > 2, "events="+events);
    +
    +        SensorEvent<FailureDescriptor> prevEvent = null;
    +        for (SensorEvent<FailureDescriptor> event : events) {
    +            if (prevEvent != null) {
    +                long repeatOffset = event.getTimestamp() - prevEvent.getTimestamp();
    +                long deviation = Math.abs(repeatOffset - republishPeriod.toMilliseconds());
    +                if (deviation > republishPeriod.toMilliseconds()/10 &&
    +                        //warn only if recovered is too far away from the last failure
    +                        (!event.getSensor().equals(HASensors.ENTITY_RECOVERED) ||
    +                        repeatOffset > republishPeriod.toMilliseconds())) {
    +                    log.error("The time between failure republish (" + repeatOffset + "ms) deviates too much from the expected " + republishPeriod + ". prevEvent=" + prevEvent + ", event=" + event);
    --- End diff --
    
    The republish is already checked above by `assertTrue(events.size() > 2, "events="+events);`: one event for the initial FAIL, one event for RECOVERED, and any additional events are republishes. The updated code now waits for 10 events before continuing.
    
    The check here is to make sure we are really emitting republish FAILs at the requested frequency, but this is something that can't be asserted as it will vary even on idle systems.



[GitHub] incubator-brooklyn pull request: new policies: SshConnectionFailur...

Posted by aledsage <gi...@git.apache.org>.
Github user aledsage commented on a diff in the pull request:

    https://github.com/apache/incubator-brooklyn/pull/571#discussion_r27613206
  
    --- Diff: policy/src/main/java/brooklyn/policy/ha/AbstractFailureDetector.java ---
    @@ -0,0 +1,359 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + */
    +package brooklyn.policy.ha;
    +
    +import static brooklyn.util.time.Time.makeTimeStringRounded;
    +
    +import java.util.concurrent.Callable;
    +import java.util.concurrent.TimeUnit;
    +import java.util.concurrent.atomic.AtomicBoolean;
    +import java.util.concurrent.atomic.AtomicReference;
    +
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
    +
    +import brooklyn.config.ConfigKey;
    +import brooklyn.entity.basic.BrooklynTaskTags;
    +import brooklyn.entity.basic.ConfigKeys;
    +import brooklyn.entity.basic.EntityInternal;
    +import brooklyn.entity.basic.EntityLocal;
    +import brooklyn.event.Sensor;
    +import brooklyn.management.Task;
    +import brooklyn.policy.basic.AbstractPolicy;
    +import brooklyn.policy.ha.HASensors.FailureDescriptor;
    +import brooklyn.util.collections.MutableMap;
    +import brooklyn.util.exceptions.Exceptions;
    +import brooklyn.util.flags.SetFromFlag;
    +import brooklyn.util.task.BasicTask;
    +import brooklyn.util.task.ScheduledTask;
    +import brooklyn.util.time.Duration;
    +import brooklyn.util.time.Time;
    +
    +import com.google.common.reflect.TypeToken;
    +
    +public abstract class AbstractFailureDetector extends AbstractPolicy {
    +
    +    // TODO Remove duplication from ServiceFailureDetector, particularly for the stabilisation delays.
    +
    +    private static final Logger LOG = LoggerFactory.getLogger(ConnectionFailureDetector.class);
    --- End diff --
    
    Wrong class.



[GitHub] incubator-brooklyn pull request: new policies: SshConnectionFailur...

Posted by neykov <gi...@git.apache.org>.
Github user neykov commented on the pull request:

    https://github.com/apache/incubator-brooklyn/pull/571#issuecomment-87015446
  
    Added option to `ServiceFailureDetector` to republish failed state periodically.



[GitHub] incubator-brooklyn pull request: new policies: SshConnectionFailur...

Posted by neykov <gi...@git.apache.org>.
Github user neykov commented on the pull request:

    https://github.com/apache/incubator-brooklyn/pull/571#issuecomment-90521451
  
    Addressed comments. Republish events are checked, but the interval between them can't be asserted reliably, which is why I left it at `log.error`.



[GitHub] incubator-brooklyn pull request: new policies: SshConnectionFailur...

Posted by aledsage <gi...@git.apache.org>.
Github user aledsage commented on a diff in the pull request:

    https://github.com/apache/incubator-brooklyn/pull/571#discussion_r27614624
  
    --- Diff: policy/src/test/java/brooklyn/policy/ha/ServiceFailureDetectorTest.java ---
    @@ -312,6 +316,54 @@ public void testReportsFailureWhenAlreadyOnFireOnRegisteringPolicy() throws Exce
             assertHasEventEventually(HASensors.ENTITY_FAILED, Predicates.<Object>equalTo(e1), null);
         }
         
    +    @Test(groups="Integration") // Has a 1.5 second wait
    +    public void testRepublishedFailure() throws Exception {
    +        Duration republishPeriod = Duration.millis(100);
    +
    +        e1.addEnricher(EnricherSpec.create(ServiceFailureDetector.class)
    +                .configure(ServiceFailureDetector.ENTITY_FAILED_REPUBLISH_TIME, republishPeriod));
    +            
    +        // Set the entity to healthy
    +        e1.setAttribute(TestEntity.SERVICE_UP, true);
    +        ServiceStateLogic.setExpectedState(e1, Lifecycle.RUNNING);
    +        EntityTestUtils.assertAttributeEqualsEventually(e1, Attributes.SERVICE_STATE_ACTUAL, Lifecycle.RUNNING);
    +        
    +        // Make the entity fail;
    +        ServiceStateLogic.ServiceProblemsLogic.updateProblemsIndicator(e1, "test", "foo");
    +        EntityTestUtils.assertAttributeEqualsEventually(e1, TestEntity.SERVICE_STATE_ACTUAL, Lifecycle.ON_FIRE);
    +        assertHasEventEventually(HASensors.ENTITY_FAILED, Predicates.<Object>equalTo(e1), null);
    +
    +        TimeUnit.SECONDS.sleep(1);
    +        
    +        // Now recover
    +        ServiceStateLogic.ServiceProblemsLogic.clearProblemsIndicator(e1, "test");
    +        EntityTestUtils.assertAttributeEqualsEventually(e1, TestEntity.SERVICE_STATE_ACTUAL, Lifecycle.RUNNING);
    +        assertHasEventEventually(HASensors.ENTITY_RECOVERED, Predicates.<Object>equalTo(e1), null);
    +        
    +        TimeUnit.MILLISECONDS.sleep(500);
    +
    +        //can't assert number of republish events due to jitter, warn below if it deviates from expectation
    +        assertTrue(events.size() > 2, "events="+events);
    +
    +        SensorEvent<FailureDescriptor> prevEvent = null;
    +        for (SensorEvent<FailureDescriptor> event : events) {
    +            if (prevEvent != null) {
    +                long repeatOffset = event.getTimestamp() - prevEvent.getTimestamp();
    +                long deviation = Math.abs(repeatOffset - republishPeriod.toMilliseconds());
    +                if (deviation > republishPeriod.toMilliseconds()/10 &&
    +                        //warn only if recovered is too far away from the last failure
    +                        (!event.getSensor().equals(HASensors.ENTITY_RECOVERED) ||
    +                        repeatOffset > republishPeriod.toMilliseconds())) {
    +                    log.error("The time between failure republish (" + repeatOffset + "ms) deviates too much from the expected " + republishPeriod + ". prevEvent=" + prevEvent + ", event=" + event);
    --- End diff --
    
    I see it's just a log rather than a failure. Is that because of the risk of false negatives? It would be good to have a test that genuinely fails if the republish isn't working. For example, we could assert that we eventually get 2 entity_failed events before we mark the entity as recovered.
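
The suggestion above can be sketched in plain Java as a polling "eventually" assertion on the number of recorded events. This is only an illustration under stated assumptions: `RepublishCountSketch`, `assertEventCountEventually`, and the string-based event list are hypothetical stand-ins, not Brooklyn test utilities, and the scheduled task merely simulates a detector that republishes a failure every 100ms.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; none of these names come from the Brooklyn code base.
final class RepublishCountSketch {

    /** Polls until at least minCount events are recorded; fails once timeoutMs has elapsed. */
    static void assertEventCountEventually(List<String> events, int minCount, long timeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (events.size() < minCount) {
            if (System.nanoTime() >= deadline) {
                throw new AssertionError("expected at least " + minCount + " events, got " + events);
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the flag before failing
                throw new AssertionError("interrupted while waiting for events", e);
            }
        }
    }

    public static void main(String[] args) {
        List<String> events = new CopyOnWriteArrayList<>();
        ScheduledExecutorService republisher = Executors.newSingleThreadScheduledExecutor();
        // Simulates ENTITY_FAILED being republished every 100ms while the entity stays failed.
        republisher.scheduleAtFixedRate(() -> events.add("ENTITY_FAILED"), 0, 100, TimeUnit.MILLISECONDS);
        try {
            // Generous ceiling: returns as soon as the second event arrives, fails only after 30s.
            assertEventCountEventually(events, 2, 30_000);
        } finally {
            republisher.shutdownNow();
        }
        System.out.println("saw " + events.size() + " ENTITY_FAILED events before recovery");
    }
}
```

Because the assertion returns as soon as the count is reached, a long timeout costs nothing on a healthy run and only matters when the republish is actually broken.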



[GitHub] incubator-brooklyn pull request: new policies: SshConnectionFailur...

Posted by aledsage <gi...@git.apache.org>.
Github user aledsage commented on a diff in the pull request:

    https://github.com/apache/incubator-brooklyn/pull/571#discussion_r27613918
  
    --- Diff: policy/src/main/java/brooklyn/policy/ha/SshMachineFailureDetector.java ---
    @@ -0,0 +1,84 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + */
    +package brooklyn.policy.ha;
    +
    +import java.util.Map;
    +
    +import brooklyn.catalog.Catalog;
    +import brooklyn.config.ConfigKey;
    +import brooklyn.entity.basic.ConfigKeys;
    +import brooklyn.event.basic.BasicNotificationSensor;
    +import brooklyn.location.basic.Machines;
    +import brooklyn.location.basic.SshMachineLocation;
    +import brooklyn.policy.ha.HASensors.FailureDescriptor;
    +import brooklyn.util.guava.Maybe;
    +import brooklyn.util.internal.ssh.SshTool;
    +import brooklyn.util.time.Duration;
    +
    +import com.google.common.collect.ImmutableList;
    +import com.google.common.collect.ImmutableMap;
    +
    +@Catalog(name="Ssh Connectivity Failure Detector", description="HA policy for monitoring an SshMachine, "
    +        + "emitting an event if the connection is lost/restored")
    +public class SshMachineFailureDetector extends AbstractFailureDetector {
    +    public static final String DEFAULT_UNIQUE_TAG = "failureDetector.sshMachine.tag";
    +
    +    public static final BasicNotificationSensor<FailureDescriptor> CONNECTION_FAILED = HASensors.CONNECTION_FAILED;
    +
    +    public static final BasicNotificationSensor<FailureDescriptor> CONNECTION_RECOVERED = HASensors.CONNECTION_RECOVERED;
    +
    +    public static final ConfigKey<Duration> CONNECT_TIMEOUT = ConfigKeys.newDurationConfigKey(
    +            "ha.sshConnection.timeout", "How long to wait for conneciton before declaring failure", Duration.TEN_SECONDS);
    +
    +    @Override
    +    public void init() {
    +        super.init();
    +        if (config().getRaw(SENSOR_FAILED).isAbsent()) {
    +            config().set(SENSOR_FAILED, CONNECTION_FAILED);
    +        }
    +        if (config().getRaw(SENSOR_RECOVERED).isAbsent()) {
    +            config().set(SENSOR_RECOVERED, CONNECTION_RECOVERED);
    +        }
    +        if (config().getRaw(POLL_PERIOD).isAbsent()) {
    +            config().set(POLL_PERIOD, Duration.ONE_MINUTE);
    +        }
    +        uniqueTag = DEFAULT_UNIQUE_TAG;
    +    }
    +
    +    @Override
    +    protected CalculatedStatus calculateStatus() {
    +        Maybe<SshMachineLocation> sshMachineOption = Machines.findUniqueSshMachineLocation(entity.getLocations());
    +        if (sshMachineOption.isPresent()) {
    +            SshMachineLocation sshMachine = sshMachineOption.get();
    +            try {
    +                Duration timeout = config().get(CONNECT_TIMEOUT);
    +                Map<String, ?> flags = ImmutableMap.of(
    +                        SshTool.PROP_CONNECT_TIMEOUT.getName(), timeout.toMilliseconds(),
    +                        SshTool.PROP_SESSION_TIMEOUT.getName(), timeout.toMilliseconds(),
    +                        SshTool.PROP_SSH_TRIES.getName(), 1);
    +                int exitCode = sshMachine.execCommands(flags, SshMachineFailureDetector.class.getName(), ImmutableList.of("exit"));
    +                return new BasicCalculatedStatus(exitCode == 0, sshMachine.toString());
    +            } catch (Exception e) {
    --- End diff --
    
    Risks swallowing `InterruptedException`. Worth doing `Exceptions.propagateIfFatal(e)`.
    
    Do we want to log it (at debug) when it's the first time we've seen the ssh exception? No strong feelings. Certainly don't want to log it every time (unless at trace).
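
A minimal sketch of both points, in plain Java so it runs without Brooklyn on the classpath. The local `propagateIfFatal` below is a hypothetical stand-in for the idea behind `Exceptions.propagateIfFatal(e)` and is not its actual implementation; `ConnectionCheckSketch`, `isConnected`, and the `Callable` probe are likewise assumptions made for illustration, and `System.err` stands in for a debug-level logger.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch; class and method names are illustrative only.
final class ConnectionCheckSketch {
    // Tracks whether the current run of failures has already been logged.
    private final AtomicBoolean failureLogged = new AtomicBoolean(false);

    /** Rethrows interrupts and Errors instead of letting a broad catch swallow them. */
    static void propagateIfFatal(Throwable t) {
        if (t instanceof InterruptedException) {
            Thread.currentThread().interrupt(); // keep the interrupt visible to callers
            throw new IllegalStateException("interrupted", t);
        }
        if (t instanceof Error) {
            throw (Error) t;
        }
    }

    /** Returns true if the probe exits with 0, false on a non-zero exit or a non-fatal exception. */
    boolean isConnected(Callable<Integer> probe) {
        try {
            boolean ok = probe.call() == 0;
            if (ok) {
                failureLogged.set(false); // re-arm logging after a successful check
            }
            return ok;
        } catch (Exception e) {
            propagateIfFatal(e);
            // Log only the first exception in a run of failures.
            if (failureLogged.compareAndSet(false, true)) {
                System.err.println("DEBUG first connection failure: " + e);
            }
            return false;
        }
    }
}
```

Resetting the flag on success means a later outage is logged once again, while repeated polls during the same outage stay quiet.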



[GitHub] incubator-brooklyn pull request: new policies: SshConnectionFailur...

Posted by aledsage <gi...@git.apache.org>.
Github user aledsage commented on a diff in the pull request:

    https://github.com/apache/incubator-brooklyn/pull/571#discussion_r27614430
  
    --- Diff: policy/src/test/java/brooklyn/policy/ha/ServiceFailureDetectorTest.java ---
    @@ -312,6 +316,54 @@ public void testReportsFailureWhenAlreadyOnFireOnRegisteringPolicy() throws Exce
             assertHasEventEventually(HASensors.ENTITY_FAILED, Predicates.<Object>equalTo(e1), null);
         }
         
    +    @Test(groups="Integration") // Has a 1.5 second wait
    +    public void testRepublishedFailure() throws Exception {
    +        Duration republishPeriod = Duration.millis(100);
    +
    +        e1.addEnricher(EnricherSpec.create(ServiceFailureDetector.class)
    +                .configure(ServiceFailureDetector.ENTITY_FAILED_REPUBLISH_TIME, republishPeriod));
    +            
    +        // Set the entity to healthy
    +        e1.setAttribute(TestEntity.SERVICE_UP, true);
    +        ServiceStateLogic.setExpectedState(e1, Lifecycle.RUNNING);
    +        EntityTestUtils.assertAttributeEqualsEventually(e1, Attributes.SERVICE_STATE_ACTUAL, Lifecycle.RUNNING);
    +        
    +        // Make the entity fail;
    +        ServiceStateLogic.ServiceProblemsLogic.updateProblemsIndicator(e1, "test", "foo");
    +        EntityTestUtils.assertAttributeEqualsEventually(e1, TestEntity.SERVICE_STATE_ACTUAL, Lifecycle.ON_FIRE);
    +        assertHasEventEventually(HASensors.ENTITY_FAILED, Predicates.<Object>equalTo(e1), null);
    +
    +        TimeUnit.SECONDS.sleep(1);
    +        
    +        // Now recover
    +        ServiceStateLogic.ServiceProblemsLogic.clearProblemsIndicator(e1, "test");
    +        EntityTestUtils.assertAttributeEqualsEventually(e1, TestEntity.SERVICE_STATE_ACTUAL, Lifecycle.RUNNING);
    +        assertHasEventEventually(HASensors.ENTITY_RECOVERED, Predicates.<Object>equalTo(e1), null);
    +        
    +        TimeUnit.MILLISECONDS.sleep(500);
    +
    +        //can't assert number of republish events due to jitter, warn below if it deviates from expectation
    +        assertTrue(events.size() > 2, "events="+events);
    +
    +        SensorEvent<FailureDescriptor> prevEvent = null;
    +        for (SensorEvent<FailureDescriptor> event : events) {
    +            if (prevEvent != null) {
    +                long repeatOffset = event.getTimestamp() - prevEvent.getTimestamp();
    +                long deviation = Math.abs(repeatOffset - republishPeriod.toMilliseconds());
    +                if (deviation > republishPeriod.toMilliseconds()/10 &&
    --- End diff --
    
    Same here: very time sensitive. We might well get a 10ms starvation of the required thread when running on a cloud test server.
    
    For this kind of timing test (to avoid false negatives), I resorted to numbers more like a republishPeriod of at least 1 second (rather than 100ms), accepting anything up to a 500ms difference.
    
    If we start running our integration tests on Apache hardware, we might find that it sometimes fails as well!
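
The looser numbers can be illustrated with a small, self-contained Java sketch that flags only the gaps falling outside the tolerance. `RepublishIntervalSketch` and `gapsOutsideTolerance` are hypothetical names, and the timestamps are made-up sample values rather than measurements from the test.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not part of the pull request.
final class RepublishIntervalSketch {

    /** Returns the gaps (ms) between consecutive timestamps that differ from periodMs by more than toleranceMs. */
    static List<Long> gapsOutsideTolerance(long[] timestampsMs, long periodMs, long toleranceMs) {
        List<Long> outliers = new ArrayList<>();
        for (int i = 1; i < timestampsMs.length; i++) {
            long gap = timestampsMs[i] - timestampsMs[i - 1];
            if (Math.abs(gap - periodMs) > toleranceMs) {
                outliers.add(gap);
            }
        }
        return outliers;
    }

    public static void main(String[] args) {
        // Invented sample timestamps: gaps of 1010, 1390, 990 and 1810 ms.
        long[] timestamps = {0, 1010, 2400, 3390, 5200};
        // With a 1000ms period and 500ms tolerance, only the 1810ms gap is reported.
        System.out.println(gapsOutsideTolerance(timestamps, 1000, 500)); // prints [1810]
    }
}
```

With the original 100ms period and 10ms tolerance, a single delayed scheduler tick would already be reported, which is the sensitivity the comment warns about.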



[GitHub] incubator-brooklyn pull request: new policies: SshConnectionFailur...

Posted by grkvlt <gi...@git.apache.org>.
Github user grkvlt commented on the pull request:

    https://github.com/apache/incubator-brooklyn/pull/571#issuecomment-86659227
  
    These look really useful, @neykov



[GitHub] incubator-brooklyn pull request: new policies: SshConnectionFailur...

Posted by aledsage <gi...@git.apache.org>.
Github user aledsage commented on a diff in the pull request:

    https://github.com/apache/incubator-brooklyn/pull/571#discussion_r27614109
  
    --- Diff: policy/src/test/java/brooklyn/policy/ha/ServiceFailureDetectorTest.java ---
    @@ -312,6 +316,54 @@ public void testReportsFailureWhenAlreadyOnFireOnRegisteringPolicy() throws Exce
             assertHasEventEventually(HASensors.ENTITY_FAILED, Predicates.<Object>equalTo(e1), null);
         }
         
    +    @Test(groups="Integration") // Has a 1.5 second wait
    +    public void testRepublishedFailure() throws Exception {
    +        Duration republishPeriod = Duration.millis(100);
    +
    +        e1.addEnricher(EnricherSpec.create(ServiceFailureDetector.class)
    +                .configure(ServiceFailureDetector.ENTITY_FAILED_REPUBLISH_TIME, republishPeriod));
    +            
    +        // Set the entity to healthy
    +        e1.setAttribute(TestEntity.SERVICE_UP, true);
    +        ServiceStateLogic.setExpectedState(e1, Lifecycle.RUNNING);
    +        EntityTestUtils.assertAttributeEqualsEventually(e1, Attributes.SERVICE_STATE_ACTUAL, Lifecycle.RUNNING);
    +        
    +        // Make the entity fail;
    +        ServiceStateLogic.ServiceProblemsLogic.updateProblemsIndicator(e1, "test", "foo");
    +        EntityTestUtils.assertAttributeEqualsEventually(e1, TestEntity.SERVICE_STATE_ACTUAL, Lifecycle.ON_FIRE);
    +        assertHasEventEventually(HASensors.ENTITY_FAILED, Predicates.<Object>equalTo(e1), null);
    +
    +        TimeUnit.SECONDS.sleep(1);
    +        
    +        // Now recover
    +        ServiceStateLogic.ServiceProblemsLogic.clearProblemsIndicator(e1, "test");
    +        EntityTestUtils.assertAttributeEqualsEventually(e1, TestEntity.SERVICE_STATE_ACTUAL, Lifecycle.RUNNING);
    +        assertHasEventEventually(HASensors.ENTITY_RECOVERED, Predicates.<Object>equalTo(e1), null);
    +        
    +        TimeUnit.MILLISECONDS.sleep(500);
    +
    +        //can't assert number of republish events due to jitter, warn below if it deviates from expectation
    +        assertTrue(events.size() > 2, "events="+events);
    --- End diff --
    
    I'm hesitant about time-sensitive tests like this. We're probably OK because it's an integration test...
    
    For those that used to run in CloudBees, we'd occasionally see failures that suggested the thread had been starved for several seconds. I resorted to 30-second timeouts to be on the safe side, so we wouldn't get false negatives in functional tests.



[GitHub] incubator-brooklyn pull request: new policies: SshConnectionFailur...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-brooklyn/pull/571

