
Posted to dev@brooklyn.apache.org by "Aled Sage (JIRA)" <ji...@apache.org> on 2014/07/01 17:33:24 UTC

[jira] [Created] (BROOKLYN-16) Quarantine group: improve functionality and usability

Aled Sage created BROOKLYN-16:
---------------------------------

Summary: Quarantine group: improve functionality and usability
Key: BROOKLYN-16
URL: https://issues.apache.org/jira/browse/BROOKLYN-16
Project: Brooklyn
Issue Type: Improvement
Reporter: Aled Sage

I'd like us to clean up the behaviour and appearance of the "quarantine group" of clusters. My recent experience with some enterprise users highlights that it's confusing!

The configuration "dynamiccluster.quarantineFailedEntities" controls whether failed members of the cluster should be quarantined, or just deleted straight away.

Unquarantining
-------------------
Once an entity goes into quarantine, there is currently no way to get it out again (except deleting or discarding the entity).

However, it is good that we don't unquarantine nodes automatically (e.g. when the entity goes service-up again), because it may have been quarantined for good reason, such as repeatedly going up and down.

PROPOSAL 1: We should have an explicit effector on the quarantine group entity to move the member back into the cluster's group of healthy members.

PROPOSAL 2: We should add a dynamic effector to each member of the quarantined group for "restoreFromQuarantine", which would add the member back into the cluster's group.
A user could invoke this effector by selecting the member in the web-console.

PROPOSAL 3: We could add an effector "restartMembers(boolean parallel)" on the quarantine group. Invoking this would restart the process for each member of the quarantine group. If parallel==true then this would be done in parallel, otherwise one member at a time.
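
To make the parallel flag concrete, here is a minimal sketch of how such an effector body might behave. It is not Brooklyn's API: the Restartable interface and the RestartMembersSketch class are hypothetical stand-ins, and a real effector would presumably use Brooklyn's own task framework rather than a raw thread pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch only; none of these types exist in Brooklyn.
final class RestartMembersSketch {

    /** Hypothetical stand-in for a quarantined member that can be restarted. */
    interface Restartable {
        void restart();
    }

    /**
     * Restarts every member. With parallel == false each restart completes
     * before the next begins; with parallel == true all restarts are submitted
     * up front and then awaited.
     */
    static void restartMembers(List<? extends Restartable> members, boolean parallel) {
        if (!parallel) {
            for (Restartable member : members) {
                member.restart();
            }
            return;
        }
        if (members.isEmpty()) {
            return; // a fixed thread pool cannot be created with zero threads
        }
        ExecutorService pool = Executors.newFixedThreadPool(members.size());
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (Restartable member : members) {
                Runnable task = member::restart;
                pending.add(pool.submit(task));
            }
            for (Future<?> future : pending) {
                future.get(); // surfaces a failed restart as ExecutionException
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while restarting members", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A member failed to restart", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

In this sketch a failure in parallel mode is only reported after the other restarts have been submitted, whereas in sequential mode the first failure stops the loop. Which of those behaviours we want is an open design question.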

PROPOSAL 4: We should have an explicit effector on the cluster to quarantine a member.
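
As a rough illustration of proposals 1, 2 and 4 together, the sketch below models the two explicit moves between the healthy group and the quarantine group. Everything here is an assumption for illustration: the Cluster class, its method names (other than "restoreFromQuarantine", which is the name suggested above) and the use of plain strings for members are hypothetical, not Brooklyn's real types.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch only; this is not Brooklyn's DynamicCluster.
final class QuarantineMembershipSketch {

    /** Hypothetical cluster that tracks healthy and quarantined members by name. */
    static final class Cluster {
        private final Set<String> healthy = new LinkedHashSet<>();
        private final Set<String> quarantined = new LinkedHashSet<>();

        void addMember(String member) {
            healthy.add(member);
        }

        /** Proposal 4: explicitly quarantine a healthy member. Returns false if it was not healthy. */
        boolean quarantine(String member) {
            if (!healthy.remove(member)) {
                return false;
            }
            quarantined.add(member);
            return true;
        }

        /** Proposals 1 and 2: explicitly move a quarantined member back. Never happens automatically. */
        boolean restoreFromQuarantine(String member) {
            if (!quarantined.remove(member)) {
                return false;
            }
            healthy.add(member);
            return true;
        }

        Set<String> healthyMembers() {
            return Collections.unmodifiableSet(healthy);
        }

        Set<String> quarantinedMembers() {
            return Collections.unmodifiableSet(quarantined);
        }
    }
}
```

The point of the sketch is that both transitions are explicit operations invoked by a user; nothing moves a member out of quarantine on its own.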

QuarantineGroup.expungeMembers
----------------------------------------------
There is an expungeMembers effector on the quarantine group. This takes a single parameter of "boolean firstStop", which controls whether it calls entity.stop() before unmanaging each entity.

The parameter name is confusing. Also, the behaviour is very different for the two parameter values, so it potentially deserves two separate effectors.

Note this feels related to the "expunge" operation under the "lifecycle" tab of the web-console. There, it brings up a modal dialog with "Unmange an entity and (optionally) clean up resources, such as releasing a VM" and a checkbox for "Release resources".
The user feedback there was that this isn't the behaviour they expected when clicking "expunge", and that the behaviour was so different with the box ticked or unticked that it deserved two different operations.

PROPOSAL 5: replace the existing effector with two effectors: `unmanageMembers()` would just unmanage the entities without stopping or freeing the resources; `stopAndUnmanageMembers()` would first release the resources of each member (e.g. VMs etc, by calling entity.stop) and would then unmanage each.
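
A minimal sketch of the split in proposal 5 follows, to show how the two effectors differ only in whether stop is called first. The Member and Management interfaces and the QuarantineGroup class below are hypothetical stand-ins, not the real Brooklyn types; only the two effector names come from the proposal.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only; these are not Brooklyn's real interfaces.
final class ExpungeSketch {

    /** Hypothetical member whose resources (e.g. a VM) are released by stop(). */
    interface Member {
        void stop();
    }

    /** Hypothetical hook that removes a member from management. */
    interface Management {
        void unmanage(Member member);
    }

    static final class QuarantineGroup {
        private final List<Member> members = new ArrayList<>();
        private final Management management;

        QuarantineGroup(Management management) {
            this.management = management;
        }

        void add(Member member) {
            members.add(member);
        }

        int size() {
            return members.size();
        }

        /** Unmanages each member without stopping it or freeing its resources. */
        void unmanageMembers() {
            for (Member member : new ArrayList<>(members)) {
                management.unmanage(member);
                members.remove(member);
            }
        }

        /** Releases each member's resources via stop(), then unmanages it. */
        void stopAndUnmanageMembers() {
            for (Member member : new ArrayList<>(members)) {
                member.stop();
                management.unmanage(member);
                members.remove(member);
            }
        }
    }
}
```

With two effectors the caller's intent is visible in the name, rather than hidden in a boolean argument.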

Quarantine alternative
----------------------------
In our use-case, we're using docker. What we really want for this kind of failed node is to... generate a dump of the running process, and then stop the container (thus preserving the disk). We want the entity to be discarded from the cluster.

PROPOSAL 6: Add another config option to DynamicCluster for failedEntityHandler. This would take an instance of something like:

public interface FailedEntityHandler {
    public enum HandlerResponse {
        DISCARD_ENTITY,
        STOP_AND_DISCARD_ENTITY,
        ADD_TO_QUARANTINE,
        KEEP_IN_GROUP;
    }

    HandlerResponse onFailedEntity(DynamicCluster cluster, Entity failedMember);
}
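
For the docker use-case above, a handler might look roughly like the sketch below. To keep it self-contained it redeclares the proposed interface alongside empty stand-ins for Entity and DynamicCluster; the DumpTaker hook and the DumpThenStopHandler class are hypothetical names invented for illustration. Falling back to quarantine when the dump fails is my assumption, not part of the proposal.

```java
// Hypothetical sketch only; Entity and DynamicCluster are stand-ins, not Brooklyn's types.
final class FailedEntityHandlerSketch {

    interface Entity {
        String getId();
    }

    interface DynamicCluster {
    }

    /** The interface from proposal 6, redeclared here so the sketch compiles on its own. */
    interface FailedEntityHandler {
        enum HandlerResponse {
            DISCARD_ENTITY,
            STOP_AND_DISCARD_ENTITY,
            ADD_TO_QUARANTINE,
            KEEP_IN_GROUP
        }

        HandlerResponse onFailedEntity(DynamicCluster cluster, Entity failedMember);
    }

    /** Hypothetical hook that captures a dump of the failed process; returns true on success. */
    interface DumpTaker {
        boolean dump(Entity entity);
    }

    /** Takes a dump first, then asks the cluster to stop and discard the entity. */
    static final class DumpThenStopHandler implements FailedEntityHandler {
        private final DumpTaker dumpTaker;

        DumpThenStopHandler(DumpTaker dumpTaker) {
            this.dumpTaker = dumpTaker;
        }

        @Override
        public HandlerResponse onFailedEntity(DynamicCluster cluster, Entity failedMember) {
            boolean dumped = dumpTaker.dump(failedMember);
            // Assumption: if no dump could be taken, keep the entity in quarantine for inspection.
            return dumped
                    ? HandlerResponse.STOP_AND_DISCARD_ENTITY
                    : HandlerResponse.ADD_TO_QUARANTINE;
        }
    }
}
```

The cluster would then act on the returned HandlerResponse, so the handler decides the policy while the cluster keeps control of the actual stop, discard or quarantine step.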

Visualization
----------------
Currently, if an entity is quarantined, the entity tree (in the web-console) shows a "quarantine group" underneath (i.e. as a child of) the cluster.

All entities in the cluster (be they members of the quarantine group or healthy members of the cluster) appear under the cluster itself. This is because their *parent* is the cluster. An entity's parent never changes. What the user is really interested in here is seeing the group membership.

There's a separate conversation to be had (or resurrected) about visualising groups (and other relationships) in the web-console. This use-case should be considered there.
