Posted to dev@stratos.apache.org by "Shaheedur Haque (shahhaqu)" <sh...@cisco.com> on 2015/04/09 15:43:50 UTC

RE: Maintenance modes (was RE: [jira] [Commented] (STRATOS-1234) Software Update Management Solution for Stratos)

Hi Imesh, Sandaruwan,

Here is a written-up proposal. I *think* it covers the various use cases suggested both here and in JIRA STRATOS-1234, but as always, your thoughts on the matter are welcome. The write-up has the form of a “spec” and a “Q&A”. As a next step, I guess we could do a hang-out or con-call or something?

Thoughts welcome…

Thanks, Shaheed

OPERATIONAL STATE COMMANDS

The following commands, with the defined effects, are needed:


·        No command directly affects what I call the “major state” of the Application/Group/Cluster/Cartridge, i.e. the state as reflected in the information CURRENTLY returned by the application/{appId}/runtime information.

·        Each command affects what I call the “operational state” only. The commands and their operational states are:

o   Autoscaling on, off. Autoscaling on is current behaviour.

o   Autohealing on, off. Autohealing on is current behaviour.

o   Maintenance off, restart, replace. Maintenance off is current behaviour.

o   (We can add more later if needed)
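A minimal Python sketch of the three operational states listed above may help make the defaults concrete. The class names and the helper `is_current_behaviour` are hypothetical illustrations, not existing Stratos types; the only facts taken from the proposal are the allowed values and which value is "current behaviour".

```python
from dataclasses import dataclass
from enum import Enum


class Autoscaling(Enum):
    ON = "on"
    OFF = "off"


class Autohealing(Enum):
    ON = "on"
    OFF = "off"


class Maintenance(Enum):
    OFF = "off"
    RESTART = "restart"
    REPLACE = "replace"


@dataclass
class OperationalState:
    # Defaults match what the proposal calls "current behaviour".
    autoscaling: Autoscaling = Autoscaling.ON
    autohealing: Autohealing = Autohealing.ON
    maintenance: Maintenance = Maintenance.OFF


def is_current_behaviour(state: OperationalState) -> bool:
    """True when no operational-state command has changed anything."""
    return state == OperationalState()
```

Keeping this separate from the "major state" means a command only ever touches an `OperationalState` value and never the runtime state reported today.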

Command: Autoscaling off.

Server effect: CEP gathers stats and history as usual. Autoscaler operates as usual, except that no scaling is done. Instead, a cluster state variable tracks the normal, overload or underload state and logs messages when this state variable changes value.

Cartridge effect: No effect on running cartridges. No new cartridges are spun up, no existing cartridges are spun down EXCEPT for autohealing.

Command: Autohealing off.

Server effect: CEP ignores any heartbeat timeout other than to log that it happened, and sets an instance state variable to track this. When autohealing is turned back on, the timeout will happen again, and the failure will be acted upon normally, except that the log shall make it clear (using the instance state variable) that the autohealing had been delayed.

Cartridge effect: No new cartridges are spun up until after autohealing is enabled again.

Command: Maintenance restart.

Server effect: Like autohealing off except that an extra state variable is set indicating maintenance mode is in effect. Both state variables are cleared when the Cartridge resume event is seen.

Cartridge effect: Cartridge is signalled with an *event*, not a blocking callout. Cartridge application must be able to reboot or just restart, and have the cartridge agent resume its previous (active/inactive) state. When resuming, the agent signals the server with a resume *event*. Note this implies the cartridge agent is restartable (because the application can choose to reboot).

Command: Maintenance replace.

Server effect: Like maintenance restart except that the cartridge instance is replaced.

Cartridge effect: The difference between “restart” and “replace” is that the latter is for applications that cannot update themselves, but expect essentially a new VM instance with the new software. In other words, this is the big hammer/most general approach to upgrades (e.g. this is more likely to work than an apt-get downgrade ☺).
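The server-side effects of "autoscaling off" and "autohealing off" described above can be sketched roughly as follows. This is only an illustration under assumptions: `ClusterSketch`, its attributes, and the `actions` list (a stand-in for real scale and heal calls) are invented for this example and do not correspond to actual Autoscaler or CEP code.

```python
import logging

log = logging.getLogger("operational_state_sketch")


class ClusterSketch:
    def __init__(self):
        self.autoscaling_on = True
        self.autohealing_on = True
        self.load_state = "normal"       # normal / overload / underload
        self.delayed_failures = set()    # instances whose timeout was ignored
        self.actions = []                # stand-in for real scale/heal calls

    def on_load_state(self, new_state):
        """Stats are always evaluated; scaling happens only if enabled."""
        if new_state != self.load_state:
            log.info("load state %s -> %s", self.load_state, new_state)
            self.load_state = new_state
        if self.autoscaling_on and new_state == "overload":
            self.actions.append("scale_up")
        elif self.autoscaling_on and new_state == "underload":
            self.actions.append("scale_down")

    def on_heartbeat_timeout(self, instance_id):
        if not self.autohealing_on:
            # Only log and remember that healing was postponed.
            log.info("timeout for %s ignored (autohealing off)", instance_id)
            self.delayed_failures.add(instance_id)
            return
        delayed = instance_id in self.delayed_failures
        self.delayed_failures.discard(instance_id)
        log.info("healing %s (delayed=%s)", instance_id, delayed)
        self.actions.append(("heal", instance_id, delayed))
```

The `delayed` flag plays the role of the instance state variable: when the timeout fires again after autohealing is re-enabled, the log can say that healing had been postponed.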



·        Each command referred to here is a REST API call.

·        Each command can apply to an entire Application, or any nested level (group or cartridge) within it.

·        Arguments for application-wide use case:

o   application={appId}, operationalState={command}

·        Arguments for nested-level use case:

o   application={appId}, nesting={0}/{1}/{2}/…/{n}, operationalState={command}
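As a rough illustration of the two argument forms above, the following Python sketch builds the argument map for either case. The function name and the command strings used in the usage example are assumptions; the proposal only fixes the argument names `application`, `nesting` and `operationalState`.

```python
def operational_state_args(app_id, command, nesting=None):
    """Build the argument map for one operational-state command.

    nesting is a list of path elements ({0}/{1}/.../{n} in the proposal);
    None or an empty list means the command is application-wide.
    """
    args = {"application": app_id, "operationalState": command}
    if nesting:
        args["nesting"] = "/".join(nesting)
    return args
```

For example, `operational_state_args("app1", "maintenance-restart", ["group1", "cartridge2"])` addresses a single cartridge inside a group, while omitting `nesting` addresses the whole application.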

Q&A


1.      What’s the point of restart/replace, over and above auto* off?



These are to actually cause the application software in the VM instance to take note and do something. Typically, I would expect this to result in an internally-managed software update. For example, think of VMs running Ubuntu and pointing to a known repository of, say, security patches: they could all just do an “apt-get update/upgrade”.



The Cartridge logic is defined to be event-based rather than blocking, because making the thing blocking would be a problem if a reboot was involved. (Also, generally, blocking operations in a distributed system raise too many edge cases like: can this operation be cancelled? Repeated? etc.).
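The event-based restart flow can be sketched like this. Everything here is hypothetical: `AgentSketch`, the dictionary standing in for storage that survives a reboot, and the queue standing in for the message bus are illustrations only, not the real cartridge agent.

```python
import queue


class AgentSketch:
    """Cartridge agent that persists its state across a restart."""

    def __init__(self, instance_id, store, to_server):
        self.instance_id = instance_id
        self.store = store            # stand-in for storage that survives a reboot
        self.to_server = to_server    # stand-in for the message bus

    def on_maintenance_restart(self, current_state):
        # Non-blocking: record the state and return. The application then
        # updates itself and restarts or reboots in its own time.
        self.store[self.instance_id] = current_state

    def on_startup(self):
        # After the restart, resume the saved state and tell the server.
        previous = self.store.pop(self.instance_id, None)
        if previous is not None:
            self.to_server.put(("resume", self.instance_id, previous))
        return previous
```

Because nothing waits on a return value, a reboot in the middle of the update does not leave a caller hanging; the server simply sees the resume event whenever the agent comes back.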


2.      Propagation/inheritance rules



I see two options:



·        Use hierarchy. If you apply a command at hierarchy level n, and n has internal structure (i.e. it is a group, not a cartridge), the command propagates all the way down (note: this is implied in what I said for the application-level command).

·        Do not use hierarchy. The command only applies to the level to which it was addressed by the REST call.

In either case, the effect of contradictory commands is UNDEFINED, i.e. toggling the flags in quick succession will likely result in an unhelpful outcome.

I think the normal approach is NOT to use hierarchy; after all, just because there is an upgrade to be applied for application code in a given set of VMs, there is nothing to say that any elements lower down the hierarchy should be upgraded at the same time. Even in the case where (say) security patches to a common OS are to be applied, I would doubt the sanity of anybody doing this across every VM in the whole system in one go ☺. OTOH, maybe I am wrong!
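The two propagation options differ only in whether the command walks down from the addressed node. A small Python sketch, with a made-up `Node` tree standing in for the application/group/cartridge hierarchy:

```python
class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.operational_state = None


def find(node, path):
    """Follow a list of child names down from node."""
    for name in path:
        node = next(c for c in node.children if c.name == name)
    return node


def apply_command(root, path, command, use_hierarchy):
    """Apply command at path; also to all descendants if use_hierarchy."""
    touched = []
    stack = [find(root, path)]
    while stack:
        node = stack.pop()
        node.operational_state = command
        touched.append(node.name)
        if use_hierarchy:
            stack.extend(node.children)
    return sorted(touched)
```

With `use_hierarchy=False` only the addressed group is flagged; with `True` its cartridges are flagged too. What should happen when a parent and a child receive contradictory commands is left open here, as it is in the proposal.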


3.      Should these commands apply to “deployed” or only to “configured” Applications?



I think the commands can be applied whether the Application is deployed or not. Clearly, the logic that sets flags on instances has to set those flags on all current and future instances that may spin up under a given deployment.
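One way to read "current and future instances" is that the flags live on the application and are copied to each instance as it spins up. A hedged sketch, with `ApplicationSketch` and its methods invented purely for illustration:

```python
class ApplicationSketch:
    def __init__(self):
        self.flags = {}        # operational flags set on the application
        self.instances = {}    # instance id -> copy of the flags it runs under

    def set_flag(self, name, value):
        # Works before deployment (no instances yet) and after it.
        self.flags[name] = value
        for inst_flags in self.instances.values():
            inst_flags[name] = value

    def spin_up(self, instance_id):
        # A new instance inherits whatever flags are set at that moment.
        self.instances[instance_id] = dict(self.flags)
        return self.instances[instance_id]
```

Setting a flag on an application with no instances is then just the degenerate case of the same code path.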



From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: 27 March 2015 04:21
To: dev
Subject: Re: Maintenance modes (was RE: [jira] [Commented] (STRATOS-1234) Software Update Management Solution for Stratos)

Hi Shaheed,

A really good suggestion! I think we could manage what you have suggested in the same implementation, as they overlap. I'm +1 for the idea of putting a cluster into the "Maintenance Mode" manually for diagnostic purposes and stopping autoscaling for it. We could introduce new API methods to manage this. The only question is whether we could use the same instance state for all the scenarios:

1. Update platform (might need to use the term platform here as it may get confused with the software that may run on the platform)
2. Apply patches
3. Pause a cluster for diagnostic purposes

I would like to suggest changing the updateSoftware API method to updatePlatform:
POST /applications/{applicationId}/updatePlatform

Maybe we could introduce a new API method as follows to put a cluster into "Maintenance/Diagnostic Mode":
POST /clusters/{clusterId}/pause

Thanks
Imesh

On Thu, Mar 26, 2015 at 3:01 PM, Shaheedur Haque (shahhaqu) <sh...@cisco.com> wrote:

First, let me say that I like a lot of what is proposed in this JIRA, but I am forking the thread here because I would like to suggest that we generalise just one part of it, the API into Stratos to cover a set of related use cases.

In the current version of this JIRA, the proposed API into Stratos looks like this:

PUT /api/applications/{applicationId}/updateSoftware

(see the JIRA section 2.3 for the details). I think this is actually one of a set of possible runtime states that we would like to put VM instances and various parts of Stratos in. Notice that I am deliberately not using specific terms such as "cluster" or "Autoscaler" because working that out is the point of this email.

So, the sorts of use cases I have in mind are:

  *   Updating the cartridge software as per this JIRA
  *   Putting a cluster (or maybe an instance) into a "maintenance mode" for diagnostic reasons. There could be multiple versions of this maintenance mode where (for example)

     *   The instance(s) might still handle traffic and deliver "I'm alive" health stats but no autoscaling is done.
     *   The instance(s) don't deliver health stats but no health stats

  *   Some of these would deliver notifications to the cartridge agent, others might only affect Stratos component(s).
  *   etc...other ideas anybody?

Thus, it might make sense to generalise the API to support a set of closely related cases. Is there interest in taking such an approach to address this JIRA, as well as clarifying and addressing the other use cases?


Thanks, Shaheed

________________________________________
From: Sandaruwan Nanayakkara (JIRA) [jira@apache.org]
Sent: 25 March 2015 08:36
To: dev@stratos.incubator.apache.org
Subject: [jira] [Commented] (STRATOS-1234) Software Update Management Solution for Stratos

[ https://issues.apache.org/jira/browse/STRATOS-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14379497#comment-14379497 ]

Sandaruwan Nanayakkara commented on STRATOS-1234:
-------------------------------------------------

Hi all,

I have updated the Google doc with update scenarios. Please share your ideas by commenting on it; they will be much appreciated.

https://docs.google.com/document/d/1Ep2EwLubQnAv0bQGXE2ynwIDrRFCtMnCZ1E52KtzUH4/edit?usp=sharing

After days I finally deployed almost all of the Stratos samples with Kubernetes and OpenStack :)
Now the main fuss is on triggering updates in different software. Can you give an example of a piece of software and how an update is triggered manually? A practical approach?
Suppose that I have a piece of software in a single-cartridge application. When triggering an update via REST, we need a specific way to communicate with the software. Is there any way to pass this update command to the software?

Thanks
Sandaruwan



> Software Update Management Solution for Stratos
> ------------------------------------------------
>
> Key: STRATOS-1234
> URL: https://issues.apache.org/jira/browse/STRATOS-1234
> Project: Stratos
> Issue Type: New Feature
> Reporter: Imesh Gunaratne
> Labels: gsoc2015, mentor
>
> Stratos uses Virtual Machines and Containers for hosting platform services on different Infrastructure as a Service (IaaS) solutions. At present Puppet is used for orchestration management on Virtual Machine based systems and manages all required software in Puppet Master. Container based systems create Docker images for each platform service by including the required software in the Docker image itself.
> In the Virtual Machine use-case, VM instances will communicate with Puppet Master and execute the software installation. The same approach can be used for applying software updates.
> In the Docker use-case we do not use Puppet because a new container with the required software can be started in a few seconds. This is very efficient compared to using Puppet and installing software on demand.
> The requirement of this project is to implement a core Stratos feature to propagate software updates in a live PaaS environment.
> 1. Puppet based solution:
> - Push software updates of a cartridge to Puppet Master (might not need to automate).
> - Invoke the software update process via the Stratos API for a given application.
> - Stratos Manager could send a new event to trigger puppet agent in each instance to apply the updates.
> 2. Docker based solution
> - Create a new docker image (with a new image id) for the cartridge with software updates (might not need to automate).
> - Invoke the software update process via the Stratos API for a given application.
> - Autoscaler can implement a new feature to bring down existing instances and create new instances with the new docker image id.
> Important!
> - In each scenario, if updates are backward compatible, the software update process should execute in phases; it should not bring down the entire cluster to apply the updates, otherwise the service will be unavailable for a certain time period. The idea is to apply the updates to a set of members at a time.
> - If the updates are not backward compatible, we could make the entire cluster unavailable at once and apply the updates.
> - Member's state needs to be changed to a new state called "Updating" when applying the updates.
> If there is interest in doing this project, please send a mail to imesh at apache dot org, copying the Apache Dev mailing list [1]. Please refer to the Stratos Wiki [2] for more information on Stratos architecture and how it works.
> [1] http://stratos.apache.org/community/mailing-lists.html
> [2] https://cwiki.apache.org/confluence/display/STRATOS
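The phased update rule in the quoted JIRA description (update a set of members at a time when the change is backward compatible, otherwise update the whole cluster at once) can be sketched as below. The function name, the `batch_size` parameter and the `apply_update` callback are assumptions for illustration; the JIRA text does not define how the batches are chosen.

```python
def update_in_phases(members, batch_size, backward_compatible, apply_update):
    """Apply apply_update to members in batches and return the batches used.

    batch_size must be at least 1. If the update is not backward compatible,
    all members are updated in a single phase.
    """
    if not backward_compatible:
        batch_size = len(members) or 1
    phases = []
    for start in range(0, len(members), batch_size):
        batch = members[start:start + batch_size]
        for member in batch:
            apply_update(member)
        phases.append(batch)
    return phases
```

A real implementation would also wait for each batch to become active again before starting the next one; that step is omitted here.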



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)



--
Imesh Gunaratne

Technical Lead, WSO2
Committer & PMC Member, Apache Stratos

RE: Maintenance modes (was RE: [jira] [Commented] (STRATOS-1234) Software Update Management Solution for Stratos)

Posted by "Martin Eppel (meppel)" <me...@cisco.com>.
+1

I’ll schedule a WebEx call early next week and add the names on the current email list. Please let me know who else should be on it.

Thanks

Martin

Adding David Spence

From: Lakmal Warusawithana [mailto:lakmal@wso2.com]
Sent: Thursday, May 07, 2015 6:45 AM
To: dev@stratos.apache.org
Cc: Sandaruwan Nanayakkara (JIRA); Imesh Gunaratne (imesh@wso2.com); Shaheedur Haque (shahhaqu); Ryan Du Plessis (rdupless)
Subject: Re: Maintenance modes (was RE: [jira] [Commented] (STRATOS-1234) Software Update Management Solution for Stratos)

+1, shall we have a call sometime next week?

On Thursday, May 7, 2015, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi Imesh, Sandaruwan

We would like to continue the discussion on this feature as we think this could be a useful enhancement to Stratos.

To get some idea of the effort, and as a first step towards an implementation, I identified the areas / components which IMHO need to be enhanced (based on Stratos 4.1):
Btw, I also marked some of the items with a “?” - any feedback would be appreciated.


•        new Rest API to update resource state with maintenance mode:

o   PUT, resource types: application / group / cluster / instance

o   Maintenance mode on / off / restart / replace

•  sub state: autoscaling off / on

•  auto healing on / off

•        new API in autoscaler to set maintenance mode – not sure whether that is necessary, any pointers?

•        adding new / enhancing existing  topology events : [application / group /cluster / member]

o   enhancing messaging domain model to add maintenance state + sub states

o   adding / enhancing event handling in Autoscaler (receiver, monitors, etc …)

•  Event receiver / monitor for maintenance event

•  Can we utilize / reuse  ClusterMonitor->handleMemberMaintenanceModeEvent for this feature ?

•        Adding maintenance state (In autoscaler e.g. ClusterStatusProcessor, GroupStatusProcessor, etc. )

o   application

o   group

o   cluster

o   member – member already has a MAINTENANCE state, can we utilize it for this feature ?

•        enhance / add  drools rule to handle the new maintenance mode to turn on / off autoscaling, auto healing

o   scale up / scale down, dependent scaling, min / max

o   logging requirements

•        AutoscalerHealthStatEventReceiver

o   Handle Fault Events in context of maintenance mode

•        Persistence of maintenance related states

o   Registry - any pointers on how the maintenance mode should be persisted  ?
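As a starting point for the REST item at the top of this list, a request check might look like the sketch below. The resource types, modes and sub-states are taken from the list; the function name, parameter names and the defaults of "on" for the sub-states are assumptions, and nothing here reflects an existing Stratos API.

```python
RESOURCE_TYPES = ("application", "group", "cluster", "instance")
MAINTENANCE_MODES = ("on", "off", "restart", "replace")
SWITCH_VALUES = ("on", "off")


def validate_maintenance_update(resource_type, mode,
                                autoscaling="on", auto_healing="on"):
    """Return a list of problems with a proposed PUT; empty means valid."""
    problems = []
    if resource_type not in RESOURCE_TYPES:
        problems.append("unknown resource type: " + resource_type)
    if mode not in MAINTENANCE_MODES:
        problems.append("unknown maintenance mode: " + mode)
    if autoscaling not in SWITCH_VALUES:
        problems.append("autoscaling must be on or off")
    if auto_healing not in SWITCH_VALUES:
        problems.append("auto healing must be on or off")
    return problems
```

Whether "member" should be accepted as a synonym for "instance" is one of the open questions the list raises; this sketch rejects it.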


Any thoughts or feedback on this? Do you think other components will be affected or need to be reworked?

The other question is: what would be the best or recommended way to develop the feature with input from the community, and to ensure smooth integration with Stratos master?


Thanks

Martin


Re: Maintenance modes (was RE: [jira] [Commented] (STRATOS-1234) Software Update Management Solution for Stratos)

Posted by Lakmal Warusawithana <la...@wso2.com>.
+1, shall we have a call sometime next week?

On Thursday, May 7, 2015, Martin Eppel (meppel) <me...@cisco.com> wrote:

>  Hi Imesh, Sandaruwan
>
>
>
> We would like to continue the discussion on this feature as we think this
> could be a useful enhancement to stratos.
>
>
>
> To get some idea about the effort and as a first steps towards an
> implementation I identified the areas / components which IMHO need to be
> enhanced (based on stratos 4.1:
>
> Btw, I also marked some of the items with a “?” - any feedback would be
> appreciated.
>
>
>
> ·        new Rest API to update resource state with maintenance mode:
>
> o   PUT, resource types: application / group / cluster / instance
>
> o   Maintenance mode on / off / restart / replace
>
> §  sub state: autoscaling off / on
>
> §  auto healing on / off
>
> ·        new API in autoscaler to set maintenance mode – not sure about
> that if necessary, any pointers  ?
>
> ·        adding new / enhancing existing  topology events : [application
> / group /cluster / member]
>
> o   enhancing messaging domain model to add maintenance state + sub states
>
> o   adding / enhancing event handling in Autoscaler (receiver, monitors,
> etc …)
>
> §  Event receiver / monitor for maintenance event
>
> §  Can we utilize / reuse
>  ClusterMonitor->handleMemberMaintenanceModeEvent for this feature ?
>
> §
>
> ·        Adding maintenance state (In autoscaler e.g.
> ClusterStatusProcessor, GroupStatusProcessor, etc. )
>
> o   application
>
> o   group
>
> o   cluster
>
> o   member – member already has a MAINTENANCE state, can we utilize it
> for this feature ?
>
> ·        enhance / add  drools rule to handle the new maintenance mode to
> turn on / off autoscaling, auto healing
>
> o   scale up / scale down, dependent scaling, min / max
>
> o   logging requirements
>
> ·        AutoscalerHealthStatEventReceiver
>
> o   Handle Fault Events in context of maintenance mode
>
> ·        Persistence of maintenance related states
>
> o   Registry - any pointers on how the maintenance mode should be
> persisted  ?
>
>
>
>
>
> Any thoughts or feedback on this, do you think there will be other
> components affected or need to be reworked  ?
>
>
>
> The other question would be what will be the best or recommended way to
> develop the feature with the input from the community and to ensure a
> smooth integration with the stratos master ?
>
>
>
>
>
> Thanks
>
>
>
> Martin


-- 
Sent from Gmail Mobile

RE: Maintenance modes (was RE: [jira] [Commented] (STRATOS-1234) Software Update Management Solution for Stratos)

Posted by "Martin Eppel (meppel)" <me...@cisco.com>.
Hi Imesh, Sandaruwan

We would like to continue the discussion on this feature as we think this could be a useful enhancement to Stratos.

To get an idea of the effort, and as a first step towards an implementation, I identified the areas / components which IMHO need to be enhanced (based on Stratos 4.1).
Btw, I also marked some of the items with a “?” - any feedback would be appreciated.


·        new Rest API to update resource state with maintenance mode:

o   PUT, resource types: application / group / cluster / instance

o   Maintenance mode on / off / restart / replace

§  sub state: autoscaling off / on

§  auto healing on / off

·        new API in autoscaler to set maintenance mode – not sure whether this is necessary; any pointers?

·        adding new / enhancing existing  topology events : [application / group /cluster / member]

o   enhancing messaging domain model to add maintenance state + sub states

o   adding / enhancing event handling in Autoscaler (receiver, monitors, etc …)

§  Event receiver / monitor for maintenance event

§  Can we utilize / reuse  ClusterMonitor->handleMemberMaintenanceModeEvent for this feature ?

·        Adding maintenance state (In autoscaler e.g. ClusterStatusProcessor, GroupStatusProcessor, etc. )

o   application

o   group

o   cluster

o   member – member already has a MAINTENANCE state, can we utilize it for this feature ?

·        enhance / add  drools rule to handle the new maintenance mode to turn on / off autoscaling, auto healing

o   scale up / scale down, dependent scaling, min / max

o   logging requirements

·        AutoscalerHealthStatEventReceiver

o   Handle Fault Events in context of maintenance mode

·        Persistence of maintenance related states

o   Registry - any pointers on how the maintenance mode should be persisted  ?


Any thoughts or feedback on this? Do you think other components will be affected or need to be reworked?

The other question is: what would be the best or recommended way to develop the feature with input from the community and to ensure a smooth integration with the Stratos master?


Thanks

Martin

From: Shaheedur Haque (shahhaqu)
Sent: 09 April 2015 14:44
To: dev@stratos.apache.org; Sandaruwan Nanayakkara (JIRA); Imesh Gunaratne (imesh@wso2.com)
Subject: RE: Maintenance modes (was RE: [jira] [Commented] (STRATOS-1234) Software Update Management Solution for Stratos)

Hi Imesh, Sandaruwan,

Here is a written-up proposal. I *think* it covers the various use cases suggested both here and in JIRA STRATOS-1234, but as always, your thoughts on the matter are welcome. The write-up has the form of a “spec” and a “Q&A”. As a next step, I guess we could do a hang-out or con-call or something?

Thoughts welcome…

Thanks, Shaheed

OPERATIONAL STATE COMMANDS

The following commands, with the defined effects, are needed:


·        No command directly affects what I call the “major state” of the Application/Group/Cluster/Cartridge, i.e. the state as reflected in the information CURRENTLY returned by the application/{appId}/runtime information.

·        Each command affects what I call the “operational state” only. The commands and their operational states are:

o   Autoscaling on, off. Autoscaling on is current behaviour.

o   Autohealing on, off. Autohealing on is current behaviour.

o   Maintenance off, restart, replace. Maintenance off is current behaviour.

o   (We can add more later if needed)

Command: Autoscaling off.

Server effect: CEP gathers stats and history as usual. Autoscaler operates as usual, except that no scaling is done. Instead, a cluster state variable tracks the normal, overload or underload state and logs messages when this state variable changes value.

Cartridge effect: No effect on running cartridges. No new cartridges are spun up, no existing cartridges are spun down EXCEPT for autohealing.

Command: Autohealing off.

Server effect: CEP ignores any heartbeat timeout other than to log that it happened, and sets an instance state variable to track this.
When autohealing is turned back on, the timeout will happen again, and the failure will be acted upon normally, except that the log shall make it clear (using the instance state variable) that the autohealing had been delayed.

Cartridge effect: No new cartridges are spun up until after the autohealing is enabled.

Command: Maintenance restart.

Server effect: Like autohealing off, except that an extra state variable is set indicating maintenance mode is in effect. Both state variables are cleared when the Cartridge resume event is seen.

Cartridge effect: Cartridge is signalled with an *event*, not a blocking callout. Cartridge application must be able to reboot or just restart, and have the cartridge agent resume its previous (active/inactive) state. When resuming, the agent signals the server with a resume *event*. Note this implies the cartridge agent is restartable (because the application can choose to reboot).

Command: Maintenance replace.

Server effect: Like maintenance restart, except that the cartridge instance is replaced.

Cartridge effect: The difference between “restart” and “replace” is that the latter is for applications that cannot update themselves, but expect essentially a new VM instance with the new software. In other words, this is the big hammer/most general approach to upgrades (e.g. this is more likely to work than an apt-get downgrade ☺).
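To make the server-side effects of "autoscaling off" and "autohealing off" concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the class `ClusterOperationalState`, its field names and the returned action strings are hypothetical and do not correspond to existing Stratos code.

```python
from dataclasses import dataclass, field
from enum import Enum


class Load(Enum):
    NORMAL = "normal"
    OVERLOAD = "overload"
    UNDERLOAD = "underload"


@dataclass
class ClusterOperationalState:
    """Hypothetical per-cluster operational flags (not an existing Stratos class)."""
    autoscaling_on: bool = True
    autohealing_on: bool = True
    load: Load = Load.NORMAL
    # Instances whose heartbeat timeout was seen while autohealing was off.
    deferred_heals: set = field(default_factory=set)
    log: list = field(default_factory=list)

    def on_load_evaluated(self, load: Load) -> str:
        """Stats are always evaluated; scaling is only acted on when enabled."""
        if load is not self.load:
            self.log.append(f"load state {self.load.value} -> {load.value}")
            self.load = load
        if not self.autoscaling_on or load is Load.NORMAL:
            return "none"
        return "scale_up" if load is Load.OVERLOAD else "scale_down"

    def on_heartbeat_timeout(self, instance_id: str) -> str:
        """With autohealing off, only log and remember the timeout."""
        if not self.autohealing_on:
            self.deferred_heals.add(instance_id)
            self.log.append(f"heartbeat timeout on {instance_id} ignored (autohealing off)")
            return "none"
        delayed = instance_id in self.deferred_heals
        self.deferred_heals.discard(instance_id)
        self.log.append(f"healing {instance_id}" + (" (delayed)" if delayed else ""))
        return "heal"
```

The point of the sketch is that the stats pipeline keeps running in both "off" modes; only the action is suppressed, and the state variables let the log explain later why an action was delayed.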



·        Each command referred to here is a REST API call.

·        Each command can apply to an entire Application, or any nested level (group or cartridge) within it.

·        Arguments for application-wide use case:

o   application={appId}, operationalState={command}

·        Arguments for nested-level use case:

o   application={appId}, nesting={0}/{1}/{2}/…/{n}, operationalState ={command}
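As an illustration of the two argument forms above, the following Python sketch builds the argument set for an application-wide or nested-level command. The function name `build_arguments`, the table `VALID_COMMANDS` and the `"<state>-<value>"` encoding of `operationalState` are assumptions made for this example; the proposal does not define how the command value is encoded.

```python
# Operational states and their allowed values, as listed in the proposal.
VALID_COMMANDS = {
    "autoscaling": {"on", "off"},
    "autohealing": {"on", "off"},
    "maintenance": {"off", "restart", "replace"},
}


def build_arguments(app_id, state, value, nesting=()):
    """Return the argument dict for one operational state command.

    `nesting` is the path of nested levels below the application;
    leave it empty for the application-wide use case.
    """
    if value not in VALID_COMMANDS.get(state, ()):
        raise ValueError(f"unsupported command: {state} {value}")
    # Assumed encoding: the proposal only says operationalState={command}.
    args = {"application": app_id, "operationalState": f"{state}-{value}"}
    if nesting:
        args["nesting"] = "/".join(nesting)
    return args
```

For example, `build_arguments("app1", "maintenance", "restart", ["group1", "cartridge2"])` addresses one cartridge inside one group, while omitting the last argument addresses the whole application.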

Q&A


1.      What’s the point of restart/replace, over and above auto* off?



These are to actually cause the application software in the VM instance to take note and do something. Typically, I would expect this to result in an internally-managed software update. For example, think of VMs running Ubuntu and pointing to a known repository of, say, security patches: they could all just do an “apt-get update/upgrade”.



The Cartridge logic is defined to be event-based rather than blocking, because making the thing blocking would be a problem if a reboot was involved. (Also, generally, blocking operations in a distributed system raise too many edge cases like: can this operation be cancelled? Repeated? etc.).
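The event-based exchange described above could look roughly like the following Python sketch. The `Server` and `Agent` classes, the event name `"maintenance-restart"` and the in-memory `outbox` are hypothetical stand-ins for the real messaging layer, not Stratos APIs.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Server:
    """Hypothetical server side: tracks which instances are in maintenance."""
    in_maintenance: set = field(default_factory=set)
    outbox: list = field(default_factory=list)  # stand-in for the message bus

    def maintenance_restart(self, instance_id: str) -> None:
        # Fire-and-forget: set the flag, publish the event, return at once.
        self.in_maintenance.add(instance_id)
        self.outbox.append(("maintenance-restart", instance_id))

    def on_resume_event(self, instance_id: str) -> None:
        # The resume event clears the maintenance flag.
        self.in_maintenance.discard(instance_id)


@dataclass
class Agent:
    """Hypothetical cartridge agent that survives a restart or reboot."""
    instance_id: str
    state: str = "active"
    saved_state: Optional[str] = None

    def on_maintenance_event(self) -> None:
        # A real agent would persist this so that it survives a reboot.
        self.saved_state = self.state
        self.state = "restarting"

    def on_restarted(self, server: Server) -> None:
        # Resume the previous state, then signal the server with an event.
        self.state = self.saved_state
        self.saved_state = None
        server.on_resume_event(self.instance_id)
```

Because neither side waits on the other, a reboot between `on_maintenance_event` and `on_restarted` does not leave a blocked call behind on the server.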


2.      Propagation/inheritance rules



I see two options:



·        Use hierarchy. If you apply a command at hierarchy level n, and n has internal structure (i.e. it is a group not a cartridge), the command propagates all the way down (note: this is implied in what I said for the application level command).

·        Do not use hierarchy. The command only applies to the level to which it was addressed by the REST call.

In either case, the effect of contradictory commands is UNDEFINED, i.e. toggling the flags in quick succession will likely result in an unhelpful outcome.

I think the normal approach is NOT to use hierarchy; after all, just because there is an upgrade to be applied for application code in a given set of VMs, there is nothing to say that any elements lower down the hierarchy should be upgraded at the same time. Even in the case where (say) security patches to a common OS are to be applied, I would doubt the sanity of anybody doing this across every VM in the whole system in one go ☺. OTOH, maybe I am wrong!
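The two propagation options can be compared with a small Python sketch. The `Node` type, `find` and `apply_command` are hypothetical names used only for illustration; the tree stands for an Application with nested groups and cartridges.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical application/group/cartridge node."""
    name: str
    children: list = field(default_factory=list)
    operational_state: dict = field(default_factory=dict)


def find(root, nesting):
    """Walk down from the application following the nesting path."""
    node = root
    for name in nesting:
        matches = [c for c in node.children if c.name == name]
        if not matches:
            raise KeyError(name)
        node = matches[0]
    return node


def apply_command(root, nesting, key, value, use_hierarchy):
    """Set one operational state flag; return the names of the nodes touched."""
    stack = [find(root, nesting)]
    touched = []
    while stack:
        node = stack.pop()
        node.operational_state[key] = value
        touched.append(node.name)
        if use_hierarchy:
            # Option 1: the command propagates all the way down.
            stack.extend(node.children)
    return touched
```

With `use_hierarchy=False` only the addressed level is changed, which matches the second option; with `use_hierarchy=True` every level below it is changed as well.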


3.      Should these commands apply to “deployed” or only to “configured” Applications?



I think the commands can be applied whether the Application is deployed or not. Clearly, the stuff that sets flags on instances has to set those flags on all current and future instances that may spin up under a given deployment.



From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: 27 March 2015 04:21
To: dev
Subject: Re: Maintenance modes (was RE: [jira] [Commented] (STRATOS-1234) Software Update Management Solution for Stratos)

Hi Shaheed,

A really good suggestion! I think we could manage what you have suggested in the same implementation, as they overlap. I'm +1 for the idea of putting a cluster into the "Maintenance Mode" manually for diagnostic purposes and stopping autoscaling for it. We could introduce new API methods to manage this. The only question is whether we could use the same instance state for all the scenarios:

1. Update platform (might need to use the term platform here as it may get confused with the software that may run on the platform)
2. Apply patches
3. Pause a cluster for diagnostic purposes

I would like to suggest changing the updateSoftware API method to updatePlatform:
POST /applications/{applicationId}/updatePlatform

Maybe we could introduce a new API method as follows to put a cluster into "Maintenance/Diagnostic Mode":
POST /clusters/{clusterId}/pause

Thanks
Imesh

On Thu, Mar 26, 2015 at 3:01 PM, Shaheedur Haque (shahhaqu) <sh...@cisco.com> wrote:

First, let me say that I like a lot of what is proposed in this JIRA, but I am forking the thread here because I would like to suggest that we generalise just one part of it, the API into Stratos to cover a set of related use cases.

In the current version of this JIRA, the proposed API into Stratos looks like this:

PUT /api/applications/{applicationId}/updateSoftware

(see the JIRA section 2.3 for the details). I think this is actually one of a set of possible runtime states that we would like to put VM instances and various parts of Stratos in. Notice that I am deliberately not using specific terms such as "cluster" or "Autoscalar" because working that out is the point of this email.

So, the sorts of use cases I have in mind are:

  *   Updating the cartridge software as per this JIRA
  *   Putting a cluster (or maybe an instance) into a "maintenance mode" for diagnostic reasons. There could be multiple versions of this maintenance mode where (for example)

     *   The instance(s) might still handle traffic and deliver "I'm alive" health stats but no autoscaling is done.
     *   The instance(s) don't deliver health stats but no health stats

  *   Some of these would deliver notifications to the cartridge agent, others might only affect Stratos component(s).
  *   etc...other ideas anybody?

Thus, it might make sense to generalise the API to support a set of closely related cases. Is there interest in taking such an approach to address this JIRA, as well as in clarifying and addressing the other use cases?


Thanks, Shaheed

________________________________________
From: Sandaruwan Nanayakkara (JIRA) [jira@apache.org]
Sent: 25 March 2015 08:36
To: dev@stratos.incubator.apache.org
Subject: [jira] [Commented] (STRATOS-1234) Software Update Management Solution for Stratos

[ https://issues.apache.org/jira/browse/STRATOS-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14379497#comment-14379497 ]

Sandaruwan Nanayakkara commented on STRATOS-1234:
-------------------------------------------------

Hi all,

I have updated the Google doc with update scenarios. Please share your ideas by commenting; they will be much appreciated.

https://docs.google.com/document/d/1Ep2EwLubQnAv0bQGXE2ynwIDrRFCtMnCZ1E52KtzUH4/edit?usp=sharing

After days I finally deployed almost all of the Stratos samples with Kubernetes and OpenStack :)
Now the main fuss is about triggering updates in different software. Can you give an example of a piece of software and how its update is triggered manually? A practical approach??
Suppose that I have some software in a single-cartridge application. When triggering an update via REST, we need a specific way to communicate with the software. Is there any way that this update command can be given to the software?

Thanks
Sandaruwan



> Software Update Management Solution for Stratos
> ------------------------------------------------
>
> Key: STRATOS-1234
> URL: https://issues.apache.org/jira/browse/STRATOS-1234
> Project: Stratos
> Issue Type: New Feature
> Reporter: Imesh Gunaratne
> Labels: gsoc2015, mentor
>
> Stratos uses Virtual Machines and Containers for hosting platform services on different Infrastructure as a Service (IaaS) solutions. At present Puppet is used for orchestration management on Virtual Machine based systems and manages all required software in Puppet Master. Container based systems create Docker images for each platform service by including required software in the Docker image itself.
> In the Virtual Machine use case, VM instances will communicate with Puppet Master and execute the software installation. The same approach can be used for applying software updates.
> In the Docker use case, we do not use Puppet because a new container with the required software can be started in a few seconds. This is very efficient compared to using Puppet and installing software on demand.
> The requirement of this project is to implement a core Stratos feature to propagate software updates in a live PaaS environment.
> 1. Puppet based solution:
> - Push software updates of a cartridge to Puppet Master (might not need to automate).
> - Invoke the software update process via the Stratos API for a given application.
> - Stratos Manager could send a new event to trigger puppet agent in each instance to apply the updates.
> 2. Docker based solution
> - Create a new docker image (with a new image id) for the cartridge with software updates (might not need to automate).
> - Invoke the software update process via the Stratos API for a given application.
> - Autoscaler can implement a new feature to bring down existing instances and create new instances with the new docker image id.
> Important!
> - In each scenario, if updates are backward compatible, the software update process should execute in phases. It should not bring down the entire cluster to apply the updates, because the service would then be unavailable for a certain time period. The idea is to apply the updates to a set of members at a time.
> - If the updates are not backward compatible, we could make the entire cluster unavailable at once and apply the updates.
> - Member's state needs to be changed to a new state called "Updating" when applying the updates.
> If you are interested in doing this project, please send a mail to imesh at apache dot org, copying the Apache Dev mailing list [1]. Please refer to the Stratos Wiki [2] for more information on Stratos architecture and how it works.
> [1] http://stratos.apache.org/community/mailing-lists.html
> [2] https://cwiki.apache.org/confluence/display/STRATOS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)



--
Imesh Gunaratne

Technical Lead, WSO2
Committer & PMC Member, Apache Stratos