Posted to dev@stratos.apache.org by "Shaheedur Haque (shahhaqu)" <sh...@cisco.com> on 2015/04/23 11:11:03 UTC

Clustered deployments of Stratos

Hi,

We currently try to achieve HA with Stratos using something so unpleasant that I won't even describe it here :). It has been suggested that Stratos has, for a while now, supported a clustered mode of deployment where, given N servers:


*        The SM, AS and MB operate in an N-way clustered mode

*        The CEP operates in an N-way load-sharing mode

*        The Cartridge Agents can react to a failure in one of the N CEPs by failing over to one of the N-1 remaining servers

In looking for documentation on how to set this up, I came across these two write-ups [1] and [2]. Questions:


*        Both these documents mention only using N=2. Is that still correct?

*        [1] seems recently written, and [2] is a little older but not by much. Are both documents still regarded as current?

Also, I'd love to hear any other experiences people have of running configurations like this.

Thanks, Shaheed

[1] https://cwiki.apache.org/confluence/display/STRATOS/4.1.0+Configuring+HA+Using+Pacemaker+and+Heartbeat
[2] http://blog.lasindu.com/2014/08/wso2-private-paas-supporting.html




Re: Clustered deployments of Stratos

Posted by Lakmal Warusawithana <la...@wso2.com>.
Hi Shaheed,

Thanks for the detailed explanation of the tests. We will do a test round with
the HA setup and report back.

Yes, we need this working for GA.

Thanks

On Friday, May 15, 2015, Shaheedur Haque (shahhaqu) <sh...@cisco.com>
wrote:

>  Hi Imesh,
>
>
>
> I finally got round to a proper series of tests, and here are the
> conclusions:
>
>
>
> ·        In Stratos 4.0, after a Pacemaker driven failover, the newly
> Active Stratos has lost all Cartridge Definitions.
>
> ·        In current [1] Stratos 4.1, after a Pacemaker driven failover,
> the newly Active Stratos:
>
> o   Has lost all Deployment Policies.
>
> o   Has lost contact with the Cartridge Agents, and all VMs are stuck
> with whatever state they had before the failover.
>
> ·        Note: I have not verified if Cartridge Groups are lost or not.
>
>
>
> I include the test results below at [2] and [3]. I am concerned as to
> whether 4.1 is ready for GA on this basis, so though more testing is no
> doubt possible (e.g. Cartridge Groups) I wanted to get this info to the
> list ASAP.
>
>
>
> Thanks, Shaheed
>
>
>
> [1] A recent build somewhere between beta 1 and beta 2, but I don’t think
> any relevant fixes have been made in master.
>
>
>
> [2] Persistence test output from Stratos 4.1. Note:
>
>
>
> 1.      In the build I have, the CLI is broken for a couple of commands;
> these are supplemented by direct “curl” commands further down.
>
> 2.      I’ve used one of our commands to show the instances and their
> state for a given application since there is not a compact JSON or
> convenient Stratos CLI for that.
>
>
>
> PERSISTENCE TEST, BEFORE FAILOVER
> ================================
>
> stratos> list-tenants
> Tenants:
> +-----------------------+-----------+------------------+--------+------------------------------+
> | Domain                | Tenant ID | Email            | State  | Created Date                 |
> +-----------------------+-----------+------------------+--------+------------------------------+
> | cloud1.qmog.cisco.com | 1         | cloud1@cisco.com | Active | Fri May 15 04:46:58 MDT 2015 |
> +-----------------------+-----------+------------------+--------+------------------------------+
>
> stratos> list-network-partitions
> Network partitions found:
> +----------------------+----------------------+
> | Network Partition ID | Number of Partitions |
> +----------------------+----------------------+
> | RegionOne            | 1                    |
> +----------------------+----------------------+
>
> stratos> list-deployment-policies
> Deployment policies found:
> +-------------------+---------------+
> | ID                | Accessibility |
> +-------------------+---------------+
> | static-2-ha       | 1             |
> +-------------------+---------------+
> | autoscale-2-10-ha | 1             |
> +-------------------+---------------+
> | autoscale-1-5     | 1             |
> +-------------------+---------------+
> | static-1          | 1             |
> +-------------------+---------------+
>
> stratos> list-application-policies
> Error in listing application policies
> No application policies found
>
> stratos> list-autoscaling-policies
> Error in listing autoscaling policies
> No autoscaling policies found
>
> stratos> list-cartridges
> Cartridges found:
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | Type             | Category    | Name             | Description                | Version | Multi-Tenant |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cartridge-proxy  | Application | cartridge-proxy  | cartridge-proxy Cartridge  | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cisco-sample-vm  | Application | cisco-sample-vm  | cisco-sample-vm Cartridge  | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cisco-qvpc-cf-01 | Application | cisco-qvpc-cf-01 | cisco-qvpc-cf-01 Cartridge | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cisco-qvpc-cf-02 | Application | cisco-qvpc-cf-02 | cisco-qvpc-cf-02 Cartridge | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cisco-qvpc-si    | Application | cisco-qvpc-si    | cisco-qvpc-si Cartridge    | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cisco-qvpc-sf    | Application | cisco-qvpc-sf    | cisco-qvpc-sf Cartridge    | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
>
> stratos> list-applications
> Applications found:
> +-----------------+-----------------+----------+
> | Application ID  | Alias           | Status   |
> +-----------------+-----------------+----------+
> | cartridge-proxy | cartridge-proxy | Deployed |
> +-----------------+-----------------+----------+
> | cisco-sample-vm | cisco-sample-vm | Deployed |
> +-----------------+-----------------+----------+
>
> $ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/autoscalingPolicies
> [{"id":"economyPolicy","instanceRoundingFactor":0,"isPublic":false,"loadThresholds":""}]
>
> $ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/applicationPolicies
> [{"algorithm":"one-after-another","id":"default-iaas","networkPartitions":["RegionOne"],"properties":{"name":"networkPartitionGroups","value":"RegionOne"}}]
>
>
>
>
>
> PERSISTENCE TEST, AFTER FAILOVER
> ===============================
>
> stratos> list-tenants
> Tenants:
> +-----------------------+-----------+------------------+--------+------------------------------+
> | Domain                | Tenant ID | Email            | State  | Created Date                 |
> +-----------------------+-----------+------------------+--------+------------------------------+
> | cloud1.qmog.cisco.com | 1         | cloud1@cisco.com | Active | Fri May 15 05:26:52 MDT 2015 |
> +-----------------------+-----------+------------------+--------+------------------------------+
>
> stratos> list-network-partitions
> Network partitions found:
> +----------------------+----------------------+
> | Network Partition ID | Number of Partitions |
> +----------------------+----------------------+
> | RegionOne            | 1                    |
> +----------------------+----------------------+
>
> stratos> list-deployment-policies
> No deployment policies found
>
> stratos> list-application-policies
> Error in listing application policies
> No application policies found
>
> stratos> list-autoscaling-policies
> Error in listing autoscaling policies
> No autoscaling policies found
>
> stratos> list-cartridges
> Cartridges found:
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | Type             | Category    | Name             | Description                | Version | Multi-Tenant |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cartridge-proxy  | Application | cartridge-proxy  | cartridge-proxy Cartridge  | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cisco-sample-vm  | Application | cisco-sample-vm  | cisco-sample-vm Cartridge  | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cisco-qvpc-cf-01 | Application | cisco-qvpc-cf-01 | cisco-qvpc-cf-01 Cartridge | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cisco-qvpc-cf-02 | Application | cisco-qvpc-cf-02 | cisco-qvpc-cf-02 Cartridge | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cisco-qvpc-si    | Application | cisco-qvpc-si    | cisco-qvpc-si Cartridge    | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cisco-qvpc-sf    | Application | cisco-qvpc-sf    | cisco-qvpc-sf Cartridge    | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
>
> stratos> list-applications
> Applications found:
> +-----------------+-----------------+----------+
> | Application ID  | Alias           | Status   |
> +-----------------+-----------------+----------+
> | cartridge-proxy | cartridge-proxy | Deployed |
> +-----------------+-----------------+----------+
> | cisco-sample-vm | cisco-sample-vm | Deployed |
> +-----------------+-----------------+----------+
>
> $ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/autoscalingPolicies
> [{"id":"economyPolicy","instanceRoundingFactor":0,"isPublic":false,"loadThresholds":""}]
>
> $ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/applicationPolicies
> [{"algorithm":"one-after-another","id":"default-iaas","networkPartitions":["RegionOne"],"properties":{"name":"networkPartitionGroups","value":"RegionOne"}}]
>
>
>
> [3] Cartridge test output from Stratos 4.1. Note:
>
>
>
> 1.      We do not use a VIP for Stratos, either for 4.0 or 4.1.
>
> 2.      We expect the Cartridge Agent to use a DNS lookup when it ends up
> reconnecting, and this worked just fine in Stratos 4.0.
>
>
>
> CARTRIDGE TEST, BEFORE FAILOVER
> ==============================
>
> $ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
> cisco-sample-vm: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Active 1)
>      cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active
>
> CARTRIDGE TEST, AFTER FAILOVER
> =============================
>
> $ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
> cisco-sample-vm: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Active 1)
>      cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active
>
> CARTRIDGE TEST, AFTER FAILOVER WAIT 5 MINUTES, THEN KILL INSTANCE, THEN WAIT 2 MINUTES
> ===================================================================================
>
> $ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
> cisco-sample-vm: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Active 1)
>      cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active
>
>
>
>
>
>
>
> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
> *Sent:* 14 May 2015 20:34
> *To:* dev
> *Subject:* Re: Clustered deployments of Stratos
>
>
>
> It would be better to use the REST API to query and see whether the
> relevant entities are persisted. Since data is stored in binary format in
> the registry it would be difficult to query the database and verify this.
>
>
>
> On Thu, May 14, 2015 at 10:47 PM, Shaheedur Haque (shahhaqu) <shahhaqu@cisco.com> wrote:
>
> I looked at REG_RESOURCEs (as well as a few others) but I’m afraid I am
> going to need more specifics.
>
>
>
> For example, what query would you recommend to look at say deployment
> policies and cartridge definitions?
>
>
>
> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
> *Sent:* 09 May 2015 09:08
>
>
> *To:* dev
> *Subject:* Re: Clustered deployments of Stratos
>
>
>
> Yes, you could refer to the tables that have the prefix "REG_".
>
>
>
> On Sat, May 9, 2015 at 4:11 AM, Shaheedur Haque (shahhaqu) <shahhaqu@cisco.com> wrote:
>
> Can you suggest what tables to look at?
>
>
>
> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
> *Sent:* 07 May 2015 18:00
>
>
> *To:* dev
> *Subject:* Re: Clustered deployments of Stratos
>
>
>
> Hi Shaheed,
>
>
>
> Thanks for the clarification! Maybe the problem is with the MySQL
> active-passive configuration.
>
> I understand that you are switching the same OpenStack volume from the active
> node to the passive node (when the passive node becomes active), so
> technically it should work. Maybe we need to investigate this problem
> further by analysing whether data is persisted properly on the active node
> before the passive node becomes active.
>
>
>
> Thanks
>
>
>
> On Tue, May 5, 2015 at 4:22 PM, Shaheedur Haque (shahhaqu) <shahhaqu@cisco.com> wrote:
>
> The data is not synchronised between the active and passive nodes. For
> clarity, this is the HA model we had, much as described in the blog:
>
>
>
> ·        2 nodes, with Pacemaker in active-passive mode.
>
> ·        Under Pacemaker control:
>
> o   We run MySQL in active-passive mode, using a single OpenStack volume
> which we attach/reattach as the active role moves around nodes.
>
> o   As the Pacemaker moves the volume, and thus MySQL around on node
> failures, ActiveMQ and Stratos are moved around too.
>
> o   Thus, everything operates in active-passive mode.
>
>
>
> Even in this model, as the active Stratos 4.0 is moved around (i.e. the
> Stratos JVM on the old active node has gone with the node, and Pacemaker
> starts up a new Stratos JVM on what used to be the passive node), we found
> that the Cartridge Definition objects were missing and, as a clumsy
> workaround [1], we had to replay the stored copies of them into Stratos
> using the REST API.
>
>
>
> With Stratos 4.1, using the new object names, early indications are that
> *Deployment Policies* and *Application Deployment* policies are lost as the
> active fails over to the passive. If anything, these objects are more likely
> to hit the problems of [1], since Stratos 4.1 expects these to be tweaked on
> the fly (min/max etc).
>
>
>
> Thanks, Shaheed
>
>
>
> [1] Clearly, this loses any changes that were not in the stored copies.
>
>
>
> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
> *Sent:* 03 May 2015 06:43
> *To:* dev@stratos.apache.org
>
>
> *Subject:* Re: Clustered deployments of Stratos
>
>
>
> Hi Shaheed,
>
>
>
> Thanks for taking time to test this!
>
>
>
> Just to clarify the exact problem, do you mean that data is not
> synchronized between the active and passive nodes or they are not persisted
> in the active node?
>
>
>
> Thanks
>
>
> On Sunday, May 3, 2015, Shaheedur Haque (shahhaqu) <shahhaqu@cisco.com> wrote:
>
> I have been looking into our use of Linux HA to set up an Active-Passive
> configuration. Testing indicates that in 4.1 (beta1), several objects seem
> not to be persisted properly. This includes at least:
>
> - Cartridges
> - Deployment policies
>
> Am I missing something? Is it safe to work around this by replaying those
> objects?
>  ------------------------------
>
> *From:* Imesh Gunaratne [imesh@apache.org]
> *Sent:* 23 April 2015 10:47
> *To:* dev
> *Subject:* Re: Clustered deployments of Stratos
>
> Hi Shaheed,
>
>
>
> Currently N-way clustering is still not possible with CC, AS & SM. We
> completed the initial phase of this feature; however, the feature as a whole
> was not completed. You could refer to the mail thread "[Discuss] Clustering
> Feature Implementation for 4.1.0-Alpha Release" for details.
>
>
>
> However at present [1] is valid. We could use Linux HA and deploy CC, AS
> and SM in Active-Passive mode.
>
>
>
> Thanks
>
> --
>
> Imesh Gunaratne
>
>
>
> Technical Lead, WSO2
>
> Committer & PMC Member, Apache Stratos
>
>
>
> --
>
> Imesh Gunaratne
>
>
>
> Senior Technical Lead, WSO2
>
> Committer & PMC Member, Apache Stratos
>



RE: Clustered deployments of Stratos

Posted by "Shaheedur Haque (shahhaqu)" <sh...@cisco.com>.
Just to close the loop here, the last remaining problem was traced back to an issue on our side where, for somewhat mysterious reasons, the Pacemaker programming of our DNS failover was causing erratic behaviour. That was resolved and, for the record, this is what an Active-Standby failover looks like…

First, this is what the Cartridge Agent sees when the octl-01 VM is deleted and recreated:



[2015-05-21 11:18:39,691] ERROR - [EventPublisher] Cannot get a client to send events to TCP,octl.qmog.cisco.com:7611,TCP,octl.qmog.cisco.com:7711

[2015-05-21 11:18:39,698] ERROR - [EventPublisher] Cannot get a client to send events to TCP,octl.qmog.cisco.com:7611,TCP,octl.qmog.cisco.com:7711

[2015-05-21 11:18:40,600] ERROR - [EventPublisher] Cannot get a client to send events to TCP,octl.qmog.cisco.com:7611,TCP,octl.qmog.cisco.com:7711

[2015-05-21 11:18:43,089] ERROR - [AmqpTopicConnector] Could not connect to message broker

[2015-05-21 11:18:45,101] ERROR - [AmqpTopicConnector] Could not connect to message broker

[2015-05-21 11:18:45,530] ERROR - [AmqpTopicConnector] Could not connect to message broker

...

this pattern repeats as retries are tried every 2, 5, 10, 20 and 30 seconds

....

[2015-05-21 11:20:09,558] ERROR - [EventPublisher] Cannot get a client to send events to TCP,octl.qmog.cisco.com:7611,TCP,octl.qmog.cisco.com:7711

[2015-05-21 11:20:24,558] ERROR - [EventPublisher] Cannot get a client to send events to TCP,octl.qmog.cisco.com:7611,TCP,octl.qmog.cisco.com:7711

[2015-05-21 11:20:39,558] ERROR - [EventPublisher] Cannot get a client to send events to TCP,octl.qmog.cisco.com:7611,TCP,octl.qmog.cisco.com:7711

2015-05-21 11:20:09,558 [-] [DataBridge-Agent-pool-1-thread-24] ERROR EventPublisher Cannot get a client to send events to TCP,octl.qmog.cisco.com:7611,TCP,octl.qmog.cisco.com:7711

2015-05-21 11:20:24,558 [-] [DataBridge-Agent-pool-1-thread-25] ERROR EventPublisher Cannot get a client to send events to TCP,octl.qmog.cisco.com:7611,TCP,octl.qmog.cisco.com:7711

2015-05-21 11:20:39,558 [-] [DataBridge-Agent-pool-1-thread-26] ERROR EventPublisher Cannot get a client to send events to TCP,octl.qmog.cisco.com:7611,TCP,octl.qmog.cisco.com:7711


There are no more exceptions after this, so the Cartridge sees an outage of exactly 2 minutes, ending at 11:20:39. Meanwhile, on octl-02, Stratos started at about this point:



TID: [2015-05-21 11:20:16,524]  INFO {org.wso2.carbon.server.extensions.PatchInstaller} -  Patch changes detected  {org.wso2.carbon.server.extensions.PatchInstaller}

TID: [2015-05-21 11:20:16,873]  INFO {org.wso2.carbon.server.util.PatchUtils.console} -  Backed up plugins to patch0000 {org.wso2.carbon.server.util.PatchUtils.console}

TID: [2015-05-21 11:20:17,662]  INFO {org.wso2.carbon.server.util.PatchUtils.console} -  Patch verification started {org.wso2.carbon.server.util.PatchUtils.console}

TID: [2015-05-21 11:20:17,904]  INFO {org.wso2.carbon.server.util.PatchUtils.console} -  Patch verification successfully completed without encountering any issues. {org.wso2.carbon.server.util.PatchUtils.console}

...

TID: [0] [STRATOS] [2015-05-21 11:20:58,454]  INFO {org.apache.stratos.messaging.message.processor.topology.CompleteTopologyMessageProcessor} -  Topology initialized

There are no relevant errors, and subsequently, if the Cartridge is killed, that is correctly detected and the VM is respun.
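
As a rough cross-check of the two-minute figure, the window between the first and last agent-side errors can be computed from the timestamps in the log excerpt above. This is only a minimal Python sketch; the constant and function names are illustrative and not part of Stratos or the Cartridge Agent.

```python
from datetime import datetime

# Timestamps copied from the Cartridge Agent log excerpt above: the first and
# last "Cannot get a client to send events" errors seen by the agent.
FIRST_ERROR = "2015-05-21 11:18:39,691"
LAST_ERROR = "2015-05-21 11:20:39,558"

# log4j-style timestamp: date, time, then a comma and milliseconds.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def outage_seconds(first: str, last: str) -> float:
    """Return the number of seconds between two log timestamps."""
    start = datetime.strptime(first, LOG_TIME_FORMAT)
    end = datetime.strptime(last, LOG_TIME_FORMAT)
    return (end - start).total_seconds()


if __name__ == "__main__":
    seconds = outage_seconds(FIRST_ERROR, LAST_ERROR)
    print(f"Agent-side outage window: {seconds:.1f} s")
```

The same function could be pointed at any pair of timestamps from later failover tests to compare outage windows between runs.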

Thanks, Shaheed

From: Reka Thirunavukkarasu [mailto:reka@wso2.com]
Sent: 18 May 2015 18:31
To: dev
Cc: Ryan Du Plessis (rdupless); Luca Martini (lmartini); Martin Eppel (meppel)
Subject: Re: Clustered deployments of Stratos

Hi Shaheed,
As I have tested locally, the member fault case is working fine in the current master. Can you share the complete Stratos logs with us once you upgrade your setup to the latest code in master? Also, you can enable debug logs for the autoscaler and Drools as below in <STRATOS-HOME>/repository/conf/log4j.properties:
log4j.logger.org.apache.stratos.autoscaler=DEBUG
log4j.logger.org.apache.stratos.autoscaler.rule.RuleLog=DEBUG

Are you running a separate CEP? In that case, if you could share the CEP logs, then it will help us to narrow down the problem.
Thanks,
Reka

On Mon, May 18, 2015 at 9:07 PM, Reka Thirunavukkarasu <re...@wso2.com> wrote:
Hi Shaheed,
We used member fault detection to test the termination behavior in beta2, and again about a week ago. So I believe that it will work in the latest master. However, we will also verify this again.
Thanks,
Reka

On Mon, May 18, 2015 at 7:53 PM, Imesh Gunaratne <im...@apache.org> wrote:
Thanks Shaheed! I will verify the second problem where Stratos is not detecting manually terminated members.

Thanks

On Mon, May 18, 2015 at 3:39 PM, Shaheedur Haque (shahhaqu) <sh...@cisco.com> wrote:
Ack. We are just in the middle of getting sync’d up again to master, and it sounds like that might fix the persistence issue.

I guess that leaves the Cartridge Agent reconnect side of the problem…

From: Lahiru Sandaruwan [mailto:lahirus@wso2.com]
Sent: 17 May 2015 03:06

To: dev
Cc: Ryan Du Plessis (rdupless); Luca Martini (lmartini)
Subject: Re: Clustered deployments of Stratos

Hi Shaheed,

Similarly, it would be a great help if you could verify all these issues in the latest code, since we have been fixing a lot of issues in recent days as a result of RC1 testing.

Thanks.

On Fri, May 15, 2015 at 9:42 PM, Imesh Gunaratne <im...@apache.org> wrote:
Hi Shaheed,

Thanks for the quick response. After analyzing the results you provided again, it looks like only the deployment policies are missing after the failover. We have fixed this issue in commit revision: 0c515aa013850575ddcfa2e299da5f0ec250ebc3

http://mail-archives.apache.org/mod_mbox/incubator-stratos-commits/201504.mbox/%3C22eed4e8639c401a8fda637fa6bb4501@git.apache.org%3E

Would you mind verifying whether this is there in your runtime?
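
For re-running this check against a newer build, the before/after comparison can be made mechanical by diffing the entity IDs listed on each side of the failover. The following is only a minimal Python sketch: the IDs are copied from the CLI output in Shaheed's test report, while the dictionary layout and the function name are illustrative assumptions, not a Stratos API.

```python
# Entity IDs copied from the CLI output in the test report in this thread.
# The grouping keys and the helper below are illustrative assumptions.
BEFORE = {
    "deployment-policies": {
        "static-2-ha", "autoscale-2-10-ha", "autoscale-1-5", "static-1",
    },
    "network-partitions": {"RegionOne"},
    "applications": {"cartridge-proxy", "cisco-sample-vm"},
}
AFTER = {
    "deployment-policies": set(),  # "No deployment policies found"
    "network-partitions": {"RegionOne"},
    "applications": {"cartridge-proxy", "cisco-sample-vm"},
}


def lost_entities(before, after):
    """Map each entity kind to the sorted IDs present before but absent after."""
    lost = {}
    for kind, ids in before.items():
        missing = sorted(ids - after.get(kind, set()))
        if missing:
            lost[kind] = missing
    return lost


if __name__ == "__main__":
    for kind, ids in lost_entities(BEFORE, AFTER).items():
        print(f"{kind}: lost {', '.join(ids)}")
```

With the data from the report, only the deployment policies show up as lost, which matches the reading above.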

Thanks


On Fri, May 15, 2015 at 9:02 PM, Shaheedur Haque (shahhaqu) <sh...@cisco.com> wrote:
The latter; we never have both Stratos instances running.

From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: 15 May 2015 16:17
To: dev
Cc: Ryan Du Plessis (rdupless); Luca Martini (lmartini)

Subject: Re: Clustered deployments of Stratos

Hi Shaheed,

Do you have both active and passive Stratos nodes running at the same time or do you start the passive node once the active node goes down?

Thanks

On Fri, May 15, 2015 at 6:31 PM, Shaheedur Haque (shahhaqu) <sh...@cisco.com> wrote:
Hi Imesh,

I finally got round to a proper series of tests, and here are the conclusions:


•        In Stratos 4.0, after a Pacemaker driven failover, the newly Active Stratos has lost all Cartridge Definitions.

•        In current [1] Stratos 4.1, after a Pacemaker driven failover, the newly Active Stratos:

o   Has lost all Deployment Policies.

o   Has lost contact with the Cartridge Agents, and all VMs are stuck with whatever state they had before the failover.

•        Note: I have not verified if Cartridge Groups are lost or not.

I include the test results below at [2] and [3]. I am concerned as to whether 4.1 is ready for GA on this basis, so though more testing is no doubt possible (e.g. Cartridge Groups) I wanted to get this info to the list ASAP.

Thanks, Shaheed

[1] A recent build somewhere between beta 1 and beta 2, but I don’t think any relevant fixes have been made in master.

[2] Persistence test output from Stratos 4.1. Note:


1.      In the build I have, the CLI is broken for a couple of commands; these are supplemented by direct “curl” commands further down.

2.      I’ve used one of our commands to show the instances and their state for a given application since there is not a compact JSON or convenient Stratos CLI for that.

PERSISTENCE TEST, BEFORE FAILOVER
================================

stratos> list-tenants
Tenants:
+-----------------------+-----------+------------------+--------+------------------------------+
| Domain                | Tenant ID | Email            | State  | Created Date                 |
+-----------------------+-----------+------------------+--------+------------------------------+
| cloud1.qmog.cisco.com | 1         | cloud1@cisco.com | Active | Fri May 15 04:46:58 MDT 2015 |
+-----------------------+-----------+------------------+--------+------------------------------+

stratos> list-network-partitions
Network partitions found:
+----------------------+----------------------+
| Network Partition ID | Number of Partitions |
+----------------------+----------------------+
| RegionOne            | 1                    |
+----------------------+----------------------+

stratos> list-deployment-policies
Deployment policies found:
+-------------------+---------------+
| ID                | Accessibility |
+-------------------+---------------+
| static-2-ha       | 1             |
+-------------------+---------------+
| autoscale-2-10-ha | 1             |
+-------------------+---------------+
| autoscale-1-5     | 1             |
+-------------------+---------------+
| static-1          | 1             |
+-------------------+---------------+

stratos> list-application-policies
Error in listing application policies
No application policies found

stratos> list-autoscaling-policies
Error in listing autoscaling policies
No autoscaling policies found

stratos> list-cartridges
Cartridges found:
+------------------+-------------+------------------+----------------------------+---------+--------------+
| Type             | Category    | Name             | Description                | Version | Multi-Tenant |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cartridge-proxy  | Application | cartridge-proxy  | cartridge-proxy Cartridge  | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-sample-vm  | Application | cisco-sample-vm  | cisco-sample-vm Cartridge  | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-cf-01 | Application | cisco-qvpc-cf-01 | cisco-qvpc-cf-01 Cartridge | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-cf-02 | Application | cisco-qvpc-cf-02 | cisco-qvpc-cf-02 Cartridge | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-si    | Application | cisco-qvpc-si    | cisco-qvpc-si Cartridge    | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-sf    | Application | cisco-qvpc-sf    | cisco-qvpc-sf Cartridge    | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+

stratos> list-applications
Applications found:
+-----------------+-----------------+----------+
| Application ID  | Alias           | Status   |
+-----------------+-----------------+----------+
| cartridge-proxy | cartridge-proxy | Deployed |
+-----------------+-----------------+----------+
| cisco-sample-vm | cisco-sample-vm | Deployed |
+-----------------+-----------------+----------+

$ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/autoscalingPolicies
[{"id":"economyPolicy","instanceRoundingFactor":0,"isPublic":false,"loadThresholds":""}]

$ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/applicationPolicies
[{"algorithm":"one-after-another","id":"default-iaas","networkPartitions":["RegionOne"],"properties":{"name":"networkPartitionGroups","value":"RegionOne"}}]
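
Since the CLI listing fails for these two entity types in this build, the curl responses are the only record of them. A minimal Python sketch, assuming each response is a JSON array of objects carrying an "id" field as shown above, reduces a response body to its IDs so that the same check can be repeated on another node (the function name is illustrative):

```python
import json
from typing import List

# Response bodies copied verbatim from the two curl calls above.
AUTOSCALING_BODY = (
    '[{"id":"economyPolicy","instanceRoundingFactor":0,'
    '"isPublic":false,"loadThresholds":""}]'
)
APPLICATION_BODY = (
    '[{"algorithm":"one-after-another","id":"default-iaas",'
    '"networkPartitions":["RegionOne"],'
    '"properties":{"name":"networkPartitionGroups","value":"RegionOne"}}]'
)


def policy_ids(body: str) -> List[str]:
    """Return the sorted "id" values from a JSON array of policy objects."""
    return sorted(item["id"] for item in json.loads(body))


if __name__ == "__main__":
    print("autoscaling policies:", policy_ids(AUTOSCALING_BODY))
    print("application policies:", policy_ids(APPLICATION_BODY))
```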


PERSISTENCE TEST, AFTER FAILOVER
===============================

stratos> list-tenants
Tenants:
+-----------------------+-----------+------------------+--------+------------------------------+
| Domain                | Tenant ID | Email            | State  | Created Date                 |
+-----------------------+-----------+------------------+--------+------------------------------+
| cloud1.qmog.cisco.com | 1         | cloud1@cisco.com | Active | Fri May 15 05:26:52 MDT 2015 |
+-----------------------+-----------+------------------+--------+------------------------------+

stratos> list-network-partitions
Network partitions found:
+----------------------+----------------------+
| Network Partition ID | Number of Partitions |
+----------------------+----------------------+
| RegionOne            | 1                    |
+----------------------+----------------------+

stratos> list-deployment-policies
No deployment policies found

stratos> list-application-policies
Error in listing application policies
No application policies found

stratos> list-autoscaling-policies
Error in listing autoscaling policies
No autoscaling policies found

stratos> list-cartridges
Cartridges found:
+------------------+-------------+------------------+----------------------------+---------+--------------+
| Type             | Category    | Name             | Description                | Version | Multi-Tenant |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cartridge-proxy  | Application | cartridge-proxy  | cartridge-proxy Cartridge  | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-sample-vm  | Application | cisco-sample-vm  | cisco-sample-vm Cartridge  | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-cf-01 | Application | cisco-qvpc-cf-01 | cisco-qvpc-cf-01 Cartridge | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-cf-02 | Application | cisco-qvpc-cf-02 | cisco-qvpc-cf-02 Cartridge | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-si    | Application | cisco-qvpc-si    | cisco-qvpc-si Cartridge    | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-sf    | Application | cisco-qvpc-sf    | cisco-qvpc-sf Cartridge    | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+

stratos> list-applications
Applications found:
+-----------------+-----------------+----------+
| Application ID  | Alias           | Status   |
+-----------------+-----------------+----------+
| cartridge-proxy | cartridge-proxy | Deployed |
+-----------------+-----------------+----------+
| cisco-sample-vm | cisco-sample-vm | Deployed |
+-----------------+-----------------+----------+

$ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/autoscalingPolicies
[{"id":"economyPolicy","instanceRoundingFactor":0,"isPublic":false,"loadThresholds":""}]

$ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/applicationPolicies
[{"algorithm":"one-after-another","id":"default-iaas","networkPartitions":["RegionOne"],"properties":{"name":"networkPartitionGroups","value":"RegionOne"}}]
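
The before/after listings in this test can also be compared mechanically rather than by eye. The following is a minimal Python sketch, assuming the IDs from each listing have already been collected into plain dictionaries. The dictionary keys and the lost_ids helper are illustrative names, not part of Stratos; the IDs themselves are the ones reported in the test output of this message.

```python
def lost_ids(before, after):
    """Return, per entity kind, the IDs seen before failover but not after."""
    lost = {}
    for kind, ids_before in before.items():
        missing = sorted(set(ids_before) - set(after.get(kind, [])))
        if missing:
            lost[kind] = missing
    return lost


# Snapshots transcribed from the CLI/curl output (keys are illustrative).
before = {
    "deploymentPolicies": ["static-2-ha", "autoscale-2-10-ha",
                           "autoscale-1-5", "static-1"],
    "autoscalingPolicies": ["economyPolicy"],
    "applicationPolicies": ["default-iaas"],
}
after = {
    "deploymentPolicies": [],
    "autoscalingPolicies": ["economyPolicy"],
    "applicationPolicies": ["default-iaas"],
}

print(lost_ids(before, after))
```

With these inputs only the deployment policies are reported as lost, which matches the conclusion drawn from the listings.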

[3] Cartridge test output from Stratos 4.1. Note:


1.      We do not use a VIP for Stratos, either for 4.0 or 4.1.

2.      We expect the Cartridge Agent to use a DNS lookup when it ends up reconnecting, and this worked just fine in Stratos 4.0.
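
To make expectation 2 concrete: a client that re-resolves the host name on every reconnect attempt can follow the active node without a VIP, provided DNS is updated on failover. The Python sketch below illustrates that behaviour only. It is not the Cartridge Agent's actual code, and the reconnect helper, its parameters, and any host name or port passed to it are hypothetical.

```python
import socket
import time


def reconnect(host, port, attempts=5, delay=2.0,
              resolve=socket.getaddrinfo,
              connect=socket.create_connection,
              sleep=time.sleep):
    """Try to connect, doing a fresh DNS lookup before every attempt."""
    for _ in range(attempts):
        try:
            # Resolve again each time so a changed DNS record is picked up.
            infos = resolve(host, port, type=socket.SOCK_STREAM)
        except OSError:
            infos = []
        for info in infos:
            address = info[4][:2]  # (ip, port) part of the sockaddr
            try:
                return connect(address, timeout=5)
            except OSError:
                continue  # this address is down, try the next one
        sleep(delay)
    raise ConnectionError(f"could not reach {host}:{port}")
```

The resolver, connector and sleep function are parameters so that the loop can be exercised without a network. A client that caches the first resolved address instead would keep retrying the failed node.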

CARTRIDGE TEST, BEFORE FAILOVER
==============================

$ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
cisco-sample-vm: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Active 1)
     cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active

CARTRIDGE TEST, AFTER FAILOVER
=============================

$ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
cisco-sample-vm: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Active 1)
     cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active

CARTRIDGE TEST,  AFTER FAILOVER WAIT 5 MINUTES, THEN KILL INSTANCE, THEN WAIT 2 MINUTES
===================================================================================

$ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
cisco-sample-vm: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Active 1)
     cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active



From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: 14 May 2015 20:34
To: dev
Subject: Re: Clustered deployments of Stratos

It would be better to use the REST API to query and see whether the relevant entities are persisted. Since data is stored in binary format in the registry it would be difficult to query the database and verify this.
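
As a rough illustration of this suggestion, the Python sketch below reads entity IDs over the REST API using only the standard library. The base URL, the two resource names and the admin:admin credentials are taken from the curl commands elsewhere in this thread. The helper names are made up for the example, and TLS verification is disabled only to mirror curl's -k flag.

```python
import base64
import json
import ssl
import urllib.request

BASE_URL = "https://localhost:9443/api"
RESOURCES = ("autoscalingPolicies", "applicationPolicies")


def build_request(resource, user="admin", password="admin"):
    """Build a GET request with HTTP basic auth for one API resource."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(f"{BASE_URL}/{resource}")
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Content-Type", "application/json")
    return request


def fetch_ids(resource):
    """Return the sorted 'id' fields of the JSON list the API returns."""
    context = ssl.create_default_context()
    context.check_hostname = False       # equivalent of curl -k;
    context.verify_mode = ssl.CERT_NONE  # not for production use
    with urllib.request.urlopen(build_request(resource), context=context) as resp:
        return sorted(item["id"] for item in json.load(resp))
```

Calling fetch_ids for each name in RESOURCES once before and once after a failover gives two snapshots that can be compared directly. The sketch assumes each endpoint returns a JSON array of objects with an "id" field, as the curl output in this thread shows for these two resources.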

On Thu, May 14, 2015 at 10:47 PM, Shaheedur Haque (shahhaqu) <sh...@cisco.com> wrote:
I looked at REG_RESOURCEs (as well as a few others) but I’m afraid I am going to need more specifics.

For example, what query would you recommend to look at say deployment policies and cartridge definitions?

From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: 09 May 2015 09:08
To: dev
Subject: Re: Clustered deployments of Stratos

Yes, you could refer to the tables that have the prefix "REG_".

On Sat, May 9, 2015 at 4:11 AM, Shaheedur Haque (shahhaqu) <sh...@cisco.com> wrote:
Can you suggest what tables to look at?

From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: 07 May 2015 18:00
To: dev
Subject: Re: Clustered deployments of Stratos

Hi Shaheed,

Thanks for the clarification! Maybe the problem is with the MySQL active-passive configuration.

I understand that you are switching the same OpenStack volume from the active node to the passive node (when the passive node becomes active), so technically it should work. Maybe we need to investigate this problem further by analysing whether data is persisted properly in the active node before the passive node becomes active.

Thanks

On Tue, May 5, 2015 at 4:22 PM, Shaheedur Haque (shahhaqu) <sh...@cisco.com> wrote:
The data is not synchronised between the active and passive nodes. For clarity, this is the HA model we had, much as described in the blog:


•        2 nodes, with Pacemaker in active-passive mode.

•        Under Pacemaker control:

o   We run MySQL in active-passive mode, using a single OpenStack volume which we attach/reattach as the active role moves around nodes.

o   As Pacemaker moves the volume, and thus MySQL, around on node failures, ActiveMQ and Stratos are moved around too.

o   Thus, everything operates in active-passive mode.

Even in this model, as the active Stratos 4.0 is moved around (i.e. the Stratos JVM on the old active node has gone with the node, and Pacemaker starts up a new Stratos JVM on what used to be the passive node), we found that the Cartridge Definition objects were missing and, as a clumsy workaround [1], we had to replay the stored copies of them into Stratos using the REST API.

With Stratos 4.1, using the new object names, early indications are that Deployment Policies and Application Deployment policies are lost as the active fails over to the passive. If anything, these objects are more likely to hit the problems of [1], since Stratos 4.1 expects these to be tweaked on the fly (min/max etc).

Thanks, Shaheed

[1] Clearly, this loses any changes that were not in the stored copies.
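
The replay workaround and its limitation can be sketched as follows. This is a hedged Python illustration, not the tooling actually used: replay_missing is a hypothetical helper, the stored definitions are placeholder dictionaries rather than real Stratos JSON, and the REST call that re-submits a definition is passed in as a callable because its endpoint is not given in this thread.

```python
def replay_missing(stored, live_ids, post):
    """Re-submit stored definitions whose IDs are absent from the live system."""
    live = set(live_ids)
    replayed = []
    for obj_id, definition in stored.items():
        if obj_id in live:
            continue  # still present: leave it alone, it may have been tweaked
        post(definition)  # hypothetical REST call supplied by the caller
        replayed.append(obj_id)
    return replayed


# Placeholder stored copies, keyed by ID (not real Stratos definitions).
stored = {
    "static-1": {"id": "static-1"},
    "autoscale-1-5": {"id": "autoscale-1-5"},
}
sent = []
print(replay_missing(stored, live_ids=["static-1"], post=sent.append))
```

Only objects that are actually missing are replayed, so surviving objects keep any on-the-fly changes. A lost object still comes back in its stored form, which is exactly the limitation noted in [1].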

From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: 03 May 2015 06:43
To: dev@stratos.apache.org
Subject: Re: Clustered deployments of Stratos

Hi Shaheed,

Thanks for taking time to test this!

Just to clarify the exact problem, do you mean that data is not synchronized between the active and passive nodes, or that it is not persisted in the active node?

Thanks

On Sunday, May 3, 2015, Shaheedur Haque (shahhaqu) <sh...@cisco.com> wrote:

I have been looking into our use of Linux HA to set up an Active-Passive configuration. Testing indicates that in 4.1 (beta1), several objects seem not to be persisted properly. This includes at least:

- Cartridges
- Deployment policies

Am I missing something? Is it safe to work around this by replaying those objects?
________________________________
From: Imesh Gunaratne [imesh@apache.org]
Sent: 23 April 2015 10:47
To: dev
Subject: Re: Clustered deployments of Stratos

Hi Shaheed,

Currently N-way clustering is still not possible with CC, AS & SM. We completed the initial phase of this feature; however, the feature as a whole was not completed. You could refer to the mail thread "[Discuss] Clustering Feature Implementation for 4.1.0-Alpha Release" for details.

However at present [1] is valid. We could use Linux HA and deploy CC, AS and SM in Active-Passive mode.

Thanks



On Thu, Apr 23, 2015 at 2:41 PM, Shaheedur Haque (shahhaqu) <sh...@cisco.com> wrote:
Hi,

We currently try to achieve HA with Stratos using something so unpleasant that I won’t even describe it here ☺. It has been suggested that Stratos has, for a while now, supported a clustered mode of deployment where, given N servers:


•        The SM, AS and MB operate in an N-way clustered mode

•        The CEP operates in an N-way loadsharing mode

•        The Cartridge Agents can react to a failure in one of the N CEPs by failing over to one of the other N-1 remaining servers

In looking for documentation on how to set this up, I came across these two write-ups [1] and [2]. Questions:


•        Both these documents mention only using N=2. Is that still correct?

•        [1] Seems recently written, and [2] is a little older but not much. Are both documents still regarded as current?

Also, I’d love to hear any other experiences people have of running configurations like this.

Thanks, Shaheed

[1] https://cwiki.apache.org/confluence/display/STRATOS/4.1.0+Configuring+HA+Using+Pacemaker+and+Heartbeat
[2] http://blog.lasindu.com/2014/08/wso2-private-paas-supporting.html






--
Imesh Gunaratne

Technical Lead, WSO2
Committer & PMC Member, Apache Stratos


--
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos




--
Lahiru Sandaruwan
Committer and PMC member, Apache Stratos,
Senior Software Engineer,
WSO2 Inc., http://wso2.com
lean.enterprise.middleware

phone: +94773325954
email: lahirus@wso2.com blog: http://lahiruwrites.blogspot.com/
linked-in: http://lk.linkedin.com/pub/lahiru-sandaruwan/16/153/146




--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007






Re: Clustered deployments of Stratos

Posted by Reka Thirunavukkarasu <re...@wso2.com>.
Hi Shaheed,

As I have tested locally, the member fault case works fine in the current
master. Can you share the complete Stratos logs with us once you upgrade
your setup to the latest code in master? Also, you can enable debug logs
for the autoscaler and Drools as below in
<STRATOS-HOME>/repository/conf/log4j.properties:

log4j.logger.org.apache.stratos.autoscaler=DEBUG
log4j.logger.org.apache.stratos.autoscaler.rule.RuleLog=DEBUG

Are you running a separate CEP? If so, sharing the CEP logs as well would
help us to narrow down the problem.

Thanks,
Reka

On Mon, May 18, 2015 at 9:07 PM, Reka Thirunavukkarasu <re...@wso2.com>
wrote:

> Hi Shaheed,
>
> We used member fault detection to test the termination behavior in
> beta2, and again about a week ago. So I believe that it will work in the
> latest master. However, we will also verify this again.
>
> Thanks,
> Reka
>
> On Mon, May 18, 2015 at 7:53 PM, Imesh Gunaratne <im...@apache.org> wrote:
>
>> Thanks Shaheed! I will verify the second problem where Stratos is not
>> detecting manually terminated members.
>>
>> Thanks
>>
>> On Mon, May 18, 2015 at 3:39 PM, Shaheedur Haque (shahhaqu) <
>> shahhaqu@cisco.com> wrote:
>>
>>> Ack. We are just in the middle of getting sync’d up again to master,
>>> and it sounds like that might fix the persistence issue.
>>>
>>>
>>>
>>> I guess that leaves the Cartridge Agent reconnect side of the problem…
>>>
>>>
>>>
>>> *From:* Lahiru Sandaruwan [mailto:lahirus@wso2.com]
>>> *Sent:* 17 May 2015 03:06
>>>
>>> *To:* dev
>>> *Cc:* Ryan Du Plessis (rdupless); Luca Martini (lmartini)
>>> *Subject:* Re: Clustered deployments of Stratos
>>>
>>>
>>>
>>> Hi Shaheed,
>>>
>>>
>>>
>>> Similarly, it would be a great help if you could verify all these issues
>>> in the latest code, since we have been fixing a lot of issues in recent
>>> days as a result of RC1 testing.
>>>
>>>
>>>
>>> Thanks.
>>>
>>>
>>>
>>> On Fri, May 15, 2015 at 9:42 PM, Imesh Gunaratne <im...@apache.org>
>>> wrote:
>>>
>>> Hi Shaheed,
>>>
>>>
>>>
>>> Thanks for the quick response. After analyzing the results you
>>> provided again, it looks like only the deployment policies are missing
>>> after the failover. We have fixed this issue in commit
>>> revision: 0c515aa013850575ddcfa2e299da5f0ec250ebc3
>>>
>>>
>>>
>>>
>>> http://mail-archives.apache.org/mod_mbox/incubator-stratos-commits/201504.mbox/%3C22eed4e8639c401a8fda637fa6bb4501@git.apache.org%3E
>>>
>>>
>>>
>>> Would you mind verifying whether this is there in your runtime?
>>>
>>>
>>>
>>> Thanks
>>>
>>>
>>>
>>>
>>>
>>> On Fri, May 15, 2015 at 9:02 PM, Shaheedur Haque (shahhaqu) <
>>> shahhaqu@cisco.com> wrote:
>>>
>>> The latter; we never have both Stratos instances running.
>>>
>>>
>>>
>>> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
>>> *Sent:* 15 May 2015 16:17
>>> *To:* dev
>>> *Cc:* Ryan Du Plessis (rdupless); Luca Martini (lmartini)
>>>
>>>
>>> *Subject:* Re: Clustered deployments of Stratos
>>>
>>>
>>>
>>> Hi Shaheed,
>>>
>>>
>>>
>>> Do you have both active and passive Stratos nodes running at the same
>>> time or do you start the passive node once the active node goes down?
>>>
>>>
>>>
>>> Thanks
>>>
>>>
>>>
>>> On Fri, May 15, 2015 at 6:31 PM, Shaheedur Haque (shahhaqu) <
>>> shahhaqu@cisco.com> wrote:
>>>
>>> Hi Imesh,
>>>
>>>
>>>
>>> I finally got round to a proper series of tests, and here are the
>>> conclusions:
>>>
>>>
>>>
>>> ·        In Stratos 4.0, after a Pacemaker driven failover, the newly
>>> Active Stratos has lost all Cartridge Definitions.
>>>
>>> ·        In current [1] Stratos 4.1, after a Pacemaker driven failover,
>>> the newly Active Stratos:
>>>
>>> o   Has lost all Deployment Policies.
>>>
>>> o   Has lost contact with the Cartridge Agents, and all VMs are stuck
>>> with whatever state they had before the failover.
>>>
>>> ·        Note: I have not verified if Cartridge Groups are lost or not.
>>>
>>>
>>>
>>> I include the test results below at [2] and [3]. I am concerned as to
>>> whether 4.1 is ready for GA on this basis, so though more testing is no
>>> doubt possible (e.g. Cartridge Groups) I wanted to get this info to the
>>> list ASAP.
>>>
>>>
>>>
>>> Thanks, Shaheed
>>>
>>>
>>>
>>> [1] A recent build somewhere between beta 1 and beta 2, but I don’t
>>> think any relevant fixes have been made in master.
>>>
>>>
>>>
>>> [2] Persistence test output from Stratos 4.1. Note:
>>>
>>>
>>>
>>> 1.      In the build I have, the CLI is broken for a couple of
>>> commands; these are supplemented by direct “curl” commands further down.
>>>
>>> 2.      I’ve used one of our commands to show the instances and their
>>> state for a given application since there is not a compact JSON or
>>> convenient Startos CLI for that.
>>>
>>>
>>>
>>> *PERSISTENCE TEST, BEFORE FAILOVER*
>>>
>>> *================================*
>>>
>>>
>>>
>>> stratos> list-tenants
>>>
>>> Tenants:
>>>
>>>
>>> +-----------------------+-----------+------------------+--------+------------------------------+
>>>
>>> | Domain                | Tenant ID | Email            | State  |
>>> Created Date                 |
>>>
>>>
>>> +-----------------------+-----------+------------------+--------+------------------------------+
>>>
>>> | cloud1.qmog.cisco.com | 1         | cloud1@cisco.com | Active | Fri
>>> May 15 04:46:58 MDT 2015 |
>>>
>>>
>>> +-----------------------+-----------+------------------+--------+------------------------------+
>>>
>>>
>>>
>>> stratos> list-network-partitions
>>>
>>> Network partitions found:
>>>
>>> +----------------------+----------------------+
>>>
>>> | Network Partition ID | Number of Partitions |
>>>
>>> +----------------------+----------------------+
>>>
>>> | RegionOne            | 1                    |
>>>
>>> +----------------------+----------------------+
>>>
>>>
>>>
>>> stratos> list-deployment-policies
>>>
>>> Deployment policies found:
>>>
>>> +-------------------+---------------+
>>>
>>> | ID                | Accessibility |
>>>
>>> +-------------------+---------------+
>>>
>>> | static-2-ha       | 1             |
>>>
>>> +-------------------+---------------+
>>>
>>> | autoscale-2-10-ha | 1             |
>>>
>>> +-------------------+---------------+
>>>
>>> | autoscale-1-5     | 1             |
>>>
>>> +-------------------+---------------+
>>>
>>> | static-1          | 1             |
>>>
>>> +-------------------+---------------+
>>>
>>>
>>>
>>> stratos> list-application-policies
>>>
>>> Error in listing application policies
>>>
>>> No application policies found
>>>
>>>
>>>
>>> stratos> list-autoscaling-policies
>>>
>>> Error in listing autoscaling policies
>>>
>>> No autoscaling policies found
>>>
>>>
>>>
>>> stratos> list-cartridges
>>>
>>> Cartridges found:
>>>
>>>
>>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>>>
>>> | Type             | Category    | Name             |
>>> Description                | Version | Multi-Tenant |
>>>
>>>
>>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>>>
>>> | cartridge-proxy  | Application | cartridge-proxy  | cartridge-proxy
>>> Cartridge  | 1       | false        |
>>>
>>>
>>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>>>
>>> | cisco-sample-vm  | Application | cisco-sample-vm  | cisco-sample-vm
>>> Cartridge  | 1       | false        |
>>>
>>>
>>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>>>
>>> | cisco-qvpc-cf-01 | Application | cisco-qvpc-cf-01 | cisco-qvpc-cf-01
>>> Cartridge | 1       | false        |
>>>
>>>
>>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>>>
>>> | cisco-qvpc-cf-02 | Application | cisco-qvpc-cf-02 | cisco-qvpc-cf-02
>>> Cartridge | 1       | false        |
>>>
>>>
>>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>>>
>>> | cisco-qvpc-si    | Application | cisco-qvpc-si    | cisco-qvpc-si
>>> Cartridge    | 1       | false        |
>>>
>>>
>>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>>>
>>> | cisco-qvpc-sf    | Application | cisco-qvpc-sf    | cisco-qvpc-sf
>>> Cartridge    | 1       | false        |
>>>
>>>
>>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>>>
>>>
>>>
>>> stratos> list-applications
>>>
>>> Applications found:
>>>
>>> +-----------------+-----------------+----------+
>>>
>>> | Application ID  | Alias           | Status   |
>>>
>>> +-----------------+-----------------+----------+
>>>
>>> | cartridge-proxy | cartridge-proxy | Deployed |
>>>
>>> +-----------------+-----------------+----------+
>>>
>>> | cisco-sample-vm | cisco-sample-vm | Deployed |
>>>
>>> +-----------------+-----------------+----------+
>>>
>>>
>>>
>>> $ curl -uadmin:admin -k -H'Content-type: application/json'
>>> https://localhost:9443/api/autoscalingPolicies
>>>
>>>
>>> [{"id":"economyPolicy","instanceRoundingFactor":0,"isPublic":false,"loadThresholds":""}]
>>>
>>>
>>>
>>> $ curl -uadmin:admin -k -H'Content-type: application/json'
>>> https://localhost:9443/api/applicationPolicies
>>>
>>>
>>> [{"algorithm":"one-after-another","id":"default-iaas","networkPartitions":["RegionOne"],"properties":{"name":"networkPartitionGroups","value":"RegionOne"}}]
>>>
>>>
>>>
>>>
>>>
>>> *PERSISTENCE TEST, AFTER FAILOVER*
>>>
>>> *===============================*
>>>
>>>
>>>
>>> stratos> list-tenants
>>>
>>> Tenants:
>>>
>>>
>>> +-----------------------+-----------+------------------+--------+------------------------------+
>>>
>>> | Domain                | Tenant ID | Email            | State  |
>>> Created Date                 |
>>>
>>>
>>> +-----------------------+-----------+------------------+--------+------------------------------+
>>>
>>> | cloud1.qmog.cisco.com | 1         | cloud1@cisco.com | Active | Fri
>>> May 15 05:26:52 MDT 2015 |
>>>
>>>
>>> +-----------------------+-----------+------------------+--------+------------------------------+
>>>
>>>
>>>
>>> stratos> list-network-partitions
>>>
>>> Network partitions found:
>>>
>>> +----------------------+----------------------+
>>>
>>> | Network Partition ID | Number of Partitions |
>>>
>>> +----------------------+----------------------+
>>>
>>> | RegionOne            | 1                    |
>>>
>>> +----------------------+----------------------+
>>>
>>>
>>>
>>> stratos> list-deployment-policies
>>>
>>> No deployment policies found
>>>
>>>
>>>
>>> stratos> list-application-policies
>>>
>>> Error in listing application policies
>>>
>>> No application policies found
>>>
>>>
>>>
>>> stratos> list-autoscaling-policies
>>>
>>> Error in listing autoscaling policies
>>>
>>> No autoscaling policies found
>>>
>>>
>>>
>>> stratos> list-cartridges
>>>
>>> Cartridges found:
>>>
>>>
>>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>>>
>>> | Type             | Category    | Name             |
>>> Description                | Version | Multi-Tenant |
>>>
>>>
>>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>>>
>>> | cartridge-proxy  | Application | cartridge-proxy  | cartridge-proxy
>>> Cartridge  | 1       | false        |
>>>
>>>
>>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>>>
>>> | cisco-sample-vm  | Application | cisco-sample-vm  | cisco-sample-vm
>>> Cartridge  | 1       | false        |
>>>
>>>
>>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>>>
>>> | cisco-qvpc-cf-01 | Application | cisco-qvpc-cf-01 | cisco-qvpc-cf-01
>>> Cartridge | 1       | false        |
>>>
>>>
>>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>>>
>>> | cisco-qvpc-cf-02 | Application | cisco-qvpc-cf-02 | cisco-qvpc-cf-02
>>> Cartridge | 1       | false        |
>>>
>>>
>>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>>>
>>> | cisco-qvpc-si    | Application | cisco-qvpc-si    | cisco-qvpc-si
>>> Cartridge    | 1       | false        |
>>>
>>>
>>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>>>
>>> | cisco-qvpc-sf    | Application | cisco-qvpc-sf    | cisco-qvpc-sf
>>> Cartridge    | 1       | false        |
>>>
>>>
>>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>>>
>>>
>>>
>>> stratos> list-applications
>>>
>>> Applications found:
>>>
>>> +-----------------+-----------------+----------+
>>>
>>> | Application ID  | Alias           | Status   |
>>>
>>> +-----------------+-----------------+----------+
>>>
>>> | cartridge-proxy | cartridge-proxy | Deployed |
>>>
>>> +-----------------+-----------------+----------+
>>>
>>> | cisco-sample-vm | cisco-sample-vm | Deployed |
>>>
>>> +-----------------+-----------------+----------+
>>>
>>>
>>>
>>> $ curl -uadmin:admin -k -H'Content-type: application/json'
>>> https://localhost:9443/api/autoscalingPolicies
>>>
>>>
>>> [{"id":"economyPolicy","instanceRoundingFactor":0,"isPublic":false,"loadThresholds":""}]
>>>
>>>
>>>
>>> $ curl -uadmin:admin -k -H'Content-type: application/json'
>>> https://localhost:9443/api/applicationPolicies
>>>
>>>
>>> [{"algorithm":"one-after-another","id":"default-iaas","networkPartitions":["RegionOne"],"properties":{"name":"networkPartitionGroups","value":"RegionOne"}}]
>>>
>>>
>>>
>>> [3] Cartridge test output from Stratos 4.1. Note:
>>>
>>>
>>>
>>> 1.      We do not use a VIP for Stratos, either for 4.0 or 4.1.
>>>
>>> 2.      We expect the Cartridge Agent to use a DNS lookup when it ends
>>> up reconnecting, and this worked just fine in Stratos 4.0.
>>>
>>>
>>>
>>> *CARTRIDGE TEST, BEFORE FAILOVER*
>>>
>>> *==============================*
>>>
>>>
>>>
>>> $ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
>>>
>>> cisco-sample-vm: applicationInstances 1, groupInstances 0,
>>> clusterInstances 1, members 1 (Active 1)
>>>
>>>      cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active
>>>
>>>
>>>
>>> *CARTRIDGE TEST, AFTER FAILOVER*
>>>
>>> *=============================*
>>>
>>>
>>>
>>> $ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
>>>
>>> cisco-sample-vm: applicationInstances 1, groupInstances 0,
>>> clusterInstances 1, members 1 (Active 1)
>>>
>>>      cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active
>>>
>>>
>>>
>>> *CARTRIDGE TEST,  AFTER FAILOVER WAIT 5 MINUTES, THEN KILL INSTANCE,
>>> THEN WAIT 2 MINUTES*
>>>
>>>
>>> *===================================================================================*
>>>
>>>
>>>
>>> $ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
>>>
>>> cisco-sample-vm: applicationInstances 1, groupInstances 0,
>>> clusterInstances 1, members 1 (Active 1)
>>>
>>>      cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
>>> *Sent:* 14 May 2015 20:34
>>>
>>>
>>> *To:* dev
>>> *Subject:* Re: Clustered deployments of Stratos
>>>
>>>
>>>
>>> It would be better to use the REST API to query and see whether the
>>> relevant entities are persisted. Since data is stored in binary format in
>>> the registry it would be difficult to query the database and verify this.
>>>
>>>
>>>
>>> On Thu, May 14, 2015 at 10:47 PM, Shaheedur Haque (shahhaqu) <
>>> shahhaqu@cisco.com> wrote:
>>>
>>> I looked at REG_RESOURCEs a9s well as a few others) but I’m afraid I am
>>> going to need more specifics.
>>>
>>>
>>>
>>> For example, what query would you recommend to look at say deployment
>>> policies and cartridge definitions?
>>>
>>>
>>>
>>> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
>>> *Sent:* 09 May 2015 09:08
>>>
>>>
>>> *To:* dev
>>> *Subject:* Re: Clustered deployments of Stratos
>>>
>>>
>>>
>>> Yes you could refer the tables that have the prefix "REG_".
>>>
>>>
>>>
>>> On Sat, May 9, 2015 at 4:11 AM, Shaheedur Haque (shahhaqu) <
>>> shahhaqu@cisco.com> wrote:
>>>
>>> Can you suggest what tables to look at?
>>>
>>>
>>>
>>> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
>>> *Sent:* 07 May 2015 18:00
>>>
>>>
>>> *To:* dev
>>> *Subject:* Re: Clustered deployments of Stratos
>>>
>>>
>>>
>>> Hi Shaheed,
>>>
>>>
>>>
>>> Thanks for the clarification! May be the problem is with the MySQL
>>> active-passive configuration.
>>>
>>>
>>>
>>> I understand that you are switching the same OpenStack volume from
>>> active node to the passive node (when the passive node becomes active)
>>> therefore technically it should work. May be we need to investigate this
>>> problem further by analysing whether data is persisted properly in the
>>> active node before the passive node becomes active.
>>>
>>>
>>>
>>> Thanks
>>>
>>>
>>>
>>> On Tue, May 5, 2015 at 4:22 PM, Shaheedur Haque (shahhaqu) <
>>> shahhaqu@cisco.com> wrote:
>>>
>>> The data is not synchronised between the active and passive nodes. For
>>> clarity, this is the HA model we had, much as described in the blog:
>>>
>>>
>>>
>>> ·        2 nodes, with Pacemaker in active-passive mode.
>>>
>>> ·        Under Pacemaker control:
>>>
>>> o   We run MySQL in active-passive mode, using a single OpenStack
>>> volume which we attach/reattach as the active role moves around nodes.
>>>
>>> o   As the Pacemaker moves the volume, and thus MySQL around on node
>>> failures, ActiveMQ and Stratos are moved around too.
>>>
>>> o   Thus, everything operates in active-passive mode.
>>>
>>>
>>>
>>> Even in this model, as the active Stratos 4.0 is moved around (i.e. the
>>> Stratos JVM on the old active node has gone with the node, and Pacemaker
>>> starts up a new Stratos JVM on what used to be the passive node), we found
>>> that the Cartridge Definition objects were found to be missing and, as a
>>> clumsy workaround [1], we had to replay the stored copied of them into
>>> Stratos using the REST API.
>>>
>>>
>>>
>>> With Stratos 4.1, using the new object names , early indications are *Deployment
>>> Policies* and *Application Deployment* policies are lost as the active
>>> fails over to the passive. If anything, these objects are more likely to
>>> hit the problems of [1], since Stratos 4.1 expects these to be tweaked on
>>> the fly (min/max etc).
>>>
>>>
>>>
>>> Thanks, Shaheed
>>>
>>>
>>>
>>> [1] Clearly, this loses any changes that were not in the stored copies.
>>>
>>>
>>>
>>> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
>>> *Sent:* 03 May 2015 06:43
>>> *To:* dev@stratos.apache.org
>>>
>>>
>>> *Subject:* Re: Clustered deployments of Stratos
>>>
>>>
>>>
>>> Hi Shaheed,
>>>
>>>
>>>
>>> Thanks for taking time to test this!
>>>
>>>
>>>
>>> Just to clarify the exact problem, do you mean that data is not
>>> synchronized between the active and passive nodes or they are not persisted
>>> in the active node?
>>>
>>>
>>>
>>> Thanks
>>>
>>>
>>> On Sunday, May 3, 2015, Shaheedur Haque (shahhaqu) <sh...@cisco.com>
>>> wrote:
>>>
>>>
>>> I have been looking into our use of Linux HA to setup an Active-Passive
>>> configuration. Testing indicates that in 4.1 (beta1), several objects seem
>>> not to be persisted properly. This includes at least:
>>>
>>> - Cartridges
>>> - Deployment policies
>>>
>>> Am I missing something? Is it safe to workaround this by replaying those
>>> objects?
>>>  ------------------------------
>>>
>>> *From:* Imesh Gunaratne [imesh@apache.org]
>>> *Sent:* 23 April 2015 10:47
>>> *To:* dev
>>> *Subject:* Re: Clustered deployments of Stratos
>>>
>>> Hi Shaheed,
>>>
>>>
>>>
>>> Currently N-way clustering is still not possible with CC, AS & SM. We
>>> completed the initial phase of this feature; however, the feature as a whole
>>> was not completed. You could refer to the mail thread "[Discuss] Clustering
>>> Feature Implementation for 4.1.0-Alpha Release" for details.
>>>
>>>
>>>
>>> However at present [1] is valid. We could use Linux HA and deploy CC, AS
>>> and SM in Active-Passive mode.
>>>
>>>
>>>
>>> Thanks
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Apr 23, 2015 at 2:41 PM, Shaheedur Haque (shahhaqu) <
>>> shahhaqu@cisco.com> wrote:
>>>
>>> Hi,
>>>
>>>
>>>
>>> We currently try to achieve HA with Stratos using something so
>>> unpleasant that I won’t even describe it here :). It has been suggested
>>> that Stratos has, for a while now, supported a clustered mode of deployment
>>> where, given N servers:
>>>
>>>
>>>
>>> ·        The SM, AS and MB operate in a N-way clustered mode
>>>
>>> ·        The CEP operates in a N-way loadsharing mode
>>>
>>> ·        The Cartridge Agents can react to a failure in one of the N
>>> CEPs by failing over to one of the other N-1 remaining servers
>>>
>>>
>>>
>>> In looking for documentation on how to set this up, I came across these
>>> two write-ups [1] and [2]. Questions:
>>>
>>>
>>>
>>> ·        Both these documents mention only using N=2. Is that still
>>> correct?
>>>
>>> ·        [1] Seems recently written, and [2] is a little older but not
>>> much. Are both documents still regarded as current?
>>>
>>>
>>>
>>> Also, I’d love to hear any other experiences people have of running
>>> configurations like this.
>>>
>>>
>>>
>>> Thanks, Shaheed
>>>
>>>
>>>
>>> [1]
>>> https://cwiki.apache.org/confluence/display/STRATOS/4.1.0+Configuring+HA+Using+Pacemaker+and+Heartbeat
>>>
>>> [2] http://blog.lasindu.com/2014/08/wso2-private-paas-supporting.html
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Imesh Gunaratne
>>>
>>>
>>>
>>> Technical Lead, WSO2
>>>
>>> Committer & PMC Member, Apache Stratos
>>>
>>>
>>>
>>>
>>> --
>>> Lahiru Sandaruwan
>>>
>>> Committer and PMC member, Apache Stratos,
>>> Senior Software Engineer,
>>> WSO2 Inc., http://wso2.com
>>>
>>> lean.enterprise.middleware
>>>
>>> phone: +94773325954
>>> email: lahirus@wso2.com blog: http://lahiruwrites.blogspot.com/
>>> linked-in: http://lk.linkedin.com/pub/lahiru-sandaruwan/16/153/146
>>>
>>>
>>>
>>
>>
>>
>> --
>> Imesh Gunaratne
>>
>> Senior Technical Lead, WSO2
>> Committer & PMC Member, Apache Stratos
>>
>
>
>
> --
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
> Mobile: +94776442007
>
>
>


-- 
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007

Re: Clustered deployments of Stratos

Posted by Reka Thirunavukkarasu <re...@wso2.com>.
Hi Shaheed,

We have been using member fault detection to test the termination behavior
in beta2, and again about a week ago. So I believe it will work in the
latest master. However, we will verify this again.

Thanks,
Reka

On Mon, May 18, 2015 at 7:53 PM, Imesh Gunaratne <im...@apache.org> wrote:

> Thanks Shaheed! I will verify the second problem where Stratos is not
> detecting manually terminated members.
>
> Thanks
>
> On Mon, May 18, 2015 at 3:39 PM, Shaheedur Haque (shahhaqu) <
> shahhaqu@cisco.com> wrote:
>
>>  Ack. We are just in the middle of getting sync’d up again to
>> master, and it sounds like that might fix the persistence issue.
>>
>>
>>
>> I guess that leaves the Cartridge Agent reconnect side of the problem…
>>
>>
>>
>> *From:* Lahiru Sandaruwan [mailto:lahirus@wso2.com]
>> *Sent:* 17 May 2015 03:06
>>
>> *To:* dev
>> *Cc:* Ryan Du Plessis (rdupless); Luca Martini (lmartini)
>> *Subject:* Re: Clustered deployments of Stratos
>>
>>
>>
>> Hi Shaheed,
>>
>>
>>
>> Similarly, it would be a great help if you could verify all these issues
>> against the latest code, since we have been fixing a lot of issues in recent
>> days as a result of RC1 testing.
>>
>>
>>
>> Thanks.
>>
>>
>>
>> On Fri, May 15, 2015 at 9:42 PM, Imesh Gunaratne <im...@apache.org>
>> wrote:
>>
>> Hi Shaheed,
>>
>>
>>
>> Thanks for the quick response. After analyzing the results you have
>> provided again, it looks like only the deployment policies are missing
>> after the failover. We have fixed this issue in commit
>> revision: 0c515aa013850575ddcfa2e299da5f0ec250ebc3
>>
>>
>>
>>
>> http://mail-archives.apache.org/mod_mbox/incubator-stratos-commits/201504.mbox/%3C22eed4e8639c401a8fda637fa6bb4501@git.apache.org%3E
>>
>>
>>
>> Would you mind verifying whether this is there in your runtime?
>>
>>
>>
>> Thanks
>>
>>
>>
>>
>>
>> On Fri, May 15, 2015 at 9:02 PM, Shaheedur Haque (shahhaqu) <
>> shahhaqu@cisco.com> wrote:
>>
>> The latter; we never have both Stratos instances running.
>>
>>
>>
>> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
>> *Sent:* 15 May 2015 16:17
>> *To:* dev
>> *Cc:* Ryan Du Plessis (rdupless); Luca Martini (lmartini)
>>
>>
>> *Subject:* Re: Clustered deployments of Stratos
>>
>>
>>
>> Hi Shaheed,
>>
>>
>>
>> Do you have both active and passive Stratos nodes running at the same
>> time or do you start the passive node once the active node goes down?
>>
>>
>>
>> Thanks
>>
>>
>>
>> On Fri, May 15, 2015 at 6:31 PM, Shaheedur Haque (shahhaqu) <
>> shahhaqu@cisco.com> wrote:
>>
>> Hi Imesh,
>>
>>
>>
>> I finally got round to a proper series of tests, and here are the
>> conclusions:
>>
>>
>>
>> ·        In Stratos 4.0, after a Pacemaker driven failover, the newly
>> Active Stratos has lost all Cartridge Definitions.
>>
>> ·        In current [1] Stratos 4.1, after a Pacemaker driven failover,
>> the newly Active Stratos:
>>
>> o   Has lost all Deployment Policies.
>>
>> o   Has lost contact with the Cartridge Agents, and all VMs are stuck
>> with whatever state they had before the failover.
>>
>> ·        Note: I have not verified if Cartridge Groups are lost or not.
>>
>>
>>
>> I include the test results below at [2] and [3]. I am concerned as to
>> whether 4.1 is ready for GA on this basis, so though more testing is no
>> doubt possible (e.g. Cartridge Groups) I wanted to get this info to the
>> list ASAP.
>>
>>
>>
>> Thanks, Shaheed
>>
>>
>>
>> [1] A recent build somewhere between beta 1 and beta 2, but I don’t think
>> any relevant fixes have been made in master.
>>
>>
>>
>> [2] Persistence test output from Stratos 4.1. Note:
>>
>>
>>
>> 1.      In the build I have, the CLI is broken for a couple of commands;
>> these are supplemented by direct “curl” commands further down.
>>
>> 2.      I’ve used one of our commands to show the instances and their
>> state for a given application since there is not a compact JSON or
>> convenient Stratos CLI for that.
>>
>>
>>
>> *PERSISTENCE TEST, BEFORE FAILOVER*
>>
>> *================================*
>>
>>
>>
>> stratos> list-tenants
>>
>> Tenants:
>>
>>
>> +-----------------------+-----------+------------------+--------+------------------------------+
>>
>> | Domain                | Tenant ID | Email            | State  | Created Date                 |
>>
>> +-----------------------+-----------+------------------+--------+------------------------------+
>>
>> | cloud1.qmog.cisco.com | 1         | cloud1@cisco.com | Active | Fri May 15 04:46:58 MDT 2015 |
>>
>> +-----------------------+-----------+------------------+--------+------------------------------+
>>
>>
>>
>> stratos> list-network-partitions
>>
>> Network partitions found:
>>
>> +----------------------+----------------------+
>>
>> | Network Partition ID | Number of Partitions |
>>
>> +----------------------+----------------------+
>>
>> | RegionOne            | 1                    |
>>
>> +----------------------+----------------------+
>>
>>
>>
>> stratos> list-deployment-policies
>>
>> Deployment policies found:
>>
>> +-------------------+---------------+
>>
>> | ID                | Accessibility |
>>
>> +-------------------+---------------+
>>
>> | static-2-ha       | 1             |
>>
>> +-------------------+---------------+
>>
>> | autoscale-2-10-ha | 1             |
>>
>> +-------------------+---------------+
>>
>> | autoscale-1-5     | 1             |
>>
>> +-------------------+---------------+
>>
>> | static-1          | 1             |
>>
>> +-------------------+---------------+
>>
>>
>>
>> stratos> list-application-policies
>>
>> Error in listing application policies
>>
>> No application policies found
>>
>>
>>
>> stratos> list-autoscaling-policies
>>
>> Error in listing autoscaling policies
>>
>> No autoscaling policies found
>>
>>
>>
>> stratos> list-cartridges
>>
>> Cartridges found:
>>
>>
>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>>
>> | Type             | Category    | Name             | Description                | Version | Multi-Tenant |
>>
>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>>
>> | cartridge-proxy  | Application | cartridge-proxy  | cartridge-proxy Cartridge  | 1       | false        |
>>
>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>>
>> | cisco-sample-vm  | Application | cisco-sample-vm  | cisco-sample-vm Cartridge  | 1       | false        |
>>
>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>>
>> | cisco-qvpc-cf-01 | Application | cisco-qvpc-cf-01 | cisco-qvpc-cf-01 Cartridge | 1       | false        |
>>
>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>>
>> | cisco-qvpc-cf-02 | Application | cisco-qvpc-cf-02 | cisco-qvpc-cf-02 Cartridge | 1       | false        |
>>
>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>>
>> | cisco-qvpc-si    | Application | cisco-qvpc-si    | cisco-qvpc-si Cartridge    | 1       | false        |
>>
>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>>
>> | cisco-qvpc-sf    | Application | cisco-qvpc-sf    | cisco-qvpc-sf Cartridge    | 1       | false        |
>>
>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>>
>>
>>
>> stratos> list-applications
>>
>> Applications found:
>>
>> +-----------------+-----------------+----------+
>>
>> | Application ID  | Alias           | Status   |
>>
>> +-----------------+-----------------+----------+
>>
>> | cartridge-proxy | cartridge-proxy | Deployed |
>>
>> +-----------------+-----------------+----------+
>>
>> | cisco-sample-vm | cisco-sample-vm | Deployed |
>>
>> +-----------------+-----------------+----------+
>>
>>
>>
>> $ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/autoscalingPolicies
>>
>>
>> [{"id":"economyPolicy","instanceRoundingFactor":0,"isPublic":false,"loadThresholds":""}]
>>
>>
>>
>> $ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/applicationPolicies
>>
>>
>> [{"algorithm":"one-after-another","id":"default-iaas","networkPartitions":["RegionOne"],"properties":{"name":"networkPartitionGroups","value":"RegionOne"}}]
>>
>>
>>
>>
>>
>> *PERSISTENCE TEST, AFTER FAILOVER*
>>
>> *===============================*
>>
>>
>>
>> stratos> list-tenants
>>
>> Tenants:
>>
>>
>> +-----------------------+-----------+------------------+--------+------------------------------+
>>
>> | Domain                | Tenant ID | Email            | State  | Created Date                 |
>>
>> +-----------------------+-----------+------------------+--------+------------------------------+
>>
>> | cloud1.qmog.cisco.com | 1         | cloud1@cisco.com | Active | Fri May 15 05:26:52 MDT 2015 |
>>
>> +-----------------------+-----------+------------------+--------+------------------------------+
>>
>>
>>
>> stratos> list-network-partitions
>>
>> Network partitions found:
>>
>> +----------------------+----------------------+
>>
>> | Network Partition ID | Number of Partitions |
>>
>> +----------------------+----------------------+
>>
>> | RegionOne            | 1                    |
>>
>> +----------------------+----------------------+
>>
>>
>>
>> stratos> list-deployment-policies
>>
>> No deployment policies found
>>
>>
>>
>> stratos> list-application-policies
>>
>> Error in listing application policies
>>
>> No application policies found
>>
>>
>>
>> stratos> list-autoscaling-policies
>>
>> Error in listing autoscaling policies
>>
>> No autoscaling policies found
>>
>>
>>
>> stratos> list-cartridges
>>
>> Cartridges found:
>>
>>
>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>>
>> | Type             | Category    | Name             | Description                | Version | Multi-Tenant |
>>
>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>>
>> | cartridge-proxy  | Application | cartridge-proxy  | cartridge-proxy Cartridge  | 1       | false        |
>>
>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>>
>> | cisco-sample-vm  | Application | cisco-sample-vm  | cisco-sample-vm Cartridge  | 1       | false        |
>>
>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>>
>> | cisco-qvpc-cf-01 | Application | cisco-qvpc-cf-01 | cisco-qvpc-cf-01 Cartridge | 1       | false        |
>>
>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>>
>> | cisco-qvpc-cf-02 | Application | cisco-qvpc-cf-02 | cisco-qvpc-cf-02 Cartridge | 1       | false        |
>>
>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>>
>> | cisco-qvpc-si    | Application | cisco-qvpc-si    | cisco-qvpc-si Cartridge    | 1       | false        |
>>
>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>>
>> | cisco-qvpc-sf    | Application | cisco-qvpc-sf    | cisco-qvpc-sf Cartridge    | 1       | false        |
>>
>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>>
>>
>>
>> stratos> list-applications
>>
>> Applications found:
>>
>> +-----------------+-----------------+----------+
>>
>> | Application ID  | Alias           | Status   |
>>
>> +-----------------+-----------------+----------+
>>
>> | cartridge-proxy | cartridge-proxy | Deployed |
>>
>> +-----------------+-----------------+----------+
>>
>> | cisco-sample-vm | cisco-sample-vm | Deployed |
>>
>> +-----------------+-----------------+----------+
>>
>>
>>
>> $ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/autoscalingPolicies
>>
>>
>> [{"id":"economyPolicy","instanceRoundingFactor":0,"isPublic":false,"loadThresholds":""}]
>>
>>
>>
>> $ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/applicationPolicies
>>
>>
>> [{"algorithm":"one-after-another","id":"default-iaas","networkPartitions":["RegionOne"],"properties":{"name":"networkPartitionGroups","value":"RegionOne"}}]
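The before/after listings above can also be compared mechanically. What follows is only a minimal sketch, assuming each listing has already been captured as a list of JSON objects with an "id" field (as in the curl output). The function names and the resource keys used to label each snapshot are hypothetical, and the sample IDs are copied from the listings in this test.

```python
def ids(entities):
    """Return the set of "id" values from a list of JSON objects."""
    return {entity["id"] for entity in entities}


def diff_snapshots(before, after):
    """Compare two {resource: [objects]} snapshots.

    Returns, for each resource whose set of ids changed, the ids that
    were lost and the ids that were added across the failover.
    """
    report = {}
    for resource in sorted(set(before) | set(after)):
        old = ids(before.get(resource, []))
        new = ids(after.get(resource, []))
        if old != new:
            report[resource] = {
                "lost": sorted(old - new),
                "added": sorted(new - old),
            }
    return report


# Sample data copied from the listings in this test. The dictionary keys
# are labels chosen for this sketch, not confirmed REST resource names.
before = {
    "deploymentPolicies": [
        {"id": "static-2-ha"},
        {"id": "autoscale-2-10-ha"},
        {"id": "autoscale-1-5"},
        {"id": "static-1"},
    ],
    "autoscalingPolicies": [{"id": "economyPolicy"}],
    "applicationPolicies": [{"id": "default-iaas"}],
}
after = {
    "deploymentPolicies": [],
    "autoscalingPolicies": [{"id": "economyPolicy"}],
    "applicationPolicies": [{"id": "default-iaas"}],
}

print(diff_snapshots(before, after))
```

With the sample data, only the deployment policies show up in the report, which matches the conclusion drawn from the listings.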
>>
>>
>>
>> [3] Cartridge test output from Stratos 4.1. Note:
>>
>>
>>
>> 1.      We do not use a VIP for Stratos, either for 4.0 or 4.1.
>>
>> 2.      We expect the Cartridge Agent to use a DNS lookup when it ends
>> up reconnecting, and this worked just fine in Stratos 4.0.
>>
>>
>>
>> *CARTRIDGE TEST, BEFORE FAILOVER*
>>
>> *==============================*
>>
>>
>>
>> $ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
>>
>> cisco-sample-vm: applicationInstances 1, groupInstances 0,
>> clusterInstances 1, members 1 (Active 1)
>>
>>      cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active
>>
>>
>>
>> *CARTRIDGE TEST, AFTER FAILOVER*
>>
>> *=============================*
>>
>>
>>
>> $ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
>>
>> cisco-sample-vm: applicationInstances 1, groupInstances 0,
>> clusterInstances 1, members 1 (Active 1)
>>
>>      cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active
>>
>>
>>
>> *CARTRIDGE TEST, AFTER FAILOVER WAIT 5 MINUTES, THEN KILL INSTANCE, THEN WAIT 2 MINUTES*
>>
>>
>> *===================================================================================*
>>
>>
>>
>> $ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
>>
>> cisco-sample-vm: applicationInstances 1, groupInstances 0,
>> clusterInstances 1, members 1 (Active 1)
>>
>>      cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active
>>
>>
>>
>>
>>
>>
>>
>> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
>> *Sent:* 14 May 2015 20:34
>>
>>
>> *To:* dev
>> *Subject:* Re: Clustered deployments of Stratos
>>
>>
>>
>> It would be better to use the REST API to query and see whether the
>> relevant entities are persisted. Since data is stored in binary format in
>> the registry it would be difficult to query the database and verify this.
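A minimal sketch of the REST check suggested above, written in Python rather than curl. The base URL, the admin:admin credentials and the resource name in the example are taken from the curl commands quoted elsewhere in this thread; the function names are hypothetical, and skipping certificate verification (the equivalent of curl -k) is only reasonable on a test setup.

```python
import base64
import json
import ssl
import urllib.request


def build_request(base_url, resource, user, password):
    """Build a GET request for <base_url>/api/<resource> with basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/{resource}",
        headers={
            "Authorization": f"Basic {token}",
            "Content-type": "application/json",
        },
    )


def fetch_ids(base_url, resource, user, password):
    """Return the sorted "id" values of the objects listed for a resource.

    Assumes the endpoint returns a JSON array of objects that each have
    an "id" field, as in the curl output shown in this thread.
    """
    # Equivalent of curl -k: do not verify the server certificate.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    request = build_request(base_url, resource, user, password)
    with urllib.request.urlopen(request, context=context) as response:
        return sorted(item["id"] for item in json.load(response))


# Example call (needs a running Stratos instance, so it is left commented out):
# print(fetch_ids("https://localhost:9443", "autoscalingPolicies", "admin", "admin"))
```

Running the same call before and after a failover and comparing the two results shows whether the entities survived.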
>>
>>
>>
>> On Thu, May 14, 2015 at 10:47 PM, Shaheedur Haque (shahhaqu) <
>> shahhaqu@cisco.com> wrote:
>>
>> I looked at REG_RESOURCE (as well as a few others) but I’m afraid I am
>> going to need more specifics.
>>
>>
>>
>> For example, what query would you recommend to look at say deployment
>> policies and cartridge definitions?
>>
>>
>>
>> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
>> *Sent:* 09 May 2015 09:08
>>
>>
>> *To:* dev
>> *Subject:* Re: Clustered deployments of Stratos
>>
>>
>>
>> Yes, you could refer to the tables that have the prefix "REG_".
>>
>>
>>
>> On Sat, May 9, 2015 at 4:11 AM, Shaheedur Haque (shahhaqu) <
>> shahhaqu@cisco.com> wrote:
>>
>> Can you suggest what tables to look at?
>>
>>
>>
>> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
>> *Sent:* 07 May 2015 18:00
>>
>>
>> *To:* dev
>> *Subject:* Re: Clustered deployments of Stratos
>>
>>
>>
>> Hi Shaheed,
>>
>>
>>
>> Thanks for the clarification! Maybe the problem is with the MySQL
>> active-passive configuration.
>>
>>
>>
>> I understand that you are switching the same OpenStack volume from the
>> active node to the passive node (when the passive node becomes active), so
>> technically it should work. Maybe we need to investigate this problem
>> further by analysing whether data is persisted properly in the active node
>> before the passive node becomes active.
>>
>>
>>
>> Thanks
>>
>>
>>
>> On Tue, May 5, 2015 at 4:22 PM, Shaheedur Haque (shahhaqu) <
>> shahhaqu@cisco.com> wrote:
>>
>> The data is not synchronised between the active and passive nodes. For
>> clarity, this is the HA model we had, much as described in the blog:
>>
>>
>>
>> ·        2 nodes, with Pacemaker in active-passive mode.
>>
>> ·        Under Pacemaker control:
>>
>> o   We run MySQL in active-passive mode, using a single OpenStack volume
>> which we attach/reattach as the active role moves around nodes.
>>
>> o   As Pacemaker moves the volume, and thus MySQL, around on node
>> failures, ActiveMQ and Stratos are moved around too.
>>
>> o   Thus, everything operates in active-passive mode.
>>
>>
>>
>> Even in this model, as the active Stratos 4.0 is moved around (i.e. the
>> Stratos JVM on the old active node has gone with the node, and Pacemaker
>> starts up a new Stratos JVM on what used to be the passive node), we found
>> that the Cartridge Definition objects were missing and, as a
>> clumsy workaround [1], we had to replay the stored copies of them into
>> Stratos using the REST API.
>>
>>
>>
>> With Stratos 4.1, using the new object names, early indications are that
>> *Deployment Policies* and *Application Deployment* policies are lost as the
>> active fails over to the passive. If anything, these objects are more likely
>> to hit the problems of [1], since Stratos 4.1 expects these to be tweaked on
>> the fly (min/max etc).
>>
>>
>>
>> Thanks, Shaheed
>>
>>
>>
>> [1] Clearly, this loses any changes that were not in the stored copies.
>>
>>
>>
>> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
>> *Sent:* 03 May 2015 06:43
>> *To:* dev@stratos.apache.org
>>
>>
>> *Subject:* Re: Clustered deployments of Stratos
>>
>>
>>
>> Hi Shaheed,
>>
>>
>>
>> Thanks for taking time to test this!
>>
>>
>>
>> Just to clarify the exact problem, do you mean that data is not
>> synchronized between the active and passive nodes or they are not persisted
>> in the active node?
>>
>>
>>
>> Thanks
>>
>>
>> On Sunday, May 3, 2015, Shaheedur Haque (shahhaqu) <sh...@cisco.com>
>> wrote:
>>
>>
>> I have been looking into our use of Linux HA to set up an Active-Passive
>> configuration. Testing indicates that in 4.1 (beta1), several objects seem
>> not to be persisted properly. This includes at least:
>>
>> - Cartridges
>> - Deployment policies
>>
>> Am I missing something? Is it safe to work around this by replaying those
>> objects?
>>  ------------------------------
>>
>> *From:* Imesh Gunaratne [imesh@apache.org]
>> *Sent:* 23 April 2015 10:47
>> *To:* dev
>> *Subject:* Re: Clustered deployments of Stratos
>>
>> Hi Shaheed,
>>
>>
>>
>> Currently N-way clustering is still not possible with CC, AS & SM. We
>> completed the initial phase of this feature; however, the feature as a whole
>> was not completed. You could refer to the mail thread "[Discuss] Clustering
>> Feature Implementation for 4.1.0-Alpha Release" for details.
>>
>>
>>
>> However at present [1] is valid. We could use Linux HA and deploy CC, AS
>> and SM in Active-Passive mode.
>>
>>
>>
>> Thanks
>>
>>
>>
>>
>>
>>
>>
>> On Thu, Apr 23, 2015 at 2:41 PM, Shaheedur Haque (shahhaqu) <
>> shahhaqu@cisco.com> wrote:
>>
>> Hi,
>>
>>
>>
>> We currently try to achieve HA with Stratos using something so unpleasant
>> that I won’t even describe it here :). It has been suggested that Stratos
>> has, for a while now, supported a clustered mode of deployment where, given
>> N servers:
>>
>>
>>
>> ·        The SM, AS and MB operate in a N-way clustered mode
>>
>> ·        The CEP operates in a N-way loadsharing mode
>>
>> ·        The Cartridge Agents can react to a failure in one of the N
>> CEPs by failing over to one of the other N-1 remaining servers
>>
>>
>>
>> In looking for documentation on how to set this up, I came across these
>> two write-ups [1] and [2]. Questions:
>>
>>
>>
>> ·        Both these documents mention only using N=2. Is that still
>> correct?
>>
>> ·        [1] Seems recently written, and [2] is a little older but not
>> much. Are both documents still regarded as current?
>>
>>
>>
>> Also, I’d love to hear any other experiences people have of running
>> configurations like this.
>>
>>
>>
>> Thanks, Shaheed
>>
>>
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/STRATOS/4.1.0+Configuring+HA+Using+Pacemaker+and+Heartbeat
>>
>> [2] http://blog.lasindu.com/2014/08/wso2-private-paas-supporting.html
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> --
>>
>> Imesh Gunaratne
>>
>>
>>
>> Technical Lead, WSO2
>>
>> Committer & PMC Member, Apache Stratos
>>
>>
>>
>>
>> --
>> Lahiru Sandaruwan
>>
>> Committer and PMC member, Apache Stratos,
>> Senior Software Engineer,
>> WSO2 Inc., http://wso2.com
>>
>> lean.enterprise.middleware
>>
>> phone: +94773325954
>> email: lahirus@wso2.com blog: http://lahiruwrites.blogspot.com/
>> linked-in: http://lk.linkedin.com/pub/lahiru-sandaruwan/16/153/146
>>
>>
>>
>
>
>
> --
> Imesh Gunaratne
>
> Senior Technical Lead, WSO2
> Committer & PMC Member, Apache Stratos
>



-- 
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007

Re: Clustered deployments of Stratos

Posted by Imesh Gunaratne <im...@apache.org>.
Thanks Shaheed! I will verify the second problem where Stratos is not
detecting manually terminated members.

Thanks

On Mon, May 18, 2015 at 3:39 PM, Shaheedur Haque (shahhaqu) <
shahhaqu@cisco.com> wrote:

>  Ack. We are just in the middle of getting sync’d up again to
> master, and it sounds like that might fix the persistence issue.
>
>
>
> I guess that leaves the Cartridge Agent reconnect side of the problem…
>
>
>
> *From:* Lahiru Sandaruwan [mailto:lahirus@wso2.com]
> *Sent:* 17 May 2015 03:06
>
> *To:* dev
> *Cc:* Ryan Du Plessis (rdupless); Luca Martini (lmartini)
> *Subject:* Re: Clustered deployments of Stratos
>
>
>
> Hi Shaheed,
>
>
>
> Similarly, it would be a great help if you could verify all these issues
> against the latest code, since we have been fixing a lot of issues in recent
> days as a result of RC1 testing.
>
>
>
> Thanks.
>
>
>
> On Fri, May 15, 2015 at 9:42 PM, Imesh Gunaratne <im...@apache.org> wrote:
>
> Hi Shaheed,
>
>
>
> Thanks for the quick response. After analyzing the results you have
> provided again, it looks like only the deployment policies are missing
> after the failover. We have fixed this issue in commit
> revision: 0c515aa013850575ddcfa2e299da5f0ec250ebc3
>
>
>
>
> http://mail-archives.apache.org/mod_mbox/incubator-stratos-commits/201504.mbox/%3C22eed4e8639c401a8fda637fa6bb4501@git.apache.org%3E
>
>
>
> Would you mind verifying whether this is there in your runtime?
>
>
>
> Thanks
>
>
>
>
>
> On Fri, May 15, 2015 at 9:02 PM, Shaheedur Haque (shahhaqu) <
> shahhaqu@cisco.com> wrote:
>
> The latter; we never have both Stratos instances running.
>
>
>
> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
> *Sent:* 15 May 2015 16:17
> *To:* dev
> *Cc:* Ryan Du Plessis (rdupless); Luca Martini (lmartini)
>
>
> *Subject:* Re: Clustered deployments of Stratos
>
>
>
> Hi Shaheed,
>
>
>
> Do you have both active and passive Stratos nodes running at the same time
> or do you start the passive node once the active node goes down?
>
>
>
> Thanks
>
>
>
> On Fri, May 15, 2015 at 6:31 PM, Shaheedur Haque (shahhaqu) <
> shahhaqu@cisco.com> wrote:
>
> Hi Imesh,
>
>
>
> I finally got round to a proper series of tests, and here are the
> conclusions:
>
>
>
> ·        In Stratos 4.0, after a Pacemaker driven failover, the newly
> Active Stratos has lost all Cartridge Definitions.
>
> ·        In current [1] Stratos 4.1, after a Pacemaker driven failover,
> the newly Active Stratos:
>
> o   Has lost all Deployment Policies.
>
> o   Has lost contact with the Cartridge Agents, and all VMs are stuck
> with whatever state they had before the failover.
>
> ·        Note: I have not verified if Cartridge Groups are lost or not.
>
>
>
> I include the test results below at [2] and [3]. I am concerned as to
> whether 4.1 is ready for GA on this basis, so though more testing is no
> doubt possible (e.g. Cartridge Groups) I wanted to get this info to the
> list ASAP.
>
>
>
> Thanks, Shaheed
>
>
>
> [1] A recent build somewhere between beta 1 and beta 2, but I don’t think
> any relevant fixes have been made in master.
>
>
>
> [2] Persistence test output from Stratos 4.1. Note:
>
>
>
> 1.      In the build I have, the CLI is broken for a couple of commands;
> these are supplemented by direct “curl” commands further down.
>
> 2.      I’ve used one of our commands to show the instances and their
> state for a given application since there is not a compact JSON or
> convenient Stratos CLI for that.
>
>
>
> *PERSISTENCE TEST, BEFORE FAILOVER*
>
> *================================*
>
>
>
> stratos> list-tenants
>
> Tenants:
>
>
> +-----------------------+-----------+------------------+--------+------------------------------+
>
> | Domain                | Tenant ID | Email            | State  | Created Date                 |
>
> +-----------------------+-----------+------------------+--------+------------------------------+
>
> | cloud1.qmog.cisco.com | 1         | cloud1@cisco.com | Active | Fri May 15 04:46:58 MDT 2015 |
>
> +-----------------------+-----------+------------------+--------+------------------------------+
>
>
>
> stratos> list-network-partitions
>
> Network partitions found:
>
> +----------------------+----------------------+
>
> | Network Partition ID | Number of Partitions |
>
> +----------------------+----------------------+
>
> | RegionOne            | 1                    |
>
> +----------------------+----------------------+
>
>
>
> stratos> list-deployment-policies
>
> Deployment policies found:
>
> +-------------------+---------------+
>
> | ID                | Accessibility |
>
> +-------------------+---------------+
>
> | static-2-ha       | 1             |
>
> +-------------------+---------------+
>
> | autoscale-2-10-ha | 1             |
>
> +-------------------+---------------+
>
> | autoscale-1-5     | 1             |
>
> +-------------------+---------------+
>
> | static-1          | 1             |
>
> +-------------------+---------------+
>
>
>
> stratos> list-application-policies
>
> Error in listing application policies
>
> No application policies found
>
>
>
> stratos> list-autoscaling-policies
>
> Error in listing autoscaling policies
>
> No autoscaling policies found
>
>
>
> stratos> list-cartridges
>
> Cartridges found:
>
>
> +------------------+-------------+------------------+----------------------------+---------+--------------+
>
> | Type             | Category    | Name             | Description                | Version | Multi-Tenant |
>
> +------------------+-------------+------------------+----------------------------+---------+--------------+
>
> | cartridge-proxy  | Application | cartridge-proxy  | cartridge-proxy Cartridge  | 1       | false        |
>
> +------------------+-------------+------------------+----------------------------+---------+--------------+
>
> | cisco-sample-vm  | Application | cisco-sample-vm  | cisco-sample-vm Cartridge  | 1       | false        |
>
> +------------------+-------------+------------------+----------------------------+---------+--------------+
>
> | cisco-qvpc-cf-01 | Application | cisco-qvpc-cf-01 | cisco-qvpc-cf-01 Cartridge | 1       | false        |
>
> +------------------+-------------+------------------+----------------------------+---------+--------------+
>
> | cisco-qvpc-cf-02 | Application | cisco-qvpc-cf-02 | cisco-qvpc-cf-02 Cartridge | 1       | false        |
>
> +------------------+-------------+------------------+----------------------------+---------+--------------+
>
> | cisco-qvpc-si    | Application | cisco-qvpc-si    | cisco-qvpc-si Cartridge    | 1       | false        |
>
> +------------------+-------------+------------------+----------------------------+---------+--------------+
>
> | cisco-qvpc-sf    | Application | cisco-qvpc-sf    | cisco-qvpc-sf Cartridge    | 1       | false        |
>
> +------------------+-------------+------------------+----------------------------+---------+--------------+
>
>
>
> stratos> list-applications
>
> Applications found:
>
> +-----------------+-----------------+----------+
>
> | Application ID  | Alias           | Status   |
>
> +-----------------+-----------------+----------+
>
> | cartridge-proxy | cartridge-proxy | Deployed |
>
> +-----------------+-----------------+----------+
>
> | cisco-sample-vm | cisco-sample-vm | Deployed |
>
> +-----------------+-----------------+----------+
>
>
>
> $ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/autoscalingPolicies
> [{"id":"economyPolicy","instanceRoundingFactor":0,"isPublic":false,"loadThresholds":""}]
>
> $ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/applicationPolicies
> [{"algorithm":"one-after-another","id":"default-iaas","networkPartitions":["RegionOne"],"properties":{"name":"networkPartitionGroups","value":"RegionOne"}}]
>
>
>
>
>
> *PERSISTENCE TEST, AFTER FAILOVER*
>
> *===============================*
>
>
>
> stratos> list-tenants
>
> Tenants:
>
>
> +-----------------------+-----------+------------------+--------+------------------------------+
> | Domain                | Tenant ID | Email            | State  | Created Date                 |
> +-----------------------+-----------+------------------+--------+------------------------------+
> | cloud1.qmog.cisco.com | 1         | cloud1@cisco.com | Active | Fri May 15 05:26:52 MDT 2015 |
> +-----------------------+-----------+------------------+--------+------------------------------+
>
>
>
> stratos> list-network-partitions
>
> Network partitions found:
>
> +----------------------+----------------------+
>
> | Network Partition ID | Number of Partitions |
>
> +----------------------+----------------------+
>
> | RegionOne            | 1                    |
>
> +----------------------+----------------------+
>
>
>
> stratos> list-deployment-policies
>
> No deployment policies found
>
>
>
> stratos> list-application-policies
>
> Error in listing application policies
>
> No application policies found
>
>
>
> stratos> list-autoscaling-policies
>
> Error in listing autoscaling policies
>
> No autoscaling policies found
>
>
>
> stratos> list-cartridges
>
> Cartridges found:
>
>
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | Type             | Category    | Name             | Description                | Version | Multi-Tenant |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cartridge-proxy  | Application | cartridge-proxy  | cartridge-proxy Cartridge  | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cisco-sample-vm  | Application | cisco-sample-vm  | cisco-sample-vm Cartridge  | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cisco-qvpc-cf-01 | Application | cisco-qvpc-cf-01 | cisco-qvpc-cf-01 Cartridge | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cisco-qvpc-cf-02 | Application | cisco-qvpc-cf-02 | cisco-qvpc-cf-02 Cartridge | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cisco-qvpc-si    | Application | cisco-qvpc-si    | cisco-qvpc-si Cartridge    | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cisco-qvpc-sf    | Application | cisco-qvpc-sf    | cisco-qvpc-sf Cartridge    | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
>
>
>
> stratos> list-applications
>
> Applications found:
>
> +-----------------+-----------------+----------+
>
> | Application ID  | Alias           | Status   |
>
> +-----------------+-----------------+----------+
>
> | cartridge-proxy | cartridge-proxy | Deployed |
>
> +-----------------+-----------------+----------+
>
> | cisco-sample-vm | cisco-sample-vm | Deployed |
>
> +-----------------+-----------------+----------+
>
>
>
> $ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/autoscalingPolicies
> [{"id":"economyPolicy","instanceRoundingFactor":0,"isPublic":false,"loadThresholds":""}]
>
> $ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/applicationPolicies
> [{"algorithm":"one-after-another","id":"default-iaas","networkPartitions":["RegionOne"],"properties":{"name":"networkPartitionGroups","value":"RegionOne"}}]
>
>
>
> [3] Cartridge test output from Stratos 4.1. Note:
>
>
>
> 1.      We do not use a VIP for Stratos, either for 4.0 or 4.1.
>
> 2.      We expect the Cartridge Agent to use a DNS lookup when it ends up
> reconnecting, and this worked just fine in Stratos 4.0.
>
>
>
> *CARTRIDGE TEST, BEFORE FAILOVER*
>
> *==============================*
>
>
>
> $ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
>
> cisco-sample-vm: applicationInstances 1, groupInstances 0,
> clusterInstances 1, members 1 (Active 1)
>
>      cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active
>
>
>
> *CARTRIDGE TEST, AFTER FAILOVER*
>
> *=============================*
>
>
>
> $ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
>
> cisco-sample-vm: applicationInstances 1, groupInstances 0,
> clusterInstances 1, members 1 (Active 1)
>
>      cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active
>
>
>
> *CARTRIDGE TEST, AFTER FAILOVER WAIT 5 MINUTES, THEN KILL INSTANCE, THEN WAIT 2 MINUTES*
>
>
> *===================================================================================*
>
>
>
> $ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
>
> cisco-sample-vm: applicationInstances 1, groupInstances 0,
> clusterInstances 1, members 1 (Active 1)
>
>      cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active
>
>
>
>
>
>
>
> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
> *Sent:* 14 May 2015 20:34
>
>
> *To:* dev
> *Subject:* Re: Clustered deployments of Stratos
>
>
>
> It would be better to use the REST API to query and see whether the
> relevant entities are persisted. Since data is stored in binary format in
> the registry it would be difficult to query the database and verify this.
>
>
>
> On Thu, May 14, 2015 at 10:47 PM, Shaheedur Haque (shahhaqu) <
> shahhaqu@cisco.com> wrote:
>
> I looked at REG_RESOURCEs (as well as a few others) but I’m afraid I am
> going to need more specifics.
>
>
>
> For example, what query would you recommend to look at say deployment
> policies and cartridge definitions?
>
>
>
> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
> *Sent:* 09 May 2015 09:08
>
>
> *To:* dev
> *Subject:* Re: Clustered deployments of Stratos
>
>
>
> Yes, you could refer to the tables that have the prefix "REG_".
>
>
>
> On Sat, May 9, 2015 at 4:11 AM, Shaheedur Haque (shahhaqu) <
> shahhaqu@cisco.com> wrote:
>
> Can you suggest what tables to look at?
>
>
>
> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
> *Sent:* 07 May 2015 18:00
>
>
> *To:* dev
> *Subject:* Re: Clustered deployments of Stratos
>
>
>
> Hi Shaheed,
>
>
>
> Thanks for the clarification! Maybe the problem is with the MySQL
> active-passive configuration.
>
>
>
> I understand that you are switching the same OpenStack volume from the
> active node to the passive node (when the passive node becomes active);
> therefore technically it should work. Maybe we need to investigate this
> problem further by analysing whether data is persisted properly in the
> active node before the passive node becomes active.
>
>
>
> Thanks
>
>
>
> On Tue, May 5, 2015 at 4:22 PM, Shaheedur Haque (shahhaqu) <
> shahhaqu@cisco.com> wrote:
>
> The data is not synchronised between the active and passive nodes. For
> clarity, this is the HA model we had, much as described in the blog:
>
>
>
> ·        2 nodes, with Pacemaker in active-passive mode.
>
> ·        Under Pacemaker control:
>
> o   We run MySQL in active-passive mode, using a single OpenStack volume
> which we attach/reattach as the active role moves around nodes.
>
> o   As the Pacemaker moves the volume, and thus MySQL around on node
> failures, ActiveMQ and Stratos are moved around too.
>
> o   Thus, everything operates in active-passive mode.
>
>
>
> Even in this model, as the active Stratos 4.0 is moved around (i.e. the
> Stratos JVM on the old active node has gone with the node, and Pacemaker
> starts up a new Stratos JVM on what used to be the passive node), we found
> that the Cartridge Definition objects were missing and, as a clumsy
> workaround [1], we had to replay the stored copies of them into Stratos
> using the REST API.
>
>
>
> With Stratos 4.1, using the new object names, early indications are that
> *Deployment Policies* and *Application Deployment* policies are lost as the
> active fails over to the passive. If anything, these objects are more
> likely to hit the problems of [1], since Stratos 4.1 expects these to be
> tweaked on the fly (min/max etc).
>
>
>
> Thanks, Shaheed
>
>
>
> [1] Clearly, this loses any changes that were not in the stored copies.
>
>
>
> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
> *Sent:* 03 May 2015 06:43
> *To:* dev@stratos.apache.org
>
>
> *Subject:* Re: Clustered deployments of Stratos
>
>
>
> Hi Shaheed,
>
>
>
> Thanks for taking time to test this!
>
>
>
> Just to clarify the exact problem, do you mean that data is not
> synchronized between the active and passive nodes or they are not persisted
> in the active node?
>
>
>
> Thanks
>
>
> On Sunday, May 3, 2015, Shaheedur Haque (shahhaqu) <sh...@cisco.com>
> wrote:
>
>
> I have been looking into our use of Linux HA to setup an Active-Passive
> configuration. Testing indicates that in 4.1 (beta1), several objects seem
> not to be persisted properly. This includes at least:
>
> - Cartridges
> - Deployment policies
>
> Am I missing something? Is it safe to workaround this by replaying those
> objects?
>  ------------------------------
>
> *From:* Imesh Gunaratne [imesh@apache.org]
> *Sent:* 23 April 2015 10:47
> *To:* dev
> *Subject:* Re: Clustered deployments of Stratos
>
> Hi Shaheed,
>
>
>
> Currently N-way clustering is still not possible with CC, AS & SM. We
> completed the initial phase of this feature, but the feature itself was not
> completed. You could refer to the mail thread "[Discuss] Clustering Feature
> Implementation for 4.1.0-Alpha Release" for details.
>
>
>
> However at present [1] is valid. We could use Linux HA and deploy CC, AS
> and SM in Active-Passive mode.
>
>
>
> Thanks
>
>
>
>
>
>
>
> On Thu, Apr 23, 2015 at 2:41 PM, Shaheedur Haque (shahhaqu) <
> shahhaqu@cisco.com> wrote:
>
> Hi,
>
>
>
> We currently try to achieve HA with Stratos using something so unpleasant
> that I won’t even describe it here :). It has been suggested that Stratos
> has, for a while now, supported a clustered mode of deployment where, given
> N servers:
>
>
>
> ·        The SM, AS and MB operate in a N-way clustered mode
>
> ·        The CEP operates in a N-way loadsharing mode
>
> ·        The Cartridge Agents can react to a failure in one of the N CEPs
> by failing over to one of the other N-1 remaining servers
>
>
>
> In looking for documentation on how to set this up, I came across these
> two write-ups [1] and [2]. Questions:
>
>
>
> ·        Both these documents mention only using N=2. Is that still
> correct?
>
> ·        [1] Seems recently written, and [2] is a little older but not
> much. Are both documents still regarded as current?
>
>
>
> Also, I’d love to hear any other experiences people have of running
> configurations like this.
>
>
>
> Thanks, Shaheed
>
>
>
> [1]
> https://cwiki.apache.org/confluence/display/STRATOS/4.1.0+Configuring+HA+Using+Pacemaker+and+Heartbeat
>
> [2] http://blog.lasindu.com/2014/08/wso2-private-paas-supporting.html
>
>
>
>
>
>
>
>
>
>
>
> --
>
> Imesh Gunaratne
>
>
>
> Technical Lead, WSO2
>
> Committer & PMC Member, Apache Stratos
>
>
>
> --
>
> Imesh Gunaratne
>
>
>
> Senior Technical Lead, WSO2
>
> Committer & PMC Member, Apache Stratos
>
>
>
>
>
>
>
> --
>
> --
> Lahiru Sandaruwan
>
> Committer and PMC member, Apache Stratos,
> Senior Software Engineer,
> WSO2 Inc., http://wso2.com
>
> lean.enterprise.middleware
>
> phone: +94773325954
> email: lahirus@wso2.com blog: http://lahiruwrites.blogspot.com/
> linked-in: http://lk.linkedin.com/pub/lahiru-sandaruwan/16/153/146
>
>
>



-- 
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos

RE: Clustered deployments of Stratos

Posted by "Shaheedur Haque (shahhaqu)" <sh...@cisco.com>.
Ack. We are just in the middle of getting sync’d up again to master, and it sounds like that might fix the persistence issue.

I guess that leaves the Cartridge Agent reconnect side of the problem…

From: Lahiru Sandaruwan [mailto:lahirus@wso2.com]
Sent: 17 May 2015 03:06
To: dev
Cc: Ryan Du Plessis (rdupless); Luca Martini (lmartini)
Subject: Re: Clustered deployments of Stratos

Hi Shaheed,

Similarly, it would be a great help if you could verify all these issues in the latest code, since we have been fixing a lot of issues in recent days as a result of RC1 testing.

Thanks.

On Fri, May 15, 2015 at 9:42 PM, Imesh Gunaratne <im...@apache.org> wrote:
Hi Shaheed,

Thanks for the quick response. After analyzing the results you have provided again, it looks like only the deployment policies are missing after the failover. We have fixed this issue in commit revision 0c515aa013850575ddcfa2e299da5f0ec250ebc3:

http://mail-archives.apache.org/mod_mbox/incubator-stratos-commits/201504.mbox/%3C22eed4e8639c401a8fda637fa6bb4501@git.apache.org%3E

Would you mind verifying whether this is there in your runtime?

Thanks


On Fri, May 15, 2015 at 9:02 PM, Shaheedur Haque (shahhaqu) <sh...@cisco.com> wrote:
The latter; we never have both Stratos instances running.

From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: 15 May 2015 16:17
To: dev
Cc: Ryan Du Plessis (rdupless); Luca Martini (lmartini)

Subject: Re: Clustered deployments of Stratos

Hi Shaheed,

Do you have both active and passive Stratos nodes running at the same time or do you start the passive node once the active node goes down?

Thanks

On Fri, May 15, 2015 at 6:31 PM, Shaheedur Haque (shahhaqu) <sh...@cisco.com> wrote:
Hi Imesh,

I finally got round to a proper series of tests, and here are the conclusions:


•        In Stratos 4.0, after a Pacemaker driven failover, the newly Active Stratos has lost all Cartridge Definitions.

•        In current [1] Stratos 4.1, after a Pacemaker driven failover, the newly Active Stratos:

o   Has lost all Deployment Policies.

o   Has lost contact with the Cartridge Agents, and all VMs are stuck with whatever state they had before the failover.

•        Note: I have not verified if Cartridge Groups are lost or not.

I include the test results below at [2] and [3]. I am concerned as to whether 4.1 is ready for GA on this basis, so though more testing is no doubt possible (e.g. Cartridge Groups) I wanted to get this info to the list ASAP.

Thanks, Shaheed

[1] A recent build somewhere between beta 1 and beta 2, but I don’t think any relevant fixes have been made in master.

[2] Persistence test output from Stratos 4.1. Note:


1.      In the build I have, the CLI is broken for a couple of commands; these are supplemented by direct “curl” commands further down.

2.      I’ve used one of our commands to show the instances and their state for a given application since there is not a compact JSON or convenient Stratos CLI for that.

PERSISTENCE TEST, BEFORE FAILOVER
================================

stratos> list-tenants
Tenants:
+-----------------------+-----------+------------------+--------+------------------------------+
| Domain                | Tenant ID | Email            | State  | Created Date                 |
+-----------------------+-----------+------------------+--------+------------------------------+
| cloud1.qmog.cisco.com | 1         | cloud1@cisco.com | Active | Fri May 15 04:46:58 MDT 2015 |
+-----------------------+-----------+------------------+--------+------------------------------+

stratos> list-network-partitions
Network partitions found:
+----------------------+----------------------+
| Network Partition ID | Number of Partitions |
+----------------------+----------------------+
| RegionOne            | 1                    |
+----------------------+----------------------+

stratos> list-deployment-policies
Deployment policies found:
+-------------------+---------------+
| ID                | Accessibility |
+-------------------+---------------+
| static-2-ha       | 1             |
+-------------------+---------------+
| autoscale-2-10-ha | 1             |
+-------------------+---------------+
| autoscale-1-5     | 1             |
+-------------------+---------------+
| static-1          | 1             |
+-------------------+---------------+

stratos> list-application-policies
Error in listing application policies
No application policies found

stratos> list-autoscaling-policies
Error in listing autoscaling policies
No autoscaling policies found

stratos> list-cartridges
Cartridges found:
+------------------+-------------+------------------+----------------------------+---------+--------------+
| Type             | Category    | Name             | Description                | Version | Multi-Tenant |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cartridge-proxy  | Application | cartridge-proxy  | cartridge-proxy Cartridge  | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-sample-vm  | Application | cisco-sample-vm  | cisco-sample-vm Cartridge  | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-cf-01 | Application | cisco-qvpc-cf-01 | cisco-qvpc-cf-01 Cartridge | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-cf-02 | Application | cisco-qvpc-cf-02 | cisco-qvpc-cf-02 Cartridge | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-si    | Application | cisco-qvpc-si    | cisco-qvpc-si Cartridge    | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-sf    | Application | cisco-qvpc-sf    | cisco-qvpc-sf Cartridge    | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+

stratos> list-applications
Applications found:
+-----------------+-----------------+----------+
| Application ID  | Alias           | Status   |
+-----------------+-----------------+----------+
| cartridge-proxy | cartridge-proxy | Deployed |
+-----------------+-----------------+----------+
| cisco-sample-vm | cisco-sample-vm | Deployed |
+-----------------+-----------------+----------+

$ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/autoscalingPolicies
[{"id":"economyPolicy","instanceRoundingFactor":0,"isPublic":false,"loadThresholds":""}]

$ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/applicationPolicies
[{"algorithm":"one-after-another","id":"default-iaas","networkPartitions":["RegionOne"],"properties":{"name":"networkPartitionGroups","value":"RegionOne"}}]


PERSISTENCE TEST, AFTER FAILOVER
===============================

stratos> list-tenants
Tenants:
+-----------------------+-----------+------------------+--------+------------------------------+
| Domain                | Tenant ID | Email            | State  | Created Date                 |
+-----------------------+-----------+------------------+--------+------------------------------+
| cloud1.qmog.cisco.com | 1         | cloud1@cisco.com | Active | Fri May 15 05:26:52 MDT 2015 |
+-----------------------+-----------+------------------+--------+------------------------------+

stratos> list-network-partitions
Network partitions found:
+----------------------+----------------------+
| Network Partition ID | Number of Partitions |
+----------------------+----------------------+
| RegionOne            | 1                    |
+----------------------+----------------------+

stratos> list-deployment-policies
No deployment policies found

stratos> list-application-policies
Error in listing application policies
No application policies found

stratos> list-autoscaling-policies
Error in listing autoscaling policies
No autoscaling policies found

stratos> list-cartridges
Cartridges found:
+------------------+-------------+------------------+----------------------------+---------+--------------+
| Type             | Category    | Name             | Description                | Version | Multi-Tenant |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cartridge-proxy  | Application | cartridge-proxy  | cartridge-proxy Cartridge  | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-sample-vm  | Application | cisco-sample-vm  | cisco-sample-vm Cartridge  | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-cf-01 | Application | cisco-qvpc-cf-01 | cisco-qvpc-cf-01 Cartridge | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-cf-02 | Application | cisco-qvpc-cf-02 | cisco-qvpc-cf-02 Cartridge | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-si    | Application | cisco-qvpc-si    | cisco-qvpc-si Cartridge    | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-sf    | Application | cisco-qvpc-sf    | cisco-qvpc-sf Cartridge    | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+

stratos> list-applications
Applications found:
+-----------------+-----------------+----------+
| Application ID  | Alias           | Status   |
+-----------------+-----------------+----------+
| cartridge-proxy | cartridge-proxy | Deployed |
+-----------------+-----------------+----------+
| cisco-sample-vm | cisco-sample-vm | Deployed |
+-----------------+-----------------+----------+

$ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/autoscalingPolicies
[{"id":"economyPolicy","instanceRoundingFactor":0,"isPublic":false,"loadThresholds":""}]

$ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/applicationPolicies
[{"algorithm":"one-after-another","id":"default-iaas","networkPartitions":["RegionOne"],"properties":{"name":"networkPartitionGroups","value":"RegionOne"}}]
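
The before/after comparison above could be automated along these lines. This is only a minimal Python sketch, not part of Stratos: the two endpoint paths and the admin credentials are the ones used in the curl commands above, while the function names and the "deploymentPolicies" key used in the usage note are hypothetical, and it assumes every listed object carries an "id" field as in the output shown.

```python
import base64
import json
import ssl
import urllib.request

# Endpoint names taken from the curl commands above; any others are assumptions.
ENDPOINTS = ["autoscalingPolicies", "applicationPolicies"]


def ids_from_json(body):
    """Extract the ids from a JSON list such as [{"id": "economyPolicy", ...}]."""
    return {item["id"] for item in json.loads(body)}


def fetch_ids(base_url, endpoint, user, password):
    """Return the set of ids listed by one REST endpoint."""
    request = urllib.request.Request(
        f"{base_url}/api/{endpoint}",
        headers={"Content-type": "application/json"},
    )
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    # Equivalent of "curl -k": skip certificate checks (test rigs only).
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context) as response:
        return ids_from_json(response.read().decode())


def snapshot(base_url, user, password):
    """Collect the ids from every endpoint in ENDPOINTS."""
    return {name: fetch_ids(base_url, name, user, password) for name in ENDPOINTS}


def lost_after_failover(before, after):
    """Map each endpoint to the ids present before the failover but not after."""
    lost = {}
    for endpoint, ids in before.items():
        missing = ids - after.get(endpoint, set())
        if missing:
            lost[endpoint] = sorted(missing)
    return lost
```

Usage would be to call snapshot("https://localhost:9443", "admin", "admin") once before and once after the failover and pass both results to lost_after_failover(); an empty result means nothing listed by those endpoints was lost.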

[3] Cartridge test output from Stratos 4.1. Note:


1.      We do not use a VIP for Stratos, either for 4.0 or 4.1.

2.      We expect the Cartridge Agent to use a DNS lookup when it ends up reconnecting, and this worked just fine in Stratos 4.0.
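
To illustrate the expectation in note 2, here is a minimal Python sketch of a reconnect loop that repeats the DNS lookup before every attempt, so that an updated record can steer the client to the newly active node. It is not the Cartridge Agent's actual code; the function names, retry counts and delay are assumptions made purely for illustration.

```python
import socket
import time


def resolve(hostname, port):
    """Look the name up afresh and return a list of (ip, port) candidates."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    return [info[4][:2] for info in infos]


def reconnect(hostname, port, attempts=5, delay=2.0,
              resolver=resolve, connector=socket.create_connection):
    """Try to reconnect, re-resolving the hostname before every attempt.

    Caching the first resolved address would pin the client to the failed
    node; resolving each time lets a changed DNS record take effect.
    """
    for attempt in range(attempts):
        try:
            for address in resolver(hostname, port):
                try:
                    return connector(address)
                except OSError:
                    continue  # try the next address returned for this name
        except socket.gaierror:
            pass  # name not resolvable right now; retry after the delay
        if attempt < attempts - 1:
            time.sleep(delay)
    raise ConnectionError(f"could not reach {hostname}:{port}")
```

The resolver and connector are passed in as parameters so the loop can be exercised without a network, for example with a fake resolver that returns the old address first and the new one afterwards.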

CARTRIDGE TEST, BEFORE FAILOVER
==============================

$ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
cisco-sample-vm: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Active 1)
     cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active

CARTRIDGE TEST, AFTER FAILOVER
=============================

$ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
cisco-sample-vm: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Active 1)
     cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active

CARTRIDGE TEST, AFTER FAILOVER WAIT 5 MINUTES, THEN KILL INSTANCE, THEN WAIT 2 MINUTES
===================================================================================

$ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
cisco-sample-vm: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Active 1)
     cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active



From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: 14 May 2015 20:34

To: dev
Subject: Re: Clustered deployments of Stratos

It would be better to use the REST API to query and see whether the relevant entities are persisted. Since data is stored in binary format in the registry it would be difficult to query the database and verify this.

On Thu, May 14, 2015 at 10:47 PM, Shaheedur Haque (shahhaqu) <sh...@cisco.com> wrote:
I looked at REG_RESOURCEs (as well as a few others) but I’m afraid I am going to need more specifics.

For example, what query would you recommend to look at say deployment policies and cartridge definitions?

From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: 09 May 2015 09:08

To: dev
Subject: Re: Clustered deployments of Stratos

Yes, you could refer to the tables that have the prefix "REG_".

On Sat, May 9, 2015 at 4:11 AM, Shaheedur Haque (shahhaqu) <sh...@cisco.com> wrote:
Can you suggest what tables to look at?

From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: 07 May 2015 18:00

To: dev
Subject: Re: Clustered deployments of Stratos

Hi Shaheed,

Thanks for the clarification! Maybe the problem is with the MySQL active-passive configuration.

I understand that you are switching the same OpenStack volume from the active node to the passive node (when the passive node becomes active); therefore technically it should work. Maybe we need to investigate this problem further by analysing whether data is persisted properly in the active node before the passive node becomes active.

Thanks

On Tue, May 5, 2015 at 4:22 PM, Shaheedur Haque (shahhaqu) <sh...@cisco.com> wrote:
The data is not synchronised between the active and passive nodes. For clarity, this is the HA model we had, much as described in the blog:


•        2 nodes, with Pacemaker in active-passive mode.

•        Under Pacemaker control:

o   We run MySQL in active-passive mode, using a single OpenStack volume which we attach/reattach as the active role moves around nodes.

o   As the Pacemaker moves the volume, and thus MySQL around on node failures, ActiveMQ and Stratos are moved around too.

o   Thus, everything operates in active-passive mode.

Even in this model, as the active Stratos 4.0 is moved around (i.e. the Stratos JVM on the old active node has gone with the node, and Pacemaker starts up a new Stratos JVM on what used to be the passive node), we found that the Cartridge Definition objects were missing and, as a clumsy workaround [1], we had to replay the stored copies of them into Stratos using the REST API.

With Stratos 4.1, using the new object names, early indications are that Deployment Policies and Application Deployment policies are lost as the active fails over to the passive. If anything, these objects are more likely to hit the problems of [1], since Stratos 4.1 expects these to be tweaked on the fly (min/max etc).

Thanks, Shaheed

[1] Clearly, this loses any changes that were not in the stored copies.
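
The replay workaround described above can be sketched roughly as follows. This is a hedged Python illustration rather than the script actually used: it assumes the stored copies are JSON files in one directory, that each definition is identified by its "type" field (the first column of list-cartridges), and that the caller supplies a post function that performs the real REST call, since the endpoint is not given in this thread.

```python
import json
from pathlib import Path


def load_stored_definitions(directory, id_key="type"):
    """Read every *.json file in the directory into {id: definition}.

    id_key is an assumption: the field that names each stored definition.
    """
    stored = {}
    for path in sorted(Path(directory).glob("*.json")):
        definition = json.loads(path.read_text())
        stored[definition[id_key]] = definition
    return stored


def definitions_to_replay(stored, present_ids):
    """Return the stored definitions that Stratos no longer lists."""
    return {key: value for key, value in stored.items() if key not in present_ids}


def replay(missing, post):
    """Hand each missing definition to a caller-supplied post function.

    post is hypothetical: it would wrap the REST API call that re-adds the
    definition to the newly active Stratos.
    """
    for definition in missing.values():
        post(definition)
```

As footnote [1] says, anything changed in Stratos after the stored copy was written is still lost; the sketch only restores what was saved.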

From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: 03 May 2015 06:43
To: dev@stratos.apache.org

Subject: Re: Clustered deployments of Stratos

Hi Shaheed,

Thanks for taking time to test this!

Just to clarify the exact problem, do you mean that data is not synchronized between the active and passive nodes or they are not persisted in the active node?

Thanks

On Sunday, May 3, 2015, Shaheedur Haque (shahhaqu) <sh...@cisco.com> wrote:

I have been looking into our use of Linux HA to setup an Active-Passive configuration. Testing indicates that in 4.1 (beta1), several objects seem not to be persisted properly. This includes at least:

- Cartridges
- Deployment policies

Am I missing something? Is it safe to workaround this by replaying those objects?
________________________________
From: Imesh Gunaratne [imesh@apache.org]
Sent: 23 April 2015 10:47
To: dev
Subject: Re: Clustered deployments of Stratos
Hi Shaheed,

Currently N-way clustering is still not possible with CC, AS & SM. We completed the initial phase of this feature, but the feature itself was not completed. You could refer to the mail thread "[Discuss] Clustering Feature Implementation for 4.1.0-Alpha Release" for details.

However at present [1] is valid. We could use Linux HA and deploy CC, AS and SM in Active-Passive mode.

Thanks



On Thu, Apr 23, 2015 at 2:41 PM, Shaheedur Haque (shahhaqu) <sh...@cisco.com> wrote:
Hi,

We currently try to achieve HA with Stratos using something so unpleasant that I won’t even describe it here ☺. It has been suggested that Stratos has, for a while now, supported a clustered mode of deployment where, given N servers:


•        The SM, AS and MB operate in a N-way clustered mode

•        The CEP operates in a N-way loadsharing mode

•        The Cartridge Agents can react to a failure in one of the N CEPs by failing over to one of the other N-1 remaining servers

In looking for documentation on how to set this up, I came across these two write-ups [1] and [2]. Questions:


•        Both these documents mention only using N=2. Is that still correct?

•        [1] Seems recently written, and [2] is a little older but not much. Are both documents still regarded as current?

Also, I’d love to hear any other experiences people have of running configurations like this.

Thanks, Shaheed

[1] https://cwiki.apache.org/confluence/display/STRATOS/4.1.0+Configuring+HA+Using+Pacemaker+and+Heartbeat
[2] http://blog.lasindu.com/2014/08/wso2-private-paas-supporting.html






--
Imesh Gunaratne

Technical Lead, WSO2
Committer & PMC Member, Apache Stratos


--
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos




--
--
Lahiru Sandaruwan
Committer and PMC member, Apache Stratos,
Senior Software Engineer,
WSO2 Inc., http://wso2.com
lean.enterprise.middleware

phone: +94773325954
email: lahirus@wso2.com blog: http://lahiruwrites.blogspot.com/
linked-in: http://lk.linkedin.com/pub/lahiru-sandaruwan/16/153/146


Re: Clustered deployments of Stratos

Posted by Lahiru Sandaruwan <la...@wso2.com>.
Hi Shaheed,

Similarly, it would be a great help if you could verify all these issues
against the latest code, since we have been fixing a lot of issues in recent
days as a result of RC1 testing.

Thanks.

On Fri, May 15, 2015 at 9:42 PM, Imesh Gunaratne <im...@apache.org> wrote:

> Hi Shaheed,
>
> Thanks for the quick response. After analyzing the results you have
> provided again, it looks like only the deployment policies are missing
> after the failover. We have fixed this issue in commit
> revision: 0c515aa013850575ddcfa2e299da5f0ec250ebc3
>
>
> http://mail-archives.apache.org/mod_mbox/incubator-stratos-commits/201504.mbox/%3C22eed4e8639c401a8fda637fa6bb4501@git.apache.org%3E
>
> Would you mind verifying whether this is there in your runtime?
>
> Thanks
>
>
> On Fri, May 15, 2015 at 9:02 PM, Shaheedur Haque (shahhaqu) <
> shahhaqu@cisco.com> wrote:
>
>>  The latter; we never have both Stratos instances running.
>>
>>
>>
>> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
>> *Sent:* 15 May 2015 16:17
>> *To:* dev
>> *Cc:* Ryan Du Plessis (rdupless); Luca Martini (lmartini)
>>
>> *Subject:* Re: Clustered deployments of Stratos
>>
>>
>>
>> Hi Shaheed,
>>
>>
>>
>> Do you have both active and passive Stratos nodes running at the same
>> time or do you start the passive node once the active node goes down?
>>
>>
>>
>> Thanks
>>
>>
>>
>> On Fri, May 15, 2015 at 6:31 PM, Shaheedur Haque (shahhaqu) <
>> shahhaqu@cisco.com> wrote:
>>
>> Hi Imesh,
>>
>>
>>
>> I finally got round to a proper series of tests, and here are the
>> conclusions:
>>
>>
>>
>> ·        In Stratos 4.0, after a Pacemaker driven failover, the newly
>> Active Stratos has lost all Cartridge Definitions.
>>
>> ·        In current [1] Stratos 4.1, after a Pacemaker driven failover,
>> the newly Active Stratos:
>>
>> o   Has lost all Deployment Policies.
>>
>> o   Has lost contact with the Cartridge Agents, and all VMs are stuck
>> with whatever state they had before the failover.
>>
>> ·        Note: I have not verified if Cartridge Groups are lost or not.
>>
>>
>>
>> I include the test results below at [2] and [3]. I am concerned as to
>> whether 4.1 is ready for GA on this basis, so though more testing is no
>> doubt possible (e.g. Cartridge Groups) I wanted to get this info to the
>> list ASAP.
>>
>>
>>
>> Thanks, Shaheed
>>
>>
>>
>> [1] A recent build somewhere between beta 1 and beta 2, but I don’t think
>> any relevant fixes have been made in master.
>>
>>
>>
>> [2] Persistence test output from Stratos 4.1. Note:
>>
>>
>>
>> 1.      In the build I have, the CLI is broken for a couple of commands;
>> these are supplemented by direct “curl” commands further down.
>>
>> 2.      I’ve used one of our commands to show the instances and their
>> state for a given application since there is not a compact JSON or
>> convenient Stratos CLI for that.
>>
>>
>>
>> *PERSISTENCE TEST, BEFORE FAILOVER*
>>
>> *================================*
>>
>>
>>
>> stratos> list-tenants
>>
>> Tenants:
>>
>>
>> +-----------------------+-----------+------------------+--------+------------------------------+
>> | Domain                | Tenant ID | Email            | State  | Created Date                 |
>> +-----------------------+-----------+------------------+--------+------------------------------+
>> | cloud1.qmog.cisco.com | 1         | cloud1@cisco.com | Active | Fri May 15 04:46:58 MDT 2015 |
>> +-----------------------+-----------+------------------+--------+------------------------------+
>>
>>
>>
>> stratos> list-network-partitions
>>
>> Network partitions found:
>>
>> +----------------------+----------------------+
>>
>> | Network Partition ID | Number of Partitions |
>>
>> +----------------------+----------------------+
>>
>> | RegionOne            | 1                    |
>>
>> +----------------------+----------------------+
>>
>>
>>
>> stratos> list-deployment-policies
>>
>> Deployment policies found:
>>
>> +-------------------+---------------+
>>
>> | ID                | Accessibility |
>>
>> +-------------------+---------------+
>>
>> | static-2-ha       | 1             |
>>
>> +-------------------+---------------+
>>
>> | autoscale-2-10-ha | 1             |
>>
>> +-------------------+---------------+
>>
>> | autoscale-1-5     | 1             |
>>
>> +-------------------+---------------+
>>
>> | static-1          | 1             |
>>
>> +-------------------+---------------+
>>
>>
>>
>> stratos> list-application-policies
>>
>> Error in listing application policies
>>
>> No application policies found
>>
>>
>>
>> stratos> list-autoscaling-policies
>>
>> Error in listing autoscaling policies
>>
>> No autoscaling policies found
>>
>>
>>
>> stratos> list-cartridges
>>
>> Cartridges found:
>>
>>
>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>> | Type             | Category    | Name             | Description                | Version | Multi-Tenant |
>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>> | cartridge-proxy  | Application | cartridge-proxy  | cartridge-proxy Cartridge  | 1       | false        |
>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>> | cisco-sample-vm  | Application | cisco-sample-vm  | cisco-sample-vm Cartridge  | 1       | false        |
>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>> | cisco-qvpc-cf-01 | Application | cisco-qvpc-cf-01 | cisco-qvpc-cf-01 Cartridge | 1       | false        |
>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>> | cisco-qvpc-cf-02 | Application | cisco-qvpc-cf-02 | cisco-qvpc-cf-02 Cartridge | 1       | false        |
>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>> | cisco-qvpc-si    | Application | cisco-qvpc-si    | cisco-qvpc-si Cartridge    | 1       | false        |
>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>> | cisco-qvpc-sf    | Application | cisco-qvpc-sf    | cisco-qvpc-sf Cartridge    | 1       | false        |
>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>>
>>
>>
>> stratos> list-applications
>>
>> Applications found:
>>
>> +-----------------+-----------------+----------+
>>
>> | Application ID  | Alias           | Status   |
>>
>> +-----------------+-----------------+----------+
>>
>> | cartridge-proxy | cartridge-proxy | Deployed |
>>
>> +-----------------+-----------------+----------+
>>
>> | cisco-sample-vm | cisco-sample-vm | Deployed |
>>
>> +-----------------+-----------------+----------+
>>
>>
>>
>> $ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/autoscalingPolicies
>>
>>
>> [{"id":"economyPolicy","instanceRoundingFactor":0,"isPublic":false,"loadThresholds":""}]
>>
>>
>>
>> $ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/applicationPolicies
>>
>>
>> [{"algorithm":"one-after-another","id":"default-iaas","networkPartitions":["RegionOne"],"properties":{"name":"networkPartitionGroups","value":"RegionOne"}}]
>>
>>
>>
>>
>>
>> *PERSISTENCE TEST, AFTER FAILOVER*
>>
>> *===============================*
>>
>>
>>
>> stratos> list-tenants
>>
>> Tenants:
>>
>>
>> +-----------------------+-----------+------------------+--------+------------------------------+
>> | Domain                | Tenant ID | Email            | State  | Created Date                 |
>> +-----------------------+-----------+------------------+--------+------------------------------+
>> | cloud1.qmog.cisco.com | 1         | cloud1@cisco.com | Active | Fri May 15 05:26:52 MDT 2015 |
>> +-----------------------+-----------+------------------+--------+------------------------------+
>>
>>
>>
>> stratos> list-network-partitions
>>
>> Network partitions found:
>>
>> +----------------------+----------------------+
>>
>> | Network Partition ID | Number of Partitions |
>>
>> +----------------------+----------------------+
>>
>> | RegionOne            | 1                    |
>>
>> +----------------------+----------------------+
>>
>>
>>
>> stratos> list-deployment-policies
>>
>> No deployment policies found
>>
>>
>>
>> stratos> list-application-policies
>>
>> Error in listing application policies
>>
>> No application policies found
>>
>>
>>
>> stratos> list-autoscaling-policies
>>
>> Error in listing autoscaling policies
>>
>> No autoscaling policies found
>>
>>
>>
>> stratos> list-cartridges
>>
>> Cartridges found:
>>
>>
>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>> | Type             | Category    | Name             | Description                | Version | Multi-Tenant |
>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>> | cartridge-proxy  | Application | cartridge-proxy  | cartridge-proxy Cartridge  | 1       | false        |
>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>> | cisco-sample-vm  | Application | cisco-sample-vm  | cisco-sample-vm Cartridge  | 1       | false        |
>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>> | cisco-qvpc-cf-01 | Application | cisco-qvpc-cf-01 | cisco-qvpc-cf-01 Cartridge | 1       | false        |
>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>> | cisco-qvpc-cf-02 | Application | cisco-qvpc-cf-02 | cisco-qvpc-cf-02 Cartridge | 1       | false        |
>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>> | cisco-qvpc-si    | Application | cisco-qvpc-si    | cisco-qvpc-si Cartridge    | 1       | false        |
>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>> | cisco-qvpc-sf    | Application | cisco-qvpc-sf    | cisco-qvpc-sf Cartridge    | 1       | false        |
>> +------------------+-------------+------------------+----------------------------+---------+--------------+
>>
>>
>>
>> stratos> list-applications
>>
>> Applications found:
>>
>> +-----------------+-----------------+----------+
>>
>> | Application ID  | Alias           | Status   |
>>
>> +-----------------+-----------------+----------+
>>
>> | cartridge-proxy | cartridge-proxy | Deployed |
>>
>> +-----------------+-----------------+----------+
>>
>> | cisco-sample-vm | cisco-sample-vm | Deployed |
>>
>> +-----------------+-----------------+----------+
>>
>>
>>
>> $ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/autoscalingPolicies
>>
>>
>> [{"id":"economyPolicy","instanceRoundingFactor":0,"isPublic":false,"loadThresholds":""}]
>>
>>
>>
>> $ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/applicationPolicies
>>
>>
>> [{"algorithm":"one-after-another","id":"default-iaas","networkPartitions":["RegionOne"],"properties":{"name":"networkPartitionGroups","value":"RegionOne"}}]
>>
>>
>>
>> [3] Cartridge test output from Stratos 4.1. Note:
>>
>>
>>
>> 1.      We do not use a VIP for Stratos, either for 4.0 or 4.1.
>>
>> 2.      We expect the Cartridge Agent to use a DNS lookup when it ends
>> up reconnecting, and this worked just fine in Stratos 4.0.
>>
>>
>>
>> *CARTRIDGE TEST, BEFORE FAILOVER*
>>
>> *==============================*
>>
>>
>>
>> $ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
>>
>> cisco-sample-vm: applicationInstances 1, groupInstances 0,
>> clusterInstances 1, members 1 (Active 1)
>>
>>      cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active
>>
>>
>>
>> *CARTRIDGE TEST, AFTER FAILOVER*
>>
>> *=============================*
>>
>>
>>
>> $ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
>>
>> cisco-sample-vm: applicationInstances 1, groupInstances 0,
>> clusterInstances 1, members 1 (Active 1)
>>
>>      cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active
>>
>>
>>
>> *CARTRIDGE TEST,  AFTER FAILOVER WAIT 5 MINUTES, THEN KILL INSTANCE, THEN
>> WAIT 2 MINUTES*
>>
>>
>> *===================================================================================*
>>
>>
>>
>> $ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
>>
>> cisco-sample-vm: applicationInstances 1, groupInstances 0,
>> clusterInstances 1, members 1 (Active 1)
>>
>>      cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active
>>
>>
>>
>>
>>
>>
>>
>> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
>> *Sent:* 14 May 2015 20:34
>>
>>
>> *To:* dev
>> *Subject:* Re: Clustered deployments of Stratos
>>
>>
>>
>> It would be better to use the REST API to query and see whether the
>> relevant entities are persisted. Since data is stored in binary format in
>> the registry it would be difficult to query the database and verify this.
>>
>>
>>
>> On Thu, May 14, 2015 at 10:47 PM, Shaheedur Haque (shahhaqu) <
>> shahhaqu@cisco.com> wrote:
>>
>> I looked at REG_RESOURCEs (as well as a few others) but I’m afraid I am
>> going to need more specifics.
>>
>>
>>
>> For example, what query would you recommend to look at say deployment
>> policies and cartridge definitions?
>>
>>
>>
>> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
>> *Sent:* 09 May 2015 09:08
>>
>>
>> *To:* dev
>> *Subject:* Re: Clustered deployments of Stratos
>>
>>
>>
>> Yes, you could refer to the tables that have the prefix "REG_".
>>
>>
>>
>> On Sat, May 9, 2015 at 4:11 AM, Shaheedur Haque (shahhaqu) <
>> shahhaqu@cisco.com> wrote:
>>
>> Can you suggest what tables to look at?
>>
>>
>>
>> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
>> *Sent:* 07 May 2015 18:00
>>
>>
>> *To:* dev
>> *Subject:* Re: Clustered deployments of Stratos
>>
>>
>>
>> Hi Shaheed,
>>
>>
>>
>> Thanks for the clarification! Maybe the problem is with the MySQL
>> active-passive configuration.
>>
>>
>>
>> I understand that you are switching the same OpenStack volume from the
>> active node to the passive node (when the passive node becomes active), so
>> technically it should work. Maybe we need to investigate this problem
>> further by analysing whether data is persisted properly on the active node
>> before the passive node becomes active.
>>
>>
>>
>> Thanks
>>
>>
>>
>> On Tue, May 5, 2015 at 4:22 PM, Shaheedur Haque (shahhaqu) <
>> shahhaqu@cisco.com> wrote:
>>
>> The data is not synchronised between the active and passive nodes. For
>> clarity, this is the HA model we had, much as described in the blog:
>>
>>
>>
>> ·        2 nodes, with Pacemaker in active-passive mode.
>>
>> ·        Under Pacemaker control:
>>
>> o   We run MySQL in active-passive mode, using a single OpenStack volume
>> which we attach/reattach as the active role moves around nodes.
>>
>> o   As Pacemaker moves the volume, and thus MySQL, around on node
>> failures, ActiveMQ and Stratos are moved around too.
>>
>> o   Thus, everything operates in active-passive mode.
>>
>>
>>
>> Even in this model, as the active Stratos 4.0 is moved around (i.e. the
>> Stratos JVM on the old active node has gone with the node, and Pacemaker
>> starts up a new Stratos JVM on what used to be the passive node), we found
>> that the Cartridge Definition objects were missing and, as a clumsy
>> workaround [1], we had to replay the stored copies of them into Stratos
>> using the REST API.
>>
>>
>>
>> With Stratos 4.1, using the new object names, early indications are that
>> *Deployment Policies* and *Application Deployment* policies are lost as the
>> active fails over to the passive. If anything, these objects are more
>> likely to hit the problems of [1], since Stratos 4.1 expects these to be
>> tweaked on the fly (min/max etc).
>>
>>
>>
>> Thanks, Shaheed
>>
>>
>>
>> [1] Clearly, this loses any changes that were not in the stored copies.
>>
>>
>>
>> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
>> *Sent:* 03 May 2015 06:43
>> *To:* dev@stratos.apache.org
>>
>>
>> *Subject:* Re: Clustered deployments of Stratos
>>
>>
>>
>> Hi Shaheed,
>>
>>
>>
>> Thanks for taking time to test this!
>>
>>
>>
>> Just to clarify the exact problem, do you mean that data is not
>> synchronized between the active and passive nodes, or that it is not
>> persisted on the active node?
>>
>>
>>
>> Thanks
>>
>>
>> On Sunday, May 3, 2015, Shaheedur Haque (shahhaqu) <sh...@cisco.com>
>> wrote:
>>
>>
>> I have been looking into our use of Linux HA to set up an Active-Passive
>> configuration. Testing indicates that in 4.1 (beta1), several objects seem
>> not to be persisted properly. This includes at least:
>>
>> - Cartridges
>> - Deployment policies
>>
>> Am I missing something? Is it safe to work around this by replaying those
>> objects?
>>  ------------------------------
>>
>> *From:* Imesh Gunaratne [imesh@apache.org]
>> *Sent:* 23 April 2015 10:47
>> *To:* dev
>> *Subject:* Re: Clustered deployments of Stratos
>>
>> Hi Shaheed,
>>
>>
>>
>> Currently N-way clustering is still not possible with CC, AS & SM. We
>> completed the initial phase of this feature, but the feature as a whole
>> is not yet finished. You could refer to the mail thread "[Discuss]
>> Clustering Feature Implementation for 4.1.0-Alpha Release" for details.
>>
>>
>>
>> However at present [1] is valid. We could use Linux HA and deploy CC, AS
>> and SM in Active-Passive mode.
>>
>>
>>
>> Thanks
>>
>>
>>
>>
>>
>>
>>
>> On Thu, Apr 23, 2015 at 2:41 PM, Shaheedur Haque (shahhaqu) <
>> shahhaqu@cisco.com> wrote:
>>
>> Hi,
>>
>>
>>
>> We currently try to achieve HA with Stratos using something so unpleasant
>> that I won’t even describe it here :). It has been suggested that Stratos
>> has, for a while now, supported a clustered mode of deployment where, given
>> N servers:
>>
>>
>>
>> ·        The SM, AS and MB operate in a N-way clustered mode
>>
>> ·        The CEP operates in a N-way loadsharing mode
>>
>> ·        The Cartridge Agents can react to a failure in one of the N
>> CEPs by failing over to one of the other N-1 remaining servers
>>
>>
>>
>> In looking for documentation on how to set this up, I came across these
>> two write-ups [1] and [2]. Questions:
>>
>>
>>
>> ·        Both these documents mention only using N=2. Is that still
>> correct?
>>
>> ·        [1] Seems recently written, and [2] is a little older but not
>> much. Are both documents still regarded as current?
>>
>>
>>
>> Also, I’d love to hear any other experiences people have of running
>> configurations like this.
>>
>>
>>
>> Thanks, Shaheed
>>
>>
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/STRATOS/4.1.0+Configuring+HA+Using+Pacemaker+and+Heartbeat
>>
>> [2] http://blog.lasindu.com/2014/08/wso2-private-paas-supporting.html
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> --
>>
>> Imesh Gunaratne
>>
>>
>>
>> Technical Lead, WSO2
>>
>> Committer & PMC Member, Apache Stratos
>>
>>
>>
>> --
>>
>> Imesh Gunaratne
>>
>>
>>
>> Senior Technical Lead, WSO2
>>
>> Committer & PMC Member, Apache Stratos
>>
>>
>>
>>
>>
>>
>>
>>
>
>
>
> --
> Imesh Gunaratne
>
> Senior Technical Lead, WSO2
> Committer & PMC Member, Apache Stratos
>



-- 
--
Lahiru Sandaruwan
Committer and PMC member, Apache Stratos,
Senior Software Engineer,
WSO2 Inc., http://wso2.com
lean.enterprise.middleware

phone: +94773325954
email: lahirus@wso2.com blog: http://lahiruwrites.blogspot.com/
linked-in: http://lk.linkedin.com/pub/lahiru-sandaruwan/16/153/146

Re: Clustered deployments of Stratos

Posted by Imesh Gunaratne <im...@apache.org>.
Hi Shaheed,

Thanks for the quick response. After analyzing the results you have
provided again, it looks like only the deployment policies are missing
after the failover. We have fixed this issue in commit
revision: 0c515aa013850575ddcfa2e299da5f0ec250ebc3

http://mail-archives.apache.org/mod_mbox/incubator-stratos-commits/201504.mbox/%3C22eed4e8639c401a8fda637fa6bb4501@git.apache.org%3E

Would you mind verifying whether this is there in your runtime?
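
For anyone repeating the check after picking up the fix, here is a minimal
sketch in Python of how the before/after listings could be compared. It
assumes the REST listing calls return a JSON array of objects with an "id"
field, as the autoscalingPolicies and applicationPolicies output quoted in
this thread does. The function names and dictionary keys are made up for
illustration, and the deployment policy IDs are copied by hand from the CLI
output because the thread does not show a REST path for them.

```python
import json


def ids_of(listing_json):
    """Return the set of "id" values from a JSON array of objects."""
    return {item["id"] for item in json.loads(listing_json)}


def lost_after_failover(before, after):
    """Map each entity kind to the sorted IDs seen before but not after."""
    lost = {}
    for kind, before_ids in before.items():
        missing = sorted(before_ids - after.get(kind, set()))
        if missing:
            lost[kind] = missing
    return lost


# Sample data taken from the test output quoted in this thread.
# The dictionary keys are illustrative labels, not API names.
AUTOSCALING_JSON = (
    '[{"id":"economyPolicy","instanceRoundingFactor":0,'
    '"isPublic":false,"loadThresholds":""}]'
)

before = {
    "autoscalingPolicies": ids_of(AUTOSCALING_JSON),
    "deploymentPolicies": {
        "static-2-ha", "autoscale-2-10-ha", "autoscale-1-5", "static-1",
    },
}
after = {
    "autoscalingPolicies": ids_of(AUTOSCALING_JSON),
    "deploymentPolicies": set(),  # "No deployment policies found"
}

report = lost_after_failover(before, after)
print(report)
```

An empty report after a failover would suggest the listed entities survived;
it says nothing about entity kinds that were not captured beforehand.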

Thanks


On Fri, May 15, 2015 at 9:02 PM, Shaheedur Haque (shahhaqu) <
shahhaqu@cisco.com> wrote:

>  The latter; we never have both Stratos instances running.
>
>
>
> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
> *Sent:* 15 May 2015 16:17
> *To:* dev
> *Cc:* Ryan Du Plessis (rdupless); Luca Martini (lmartini)
>
> *Subject:* Re: Clustered deployments of Stratos
>
>
>
> Hi Shaheed,
>
>
>
> Do you have both active and passive Stratos nodes running at the same time
> or do you start the passive node once the active node goes down?
>
>
>
> Thanks
>
>
>
> On Fri, May 15, 2015 at 6:31 PM, Shaheedur Haque (shahhaqu) <
> shahhaqu@cisco.com> wrote:
>
> Hi Imesh,
>
>
>
> I finally got round to a proper series of tests, and here are the
> conclusions:
>
>
>
> ·        In Stratos 4.0, after a Pacemaker driven failover, the newly
> Active Stratos has lost all Cartridge Definitions.
>
> ·        In current [1] Stratos 4.1, after a Pacemaker driven failover,
> the newly Active Stratos:
>
> o   Has lost all Deployment Policies.
>
> o   Has lost contact with the Cartridge Agents, and all VMs are stuck
> with whatever state they had before the failover.
>
> ·        Note: I have not verified if Cartridge Groups are lost or not.
>
>
>
> I include the test results below at [2] and [3]. I am concerned as to
> whether 4.1 is ready for GA on this basis, so though more testing is no
> doubt possible (e.g. Cartridge Groups) I wanted to get this info to the
> list ASAP.
>
>
>
> Thanks, Shaheed
>
>
>
> [1] A recent build somewhere between beta 1 and beta 2, but I don’t think
> any relevant fixes have been made in master.
>
>
>
> [2] Persistence test output from Stratos 4.1. Note:
>
>
>
> 1.      In the build I have, the CLI is broken for a couple of commands;
> these are supplemented by direct “curl” commands further down.
>
> 2.      I’ve used one of our commands to show the instances and their
> state for a given application since there is not a compact JSON or
> convenient Stratos CLI for that.
>
>
>
> *PERSISTENCE TEST, BEFORE FAILOVER*
>
> *================================*
>
>
>
> stratos> list-tenants
>
> Tenants:
>
>
> +-----------------------+-----------+------------------+--------+------------------------------+
> | Domain                | Tenant ID | Email            | State  | Created Date                 |
> +-----------------------+-----------+------------------+--------+------------------------------+
> | cloud1.qmog.cisco.com | 1         | cloud1@cisco.com | Active | Fri May 15 04:46:58 MDT 2015 |
> +-----------------------+-----------+------------------+--------+------------------------------+
>
>
>
> stratos> list-network-partitions
>
> Network partitions found:
>
> +----------------------+----------------------+
>
> | Network Partition ID | Number of Partitions |
>
> +----------------------+----------------------+
>
> | RegionOne            | 1                    |
>
> +----------------------+----------------------+
>
>
>
> stratos> list-deployment-policies
>
> Deployment policies found:
>
> +-------------------+---------------+
>
> | ID                | Accessibility |
>
> +-------------------+---------------+
>
> | static-2-ha       | 1             |
>
> +-------------------+---------------+
>
> | autoscale-2-10-ha | 1             |
>
> +-------------------+---------------+
>
> | autoscale-1-5     | 1             |
>
> +-------------------+---------------+
>
> | static-1          | 1             |
>
> +-------------------+---------------+
>
>
>
> stratos> list-application-policies
>
> Error in listing application policies
>
> No application policies found
>
>
>
> stratos> list-autoscaling-policies
>
> Error in listing autoscaling policies
>
> No autoscaling policies found
>
>
>
> stratos> list-cartridges
>
> Cartridges found:
>
>
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | Type             | Category    | Name             | Description                | Version | Multi-Tenant |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cartridge-proxy  | Application | cartridge-proxy  | cartridge-proxy Cartridge  | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cisco-sample-vm  | Application | cisco-sample-vm  | cisco-sample-vm Cartridge  | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cisco-qvpc-cf-01 | Application | cisco-qvpc-cf-01 | cisco-qvpc-cf-01 Cartridge | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cisco-qvpc-cf-02 | Application | cisco-qvpc-cf-02 | cisco-qvpc-cf-02 Cartridge | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cisco-qvpc-si    | Application | cisco-qvpc-si    | cisco-qvpc-si Cartridge    | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cisco-qvpc-sf    | Application | cisco-qvpc-sf    | cisco-qvpc-sf Cartridge    | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
>
>
>
> stratos> list-applications
>
> Applications found:
>
> +-----------------+-----------------+----------+
>
> | Application ID  | Alias           | Status   |
>
> +-----------------+-----------------+----------+
>
> | cartridge-proxy | cartridge-proxy | Deployed |
>
> +-----------------+-----------------+----------+
>
> | cisco-sample-vm | cisco-sample-vm | Deployed |
>
> +-----------------+-----------------+----------+
>
>
>
> $ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/autoscalingPolicies
>
>
> [{"id":"economyPolicy","instanceRoundingFactor":0,"isPublic":false,"loadThresholds":""}]
>
>
>
> $ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/applicationPolicies
>
>
> [{"algorithm":"one-after-another","id":"default-iaas","networkPartitions":["RegionOne"],"properties":{"name":"networkPartitionGroups","value":"RegionOne"}}]
>
>
>
>
>
> *PERSISTENCE TEST, AFTER FAILOVER*
>
> *===============================*
>
>
>
> stratos> list-tenants
>
> Tenants:
>
>
> +-----------------------+-----------+------------------+--------+------------------------------+
> | Domain                | Tenant ID | Email            | State  | Created Date                 |
> +-----------------------+-----------+------------------+--------+------------------------------+
> | cloud1.qmog.cisco.com | 1         | cloud1@cisco.com | Active | Fri May 15 05:26:52 MDT 2015 |
> +-----------------------+-----------+------------------+--------+------------------------------+
>
>
>
> stratos> list-network-partitions
>
> Network partitions found:
>
> +----------------------+----------------------+
>
> | Network Partition ID | Number of Partitions |
>
> +----------------------+----------------------+
>
> | RegionOne            | 1                    |
>
> +----------------------+----------------------+
>
>
>
> stratos> list-deployment-policies
>
> No deployment policies found
>
>
>
> stratos> list-application-policies
>
> Error in listing application policies
>
> No application policies found
>
>
>
> stratos> list-autoscaling-policies
>
> Error in listing autoscaling policies
>
> No autoscaling policies found
>
>
>
> stratos> list-cartridges
>
> Cartridges found:
>
>
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | Type             | Category    | Name             | Description                | Version | Multi-Tenant |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cartridge-proxy  | Application | cartridge-proxy  | cartridge-proxy Cartridge  | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cisco-sample-vm  | Application | cisco-sample-vm  | cisco-sample-vm Cartridge  | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cisco-qvpc-cf-01 | Application | cisco-qvpc-cf-01 | cisco-qvpc-cf-01 Cartridge | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cisco-qvpc-cf-02 | Application | cisco-qvpc-cf-02 | cisco-qvpc-cf-02 Cartridge | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cisco-qvpc-si    | Application | cisco-qvpc-si    | cisco-qvpc-si Cartridge    | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
> | cisco-qvpc-sf    | Application | cisco-qvpc-sf    | cisco-qvpc-sf Cartridge    | 1       | false        |
> +------------------+-------------+------------------+----------------------------+---------+--------------+
>
>
>
> stratos> list-applications
>
> Applications found:
>
> +-----------------+-----------------+----------+
>
> | Application ID  | Alias           | Status   |
>
> +-----------------+-----------------+----------+
>
> | cartridge-proxy | cartridge-proxy | Deployed |
>
> +-----------------+-----------------+----------+
>
> | cisco-sample-vm | cisco-sample-vm | Deployed |
>
> +-----------------+-----------------+----------+
>
>
>
> $ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/autoscalingPolicies
>
>
> [{"id":"economyPolicy","instanceRoundingFactor":0,"isPublic":false,"loadThresholds":""}]
>
>
>
> $ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/applicationPolicies
>
>
> [{"algorithm":"one-after-another","id":"default-iaas","networkPartitions":["RegionOne"],"properties":{"name":"networkPartitionGroups","value":"RegionOne"}}]
>
>
>
> [3] Cartridge test output from Stratos 4.1. Note:
>
>
>
> 1.      We do not use a VIP for Stratos, either for 4.0 or 4.1.
>
> 2.      We expect the Cartridge Agent to use a DNS lookup when it ends up
> reconnecting, and this worked just fine in Stratos 4.0.
>
>
>
> *CARTRIDGE TEST, BEFORE FAILOVER*
>
> *==============================*
>
>
>
> $ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
>
> cisco-sample-vm: applicationInstances 1, groupInstances 0,
> clusterInstances 1, members 1 (Active 1)
>
>      cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active
>
>
>
> *CARTRIDGE TEST, AFTER FAILOVER*
>
> *=============================*
>
>
>
> $ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
>
> cisco-sample-vm: applicationInstances 1, groupInstances 0,
> clusterInstances 1, members 1 (Active 1)
>
>      cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active
>
>
>
> *CARTRIDGE TEST, AFTER FAILOVER WAIT 5 MINUTES, THEN KILL INSTANCE, THEN WAIT 2 MINUTES*
>
>
> *===================================================================================*
>
>
>
> $ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
>
> cisco-sample-vm: applicationInstances 1, groupInstances 0,
> clusterInstances 1, members 1 (Active 1)
>
>      cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active
>
>
>
>
>
>
>
> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
> *Sent:* 14 May 2015 20:34
>
>
> *To:* dev
> *Subject:* Re: Clustered deployments of Stratos
>
>
>
> It would be better to use the REST API to query and see whether the
> relevant entities are persisted. Since data is stored in binary format in
> the registry it would be difficult to query the database and verify this.
>
>
>
> On Thu, May 14, 2015 at 10:47 PM, Shaheedur Haque (shahhaqu) <
> shahhaqu@cisco.com> wrote:
>
> I looked at REG_RESOURCE (as well as a few others) but I’m afraid I am
> going to need more specifics.
>
>
>
> For example, what query would you recommend to look at say deployment
> policies and cartridge definitions?
>
>
>
> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
> *Sent:* 09 May 2015 09:08
>
>
> *To:* dev
> *Subject:* Re: Clustered deployments of Stratos
>
>
>
> Yes, you could refer to the tables that have the prefix "REG_".
>
>
>
> On Sat, May 9, 2015 at 4:11 AM, Shaheedur Haque (shahhaqu) <
> shahhaqu@cisco.com> wrote:
>
> Can you suggest what tables to look at?
>
>
>
> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
> *Sent:* 07 May 2015 18:00
>
>
> *To:* dev
> *Subject:* Re: Clustered deployments of Stratos
>
>
>
> Hi Shaheed,
>
>
>
> Thanks for the clarification! Maybe the problem is with the MySQL
> active-passive configuration.
>
>
>
> I understand that you are switching the same OpenStack volume from active
> node to the passive node (when the passive node becomes active) therefore
> technically it should work. Maybe we need to investigate this problem
> further by analysing whether data is persisted properly in the active node
> before the passive node becomes active.
>
>
>
> Thanks
>
>
>
> On Tue, May 5, 2015 at 4:22 PM, Shaheedur Haque (shahhaqu) <
> shahhaqu@cisco.com> wrote:
>
> The data is not synchronised between the active and passive nodes. For
> clarity, this is the HA model we had, much as described in the blog:
>
>
>
> ·        2 nodes, with Pacemaker in active-passive mode.
>
> ·        Under Pacemaker control:
>
> o   We run MySQL in active-passive mode, using a single OpenStack volume
> which we attach/reattach as the active role moves around nodes.
>
> o   As the Pacemaker moves the volume, and thus MySQL around on node
> failures, ActiveMQ and Stratos are moved around too.
>
> o   Thus, everything operates in active-passive mode.
>
>
>
> Even in this model, as the active Stratos 4.0 is moved around (i.e. the
> Stratos JVM on the old active node has gone with the node, and Pacemaker
> starts up a new Stratos JVM on what used to be the passive node), we found
> that the Cartridge Definition objects were missing and, as a clumsy
> workaround [1], we had to replay the stored copies of them into Stratos
> using the REST API.
>
>
>
> With Stratos 4.1, using the new object names, early indications are that
> *Deployment Policies* and *Application Deployment* policies are lost as the
> active fails over to the passive. If anything, these objects are more
> likely to hit the problems of [1], since Stratos 4.1 expects these to be
> tweaked on the fly (min/max etc).
>
>
>
> Thanks, Shaheed
>
>
>
> [1] Clearly, this loses any changes that were not in the stored copies.
>
>
>
> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
> *Sent:* 03 May 2015 06:43
> *To:* dev@stratos.apache.org
>
>
> *Subject:* Re: Clustered deployments of Stratos
>
>
>
> Hi Shaheed,
>
>
>
> Thanks for taking time to test this!
>
>
>
> Just to clarify the exact problem, do you mean that data is not
> synchronized between the active and passive nodes or they are not persisted
> in the active node?
>
>
>
> Thanks
>
>
> On Sunday, May 3, 2015, Shaheedur Haque (shahhaqu) <sh...@cisco.com>
> wrote:
>
>
> I have been looking into our use of Linux HA to set up an Active-Passive
> configuration. Testing indicates that in 4.1 (beta1), several objects seem
> not to be persisted properly. This includes at least:
>
> - Cartridges
> - Deployment policies
>
> Am I missing something? Is it safe to work around this by replaying those
> objects?
>  ------------------------------
>
> *From:* Imesh Gunaratne [imesh@apache.org]
> *Sent:* 23 April 2015 10:47
> *To:* dev
> *Subject:* Re: Clustered deployments of Stratos
>
> Hi Shaheed,
>
>
>
> Currently N-way clustering is still not possible with CC, AS & SM. We
> completed the initial phase of this feature; however, the feature as a
> whole was not completed. You could refer to the mail thread "[Discuss]
> Clustering Feature Implementation for 4.1.0-Alpha Release" for details.
>
>
>
> However at present [1] is valid. We could use Linux HA and deploy CC, AS
> and SM in Active-Passive mode.
>
>
>
> Thanks
>
>
>
>
>
>
>
> On Thu, Apr 23, 2015 at 2:41 PM, Shaheedur Haque (shahhaqu) <
> shahhaqu@cisco.com> wrote:
>
> Hi,
>
>
>
> We currently try to achieve HA with Stratos using something so unpleasant
> that I won’t even describe it here :). It has been suggested that Stratos
> has, for a while now, supported a clustered mode of deployment where, given
> N servers:
>
>
>
> ·        The SM, AS and MB operate in a N-way clustered mode
>
> ·        The CEP operates in a N-way loadsharing mode
>
> ·        The Cartridge Agents can react to a failure in one of the N CEPs
> by failing over to one of the other N-1 remaining servers
>
>
>
> In looking for documentation on how to set this up, I came across these
> two write-ups [1] and [2]. Questions:
>
>
>
> ·        Both these documents mention only using N=2. Is that still
> correct?
>
> ·        [1] Seems recently written, and [2] is a little older but not
> much. Are both documents still regarded as current?
>
>
>
> Also, I’d love to hear any other experiences people have of running
> configurations like this.
>
>
>
> Thanks, Shaheed
>
>
>
> [1]
> https://cwiki.apache.org/confluence/display/STRATOS/4.1.0+Configuring+HA+Using+Pacemaker+and+Heartbeat
>
> [2] http://blog.lasindu.com/2014/08/wso2-private-paas-supporting.html
>
>
>
>
>
>
>
>
>
>
>
> --
>
> Imesh Gunaratne
>
>
>
> Technical Lead, WSO2
>
> Committer & PMC Member, Apache Stratos
>
>
>



-- 
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos

RE: Clustered deployments of Stratos

Posted by "Shaheedur Haque (shahhaqu)" <sh...@cisco.com>.
The latter; we never have both Stratos instances running.

From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: 15 May 2015 16:17
To: dev
Cc: Ryan Du Plessis (rdupless); Luca Martini (lmartini)
Subject: Re: Clustered deployments of Stratos

Hi Shaheed,

Do you have both active and passive Stratos nodes running at the same time or do you start the passive node once the active node goes down?

Thanks

On Fri, May 15, 2015 at 6:31 PM, Shaheedur Haque (shahhaqu) <sh...@cisco.com>> wrote:
Hi Imesh,

I finally got round to a proper series of tests, and here are the conclusions:


•        In Stratos 4.0, after a Pacemaker driven failover, the newly Active Stratos has lost all Cartridge Definitions.

•        In current [1] Stratos 4.1, after a Pacemaker driven failover, the newly Active Stratos:

o   Has lost all Deployment Policies.

o   Has lost contact with the Cartridge Agents, and all VMs are stuck with whatever state they had before the failover.

•        Note: I have not verified if Cartridge Groups are lost or not.

I include the test results below at [2] and [3]. I am concerned as to whether 4.1 is ready for GA on this basis, so though more testing is no doubt possible (e.g. Cartridge Groups) I wanted to get this info to the list ASAP.

Thanks, Shaheed

[1] A recent build somewhere between beta 1 and beta 2, but I don’t think any relevant fixes have been made in master.

[2] Persistence test output from Stratos 4.1. Note:


1.      In the build I have, the CLI is broken for a couple of commands; these are supplemented by direct “curl” commands further down.

2.      I’ve used one of our commands to show the instances and their state for a given application since there is not a compact JSON or convenient Stratos CLI for that.

PERSISTENCE TEST, BEFORE FAILOVER
================================

stratos> list-tenants
Tenants:
+-----------------------+-----------+------------------+--------+------------------------------+
| Domain                | Tenant ID | Email            | State  | Created Date                 |
+-----------------------+-----------+------------------+--------+------------------------------+
| cloud1.qmog.cisco.com | 1         | cloud1@cisco.com | Active | Fri May 15 04:46:58 MDT 2015 |
+-----------------------+-----------+------------------+--------+------------------------------+

stratos> list-network-partitions
Network partitions found:
+----------------------+----------------------+
| Network Partition ID | Number of Partitions |
+----------------------+----------------------+
| RegionOne            | 1                    |
+----------------------+----------------------+

stratos> list-deployment-policies
Deployment policies found:
+-------------------+---------------+
| ID                | Accessibility |
+-------------------+---------------+
| static-2-ha       | 1             |
+-------------------+---------------+
| autoscale-2-10-ha | 1             |
+-------------------+---------------+
| autoscale-1-5     | 1             |
+-------------------+---------------+
| static-1          | 1             |
+-------------------+---------------+

stratos> list-application-policies
Error in listing application policies
No application policies found

stratos> list-autoscaling-policies
Error in listing autoscaling policies
No autoscaling policies found

stratos> list-cartridges
Cartridges found:
+------------------+-------------+------------------+----------------------------+---------+--------------+
| Type             | Category    | Name             | Description                | Version | Multi-Tenant |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cartridge-proxy  | Application | cartridge-proxy  | cartridge-proxy Cartridge  | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-sample-vm  | Application | cisco-sample-vm  | cisco-sample-vm Cartridge  | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-cf-01 | Application | cisco-qvpc-cf-01 | cisco-qvpc-cf-01 Cartridge | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-cf-02 | Application | cisco-qvpc-cf-02 | cisco-qvpc-cf-02 Cartridge | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-si    | Application | cisco-qvpc-si    | cisco-qvpc-si Cartridge    | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-sf    | Application | cisco-qvpc-sf    | cisco-qvpc-sf Cartridge    | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+

stratos> list-applications
Applications found:
+-----------------+-----------------+----------+
| Application ID  | Alias           | Status   |
+-----------------+-----------------+----------+
| cartridge-proxy | cartridge-proxy | Deployed |
+-----------------+-----------------+----------+
| cisco-sample-vm | cisco-sample-vm | Deployed |
+-----------------+-----------------+----------+

$ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/autoscalingPolicies
[{"id":"economyPolicy","instanceRoundingFactor":0,"isPublic":false,"loadThresholds":""}]

$ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/applicationPolicies
[{"algorithm":"one-after-another","id":"default-iaas","networkPartitions":["RegionOne"],"properties":{"name":"networkPartitionGroups","value":"RegionOne"}}]


PERSISTENCE TEST, AFTER FAILOVER
===============================

stratos> list-tenants
Tenants:
+-----------------------+-----------+------------------+--------+------------------------------+
| Domain                | Tenant ID | Email            | State  | Created Date                 |
+-----------------------+-----------+------------------+--------+------------------------------+
| cloud1.qmog.cisco.com | 1         | cloud1@cisco.com | Active | Fri May 15 05:26:52 MDT 2015 |
+-----------------------+-----------+------------------+--------+------------------------------+

stratos> list-network-partitions
Network partitions found:
+----------------------+----------------------+
| Network Partition ID | Number of Partitions |
+----------------------+----------------------+
| RegionOne            | 1                    |
+----------------------+----------------------+

stratos> list-deployment-policies
No deployment policies found

stratos> list-application-policies
Error in listing application policies
No application policies found

stratos> list-autoscaling-policies
Error in listing autoscaling policies
No autoscaling policies found

stratos> list-cartridges
Cartridges found:
+------------------+-------------+------------------+----------------------------+---------+--------------+
| Type             | Category    | Name             | Description                | Version | Multi-Tenant |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cartridge-proxy  | Application | cartridge-proxy  | cartridge-proxy Cartridge  | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-sample-vm  | Application | cisco-sample-vm  | cisco-sample-vm Cartridge  | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-cf-01 | Application | cisco-qvpc-cf-01 | cisco-qvpc-cf-01 Cartridge | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-cf-02 | Application | cisco-qvpc-cf-02 | cisco-qvpc-cf-02 Cartridge | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-si    | Application | cisco-qvpc-si    | cisco-qvpc-si Cartridge    | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-sf    | Application | cisco-qvpc-sf    | cisco-qvpc-sf Cartridge    | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+

stratos> list-applications
Applications found:
+-----------------+-----------------+----------+
| Application ID  | Alias           | Status   |
+-----------------+-----------------+----------+
| cartridge-proxy | cartridge-proxy | Deployed |
+-----------------+-----------------+----------+
| cisco-sample-vm | cisco-sample-vm | Deployed |
+-----------------+-----------------+----------+

$ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/autoscalingPolicies
[{"id":"economyPolicy","instanceRoundingFactor":0,"isPublic":false,"loadThresholds":""}]

$ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/applicationPolicies
[{"algorithm":"one-after-another","id":"default-iaas","networkPartitions":["RegionOne"],"properties":{"name":"networkPartitionGroups","value":"RegionOne"}}]

[3] Cartridge test output from Stratos 4.1. Note:


1.      We do not use a VIP for Stratos, either for 4.0 or 4.1.

2.      We expect the Cartridge Agent to use a DNS lookup when it ends up reconnecting, and this worked just fine in Stratos 4.0.
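
The behaviour expected in note 2 can be illustrated with a small Python sketch. This is not the Cartridge Agent's actual code: the function names, the retry count and the delay are all hypothetical, and the only point shown is that the hostname is looked up again before every reconnection attempt, so that a DNS record repointed at the newly active node is picked up.

```python
import socket
import time


def resolve(hostname, port):
    """Look the name up afresh and return the candidate IP addresses."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    return [info[4][0] for info in infos]


def reconnect(hostname, port, attempts=5, delay=1.0,
              resolver=resolve, connect=socket.create_connection,
              sleep=time.sleep):
    """Try to reconnect, repeating the DNS lookup before every attempt.

    resolver, connect and sleep are injectable so that the logic can be
    exercised without a network. All defaults are illustrative assumptions.
    """
    for attempt in range(attempts):
        try:
            # Resolve inside the loop: a cached address would keep pointing
            # at the failed node after the DNS record has been updated.
            for address in resolver(hostname, port):
                try:
                    return connect((address, port), timeout=5)
                except OSError:
                    continue  # this address is down, try the next one
        except OSError:
            pass  # the lookup itself failed, retry after the delay
        if attempt < attempts - 1:
            sleep(delay)
    raise ConnectionError("could not reconnect to %s:%d" % (hostname, port))
```

Resolver-level caching (for example in the JVM or a local caching daemon) can still defeat this, so the sketch only covers the application side.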

CARTRIDGE TEST, BEFORE FAILOVER
==============================

$ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
cisco-sample-vm: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Active 1)
     cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active

CARTRIDGE TEST, AFTER FAILOVER
=============================

$ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
cisco-sample-vm: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Active 1)
     cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active

CARTRIDGE TEST,  AFTER FAILOVER WAIT 5 MINUTES, THEN KILL INSTANCE, THEN WAIT 2 MINUTES
===================================================================================

$ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
cisco-sample-vm: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Active 1)
     cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active



From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: 14 May 2015 20:34

To: dev
Subject: Re: Clustered deployments of Stratos

It would be better to use the REST API to query and see whether the relevant entities are persisted. Since data is stored in binary format in the registry it would be difficult to query the database and verify this.
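
One way to act on this suggestion is to record the IDs returned by each REST list endpoint before the failover and compare them afterwards. The following is a minimal Python sketch, not part of Stratos. The endpoint paths and the admin:admin credentials are taken from the curl commands in this thread; the function names, and the assumption that each endpoint returns a JSON array of objects with an "id" field, are illustrative only.

```python
import base64
import json
import ssl
import urllib.request

# Endpoints seen in the curl commands in this thread; extend as needed.
ENDPOINTS = ["autoscalingPolicies", "applicationPolicies"]


def ids_from_json(body):
    """Extract the set of ids from a JSON array such as the curl output."""
    return {item["id"] for item in json.loads(body)}


def fetch_ids(base_url, endpoint, user="admin", password="admin"):
    """Return the set of "id" values listed by one REST endpoint.

    Certificate checks are disabled, mirroring "curl -k".
    """
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    request = urllib.request.Request("%s/api/%s" % (base_url, endpoint))
    token = base64.b64encode(("%s:%s" % (user, password)).encode()).decode()
    request.add_header("Authorization", "Basic " + token)
    request.add_header("Content-type", "application/json")
    with urllib.request.urlopen(request, context=context) as response:
        return ids_from_json(response.read().decode())


def lost_ids(before, after):
    """Map each endpoint to the sorted ids present before but not after."""
    return {
        endpoint: sorted(ids - after.get(endpoint, set()))
        for endpoint, ids in before.items()
        if ids - after.get(endpoint, set())
    }
```

A snapshot would be taken with something like {e: fetch_ids("https://localhost:9443", e) for e in ENDPOINTS} on the active node, taken again after the failover, and the two passed to lost_ids.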

On Thu, May 14, 2015 at 10:47 PM, Shaheedur Haque (shahhaqu) <sh...@cisco.com>> wrote:
I looked at REG_RESOURCE (as well as a few others) but I’m afraid I am going to need more specifics.

For example, what query would you recommend to look at say deployment policies and cartridge definitions?

From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: 09 May 2015 09:08

To: dev
Subject: Re: Clustered deployments of Stratos

Yes, you could refer to the tables that have the prefix "REG_".

On Sat, May 9, 2015 at 4:11 AM, Shaheedur Haque (shahhaqu) <sh...@cisco.com>> wrote:
Can you suggest what tables to look at?

From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: 07 May 2015 18:00

To: dev
Subject: Re: Clustered deployments of Stratos

Hi Shaheed,

Thanks for the clarification! Maybe the problem is with the MySQL active-passive configuration.

I understand that you are switching the same OpenStack volume from the active node to the passive node (when the passive node becomes active), so technically it should work. Maybe we need to investigate this problem further by analysing whether data is persisted properly in the active node before the passive node becomes active.

Thanks

On Tue, May 5, 2015 at 4:22 PM, Shaheedur Haque (shahhaqu) <sh...@cisco.com>> wrote:
The data is not synchronised between the active and passive nodes. For clarity, this is the HA model we had, much as described in the blog:


•        2 nodes, with Pacemaker in active-passive mode.

•        Under Pacemaker control:

o   We run MySQL in active-passive mode, using a single OpenStack volume which we attach/reattach as the active role moves around nodes.

o   As the Pacemaker moves the volume, and thus MySQL around on node failures, ActiveMQ and Stratos are moved around too.

o   Thus, everything operates in active-passive mode.

Even in this model, as the active Stratos 4.0 is moved around (i.e. the Stratos JVM on the old active node has gone with the node, and Pacemaker starts up a new Stratos JVM on what used to be the passive node), we found that the Cartridge Definition objects were missing and, as a clumsy workaround [1], we had to replay the stored copies of them into Stratos using the REST API.

With Stratos 4.1, using the new object names, early indications are that Deployment Policies and Application Deployment policies are lost as the active fails over to the passive. If anything, these objects are more likely to hit the problems of [1], since Stratos 4.1 expects these to be tweaked on the fly (min/max etc).

Thanks, Shaheed

[1] Clearly, this loses any changes that were not in the stored copies.
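
The replay workaround and the limitation noted in [1] can be sketched in Python as below. This is an illustration under stated assumptions, not the tooling actually used: objects are modelled as plain dicts keyed by ID, the function names are hypothetical, and the step that posts definitions back through the REST API is deliberately left out.

```python
def plan_replay(stored, live):
    """Return the stored definitions whose ids are absent from the server.

    stored and live both map object id -> definition (a plain dict).
    """
    return {oid: definition
            for oid, definition in stored.items()
            if oid not in live}


def unsaved_changes(stored, live):
    """Return ids whose live definition is missing from, or differs from,
    the stored copies.

    Run against the active node before a failover, this lists exactly the
    objects whose current state a later replay could not restore.
    """
    return sorted(oid for oid, definition in live.items()
                  if oid not in stored or stored[oid] != definition)
```

For example, if a deployment policy's min/max values were tweaked at runtime but the stored copy was never refreshed, unsaved_changes reports that policy's ID, and a replay after failover would silently revert it.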

From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: 03 May 2015 06:43
To: dev@stratos.apache.org

Subject: Re: Clustered deployments of Stratos

Hi Shaheed,

Thanks for taking time to test this!

Just to clarify the exact problem, do you mean that data is not synchronized between the active and passive nodes or they are not persisted in the active node?

Thanks

On Sunday, May 3, 2015, Shaheedur Haque (shahhaqu) <sh...@cisco.com>> wrote:

I have been looking into our use of Linux HA to set up an Active-Passive configuration. Testing indicates that in 4.1 (beta1), several objects seem not to be persisted properly. This includes at least:

- Cartridges
- Deployment policies

Am I missing something? Is it safe to work around this by replaying those objects?
________________________________
From: Imesh Gunaratne [imesh@apache.org]
Sent: 23 April 2015 10:47
To: dev
Subject: Re: Clustered deployments of Stratos
Hi Shaheed,

Currently N-way clustering is still not possible with CC, AS & SM. We completed the initial phase of this feature; however, the feature as a whole was not completed. You could refer to the mail thread "[Discuss] Clustering Feature Implementation for 4.1.0-Alpha Release" for details.

However at present [1] is valid. We could use Linux HA and deploy CC, AS and SM in Active-Passive mode.

Thanks



On Thu, Apr 23, 2015 at 2:41 PM, Shaheedur Haque (shahhaqu) <sh...@cisco.com>> wrote:
Hi,

We currently try to achieve HA with Stratos using something so unpleasant that I won’t even describe it here ☺. It has been suggested that Stratos has, for a while now, supported a clustered mode of deployment where, given N servers:


•        The SM, AS and MB operate in a N-way clustered mode

•        The CEP operates in a N-way loadsharing mode

•        The Cartridge Agents can react to a failure in one of the N CEPs by failing over to one of the other N-1 remaining servers

In looking for documentation on how to set this up, I came across these two write-ups [1] and [2]. Questions:


•        Both these documents mention only using N=2. Is that still correct?

•        [1] Seems recently written, and [2] is a little older but not much. Are both documents still regarded as current?

Also, I’d love to hear any other experiences people have of running configurations like this.

Thanks, Shaheed

[1] https://cwiki.apache.org/confluence/display/STRATOS/4.1.0+Configuring+HA+Using+Pacemaker+and+Heartbeat
[2] http://blog.lasindu.com/2014/08/wso2-private-paas-supporting.html






--
Imesh Gunaratne

Technical Lead, WSO2
Committer & PMC Member, Apache Stratos



Re: Clustered deployments of Stratos

Posted by Imesh Gunaratne <im...@apache.org>.
Hi Shaheed,

Do you have both active and passive Stratos nodes running at the same time
or do you start the passive node once the active node goes down?

Thanks

On Fri, May 15, 2015 at 6:31 PM, Shaheedur Haque (shahhaqu) <
shahhaqu@cisco.com> wrote:

>  Hi Imesh,
>
>
>
> I finally got round to a proper series of tests, and here are the
> conclusions:
>
>
>
> ·        In Stratos 4.0, after a Pacemaker driven failover, the newly
> Active Stratos has lost all Cartridge Definitions.
>
> ·        In current [1] Stratos 4.1, after a Pacemaker driven failover,
> the newly Active Stratos:
>
> o   Has lost all Deployment Policies.
>
> o   Has lost contact with the Cartridge Agents, and all VMs are stuck
> with whatever state they had before the failover.
>
> ·        Note: I have not verified if Cartridge Groups are lost or not.
>
>
>
> I include the test results below at [2] and [3]. I am concerned as to
> whether 4.1 is ready for GA on this basis, so though more testing is no
> doubt possible (e.g. Cartridge Groups) I wanted to get this info to the
> list ASAP.
>
>
>
> Thanks, Shaheed
>
>
>
> [1] A recent build somewhere between beta 1 and beta 2, but I don’t think
> any relevant fixes have been made in master.
>
>
>
> [2] Persistence test output from Stratos 4.1. Note:
>
>
>
> 1.      In the build I have, the CLI is broken for a couple of commands;
> these are supplemented by direct “curl” commands further down.
>
> 2.      I’ve used one of our commands to show the instances and their
> state for a given application since there is not a compact JSON or
> convenient Stratos CLI for that.
>
>
>
> *PERSISTENCE TEST, BEFORE FAILOVER*
>
> *================================*
>
>
>
> stratos> list-tenants
>
> Tenants:
>
>
> +-----------------------+-----------+------------------+--------+------------------------------+
>
> | Domain                | Tenant ID | Email            | State  | Created Date                 |
>
> +-----------------------+-----------+------------------+--------+------------------------------+
>
> | cloud1.qmog.cisco.com | 1         | cloud1@cisco.com | Active | Fri May 15 04:46:58 MDT 2015 |
>
>
> +-----------------------+-----------+------------------+--------+------------------------------+
>
>
>
> stratos> list-network-partitions
>
> Network partitions found:
>
> +----------------------+----------------------+
>
> | Network Partition ID | Number of Partitions |
>
> +----------------------+----------------------+
>
> | RegionOne            | 1                    |
>
> +----------------------+----------------------+
>
>
>
> stratos> list-deployment-policies
>
> Deployment policies found:
>
> +-------------------+---------------+
>
> | ID                | Accessibility |
>
> +-------------------+---------------+
>
> | static-2-ha       | 1             |
>
> +-------------------+---------------+
>
> | autoscale-2-10-ha | 1             |
>
> +-------------------+---------------+
>
> | autoscale-1-5     | 1             |
>
> +-------------------+---------------+
>
> | static-1          | 1             |
>
> +-------------------+---------------+
>
>
>
> stratos> list-application-policies
>
> Error in listing application policies
>
> No application policies found
>
>
>
> stratos> list-autoscaling-policies
>
> Error in listing autoscaling policies
>
> No autoscaling policies found
>
>
>
> stratos> list-cartridges
>
> Cartridges found:
>
>
> +------------------+-------------+------------------+----------------------------+---------+--------------+
>
> | Type             | Category    | Name             | Description                | Version | Multi-Tenant |
>
> +------------------+-------------+------------------+----------------------------+---------+--------------+
>
> | cartridge-proxy  | Application | cartridge-proxy  | cartridge-proxy Cartridge  | 1       | false        |
>
> +------------------+-------------+------------------+----------------------------+---------+--------------+
>
> | cisco-sample-vm  | Application | cisco-sample-vm  | cisco-sample-vm Cartridge  | 1       | false        |
>
> +------------------+-------------+------------------+----------------------------+---------+--------------+
>
> | cisco-qvpc-cf-01 | Application | cisco-qvpc-cf-01 | cisco-qvpc-cf-01 Cartridge | 1       | false        |
>
> +------------------+-------------+------------------+----------------------------+---------+--------------+
>
> | cisco-qvpc-cf-02 | Application | cisco-qvpc-cf-02 | cisco-qvpc-cf-02 Cartridge | 1       | false        |
>
> +------------------+-------------+------------------+----------------------------+---------+--------------+
>
> | cisco-qvpc-si    | Application | cisco-qvpc-si    | cisco-qvpc-si Cartridge    | 1       | false        |
>
> +------------------+-------------+------------------+----------------------------+---------+--------------+
>
> | cisco-qvpc-sf    | Application | cisco-qvpc-sf    | cisco-qvpc-sf Cartridge    | 1       | false        |
>
>
> +------------------+-------------+------------------+----------------------------+---------+--------------+
>
>
>
> stratos> list-applications
>
> Applications found:
>
> +-----------------+-----------------+----------+
>
> | Application ID  | Alias           | Status   |
>
> +-----------------+-----------------+----------+
>
> | cartridge-proxy | cartridge-proxy | Deployed |
>
> +-----------------+-----------------+----------+
>
> | cisco-sample-vm | cisco-sample-vm | Deployed |
>
> +-----------------+-----------------+----------+
>
>
>
> $ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/autoscalingPolicies
>
>
> [{"id":"economyPolicy","instanceRoundingFactor":0,"isPublic":false,"loadThresholds":""}]
>
>
>
> $ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/applicationPolicies
>
>
> [{"algorithm":"one-after-another","id":"default-iaas","networkPartitions":["RegionOne"],"properties":{"name":"networkPartitionGroups","value":"RegionOne"}}]
>
>
>
>
>
> *PERSISTENCE TEST, AFTER FAILOVER*
>
> *===============================*
>
>
>
> stratos> list-tenants
>
> Tenants:
>
>
> +-----------------------+-----------+------------------+--------+------------------------------+
>
> | Domain                | Tenant ID | Email            | State  | Created Date                 |
>
> +-----------------------+-----------+------------------+--------+------------------------------+
>
> | cloud1.qmog.cisco.com | 1         | cloud1@cisco.com | Active | Fri May 15 05:26:52 MDT 2015 |
>
>
> +-----------------------+-----------+------------------+--------+------------------------------+
>
>
>
> stratos> list-network-partitions
>
> Network partitions found:
>
> +----------------------+----------------------+
>
> | Network Partition ID | Number of Partitions |
>
> +----------------------+----------------------+
>
> | RegionOne            | 1                    |
>
> +----------------------+----------------------+
>
>
>
> stratos> list-deployment-policies
>
> No deployment policies found
>
>
>
> stratos> list-application-policies
>
> Error in listing application policies
>
> No application policies found
>
>
>
> stratos> list-autoscaling-policies
>
> Error in listing autoscaling policies
>
> No autoscaling policies found
>
>
>
> stratos> list-cartridges
>
> Cartridges found:
>
>
> +------------------+-------------+------------------+----------------------------+---------+--------------+
>
> | Type             | Category    | Name             |
> Description                | Version | Multi-Tenant |
>
>
> +------------------+-------------+------------------+----------------------------+---------+--------------+
>
> | cartridge-proxy  | Application | cartridge-proxy  | cartridge-proxy
> Cartridge  | 1       | false        |
>
>
> +------------------+-------------+------------------+----------------------------+---------+--------------+
>
> | cisco-sample-vm  | Application | cisco-sample-vm  | cisco-sample-vm
> Cartridge  | 1       | false        |
>
>
> +------------------+-------------+------------------+----------------------------+---------+--------------+
>
> | cisco-qvpc-cf-01 | Application | cisco-qvpc-cf-01 | cisco-qvpc-cf-01
> Cartridge | 1       | false        |
>
>
> +------------------+-------------+------------------+----------------------------+---------+--------------+
>
> | cisco-qvpc-cf-02 | Application | cisco-qvpc-cf-02 | cisco-qvpc-cf-02
> Cartridge | 1       | false        |
>
>
> +------------------+-------------+------------------+----------------------------+---------+--------------+
>
> | cisco-qvpc-si    | Application | cisco-qvpc-si    | cisco-qvpc-si
> Cartridge    | 1       | false        |
>
>
> +------------------+-------------+------------------+----------------------------+---------+--------------+
>
> | cisco-qvpc-sf    | Application | cisco-qvpc-sf    | cisco-qvpc-sf
> Cartridge    | 1       | false        |
>
>
> +------------------+-------------+------------------+----------------------------+---------+--------------+
>
>
>
> stratos> list-applications
>
> Applications found:
>
> +-----------------+-----------------+----------+
>
> | Application ID  | Alias           | Status   |
>
> +-----------------+-----------------+----------+
>
> | cartridge-proxy | cartridge-proxy | Deployed |
>
> +-----------------+-----------------+----------+
>
> | cisco-sample-vm | cisco-sample-vm | Deployed |
>
> +-----------------+-----------------+----------+
>
>
>
> $ curl -uadmin:admin -k -H'Content-type: application/json'
> https://localhost:9443/api/autoscalingPolicies
>
>
> [{"id":"economyPolicy","instanceRoundingFactor":0,"isPublic":false,"loadThresholds":""}]
>
>
>
> $ curl -uadmin:admin -k -H'Content-type: application/json'
> https://localhost:9443/api/applicationPolicies
>
>
> [{"algorithm":"one-after-another","id":"default-iaas","networkPartitions":["RegionOne"],"properties":{"name":"networkPartitionGroups","value":"RegionOne"}}]
>
>
>
> [3] Cartridge test output from Stratos 4.1. Note:
>
>
>
> 1.      We do not use a VIP for Stratos, either for 4.0 or 4.1.
>
> 2.      We expect the Cartridge Agent to use a DNS lookup when it ends up
> reconnecting, and this worked just fine in Stratos 4.0.
>
>
>
> *CARTRIDGE TEST, BEFORE FAILOVER*
>
> *==============================*
>
>
>
> $ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
>
> cisco-sample-vm: applicationInstances 1, groupInstances 0,
> clusterInstances 1, members 1 (Active 1)
>
>      cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active
>
>
>
> *CARTRIDGE TEST, AFTER FAILOVER*
>
> *=============================*
>
>
>
> $ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
>
> cisco-sample-vm: applicationInstances 1, groupInstances 0,
> clusterInstances 1, members 1 (Active 1)
>
>      cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active
>
>
>
> *CARTRIDGE TEST,  AFTER FAILOVER WAIT 5 MINUTES, THEN KILL INSTANCE, THEN
> WAIT 2 MINUTES*
>
>
> *===================================================================================*
>
>
>
> $ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
>
> cisco-sample-vm: applicationInstances 1, groupInstances 0,
> clusterInstances 1, members 1 (Active 1)
>
>      cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active
>
>
>
>
>
>
>
> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
> *Sent:* 14 May 2015 20:34
>
> *To:* dev
> *Subject:* Re: Clustered deployments of Stratos
>
>
>
> It would be better to use the REST API to query and see whether the
> relevant entities are persisted. Since data is stored in binary format in
> the registry it would be difficult to query the database and verify this.
>
>
>
> On Thu, May 14, 2015 at 10:47 PM, Shaheedur Haque (shahhaqu) <
> shahhaqu@cisco.com> wrote:
>
> I looked at REG_RESOURCE (as well as a few others) but I’m afraid I am
> going to need more specifics.
>
>
>
> For example, what query would you recommend to look at say deployment
> policies and cartridge definitions?
>
>
>
> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
> *Sent:* 09 May 2015 09:08
>
>
> *To:* dev
> *Subject:* Re: Clustered deployments of Stratos
>
>
>
> Yes, you could refer to the tables that have the prefix "REG_".
>
>
>
> On Sat, May 9, 2015 at 4:11 AM, Shaheedur Haque (shahhaqu) <
> shahhaqu@cisco.com> wrote:
>
> Can you suggest what tables to look at?
>
>
>
> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
> *Sent:* 07 May 2015 18:00
>
>
> *To:* dev
> *Subject:* Re: Clustered deployments of Stratos
>
>
>
> Hi Shaheed,
>
>
>
> Thanks for the clarification! Maybe the problem is with the MySQL
> active-passive configuration.
>
>
>
> I understand that you are switching the same OpenStack volume from the active
> node to the passive node (when the passive node becomes active), so
> technically it should work. Maybe we need to investigate this problem
> further by analysing whether data is persisted properly in the active node
> before the passive node becomes active.
>
>
>
> Thanks
>
>
>
> On Tue, May 5, 2015 at 4:22 PM, Shaheedur Haque (shahhaqu) <
> shahhaqu@cisco.com> wrote:
>
> The data is not synchronised between the active and passive nodes. For
> clarity, this is the HA model we had, much as described in the blog:
>
>
>
> ·        2 nodes, with Pacemaker in active-passive mode.
>
> ·        Under Pacemaker control:
>
> o   We run MySQL in active-passive mode, using a single OpenStack volume
> which we attach/reattach as the active role moves around nodes.
>
> o   As the Pacemaker moves the volume, and thus MySQL around on node
> failures, ActiveMQ and Stratos are moved around too.
>
> o   Thus, everything operates in active-passive mode.
>
>
>
> Even in this model, as the active Stratos 4.0 is moved around (i.e. the
> Stratos JVM on the old active node has gone with the node, and Pacemaker
> starts up a new Stratos JVM on what used to be the passive node), we found
> that the Cartridge Definition objects were missing and, as a
> clumsy workaround [1], we had to replay the stored copies of them into
> Stratos using the REST API.
>
>
>
> With Stratos 4.1, using the new object names, early indications are that
> *Deployment Policies* and *Application Deployment* policies are lost as the active
> fails over to the passive. If anything, these objects are more likely to
> hit the problems of [1], since Stratos 4.1 expects these to be tweaked on
> the fly (min/max etc).
>
>
>
> Thanks, Shaheed
>
>
>
> [1] Clearly, this loses any changes that were not in the stored copies.
>
>
>
> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
> *Sent:* 03 May 2015 06:43
> *To:* dev@stratos.apache.org
>
>
> *Subject:* Re: Clustered deployments of Stratos
>
>
>
> Hi Shaheed,
>
>
>
> Thanks for taking time to test this!
>
>
>
> Just to clarify the exact problem, do you mean that data is not
> synchronized between the active and passive nodes or they are not persisted
> in the active node?
>
>
>
> Thanks
>
>
> On Sunday, May 3, 2015, Shaheedur Haque (shahhaqu) <sh...@cisco.com>
> wrote:
>
>
> I have been looking into our use of Linux HA to set up an Active-Passive
> configuration. Testing indicates that in 4.1 (beta1), several objects seem
> not to be persisted properly. This includes at least:
>
> - Cartridges
> - Deployment policies
>
> Am I missing something? Is it safe to work around this by replaying those
> objects?
>  ------------------------------
>
> *From:* Imesh Gunaratne [imesh@apache.org]
> *Sent:* 23 April 2015 10:47
> *To:* dev
> *Subject:* Re: Clustered deployments of Stratos
>
> Hi Shaheed,
>
>
>
> Currently N-way clustering is still not possible with CC, AS & SM. We
> completed the initial phase of this feature; however, the feature as a
> whole was not finished. You could refer to the mail thread "[Discuss]
> Clustering Feature Implementation for 4.1.0-Alpha Release" for details.
>
>
>
> However at present [1] is valid. We could use Linux HA and deploy CC, AS
> and SM in Active-Passive mode.
>
>
>
> Thanks
>
>
>
>
>
>
>
> On Thu, Apr 23, 2015 at 2:41 PM, Shaheedur Haque (shahhaqu) <
> shahhaqu@cisco.com> wrote:
>
> Hi,
>
>
>
> We currently try to achieve HA with Stratos using something so unpleasant
> that I won’t even describe it here :). It has been suggested that Stratos
> has, for a while now, supported a clustered mode of deployment where, given
> N servers:
>
>
>
> ·        The SM, AS and MB operate in a N-way clustered mode
>
> ·        The CEP operates in a N-way loadsharing mode
>
> ·        The Cartridge Agents can react to a failure in one of the N CEPs
> by failing over to one of the other N-1 remaining servers
>
>
>
> In looking for documentation on how to set this up, I came across these
> two write-ups [1] and [2]. Questions:
>
>
>
> ·        Both these documents mention only using N=2. Is that still
> correct?
>
> ·        [1] Seems recently written, and [2] is a little older but not
> much. Are both documents still regarded as current?
>
>
>
> Also, I’d love to hear any other experiences people have of running
> configurations like this.
>
>
>
> Thanks, Shaheed
>
>
>
> [1]
> https://cwiki.apache.org/confluence/display/STRATOS/4.1.0+Configuring+HA+Using+Pacemaker+and+Heartbeat
>
> [2] http://blog.lasindu.com/2014/08/wso2-private-paas-supporting.html
>
>
>
>
>
>
>
>
>
>
>
> --
>
> Imesh Gunaratne
>
>
>
> Technical Lead, WSO2
>
> Committer & PMC Member, Apache Stratos
>
>
>
> --
>
> Imesh Gunaratne
>
>
>
> Senior Technical Lead, WSO2
>
> Committer & PMC Member, Apache Stratos
>



-- 
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos

RE: Clustered deployments of Stratos

Posted by "Shaheedur Haque (shahhaqu)" <sh...@cisco.com>.
Hi Imesh,

I finally got round to a proper series of tests, and here are the conclusions:


·        In Stratos 4.0, after a Pacemaker driven failover, the newly Active Stratos has lost all Cartridge Definitions.

·        In current [1] Stratos 4.1, after a Pacemaker driven failover, the newly Active Stratos:

o   Has lost all Deployment Policies.

o   Has lost contact with the Cartridge Agents, and all VMs are stuck with whatever state they had before the failover.

·        Note: I have not verified if Cartridge Groups are lost or not.

I include the test results below at [2] and [3]. I am concerned as to whether 4.1 is ready for GA on this basis, so though more testing is no doubt possible (e.g. Cartridge Groups) I wanted to get this info to the list ASAP.

Thanks, Shaheed

[1] A recent build somewhere between beta 1 and beta 2, but I don’t think any relevant fixes have been made in master.

[2] Persistence test output from Stratos 4.1. Note:


1.      In the build I have, the CLI is broken for a couple of commands; these are supplemented by direct “curl” commands further down.

2.      I’ve used one of our commands to show the instances and their state for a given application since there is not a compact JSON or convenient Stratos CLI for that.

PERSISTENCE TEST, BEFORE FAILOVER
================================

stratos> list-tenants
Tenants:
+-----------------------+-----------+------------------+--------+------------------------------+
| Domain                | Tenant ID | Email            | State  | Created Date                 |
+-----------------------+-----------+------------------+--------+------------------------------+
| cloud1.qmog.cisco.com | 1         | cloud1@cisco.com | Active | Fri May 15 04:46:58 MDT 2015 |
+-----------------------+-----------+------------------+--------+------------------------------+

stratos> list-network-partitions
Network partitions found:
+----------------------+----------------------+
| Network Partition ID | Number of Partitions |
+----------------------+----------------------+
| RegionOne            | 1                    |
+----------------------+----------------------+

stratos> list-deployment-policies
Deployment policies found:
+-------------------+---------------+
| ID                | Accessibility |
+-------------------+---------------+
| static-2-ha       | 1             |
+-------------------+---------------+
| autoscale-2-10-ha | 1             |
+-------------------+---------------+
| autoscale-1-5     | 1             |
+-------------------+---------------+
| static-1          | 1             |
+-------------------+---------------+

stratos> list-application-policies
Error in listing application policies
No application policies found

stratos> list-autoscaling-policies
Error in listing autoscaling policies
No autoscaling policies found

stratos> list-cartridges
Cartridges found:
+------------------+-------------+------------------+----------------------------+---------+--------------+
| Type             | Category    | Name             | Description                | Version | Multi-Tenant |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cartridge-proxy  | Application | cartridge-proxy  | cartridge-proxy Cartridge  | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-sample-vm  | Application | cisco-sample-vm  | cisco-sample-vm Cartridge  | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-cf-01 | Application | cisco-qvpc-cf-01 | cisco-qvpc-cf-01 Cartridge | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-cf-02 | Application | cisco-qvpc-cf-02 | cisco-qvpc-cf-02 Cartridge | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-si    | Application | cisco-qvpc-si    | cisco-qvpc-si Cartridge    | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-sf    | Application | cisco-qvpc-sf    | cisco-qvpc-sf Cartridge    | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+

stratos> list-applications
Applications found:
+-----------------+-----------------+----------+
| Application ID  | Alias           | Status   |
+-----------------+-----------------+----------+
| cartridge-proxy | cartridge-proxy | Deployed |
+-----------------+-----------------+----------+
| cisco-sample-vm | cisco-sample-vm | Deployed |
+-----------------+-----------------+----------+

$ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/autoscalingPolicies
[{"id":"economyPolicy","instanceRoundingFactor":0,"isPublic":false,"loadThresholds":""}]

$ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/applicationPolicies
[{"algorithm":"one-after-another","id":"default-iaas","networkPartitions":["RegionOne"],"properties":{"name":"networkPartitionGroups","value":"RegionOne"}}]


PERSISTENCE TEST, AFTER FAILOVER
===============================

stratos> list-tenants
Tenants:
+-----------------------+-----------+------------------+--------+------------------------------+
| Domain                | Tenant ID | Email            | State  | Created Date                 |
+-----------------------+-----------+------------------+--------+------------------------------+
| cloud1.qmog.cisco.com | 1         | cloud1@cisco.com | Active | Fri May 15 05:26:52 MDT 2015 |
+-----------------------+-----------+------------------+--------+------------------------------+

stratos> list-network-partitions
Network partitions found:
+----------------------+----------------------+
| Network Partition ID | Number of Partitions |
+----------------------+----------------------+
| RegionOne            | 1                    |
+----------------------+----------------------+

stratos> list-deployment-policies
No deployment policies found

stratos> list-application-policies
Error in listing application policies
No application policies found

stratos> list-autoscaling-policies
Error in listing autoscaling policies
No autoscaling policies found

stratos> list-cartridges
Cartridges found:
+------------------+-------------+------------------+----------------------------+---------+--------------+
| Type             | Category    | Name             | Description                | Version | Multi-Tenant |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cartridge-proxy  | Application | cartridge-proxy  | cartridge-proxy Cartridge  | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-sample-vm  | Application | cisco-sample-vm  | cisco-sample-vm Cartridge  | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-cf-01 | Application | cisco-qvpc-cf-01 | cisco-qvpc-cf-01 Cartridge | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-cf-02 | Application | cisco-qvpc-cf-02 | cisco-qvpc-cf-02 Cartridge | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-si    | Application | cisco-qvpc-si    | cisco-qvpc-si Cartridge    | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+
| cisco-qvpc-sf    | Application | cisco-qvpc-sf    | cisco-qvpc-sf Cartridge    | 1       | false        |
+------------------+-------------+------------------+----------------------------+---------+--------------+

stratos> list-applications
Applications found:
+-----------------+-----------------+----------+
| Application ID  | Alias           | Status   |
+-----------------+-----------------+----------+
| cartridge-proxy | cartridge-proxy | Deployed |
+-----------------+-----------------+----------+
| cisco-sample-vm | cisco-sample-vm | Deployed |
+-----------------+-----------------+----------+

$ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/autoscalingPolicies
[{"id":"economyPolicy","instanceRoundingFactor":0,"isPublic":false,"loadThresholds":""}]

$ curl -uadmin:admin -k -H'Content-type: application/json' https://localhost:9443/api/applicationPolicies
[{"algorithm":"one-after-another","id":"default-iaas","networkPartitions":["RegionOne"],"properties":{"name":"networkPartitionGroups","value":"RegionOne"}}]
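
The before/after comparison in [2] can be automated roughly as follows. This is only a sketch: the function names are made up for illustration, and the two sample strings are the `list-deployment-policies` outputs copied from the listings above. It assumes the first column of each CLI table is the identifier.

```python
# Sample CLI output copied from the thread (deployment policies, before and after failover).
BEFORE = """\
Deployment policies found:
+-------------------+---------------+
| ID                | Accessibility |
+-------------------+---------------+
| static-2-ha       | 1             |
+-------------------+---------------+
| autoscale-2-10-ha | 1             |
+-------------------+---------------+
| autoscale-1-5     | 1             |
+-------------------+---------------+
| static-1          | 1             |
+-------------------+---------------+
"""

AFTER = "No deployment policies found\n"


def parse_cli_table(text):
    """Return the data rows of a '+---+' bordered table as lists of cell strings."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders, prompts and status messages
        rows.append([cell.strip() for cell in line.strip("|").split("|")])
    return rows[1:]  # drop the header row (empty list if there was no table)


def first_column(text):
    """Set of first-column values, assumed here to be the identifiers."""
    return {row[0] for row in parse_cli_table(text)}


def lost_after_failover(before, after):
    """Identifiers listed before the failover but not after it."""
    return sorted(first_column(before) - first_column(after))


print(lost_after_failover(BEFORE, AFTER))
```

Running the same comparison over the cartridge and application listings should report nothing lost, which matches the output above.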

[3] Cartridge test output from Stratos 4.1. Note:


1.      We do not use a VIP for Stratos, either for 4.0 or 4.1.

2.      We expect the Cartridge Agent to use a DNS lookup when it ends up reconnecting, and this worked just fine in Stratos 4.0.

CARTRIDGE TEST, BEFORE FAILOVER
==============================

$ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
cisco-sample-vm: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Active 1)
     cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active

CARTRIDGE TEST, AFTER FAILOVER
=============================

$ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
cisco-sample-vm: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Active 1)
     cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active

CARTRIDGE TEST,  AFTER FAILOVER WAIT 5 MINUTES, THEN KILL INSTANCE, THEN WAIT 2 MINUTES
===================================================================================

$ ./bin/orchestration subscription list-instances --admin cisco-sample-vm
cisco-sample-vm: applicationInstances 1, groupInstances 0, clusterInstances 1, members 1 (Active 1)
     cisco-sample-vm: 172.16.180.30/10.0.0.101: status Active



From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: 14 May 2015 20:34
To: dev
Subject: Re: Clustered deployments of Stratos

It would be better to use the REST API to query and see whether the relevant entities are persisted. Since data is stored in binary format in the registry it would be difficult to query the database and verify this.

On Thu, May 14, 2015 at 10:47 PM, Shaheedur Haque (shahhaqu) <sh...@cisco.com> wrote:
I looked at REG_RESOURCE (as well as a few others) but I’m afraid I am going to need more specifics.

For example, what query would you recommend to look at, say, deployment policies and cartridge definitions?

From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: 09 May 2015 09:08

To: dev
Subject: Re: Clustered deployments of Stratos

Yes, you could refer to the tables that have the prefix "REG_".

On Sat, May 9, 2015 at 4:11 AM, Shaheedur Haque (shahhaqu) <sh...@cisco.com> wrote:
Can you suggest what tables to look at?

From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: 07 May 2015 18:00

To: dev
Subject: Re: Clustered deployments of Stratos

Hi Shaheed,

Thanks for the clarification! Maybe the problem is with the MySQL active-passive configuration.

I understand that you are switching the same OpenStack volume from the active node to the passive node (when the passive node becomes active), so technically it should work. Maybe we need to investigate this problem further by analysing whether data is persisted properly in the active node before the passive node becomes active.

Thanks

On Tue, May 5, 2015 at 4:22 PM, Shaheedur Haque (shahhaqu) <sh...@cisco.com> wrote:
The data is not synchronised between the active and passive nodes. For clarity, this is the HA model we had, much as described in the blog:


•        2 nodes, with Pacemaker in active-passive mode.

•        Under Pacemaker control:

o   We run MySQL in active-passive mode, using a single OpenStack volume which we attach/reattach as the active role moves around nodes.

o   As the Pacemaker moves the volume, and thus MySQL around on node failures, ActiveMQ and Stratos are moved around too.

o   Thus, everything operates in active-passive mode.

Even in this model, as the active Stratos 4.0 is moved around (i.e. the Stratos JVM on the old active node has gone with the node, and Pacemaker starts up a new Stratos JVM on what used to be the passive node), we found that the Cartridge Definition objects were missing and, as a clumsy workaround [1], we had to replay the stored copies of them into Stratos using the REST API.

With Stratos 4.1, using the new object names, early indications are that Deployment Policies and Application Deployment policies are lost as the active fails over to the passive. If anything, these objects are more likely to hit the problems of [1], since Stratos 4.1 expects these to be tweaked on the fly (min/max etc).

Thanks, Shaheed

[1] Clearly, this loses any changes that were not in the stored copies.
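
For illustration only, the replay workaround could be sketched like this. Everything here is an assumption rather than our actual tooling: the stored copies are taken to be one JSON file per object, each object is taken to carry an "id" key, and the `post` callback stands in for whatever REST call re-creates an object (the thread does not name that endpoint).

```python
import json
from pathlib import Path


def load_stored(directory):
    """Load every *.json file in `directory` as one stored definition (assumed layout)."""
    return [json.loads(path.read_text()) for path in sorted(Path(directory).glob("*.json"))]


def plan_replay(stored, live_ids, key="id"):
    """Return the stored definitions whose key is absent from the newly active server."""
    return [definition for definition in stored if definition[key] not in live_ids]


def replay(stored, live_ids, post, key="id"):
    """Call post(definition) for each missing definition and return the replayed keys.

    `post` is a caller-supplied function; it is a placeholder for the real REST call.
    """
    replayed = []
    for definition in plan_replay(stored, live_ids, key):
        post(definition)
        replayed.append(definition[key])
    return replayed
```

Only objects that are absent get replayed, so anything that was changed on the fly after the stored copy was taken comes back in its stored form, which is exactly the limitation noted in [1].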

From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: 03 May 2015 06:43
To: dev@stratos.apache.org

Subject: Re: Clustered deployments of Stratos

Hi Shaheed,

Thanks for taking time to test this!

Just to clarify the exact problem, do you mean that data is not synchronized between the active and passive nodes or they are not persisted in the active node?

Thanks

On Sunday, May 3, 2015, Shaheedur Haque (shahhaqu) <sh...@cisco.com> wrote:

I have been looking into our use of Linux HA to set up an Active-Passive configuration. Testing indicates that in 4.1 (beta1), several objects seem not to be persisted properly. This includes at least:

- Cartridges
- Deployment policies

Am I missing something? Is it safe to work around this by replaying those objects?
________________________________
From: Imesh Gunaratne [imesh@apache.org]
Sent: 23 April 2015 10:47
To: dev
Subject: Re: Clustered deployments of Stratos
Hi Shaheed,

Currently N-way clustering is still not possible with CC, AS & SM. We completed the initial phase of this feature; however, the feature as a whole was not finished. You could refer to the mail thread "[Discuss] Clustering Feature Implementation for 4.1.0-Alpha Release" for details.

However at present [1] is valid. We could use Linux HA and deploy CC, AS and SM in Active-Passive mode.

Thanks



On Thu, Apr 23, 2015 at 2:41 PM, Shaheedur Haque (shahhaqu) <sh...@cisco.com> wrote:
Hi,

We currently try to achieve HA with Stratos using something so unpleasant that I won’t even describe it here ☺. It has been suggested that Stratos has, for a while now, supported a clustered mode of deployment where, given N servers:


•        The SM, AS and MB operate in a N-way clustered mode

•        The CEP operates in a N-way loadsharing mode

•        The Cartridge Agents can react to a failure in one of the N CEPs by failing over to one of the other N-1 remaining servers

In looking for documentation on how to set this up, I came across these two write-ups [1] and [2]. Questions:


•        Both these documents mention only using N=2. Is that still correct?

•        [1] Seems recently written, and [2] is a little older but not much. Are both documents still regarded as current?

Also, I’d love to hear any other experiences people have of running configurations like this.

Thanks, Shaheed

[1] https://cwiki.apache.org/confluence/display/STRATOS/4.1.0+Configuring+HA+Using+Pacemaker+and+Heartbeat
[2] http://blog.lasindu.com/2014/08/wso2-private-paas-supporting.html






--
Imesh Gunaratne

Technical Lead, WSO2
Committer & PMC Member, Apache Stratos


--
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos





Re: Clustered deployments of Stratos

Posted by Imesh Gunaratne <im...@apache.org>.
It would be better to use the REST API to query and see whether the
relevant entities are persisted. Since data is stored in binary format in
the registry it would be difficult to query the database and verify this.
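
A minimal sketch of such a check, assuming the same base URL and admin credentials as the curl commands elsewhere in this thread. Only the two resource paths below appear in the thread; any other path would be a guess. The function names are illustrative, and certificate checks are disabled in the same way as `curl -k`.

```python
import base64
import json
import ssl
import urllib.request

# Only these two resource paths appear in the thread; others would be assumptions.
RESOURCES = ["autoscalingPolicies", "applicationPolicies"]


def ids_from_json(body):
    """Extract the 'id' of every entity in a JSON array response."""
    return {entity["id"] for entity in json.loads(body)}


def fetch(base_url, resource, user="admin", password="admin"):
    """GET <base_url>/api/<resource> with basic auth, skipping certificate checks."""
    request = urllib.request.Request(f"{base_url}/api/{resource}")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Content-Type", "application/json")
    context = ssl.create_default_context()
    context.check_hostname = False  # must be cleared before verify_mode is relaxed
    context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context) as response:
        return response.read().decode()


def snapshot(base_url):
    """Map each resource name to the set of entity ids the server currently reports."""
    return {resource: ids_from_json(fetch(base_url, resource)) for resource in RESOURCES}


def missing(before, after):
    """Map each resource to the ids present before the failover but absent after."""
    lost = {}
    for resource, ids in before.items():
        gone = ids - after.get(resource, set())
        if gone:
            lost[resource] = sorted(gone)
    return lost
```

Take one `snapshot("https://localhost:9443")` on the active node before the failover and another on the newly active node afterwards; an empty result from `missing(before, after)` would suggest those entities were persisted.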

On Thu, May 14, 2015 at 10:47 PM, Shaheedur Haque (shahhaqu) <
shahhaqu@cisco.com> wrote:

>  I looked at REG_RESOURCE (as well as a few others) but I’m afraid I am
> going to need more specifics.
>
>
>
> For example, what query would you recommend to look at say deployment
> policies and cartridge definitions?
>
>
>
> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
> *Sent:* 09 May 2015 09:08
>
> *To:* dev
> *Subject:* Re: Clustered deployments of Stratos
>
>
>
> Yes, you could refer to the tables that have the prefix "REG_".
>
>
>
> On Sat, May 9, 2015 at 4:11 AM, Shaheedur Haque (shahhaqu) <
> shahhaqu@cisco.com> wrote:
>
> Can you suggest what tables to look at?
>
>
>
> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
> *Sent:* 07 May 2015 18:00
>
>
> *To:* dev
> *Subject:* Re: Clustered deployments of Stratos
>
>
>
> Hi Shaheed,
>
>
>
> Thanks for the clarification! Maybe the problem is with the MySQL
> active-passive configuration.
>
>
>
> I understand that you are switching the same OpenStack volume from the active
> node to the passive node (when the passive node becomes active), so
> technically it should work. Maybe we need to investigate this problem
> further by analysing whether data is persisted properly in the active node
> before the passive node becomes active.
>
>
>
> Thanks
>
>
>
> On Tue, May 5, 2015 at 4:22 PM, Shaheedur Haque (shahhaqu) <
> shahhaqu@cisco.com> wrote:
>
> The data is not synchronised between the active and passive nodes. For
> clarity, this is the HA model we had, much as described in the blog:
>
>
>
> ·        2 nodes, with Pacemaker in active-passive mode.
>
> ·        Under Pacemaker control:
>
> o   We run MySQL in active-passive mode, using a single OpenStack volume
> which we attach/reattach as the active role moves around nodes.
>
> o   As the Pacemaker moves the volume, and thus MySQL around on node
> failures, ActiveMQ and Stratos are moved around too.
>
> o   Thus, everything operates in active-passive mode.
>
>
>
> Even in this model, as the active Stratos 4.0 is moved around (i.e. the
> Stratos JVM on the old active node has gone with the node, and Pacemaker
> starts up a new Stratos JVM on what used to be the passive node), we found
> that the Cartridge Definition objects were missing and, as a
> clumsy workaround [1], we had to replay the stored copies of them into
> Stratos using the REST API.
>
>
>
> With Stratos 4.1, using the new object names, early indications are that
> *Deployment Policies* and *Application Deployment* policies are lost as the active
> fails over to the passive. If anything, these objects are more likely to
> hit the problems of [1], since Stratos 4.1 expects these to be tweaked on
> the fly (min/max etc).
>
>
>
> Thanks, Shaheed
>
>
>
> [1] Clearly, this loses any changes that were not in the stored copies.
>
>
>
> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
> *Sent:* 03 May 2015 06:43
> *To:* dev@stratos.apache.org
>
>
> *Subject:* Re: Clustered deployments of Stratos
>
>
>
> Hi Shaheed,
>
>
>
> Thanks for taking time to test this!
>
>
>
> Just to clarify the exact problem, do you mean that data is not
> synchronized between the active and passive nodes or they are not persisted
> in the active node?
>
>
>
> Thanks
>
>
> On Sunday, May 3, 2015, Shaheedur Haque (shahhaqu) <sh...@cisco.com>
> wrote:
>
>
> I have been looking into our use of Linux HA to set up an Active-Passive
> configuration. Testing indicates that in 4.1 (beta1), several objects seem
> not to be persisted properly. This includes at least:
>
> - Cartridges
> - Deployment policies
>
> Am I missing something? Is it safe to work around this by replaying those
> objects?
>  ------------------------------
>
> *From:* Imesh Gunaratne [imesh@apache.org]
> *Sent:* 23 April 2015 10:47
> *To:* dev
> *Subject:* Re: Clustered deployments of Stratos
>
> Hi Shaheed,
>
>
>
> Currently N-way clustering is still not possible with CC, AS & SM. We
> completed the initial phase of this feature; however, the feature as a
> whole was not finished. You could refer to the mail thread "[Discuss]
> Clustering Feature Implementation for 4.1.0-Alpha Release" for details.
>
>
>
> However at present [1] is valid. We could use Linux HA and deploy CC, AS
> and SM in Active-Passive mode.
>
>
>
> Thanks
>
>
>
>
>
>
>
> On Thu, Apr 23, 2015 at 2:41 PM, Shaheedur Haque (shahhaqu) <
> shahhaqu@cisco.com> wrote:
>
> Hi,
>
>
>
> We currently try to achieve HA with Stratos using something so unpleasant
> that I won’t even describe it here :). It has been suggested that Stratos
> has, for a while now, supported a clustered mode of deployment where, given
> N servers:
>
>
>
> ·        The SM, AS and MB operate in a N-way clustered mode
>
> ·        The CEP operates in a N-way loadsharing mode
>
> ·        The Cartridge Agents can react to a failure in one of the N CEPs
> by failing over to one of the other N-1 remaining servers
>
>
>
> In looking for documentation on how to set this up, I came across these
> two write-ups [1] and [2]. Questions:
>
>
>
> ·        Both these documents mention only using N=2. Is that still
> correct?
>
> ·        [1] Seems recently written, and [2] is a little older but not
> much. Are both documents still regarded as current?
>
>
>
> Also, I’d love to hear any other experiences people have of running
> configurations like this.
>
>
>
> Thanks, Shaheed
>
>
>
> [1]
> https://cwiki.apache.org/confluence/display/STRATOS/4.1.0+Configuring+HA+Using+Pacemaker+and+Heartbeat
>
> [2] http://blog.lasindu.com/2014/08/wso2-private-paas-supporting.html
>
>
>
>
>
>
>
>
>
>
>
> --
>
> Imesh Gunaratne
>
>
>
> Technical Lead, WSO2
>
> Committer & PMC Member, Apache Stratos
>
>
>
> --
>
> Imesh Gunaratne
>
>
>
> Senior Technical Lead, WSO2
>
> Committer & PMC Member, Apache Stratos
>
>
>
>
>
>
>
> --
>
> Imesh Gunaratne
>
>
>
> Senior Technical Lead, WSO2
>
> Committer & PMC Member, Apache Stratos
>
>
>
>
>
> --
>
> Imesh Gunaratne
>
>
>
> Senior Technical Lead, WSO2
>
> Committer & PMC Member, Apache Stratos
>



-- 
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos

RE: Clustered deployments of Stratos

Posted by "Shaheedur Haque (shahhaqu)" <sh...@cisco.com>.
I looked at REG_RESOURCEs (as well as a few others), but I’m afraid I am going to need more specifics.

For example, what query would you recommend for looking at, say, deployment policies and cartridge definitions?

From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: 09 May 2015 09:08
To: dev
Subject: Re: Clustered deployments of Stratos

Yes, you could refer to the tables that have the prefix "REG_".

On Sat, May 9, 2015 at 4:11 AM, Shaheedur Haque (shahhaqu) <sh...@cisco.com> wrote:
Can you suggest what tables to look at?

From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: 07 May 2015 18:00

To: dev
Subject: Re: Clustered deployments of Stratos

Hi Shaheed,

Thanks for the clarification! Maybe the problem is with the MySQL active-passive configuration.

I understand that you are switching the same OpenStack volume from the active node to the passive node (when the passive node becomes active); therefore, technically, it should work. Maybe we need to investigate this problem further by analysing whether data is persisted properly in the active node before the passive node becomes active.

Thanks

On Tue, May 5, 2015 at 4:22 PM, Shaheedur Haque (shahhaqu) <sh...@cisco.com> wrote:
The data is not synchronised between the active and passive nodes. For clarity, this is the HA model we had, much as described in the blog:


•        2 nodes, with Pacemaker in active-passive mode.

•        Under Pacemaker control:

o   We run MySQL in active-passive mode, using a single OpenStack volume which we attach/reattach as the active role moves around nodes.

o   As the Pacemaker moves the volume, and thus MySQL around on node failures, ActiveMQ and Stratos are moved around too.

o   Thus, everything operates in active-passive mode.

Even in this model, as the active Stratos 4.0 is moved around (i.e. the Stratos JVM on the old active node has gone with the node, and Pacemaker starts up a new Stratos JVM on what used to be the passive node), we found that the Cartridge Definition objects were missing and, as a clumsy workaround [1], we had to replay the stored copies of them into Stratos using the REST API.

With Stratos 4.1, using the new object names, early indications are that Deployment Policies and Application Deployment policies are lost as the active fails over to the passive. If anything, these objects are more likely to hit the problems of [1], since Stratos 4.1 expects these to be tweaked on the fly (min/max etc).

Thanks, Shaheed

[1] Clearly, this loses any changes that were not in the stored copies.

From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: 03 May 2015 06:43
To: dev@stratos.apache.org

Subject: Re: Clustered deployments of Stratos

Hi Shaheed,

Thanks for taking time to test this!

Just to clarify the exact problem, do you mean that data is not synchronized between the active and passive nodes or they are not persisted in the active node?

Thanks

On Sunday, May 3, 2015, Shaheedur Haque (shahhaqu) <sh...@cisco.com> wrote:

I have been looking into our use of Linux HA to set up an Active-Passive configuration. Testing indicates that in 4.1 (beta1), several objects seem not to be persisted properly. This includes at least:

- Cartridges
- Deployment policies

Am I missing something? Is it safe to work around this by replaying those objects?
________________________________
From: Imesh Gunaratne [imesh@apache.org]
Sent: 23 April 2015 10:47
To: dev
Subject: Re: Clustered deployments of Stratos
Hi Shaheed,

Currently N-way clustering is still not possible with CC, AS & SM. We completed the initial phase of this feature; however, the feature as a whole was not completed. You could refer to the mail thread "[Discuss] Clustering Feature Implementation for 4.1.0-Alpha Release" for details.

However, at present [1] is valid. We can use Linux HA to deploy CC, AS and SM in Active-Passive mode.

Thanks



On Thu, Apr 23, 2015 at 2:41 PM, Shaheedur Haque (shahhaqu) <sh...@cisco.com> wrote:
Hi,

We currently try to achieve HA with Stratos using something so unpleasant that I won’t even describe it here ☺. It has been suggested that Stratos has, for a while now, supported a clustered mode of deployment where, given N servers:


•        The SM, AS and MB operate in a N-way clustered mode

•        The CEP operates in a N-way loadsharing mode

•        The Cartridge Agents can react to a failure in one of the N CEPs by failing over to one of the other N-1 remaining servers

In looking for documentation on how to set this up, I came across these two write-ups [1] and [2]. Questions:


•        Both these documents mention only using N=2. Is that still correct?

•        [1] Seems recently written, and [2] is a little older but not much. Are both documents still regarded as current?

Also, I’d love to hear any other experiences people have of running configurations like this.

Thanks, Shaheed

[1] https://cwiki.apache.org/confluence/display/STRATOS/4.1.0+Configuring+HA+Using+Pacemaker+and+Heartbeat
[2] http://blog.lasindu.com/2014/08/wso2-private-paas-supporting.html






--
Imesh Gunaratne

Technical Lead, WSO2
Committer & PMC Member, Apache Stratos


--
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos




--
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos



--
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos

Re: Clustered deployments of Stratos

Posted by Imesh Gunaratne <im...@apache.org>.
Yes, you could refer to the tables that have the prefix "REG_".
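
A minimal sketch of how those tables might be enumerated, assuming the registry lives in a MySQL schema. The schema name and the helper names below are hypothetical; the query relies only on MySQL's standard information_schema.tables view, and the cursor can come from any DB-API driver that uses the %s parameter style and returns rows as tuples.

```python
def reg_tables_query(schema):
    """Build a parameterised MySQL query listing tables prefixed with REG_.

    `schema` is the database holding the registry tables (an assumption;
    substitute whatever name your deployment uses).
    """
    sql = (
        "SELECT table_name FROM information_schema.tables "
        "WHERE table_schema = %s AND table_name LIKE %s ESCAPE '|' "
        "ORDER BY table_name"
    )
    # '|' escapes the underscore so it matches literally, not as a wildcard.
    return sql, (schema, "REG|_%")


def list_reg_tables(cursor, schema):
    """Run the query on a DB-API cursor and return the table names."""
    sql, params = reg_tables_query(schema)
    cursor.execute(sql, params)
    return [row[0] for row in cursor.fetchall()]
```

This only narrows down where to look; which of the listed tables actually hold deployment policies or cartridge definitions is not stated in this thread.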

On Sat, May 9, 2015 at 4:11 AM, Shaheedur Haque (shahhaqu) <
shahhaqu@cisco.com> wrote:

>  Can you suggest what tables to look at?
>
>
>
> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
> *Sent:* 07 May 2015 18:00
>
> *To:* dev
> *Subject:* Re: Clustered deployments of Stratos
>
>
>
> Hi Shaheed,
>
>
>
> Thanks for the clarification! Maybe the problem is with the MySQL
> active-passive configuration.
>
>
>
> I understand that you are switching the same OpenStack volume from the
> active node to the passive node (when the passive node becomes active);
> therefore, technically, it should work. Maybe we need to investigate this
> problem further by analysing whether data is persisted properly in the
> active node before the passive node becomes active.
>
>
>
> Thanks
>
>
>
> On Tue, May 5, 2015 at 4:22 PM, Shaheedur Haque (shahhaqu) <
> shahhaqu@cisco.com> wrote:
>
> The data is not synchronised between the active and passive nodes. For
> clarity, this is the HA model we had, much as described in the blog:
>
>
>
> ·        2 nodes, with Pacemaker in active-passive mode.
>
> ·        Under Pacemaker control:
>
> o   We run MySQL in active-passive mode, using a single OpenStack volume
> which we attach/reattach as the active role moves around nodes.
>
> o   As the Pacemaker moves the volume, and thus MySQL around on node
> failures, ActiveMQ and Stratos are moved around too.
>
> o   Thus, everything operates in active-passive mode.
>
>
>
> Even in this model, as the active Stratos 4.0 is moved around (i.e. the
> Stratos JVM on the old active node has gone with the node, and Pacemaker
> starts up a new Stratos JVM on what used to be the passive node), we found
> that the Cartridge Definition objects were missing and, as a
> clumsy workaround [1], we had to replay the stored copies of them into
> Stratos using the REST API.
>
>
>
> With Stratos 4.1, using the new object names, early indications are that
> *Deployment Policies* and *Application Deployment* policies are lost as the active
> fails over to the passive. If anything, these objects are more likely to
> hit the problems of [1], since Stratos 4.1 expects these to be tweaked on
> the fly (min/max etc).
>
>
>
> Thanks, Shaheed
>
>
>
> [1] Clearly, this loses any changes that were not in the stored copies.
>
>
>
> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
> *Sent:* 03 May 2015 06:43
> *To:* dev@stratos.apache.org
>
>
> *Subject:* Re: Clustered deployments of Stratos
>
>
>
> Hi Shaheed,
>
>
>
> Thanks for taking time to test this!
>
>
>
> Just to clarify the exact problem, do you mean that data is not
> synchronized between the active and passive nodes or they are not persisted
> in the active node?
>
>
>
> Thanks
>
>
> On Sunday, May 3, 2015, Shaheedur Haque (shahhaqu) <sh...@cisco.com>
> wrote:
>
>
> I have been looking into our use of Linux HA to set up an Active-Passive
> configuration. Testing indicates that in 4.1 (beta1), several objects seem
> not to be persisted properly. This includes at least:
>
> - Cartridges
> - Deployment policies
>
> Am I missing something? Is it safe to work around this by replaying those
> objects?
>  ------------------------------
>
> *From:* Imesh Gunaratne [imesh@apache.org]
> *Sent:* 23 April 2015 10:47
> *To:* dev
> *Subject:* Re: Clustered deployments of Stratos
>
> Hi Shaheed,
>
>
>
> Currently N-way clustering is still not possible with CC, AS & SM. We
> completed the initial phase of this feature; however, the feature as a
> whole was not completed. You could refer to the mail thread "[Discuss]
> Clustering Feature Implementation for 4.1.0-Alpha Release" for details.
>
>
>
> However, at present [1] is valid. We can use Linux HA to deploy CC, AS
> and SM in Active-Passive mode.
>
>
>
> Thanks
>
>
>
>
>
>
>
> On Thu, Apr 23, 2015 at 2:41 PM, Shaheedur Haque (shahhaqu) <
> shahhaqu@cisco.com> wrote:
>
> Hi,
>
>
>
> We currently try to achieve HA with Stratos using something so unpleasant
> that I won’t even describe it here :). It has been suggested that Stratos
> has, for a while now, supported a clustered mode of deployment where, given
> N servers:
>
>
>
> ·        The SM, AS and MB operate in a N-way clustered mode
>
> ·        The CEP operates in a N-way loadsharing mode
>
> ·        The Cartridge Agents can react to a failure in one of the N CEPs
> by failing over to one of the other N-1 remaining servers
>
>
>
> In looking for documentation on how to set this up, I came across these
> two write-ups [1] and [2]. Questions:
>
>
>
> ·        Both these documents mention only using N=2. Is that still
> correct?
>
> ·        [1] Seems recently written, and [2] is a little older but not
> much. Are both documents still regarded as current?
>
>
>
> Also, I’d love to hear any other experiences people have of running
> configurations like this.
>
>
>
> Thanks, Shaheed
>
>
>
> [1]
> https://cwiki.apache.org/confluence/display/STRATOS/4.1.0+Configuring+HA+Using+Pacemaker+and+Heartbeat
>
> [2] http://blog.lasindu.com/2014/08/wso2-private-paas-supporting.html
>
>
>
>
>
>
>
>
>
>
>
> --
>
> Imesh Gunaratne
>
>
>
> Technical Lead, WSO2
>
> Committer & PMC Member, Apache Stratos
>
>
>
> --
>
> Imesh Gunaratne
>
>
>
> Senior Technical Lead, WSO2
>
> Committer & PMC Member, Apache Stratos
>
>
>
>
>
>
>
> --
>
> Imesh Gunaratne
>
>
>
> Senior Technical Lead, WSO2
>
> Committer & PMC Member, Apache Stratos
>



-- 
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos

RE: Clustered deployments of Stratos

Posted by "Shaheedur Haque (shahhaqu)" <sh...@cisco.com>.
Can you suggest what tables to look at?

From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: 07 May 2015 18:00
To: dev
Subject: Re: Clustered deployments of Stratos

Hi Shaheed,

Thanks for the clarification! Maybe the problem is with the MySQL active-passive configuration.

I understand that you are switching the same OpenStack volume from the active node to the passive node (when the passive node becomes active); therefore, technically, it should work. Maybe we need to investigate this problem further by analysing whether data is persisted properly in the active node before the passive node becomes active.

Thanks

On Tue, May 5, 2015 at 4:22 PM, Shaheedur Haque (shahhaqu) <sh...@cisco.com> wrote:
The data is not synchronised between the active and passive nodes. For clarity, this is the HA model we had, much as described in the blog:


•        2 nodes, with Pacemaker in active-passive mode.

•        Under Pacemaker control:

o   We run MySQL in active-passive mode, using a single OpenStack volume which we attach/reattach as the active role moves around nodes.

o   As the Pacemaker moves the volume, and thus MySQL around on node failures, ActiveMQ and Stratos are moved around too.

o   Thus, everything operates in active-passive mode.

Even in this model, as the active Stratos 4.0 is moved around (i.e. the Stratos JVM on the old active node has gone with the node, and Pacemaker starts up a new Stratos JVM on what used to be the passive node), we found that the Cartridge Definition objects were missing and, as a clumsy workaround [1], we had to replay the stored copies of them into Stratos using the REST API.

With Stratos 4.1, using the new object names, early indications are that Deployment Policies and Application Deployment policies are lost as the active fails over to the passive. If anything, these objects are more likely to hit the problems of [1], since Stratos 4.1 expects these to be tweaked on the fly (min/max etc).

Thanks, Shaheed

[1] Clearly, this loses any changes that were not in the stored copies.

From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: 03 May 2015 06:43
To: dev@stratos.apache.org

Subject: Re: Clustered deployments of Stratos

Hi Shaheed,

Thanks for taking time to test this!

Just to clarify the exact problem, do you mean that data is not synchronized between the active and passive nodes or they are not persisted in the active node?

Thanks

On Sunday, May 3, 2015, Shaheedur Haque (shahhaqu) <sh...@cisco.com> wrote:

I have been looking into our use of Linux HA to set up an Active-Passive configuration. Testing indicates that in 4.1 (beta1), several objects seem not to be persisted properly. This includes at least:

- Cartridges
- Deployment policies

Am I missing something? Is it safe to work around this by replaying those objects?
________________________________
From: Imesh Gunaratne [imesh@apache.org]
Sent: 23 April 2015 10:47
To: dev
Subject: Re: Clustered deployments of Stratos
Hi Shaheed,

Currently N-way clustering is still not possible with CC, AS & SM. We completed the initial phase of this feature; however, the feature as a whole was not completed. You could refer to the mail thread "[Discuss] Clustering Feature Implementation for 4.1.0-Alpha Release" for details.

However, at present [1] is valid. We can use Linux HA to deploy CC, AS and SM in Active-Passive mode.

Thanks



On Thu, Apr 23, 2015 at 2:41 PM, Shaheedur Haque (shahhaqu) <sh...@cisco.com> wrote:
Hi,

We currently try to achieve HA with Stratos using something so unpleasant that I won’t even describe it here ☺. It has been suggested that Stratos has, for a while now, supported a clustered mode of deployment where, given N servers:


•        The SM, AS and MB operate in a N-way clustered mode

•        The CEP operates in a N-way loadsharing mode

•        The Cartridge Agents can react to a failure in one of the N CEPs by failing over to one of the other N-1 remaining servers

In looking for documentation on how to set this up, I came across these two write-ups [1] and [2]. Questions:


•        Both these documents mention only using N=2. Is that still correct?

•        [1] Seems recently written, and [2] is a little older but not much. Are both documents still regarded as current?

Also, I’d love to hear any other experiences people have of running configurations like this.

Thanks, Shaheed

[1] https://cwiki.apache.org/confluence/display/STRATOS/4.1.0+Configuring+HA+Using+Pacemaker+and+Heartbeat
[2] http://blog.lasindu.com/2014/08/wso2-private-paas-supporting.html






--
Imesh Gunaratne

Technical Lead, WSO2
Committer & PMC Member, Apache Stratos


--
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos




--
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos

Re: Clustered deployments of Stratos

Posted by Imesh Gunaratne <im...@apache.org>.
Hi Shaheed,

Thanks for the clarification! Maybe the problem is with the MySQL
active-passive configuration.

I understand that you are switching the same OpenStack volume from the
active node to the passive node (when the passive node becomes active);
therefore, technically, it should work. Maybe we need to investigate this
problem further by analysing whether data is persisted properly in the
active node before the passive node becomes active.

Thanks
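
One way to carry out that check is to record per-table row counts on the active node just before a failover and again on the newly active node, then compare the two. The sketch below assumes the counts have already been collected into plain dictionaries (for example with a SELECT COUNT(*) per table); the function name and any table names used with it are illustrative only.

```python
def diff_row_counts(before, after):
    """Return {table: (before_count, after_count)} for tables whose counts differ.

    A table missing from one snapshot is reported with None on that side.
    """
    changed = {}
    for table in sorted(set(before) | set(after)):
        b = before.get(table)
        a = after.get(table)
        if b != a:
            changed[table] = (b, a)
    return changed
```

An empty result would suggest the rows reached the shared volume and that the loss happens elsewhere, while a non-empty result would point towards persistence or the MySQL handover. Equal counts do not prove identical contents, so this is only a first-pass check.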

On Tue, May 5, 2015 at 4:22 PM, Shaheedur Haque (shahhaqu) <
shahhaqu@cisco.com> wrote:

>  The data is not synchronised between the active and passive nodes. For
> clarity, this is the HA model we had, much as described in the blog:
>
>
>
> ·        2 nodes, with Pacemaker in active-passive mode.
>
> ·        Under Pacemaker control:
>
> o   We run MySQL in active-passive mode, using a single OpenStack volume
> which we attach/reattach as the active role moves around nodes.
>
> o   As the Pacemaker moves the volume, and thus MySQL around on node
> failures, ActiveMQ and Stratos are moved around too.
>
> o   Thus, everything operates in active-passive mode.
>
>
>
> Even in this model, as the active Stratos 4.0 is moved around (i.e. the
> Stratos JVM on the old active node has gone with the node, and Pacemaker
> starts up a new Stratos JVM on what used to be the passive node), we found
> that the Cartridge Definition objects were missing and, as a
> clumsy workaround [1], we had to replay the stored copies of them into
> Stratos using the REST API.
>
>
>
> With Stratos 4.1, using the new object names, early indications are that
> *Deployment Policies* and *Application Deployment* policies are lost as the active
> fails over to the passive. If anything, these objects are more likely to
> hit the problems of [1], since Stratos 4.1 expects these to be tweaked on
> the fly (min/max etc).
>
>
>
> Thanks, Shaheed
>
>
>
> [1] Clearly, this loses any changes that were not in the stored copies.
>
>
>
> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
> *Sent:* 03 May 2015 06:43
> *To:* dev@stratos.apache.org
>
> *Subject:* Re: Clustered deployments of Stratos
>
>
>
> Hi Shaheed,
>
>
>
> Thanks for taking time to test this!
>
>
>
> Just to clarify the exact problem, do you mean that data is not
> synchronized between the active and passive nodes or they are not persisted
> in the active node?
>
>
>
> Thanks
>
>
> On Sunday, May 3, 2015, Shaheedur Haque (shahhaqu) <sh...@cisco.com>
> wrote:
>
>
> I have been looking into our use of Linux HA to set up an Active-Passive
> configuration. Testing indicates that in 4.1 (beta1), several objects seem
> not to be persisted properly. This includes at least:
>
> - Cartridges
> - Deployment policies
>
> Am I missing something? Is it safe to work around this by replaying those
> objects?
>
>   ------------------------------
>
> *From:* Imesh Gunaratne [imesh@apache.org]
> *Sent:* 23 April 2015 10:47
> *To:* dev
> *Subject:* Re: Clustered deployments of Stratos
>
> Hi Shaheed,
>
>
>
> Currently N-way clustering is still not possible with CC, AS & SM. We
> completed the initial phase of this feature; however, the feature as a
> whole was not completed. You could refer to the mail thread "[Discuss]
> Clustering Feature Implementation for 4.1.0-Alpha Release" for details.
>
>
>
> However, at present [1] is valid. We can use Linux HA to deploy CC, AS
> and SM in Active-Passive mode.
>
>
>
> Thanks
>
>
>
>
>
>
>
> On Thu, Apr 23, 2015 at 2:41 PM, Shaheedur Haque (shahhaqu) <
> shahhaqu@cisco.com> wrote:
>
> Hi,
>
>
>
> We currently try to achieve HA with Stratos using something so unpleasant
> that I won’t even describe it here :). It has been suggested that Stratos
> has, for a while now, supported a clustered mode of deployment where, given
> N servers:
>
>
>
> ·        The SM, AS and MB operate in a N-way clustered mode
>
> ·        The CEP operates in a N-way loadsharing mode
>
> ·        The Cartridge Agents can react to a failure in one of the N CEPs
> by failing over to one of the other N-1 remaining servers
>
>
>
> In looking for documentation on how to set this up, I came across these
> two write-ups [1] and [2]. Questions:
>
>
>
> ·        Both these documents mention only using N=2. Is that still
> correct?
>
> ·        [1] Seems recently written, and [2] is a little older but not
> much. Are both documents still regarded as current?
>
>
>
> Also, I’d love to hear any other experiences people have of running
> configurations like this.
>
>
>
> Thanks, Shaheed
>
>
>
> [1]
> https://cwiki.apache.org/confluence/display/STRATOS/4.1.0+Configuring+HA+Using+Pacemaker+and+Heartbeat
>
> [2] http://blog.lasindu.com/2014/08/wso2-private-paas-supporting.html
>
>
>
>
>
>
>
>
>
>
>
> --
>
> Imesh Gunaratne
>
>
>
> Technical Lead, WSO2
>
> Committer & PMC Member, Apache Stratos
>
>
>
> --
>
> Imesh Gunaratne
>
>
>
> Senior Technical Lead, WSO2
>
> Committer & PMC Member, Apache Stratos
>
>
>



-- 
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos

RE: Clustered deployments of Stratos

Posted by "Shaheedur Haque (shahhaqu)" <sh...@cisco.com>.
The data is not synchronised between the active and passive nodes. For clarity, this is the HA model we had, much as described in the blog:


·        2 nodes, with Pacemaker in active-passive mode.

·        Under Pacemaker control:

o   We run MySQL in active-passive mode, using a single OpenStack volume which we attach/reattach as the active role moves around nodes.

o   As the Pacemaker moves the volume, and thus MySQL around on node failures, ActiveMQ and Stratos are moved around too.

o   Thus, everything operates in active-passive mode.

Even in this model, as the active Stratos 4.0 is moved around (i.e. the Stratos JVM on the old active node has gone with the node, and Pacemaker starts up a new Stratos JVM on what used to be the passive node), we found that the Cartridge Definition objects were missing and, as a clumsy workaround [1], we had to replay the stored copies of them into Stratos using the REST API.

With Stratos 4.1, using the new object names, early indications are that Deployment Policies and Application Deployment policies are lost as the active fails over to the passive. If anything, these objects are more likely to hit the problems of [1], since Stratos 4.1 expects these to be tweaked on the fly (min/max etc).

Thanks, Shaheed

[1] Clearly, this loses any changes that were not in the stored copies.
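
The replay workaround can be sketched roughly as follows, assuming the stored copies and the objects reported by the newly active Stratos have each been loaded into a dictionary keyed by object name. The function and field names are hypothetical, and the actual REST calls are deliberately left out because they depend on the Stratos version in use.

```python
def plan_replay(stored, live):
    """Split stored copies into objects to re-add and objects that diverged.

    stored: name -> definition saved outside Stratos.
    live:   name -> definition currently reported by Stratos.
    Returns (missing, diverged): names absent from Stratos after failover,
    and names present in both but with different definitions.
    """
    missing = sorted(name for name in stored if name not in live)
    diverged = sorted(
        name for name in stored if name in live and stored[name] != live[name]
    )
    return missing, diverged
```

Only the missing names would need replaying. The diverged ones are where the caveat in [1] bites: blindly replaying a stored copy there would discard a change made on the fly.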

From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: 03 May 2015 06:43
To: dev@stratos.apache.org
Subject: Re: Clustered deployments of Stratos

Hi Shaheed,

Thanks for taking time to test this!

Just to clarify the exact problem, do you mean that data is not synchronized between the active and passive nodes or they are not persisted in the active node?

Thanks

On Sunday, May 3, 2015, Shaheedur Haque (shahhaqu) <sh...@cisco.com> wrote:

I have been looking into our use of Linux HA to set up an Active-Passive configuration. Testing indicates that in 4.1 (beta1), several objects seem not to be persisted properly. This includes at least:

- Cartridges
- Deployment policies

Am I missing something? Is it safe to work around this by replaying those objects?

________________________________
From: Imesh Gunaratne [imesh@apache.org]
Sent: 23 April 2015 10:47
To: dev
Subject: Re: Clustered deployments of Stratos
Hi Shaheed,

Currently N-way clustering is still not possible with CC, AS & SM. We completed the initial phase of this feature; however, the feature as a whole was not completed. You could refer to the mail thread "[Discuss] Clustering Feature Implementation for 4.1.0-Alpha Release" for details.

However, at present [1] is valid. We can use Linux HA to deploy CC, AS and SM in Active-Passive mode.

Thanks



On Thu, Apr 23, 2015 at 2:41 PM, Shaheedur Haque (shahhaqu) <shahhaqu@cisco.com> wrote:
Hi,

We currently try to achieve HA with Stratos using something so unpleasant that I won’t even describe it here ☺. It has been suggested that Stratos has, for a while now, supported a clustered mode of deployment where, given N servers:


•        The SM, AS and MB operate in a N-way clustered mode

•        The CEP operates in a N-way loadsharing mode

•        The Cartridge Agents can react to a failure in one of the N CEPs by failing over to one of the other N-1 remaining servers

In looking for documentation on how to set this up, I came across these two write-ups [1] and [2]. Questions:


•        Both these documents mention only using N=2. Is that still correct?

•        [1] Seems recently written, and [2] is a little older but not much. Are both documents still regarded as current?

Also, I’d love to hear any other experiences people have of running configurations like this.

Thanks, Shaheed

[1] https://cwiki.apache.org/confluence/display/STRATOS/4.1.0+Configuring+HA+Using+Pacemaker+and+Heartbeat
[2] http://blog.lasindu.com/2014/08/wso2-private-paas-supporting.html






--
Imesh Gunaratne

Technical Lead, WSO2
Committer & PMC Member, Apache Stratos


--
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos


Re: Clustered deployments of Stratos

Posted by Imesh Gunaratne <im...@apache.org>.
Hi Shaheed,

Thanks for taking time to test this!

Just to clarify the exact problem, do you mean that the data is not
synchronized between the active and passive nodes, or that it is not
persisted in the active node?

Thanks

On Sunday, May 3, 2015, Shaheedur Haque (shahhaqu) <sh...@cisco.com>
wrote:

>
> I have been looking into our use of Linux HA to set up an Active-Passive
> configuration. Testing indicates that in 4.1 (beta1), several objects seem
> not to be persisted properly. This includes at least:
>
> - Cartridges
> - Deployment policies
>
> Am I missing something? Is it safe to work around this by replaying those
> objects?
>
>
>  ------------------------------
> *From:* Imesh Gunaratne [imesh@apache.org]
> *Sent:* 23 April 2015 10:47
> *To:* dev
> *Subject:* Re: Clustered deployments of Stratos
>
>   Hi Shaheed,
>
>  Currently N-way clustering is still not possible with CC, AS & SM. We
> completed the initial phase of this feature; however, the feature as a
> whole was not completed. You could refer to the mail thread "[Discuss]
> Clustering Feature Implementation for 4.1.0-Alpha Release" for details.
>
>  However, at present [1] is valid. We can use Linux HA to deploy CC, AS
> and SM in Active-Passive mode.
>
>  Thanks
>
>
>
> On Thu, Apr 23, 2015 at 2:41 PM, Shaheedur Haque (shahhaqu) <
> shahhaqu@cisco.com> wrote:
>
>>  Hi,
>>
>>
>>
>> We currently try to achieve HA with Stratos using something so unpleasant
>> that I won’t even describe it here :). It has been suggested that Stratos
>> has, for a while now, supported a clustered mode of deployment where, given
>> N servers:
>>
>>
>>
>> ·        The SM, AS and MB operate in a N-way clustered mode
>>
>> ·        The CEP operates in a N-way loadsharing mode
>>
>> ·        The Cartridge Agents can react to a failure in one of the N
>> CEPs by failing over to one of the other N-1 remaining servers
>>
>>
>>
>> In looking for documentation on how to set this up, I came across these
>> two write-ups [1] and [2]. Questions:
>>
>>
>>
>> ·        Both these documents mention only using N=2. Is that still
>> correct?
>>
>> ·        [1] Seems recently written, and [2] is a little older but not
>> much. Are both documents still regarded as current?
>>
>>
>>
>> Also, I’d love to hear any other experiences people have of running
>> configurations like this.
>>
>>
>>
>> Thanks, Shaheed
>>
>>
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/STRATOS/4.1.0+Configuring+HA+Using+Pacemaker+and+Heartbeat
>>
>> [2] http://blog.lasindu.com/2014/08/wso2-private-paas-supporting.html
>>
>>
>>
>>
>>
>>
>>
>
>
>
>  --
>  Imesh Gunaratne
>
> Technical Lead, WSO2
> Committer & PMC Member, Apache Stratos
>


-- 
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos

RE: Clustered deployments of Stratos

Posted by "Shaheedur Haque (shahhaqu)" <sh...@cisco.com>.
I have been looking into our use of Linux HA to set up an Active-Passive configuration. Testing indicates that in 4.1 (beta1), several objects seem not to be persisted properly. This includes at least:

- Cartridges
- Deployment policies

Am I missing something? Is it safe to work around this by replaying those objects?


________________________________
From: Imesh Gunaratne [imesh@apache.org]
Sent: 23 April 2015 10:47
To: dev
Subject: Re: Clustered deployments of Stratos

Hi Shaheed,

Currently N-way clustering is still not possible with CC, AS & SM. We completed the initial phase of this feature; however, the feature as a whole was not completed. You could refer to the mail thread "[Discuss] Clustering Feature Implementation for 4.1.0-Alpha Release" for details.

However, at present [1] is valid. We can use Linux HA to deploy CC, AS and SM in Active-Passive mode.

Thanks



On Thu, Apr 23, 2015 at 2:41 PM, Shaheedur Haque (shahhaqu) <sh...@cisco.com> wrote:
Hi,

We currently try to achieve HA with Stratos using something so unpleasant that I won’t even describe it here :). It has been suggested that Stratos has, for a while now, supported a clustered mode of deployment where, given N servers:


•        The SM, AS and MB operate in a N-way clustered mode

•        The CEP operates in a N-way loadsharing mode

•        The Cartridge Agents can react to a failure in one of the N CEPs by failing over to one of the other N-1 remaining servers

In looking for documentation on how to set this up, I came across these two write-ups [1] and [2]. Questions:


•        Both these documents mention only using N=2. Is that still correct?

•        [1] seems recently written, and [2] is a little older but not much. Are both documents still regarded as current?

Also, I’d love to hear any other experiences people have of running configurations like this.

Thanks, Shaheed

[1] https://cwiki.apache.org/confluence/display/STRATOS/4.1.0+Configuring+HA+Using+Pacemaker+and+Heartbeat
[2] http://blog.lasindu.com/2014/08/wso2-private-paas-supporting.html






--
Imesh Gunaratne

Technical Lead, WSO2
Committer & PMC Member, Apache Stratos

Re: Clustered deployments of Stratos

Posted by Imesh Gunaratne <im...@apache.org>.
Hi Shaheed,

Currently, N-way clustering is still not possible with CC, AS and SM. We
completed the initial phase of this feature, but the feature as a whole was
not finished. You could refer to the mail thread "[Discuss] Clustering
Feature Implementation for 4.1.0-Alpha Release" for details.

However, at present [1] is valid. We could use Linux HA and deploy CC, AS
and SM in Active-Passive mode.

Thanks



On Thu, Apr 23, 2015 at 2:41 PM, Shaheedur Haque (shahhaqu) <
shahhaqu@cisco.com> wrote:

>  Hi,
>
>
>
> We currently try to achieve HA with Stratos using something so unpleasant
> that I won’t even describe it here :). It has been suggested that Stratos
> has, for a while now, supported a clustered mode of deployment where, given
> N servers:
>
>
>
> ·        The SM, AS and MB operate in an N-way clustered mode
>
> ·        The CEP operates in an N-way load-sharing mode
>
> ·        The Cartridge Agents can react to a failure in one of the N CEPs
> by failing over to one of the N-1 remaining servers
>
>
>
> In looking for documentation on how to set this up, I came across these
> two write-ups [1] and [2]. Questions:
>
>
>
> ·        Both these documents mention only using N=2. Is that still
> correct?
>
> ·        [1] seems recently written, and [2] is a little older but not
> much. Are both documents still regarded as current?
>
>
>
> Also, I’d love to hear any other experiences people have of running
> configurations like this.
>
>
>
> Thanks, Shaheed
>
>
>
> [1]
> https://cwiki.apache.org/confluence/display/STRATOS/4.1.0+Configuring+HA+Using+Pacemaker+and+Heartbeat
>
> [2] http://blog.lasindu.com/2014/08/wso2-private-paas-supporting.html
>
>
>
>
>
>
>



-- 
Imesh Gunaratne

Technical Lead, WSO2
Committer & PMC Member, Apache Stratos