Posted to dev@stratos.apache.org by "Martin Eppel (meppel)" <me...@cisco.com> on 2015/06/05 01:44:58 UTC

Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi,

I am running into a scenario where application un-deployment fails (using Stratos at the latest commit, b1b6bca3f99b6127da24c9af0a6b20faff2907be).

For the application structure see [1.]. The wso2carbon.log (debug enabled), application.json, cartridge-group.json, deployment policy and auto-scaling policies are in the attached zip file.

It is noteworthy that, while the application is running, the following log statements / exceptions are observed:

...
Member is in the wrong list and it is removed from active members list: g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
...
TID: [0] [STRATOS] [2015-06-04 20:11:03,425] ERROR {org.apache.stratos.autoscaler.rule.RuleTasksDelegator} -  Cannot terminate instance
...
// after receiving the application undeploy event:
[2015-06-04 20:12:39,465]  INFO {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Application undeployment process started: [application-id] g-sc-G12-1
// a new instance is being started up
...
[2015-06-04 20:13:13,445]  INFO {org.apache.stratos.cloud.controller.services.impl.InstanceCreator} -  Instance started successfully: [cartridge-type] c2 [cluster-id] g-sc-G12-1.c2-1x0.c2.domain [instance-id] RegionOne/5d4699f7-b00b-42eb-b565-b48fc8f20407

// Also noteworthy is the following warning, which appears repeatedly in the logs:
ReadWriteLock} -  System warning! Trying to release a lock which has not been taken by the same thread: [lock-name]


[1.] Application structure

[Figure: application structure diagram (inline image image001.png)]





Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by Udara Liyanage <ud...@wso2.com>.
Hi Martin,

I added some info logs in efe67ad99f9fca80d4ba66c01e9b642c57838dc3. Could
you please resend the logs with this change applied?
I tried to reproduce this issue but was not successful. Could you please
send us the debug logs so we can get a better understanding of the exact
issue?


Additionally, is this an intermittent issue or does it occur frequently?

On Fri, Jun 5, 2015 at 11:41 AM, Udara Liyanage <ud...@wso2.com> wrote:

> Hi,
>
> This could happen if the AS (Autoscaler) did not receive the member
> activated event published by the CC (Cloud Controller). If this is
> reproducible, could you enable debug logs?
> Otherwise I can add INFO logs and commit them.
>
>
> On Fri, Jun 5, 2015 at 9:11 AM, Udara Liyanage <ud...@wso2.com> wrote:
>
>> Hi,
>>
>>
>> Regarding the first issue you mentioned: the member in question was
>> activated, but it is still identified as an obsolete member and is being
>> marked for termination because its pending time expired. Does that mean the
>> member is still in the obsolete list even though it has been activated?
>>
>> //member started
>> TID: [0] [STRATOS] [2015-06-04 19:53:04,706]  INFO
>> {org.apache.stratos.autoscaler.context.cluster.ClusterContext} -  Member
>> stat context has been added: [application] g-sc-G12-1 [cluster]
>> g-sc-G12-1.c1-0x0.c1.domain [clusterInstanceContext] g-sc-G12-1-1
>> [partitionContext] whole-region [member-id]
>> g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
>>
>> //member activated
>> TID: [0] [STRATOS] [2015-06-04 19:56:00,907]  INFO
>> {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher}
>> -  Publishing member activated event: [service-name] c1 [cluster-id]
>> g-sc-G12-1.c1-0x0.c1.domain [cluster-instance-id] g-sc-G12-1-1 [member-id]
>> g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
>> [network-partition-id] RegionOne [partition-id] whole-region
>> TID: [0] [STRATOS] [2015-06-04 19:56:00,916]  INFO
>> {org.apache.stratos.messaging.message.processor.topology.MemberActivatedMessageProcessor}
>> -  Member activated: [service] c1 [cluster] g-sc-G12-1.c1-0x0.c1.domain
>> [member] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
>>
>> //after 15 minutes: member is still in pending state, pending timeout expired
>> TID: [0] [STRATOS] [2015-06-04 20:08:04,713]  INFO
>> {org.apache.stratos.autoscaler.context.partition.ClusterLevelPartitionContext$PendingMemberWatcher}
>> -  Pending state of member expired, member will be moved to obsolete list.
>> [pending member]
>> g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1 [expiry
>> time] 900000 [cluster] g-sc-G12-1.c1-0x0.c1.domain [cluster instance] null
>>
>> --
>>
>> Udara Liyanage
>> Software Engineer
>> WSO2, Inc.: http://wso2.com
>> lean. enterprise. middleware
>>
>> web: http://udaraliyanage.wordpress.com
>> phone: +94 71 443 6897
>>

RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by "Martin Eppel (meppel)" <me...@cisco.com>.
Ok,

From: Lasindu Charith [mailto:lasindu@wso2.com]
Sent: Friday, June 12, 2015 9:41 AM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,

I have fixed the above issue in commit 03de83172309c2932075fb5284c120ca610bbf0a. Please pull from master and try out your scenario again to see if undeployment/redeployment works as expected.

Thanks,


On Thu, Jun 11, 2015 at 11:52 PM, Lasindu Charith <la...@wso2.com> wrote:
Hi Martin,

I guess my previous observation was incorrect. The root cause of the above issue is that ClusterStatusTerminatedProcessor does not send a ClusterTerminatedEvent for all 3 clusters: it sends only 1 and fails to send the other 2. This happens because, when the application is activated again, the ClusterLevelPartitionContext is added twice to the clusterInstanceContext. As a result, the if condition at [1] fails when ClusterStatusTerminatedProcessor checks whether the cluster monitor has any non-terminated members before sending the clusterTerminated event. I will try to find a proper solution and update the thread.


[1] https://github.com/apache/stratos/blob/master/components/org.apache.stratos.autoscaler/src/main/java/org/apache/stratos/autoscaler/status/processor/cluster/ClusterStatusTerminatedProcessor.java#L90
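
To illustrate how a context that is added twice could defeat a "no non-terminated members" check, here is a minimal sketch. It is not the Stratos code: the class names PartitionContextSketch and ClusterInstanceContextSketch, the methods, and the assumption that member updates go through a first-match lookup by partition id are all hypothetical. The putPartitionContext variant is just one possible guard, not necessarily the fix that will be committed.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a partition-level context; not the Stratos class.
class PartitionContextSketch {
    final String partitionId;
    final Set<String> nonTerminatedMembers = new HashSet<>();

    PartitionContextSketch(String partitionId) {
        this.partitionId = partitionId;
    }
}

// Hypothetical stand-in for the cluster instance context that owns the partition contexts.
class ClusterInstanceContextSketch {
    private final List<PartitionContextSketch> partitionContexts = new ArrayList<>();

    // Blind append: activating twice leaves two contexts for the same partition.
    void addPartitionContext(PartitionContextSketch ctx) {
        partitionContexts.add(ctx);
    }

    // Alternative: replace any existing context that has the same partition id.
    void putPartitionContext(PartitionContextSketch ctx) {
        partitionContexts.removeIf(existing -> existing.partitionId.equals(ctx.partitionId));
        partitionContexts.add(ctx);
    }

    // Lookup by id returns the first match only (an assumption of this sketch).
    PartitionContextSketch find(String partitionId) {
        for (PartitionContextSketch ctx : partitionContexts) {
            if (ctx.partitionId.equals(partitionId)) {
                return ctx;
            }
        }
        return null;
    }

    // The termination check scans every context, including duplicates.
    boolean hasNonTerminatedMembers() {
        for (PartitionContextSketch ctx : partitionContexts) {
            if (!ctx.nonTerminatedMembers.isEmpty()) {
                return true;
            }
        }
        return false;
    }
}
```

In this model, if a member is registered on the second (duplicate) context while its termination is recorded through find(), only the first copy is updated. The duplicate keeps the member forever, so hasNonTerminatedMembers() stays true and the terminated event would never be sent.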

Thanks,


On Thu, Jun 11, 2015 at 10:29 PM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Is there any conclusion on how to fix this?

Thanks


Martin

From: Lahiru Sandaruwan [mailto:lahirus@wso2.com]
Sent: Wednesday, June 10, 2015 6:55 PM
To: dev
Cc: Reka Thirunavukkarasu

Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Imesh,

The following could be the reason for the application not un-deploying when a member was auto-healed:


  *   The cluster whose member was auto-healed is terminated before the others (while the others are still in the terminating state)
or

  *   The cluster whose member was auto-healed is still terminating when the others are already in the terminated state

One of these two cases could happen even if the member was not auto-healed (for example with groups, where one group is very complex and the others are simple). This is because, when the parent group is terminating, we currently check whether all the clusters and groups are in the terminating status, which is wrong.

Thanks.

On Thu, Jun 11, 2015 at 5:49 AM, Imesh Gunaratne <im...@apache.org> wrote:
Do we know why this only happens if a member was forcefully terminated and auto-healed?

On Wed, Jun 10, 2015 at 10:01 PM, Lasindu Charith <la...@wso2.com> wrote:
Hi all,

The cause of the above issue seems to be as follows.
GroupStatusTerminatedProcessor and GroupStatusTerminatingProcessor process the event only if all the group instances and cluster instances are in the terminated state or in the terminating state, respectively [1][2]. But there can be situations (such as the one above) where some group instances are in the terminated state and some are in the terminating state by the time GroupStatusProcessorChain is executed. In such scenarios, both GroupStatusTerminatedProcessor and GroupStatusTerminatingProcessor are skipped, and GroupStatusInactiveProcessor prints the "No possible state change found" warning.

I think we need to find a way to properly fix this.

[1] https://github.com/apache/stratos/blob/master/components/org.apache.stratos.autoscaler/src/main/java/org/apache/stratos/autoscaler/status/processor/group/GroupStatusTerminatedProcessor.java#L91
[2] https://github.com/apache/stratos/blob/master/components/org.apache.stratos.autoscaler/src/main/java/org/apache/stratos/autoscaler/status/processor/group/GroupStatusTerminatingProcessor.java#L89
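
The gap described above can be sketched as follows. This is a simplified model, not the processor code: the GroupStatusSketch class, its Status enum and its method names are hypothetical. The strict checks stand for the "all terminated" and "all terminating" conditions. The relaxed check is only one possible way to handle a mix of terminated and terminating children and is not presented as the agreed fix.

```java
import java.util.List;

// Hypothetical model of the status checks; not the Stratos processor chain.
class GroupStatusSketch {
    enum Status { ACTIVE, TERMINATING, TERMINATED }

    // Strict check: every child must already be TERMINATED.
    static boolean allTerminated(List<Status> children) {
        return children.stream().allMatch(s -> s == Status.TERMINATED);
    }

    // Strict check: every child must be TERMINATING.
    static boolean allTerminating(List<Status> children) {
        return children.stream().allMatch(s -> s == Status.TERMINATING);
    }

    // Relaxed check: no child is in any other state and at least one is still TERMINATING.
    static boolean terminatingRelaxed(List<Status> children) {
        boolean onlyTerminatingOrTerminated = children.stream()
                .allMatch(s -> s == Status.TERMINATING || s == Status.TERMINATED);
        boolean anyTerminating = children.stream().anyMatch(s -> s == Status.TERMINATING);
        return onlyTerminatingOrTerminated && anyTerminating;
    }

    // Chain using only the strict checks, ending in the warning seen in the logs.
    static String strictTransition(List<Status> children) {
        if (allTerminated(children)) {
            return "Terminated";
        }
        if (allTerminating(children)) {
            return "Terminating";
        }
        return "No possible state change found";
    }
}
```

With one child TERMINATED and two still TERMINATING, strictTransition falls through to "No possible state change found", while terminatingRelaxed returns true for the same input.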

On Tue, Jun 9, 2015 at 8:09 PM, Lasindu Charith <la...@wso2.com> wrote:
Hi Martin,

I was able to reproduce this issue in the latest build with PCA in OpenStack. Even after Stratos is restarted, the application is not undeployed, which makes it impossible to undeploy the application (even the forceful undeployment failed for the above obsolete application).

Currently I'm looking at possible causes for this and will update with the progress.

Thanks,

On Tue, Jun 9, 2015 at 5:59 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Here is another example where the removal fails:

For application see [1.], log file (with debug enabled) and jsons are attached.

Scenario:


•        Deploy application and wait for all cartridges to become active

•        Kill a VM (2nd in startup sequence)

•        Wait for it to restart and become active

•        Un-deploy application

a.      Un-deploy forcefully will succeed
([2015-06-08 20:38:21,487]  INFO {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Forcefully un-deploying the application s-g-c1-c2-c3-s)
and

b.      Un-deploy gracefully will fail to remove app completely (although VMs are terminated successfully)
([2015-06-08 20:54:16,372]  INFO {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Starting to undeploy application: [application-id])

•        Both scenarios are recorded in the same log file wso2carbon-s-g-c1-c2-c3-s.log

•        Btw, I retested the scenario and the issue is easily reproducible following the steps listed above:
graceful application un-deploy succeeds if no VM had been restarted (terminated and restarted by the autoscaler).
Once a VM has been terminated and restarted, graceful application un-deploy will fail.
I attached a log file which demonstrates this case (wso2carbon-s-g-c1-c2-c3-s-scen-2.log). In this scenario, the same application is deployed, becomes active and is then removed (repeated 2 times); then a VM is terminated and restarted by the autoscaler. Afterwards, graceful application un-deploy fails.


Other Observations:

When the application is removed successfully, some events, e.g. “cluster removed event” and “Application deleted event received:”, are published (see [2.]), whereas when the application fails to be removed no such event is observed.

[2.] cluster removed event when application is un-deployed forcefully
TID: [0] [STRATOS] [2015-06-08 20:38:34,187]  INFO {org.apache.stratos.cloud.controller.messaging.receiver.application.ApplicationEventReceiver} -  Application deleted event received: [application-id] s-g-c1-c2-c3-s
TID: [0] [STRATOS] [2015-06-08 20:38:34,220]  INFO {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher} -  Publishing application clusters removed event: [application-id] s-g-c1-c2-c3-s


I compared the log sequences of the successful and the unsuccessful application removal and noticed a difference (see also highlighted areas):

Successful (see logs in wso2carbon-s-g-c1-c2-c3-s-scen-2.log)

TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor} -  StatusChecker calculating the terminated status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [ s-g-c1-c2-c3-s-1 ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock acquired
TID: [0] [STRATOS] [2015-06-08 22:18:41,527]  INFO {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor} -  Sending group instance terminated for [group] s-g-c1-c2-c3-s-x0x [instance] s-g-c1-c2-c3-s-1
TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG {org.apache.stratos.autoscaler.applications.topic.ApplicationBuilder} -  Handling group terminated event: [group-id] s-g-c1-c2-c3-s-x0x [application-id] s-g-c1-c2-c3-s [instance] s-g-c1-c2-c3-s-1
TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -  ClusterMonitor Drools session has been disposed. ClusterMonitor [clusterId=s-g-c1-c2-c3-s.c3-0x0.c3.domain, hasPrimary=false ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -  ClusterMonitor Drools session has been disposed. ClusterMonitor [clusterId=s-g-c1-c2-c3-s.c1-0x0.c1.domain, hasPrimary=false ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -  ClusterMonitor Drools session has been disposed. ClusterMonitor [clusterId=s-g-c1-c2-c3-s.c2-0x0.c2.domain, hasPrimary=false ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,529] DEBUG {org.apache.stratos.cloud.controller.messaging.topology.TopologyManager} -  Write lock released
TID: [0] [STRATOS] [2015-06-08 22:18:41,536] DEBUG {org.apache.stratos.autoscaler.registry.RegistryManager} -  Application [ s-g-c1-c2-c3-s ] persisted successfully in the Autoscaler Registry
TID: [0] [STRATOS] [2015-06-08 22:18:41,538] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Applications updated: {"applicationMap":{"s-g-c1-c2-c3-s":{"id":"s-g-c1-c2-c3-s","key":"l8V7OpRqOfBcWbBw","tenantId":-1234,"tenantDomain":"carbon.super","tenantAdminUserName":"admin","applicationPolicyId":"default-iaas","aliasToGroupMap":{"s-g-c1-c2-c3-s-x0x":{"name":"s-g-c1-c2-c3-s","alias":"s-g-c1-c2-c3-s-x0x","groupMinInstances":1,"groupMaxInstances":1,"applicationId":"s-g-c1-c2-c3-s","aliasToGroupMap":{},"aliasToClusterDataMap":{"c2-0x0":{"serviceType":"c2","clusterId":"s-g-c1-c2-c3-s.c2-0x0.c2.domain","minInstances":1,"maxInstances":1},"c3-0x0":{"serviceType":"c3","clusterId":"s-g-c1-c2-c3-s.c3-0x0.c3.domain","minInstances":1,"maxInstances":1},"c1-0x0":{"serviceType":"c1","clusterId":"s-g-c1-c2-c3-s.c1-0x0.c1.domain","minInstances":1,"maxInstances":1}},"typeToClusterDataMap":{"c1":{"serviceType":"c1","clusterId":"s-g-c1-c2-c3-s.c1-0x0.c1.domain","minInstances":1,"maxInstances":1},"c2":{"serviceType":"c2","clusterId":"s-g-c1-c2-c3-s.c2-0x0.c2.domain","minInstances":1,"maxInstances":1},"c3":{"serviceType":"c3","clusterId":"s-g-c1-c2-c3-s.c3-0x0.c3.domain","minInstances":1,"maxInstances":1}},"instanceIdToInstanceContextMap":{},"dependencyOrder":{"startupOrders":[{"startupOrderComponentList":["cartridge.c3-0x0","cartridge.c2-0x0"]},{"startupOrderComponentList":["cartridge.c2-0x0","cartridge.c1-0x0"]}],"terminationBehaviour":"terminate-none"},"isGroupScalingEnabled":false,"isGroupInstanceMonitoringEnabled":false,"instanceIdSequence":{"value":0}}},"aliasToClusterDataMap":{},"typeToClusterDataMap":{},"aliasToDeploymentPolicyIdMap":{"c3-0x0":"static-1","c2-0x0":"static-1","c1-0x0":"static-1"},"instanceIdToInstanceContextMap":{"s-g-c1-c2-c3-s-1":{"alias":"s-g-c1-c2-c3-s","instanceId":"s-g-c1-c2-c3-s-1","instanceProperties":{},"lifeCycleStateManager":{"stateStack":["Created","Active","Terminating"],"identifier":"s-g-c1-c2-c3-s_s-g-c1-c2-c3-s-1"},"networkPartitionId":"RegionOne"}},"dependencyOrder":{"startupOrders":[]},"isGroupScalingEnabled":false,"isGroupInstanceMonitoringEnabled":false,"instanceIdSequence":{"value":1}}},"initialized":false}
TID: [0] [STRATOS] [2015-06-08 22:18:41,539]  INFO {org.apache.stratos.autoscaler.applications.topic.ApplicationsEventPublisher} -  Publishing group instance terminated event: [application] s-g-c1-c2-c3-s [group] s-g-c1-c2-c3-s-x0x [instance] s-g-c1-c2-c3-s-1
TID: [0] [STRATOS] [2015-06-08 22:18:41,545]  INFO {org.apache.stratos.autoscaler.monitor.component.GroupMonitor} -  [Group] s-g-c1-c2-c3-s-x0x is notifying the [parent] s-g-c1-c2-c3-s [instance] s-g-c1-c2-c3-s-1
TID: [0] [STRATOS] [2015-06-08 22:18:41,545] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusActiveProcessor} -  GroupProcessor chain calculating the status for the group [ s-g-c1-c2-c3-s ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusActiveProcessor} -  StatusChecker calculating the active status for the group [ s-g-c1-c2-c3-s ]  for the instance  [ s-g-c1-c2-c3-s-1 ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock acquired
TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock released
TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor} -  StatusChecker calculating the terminated status for the group [ s-g-c1-c2-c3-s ]  for the instance  [ s-g-c1-c2-c3-s-1 ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock acquired
TID: [0] [STRATOS] [2015-06-08 22:18:41,546]  INFO {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor} -  Sending application instance terminated for [application] s-g-c1-c2-c3-s [instance] s-g-c1-c2-c3-s-1

Unsuccessful:

TID: [0] [STRATOS] [2015-06-08 22:33:25,404] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor} -  StatusChecker calculating the terminated status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [ s-g-c1-c2-c3-s-1 ]
TID: [0] [STRATOS] [2015-06-08 22:33:25,404] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock acquired
TID: [0] [STRATOS] [2015-06-08 22:33:25,405] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusProcessor} -  Checking the status of cluster s-g-c1-c2-c3-s.c1-0x0.c1.domain instance status is: Terminating
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.cloud.controller.messaging.topology.TopologyManager} -  Write lock released
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusProcessor} -  Checking the status of cluster s-g-c1-c2-c3-s.c1-0x0.c1.domain instance status is: Terminating
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock released
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatingProcessor} -  StatusChecker calculating the terminating status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [ s-g-c1-c2-c3-s-1 ]
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock acquired
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock released
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusInactiveProcessor} -  StatusChecker calculating the inactive status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [ s-g-c1-c2-c3-s-1 ]
TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock acquired
TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock released
TID: [0] [STRATOS] [2015-06-08 22:33:25,407]  WARN {org.apache.stratos.autoscaler.status.processor.group.GroupStatusInactiveProcessor} -  No possible state change found for [component] s-g-c1-c2-c3-s-x0x [instance] s-g-c1-c2-c3-s-1
TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -  ClusterMonitor Drools session has been disposed. ClusterMonitor [clusterId=s-g-c1-c2-c3-s.c2-0x0.c2.domain, hasPrimary=false ]
TID: [0] [STRATOS] [2015-06-08 22:33:25,481] ERROR {org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor} -  System error, lock has not released for 30 seconds: [lock-name] application [lock-type] Write [thread-id] 99 [thread-name] pool-26-thread-2 [stack-trace]
java.lang.Thread.getStackTrace(Thread.java:1589)




[1.] Application Structure
[Figure: application structure diagram (inline image image001.png)]






From: Martin Eppel (meppel)
Sent: Friday, June 05, 2015 4:38 PM

To: dev@stratos.apache.org
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

This is another application, see [1.] which fails to get completely removed:

Scenario / Observation:

•        After all instances / application go active, one instance is being terminated (to verify termination behavior). Once the terminated instance is restored the application is undeployed.

•        After the Application undeployment process is started, all instances are being terminated

•        Application still shows up in stratos admin, subsequent deployments fail

stratos> list-applications
Applications found:
+---------------------+---------------------+----------+
| Application ID      | Alias               | Status   |
+---------------------+---------------------+----------+
| s-n-gr-s-G123-t-a-4 | s-n-gr-s-G123-t-a-4 | Deployed |
+---------------------+---------------------+----------+


[1.] Application:

[Figure: application structure diagram (inline image image002.png)]




From: Martin Eppel (meppel)
Sent: Friday, June 05, 2015 3:26 PM
To: dev@stratos.apache.org
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

After re-running it, these are my observations:


•        After “Application undeployment process started” is logged, (a few) VMs may still be launched – I suspect this is due to a race condition between the application undeployment process and the autoscaler.

•        All VMs which were launched before “Application undeployment process started” get terminated as part of the undeployment process.

•        VMs which were launched after “Application undeployment process started” eventually get moved to the obsolete / pending state and cleaned up; this can take up to 15-20 minutes.

•        The application never gets completely removed.

•        The following exception is consistently observed:

TID: [0] [STRATOS] [2015-06-05 20:47:07,237]  WARN {org.apache.stratos.common.concurrent.locks.ReadWriteLock} -  System warning! Trying to release a lock which has not been taken by the same thread: [lock-name] application-holder [thread-id] 114 [thread-name] pool-24-thread-2

TID: [0] [STRATOS] [2015-06-05 20:47:07,237] ERROR {org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator} -  Failed to retrieve topology event message

org.apache.stratos.common.exception.InvalidLockRequestedException: System error, cannot acquire a write lock while having a read lock on the same thread: [lock-name] application-holder [thread-id] 114 [thread-name] pool-24-thread-2

                    at org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:114)

                    at org.apache.stratos.autoscaler.applications.ApplicationHolder.acquireWriteLock(ApplicationHolder.java:60)

•        Initiating the “Application undeployment process” again will cause the following INFO statement (without any further actions, see in log)
TID: [0] [STRATOS] [2015-06-05 21:34:34,509]  INFO {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Application monitor is already in terminating, graceful un-deployment is has already been attempted thus not invoking again

•        Other exceptions observed after the “Application undeployment process started”
TID: [0] [STRATOS] [2015-06-05 21:36:29,458] ERROR {org.apache.stratos.autoscaler.rule.RuleTasksDelegator} -  Cannot terminate instance
org.apache.stratos.cloud.controller.stub.CloudControllerServiceInvalidMemberExceptionException: CloudControllerServiceInvalidMemberExceptionException
        at sun.reflect.GeneratedConstructorAccessor219.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at java.lang.Class.newInstance(Class.java:374)
        at org.apache.stratos.cloud.controller.stub.CloudControllerServiceStub.terminateInstance(CloudControllerServiceStub.java:8633)
        at org.apache.stratos.common.client.CloudControllerServiceClient.terminateInstance(CloudControllerServiceClient.java:120)
        at org.apache.stratos.autoscaler.rule.RuleTasksDelegator.terminateObsoleteInstance(RuleTasksDelegator.java:298)
        at sun.reflect.GeneratedMethodAccessor413.invoke(Unknown Source)


•        Created a jira to track this issue: https://issues.apache.org/jira/browse/STRATOS-1430







Regards



Martin



Attached the log file of the last test







From: Martin Eppel (meppel)
Sent: Friday, June 05, 2015 12:59 PM
To: dev@stratos.apache.org
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

For this latest test I got the latest source from stratos repo so I have this commit (see below), but the un-deployment still fails (to some extent).
As mentioned below, it seems that all the members get terminated eventually, including the ones which got started after the “application un-deployment” process started.
What is still left in stratos (even after all members got terminated) is the application (see the stratos> list-applications command result below in email thread). This would still be an issue when re-deploying the application !
I will do a few reruns to verify the removal of the VMs (members) is consistent.
Thanks

Martin

git show 2fe84b91843b20e91e8cafd06011f42d218f231c
commit 2fe84b91843b20e91e8cafd06011f42d218f231c
Author: anuruddhal <an...@gmail.com>
Date:   Wed Jun 3 14:41:12 2015 +0530

From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: Friday, June 05, 2015 12:46 PM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,

I also encountered a similar issue with the application un-deployment with PCA but I guess you are using JCA.

I can see that Anuruddha has done a fix for the issue I'm referring to with the below commit:
https://github.com/apache/stratos/commit/2fe84b91843b20e91e8cafd06011f42d218f231c

Regarding the member context not found error, this could occur if the termination request was made for an already terminated member. It is possible that the Autoscaler makes a second terminate request if the first request takes some time to execute, and by the time the second request hits the Cloud Controller the member has already been terminated by the first request.
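
As a rough illustration of that double-request case, the sketch below models a member context store with two terminate variants. Everything here is hypothetical (the TerminateSketch class, its exception type and its method names); it is not the CloudControllerServiceImpl code. The strict variant fails on the second request in the same way as the "member context not found" error, while the idempotent variant just reports that there was nothing left to terminate.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a member context store; not the Cloud Controller implementation.
class TerminateSketch {
    // Hypothetical checked exception standing in for an invalid member error.
    static class InvalidMemberSketchException extends Exception {
        InvalidMemberSketchException(String message) {
            super(message);
        }
    }

    final Map<String, String> memberContexts = new HashMap<>();

    // Strict variant: a repeated request for the same member fails.
    void terminateStrict(String memberId) throws InvalidMemberSketchException {
        if (memberContexts.remove(memberId) == null) {
            throw new InvalidMemberSketchException(
                    "Could not terminate instance, member context not found: [member-id] " + memberId);
        }
    }

    // Idempotent variant: returns false when the member was already removed.
    boolean terminateIdempotent(String memberId) {
        return memberContexts.remove(memberId) != null;
    }
}
```

If the errors come only from such repeated requests, they would be noise rather than a sign that members were left running, which is what the question below tries to confirm.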

Can you please confirm whether the members were properly terminated and it's just these exceptions that you are seeing?

Thanks


On Sat, Jun 6, 2015 at 12:36 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi Udara,

Picked up your commit and reran the test case:

Attached is the log file (artifacts are the same as before).

Didn’t see the issue with “Member is in the wrong list” …

but see the following exception after the undeploy application message:
TID: [0] [STRATOS] [2015-06-05 18:09:46,836] ERROR {org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator} -  Failed to retrieve topology event message
org.apache.stratos.common.exception.InvalidLockRequestedException: System error, cannot acquire a write lock while having a read lock on the same thread: [lock-name] application-holder [thread-id] 114 [thread-name] pool-24-thread-2
                    at org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:114)
                    at org.apache.stratos.autoscaler.applications.ApplicationHolder.acquireWriteLock(ApplicationHolder.java:60)
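
For comparison, the JDK's own ReentrantReadWriteLock also does not allow a thread to upgrade a read lock to a write lock. The sketch below uses only that JDK class; the LockUpgradeSketch class and its method names are made up for illustration, and it says nothing about the internals of the Stratos ReadWriteLock wrapper that appears in the stack trace.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustration with the JDK lock only; the Stratos wrapper may behave differently in detail.
class LockUpgradeSketch {
    // Tries to take the write lock while the same thread still holds the read lock.
    static boolean tryUpgradeWhileReading() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.readLock().lock();
        try {
            // tryLock() does not block, so the failed upgrade is visible instead of hanging.
            boolean acquired = lock.writeLock().tryLock();
            if (acquired) {
                lock.writeLock().unlock();
            }
            return acquired;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Releases the read lock first, then takes the write lock.
    static boolean releaseThenWrite() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.readLock().lock();
        lock.readLock().unlock();
        boolean acquired = lock.writeLock().tryLock();
        if (acquired) {
            lock.writeLock().unlock();
        }
        return acquired;
    }
}
```

Here tryUpgradeWhileReading() returns false and releaseThenWrite() returns true. That suggests the code path on pool-24-thread-2 takes the application-holder read lock and then requests the write lock without releasing the read lock first.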


Also, after “Application undeployment process started” is logged, new members are being instantiated:

TID: [0] [STRATOS] [2015-06-05 18:07:46,545]  INFO {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher} -  Publishing member created event:


Eventually, these VMs get terminated:

TID: [0] [STRATOS] [2015-06-05 18:42:42,413] ERROR {org.apache.stratos.cloud.controller.services.impl.CloudControllerServiceImpl} -  Could not terminate instance: [member-id] g-sc-G12-1.c1-0x0.c1.domaindd9c1d40-70cc-4950-9757-418afe19ba7f
org.apache.stratos.cloud.controller.exception.InvalidMemberException: Could not terminate instance, member context not found: [member-id] g-sc-G12-1.c1-0x0.c1.domaindd9c1d40-70cc-4950-9757-418afe19ba7f
                    at org.apache.stratos.cloud.controller.services.impl.CloudControllerServiceImpl.terminateInstance(CloudControllerServiceImpl.java:595)
                    at sun.reflect.GeneratedMethodAccessor408.invoke(Unknown Source)
                    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                    at java.lang.reflect.Method.invoke(Method.java:606)


but the application remains:

stratos> list-applications
Applications found:
+----------------+------------+----------+
| Application ID | Alias      | Status   |
+----------------+------------+----------+
| g-sc-G12-1     | g-sc-G12-1 | Deployed |
+----------------+------------+----------+

['g-sc-G12-1: applicationInstances 1, groupInstances 2, clusterInstances 3, members 0 ()\n']



From: Martin Eppel (meppel)
Sent: Friday, June 05, 2015 10:04 AM
To: dev@stratos.apache.org
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Ok:

log4j.logger.org.apache.stratos.manager=DEBUG
log4j.logger.org.apache.stratos.autoscaler=DEBUG
log4j.logger.org.apache.stratos.messaging=INFO
log4j.logger.org.apache.stratos.cloud.controller=DEBUG
log4j.logger.org.wso2.andes.client=ERROR
# Autoscaler rule logs
log4j.logger.org.apache.stratos.autoscaler.rule.RuleLog=DEBUG

From: Udara Liyanage [mailto:udara@wso2.com]
Sent: Friday, June 05, 2015 10:00 AM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,

It would be better if you could enable debug logs for AS, CC and the cartridge agent.

On Fri, Jun 5, 2015 at 10:23 PM, Udara Liyanage <ud...@wso2.com> wrote:
Hi,

Please enable AS debug logs.

On Fri, Jun 5, 2015 at 9:38 PM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi Udara,

Yes, this issue seems to be fairly reproducible. Which debug logs do you want me to enable? The cartridge agent logs?

Thanks

Martin

From: Udara Liyanage [mailto:udara@wso2.com]
Sent: Thursday, June 04, 2015 11:11 PM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi,

This might happen if AS did not receive the member activated event published by CC. If this is reproducible, could you enable debug logs?
Otherwise I can add some INFO logs and commit.


On Fri, Jun 5, 2015 at 9:11 AM, Udara Liyanage <ud...@wso2.com> wrote:
Hi,


For the first issue you mentioned: the particular member was activated, but it is still identified as an obsolete member and is marked for termination because the pending time expired. Does that mean the member is still in the Obsolete list even though it has been activated?

//member started
TID: [0] [STRATOS] [2015-06-04 19:53:04,706]  INFO {org.apache.stratos.autoscaler.context.cluster.ClusterContext} -  Member stat context has been added: [application] g-sc-G12-1 [cluster] g-sc-G12-1.c1-0x0.c1.domain [clusterInstanceContext] g-sc-G12-1-1 [partitionContext] whole-region [member-id] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1

//member activated
TID: [0] [STRATOS] [2015-06-04 19:56:00,907]  INFO {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher} -  Publishing member activated event: [service-name] c1 [cluster-id] g-sc-G12-1.c1-0x0.c1.domain [cluster-instance-id] g-sc-G12-1-1 [member-id] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1 [network-partition-id] RegionOne [partition-id] whole-region
TID: [0] [STRATOS] [2015-06-04 19:56:00,916]  INFO {org.apache.stratos.messaging.message.processor.topology.MemberActivatedMessageProcessor} -  Member activated: [service] c1 [cluster] g-sc-G12-1.c1-0x0.c1.domain [member] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1

//after 15 minutes ---member is still in pending state, pending timeout expired
TID: [0] [STRATOS] [2015-06-04 20:08:04,713]  INFO {org.apache.stratos.autoscaler.context.partition.ClusterLevelPartitionContext$PendingMemberWatcher} -  Pending state of member expired, member will be moved to obsolete list. [pending member] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1 [expiry time] 900000 [cluster] g-sc-G12-1.c1-0x0.c1.domain [cluster instance] null
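
The three log lines above fit the hypothesis that the member never left the pending list on the AS side. A minimal sketch of that bookkeeping, assuming a pending list with timestamps and a periodic expiry check (all class, field and method names here are hypothetical stand-ins, not the Stratos ClusterLevelPartitionContext API; only the 900000 ms expiry is taken from the log):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

public class PendingMemberSketch {
    // Hypothetical stand-ins for the per-partition member lists.
    final Map<String, Long> pendingSince = new HashMap<>();
    final Set<String> active = new HashSet<>();
    final Set<String> obsolete = new HashSet<>();
    final long expiryMillis;

    PendingMemberSketch(long expiryMillis) {
        this.expiryMillis = expiryMillis;
    }

    void memberCreated(String id, long now) {
        pendingSince.put(id, now);
    }

    /** Only runs if the member activated event actually reaches this component. */
    void memberActivated(String id) {
        if (pendingSince.remove(id) != null) {
            active.add(id);
        }
    }

    /** Periodic check: anything pending for expiryMillis or longer becomes obsolete. */
    void expirePending(long now) {
        Iterator<Map.Entry<String, Long>> it = pendingSince.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (now - entry.getValue() >= expiryMillis) {
                obsolete.add(entry.getKey());
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        PendingMemberSketch ctx = new PendingMemberSketch(900_000L);
        ctx.memberCreated("member-a", 0L);
        ctx.memberCreated("member-b", 0L);
        ctx.memberActivated("member-a");   // event received
        // member-b's activated event is never delivered in this sketch
        ctx.expirePending(900_000L);
        System.out.println("active=" + ctx.active + " obsolete=" + ctx.obsolete);
    }
}
```

In this sketch a member whose activated event is lost ends up obsolete after the timeout even though the instance itself is healthy, which matches the sequence in the log.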

On Fri, Jun 5, 2015 at 5:14 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi,

I am running into a scenario where application un-deployment fails (using stratos with latest commit  b1b6bca3f99b6127da24c9af0a6b20faff2907be).

For application structure see [1.], (debug enabled) wso2carbon.log, application.json, cartridge-group.json, deployment-policy, auto-scaling policies see attached zip file.

It is noteworthy, that while the application is running the following log statements /exceptions are observed:

…
Member is in the wrong list and it is removed from active members list: g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
…
TID: [0] [STRATOS] [2015-06-04 20:11:03,425] ERROR {org.apache.stratos.autoscaler.rule.RuleTasksDelegator} -  Cannot terminate instance
…
// after receiving the application undeploy event:
[2015-06-04 20:12:39,465]  INFO {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Application undeployment process started: [application-id] g-sc-G12-1
// a new instance is being started up
…
[2015-06-04 20:13:13,445]  INFO {org.apache.stratos.cloud.controller.services.impl.InstanceCreator} -  Instance started successfully: [cartridge-type] c2 [cluster-id] g-sc-G12-1.c2-1x0.c2.domain [instance-id] RegionOne/5d4699f7-b00b-42eb-b565-b48fc8f20407

// Also noteworthy seems the following warning which is seen repeatedly in the logs:
ReadWriteLock} -  System warning! Trying to release a lock which has not been taken by the same thread: [lock-name]


[1.] Application structure

[cid:image003.png@01D0A4F4.0FDE0B50]







--
Lasindu Charith
Software Engineer, WSO2 Inc.
Mobile: +94714427192
Web: http://blog.lasindu.com




Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by Reka Thirunavukkarasu <re...@wso2.com>.
Hi Martin,

I have also fixed an issue where we execute the startup dependency for the
terminate-none behavior. The fix is in commit
7db006b565437d4d18a2c66e9794a0fdbaae279c.

Please test group scaling with this fix and let us know if you still face
any other issues.

Thanks,
Reka

On Sat, Jun 13, 2015 at 10:45 AM, Martin Eppel (meppel) <me...@cisco.com>
wrote:

>  Hi Lasindu,
>
>
>
> I have run some tests, and the issue (failure to remove an application after
> an instance is terminated and restarted) seems to be resolved.
>
>
>
> However, I do seem to see some issue with group scaling and application
> removal. I still have to run some tests next week to get a better
> understanding (I am not sure yet whether this is an issue or not) and will
> keep you posted.
>
>
>
> Thanks
>
>
>
> Martin
>
>
>
> *From:* Lasindu Charith [mailto:lasindu@wso2.com]
> *Sent:* Friday, June 12, 2015 9:41 AM
> *To:* dev
> *Subject:* Re: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> Hi Martin,
>
>
>
> I have fixed the above issue in
> commit 03de83172309c2932075fb5284c120ca610bbf0a. Please pull from
> master and try out your scenario again to see if
> undeployment/redeployment works as expected.
>
>
>
> Thanks,
>
>
>
>
>
> On Thu, Jun 11, 2015 at 11:52 PM, Lasindu Charith <la...@wso2.com>
> wrote:
>
> Hi Martin,
>
>
>
> I guess my previous observation is incorrect. The root cause of the above
> issue is that *ClusterStatusTerminatedProcessor* does not send a
> *ClusterTerminatedEvent* for all 3 clusters. It sends only 1 and fails to
> send the other 2. This is because, when the application is activated
> again, *ClusterLevelPartitionContext* is added twice to the
> *clusterInstanceContext*. This makes the if condition at [1] fail when
> *ClusterStatusTerminatedProcessor* checks whether the cluster monitor has
> any non-terminated members before sending the clusterTerminated
> event. I will try to find a proper solution and update the thread.
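
A minimal sketch of how a context that is added twice could keep a "no non-terminated members" check from ever passing. This is an illustration under assumptions, not the Stratos code: the classes below are hypothetical, and the exact way the duplicate affects the check at [1] is not spelled out in the thread. The sketch assumes the duplicate is a second object with the same partition id that still tracks the member, and contrasts a list-backed holder with one keyed by partition id.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DuplicateContextSketch {

    /** Hypothetical partition context: an id plus the members it still tracks. */
    static class PartitionContext {
        final String partitionId;
        final Set<String> members = new HashSet<>();

        PartitionContext(String partitionId, String... memberIds) {
            this.partitionId = partitionId;
            for (String m : memberIds) {
                members.add(m);
            }
        }
    }

    /** List-backed holder: nothing stops the same partition id being added twice. */
    static class ListBackedInstanceContext {
        final List<PartitionContext> contexts = new ArrayList<>();

        void add(PartitionContext ctx) {
            contexts.add(ctx);
        }

        void memberTerminated(String partitionId, String memberId) {
            for (PartitionContext ctx : contexts) {
                if (ctx.partitionId.equals(partitionId)) {
                    ctx.members.remove(memberId);
                    return; // only the first match is updated
                }
            }
        }

        boolean hasNonTerminatedMembers() {
            for (PartitionContext ctx : contexts) {
                if (!ctx.members.isEmpty()) {
                    return true;
                }
            }
            return false;
        }
    }

    /** Map-backed holder: a second add with the same id replaces the first. */
    static class MapBackedInstanceContext {
        final Map<String, PartitionContext> contexts = new LinkedHashMap<>();

        void add(PartitionContext ctx) {
            contexts.put(ctx.partitionId, ctx);
        }

        void memberTerminated(String partitionId, String memberId) {
            PartitionContext ctx = contexts.get(partitionId);
            if (ctx != null) {
                ctx.members.remove(memberId);
            }
        }

        boolean hasNonTerminatedMembers() {
            for (PartitionContext ctx : contexts.values()) {
                if (!ctx.members.isEmpty()) {
                    return true;
                }
            }
            return false;
        }
    }

    public static void main(String[] args) {
        ListBackedInstanceContext list = new ListBackedInstanceContext();
        list.add(new PartitionContext("whole-region", "m1"));
        list.add(new PartitionContext("whole-region", "m1")); // added again on re-activation
        list.memberTerminated("whole-region", "m1");
        System.out.println("list-backed still has members: " + list.hasNonTerminatedMembers());

        MapBackedInstanceContext map = new MapBackedInstanceContext();
        map.add(new PartitionContext("whole-region", "m1"));
        map.add(new PartitionContext("whole-region", "m1"));
        map.memberTerminated("whole-region", "m1");
        System.out.println("map-backed still has members: " + map.hasNonTerminatedMembers());
    }
}
```

In the list-backed variant the stale duplicate keeps reporting a member, so a terminated event guarded by this check would never be sent; keying by partition id (or checking for an existing entry before adding) avoids the duplicate in this sketch.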
>
>
>
>
>
> [1]
> https://github.com/apache/stratos/blob/master/components/org.apache.stratos.autoscaler/src/main/java/org/apache/stratos/autoscaler/status/processor/cluster/ClusterStatusTerminatedProcessor.java#L90
>
>
>
> Thanks,
>
>
>
>
>
> On Thu, Jun 11, 2015 at 10:29 PM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
>   Is there any conclusion on how to fix this?
>
>
>
> Thanks
>
>
>
>
>
> Martin
>
>
>
> *From:* Lahiru Sandaruwan [mailto:lahirus@wso2.com]
> *Sent:* Wednesday, June 10, 2015 6:55 PM
> *To:* dev
> *Cc:* Reka Thirunavukkarasu
>
>
> *Subject:* Re: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> Hi Imesh,
>
>
>
> The following could be the reason for the application not un-deploying
> when a member was auto-healed:
>
>
>
>    - The particular cluster in which the member was auto-healed is
>    terminated before the others (while the others are in terminating state)
>
>  or
>
>    - The particular cluster in which the member was auto-healed is still
>    terminating when the others are in terminated state
>
>  One of those two cases could happen even if the member was not
> auto-healed (in the case of groups where one group is very complex and the
> others are simple), because currently we check whether all the clusters and
> groups are in *terminating* status when the parent group is *terminating*,
> which is wrong.
>
>
>
> Thanks.
>
>
>
> On Thu, Jun 11, 2015 at 5:49 AM, Imesh Gunaratne <im...@apache.org> wrote:
>
> Do we know why this only happens if a member was forcefully terminated and
> auto-healed?
>
>
>
> On Wed, Jun 10, 2015 at 10:01 PM, Lasindu Charith <la...@wso2.com>
> wrote:
>
> Hi  all,
>
>
>
> The cause of the above issue seems to be as follows.
>
> GroupStatusTerminatedProcessor and GroupStatusTerminatingProcessor
> process the event only if all the group instances and cluster instances
> are in terminated state or in terminating state respectively [1][2]. But
> there can be situations (such as the above) where some group instances are
> in terminated state and some in terminating state by the
> time GroupStatusProcessorChain is executed. In such scenarios, both
> GroupStatusTerminatedProcessor and GroupStatusTerminatingProcessor
> executions are skipped, and GroupStatusInactiveProcessor prints the "No
> possible state change found" warning.
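
A minimal sketch of the fall-through described above, assuming each processor in the chain only fires when all children share one state. The enum, the method names and the "any child inactive" rule for the last step are assumptions made for illustration; this is not the Stratos GroupStatusProcessorChain code.

```java
import java.util.Arrays;
import java.util.List;

public class GroupStatusChainSketch {

    enum Status { ACTIVE, INACTIVE, TERMINATING, TERMINATED }

    /** True only when every child is in the wanted state. */
    static boolean all(List<Status> children, Status wanted) {
        for (Status s : children) {
            if (s != wanted) {
                return false;
            }
        }
        return true;
    }

    /**
     * Strict chain: active, terminated, terminating, inactive, in that order.
     * Each of the first three steps requires a uniform child state.
     */
    static String strictChain(List<Status> children) {
        if (all(children, Status.ACTIVE)) {
            return "ACTIVE";
        }
        if (all(children, Status.TERMINATED)) {
            return "TERMINATED";
        }
        if (all(children, Status.TERMINATING)) {
            return "TERMINATING";
        }
        if (children.contains(Status.INACTIVE)) { // assumed rule for the last step
            return "INACTIVE";
        }
        return "NO_STATE_CHANGE";
    }

    public static void main(String[] args) {
        // Uniform children: the chain finds a state.
        System.out.println(strictChain(Arrays.asList(
                Status.TERMINATED, Status.TERMINATED, Status.TERMINATED)));
        // Mixed terminated/terminating children: every step is skipped.
        System.out.println(strictChain(Arrays.asList(
                Status.TERMINATED, Status.TERMINATING, Status.TERMINATED)));
    }
}
```

The mixed case is the one reported here: no processor matches, so the group instance is left without a state change until the chain is evaluated again.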
>
>
>
> I think we need to find a way to properly fix this.
>
>
>
> [1]
> https://github.com/apache/stratos/blob/master/components/org.apache.stratos.autoscaler/src/main/java/org/apache/stratos/autoscaler/status/processor/group/GroupStatusTerminatedProcessor.java#L91
>
> [2]
> https://github.com/apache/stratos/blob/master/components/org.apache.stratos.autoscaler/src/main/java/org/apache/stratos/autoscaler/status/processor/group/GroupStatusTerminatingProcessor.java#L89
>
>
>
> On Tue, Jun 9, 2015 at 8:09 PM, Lasindu Charith <la...@wso2.com> wrote:
>
> Hi Martin,
>
>
>
> I was able to reproduce this issue in the latest build with PCA in
> Openstack. Even after stratos is restarted, the Application is not
> undeployed, which makes it impossible to undeploy the application (even the
> forceful undeployment failed for the above obsolete application).
>
>
>
> Currently I'm looking at possible causes for this and will update with the
> progress.
>
>
>
> Thanks,
>
>
>
> On Tue, Jun 9, 2015 at 5:59 AM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> Here is another example where the removal fails:
>
>
>
> For application see [1.], log file (with debug enabled) and jsons are
> attached.
>
>
>
> Scenario:
>
>
>
> ·        Deploy application and wait for all cartridges to become active
>
> ·        Kill a VM (2nd in startup sequence)
>
> ·        Wait for it to restart and become active
>
> ·        Un-deploy application
>
> a.      Un-deploy forcefully will succeed
> ([2015-06-08 20:38:21,487]  INFO
> {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -
> Forcefully un-deploying the application s-g-c1-c2-c3-s)
> and
>
> b.      Un-deploy gracefully will fail to remove app completely (although
> VMs are terminated successfully)
> ([2015-06-08 20:54:16,372]  INFO
> {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -
> Starting to undeploy application: [application-id])
>
> ·        Both scenarios are recorded in the same log file
> wso2carbon-s-g-c1-c2-c3-s.log
>
> ·        Btw, I retested the scenario and the issue is easily
> reproducible following the steps listed above:
> graceful application un-deploy succeeds if no VM has been restarted
> (terminated and restarted by the autoscaler).
> Once a VM has been terminated, graceful application un-deploy fails.
> I attached a log file which demonstrates this case
> (wso2carbon-s-g-c1-c2-c3-s-scen-2.log). In this scenario, the same
> application is deployed, becomes active and is then removed (repeated 2
> times); then a VM is terminated and restarted by the autoscaler. Afterwards,
> graceful application un-deploy fails.
>
>
>
>
>
> Other Observations:
>
>
>
> When the application is removed successfully, some events, e.g. “cluster
> removed event” and “Application deleted event received:”, are published
> (see [2.]), while when the application fails to be removed no such events
> are observed.
>
>
>
> [2.] cluster removed event when application is un-deployed forcefully
>
> TID: [0] [STRATOS] [2015-06-08 20:38:34,187]  INFO
> {org.apache.stratos.cloud.controller.messaging.receiver.application.ApplicationEventReceiver}
> -  Application deleted event received: [application-id] s-g-c1-c2-c3-s
>
> TID: [0] [STRATOS] [2015-06-08 20:38:34,220]  INFO
> {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher}
> -  Publishing application clusters removed event: [application-id]
> s-g-c1-c2-c3-s
>
>
>
>
>
> I compared the log sequences of the successful and the unsuccessful
> application removal and noticed a difference (see also the highlighted
> areas):
>
>
>
> Successful (see logs in wso2carbon-s-g-c1-c2-c3-s-scen-2.log)
>
>
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG
> {org.apache.stratos.autoscaler.status.processor.group.*
> *GroupStatusTerminatedProcessor**} -  StatusChecker calculating the
> terminated status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [
> s-g-c1-c2-c3-s-1 ]*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG
> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
> lock acquired*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,527]  INFO
> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor}
> -  Sending group instance terminated for [group] s-g-c1-c2-c3-s-x0x
> [instance] s-g-c1-c2-c3-s-1*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG
> {org.apache.stratos.autoscaler.applications.topic.ApplicationBuilder} -
> Handling group terminated event: [group-id] s-g-c1-c2-c3-s-x0x
> [application-id] s-g-c1-c2-c3-s [instance] s-g-c1-c2-c3-s-1*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG
> {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -
> ClusterMonitor Drools session has been disposed. ClusterMonitor
> [clusterId=s-g-c1-c2-c3-s.c3-0x0.c3.domain, hasPrimary=false ]*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG
> {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -
> ClusterMonitor Drools session has been disposed. ClusterMonitor
> [clusterId=s-g-c1-c2-c3-s.c1-0x0.c1.domain, hasPrimary=false ]*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG
> {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -
> ClusterMonitor Drools session has been disposed. ClusterMonitor
> [clusterId=s-g-c1-c2-c3-s.c2-0x0.c2.domain, hasPrimary=false ]*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,529] DEBUG
> {org.apache.stratos.cloud.controller.messaging.topology.TopologyManager} -
> Write lock released*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,536] DEBUG
> {org.apache.stratos.autoscaler.registry.RegistryManager} -  Application [
> s-g-c1-c2-c3-s ] persisted successfully in the Autoscaler Registry*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,538] DEBUG
> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -
> Applications updated:
> {"applicationMap":{"s-g-c1-c2-c3-s":{"id":"s-g-c1-c2-c3-s","key":"l8V7OpRqOfBcWbBw","tenantId":-1234,"tenantDomain":"carbon.super","tenantAdminUserName":"admin","applicationPolicyId":"default-iaas","aliasToGroupMap":{"s-g-c1-c2-c3-s-x0x":{"name":"s-g-c1-c2-c3-s","alias":"s-g-c1-c2-c3-s-x0x","groupMinInstances":1,"groupMaxInstances":1,"applicationId":"s-g-c1-c2-c3-s","aliasToGroupMap":{},"aliasToClusterDataMap":{"c2-0x0":{"serviceType":"c2","clusterId":"s-g-c1-c2-c3-s.c2-0x0.c2.domain","minInstances":1,"maxInstances":1},"c3-0x0":{"serviceType":"c3","clusterId":"s-g-c1-c2-c3-s.c3-0x0.c3.domain","minInstances":1,"maxInstances":1},"c1-0x0":{"serviceType":"c1","clusterId":"s-g-c1-c2-c3-s.c1-0x0.c1.domain","minInstances":1,"maxInstances":1}},"typeToClusterDataMap":{"c1":{"serviceType":"c1","clusterId":"s-g-c1-c2-c3-s.c1-0x0.c1.domain","minInstances":1,"maxInstances":1},"c2":{"serviceType":"c2","clusterId":"s-g-c1-c2-c3-s.c2-0x0.c2.domain","minInstances":1,"maxInstances":1},"c3":{"serviceType":"c3","clusterId":"s-g-c1-c2-c3-s.c3-0x0.c3.domain","minInstances":1,"maxInstances":1}},"instanceIdToInstanceContextMap":{},"dependencyOrder":{"startupOrders":[{"startupOrderComponentList":["cartridge.c3-0x0","cartridge.c2-0x0"]},{"startupOrderComponentList":["cartridge.c2-0x0","cartridge.c1-0x0"]}],"terminationBehaviour":"terminate-none"},"isGroupScalingEnabled":false,"isGroupInstanceMonitoringEnabled":false,"instanceIdSequence":{"value":0}}},"aliasToClusterDataMap":{},"typeToClusterDataMap":{},"aliasToDeploymentPolicyIdMap":{"c3-0x0":"static-1","c2-0x0":"static-1","c1-0x0":"static-1"},"instanceIdToInstanceContextMap":{"s-g-c1-c2-c3-s-1":{"alias":"s-g-c1-c2-c3-s","instanceId":"s-g-c1-c2-c3-s-1","instanceProperties":{},"lifeCycleStateManager":{"stateStack":["Created","Active","Terminating"],"identifier":"s-g-c1-c2-c3-s_s-g-c1-c2-c3-s-1"},"networkPartitionId":"RegionOne"}},"dependencyOrder":{"startupOrders":[]},"isGroupScalingEnabled":false,"isGroupInstanceMonitoringEnabled":false,"instanceIdSequence":{"value":1}}},"initialized":false}*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,539]  INFO
> {org.apache.stratos.autoscaler.applications.topic.ApplicationsEventPublisher}
> -  Publishing group instance terminated event: [application] s-g-c1-c2-c3-s
> [group] s-g-c1-c2-c3-s-x0x [instance] s-g-c1-c2-c3-s-1*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,545]  INFO
> {org.apache.stratos.autoscaler.monitor.component.GroupMonitor} -  [Group]
> s-g-c1-c2-c3-s-x0x is notifying the [parent] s-g-c1-c2-c3-s [instance]
> s-g-c1-c2-c3-s-1*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,545] DEBUG
> {org.apache.stratos.autoscaler.status.processor.group.*
> *GroupStatusActiveProcessor}** -  GroupProcessor chain calculating the
> status for the group [ s-g-c1-c2-c3-s ]*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG
> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusActiveProcessor}
> -  StatusChecker calculating the active status for the group [
> s-g-c1-c2-c3-s ]  for the instance  [ s-g-c1-c2-c3-s-1 ]*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG
> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
> lock acquired*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG
> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
> lock released*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG
> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor}
> -  StatusChecker calculating the terminated status for the group [
> s-g-c1-c2-c3-s ]  for the instance  [ s-g-c1-c2-c3-s-1 ]*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG
> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
> lock acquired*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,546]  INFO
> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor}
> -  Sending application instance terminated for [application] s-g-c1-c2-c3-s
> [instance] s-g-c1-c2-c3-s-1*
>
>
>
> Unsuccessful:
>
>
>
> *TID: [0] [STRATOS] [2015-06-08 22:33:25,404] DEBUG
> {org.apache.stratos.autoscaler.status.processor.group.*
> *GroupStatusTerminatedProcessor**} -  StatusChecker calculating the
> terminated status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [
> s-g-c1-c2-c3-s-1 ]*
>
> *TID: [0] [STRATOS] [2015-06-08 22:33:25,404] DEBUG
> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
> lock acquired*
>
> *TID: [0] [STRATOS] [2015-06-08 22:33:25,405] DEBUG
> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusProcessor}
> -  Checking the status of cluster s-g-c1-c2-c3-s.c1-0x0.c1.domain instance
> status is: Terminating*
>
> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
> {org.apache.stratos.cloud.controller.messaging.topology.TopologyManager} -
> Write lock released*
>
> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusProcessor}
> -  Checking the status of cluster s-g-c1-c2-c3-s.c1-0x0.c1.domain instance
> status is: Terminating*
>
> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
> lock released*
>
> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
> {org.apache.stratos.autoscaler.status.processor.group.*
> *GroupStatusTerminatingProcessor**} -  StatusChecker calculating the
> terminating status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance
> [ s-g-c1-c2-c3-s-1 ]*
>
> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
> lock acquired*
>
> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
> lock released*
>
> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
> {org.apache.stratos.autoscaler.status.processor.group.*
> *GroupStatusInactiveProcessor**} -  StatusChecker calculating the
> inactive status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [
> s-g-c1-c2-c3-s-1 ]*
>
> *TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG
> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
> lock acquired*
>
> *TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG
> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
> lock released*
>
> *TID: [0] [STRATOS] [2015-06-08 22:33:25,407]  WARN
> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusInactiveProcessor}
> -  **No possible state change found for* *[component] s-g-c1-c2-c3-s-x0x
> [instance] s-g-c1-c2-c3-s-1*
>
> *TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG
> {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -
> ClusterMonitor Drools session has been disposed. ClusterMonitor
> [clusterId=s-g-c1-c2-c3-s.c2-0x0.c2.domain, hasPrimary=false ]*
>
> *TID: [0] [STRATOS] [2015-06-08 22:33:25,481] ERROR
> {org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor} -  System
> error, lock has not released for 30 seconds: [lock-name] application
> [lock-type] Write [thread-id] 99 [thread-name] pool-26-thread-2
> [stack-trace] *
>
> *java.lang.Thread.getStackTrace(Thread.java:1589)*
>
>
>
>
>
>
>
>
>
> [1.] Application Structure
>
>
>
>
>
>
>
>
>
>
>
>
>
> *From:* Martin Eppel (meppel)
> *Sent:* Friday, June 05, 2015 4:38 PM
>
>
> *To:* dev@stratos.apache.org
> *Subject:* RE: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> This is another application, see [1.] which fails to get completely
> removed:
>
>
>
> Scenario / Observation:
>
> ·        After all instances / application go active, one instance is
> being terminated (to verify termination behavior). Once the terminated
> instance is restored the application is undeployed.
>
> ·        After the Application undeployment process is started, all
> instances are being terminated
>
> ·        Application still shows up in stratos admin, subsequent
> deployments fail
>
>
>
> stratos> list-applications
>
> Applications found:
>
> +---------------------+---------------------+----------+
>
> | Application ID      | Alias               | Status   |
>
> +---------------------+---------------------+----------+
>
> | s-n-gr-s-G123-t-a-4 | s-n-gr-s-G123-t-a-4 | Deployed |
>
> +---------------------+---------------------+----------+
>
>
>
>
>
> [1.] Application:
>
>
>
>
>
>
>
>
>
>
>
> *From:* Martin Eppel (meppel)
> *Sent:* Friday, June 05, 2015 3:26 PM
> *To:* dev@stratos.apache.org
> *Subject:* RE: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> After re-running it, these are my observations:
>
>
>
> ·        After the “Application undeployment process started” message is
> logged, there is a likelihood that (a few) VMs are still launched; I
> suspect this is due to a race condition between the “Application
> undeployment process” and the “autoscaler”.
>
> ·        All VMs which were launched before the “Application undeployment
> process started” message get terminated as part of the undeployment process.
>
> ·        VMs which were launched after the “Application undeployment process
> started” message eventually get moved to obsolete / pending state and are
> cleaned up; this can take up to 15-20 minutes.
>
> ·        The application never gets completely removed.
>
> ·        The following exception is consistently observed:
>
> TID: [0] [STRATOS] [2015-06-05 20:47:07,237]  WARN
> {org.apache.stratos.common.concurrent.locks.ReadWriteLock} -  System
> warning! Trying to release a lock which has not been taken by the same
> thread: [lock-name] application-holder [thread-id] 114 [thread-name]
> pool-24-thread-2
>
> *TID: [0] [STRATOS] [2015-06-05 20:47:07,237] ERROR
> {org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator}
> -  Failed to retrieve topology event message*
>
> *org.apache.stratos.common.exception.InvalidLockRequestedException: System
> error, cannot acquire a write lock while having a read lock on the same
> thread: [lock-name] application-holder [thread-id] 114 [thread-name]
> pool-24-thread-2*
>
> *                    at
> org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:114)*
>
> *                    at
> org.apache.stratos.autoscaler.applications.ApplicationHolder.acquireWriteLock(ApplicationHolder.java:60)*
>
> ·        Initiating the “Application undeployment process” again produces
> the following INFO statement, without any further action (see the log):
> TID: [0] [STRATOS] [2015-06-05 21:34:34,509]  INFO
> {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -
> Application monitor is already in terminating, graceful un-deployment is
> has already been attempted thus not invoking again
>
> ·        Other exceptions observed after the “Application undeployment
> process started”
>
> TID: [0] [STRATOS] [2015-06-05 21:36:29,458] ERROR
> {org.apache.stratos.autoscaler.rule.RuleTasksDelegator} -  Cannot terminate
> instance
>
> org.apache.stratos.cloud.controller.stub.CloudControllerServiceInvalidMemberExceptionException:
> CloudControllerServiceInvalidMemberExceptionException
>
>         at sun.reflect.GeneratedConstructorAccessor219.newInstance(Unknown
> Source)
>
>         at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>
>         at java.lang.Class.newInstance(Class.java:374)
>
>         at
> org.apache.stratos.cloud.controller.stub.CloudControllerServiceStub.terminateInstance(CloudControllerServiceStub.java:8633)
>
>         at
> org.apache.stratos.common.client.CloudControllerServiceClient.terminateInstance(CloudControllerServiceClient.java:120)
>
>         at
> org.apache.stratos.autoscaler.rule.RuleTasksDelegator.terminateObsoleteInstance(RuleTasksDelegator.java:298)
>
>         at sun.reflect.GeneratedMethodAccessor413.invoke(Unknown Source)
>
>
>
> ·        Created a jira to track this issue:
> https://issues.apache.org/jira/browse/STRATOS-1430
>
>
>
>
>
>
>
> Regards
>
>
>
> Martin
>
>
>
> Attached the log file of the last test
>
>
>
>
>
>
>
>
>
> *From:* Martin Eppel (meppel)
> *Sent:* Friday, June 05, 2015 12:59 PM
> *To:* dev@stratos.apache.org
> *Subject:* RE: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> For this latest test I got the latest source from stratos repo so I have
> this commit (see below), but the un-deployment still fails (to some extent).
>
> As mentioned below, it seems that all the members get terminated
> eventually, including the ones which got started after the “application
> un-deployment” process started.
>
> What is still left in stratos (even after all members got terminated) is
> the application (see the stratos> list-applications command result below in
> email thread). This would still be an issue when re-deploying the
> application !
>
> I will do a few reruns to verify the removal of the VMs (members) is
> consistent.
>
> Thanks
>
>
>
> Martin
>
>
>
> git show 2fe84b91843b20e91e8cafd06011f42d218f231c
>
> commit 2fe84b91843b20e91e8cafd06011f42d218f231c
>
> Author: anuruddhal <an...@gmail.com>
>
> Date:   Wed Jun 3 14:41:12 2015 +0530
>
>
>
> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
> *Sent:* Friday, June 05, 2015 12:46 PM
> *To:* dev
> *Subject:* Re: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> Hi Martin,
>
>
>
> I also encountered a similar issue with the application un-deployment with
> PCA but I guess you are using JCA.
>
>
>
> I can see that Anuruddha has fixed the issue I'm referring to with
> the commit below:
>
>
> https://github.com/apache/stratos/commit/2fe84b91843b20e91e8cafd06011f42d218f231c
>
>
>
> Regarding the member context not found error, this could occur if the
> termination request was made for an already terminated member. It is
> possible that the Autoscaler makes a second terminate request if the first
> request takes some time to execute; by the time the second request hits
> Cloud Controller, the member has already been terminated by the first request.
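
A minimal sketch of the double-terminate situation described above, assuming the member context is removed by the first request so that the second one finds nothing. The class, exception and method names are hypothetical and only mirror the shape of the error in the log; the tolerant variant is one possible way to treat a repeat request, not something the thread says Stratos does.

```java
import java.util.HashMap;
import java.util.Map;

public class DoubleTerminateSketch {

    /** Hypothetical counterpart of the "member context not found" error. */
    static class MemberNotFoundException extends Exception {
        MemberNotFoundException(String message) {
            super(message);
        }
    }

    private final Map<String, String> memberContexts = new HashMap<>();

    void register(String memberId) {
        memberContexts.put(memberId, "context");
    }

    /** Strict terminate: fails when the context was already removed. */
    void terminate(String memberId) throws MemberNotFoundException {
        if (memberContexts.remove(memberId) == null) {
            throw new MemberNotFoundException("member context not found: " + memberId);
        }
    }

    /** Tolerant terminate: a repeat request is a no-op; returns true if a member was removed. */
    boolean terminateIfPresent(String memberId) {
        return memberContexts.remove(memberId) != null;
    }

    public static void main(String[] args) {
        DoubleTerminateSketch cc = new DoubleTerminateSketch();
        cc.register("m1");
        try {
            cc.terminate("m1"); // first request succeeds
            cc.terminate("m1"); // second request arrives after the member is gone
        } catch (MemberNotFoundException e) {
            System.out.println("second request failed: " + e.getMessage());
        }
    }
}
```

If this is what happens, the ERROR in the log is noise rather than a failed termination, which is why confirming that the members are actually gone matters.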
>
>
>
> Can you please confirm whether the members were properly terminated and
> it's just these exceptions that you are seeing?
>
>
>
> Thanks
>
>
>
>
>
> On Sat, Jun 6, 2015 at 12:36 AM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> Hi Udara,
>
>
>
> I picked up your commit and reran the test case:
>
>
>
> Attached is the log file (artifacts are the same as before).
>
>
>
> *Didn’t see the issue with* “*Member is in the wrong list” …*
>
>
>
> but see the following exception after the undeploy application message:
>
> *TID: [0] [STRATOS] [2015-06-05 18:09:46,836] ERROR
> {org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator}
> -  Failed to retrieve topology event message*
>
> *org.apache.stratos.common.exception.InvalidLockRequestedException: System
> error, cannot acquire a write lock while having a read lock on the same
> thread: [lock-name] application-holder [thread-id] 114 [thread-name]
> pool-24-thread-2*
>
> *                    at
> org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:114)*
>
> *                    at
> org.apache.stratos.autoscaler.applications.ApplicationHolder.acquireWriteLock(ApplicationHolder.java:60)*
>
>
>
>
>
> *Also, after the “Application undeployment process started” message is
> logged, new members are still being instantiated:*
>
>
>
> *TID: [0] [STRATOS] [2015-06-05 18:07:46,545]  INFO
> {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher}
> -  Publishing member created event*:
>
>
>
>
>
> *Eventually, these VMs get terminated :*
>
>
>
> *TID: [0] [STRATOS] [2015-06-05 18:42:42,413] ERROR
> {org.apache.stratos.cloud.controller.services.impl.CloudControllerServiceImpl}
> -  Could not terminate instance: [member-id]
> g-sc-G12-1.c1-0x0.c1.domaindd9c1d40-70cc-4950-9757-418afe19ba7f*
>
> *org.apache.stratos.cloud.controller.exception.InvalidMemberException:
> Could not terminate instance, member context not found: [member-id]
> g-sc-G12-1.c1-0x0.c1.domaindd9c1d40-70cc-4950-9757-418afe19ba7f*
>
> *                    at
> org.apache.stratos.cloud.controller.services.impl.CloudControllerServiceImpl.terminateInstance(CloudControllerServiceImpl.java:595)*
>
> *                    at
> sun.reflect.GeneratedMethodAccessor408.invoke(Unknown Source)*
>
> *                    at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)*
>
> *                    at java.lang.reflect.Method.invoke(Method.java:606)*
>
>
>
>
>
> *but the application remains:*
>
>
>
> stratos> list-applications
>
> Applications found:
>
> +----------------+------------+----------+
>
> | Application ID | Alias      | Status   |
>
> +----------------+------------+----------+
>
> | g-sc-G12-1     | g-sc-G12-1 | Deployed |
>
> +----------------+------------+----------+
>
>
>
> ['g-sc-G12-1: applicationInstances 1, groupInstances 2, clusterInstances
> 3, members 0 ()\n']
>
>
>
>
>
>
>
> *From:* Martin Eppel (meppel)
> *Sent:* Friday, June 05, 2015 10:04 AM
> *To:* dev@stratos.apache.org
> *Subject:* RE: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> Ok:
>
>
>
> log4j.logger.org.apache.stratos.manager=DEBUG
>
> log4j.logger.org.apache.stratos.autoscaler=DEBUG
>
> log4j.logger.org.apache.stratos.messaging=INFO
>
> log4j.logger.org.apache.stratos.cloud.controller=DEBUG
>
> log4j.logger.org.wso2.andes.client=ERROR
>
> # Autoscaler rule logs
>
> log4j.logger.org.apache.stratos.autoscaler.rule.RuleLog=DEBUG
>
>
>
> *From:* Udara Liyanage [mailto:udara@wso2.com <ud...@wso2.com>]
> *Sent:* Friday, June 05, 2015 10:00 AM
> *To:* dev
> *Subject:* Re: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> Hi Martin,
>
>
>
> It would be better if you could enable debug logs for AS, CC and the cartridge agent.
>
>
>
> On Fri, Jun 5, 2015 at 10:23 PM, Udara Liyanage <ud...@wso2.com> wrote:
>
> Hi,
>
>
>
> Please enable AS debug logs.
>
>
>
> On Fri, Jun 5, 2015 at 9:38 PM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> Hi Udara,
>
>
>
> Yes, this issue seems to be fairly reproducible. Which debug logs do
> you want me to enable, the cartridge agent logs?
>
>
>
> Thanks
>
>
>
> Martin
>
>
>
> *From:* Udara Liyanage [mailto:udara@wso2.com]
> *Sent:* Thursday, June 04, 2015 11:11 PM
> *To:* dev
> *Subject:* Re: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> Hi,
>
>
>
> This might be possible if AS did not receive the member activated event
> published by CC. Is it possible to enable debug logs if this is
> reproducible?
>
> Or else I can add INFO logs and commit.
>
>
>
>
>
> On Fri, Jun 5, 2015 at 9:11 AM, Udara Liyanage <ud...@wso2.com> wrote:
>
> Hi,
>
>
>
>
>
> For the first issue you have mentioned, the particular member is
> activated, but it is still identified as an obsolete member and is being
> marked for termination since the pending time expired. Does that mean the
> member is still in the obsolete list even though it has been activated?
>
>
>
> //member started
>
> TID: [0] [STRATOS] [2015-06-04 19:53:04,706]  INFO
> {org.apache.stratos.autoscaler.context.cluster.ClusterContext} -  Member
> stat context has been added: [application] g-sc-G12-1 [cluster]
> g-sc-G12-1.c1-0x0.c1.domain [clusterInstanceContext] g-sc-G12-1-1
> [partitionContext] whole-region [member-id]
> g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
>
>
>
> //member activated
>
> TID: [0] [STRATOS] [2015-06-04 19:56:00,907]  INFO
> {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher}
> -  Publishing member activated event: [service-name] c1 [cluster-id]
> g-sc-G12-1.c1-0x0.c1.domain [cluster-instance-id] g-sc-G12-1-1 [member-id]
> g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
> [network-partition-id] RegionOne [partition-id] whole-region
>
> TID: [0] [STRATOS] [2015-06-04 19:56:00,916]  INFO
> {org.apache.stratos.messaging.message.processor.topology.MemberActivatedMessageProcessor}
> -  Member activated: [service] c1 [cluster] g-sc-G12-1.c1-0x0.c1.domain
> [member] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
>
>
>
> //after 15 minutes ---member is still in pending state, pending timeout
> expired
>
> TID: [0] [STRATOS] [2015-06-04 20:08:04,713]  INFO
> {org.apache.stratos.autoscaler.context.partition.ClusterLevelPartitionContext$PendingMemberWatcher}
> -  Pending state of member expired, member will be moved to obsolete list.
> [pending member]
> g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1 [expiry
> time] 900000 [cluster] g-sc-G12-1.c1-0x0.c1.domain [cluster instance] null
>
>
>
> On Fri, Jun 5, 2015 at 5:14 AM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> Hi,
>
>
>
> I am running into a scenario where application un-deployment fails (using
> stratos with latest commit  b1b6bca3f99b6127da24c9af0a6b20faff2907be).
>
>
>
> For application structure see [1.], (debug enabled) wso2carbon.log,
> application.json, cartridge-group.json, deployment-policy, auto-scaling
> policies see attached zip file.
>
>
>
> *It is noteworthy, that while the application is running the following log
> statements /exceptions are observed:*
>
>
>
> *…*
>
> *Member is in the wrong list and it is removed from active members list:
> g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1*
>
> *…*
>
> *TID: [0] [STRATOS] [2015-06-04 20:11:03,425] ERROR
> {org.apache.stratos.autoscaler.rule.RuleTasksDelegator} -  Cannot terminate
> instance*
>
> *…*
>
> *// **after receiving the application undeploy event:*
>
> *[2015-06-04 20:12:39,465]  INFO
> {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -
> Application undeployment process started: [application-id] g-sc-G12-1*
>
> *// **a new instance is being started up*
>
> *…*
>
> *[2015-06-04 20:13:13,445]  INFO
> {org.apache.stratos.cloud.controller.services.impl.InstanceCreator} -
> Instance started successfully: [cartridge-type] c2 [cluster-id]
> g-sc-G12-1.c2-1x0.c2.domain [instance-id]
> RegionOne/5d4699f7-b00b-42eb-b565-b48fc8f20407*
>
>
>
> *// Also noteworthy seems the following warning which is seen repeatedly
> in the logs:*
>
> *ReadWriteLock} -  System warning! Trying to release a lock which has not
> been taken by the same thread: [lock-name]*
>
>
>
>
>
> [1.] Application structure
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> --
>
> *Lasindu Charith*
>
> Software Engineer, WSO2 Inc.
>
> Mobile: +94714427192
>
> Web: blog.lasindu.com
>
>
>
>
>
>



-- 
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007

RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by "Martin Eppel (meppel)" <me...@cisco.com>.
Hi Lasindu,

I have run some tests and the issue (failure to remove an application when an instance is terminated and restarted) seems to be resolved.

However, I do seem to see some issue with group scaling and application removal. I still have to run some tests next week to get a better understanding (I am not sure yet whether this is an issue), and will keep you posted.

Thanks

Martin

From: Lasindu Charith [mailto:lasindu@wso2.com]
Sent: Friday, June 12, 2015 9:41 AM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,

I have fixed the above issue in commit 03de83172309c2932075fb5284c120ca610bbf0a. Please pull from master and try out your scenario again to see whether undeployment/redeployment works as expected.

Thanks,


On Thu, Jun 11, 2015 at 11:52 PM, Lasindu Charith <la...@wso2.com> wrote:
Hi Martin,

I guess my previous observation is incorrect. The root cause of the above issue is that ClusterStatusTerminatedProcessor does not send a ClusterTerminatedEvent for all 3 clusters: it sends only 1 and fails to send the other 2. This is because, when the application is activated again, ClusterLevelPartitionContext is added twice to the clusterInstanceContext. This makes the if condition at [1] fail when ClusterStatusTerminatedProcessor checks whether the cluster monitor has any non-terminated members before sending the ClusterTerminatedEvent. I will try to find a proper solution and update the thread.


[1] https://github.com/apache/stratos/blob/master/components/org.apache.stratos.autoscaler/src/main/java/org/apache/stratos/autoscaler/status/processor/cluster/ClusterStatusTerminatedProcessor.java#L90
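
For illustration, a minimal sketch of how a context that is added twice can keep such a "no non-terminated members" check from ever passing. The types PartitionCtx and DuplicateContextSketch are hypothetical simplifications, not the actual Stratos classes, and the assumption that the stale copy is never updated is a guess at the mechanism rather than something confirmed by the code at [1].

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for a partition context; not the real Stratos class.
final class PartitionCtx {
    final String partitionId;
    int nonTerminatedMembers;

    PartitionCtx(String partitionId, int nonTerminatedMembers) {
        this.partitionId = partitionId;
        this.nonTerminatedMembers = nonTerminatedMembers;
    }
}

final class DuplicateContextSketch {
    // The terminated check: passes only if no context reports non-terminated members.
    static boolean allTerminated(Collection<PartitionCtx> contexts) {
        for (PartitionCtx ctx : contexts) {
            if (ctx.nonTerminatedMembers > 0) {
                return false;
            }
        }
        return true;
    }

    // One possible guard (an assumption, not the committed fix):
    // key contexts by partition id so a re-activation replaces instead of appending.
    static void addOrReplace(Map<String, PartitionCtx> byId, PartitionCtx ctx) {
        byId.put(ctx.partitionId, ctx);
    }

    public static void main(String[] args) {
        // A plain list accepts the same partition id twice.
        List<PartitionCtx> list = new ArrayList<>();
        PartitionCtx stale = new PartitionCtx("whole-region", 1); // from the first activation
        PartitionCtx live = new PartitionCtx("whole-region", 1);  // added again on re-activation
        list.add(stale);
        list.add(live);
        live.nonTerminatedMembers = 0; // only the live copy sees the member terminate
        System.out.println(allTerminated(list)); // false: the stale copy blocks the check

        Map<String, PartitionCtx> byId = new LinkedHashMap<>();
        addOrReplace(byId, stale);
        addOrReplace(byId, live);
        System.out.println(allTerminated(byId.values())); // true: only one context remains
    }
}
```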

Thanks,


On Thu, Jun 11, 2015 at 10:29 PM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Is there any conclusion on how to fix this?

Thanks


Martin

From: Lahiru Sandaruwan [mailto:lahirus@wso2.com]
Sent: Wednesday, June 10, 2015 6:55 PM
To: dev
Cc: Reka Thirunavukkarasu

Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Imesh,

The following could be the reason the application does not un-deploy when a member was auto-healed:


  *   The cluster in which the member was auto-healed is terminated before the others (while the others are still in the terminating state)
or

  *   The cluster in which the member was auto-healed is still terminating when the others are already in the terminated state

One of these two cases could happen even if no member was auto-healed (for example with groups, where one group is very complex and the others are simple). This is because, when the parent group is terminating, we currently check whether all the clusters and groups are in the terminating status, which is wrong.

Thanks.

On Thu, Jun 11, 2015 at 5:49 AM, Imesh Gunaratne <im...@apache.org> wrote:
Do we know why this only happens if a member was forcefully terminated and auto-healed?

On Wed, Jun 10, 2015 at 10:01 PM, Lasindu Charith <la...@wso2.com> wrote:
Hi all,

The cause of the above issue seems to be as follows.
GroupStatusTerminatedProcessor and GroupStatusTerminatingProcessor process the event only if all the group instances and cluster instances are in the terminated state or in the terminating state, respectively [1][2]. But there can be situations (such as the one above) where some group instances are in the terminated state and some are in the terminating state by the time GroupStatusProcessorChain is executed. In such scenarios, both the GroupStatusTerminatedProcessor and GroupStatusTerminatingProcessor executions are skipped, and GroupStatusInactiveProcessor prints the "No possible state change found" warning.

I think we need to find a way to properly fix this.

[1] https://github.com/apache/stratos/blob/master/components/org.apache.stratos.autoscaler/src/main/java/org/apache/stratos/autoscaler/status/processor/group/GroupStatusTerminatedProcessor.java#L91
[2] https://github.com/apache/stratos/blob/master/components/org.apache.stratos.autoscaler/src/main/java/org/apache/stratos/autoscaler/status/processor/group/GroupStatusTerminatingProcessor.java#L89
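
A minimal sketch of the gap described above, assuming a hypothetical ChildStatus enum and two hypothetical evaluation methods; it does not reproduce the real processor chain. The strict variant mirrors the "all children in one state" checks, and the relaxed variant shows one possible way to treat a mix of terminated and terminating children (not necessarily the fix that was eventually committed).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical simplified model of child (group/cluster) instance states.
enum ChildStatus { ACTIVE, INACTIVE, TERMINATING, TERMINATED }

final class GroupStatusSketch {
    static final String NO_CHANGE = "No possible state change found";

    // Strict checks: each fires only when ALL children share the same state.
    static String strict(List<ChildStatus> children) {
        if (children.stream().allMatch(s -> s == ChildStatus.TERMINATED)) {
            return "Terminated";
        }
        if (children.stream().allMatch(s -> s == ChildStatus.TERMINATING)) {
            return "Terminating";
        }
        return NO_CHANGE;
    }

    // Relaxed check: a mix of TERMINATED and TERMINATING still counts as Terminating.
    static String relaxed(List<ChildStatus> children) {
        if (children.stream().allMatch(s -> s == ChildStatus.TERMINATED)) {
            return "Terminated";
        }
        boolean allShuttingDown = children.stream()
                .allMatch(s -> s == ChildStatus.TERMINATED || s == ChildStatus.TERMINATING);
        return allShuttingDown ? "Terminating" : NO_CHANGE;
    }

    public static void main(String[] args) {
        List<ChildStatus> mixed = Arrays.asList(
                ChildStatus.TERMINATED, ChildStatus.TERMINATING, ChildStatus.TERMINATED);
        System.out.println(strict(mixed));  // No possible state change found
        System.out.println(relaxed(mixed)); // Terminating
    }
}
```

With the relaxed rule the group stays in Terminating until the last child reports Terminated, at which point the first branch fires.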

On Tue, Jun 9, 2015 at 8:09 PM, Lasindu Charith <la...@wso2.com> wrote:
Hi Martin,

I was able to reproduce this issue in the latest build with PCA in OpenStack. Even after Stratos is restarted, the application is not undeployed, which makes it impossible to undeploy the application (even the forceful undeployment failed for the above obsolete application).

Currently I'm looking at possible causes for this and will update with the progress.

Thanks,

On Tue, Jun 9, 2015 at 5:59 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Here is another example where the removal fails:

For application see [1.], log file (with debug enabled) and jsons are attached.

Scenario:


•        Deploy application and wait for all cartridges to become active

•        Kill a VM (2nd in startup sequence)

•        Wait for it to restart and become active

•        Un-deploy application

a.      Un-deploy forcefully will succeed
([2015-06-08 20:38:21,487]  INFO {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Forcefully un-deploying the application s-g-c1-c2-c3-s)
and

b.      Un-deploy gracefully will fail to remove app completely (although VMs are terminated successfully)
([2015-06-08 20:54:16,372]  INFO {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Starting to undeploy application: [application-id])

•        Both scenarios are recorded in the same log file wso2carbon-s-g-c1-c2-c3-s.log

•        Btw, I retested the scenario and the issue is easily reproducible following the steps listed above:
Graceful application un-deploy succeeds if no VM has been restarted (terminated and restarted by the autoscaler).
Once a VM has been terminated, graceful application un-deploy fails.
I attached a log file which demonstrates this case (wso2carbon-s-g-c1-c2-c3-s-scen-2.log). In this scenario, the same application is deployed, becomes active and is then removed (repeated 2 times); then a VM is terminated and restarted by the autoscaler. Afterwards, graceful application un-deploy fails.


Other Observations:

When the application is successfully removed, some events, e.g. “cluster removed event” and “Application deleted event received:”, are published (see [2.]), while when the application fails to be removed no such events are observed.

[2.] cluster removed event when application is un-deployed forcefully
TID: [0] [STRATOS] [2015-06-08 20:38:34,187]  INFO {org.apache.stratos.cloud.controller.messaging.receiver.application.ApplicationEventReceiver} -  Application deleted event received: [application-id] s-g-c1-c2-c3-s
TID: [0] [STRATOS] [2015-06-08 20:38:34,220]  INFO {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher} -  Publishing application clusters removed event: [application-id] s-g-c1-c2-c3-s


I compared the log sequences of the successful and the unsuccessful application removal and noticed a difference (see also highlighted areas):

Successful (see logs in wso2carbon-s-g-c1-c2-c3-s-scen-2.log)

TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor} -  StatusChecker calculating the terminated status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [ s-g-c1-c2-c3-s-1 ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock acquired
TID: [0] [STRATOS] [2015-06-08 22:18:41,527]  INFO {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor} -  Sending group instance terminated for [group] s-g-c1-c2-c3-s-x0x [instance] s-g-c1-c2-c3-s-1
TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG {org.apache.stratos.autoscaler.applications.topic.ApplicationBuilder} -  Handling group terminated event: [group-id] s-g-c1-c2-c3-s-x0x [application-id] s-g-c1-c2-c3-s [instance] s-g-c1-c2-c3-s-1
TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -  ClusterMonitor Drools session has been disposed. ClusterMonitor [clusterId=s-g-c1-c2-c3-s.c3-0x0.c3.domain, hasPrimary=false ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -  ClusterMonitor Drools session has been disposed. ClusterMonitor [clusterId=s-g-c1-c2-c3-s.c1-0x0.c1.domain, hasPrimary=false ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -  ClusterMonitor Drools session has been disposed. ClusterMonitor [clusterId=s-g-c1-c2-c3-s.c2-0x0.c2.domain, hasPrimary=false ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,529] DEBUG {org.apache.stratos.cloud.controller.messaging.topology.TopologyManager} -  Write lock released
TID: [0] [STRATOS] [2015-06-08 22:18:41,536] DEBUG {org.apache.stratos.autoscaler.registry.RegistryManager} -  Application [ s-g-c1-c2-c3-s ] persisted successfully in the Autoscaler Registry
TID: [0] [STRATOS] [2015-06-08 22:18:41,538] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Applications updated: {"applicationMap":{"s-g-c1-c2-c3-s":{"id":"s-g-c1-c2-c3-s","key":"l8V7OpRqOfBcWbBw","tenantId":-1234,"tenantDomain":"carbon.super","tenantAdminUserName":"admin","applicationPolicyId":"default-iaas","aliasToGroupMap":{"s-g-c1-c2-c3-s-x0x":{"name":"s-g-c1-c2-c3-s","alias":"s-g-c1-c2-c3-s-x0x","groupMinInstances":1,"groupMaxInstances":1,"applicationId":"s-g-c1-c2-c3-s","aliasToGroupMap":{},"aliasToClusterDataMap":{"c2-0x0":{"serviceType":"c2","clusterId":"s-g-c1-c2-c3-s.c2-0x0.c2.domain","minInstances":1,"maxInstances":1},"c3-0x0":{"serviceType":"c3","clusterId":"s-g-c1-c2-c3-s.c3-0x0.c3.domain","minInstances":1,"maxInstances":1},"c1-0x0":{"serviceType":"c1","clusterId":"s-g-c1-c2-c3-s.c1-0x0.c1.domain","minInstances":1,"maxInstances":1}},"typeToClusterDataMap":{"c1":{"serviceType":"c1","clusterId":"s-g-c1-c2-c3-s.c1-0x0.c1.domain","minInstances":1,"maxInstances":1},"c2":{"serviceType":"c2","clusterId":"s-g-c1-c2-c3-s.c2-0x0.c2.domain","minInstances":1,"maxInstances":1},"c3":{"serviceType":"c3","clusterId":"s-g-c1-c2-c3-s.c3-0x0.c3.domain","minInstances":1,"maxInstances":1}},"instanceIdToInstanceContextMap":{},"dependencyOrder":{"startupOrders":[{"startupOrderComponentList":["cartridge.c3-0x0","cartridge.c2-0x0"]},{"startupOrderComponentList":["cartridge.c2-0x0","cartridge.c1-0x0"]}],"terminationBehaviour":"terminate-none"},"isGroupScalingEnabled":false,"isGroupInstanceMonitoringEnabled":false,"instanceIdSequence":{"value":0}}},"aliasToClusterDataMap":{},"typeToClusterDataMap":{},"aliasToDeploymentPolicyIdMap":{"c3-0x0":"static-1","c2-0x0":"static-1","c1-0x0":"static-1"},"instanceIdToInstanceContextMap":{"s-g-c1-c2-c3-s-1":{"alias":"s-g-c1-c2-c3-s","instanceId":"s-g-c1-c2-c3-s-1","instanceProperties":{},"lifeCycleStateManager":{"stateStack":["Created","Active","Terminating"],"identifier":"s-g-c1-c2-c3-s_s-g-c1-c2-c3-s-1"},"networkPartitionId":"RegionOne"}},"dependencyOrder":{"startupOrders":[]},"isGroupScalingEnabled":false,"isGroupInstanceMonitoringEnabled":false,"instanceIdSequence":{"value":1}}},"initialized":false}
TID: [0] [STRATOS] [2015-06-08 22:18:41,539]  INFO {org.apache.stratos.autoscaler.applications.topic.ApplicationsEventPublisher} -  Publishing group instance terminated event: [application] s-g-c1-c2-c3-s [group] s-g-c1-c2-c3-s-x0x [instance] s-g-c1-c2-c3-s-1
TID: [0] [STRATOS] [2015-06-08 22:18:41,545]  INFO {org.apache.stratos.autoscaler.monitor.component.GroupMonitor} -  [Group] s-g-c1-c2-c3-s-x0x is notifying the [parent] s-g-c1-c2-c3-s [instance] s-g-c1-c2-c3-s-1
TID: [0] [STRATOS] [2015-06-08 22:18:41,545] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusActiveProcessor} -  GroupProcessor chain calculating the status for the group [ s-g-c1-c2-c3-s ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusActiveProcessor} -  StatusChecker calculating the active status for the group [ s-g-c1-c2-c3-s ]  for the instance  [ s-g-c1-c2-c3-s-1 ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock acquired
TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock released
TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor} -  StatusChecker calculating the terminated status for the group [ s-g-c1-c2-c3-s ]  for the instance  [ s-g-c1-c2-c3-s-1 ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock acquired
TID: [0] [STRATOS] [2015-06-08 22:18:41,546]  INFO {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor} -  Sending application instance terminated for [application] s-g-c1-c2-c3-s [instance] s-g-c1-c2-c3-s-1

Unsuccessful:

TID: [0] [STRATOS] [2015-06-08 22:33:25,404] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor} -  StatusChecker calculating the terminated status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [ s-g-c1-c2-c3-s-1 ]
TID: [0] [STRATOS] [2015-06-08 22:33:25,404] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock acquired
TID: [0] [STRATOS] [2015-06-08 22:33:25,405] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusProcessor} -  Checking the status of cluster s-g-c1-c2-c3-s.c1-0x0.c1.domain instance status is: Terminating
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.cloud.controller.messaging.topology.TopologyManager} -  Write lock released
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusProcessor} -  Checking the status of cluster s-g-c1-c2-c3-s.c1-0x0.c1.domain instance status is: Terminating
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock released
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatingProcessor} -  StatusChecker calculating the terminating status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [ s-g-c1-c2-c3-s-1 ]
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock acquired
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock released
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusInactiveProcessor} -  StatusChecker calculating the inactive status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [ s-g-c1-c2-c3-s-1 ]
TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock acquired
TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock released
TID: [0] [STRATOS] [2015-06-08 22:33:25,407]  WARN {org.apache.stratos.autoscaler.status.processor.group.GroupStatusInactiveProcessor} -  No possible state change found for [component] s-g-c1-c2-c3-s-x0x [instance] s-g-c1-c2-c3-s-1
TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -  ClusterMonitor Drools session has been disposed. ClusterMonitor [clusterId=s-g-c1-c2-c3-s.c2-0x0.c2.domain, hasPrimary=false ]
TID: [0] [STRATOS] [2015-06-08 22:33:25,481] ERROR {org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor} -  System error, lock has not released for 30 seconds: [lock-name] application [lock-type] Write [thread-id] 99 [thread-name] pool-26-thread-2 [stack-trace]
java.lang.Thread.getStackTrace(Thread.java:1589)




[1.] Application Structure
[cid:image001.png@01D0A55D.4AA89C50]






From: Martin Eppel (meppel)
Sent: Friday, June 05, 2015 4:38 PM

To: dev@stratos.apache.org
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

This is another application, see [1.] which fails to get completely removed:

Scenario / Observation:

•        After all instances / application go active, one instance is being terminated (to verify termination behavior). Once the terminated instance is restored the application is undeployed.

•        After the Application undeployment process is started, all instances are being terminated

•        Application still shows up in stratos admin, subsequent deployments fail

stratos> list-applications
Applications found:
+---------------------+---------------------+----------+
| Application ID      | Alias               | Status   |
+---------------------+---------------------+----------+
| s-n-gr-s-G123-t-a-4 | s-n-gr-s-G123-t-a-4 | Deployed |
+---------------------+---------------------+----------+


[1.] Application:

[cid:image002.png@01D0A55D.4AA89C50]




From: Martin Eppel (meppel)
Sent: Friday, June 05, 2015 3:26 PM
To: dev@stratos.apache.org
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

After re-running it, these are my observations:


•        After the “Application undeployment process started” is started, there is a likelihood that (a few) VMs are still launched – I suspect this is due to some race condition between “Application undeployment process started” and the “autoscaler”.

•        All VMs which were launched before the “Application undeployment process started” get terminated as part of the undeployment process.

•        VMs which were launched after “Application undeployment process started” eventually get moved to obsolete / pending state and cleaned up; this can take up to 15-20 minutes.

•        The application never gets completely removed.

•        The following exception is consistently observed:

TID: [0] [STRATOS] [2015-06-05 20:47:07,237]  WARN {org.apache.stratos.common.concurrent.locks.ReadWriteLock} -  System warning! Trying to release a lock which has not been taken by the same thread: [lock-name] application-holder [thread-id] 114 [thread-name] pool-24-thread-2

TID: [0] [STRATOS] [2015-06-05 20:47:07,237] ERROR {org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator} -  Failed to retrieve topology event message

org.apache.stratos.common.exception.InvalidLockRequestedException: System error, cannot acquire a write lock while having a read lock on the same thread: [lock-name] application-holder [thread-id] 114 [thread-name] pool-24-thread-2

                    at org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:114)

                    at org.apache.stratos.autoscaler.applications.ApplicationHolder.acquireWriteLock(ApplicationHolder.java:60)

•        Initiating the “Application undeployment process” again will cause the following INFO statement (without any further actions, see in log)
TID: [0] [STRATOS] [2015-06-05 21:34:34,509]  INFO {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Application monitor is already in terminating, graceful un-deployment is has already been attempted thus not invoking again

•        Other exceptions observed after the “Application undeployment process started”
TID: [0] [STRATOS] [2015-06-05 21:36:29,458] ERROR {org.apache.stratos.autoscaler.rule.RuleTasksDelegator} -  Cannot terminate instance
org.apache.stratos.cloud.controller.stub.CloudControllerServiceInvalidMemberExceptionException: CloudControllerServiceInvalidMemberExceptionException
        at sun.reflect.GeneratedConstructorAccessor219.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at java.lang.Class.newInstance(Class.java:374)
        at org.apache.stratos.cloud.controller.stub.CloudControllerServiceStub.terminateInstance(CloudControllerServiceStub.java:8633)
        at org.apache.stratos.common.client.CloudControllerServiceClient.terminateInstance(CloudControllerServiceClient.java:120)
        at org.apache.stratos.autoscaler.rule.RuleTasksDelegator.terminateObsoleteInstance(RuleTasksDelegator.java:298)
        at sun.reflect.GeneratedMethodAccessor413.invoke(Unknown Source)


•        Created a jira to track this issue: https://issues.apache.org/jira/browse/STRATOS-1430
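
The race suspected in the first observation above (VMs still being launched after undeployment has started) can be sketched as follows. UndeployGuardSketch and its methods are hypothetical names chosen for illustration; this is not how the Stratos autoscaler is implemented. The point is only that the "is the application terminating?" check and the launch must happen under the same lock as the transition to terminating, otherwise a scale-up decision taken just before the undeploy event can still start a VM.

```java
// Hypothetical guard; names are illustrative and not part of Stratos.
final class UndeployGuardSketch {
    private boolean terminating;
    private int launched;

    // Called when the undeployment process starts.
    synchronized void startUndeployment() {
        terminating = true;
    }

    // Called by the scaling logic; refuses to launch once undeployment has begun.
    // Check and launch share one monitor, so they cannot interleave with startUndeployment().
    synchronized boolean spawnIfAllowed() {
        if (terminating) {
            return false;
        }
        launched++; // a real implementation would request an instance here
        return true;
    }

    synchronized int launchedCount() {
        return launched;
    }

    public static void main(String[] args) {
        UndeployGuardSketch guard = new UndeployGuardSketch();
        System.out.println(guard.spawnIfAllowed()); // true: application still active
        guard.startUndeployment();
        System.out.println(guard.spawnIfAllowed()); // false: no launch after undeploy started
        System.out.println(guard.launchedCount());  // 1
    }
}
```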







Regards



Martin



Attached the log file of the last test







From: Martin Eppel (meppel)
Sent: Friday, June 05, 2015 12:59 PM
To: dev@stratos.apache.org
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

For this latest test I got the latest source from stratos repo so I have this commit (see below), but the un-deployment still fails (to some extent).
As mentioned below, it seems that all the members get terminated eventually, including the ones which got started after the “application un-deployment” process started.
What is still left in Stratos (even after all members got terminated) is the application (see the stratos> list-applications command result below in the email thread). This would still be an issue when re-deploying the application!
I will do a few reruns to verify the removal of the VMs (members) is consistent.
Thanks

Martin

git show 2fe84b91843b20e91e8cafd06011f42d218f231c
commit 2fe84b91843b20e91e8cafd06011f42d218f231c
Author: anuruddhal <an...@gmail.com>
Date:   Wed Jun 3 14:41:12 2015 +0530

From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: Friday, June 05, 2015 12:46 PM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,

I also encountered a similar issue with the application un-deployment with PCA but I guess you are using JCA.

I can see that Anuruddha has done a fix for the issue I'm referring with the below commit:
https://github.com/apache/stratos/commit/2fe84b91843b20e91e8cafd06011f42d218f231c

Regarding the "member context not found" error, this could occur if the termination request was made for an already terminated member. It is possible that the Autoscaler makes a second terminate request if the first request takes some time to execute, so that by the time the second request hits the Cloud Controller the member has already been terminated by the first.

Can you please confirm whether the members were properly terminated and it's just these exceptions that you are seeing?
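
A minimal sketch of the double-terminate situation, assuming a hypothetical in-memory registry of member contexts (TerminationSketch is not a Stratos class). It shows how a terminate call could treat a missing member context as "already terminated" instead of raising an error; whether Stratos should behave this way is a design question this sketch does not settle.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry of member contexts keyed by member id; not the Cloud Controller API.
final class TerminationSketch {
    enum Result { TERMINATED, ALREADY_TERMINATED }

    private final Map<String, String> memberContexts = new ConcurrentHashMap<>();

    void register(String memberId, String instanceId) {
        memberContexts.put(memberId, instanceId);
    }

    Result terminate(String memberId) {
        // remove() is atomic, so only one of two concurrent requests obtains the context.
        String instanceId = memberContexts.remove(memberId);
        if (instanceId == null) {
            // A second request for the same member: nothing left to do.
            return Result.ALREADY_TERMINATED;
        }
        // A real implementation would ask the IaaS to destroy instanceId here.
        return Result.TERMINATED;
    }

    public static void main(String[] args) {
        TerminationSketch cc = new TerminationSketch();
        cc.register("member-1", "RegionOne/instance-1"); // illustrative ids
        System.out.println(cc.terminate("member-1")); // TERMINATED
        System.out.println(cc.terminate("member-1")); // ALREADY_TERMINATED
    }
}
```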

Thanks


On Sat, Jun 6, 2015 at 12:36 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi Udara,

Picked up your commit and reran the test case:

Attached is the log file (artifacts are the same as before).

Didn’t see the issue with “Member is in the wrong list” …

but see the following exception after the undeploy application message:
TID: [0] [STRATOS] [2015-06-05 18:09:46,836] ERROR {org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator} -  Failed to retrieve topology event message
org.apache.stratos.common.exception.InvalidLockRequestedException: System error, cannot acquire a write lock while having a read lock on the same thread: [lock-name] application-holder [thread-id] 114 [thread-name] pool-24-thread-2
                    at org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:114)
                    at org.apache.stratos.autoscaler.applications.ApplicationHolder.acquireWriteLock(ApplicationHolder.java:60)
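
The exception above is raised by Stratos's own ReadWriteLock wrapper. As an analogy only, the sketch below uses the JDK's java.util.concurrent.locks.ReentrantReadWriteLock, which likewise does not support upgrading a read lock to a write lock on the same thread; LockUpgradeSketch and its method names are illustrative. It assumes the usual remedy of releasing the read lock before requesting the write lock, with the caveat that shared state may change in between and has to be re-checked.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class LockUpgradeSketch {
    // Tries to take the write lock while this thread still holds the read lock.
    // tryLock() is used so the attempt fails fast instead of blocking forever.
    static boolean tryUpgradeWhileReading(ReentrantReadWriteLock lock) {
        lock.readLock().lock();
        try {
            boolean gotWrite = lock.writeLock().tryLock();
            if (gotWrite) {
                lock.writeLock().unlock();
            }
            return gotWrite;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Releases the read lock first, then takes the write lock.
    static boolean releaseThenWrite(ReentrantReadWriteLock lock) {
        lock.readLock().lock();
        lock.readLock().unlock(); // state read so far may be stale after this point
        boolean gotWrite = lock.writeLock().tryLock();
        if (gotWrite) {
            lock.writeLock().unlock();
        }
        return gotWrite;
    }

    public static void main(String[] args) {
        System.out.println(tryUpgradeWhileReading(new ReentrantReadWriteLock())); // false
        System.out.println(releaseThenWrite(new ReentrantReadWriteLock()));       // true
    }
}
```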


Also, after the “Application undeployment process started” message is logged, new members are still being instantiated:

TID: [0] [STRATOS] [2015-06-05 18:07:46,545]  INFO {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher} -  Publishing member created event:


Eventually, these VMs get terminated :

TID: [0] [STRATOS] [2015-06-05 18:42:42,413] ERROR {org.apache.stratos.cloud.controller.services.impl.CloudControllerServiceImpl} -  Could not terminate instance: [member-id] g-sc-G12-1.c1-0x0.c1.domaindd9c1d40-70cc-4950-9757-418afe19ba7f
org.apache.stratos.cloud.controller.exception.InvalidMemberException: Could not terminate instance, member context not found: [member-id] g-sc-G12-1.c1-0x0.c1.domaindd9c1d40-70cc-4950-9757-418afe19ba7f
                    at org.apache.stratos.cloud.controller.services.impl.CloudControllerServiceImpl.terminateInstance(CloudControllerServiceImpl.java:595)
                    at sun.reflect.GeneratedMethodAccessor408.invoke(Unknown Source)
                    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                    at java.lang.reflect.Method.invoke(Method.java:606)


but the application remains:

stratos> list-applications
Applications found:
+----------------+------------+----------+
| Application ID | Alias      | Status   |
+----------------+------------+----------+
| g-sc-G12-1     | g-sc-G12-1 | Deployed |
+----------------+------------+----------+

['g-sc-G12-1: applicationInstances 1, groupInstances 2, clusterInstances 3, members 0 ()\n']



From: Martin Eppel (meppel)
Sent: Friday, June 05, 2015 10:04 AM
To: dev@stratos.apache.org<ma...@stratos.apache.org>
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Ok:

log4j.logger.org.apache.stratos.manager=DEBUG
log4j.logger.org.apache.stratos.autoscaler=DEBUG
log4j.logger.org.apache.stratos.messaging=INFO
log4j.logger.org.apache.stratos.cloud.controller=DEBUG
log4j.logger.org.wso2.andes.client=ERROR
# Autoscaler rule logs
log4j.logger.org.apache.stratos.autoscaler.rule.RuleLog=DEBUG

From: Udara Liyanage [mailto:udara@wso2.com]
Sent: Friday, June 05, 2015 10:00 AM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,

It would be better if you could enable debug logs for AS, CC and the cartridge agent.

On Fri, Jun 5, 2015 at 10:23 PM, Udara Liyanage <ud...@wso2.com>> wrote:
Hi,

Please enable AS debug logs.

On Fri, Jun 5, 2015 at 9:38 PM, Martin Eppel (meppel) <me...@cisco.com>> wrote:
Hi Udara,

Yes, this issue seems to be fairly well reproducible, which debug log do you want me to enable, cartridge agent logs ?

Thanks

Martin

From: Udara Liyanage [mailto:udara@wso2.com<ma...@wso2.com>]
Sent: Thursday, June 04, 2015 11:11 PM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi,

This might be possible if AS did not receive the member activated event published by CC. If this is reproducible, is it possible to enable debug logs?
Otherwise I can add some INFO logs and commit.


On Fri, Jun 5, 2015 at 9:11 AM, Udara Liyanage <ud...@wso2.com>> wrote:
Hi,


For the first issue you mentioned, the particular member is activated, but it is still identified as an obsolete member and is being marked for termination because the pending time expired. Does that mean the member is still in the obsolete list even though it has been activated?

//member started
TID: [0] [STRATOS] [2015-06-04 19:53:04,706]  INFO {org.apache.stratos.autoscaler.context.cluster.ClusterContext} -  Member stat context has been added: [application] g-sc-G12-1 [cluster] g-sc-G12-1.c1-0x0.c1.domain [clusterInstanceContext] g-sc-G12-1-1 [partitionContext] whole-region [member-id] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1

//member activated
TID: [0] [STRATOS] [2015-06-04 19:56:00,907]  INFO {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher} -  Publishing member activated event: [service-name] c1 [cluster-id] g-sc-G12-1.c1-0x0.c1.domain [cluster-instance-id] g-sc-G12-1-1 [member-id] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1 [network-partition-id] RegionOne [partition-id] whole-region
TID: [0] [STRATOS] [2015-06-04 19:56:00,916]  INFO {org.apache.stratos.messaging.message.processor.topology.MemberActivatedMessageProcessor} -  Member activated: [service] c1 [cluster] g-sc-G12-1.c1-0x0.c1.domain [member] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1

//after 15 minutes ---member is still in pending state, pending timeout expired
TID: [0] [STRATOS] [2015-06-04 20:08:04,713]  INFO {org.apache.stratos.autoscaler.context.partition.ClusterLevelPartitionContext$PendingMemberWatcher} -  Pending state of member expired, member will be moved to obsolete list. [pending member] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1 [expiry time] 900000 [cluster] g-sc-G12-1.c1-0x0.c1.domain [cluster instance] null
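
The three log excerpts above fit the following picture: a member is added as pending, and if the autoscaler never processes its activated event, a watcher moves it to the obsolete list once the expiry time (900000 ms in the log) has passed. The Python sketch below models only that bookkeeping. It is an assumption-based illustration, not the `ClusterLevelPartitionContext` code, and all names in it are hypothetical.

```python
EXPIRY_MS = 900_000  # expiry time taken from the log line above (15 minutes)


class PartitionSketch:
    """Hypothetical pending/active/obsolete bookkeeping for members."""

    def __init__(self):
        self.pending = {}    # member_id -> time it became pending (ms)
        self.active = set()
        self.obsolete = set()

    def add_pending(self, member_id, now_ms):
        self.pending[member_id] = now_ms

    def on_member_activated(self, member_id):
        # Only a member that is still pending is promoted to active.
        if self.pending.pop(member_id, None) is not None:
            self.active.add(member_id)

    def expire_pending(self, now_ms):
        # Members whose activated event was never handled end up here.
        for member_id, added_ms in list(self.pending.items()):
            if now_ms - added_ms >= EXPIRY_MS:
                del self.pending[member_id]
                self.obsolete.add(member_id)
```

If the activated event is handled, the member leaves the pending map and the watcher never sees it; if the event is missed, the member is marked obsolete even though it is running.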

On Fri, Jun 5, 2015 at 5:14 AM, Martin Eppel (meppel) <me...@cisco.com>> wrote:
Hi,

I am running into a scenario where application un-deployment fails (using stratos with latest commit  b1b6bca3f99b6127da24c9af0a6b20faff2907be).

For application structure see [1.], (debug enabled) wso2carbon.log, application.json, cartridge-group.json, deployment-policy, auto-scaling policies see attached zip file.

It is noteworthy, that while the application is running the following log statements /exceptions are observed:

…
Member is in the wrong list and it is removed from active members list: g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
…
TID: [0] [STRATOS] [2015-06-04 20:11:03,425] ERROR {org.apache.stratos.autoscaler.rule.RuleTasksDelegator} -  Cannot terminate instance
…
// after receiving the application undeploy event:
[2015-06-04 20:12:39,465]  INFO {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Application undeployment process started: [application-id] g-sc-G12-1
// a new instance is being started up
…
[2015-06-04 20:13:13,445]  INFO {org.apache.stratos.cloud.controller.services.impl.InstanceCreator} -  Instance started successfully: [cartridge-type] c2 [cluster-id] g-sc-G12-1.c2-1x0.c2.domain [instance-id] RegionOne/5d4699f7-b00b-42eb-b565-b48fc8f20407

// Also noteworthy seems the following warning which is seen repeatedly in the logs:
ReadWriteLock} -  System warning! Trying to release a lock which has not been taken by the same thread: [lock-name]


[1.] Application structure

[image: application structure diagram; not preserved in this plain text copy]







--
Lasindu Charith
Software Engineer, WSO2 Inc.
Mobile: +94714427192
Web: blog.lasindu.com


Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by Reka Thirunavukkarasu <re...@wso2.com>.
Hi Martin,

Thanks for the information. In your logs I found a locking issue that occurs
while updating the application in a hierarchical manner, and I have fixed
it. Since I'm unable to reproduce it, it is a bit hard for me to verify
the fix. I will keep trying and update you on the progress...

Thanks,
Reka

On Sun, Jun 21, 2015 at 10:58 PM, Martin Eppel (meppel) <me...@cisco.com>
wrote:

>  Hi Reka,
>
>
>
> Here is *another* example which fails, see application at [1.], attached
> log files and jsons.  I ran a few scenarios, the one which is failing is
> application with the name “s-g-c1-c2-c3” (last scenario). All members get
> removed but the application remains deployed,
>
> “s-g-c1-c2-c3: applicationInstances 0, groupInstances 0, clusterInstances
> 0, members 0 ()”
>
>
>
>
>
> Thanks
>
>
>
>
>
> Martin
>
> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
> *Sent:* Sunday, June 21, 2015 1:32 AM
> *To:* Reka Thirunavukkarasu
> *Cc:* dev; Lasindu Charith (lasindu@wso2.com); Ryan Du Plessis (rdupless)
>
> *Subject:* Re: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> Great! Thanks Reka!
>
>
>
> On Sun, Jun 21, 2015 at 8:34 AM, Reka Thirunavukkarasu <re...@wso2.com>
> wrote:
>
> Hi Martin/Imesh,
>
> Sure, I will have a look at the logs. I will also go through the recent
> commits and try to revert the fix which was added for nested group scaling,
> as it is not needed for this release.  As Martin mentioned, there are more
> issues after the fixes. Otherwise, we will have to go through another
> major effort in testing it.
>
> I will update the progress of it...
>
>
>
> Thanks,
>
> Reka
>
>
>
> On Sun, Jun 21, 2015 at 8:14 AM, Imesh Gunaratne <im...@apache.org> wrote:
>
> Hi Martin,
>
>
>
> Thanks for the quick response. Yes we will definitely go through the logs
> and investigate this.
>
>
>
> Thanks
>
>
>
> On Sun, Jun 21, 2015 at 8:09 AM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> Hi Isuru,
>
>
>
> No, the issue does not seem to be resolved. With the latest code I see
> issues in test cases which used to work before  (beyond the latest example
> I posted the log files for - see below), not sure yet what is going on.  I
> will be investigating further (making sure I am not mistaken) and following
> up with some examples after the weekend but if you guys can take a look at
> the log files on Monday I provided with the previous email that would be
> great,
>
>
>
> Thanks
>
>
>
> Martin
>
>
>
> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
> *Sent:* Saturday, June 20, 2015 7:29 PM
> *To:* dev
> *Cc:* Lasindu Charith (lasindu@wso2.com); Reka Thirunavukkarasu (
> reka@wso2.com); Ryan Du Plessis (rdupless)
> *Subject:* Re: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> Hi All,
>
>
>
> I'm sorry I could not follow the entire discussion.
>
> Can someone explain the latest status please? Have we resolved the initial
> group scaling issue and now seeing an application removal problem?
>
>
>
> Thanks
>
>
>
> On Sat, Jun 20, 2015 at 2:06 AM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> Hi Lasindu, Reka,
>
>
>
>
>
> Just run into the issue with removing the application *again*: (with the
> fix for the issue included)
>
>
>
> Please see [1a., 1b.] for the application structure (group scaling defined
> at only one group level). See also the respective artifacts and log file
> attached.
>
>
>
> Please advise if we should reopen the JIRA
>
>
>
> Thanks
>
>
>
> Martin
>
>
>
>
>
> Application [1a.]
>
>
>
>
>
> [1b.] application after “starting application remove”
>
> --
>
> Imesh Gunaratne
>
>
>
> Senior Technical Lead, WSO2
>
> Committer & PMC Member, Apache Stratos
>
>
>
>
>
> --
>
> Imesh Gunaratne
>
>
>
> Senior Technical Lead, WSO2
>
> Committer & PMC Member, Apache Stratos
>
>
>
>   --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>
>
>
>
>
> --
>
> Imesh Gunaratne
>
>
>
> Senior Technical Lead, WSO2
>
> Committer & PMC Member, Apache Stratos
>



-- 
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007

Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by Reka Thirunavukkarasu <re...@wso2.com>.
Hi Martin,

On Sun, Jun 21, 2015 at 10:58 PM, Martin Eppel (meppel) <me...@cisco.com>
wrote:

>  Hi Reka,
>
>
>
> Here is *another* example which fails, see application at [1.], attached
> log files and jsons.  I ran a few scenarios, the one which is failing is
> application with the name “s-g-c1-c2-c3” (last scenario). All members get
> removed but the application remains deployed,
>
> “s-g-c1-c2-c3: applicationInstances 0, groupInstances 0, clusterInstances
> 0, members 0 ()”
>

This might be expected behavior, as it might derive from the bean classes
when retrieving the application. The topology should be empty in the back
end, as I could see the successful logs in this sample without any errors.
If the UI shows the application in created state, then that would be fine. We
will check whether we have to fix anything in the REST endpoint in order to
give a proper message when retrieving an application which is in created
state.

I observed the locking issue only in your previous log files.

Thanks,
Reka




-- 
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007

RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by "Martin Eppel (meppel)" <me...@cisco.com>.
Thanks Reka,

Posting the application deployment policy:

{
    "id": "default-iaas",
    "algorithm":"one-after-another",
    "networkPartitions":
    [
        "RegionOne"
    ],
    "properties":
    [
        {
            "name" : "networkPartitionGroups",
            "value" : "RegionOne"
        }
    ]
}

From: Reka Thirunavukkarasu [mailto:reka@wso2.com]
Sent: Tuesday, June 23, 2015 8:52 PM
To: Martin Eppel (meppel)
Cc: dev@stratos.apache.org; Lasindu Charith; Ryan Du Plessis (rdupless)
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,
Prior to deploying an application, you have to deploy an application policy and then use that policy to deploy the application. You can find the sample application-policy and a screenshot of the UI herewith. Following are the sample commands used to add the application-policy, add the application and deploy the application. This application-policy has been there since we started supporting group level deployment policy. I could see in some of your earlier logs that you have used an application-policy with the id "default-iaas". Can you check your deployment and share all the artifacts that you are using, so that we can debug this further? Otherwise, you can browse the UI (home-->configure-->application-policies section) to find out which application-policy you are using.

echo "Adding application policy..."
curl -X POST -H "Content-Type: application/json" -d "@application-policy-1.json" -k -v -u admin:admin https://127.0.0.1:9443/api/applicationPolicies


echo "Creating application..."
curl -X POST -H "Content-Type: application/json" -d "@application-s-g-c1-c2-c3-s.json" -k -v -u admin:admin https://127.0.0.1:9443/api/applications

sleep 1

echo "Deploying application..."
curl -X POST -H "Content-Type: application/json" -k -v -u admin:admin https://127.0.0.1:9443/api/applications/g-sc-G12-1/deploy/application-policy-1
Where application-policy-1 is the application policy name.

On Wed, Jun 24, 2015 at 9:09 AM, Martin Eppel (meppel) <me...@cisco.com>> wrote:
One more comment,

I used a very similar application with the same deployment policies just a few days ago without any issues, before I picked up the latest stratos commits. Something changed (for one, validation of multiple group deployment policies in nested scenarios was added, as it seems). Again, either validation is broken or configuration requirements have changed, but how can I fix it ?

To point it out, RegionOne is defined as network partition in the deployment policies but is  listed in the exception (see below) as not being used ?


Thanks

Martin

Reposting snippet of exception:

TID: [0] [STRATOS] [2015-06-23 20:48:09,441] ERROR {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Application deployment failed: [application-id]g-sc-G12-1
org.apache.stratos.autoscaler.exception.application.InvalidApplicationPolicyException: Invalid Application Policy: Network partition [network-partition-id] RegionOne is not used in application [application-id] g-sc-G12-1. Hence application bursting will fail. Either remove RegionOne from application policy or make all the cartridges available in RegionOne
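
Going only by the wording of the error message above, the validation appears to require that every network partition named in the application policy is usable by all cartridges of the application. The Python sketch below encodes that reading. It is a guess based on the message text, not the logic of `AutoscalerUtil.validateApplicationPolicyAgainstApplication`, and the function name and input shapes are hypothetical.

```python
def unusable_partitions(app_policy_partitions, cartridge_partitions):
    """Return application-policy partitions that some cartridge cannot use.

    app_policy_partitions: list of network partition ids in the application policy.
    cartridge_partitions: dict mapping cartridge alias -> set of network partition
    ids that the cartridge's (or its group's) deployment policy makes available.
    """
    return [
        partition
        for partition in app_policy_partitions
        if any(partition not in available
               for available in cartridge_partitions.values())
    ]
```

Under this reading, a single cartridge that ends up without a deployment policy covering "RegionOne" (for example because a group level policy was not applied to it) would be enough to trigger the error even though other policies do define that partition.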


From: Martin Eppel (meppel)
Sent: Tuesday, June 23, 2015 8:20 PM
To: Lasindu Charith
Cc: Reka Thirunavukkarasu; dev@stratos.apache.org<ma...@stratos.apache.org>; Ryan Du Plessis (rdupless)
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Lasindu,

I am confused, which application policy – beyond the deployment-policy-<1,2,3> there is no other deployment policy defined ? The network partitions are defined in the zip file which was attached to my email ? Did you take a look at the zip file ?

Thanks

Martin

e.g. deployment-policy-1.json

{
    "id": "deployment-policy-1",
    "networkPartitions":
    [
        {
            "id": "RegionOne",
            "partitionAlgo": "one-after-another",
            "partitions":
            [
                {
                    "id": "whole-region",
                    "partitionMax": 5
                }
            ]
        }
    ]
}

From: Lasindu Charith [mailto:lasindu@wso2.com]
Sent: Tuesday, June 23, 2015 7:29 PM
To: Martin Eppel (meppel)
Cc: Reka Thirunavukkarasu; dev@stratos.apache.org<ma...@stratos.apache.org>; Ryan Du Plessis (rdupless)
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,

Could you please share the application policy json as well as network partition jsons you have used?

Thanks,

On Wed, Jun 24, 2015 at 2:44 AM, Martin Eppel (meppel) <me...@cisco.com>> wrote:
Hi Reka,

I picked up the latest code but ran into an issue with the deployment policy (I also noticed that validation seems to enforce that a deployment policy can only be configured once in the application at the group level, enabling group scaling ?!):

In the scenario I define a group level deployment policy at the bottom level group ("g-sc-G3-1") and also define cartridge deployment policies for all other cartridges (c1, c3) in the parent groups, each defining a network partition “RegionOne”. However, deploying the app causes the following exception below, not sure I missed a configuration or if there is an issue in the deployment policy validation ? This used to work before I picked up the latest changes. My current latest commit id is:

commit bb6e102986ad8e54556d9f6de47cc6eaa077e775
Author: reka <rt...@gmail.com>>
Date:   Tue Jun 23 19:22:04 2015 +0530

Application json and deployment policies are attached to the email,

Thanks

Martin

TID: [0] [STRATOS] [2015-06-23 20:48:09,428] ERROR {org.apache.stratos.autoscaler.util.AutoscalerUtil} -  Invalid Application Policy: Network partition [network-partition-id] RegionOne is not used in application [application-id] g-sc-G12-1. Hence application bursting will fail. Either remove RegionOne from application policy or make all the cartridges available in RegionOne
TID: [0] [STRATOS] [2015-06-23 20:48:09,441] DEBUG {org.apache.stratos.autoscaler.registry.RegistryManager} -  Application context [g-sc-G12-1] persisted successfully in the autoscaler registry
TID: [0] [STRATOS] [2015-06-23 20:48:09,441] ERROR {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Application deployment failed: [application-id]g-sc-G12-1
org.apache.stratos.autoscaler.exception.application.InvalidApplicationPolicyException: Invalid Application Policy: Network partition [network-partition-id] RegionOne is not used in application [application-id] g-sc-G12-1. Hence application bursting will fail. Either remove RegionOne from application policy or make all the cartridges available in RegionOne
        at org.apache.stratos.autoscaler.util.AutoscalerUtil.validateApplicationPolicyAgainstApplication(AutoscalerUtil.java:746)
        at org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl.deployApplication(AutoscalerServiceImpl.java:279)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.axis2.rpc.receivers.RPCUtil.invokeServiceClass(RPCUtil.java:212)
        at org.apache.axis2.rpc.receivers.RPCMessageReceiver.invokeBusinessLogic(RPCMessageReceiver.java:117)
        at org.apache.axis2.receivers.AbstractInOutMessageReceiver.invokeBusinessLogic(AbstractInOutMessageReceiver.java:40)
        at org.apache.axis2.receivers.AbstractMessageReceiver.receive(AbstractMessageReceiver.java:110)
        at org.apache.axis2.engine.AxisEngine.receive(AxisEngine.java:180)
        at org.apache.axis2.transport.http.HTTPTransportUtils.processHTTPPostRequest(HTTPTransportUtils.java:172)
        at org.apache.axis2.transport.http.AxisServlet.doPost(AxisServlet.java:146)
        at org.wso2.carbon.core.transports.CarbonServlet.doPost(CarbonServlet.java:231)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:755)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
        at org.eclipse.equinox.http.servlet.internal.ServletRegistration.service(ServletRegistration.java:61)
        at org.eclipse.equinox.http.servlet.internal.ProxyServlet.processAlias(ProxyServlet.java:128)
        at org.eclipse.equinox.http.servlet.internal.ProxyServlet.service(ProxyServlet.java:68)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
From: Reka Thirunavukkarasu [mailto:reka@wso2.com<ma...@wso2.com>]
Sent: Tuesday, June 23, 2015 7:00 AM

To: Martin Eppel (meppel)
Cc: dev@stratos.apache.org<ma...@stratos.apache.org>; Lasindu Charith (lasindu@wso2.com<ma...@wso2.com>); Ryan Du Plessis (rdupless)
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,
As we have merged all the thread pools according to the discussion in [1], the default pool size is now 100 for all kinds of monitors (application, group and cluster). If you need to increase this, then please add the parameter below to stratos.sh:
-Dmonitor.thread.pool.size=xxxx
Also, we have fixed an application instance termination issue and a stratos restart issue with group scaling, so you can now restart stratos even while group scaling is happening. The latest commit is:

bb6e102986ad8e54556d9f6de47cc6eaa077e775
Do let us know how your testing goes with these fixes.
[1] Merging all the threading pools used in autoscaler to one thread pool
Thanks,
Reka

On Tue, Jun 23, 2015 at 10:49 AM, Martin Eppel (meppel) <me...@cisco.com>> wrote:
Thanks Reka

From: Reka Thirunavukkarasu [mailto:reka@wso2.com<ma...@wso2.com>]
Sent: Monday, June 22, 2015 9:59 PM

To: Martin Eppel (meppel)
Cc: dev@stratos.apache.org<ma...@stratos.apache.org>; Lasindu Charith (lasindu@wso2.com<ma...@wso2.com>); Ryan Du Plessis (rdupless)
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,
These are actually configurable parameters. In the stratos code, these thread pool sizes are set to 20 by default. If we need to change them, we can pass them as system properties in our stratos.sh. Since the default values are taken from the stratos code, we don't need to provide them in the standalone pack. When there is a complex application with more groups and clusters, more threads will be used. In that case, the default pool size of 20 might get exhausted. So it would be better to customize these properties according to the application structure. When I used the application sample that you attached to this thread, I faced some issues such as event listeners not getting triggered properly because the thread pool was exhausted. After I increased the thread pool size to 50, I didn't get any issues.
I'm in the process of analyzing the thread usage in order to decide on the recommended pool size for a given application structure, so that anyone can calculate the pool size they require for their application and configure this parameter.
Hope this helps you to understand those parameters.
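
The effect described above, where listeners are not triggered once every worker in a fixed-size pool is busy, can be reproduced in a few lines. The Python sketch below is only an illustration using the standard library; it is not how Stratos reads its properties or schedules monitors. `pool_size` and `listener_runs_promptly` are hypothetical helpers, and the default of 20 is the value quoted in the email.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def pool_size(props, key, default=20):
    """Read a pool size from a dict of properties, falling back to the default."""
    return int(props.get(key, default))


def listener_runs_promptly(size, blocked_tasks, wait_s=0.2):
    """Return True if a short task gets a worker while blocked_tasks hold theirs."""
    release = threading.Event()
    with ThreadPoolExecutor(max_workers=size) as pool:
        for _ in range(blocked_tasks):
            pool.submit(release.wait)  # long-running monitor stand-in
        listener = pool.submit(lambda: "handled")
        try:
            listener.result(timeout=wait_s)
            return True
        except FutureTimeout:
            return False  # the listener is stuck in the queue
        finally:
            release.set()  # let the blocked tasks finish so the pool can shut down
```

With as many blocked tasks as workers the listener only waits in the queue; one spare worker is enough for it to run, which is why raising the pool size made the symptoms disappear.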

Thanks,
Reka

On Mon, Jun 22, 2015 at 11:50 PM, Martin Eppel (meppel) <me...@cisco.com>> wrote:
Hi Reka,

I am not clear on the 2 properties you mention below, are they supposed to be set in the stratos.sh ? I just picked up the latest code from the apache stratos repo and don’t see them ?

Btw, the read/write lock monitor is disabled (read.write.lock.monitor.enabled=false) in our production code (I assume it defaults to false if not specified); I only enable it to provide additional information

Thanks

Martin

From: Reka Thirunavukkarasu [mailto:reka@wso2.com<ma...@wso2.com>]
Sent: Monday, June 22, 2015 7:30 AM
To: Martin Eppel (meppel)
Cc: dev@stratos.apache.org<ma...@stratos.apache.org>; Lasindu Charith (lasindu@wso2.com<ma...@wso2.com>); Ryan Du Plessis (rdupless)

Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,
I have verified the fix by enabling read.write.lock.monitor.enabled=true. The fix worked fine with it. Since we are using concurrency and have delegated some flows to threads, I had to set the thread pool sizes to the values below in stratos.sh.

    -Dapplication.monitor.thread.pool.size=50 \
    -Dgroup.monitor.thread.pool.size=50 \
Please note that it is recommended to keep read.write.lock.monitor.enabled=false in production, as enabling it increases the footprint. This property was introduced only for testing purposes.

We are in the process of analyzing the thread pool size and will come up with recommended values for it.
Also, I have fixed a small issue in the REST endpoint, which used to return some default value whenever the application runtime was not found. Now, if the runtime is not found, the message below is returned.

{"status":"error","message":"Application runtime not found"}
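
As a small illustration of the behavior described above, returning an explicit error body instead of a default value when the runtime is missing, here is a Python sketch. It is not the Stratos REST API code; `runtime_response` and the `runtimes` mapping are hypothetical, and only the error body itself is taken from the email.

```python
import json


def runtime_response(runtimes, application_id):
    """Return a JSON body for an application runtime lookup.

    runtimes: dict mapping application id -> runtime data (hypothetical shape).
    """
    runtime = runtimes.get(application_id)
    if runtime is None:
        # Explicit error instead of a default, empty-looking runtime.
        body = {"status": "error", "message": "Application runtime not found"}
    else:
        body = runtime
    return json.dumps(body, separators=(",", ":"))
```

An explicit error avoids output such as "applicationInstances 0, groupInstances 0, clusterInstances 0, members 0", which can be mistaken for an application that is still deployed.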
I have also verified the undeployment with group scaling. Didn't find any issues with the above fixes.
Please find the latest commit as below:

0a969200d11228158606f011ca7e5e795f336d92.
Please note that only the error below [1] was observed, and it is harmless for now. I have verified it with a workaround, which works fine. I will check on the severity and decide whether to make a proper fix or go with the workaround.

[1]. TID: [0] [STRATOS] [2015-06-22 14:22:01,872] ERROR {org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor} -  System error, lock has not released for 30 seconds: [lock-name] topology [lock-type] Write [thread-id] 117 [thread-name] pool-24-thread-2 [stack-trace]
java.lang.Thread.getStackTrace(Thread.java:1589)
org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:123)
org.apache.stratos.messaging.message.processor.topology.updater.TopologyUpdater.acquireWriteLockForService(TopologyUpdater.java:123)
org.apache.stratos.messaging.message.processor.topology.ApplicationClustersCreatedMessageProcessor.doProcess(ApplicationClustersCreatedMessageProcessor.java:78)
org.apache.stratos.messaging.message.processor.topology.ApplicationClustersCreatedMessageProcessor.process(ApplicationClustersCreatedMessageProcessor.java:59)
org.apache.stratos.messaging.message.processor.topology.ServiceRemovedMessageProcessor.process(ServiceRemovedMessageProcessor.java:64)
org.apache.stratos.messaging.message.processor.topology.ServiceCreatedMessageProcessor.process(ServiceCreatedMessageProcessor.java:65)
org.apache.stratos.messaging.message.processor.topology.CompleteTopologyMessageProcessor.process(CompleteTopologyMessageProcessor.java:74)
org.apache.stratos.messaging.message.processor.MessageProcessorChain.process(MessageProcessorChain.java:61)
org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator.run(TopologyEventMessageDelegator.java:73)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
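
The error above comes from a monitor that reports locks held longer than 30 seconds. The Python sketch below shows one simple way such a check could work, by recording acquisition times and listing entries older than a threshold. It is an assumption-based illustration, not the Stratos `ReadWriteLockMonitor`; the class name, method names and injectable clock are hypothetical.

```python
import time


class LockMonitorSketch:
    """Records when locks were taken and lists those held past a threshold."""

    def __init__(self, threshold_s=30.0, clock=time.monotonic):
        self._threshold_s = threshold_s
        self._clock = clock
        self._held = {}  # (lock_name, thread_name) -> acquisition time (s)

    def acquired(self, lock_name, thread_name):
        self._held[(lock_name, thread_name)] = self._clock()

    def released(self, lock_name, thread_name):
        self._held.pop((lock_name, thread_name), None)

    def overdue(self):
        # Entries still held after the threshold are candidates for an error log.
        now = self._clock()
        return sorted(
            key for key, taken_at in self._held.items()
            if now - taken_at >= self._threshold_s
        )
```

Keeping this bookkeeping for every acquire and release costs memory and time, which is consistent with the advice to leave the monitor off in production.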
Thanks,
Reka



On Mon, Jun 22, 2015 at 12:24 PM, Reka Thirunavukkarasu <re...@wso2.com>> wrote:
Hi Martin,
I found the reason why we didn't encounter this locking issue: we were testing with the default stratos pack, which has read.write.lock.monitor.enabled=false. The locking warning or issue is raised only when you use read.write.lock.monitor.enabled=true. That's why only you were facing this locking issue, as you use this configuration in your setup.
Since I'm now able to reproduce the issue, I will test with the fix that I already pushed and update the thread.
We will discuss and try to make read.write.lock.monitor.enabled=true the default in stratos, so that we can find such issues early and fix them.

Thanks,
Reka

On Mon, Jun 22, 2015 at 12:16 AM, Reka Thirunavukkarasu <re...@wso2.com> wrote:
Sorry Martin, I had only fixed the issue locally. I have just pushed it now. Can you test with 1c21daaeea7b27ad0a0141a70b32e9443e78e309 when you get a chance? I will also continue testing with this fix.
Thanks,
Reka

On Mon, Jun 22, 2015 at 12:07 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Btw,

This is the last commit I picked up from Stratos master:

commit 58bea52be814269416f70391fef50859aa5ae0a1
Author: lasinducharith <la...@gmail.com>
Date:   Fri Jun 19 19:40:27 2015 +0530

From: Martin Eppel (meppel)
Sent: Sunday, June 21, 2015 10:28 AM
To: dev@stratos.apache.org; Reka Thirunavukkarasu
Cc: Lasindu Charith (lasindu@wso2.com); Ryan Du Plessis (rdupless)
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Reka,

Here is another example which fails; see the application at [1.] and the attached log files and JSONs. I ran a few scenarios; the one that fails is the application named “s-g-c1-c2-c3” (the last scenario). All members get removed but the application remains deployed:

“s-g-c1-c2-c3: applicationInstances 0, groupInstances 0, clusterInstances 0, members 0 ()”


Thanks


Martin




[cid:image001.png@01D0ADF9.67AF0650]




From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: Sunday, June 21, 2015 1:32 AM
To: Reka Thirunavukkarasu
Cc: dev; Lasindu Charith (lasindu@wso2.com); Ryan Du Plessis (rdupless)
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Great! Thanks Reka!

On Sun, Jun 21, 2015 at 8:34 AM, Reka Thirunavukkarasu <re...@wso2.com> wrote:
Hi Martin/Imesh,
Sure, I will have a look at the logs. I will also go through the recent commits and try to revert the fix that was added for nested group scaling, as it is not needed for this release. As Martin mentioned, more issues appeared after the fixes. Without the revert, we would have to go through another major testing effort.
I will post updates on the progress.

Thanks,
Reka

On Sun, Jun 21, 2015 at 8:14 AM, Imesh Gunaratne <im...@apache.org> wrote:
Hi Martin,

Thanks for the quick response. Yes we will definitely go through the logs and investigate this.

Thanks

On Sun, Jun 21, 2015 at 8:09 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi Isuru,

No, the issue does not seem to be resolved. With the latest code I see issues in test cases which used to work before (beyond the latest example I posted the log files for, see below); I am not sure yet what is going on. I will investigate further (to make sure I am not mistaken) and follow up with some examples after the weekend, but if you could take a look on Monday at the log files I provided with the previous email, that would be great.

Thanks

Martin

From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: Saturday, June 20, 2015 7:29 PM
To: dev
Cc: Lasindu Charith (lasindu@wso2.com); Reka Thirunavukkarasu (reka@wso2.com); Ryan Du Plessis (rdupless)
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi All,

I'm sorry I could not follow the entire discussion.
Can someone please explain the latest status? Have we resolved the initial group scaling issue, and are we now seeing an application removal problem?

Thanks

On Sat, Jun 20, 2015 at 2:06 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi Lasindu, Reka,


I just ran into the issue with removing the application again (with the fix for the issue included):

Please see [1a., 1b.] for the application structure (group scaling defined at only one group level). See also the respective artifacts and log file attached.

Please advise if we should reopen the JIRA

Thanks

Martin


Application [1a.]

[cid:image002.png@01D0ADF9.67AF0650]

[1b.] application after “starting application remove”

[cid:image003.png@01D0ADF9.67AF0650]









--
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos



--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007





--
Lasindu Charith
Software Engineer, WSO2 Inc.
Committer & PMC Member, Apache Stratos
Mobile: +94714427192 | Web: blog.lasindu.com



--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007


Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by Reka Thirunavukkarasu <re...@wso2.com>.
Hi Martin,

I have also fixed, in ba6f30ea944b2a38cf025523dafc4d9a11e65977, the locking
warnings and errors discussed in [1], which were actually harmless. But it is
better to fix them in order to avoid inconsistency at any point.

[1] Thread Synchronization issue when using Hierarchical Locking

Thanks,
Reka


On Wed, Jun 24, 2015 at 10:41 AM, Lasindu Charith <la...@wso2.com> wrote:

> Hi Martin,
>
> Will you be able to try deploying the above application against the latest
> commit 473b8906973ebd4b49f9ef56cc3f5a661a25fcf1? I have fixed an issue
> in validating the application policy against the application.
>
> Thanks,
>
> On Wed, Jun 24, 2015 at 10:38 AM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
>>  Forgot to mention, uses the same application deployment policy
>> “default-iaas”
>>
>>
>>
>> *From:* Martin Eppel (meppel)
>> *Sent:* Tuesday, June 23, 2015 10:06 PM
>>
>> *To:* Reka Thirunavukkarasu
>> *Cc:* dev@stratos.apache.org; Lasindu Charith; Ryan Du Plessis (rdupless)
>> *Subject:* RE: Testing Stratos 4.1: Application undeployment:
>> application fails to undeploy (nested grouping, group scaling)
>>
>>
>>
>> Btw, additional pointer,
>>
>>
>>
>> Another example of an application (below) deploys just fine, the issue
>> seems to occur in apps which define group and cartridge deployment policies
>>
>>
>>
>>
>>
>> {"alias": "subscription-c1-in-group", "applicationId":
>> "subscription-c1-in-group", "components": {"cartridges": [], "groups":
>> [{"name": "subscription-c1-in-group", "groupMaxInstances": 1,
>> "groupMinInstances": 1, "alias": "subscription-c1-in-group-x0x",
>> "cartridges": [{"cartridgeMin": 1, "cartridgeMax": 1, "type": "c1",
>> "subscribableInfo": {"alias": "c1-0x0", "deploymentPolicy": "static-1",
>> "artifactRepository": {"repoUsername": "user", "repoUrl": "
>> http://octl.qmog.cisco.com:10080/git/default.git", "privateRepo": true,
>> "repoPassword": "c-policy"}, "autoscalingPolicy": "economyPolicy"}}],
>> "groups": []}]}}
>>
>>
>>
>> Static1.json
>>
>>
>>
>> {
>>
>>     "id": "static",
>>
>>     "regions":
>>
>>     [
>>
>>         {
>>
>>             "id": "RegionOne",
>>
>>             "algorithm": "one-after-another",
>>
>>             "partitions":
>>
>>            [
>>
>>                 {
>>
>>                     "id": "whole-region",
>>
>>                     "max": 100
>>
>>                 }
>>
>>             ]
>>
>>         }
>>
>>     ]
>>
>> }
>>
>>
>>
>>
>>
>>
>>
>> *From:* Martin Eppel (meppel)
>> *Sent:* Tuesday, June 23, 2015 9:13 PM
>> *To:* 'Reka Thirunavukkarasu'
>> *Cc:* dev@stratos.apache.org; Lasindu Charith; Ryan Du Plessis (rdupless)
>> *Subject:* RE: Testing Stratos 4.1: Application undeployment:
>> application fails to undeploy (nested grouping, group scaling)
>>
>>
>>
>> Thanks Reka,
>>
>>
>>
>> Posting the application deployment policy:
>>
>>
>>
>> {
>>
>>     "id": "default-iaas",
>>
>>     "algorithm":"one-after-another",
>>
>>     "networkPartitions":
>>
>>     [
>>
>>         "RegionOne"
>>
>>     ],
>>
>>     "properties":
>>
>>     [
>>
>>         {
>>
>>             "name" : "networkPartitionGroups",
>>
>>             "value" : "RegionOne"
>>
>>         }
>>
>>     ]
>>
>> }
>>
>>
>>
>> *From:* Reka Thirunavukkarasu [mailto:reka@wso2.com <re...@wso2.com>]
>> *Sent:* Tuesday, June 23, 2015 8:52 PM
>> *To:* Martin Eppel (meppel)
>> *Cc:* dev@stratos.apache.org; Lasindu Charith; Ryan Du Plessis (rdupless)
>> *Subject:* Re: Testing Stratos 4.1: Application undeployment:
>> application fails to undeploy (nested grouping, group scaling)
>>
>>
>>
>> Hi Martin,
>>
>> Before deploying an application, you have to deploy an application
>> policy and then use that policy to deploy the application. You can
>> find a sample application-policy and a screenshot of the UI attached.
>> The sample commands below add the application-policy, add the
>> application, and deploy the application. This application-policy has
>> been there since we started supporting group-level deployment
>> policies. I could see in some of your earlier logs that you used an
>> application-policy with the id "default-iaas". Can you check your
>> deployment and share all the artifacts that you are using, so that we
>> can debug this further? Otherwise, you can browse the UI
>> (home-->configure-->application-policies section) to see which
>> application-policy you are using.
>>
>> echo "Adding application policy..."
>> curl -X POST -H "Content-Type: application/json" -d
>> "@application-policy-1.json" -k -v -u admin:admin
>> https://127.0.0.1:9443/api/applicationPolicies
>>
>>
>> echo "Creating application..."
>> curl -X POST -H "Content-Type: application/json" -d
>> "@application-s-g-c1-c2-c3-s.json" -k -v -u admin:admin
>> https://127.0.0.1:9443/api/applications
>>
>> sleep 1
>>
>> echo "Deploying application..."
>> curl -X POST -H "Content-Type: application/json" -k -v -u admin:admin
>> https://127.0.0.1:9443/api/applications/g-sc-G12-1/deploy/application-policy-1
>>
>> *Where application-policy-1 is the application policy name.*
>>
>>
>>
>> On Wed, Jun 24, 2015 at 9:09 AM, Martin Eppel (meppel) <me...@cisco.com>
>> wrote:
>>
>> One more comment,
>>
>>
>>
>> I used a very similar application with the same deployment policies
>> just a few days ago without any issues, before I picked up the latest
>> Stratos commits. Something changed (for one, validation of multiple
>> group deployment policies in nested scenarios was added, it seems).
>> Again, either the validation is broken or the configuration
>> requirements have changed, but how can I fix it?
>>
>>
>>
>> To point it out: RegionOne is defined as a network partition in the
>> deployment policies, yet the exception (see below) lists it as not
>> being used.
>>
>>
>>
>>
>>
>> Thanks
>>
>>
>>
>> Martin
>>
>>
>>
>> Reposting snippet of exception:
>>
>>
>>
>> TID: [0] [STRATOS] [2015-06-23 20:48:09,441] ERROR
>> {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -
>> Application deployment failed: [application-id]g-sc-G12-1
>>
>> org.apache.stratos.autoscaler.exception.application.InvalidApplicationPolicyException:
>> Invalid Application Policy: *Network partition [network-partition-id]
>> RegionOne is not used in application* [application-id] g-sc-G12-1. Hence
>> application bursting will fail. Either remove RegionOne from application
>> policy or make all the cartridges available in RegionOne
>>
>>
>>
>>
>>
>> *From:* Martin Eppel (meppel)
>> *Sent:* Tuesday, June 23, 2015 8:20 PM
>> *To:* Lasindu Charith
>> *Cc:* Reka Thirunavukkarasu; dev@stratos.apache.org; Ryan Du Plessis
>> (rdupless)
>>
>> *Subject:* RE: Testing Stratos 4.1: Application undeployment:
>> application fails to undeploy (nested grouping, group scaling)
>>
>>
>>
>> Hi Lasindu,
>>
>>
>>
>> I am confused: which application policy? Beyond
>> deployment-policy-<1,2,3> there is no other deployment policy defined.
>> The network partitions are defined in the zip file which was attached
>> to my email. Did you take a look at the zip file?
>>
>>
>>
>> Thanks
>>
>>
>>
>> Martin
>>
>>
>>
>> e.g. deployment-policy-1.json
>>
>>
>>
>> {
>>
>>     "id": "deployment-policy-1",
>>
>>     "networkPartitions":
>>
>>     [
>>
>>         {
>>
>>             "id": "RegionOne",
>>
>>             "partitionAlgo": "one-after-another",
>>
>>             "partitions":
>>
>>             [
>>
>>                 {
>>
>>                     "id": "whole-region",
>>
>>                     "partitionMax": 5
>>
>>                 }
>>
>>             ]
>>
>>         }
>>
>>     ]
>>
>> }
>>
>>
>>
>> *From:* Lasindu Charith [mailto:lasindu@wso2.com <la...@wso2.com>]
>> *Sent:* Tuesday, June 23, 2015 7:29 PM
>> *To:* Martin Eppel (meppel)
>> *Cc:* Reka Thirunavukkarasu; dev@stratos.apache.org; Ryan Du Plessis
>> (rdupless)
>> *Subject:* Re: Testing Stratos 4.1: Application undeployment:
>> application fails to undeploy (nested grouping, group scaling)
>>
>>
>>
>> Hi Martin,
>>
>>
>>
>> Could you please share the application policy json as well as network
>> partition jsons you have used?
>>
>>
>>
>> Thanks,
>>
>>
>>
>> On Wed, Jun 24, 2015 at 2:44 AM, Martin Eppel (meppel) <me...@cisco.com>
>> wrote:
>>
>> Hi Reka,
>>
>>
>>
>> I picked up the latest code but ran into an issue with the deployment
>> policy. (I also noticed that validation seems to enforce that a
>> deployment policy can only be configured once in the application at
>> the group level, enabling group scaling?!)
>>
>>
>>
>> In the scenario I define a group-level deployment policy at the
>> bottom-level group ("g-sc-G3-1") and also define cartridge deployment
>> policies for all other cartridges (c1, c3) in the parent groups, each
>> defining a network partition “RegionOne”. However, deploying the app
>> causes the exception below; I am not sure whether I missed a
>> configuration or whether there is an issue in the deployment policy
>> validation. This used to work before I picked up the latest changes.
>> My current latest commit id is:
>>
>>
>>
>> commit bb6e102986ad8e54556d9f6de47cc6eaa077e775
>>
>> Author: reka <rt...@gmail.com>
>>
>> Date:   Tue Jun 23 19:22:04 2015 +0530
>>
>>
>>
>> Application json and deployment policies are attached to the email,
>>
>>
>>
>> Thanks
>>
>>
>>
>> Martin
>>
>>
>>
>> TID: [0] [STRATOS] [2015-06-23 20:48:09,428] ERROR
>> {org.apache.stratos.autoscaler.util.AutoscalerUtil} -  Invalid Application
>> Policy: Network partition [network-partition-id] RegionOne is not used in
>> application [application-id] g-sc-G12-1. Hence application bursting will
>> fail. Either remove RegionOne from application policy or make all the
>> cartridges available in RegionOne
>>
>> TID: [0] [STRATOS] [2015-06-23 20:48:09,441] DEBUG
>> {org.apache.stratos.autoscaler.registry.RegistryManager} -  Application
>> context [g-sc-G12-1] persisted successfully in the autoscaler registry
>>
>> TID: [0] [STRATOS] [2015-06-23 20:48:09,441] ERROR
>> {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -
>> Application deployment failed: [application-id]g-sc-G12-1
>>
>> org.apache.stratos.autoscaler.exception.application.InvalidApplicationPolicyException:
>> Invalid Application Policy: *Network partition [network-partition-id]
>> RegionOne is not used in application* [application-id] g-sc-G12-1. Hence
>> application bursting will fail. Either remove RegionOne from application
>> policy or make all the cartridges available in RegionOne
>>
>>         at
>> org.apache.stratos.autoscaler.util.AutoscalerUtil.validateApplicationPolicyAgainstApplication(AutoscalerUtil.java:746)
>>
>>         at
>> org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl.deployApplication(AutoscalerServiceImpl.java:279)
>>
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>
>>         at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>
>>         at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>
>>         at java.lang.reflect.Method.invoke(Method.java:606)
>>
>>         at
>> org.apache.axis2.rpc.receivers.RPCUtil.invokeServiceClass(RPCUtil.java:212)
>>
>>         at
>> org.apache.axis2.rpc.receivers.RPCMessageReceiver.invokeBusinessLogic(RPCMessageReceiver.java:117)
>>
>>         at
>> org.apache.axis2.receivers.AbstractInOutMessageReceiver.invokeBusinessLogic(AbstractInOutMessageReceiver.java:40)
>>
>>         at
>> org.apache.axis2.receivers.AbstractMessageReceiver.receive(AbstractMessageReceiver.java:110)
>>
>>         at org.apache.axis2.engine.AxisEngine.receive(AxisEngine.java:180)
>>
>>         at
>> org.apache.axis2.transport.http.HTTPTransportUtils.processHTTPPostRequest(HTTPTransportUtils.java:172)
>>
>>         at
>> org.apache.axis2.transport.http.AxisServlet.doPost(AxisServlet.java:146)
>>
>>         at
>> org.wso2.carbon.core.transports.CarbonServlet.doPost(CarbonServlet.java:231)
>>
>>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:755)
>>
>>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
>>
>>         at
>> org.eclipse.equinox.http.servlet.internal.ServletRegistration.service(ServletRegistration.java:61)
>>
>>         at
>> org.eclipse.equinox.http.servlet.internal.ProxyServlet.processAlias(ProxyServlet.java:128)
>>
>>         at
>> org.eclipse.equinox.http.servlet.internal.ProxyServlet.service(ProxyServlet.java:68)
>>
>>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
>>
>> *From:* Reka Thirunavukkarasu [mailto:reka@wso2.com]
>> *Sent:* Tuesday, June 23, 2015 7:00 AM
>>
>>
>> *To:* Martin Eppel (meppel)
>> *Cc:* dev@stratos.apache.org; Lasindu Charith (lasindu@wso2.com); Ryan
>> Du Plessis (rdupless)
>> *Subject:* Re: Testing Stratos 4.1: Application undeployment:
>> application fails to undeploy (nested grouping, group scaling)
>>
>>
>>
>> Hi Martin,
>>
>> As we have merged all the thread pools according to the discussion in
>> [1], the default pool size is 100 for all kinds of monitors
>> (application, group and cluster). If you need to increase this, please
>> add the parameter below to stratos.sh:
>>
>> -Dmonitor.thread.pool.size=xxxx
>>
>> Also, we have fixed an application instance termination issue and a
>> Stratos restart issue with group scaling, so you can now restart
>> Stratos even while group scaling is happening. The latest commit is:
>>
>> bb6e102986ad8e54556d9f6de47cc6eaa077e775
>>
>> Do let us know how your testing goes with these fixes.
>> [1] Merging all the threading pools used in autoscaler to one thread pool
>>
>> Thanks,
>>
>> Reka
>>
>>
>>
>> On Tue, Jun 23, 2015 at 10:49 AM, Martin Eppel (meppel) <me...@cisco.com>
>> wrote:
>>
>> Thanks Reka
>>
>>
>>
>> *From:* Reka Thirunavukkarasu [mailto:reka@wso2.com]
>> *Sent:* Monday, June 22, 2015 9:59 PM
>>
>>
>> *To:* Martin Eppel (meppel)
>> *Cc:* dev@stratos.apache.org; Lasindu Charith (lasindu@wso2.com); Ryan
>> Du Plessis (rdupless)
>> *Subject:* Re: Testing Stratos 4.1: Application undeployment:
>> application fails to undeploy (nested grouping, group scaling)
>>
>>
>>
>> Hi Martin,
>>
>> These are actually configurable parameters. In the Stratos code, these
>> thread pool sizes are set to 20 by default. To change them, we can
>> pass them as system properties in stratos.sh. Since the defaults come
>> from the Stratos code, we don't need to set them in the standalone
>> pack. A complex application with more groups and clusters uses more
>> threads, and in that case the default pool size of 20 might get
>> exhausted, so it is better to customize these properties according to
>> the application structure. With the application sample that you
>> attached to this thread, I faced issues such as event listeners not
>> being triggered properly because the thread pool was exhausted. After
>> I increased the thread pool size to 50, I didn't get any issues.
>>
>> I'm in the process of analyzing the thread usage in order to decide on
>> a recommended pool size for a given application structure, so that
>> anyone can calculate the pool size their application requires and
>> configure this parameter.
>>
>> Hope this helps you understand those parameters.
>>
>>
>>
>> Thanks,
>>
>> Reka
>>
>>
>>
>> On Mon, Jun 22, 2015 at 11:50 PM, Martin Eppel (meppel) <me...@cisco.com>
>> wrote:
>>
>> Hi Reka,
>>
>>
>>
>> I am not clear on the two properties you mention below: are they
>> supposed to be set in stratos.sh? I just picked up the latest code
>> from the Apache Stratos repo and don’t see them.
>>
>>
>>
>> Btw, the lock monitor is disabled in our production code
>> (*read.write.lock.monitor.enabled=false*; I assume it defaults to
>> false if not specified). I only enable it to provide additional
>> information.
>>
>>
>>
>> Thanks
>>
>>
>>
>> Martin
>>
>>
>>
>> *From:* Reka Thirunavukkarasu [mailto:reka@wso2.com]
>> *Sent:* Monday, June 22, 2015 7:30 AM
>> *To:* Martin Eppel (meppel)
>> *Cc:* dev@stratos.apache.org; Lasindu Charith (lasindu@wso2.com); Ryan
>> Du Plessis (rdupless)
>>
>>
>> *Subject:* Re: Testing Stratos 4.1: Application undeployment:
>> application fails to undeploy (nested grouping, group scaling)
>>
>>
>>
>> Hi Martin,
>>
>> I have verified the fix with read.write.lock.monitor.enabled=true,
>> and it worked fine. Since we are using concurrency and have delegated
>> some flows to threads, I had to set the thread pool sizes to the
>> values below in stratos.sh.
>>
>>     -Dapplication.monitor.thread.pool.size=50 \
>>     -Dgroup.monitor.thread.pool.size=50 \
>>
>> Please note that *it is recommended to keep
>> read.write.lock.monitor.enabled=false in production, as the monitor
>> increases the footprint*. This property was introduced for testing
>> purposes only.
>>
>>
>>
>> We are in the process of analyzing the thread pool sizes and will come
>> up with recommended values.
>>
>> Also, I have fixed a small issue in the REST endpoint: it used to
>> return a default value whenever the application runtime was not found.
>> Now, if the runtime is not found, the message below is returned.
>>
>> {"status":"error","message":"Application runtime not found"}
>>
>> I have also verified the undeployment with group scaling. Didn't find any
>> issues with the above fixes.
>>
>> Please find the latest commit as below:
>>
>> 0a969200d11228158606f011ca7e5e795f336d92.
>>
>> Please note that only the error below was observed, which is harmless
>> for now. I have verified a workaround, and it works fine. I will check
>> the severity and decide whether to implement a proper fix or keep the
>> workaround.
>>
>> [1]. TID: [0] [STRATOS] [2015-06-22 14:22:01,872] ERROR
>> {org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor} -  System
>> error, lock has not released for 30 seconds: [lock-name] topology
>> [lock-type] Write [thread-id] 117 [thread-name] pool-24-thread-2
>> [stack-trace]
>> java.lang.Thread.getStackTrace(Thread.java:1589)
>>
>> org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:123)
>>
>> org.apache.stratos.messaging.message.processor.topology.updater.TopologyUpdater.acquireWriteLockForService(TopologyUpdater.java:123)
>>
>> org.apache.stratos.messaging.message.processor.topology.ApplicationClustersCreatedMessageProcessor.doProcess(ApplicationClustersCreatedMessageProcessor.java:78)
>>
>> org.apache.stratos.messaging.message.processor.topology.ApplicationClustersCreatedMessageProcessor.process(ApplicationClustersCreatedMessageProcessor.java:59)
>>
>> org.apache.stratos.messaging.message.processor.topology.ServiceRemovedMessageProcessor.process(ServiceRemovedMessageProcessor.java:64)
>>
>> org.apache.stratos.messaging.message.processor.topology.ServiceCreatedMessageProcessor.process(ServiceCreatedMessageProcessor.java:65)
>>
>> org.apache.stratos.messaging.message.processor.topology.CompleteTopologyMessageProcessor.process(CompleteTopologyMessageProcessor.java:74)
>>
>> org.apache.stratos.messaging.message.processor.MessageProcessorChain.process(MessageProcessorChain.java:61)
>>
>> org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator.run(TopologyEventMessageDelegator.java:73)
>>
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> java.lang.Thread.run(Thread.java:745)
>>
>> Thanks,
>>
>> Reka
>>
>>
>>
>>
>>
>>
>>
>> On Mon, Jun 22, 2015 at 12:24 PM, Reka Thirunavukkarasu <re...@wso2.com>
>> wrote:
>>
>> Hi Martin,
>>
>> I found the reason we did not encounter these locking issues: we were
>> testing with the default Stratos pack, which has
>> read.write.lock.monitor.enabled=false. The locking warnings are raised
>> only when read.write.lock.monitor.enabled=true is set, which is why
>> only you were seeing them in your setup.
>>
>> Since I am now able to reproduce the issue, I will test with the fix
>> that I already pushed and update the thread.
>>
>> We will discuss making read.write.lock.monitor.enabled=true the
>> default in Stratos, so that we can find such issues early and fix them.
>>
>>
>>
>> Thanks,
>>
>> Reka
>>
>>
>>
>> On Mon, Jun 22, 2015 at 12:16 AM, Reka Thirunavukkarasu <re...@wso2.com>
>> wrote:
>>
>> Sorry Martin, I had only fixed the issue locally. I have just pushed
>> it now. Can you test with 1c21daaeea7b27ad0a0141a70b32e9443e78e309 when
>> you get a chance? I will also continue testing with this fix.
>>
>> Thanks,
>>
>> Reka
>>
>>
>>
>> On Mon, Jun 22, 2015 at 12:07 AM, Martin Eppel (meppel) <me...@cisco.com>
>> wrote:
>>
>> Btw,
>>
>>
>>
>> This is the last commit I picked up from Stratos master:
>>
>>
>>
>> commit 58bea52be814269416f70391fef50859aa5ae0a1
>>
>> Author: lasinducharith <la...@gmail.com>
>>
>> Date:   Fri Jun 19 19:40:27 2015 +0530
>>
>>
>>
>> *From:* Martin Eppel (meppel)
>> *Sent:* Sunday, June 21, 2015 10:28 AM
>> *To:* dev@stratos.apache.org; Reka Thirunavukkarasu
>> *Cc:* Lasindu Charith (lasindu@wso2.com); Ryan Du Plessis (rdupless)
>> *Subject:* RE: Testing Stratos 4.1: Application undeployment:
>> application fails to undeploy (nested grouping, group scaling)
>>
>>
>>
>> Hi Reka,
>>
>>
>>
>> Here is *another* example which fails; see the application at [1.] and
>> the attached log files and JSONs. I ran a few scenarios; the one that
>> fails is the application named “s-g-c1-c2-c3” (the last scenario). All
>> members get removed but the application remains deployed:
>>
>> “s-g-c1-c2-c3: applicationInstances 0, groupInstances 0, clusterInstances
>> 0, members 0 ()”
>>
>>
>>
>>
>>
>> Thanks
>>
>>
>>
>>
>>
>> Martin
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> *From:* Imesh Gunaratne [mailto:imesh@apache.org <im...@apache.org>]
>> *Sent:* Sunday, June 21, 2015 1:32 AM
>> *To:* Reka Thirunavukkarasu
>> *Cc:* dev; Lasindu Charith (lasindu@wso2.com); Ryan Du Plessis (rdupless)
>> *Subject:* Re: Testing Stratos 4.1: Application undeployment:
>> application fails to undeploy (nested grouping, group scaling)
>>
>>
>>
>> Great! Thanks Reka!
>>
>>
>>
>> On Sun, Jun 21, 2015 at 8:34 AM, Reka Thirunavukkarasu <re...@wso2.com>
>> wrote:
>>
>> Hi Martin/Imesh,
>>
>> Sure, I will have a look at the logs. I will also go through the
>> recent commits and try to revert the fix that was added for nested
>> group scaling, as it is not needed for this release. As Martin
>> mentioned, more issues appeared after the fixes. Without the revert,
>> we would have to go through another major testing effort.
>>
>> I will post updates on the progress.
>>
>>
>>
>> Thanks,
>>
>> Reka
>>
>>
>>
>> On Sun, Jun 21, 2015 at 8:14 AM, Imesh Gunaratne <im...@apache.org>
>> wrote:
>>
>> Hi Martin,
>>
>>
>>
>> Thanks for the quick response. Yes we will definitely go through the logs
>> and investigate this.
>>
>>
>>
>> Thanks
>>
>>
>>
>> On Sun, Jun 21, 2015 at 8:09 AM, Martin Eppel (meppel) <me...@cisco.com>
>> wrote:
>>
>> Hi Isuru,
>>
>>
>>
>> No, the issue does not seem to be resolved. With the latest code I see
>> issues in test cases which used to work before (beyond the latest
>> example I posted the log files for, see below); I am not sure yet what
>> is going on. I will investigate further (to make sure I am not
>> mistaken) and follow up with some examples after the weekend, but if
>> you could take a look on Monday at the log files I provided with the
>> previous email, that would be great.
>>
>>
>>
>> Thanks
>>
>>
>>
>> Martin
>>
>>
>>
>> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
>> *Sent:* Saturday, June 20, 2015 7:29 PM
>> *To:* dev
>> *Cc:* Lasindu Charith (lasindu@wso2.com); Reka Thirunavukkarasu (
>> reka@wso2.com); Ryan Du Plessis (rdupless)
>> *Subject:* Re: Testing Stratos 4.1: Application undeployment:
>> application fails to undeploy (nested grouping, group scaling)
>>
>>
>>
>> Hi All,
>>
>>
>>
>> I'm sorry I could not follow the entire discussion.
>>
>> Can someone please explain the latest status? Have we resolved the
>> initial group scaling issue, and are we now seeing an application
>> removal problem?
>>
>>
>>
>> Thanks
>>
>>
>>
>> On Sat, Jun 20, 2015 at 2:06 AM, Martin Eppel (meppel) <me...@cisco.com>
>> wrote:
>>
>> Hi Lasindu, Reka,
>>
>>
>>
>>
>>
>> I just ran into the issue with removing the application *again* (with
>> the fix for the issue included):
>>
>>
>>
>> Please see [1a., 1b.] for the application structure (group scaling
>> defined at only one group level). See also the respective artifacts and log
>> file attached.
>>
>>
>>
>> Please advise if we should reopen the JIRA
>>
>>
>>
>> Thanks
>>
>>
>>
>> Martin
>>
>>
>>
>>
>>
>> Application [1a.]
>>
>>
>>
>>
>>
>> [1b.] application after “starting application remove”
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> --
>>
>> Imesh Gunaratne
>>
>>
>>
>> Senior Technical Lead, WSO2
>>
>> Committer & PMC Member, Apache Stratos
>>
>>
>>
>>
>>
>>
>>   --
>>
>> Reka Thirunavukkarasu
>> Senior Software Engineer,
>> WSO2, Inc.:http://wso2.com,
>>
>> Mobile: +94776442007
>>
>>
>>
>>
>>
>>
>>
>>
>> --
>>
>> *Lasindu Charith*
>>
>> Software Engineer, WSO2 Inc.
>>
>> Committer & PMC Member, Apache Stratos
>>
>> Mobile: +94714427192 | Web: blog.lasindu.com
>>
>>
>>
>>
>> --
>>
>> Reka Thirunavukkarasu
>> Senior Software Engineer,
>> WSO2, Inc.:http://wso2.com,
>>
>> Mobile: +94776442007
>>
>>
>>
>
>
>
> --
> *Lasindu Charith*
> Software Engineer, WSO2 Inc.
> Committer & PMC Member, Apache Stratos
> Mobile: +94714427192 | Web: blog.lasindu.com
>



-- 
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007

RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by "Martin Eppel (meppel)" <me...@cisco.com>.
Forgot to mention: this application uses the same application deployment policy, “default-iaas”.

From: Martin Eppel (meppel)
Sent: Tuesday, June 23, 2015 10:06 PM
To: Reka Thirunavukkarasu
Cc: dev@stratos.apache.org; Lasindu Charith; Ryan Du Plessis (rdupless)
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Btw, an additional pointer:

Another example application (below) deploys just fine; the issue seems to occur only in apps that define both group and cartridge deployment policies.


{"alias": "subscription-c1-in-group", "applicationId": "subscription-c1-in-group", "components": {"cartridges": [], "groups": [{"name": "subscription-c1-in-group", "groupMaxInstances": 1, "groupMinInstances": 1, "alias": "subscription-c1-in-group-x0x", "cartridges": [{"cartridgeMin": 1, "cartridgeMax": 1, "type": "c1", "subscribableInfo": {"alias": "c1-0x0", "deploymentPolicy": "static-1", "artifactRepository": {"repoUsername": "user", "repoUrl": "http://octl.qmog.cisco.com:10080/git/default.git", "privateRepo": true, "repoPassword": "c-policy"}, "autoscalingPolicy": "economyPolicy"}}], "groups": []}]}}

Static1.json

{
    "id": "static",
    "regions":
    [
        {
            "id": "RegionOne",
            "algorithm": "one-after-another",
            "partitions":
            [
                {
                    "id": "whole-region",
                    "max": 100
                }
            ]
        }
    ]
}



From: Martin Eppel (meppel)
Sent: Tuesday, June 23, 2015 9:13 PM
To: 'Reka Thirunavukkarasu'
Cc: dev@stratos.apache.org<ma...@stratos.apache.org>; Lasindu Charith; Ryan Du Plessis (rdupless)
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Thanks Reka,

Posting the application deployment policy:

{
    "id": "default-iaas",
    "algorithm":"one-after-another",
    "networkPartitions":
    [
        "RegionOne"
    ],
    "properties":
    [
        {
            "name" : "networkPartitionGroups",
            "value" : "RegionOne"
        }
    ]
}

From: Reka Thirunavukkarasu [mailto:reka@wso2.com]
Sent: Tuesday, June 23, 2015 8:52 PM
To: Martin Eppel (meppel)
Cc: dev@stratos.apache.org<ma...@stratos.apache.org>; Lasindu Charith; Ryan Du Plessis (rdupless)
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,
Before deploying an application, you have to add an application policy and then use that policy to deploy the application. You can find a sample application policy and a screenshot of the UI attached. The following are sample commands to add the application policy, add the application, and deploy the application. Application policies have been part of the model since we introduced support for group-level deployment policies. I could see in some of your earlier logs that you have used an application policy with the id "default-iaas". Can you check your deployment and share all the artifacts you are using, so that we can debug this further? Otherwise, you can browse the UI (home-->configure-->application-policies section) to find out which application policy you are using.

echo "Adding application policy..."
curl -X POST -H "Content-Type: application/json" -d "@application-policy-1.json" -k -v -u admin:admin https://127.0.0.1:9443/api/applicationPolicies


echo "Creating application..."
curl -X POST -H "Content-Type: application/json" -d "@application-s-g-c1-c2-c3-s.json" -k -v -u admin:admin https://127.0.0.1:9443/api/applications

sleep 1

echo "Deploying application..."
curl -X POST -H "Content-Type: application/json" -k -v -u admin:admin https://127.0.0.1:9443/api/applications/g-sc-G12-1/deploy/application-policy-1
Where application-policy-1 is the application policy name.
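
For illustration, the order of the three calls above can be written down as data. This is a minimal sketch in Python that assumes the same endpoints and file names as the curl commands; the helper name build_deploy_steps is hypothetical, and nothing is actually sent to the server.

```python
from typing import List, Optional, Tuple

# Base URL taken from the curl commands above; adjust for your setup.
BASE_URL = "https://127.0.0.1:9443/api"

# (HTTP method, URL, JSON payload file or None)
Step = Tuple[str, str, Optional[str]]


def build_deploy_steps(app_id: str, policy_id: str,
                       policy_file: str, app_file: str) -> List[Step]:
    """Return the REST calls in the order they have to be issued.

    Hypothetical helper: it only builds the request descriptions,
    it does not contact Stratos.
    """
    return [
        # 1. Add the application policy.
        ("POST", f"{BASE_URL}/applicationPolicies", policy_file),
        # 2. Add (create) the application.
        ("POST", f"{BASE_URL}/applications", app_file),
        # 3. Deploy the application with the policy added in step 1.
        ("POST", f"{BASE_URL}/applications/{app_id}/deploy/{policy_id}", None),
    ]


steps = build_deploy_steps(
    app_id="g-sc-G12-1",
    policy_id="application-policy-1",
    policy_file="application-policy-1.json",
    app_file="application-s-g-c1-c2-c3-s.json",
)
for method, url, payload in steps:
    print(method, url, payload or "(no body)")
```

The point of the ordering is that step 3 refers to the policy id registered in step 1, so deploying before the policy exists cannot succeed.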

On Wed, Jun 24, 2015 at 9:09 AM, Martin Eppel (meppel) <me...@cisco.com>> wrote:
One more comment,

I used a very similar application with the same deployment policies just a few days ago without any issues, before I picked up the latest Stratos commits. Something has changed (for one, validation of multiple group deployment policies in nested scenarios seems to have been added). Again, either the validation is broken or the configuration requirements have changed, but how can I fix it?

To point it out: RegionOne is defined as a network partition in the deployment policies, yet the exception (see below) lists it as not being used.


Thanks

Martin

Reposting snippet of exception:

TID: [0] [STRATOS] [2015-06-23 20:48:09,441] ERROR {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Application deployment failed: [application-id]g-sc-G12-1
org.apache.stratos.autoscaler.exception.application.InvalidApplicationPolicyException: Invalid Application Policy: Network partition [network-partition-id] RegionOne is not used in application [application-id] g-sc-G12-1. Hence application bursting will fail. Either remove RegionOne from application policy or make all the cartridges available in RegionOne


From: Martin Eppel (meppel)
Sent: Tuesday, June 23, 2015 8:20 PM
To: Lasindu Charith
Cc: Reka Thirunavukkarasu; dev@stratos.apache.org<ma...@stratos.apache.org>; Ryan Du Plessis (rdupless)
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Lasindu,

I am confused: which application policy? Beyond deployment-policy-<1,2,3> there is no other deployment policy defined. The network partitions are defined in the zip file that was attached to my email. Did you take a look at the zip file?

Thanks

Martin

e.g. deployment-policy-1.json

{
    "id": "deployment-policy-1",
    "networkPartitions":
    [
        {
            "id": "RegionOne",
            "partitionAlgo": "one-after-another",
            "partitions":
            [
                {
                    "id": "whole-region",
                    "partitionMax": 5
                }
            ]
        }
    ]
}
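
To make the InvalidApplicationPolicyException quoted in this thread easier to reason about, here is a rough sketch of the kind of check its message describes: every network partition named in the application policy should be available to every cartridge through its deployment policy. This is an assumption based on the message text only, not the actual AutoscalerUtil code; the function name and the cartridge-to-policy mapping are hypothetical.

```python
from typing import Dict, List, Set


def unused_network_partitions(app_policy: dict,
                              deployment_policies: Dict[str, dict],
                              cartridge_policy_ids: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each application-policy network partition to the cartridges
    whose deployment policy does not define it (hypothetical check)."""
    problems: Dict[str, List[str]] = {}
    for partition_id in app_policy["networkPartitions"]:
        for cartridge, policy_id in sorted(cartridge_policy_ids.items()):
            policy = deployment_policies.get(policy_id, {})
            defined: Set[str] = {np["id"] for np in policy.get("networkPartitions", [])}
            if partition_id not in defined:
                problems.setdefault(partition_id, []).append(cartridge)
    return problems


app_policy = {"id": "default-iaas", "networkPartitions": ["RegionOne"]}
deployment_policies = {
    "deployment-policy-1": {
        "id": "deployment-policy-1",
        "networkPartitions": [
            {"id": "RegionOne", "partitionAlgo": "one-after-another",
             "partitions": [{"id": "whole-region", "partitionMax": 5}]}
        ],
    }
}

# Every cartridge references a policy that defines RegionOne: nothing to report.
ok = unused_network_partitions(
    app_policy, deployment_policies,
    {"c1": "deployment-policy-1", "c3": "deployment-policy-1"})

# c2 points at a policy that cannot be resolved (assumed here for illustration).
bad = unused_network_partitions(
    app_policy, deployment_policies,
    {"c1": "deployment-policy-1", "c2": "missing-policy"})

print(ok, bad)
```

Under this reading, the error would be raised when at least one cartridge cannot be placed in a partition that the application policy names, even if other cartridges define that partition.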

From: Lasindu Charith [mailto:lasindu@wso2.com]
Sent: Tuesday, June 23, 2015 7:29 PM
To: Martin Eppel (meppel)
Cc: Reka Thirunavukkarasu; dev@stratos.apache.org<ma...@stratos.apache.org>; Ryan Du Plessis (rdupless)
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,

Could you please share the application policy json as well as network partition jsons you have used?

Thanks,

On Wed, Jun 24, 2015 at 2:44 AM, Martin Eppel (meppel) <me...@cisco.com>> wrote:
Hi Reka,

I picked up the latest code but ran into an issue with the deployment policy. (I also noticed that validation seems to enforce that a deployment policy can be configured only once in the application at the group level, enabling group scaling?!)

In this scenario I define a group-level deployment policy at the bottom-level group ("g-sc-G3-1") and also define cartridge deployment policies for all other cartridges (c1, c3) in the parent groups, each defining a network partition “RegionOne”. However, deploying the app causes the exception below. I am not sure whether I missed a configuration or whether there is an issue in the deployment policy validation. This used to work before I picked up the latest changes. My current latest commit id is:

commit bb6e102986ad8e54556d9f6de47cc6eaa077e775
Author: reka <rt...@gmail.com>>
Date:   Tue Jun 23 19:22:04 2015 +0530

The application JSON and deployment policies are attached to the email.

Thanks

Martin

TID: [0] [STRATOS] [2015-06-23 20:48:09,428] ERROR {org.apache.stratos.autoscaler.util.AutoscalerUtil} -  Invalid Application Policy: Network partition [network-partition-id] RegionOne is not used in application [application-id] g-sc-G12-1. Hence application bursting will fail. Either remove RegionOne from application policy or make all the cartridges available in RegionOne
TID: [0] [STRATOS] [2015-06-23 20:48:09,441] DEBUG {org.apache.stratos.autoscaler.registry.RegistryManager} -  Application context [g-sc-G12-1] persisted successfully in the autoscaler registry
TID: [0] [STRATOS] [2015-06-23 20:48:09,441] ERROR {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Application deployment failed: [application-id]g-sc-G12-1
org.apache.stratos.autoscaler.exception.application.InvalidApplicationPolicyException: Invalid Application Policy: Network partition [network-partition-id] RegionOne is not used in application [application-id] g-sc-G12-1. Hence application bursting will fail. Either remove RegionOne from application policy or make all the cartridges available in RegionOne
        at org.apache.stratos.autoscaler.util.AutoscalerUtil.validateApplicationPolicyAgainstApplication(AutoscalerUtil.java:746)
        at org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl.deployApplication(AutoscalerServiceImpl.java:279)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.axis2.rpc.receivers.RPCUtil.invokeServiceClass(RPCUtil.java:212)
        at org.apache.axis2.rpc.receivers.RPCMessageReceiver.invokeBusinessLogic(RPCMessageReceiver.java:117)
        at org.apache.axis2.receivers.AbstractInOutMessageReceiver.invokeBusinessLogic(AbstractInOutMessageReceiver.java:40)
        at org.apache.axis2.receivers.AbstractMessageReceiver.receive(AbstractMessageReceiver.java:110)
        at org.apache.axis2.engine.AxisEngine.receive(AxisEngine.java:180)
        at org.apache.axis2.transport.http.HTTPTransportUtils.processHTTPPostRequest(HTTPTransportUtils.java:172)
        at org.apache.axis2.transport.http.AxisServlet.doPost(AxisServlet.java:146)
        at org.wso2.carbon.core.transports.CarbonServlet.doPost(CarbonServlet.java:231)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:755)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
        at org.eclipse.equinox.http.servlet.internal.ServletRegistration.service(ServletRegistration.java:61)
        at org.eclipse.equinox.http.servlet.internal.ProxyServlet.processAlias(ProxyServlet.java:128)
        at org.eclipse.equinox.http.servlet.internal.ProxyServlet.service(ProxyServlet.java:68)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)

From: Reka Thirunavukkarasu [mailto:reka@wso2.com<ma...@wso2.com>]
Sent: Tuesday, June 23, 2015 7:00 AM

To: Martin Eppel (meppel)
Cc: dev@stratos.apache.org<ma...@stratos.apache.org>; Lasindu Charith (lasindu@wso2.com<ma...@wso2.com>); Ryan Du Plessis (rdupless)
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,
As we have merged all the thread pools according to the discussion in [1], the default pool size is now 100 for all kinds of monitors (application, group and cluster). If you need to increase it, please add the parameter below to stratos.sh:
-Dmonitor.thread.pool.size=xxxx
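
As an illustration of how such an override behaves, the sketch below resolves the effective pool size from a list of JVM-style options and falls back to the default of 100 mentioned above. It is written in Python for brevity and is not how Stratos itself reads the property; effective_pool_size is a hypothetical helper.

```python
from typing import Iterable

DEFAULT_POOL_SIZE = 100  # default stated in this thread
PROPERTY = "monitor.thread.pool.size"


def effective_pool_size(jvm_options: Iterable[str]) -> int:
    """Return the value of -Dmonitor.thread.pool.size, or the default."""
    prefix = f"-D{PROPERTY}="
    size = DEFAULT_POOL_SIZE
    for option in jvm_options:
        if option.startswith(prefix):
            # A later occurrence overrides an earlier one (assumption of this sketch).
            size = int(option[len(prefix):])
    return size


without_override = effective_pool_size(["-Xmx2g"])
with_override = effective_pool_size(["-Xmx2g", "-Dmonitor.thread.pool.size=150"])
print(without_override, with_override)
```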
Also, we have fixed an application instance termination issue and a Stratos restart issue with group scaling, so you can now restart Stratos even while group scaling is in progress. The latest commit is:

bb6e102986ad8e54556d9f6de47cc6eaa077e775

Do let us know how your testing goes with these fixes.

[1] Merging all the threading pools used in autoscaler to one thread pool
Thanks,
Reka

On Tue, Jun 23, 2015 at 10:49 AM, Martin Eppel (meppel) <me...@cisco.com>> wrote:
Thanks Reka

From: Reka Thirunavukkarasu [mailto:reka@wso2.com<ma...@wso2.com>]
Sent: Monday, June 22, 2015 9:59 PM

To: Martin Eppel (meppel)
Cc: dev@stratos.apache.org<ma...@stratos.apache.org>; Lasindu Charith (lasindu@wso2.com<ma...@wso2.com>); Ryan Du Plessis (rdupless)
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,
These are actually configurable parameters. In the Stratos code, these thread pool sizes are set to 20 by default. If we need to change them, we can pass them as system properties in stratos.sh. Since the default values come from the Stratos code, we don't need to provide them in the standalone pack. A complex application with more groups and clusters uses more threads, so the default pool size of 20 might get exhausted. It is therefore better to customize these properties according to the application structure. When I used the application sample that you attached to this thread, I faced issues such as event listeners not being triggered properly because the thread pool was exhausted. After I increased the thread pool size to 50, I didn't see any issues.
I'm in the process of analyzing the thread usage in order to decide on a recommended pool size based on the application structure, so that anyone can calculate the pool size their application requires and configure this parameter accordingly.
Hope this helps you understand these parameters.
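
The exhaustion effect described above can be reproduced with any fixed-size pool. The following is a generic Python sketch, not Stratos code: the pool size of 2 and the task names are made up for the demonstration. Once every worker is occupied by a long-running task, a newly submitted "listener" task stays queued until a worker becomes free.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 2  # deliberately tiny; the thread mentions sizes of 20 and 50

release = threading.Event()
started = [threading.Event() for _ in range(POOL_SIZE)]


def long_running_monitor(index: int) -> str:
    """Stand-in for a monitor task that occupies a worker thread."""
    started[index].set()
    release.wait(timeout=10)
    return f"monitor-{index} finished"


def event_listener() -> str:
    """Stand-in for an event listener that needs a free worker."""
    return "event handled"


pool = ThreadPoolExecutor(max_workers=POOL_SIZE)
monitors = [pool.submit(long_running_monitor, i) for i in range(POOL_SIZE)]
for flag in started:
    flag.wait(timeout=5)  # wait until both workers are busy

listener = pool.submit(event_listener)
# The listener is queued: no worker is free while the monitors are blocked.
blocked_while_exhausted = not listener.done()

release.set()  # monitors finish and free their workers
result = listener.result(timeout=5)
pool.shutdown()
print(blocked_while_exhausted, result)
```

A larger pool only moves the limit; an application with more groups and clusters than workers would hit the same queueing behaviour.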

Thanks,
Reka

On Mon, Jun 22, 2015 at 11:50 PM, Martin Eppel (meppel) <me...@cisco.com>> wrote:
Hi Reka,

I am not clear on the two properties you mention below: are they supposed to be set in stratos.sh? I just picked up the latest code from the Apache Stratos repo and don’t see them.

Btw, read.write.lock.monitor.enabled is set to false (disabled) in our production code (I assume it defaults to false if not specified); I only enable it to provide additional information.

Thanks

Martin

From: Reka Thirunavukkarasu [mailto:reka@wso2.com<ma...@wso2.com>]
Sent: Monday, June 22, 2015 7:30 AM
To: Martin Eppel (meppel)
Cc: dev@stratos.apache.org<ma...@stratos.apache.org>; Lasindu Charith (lasindu@wso2.com<ma...@wso2.com>); Ryan Du Plessis (rdupless)

Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,
I have verified the fix with read.write.lock.monitor.enabled=true, and it worked fine. Since we use concurrency and delegate some flows to threads, I had to set the thread pool sizes to the values below in stratos.sh.

    -Dapplication.monitor.thread.pool.size=50 \
    -Dgroup.monitor.thread.pool.size=50 \
Please note that it is recommended to keep read.write.lock.monitor.enabled=false in production, as enabling it increases the resource footprint. This property was introduced for testing purposes only.

We are in the process of analyzing the thread pool sizes and will come up with recommended values.
Also, I have fixed a small issue in the REST endpoint: it used to return a default value whenever the application runtime was not found. Now, if the runtime is not found, the message below is returned:

{"status":"error","message":"Application runtime not found"}
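
A client can distinguish this case by checking the status field. The sketch below is a minimal Python example that assumes only the error body shown above; the shape of a successful response is not given in this thread, so the function simply returns the parsed payload in that case. The function name runtime_or_error is hypothetical.

```python
import json


def runtime_or_error(response_body: str) -> dict:
    """Parse a response body and raise if it carries the error status."""
    payload = json.loads(response_body)
    if payload.get("status") == "error":
        raise LookupError(payload.get("message", "unknown error"))
    return payload


body = '{"status":"error","message":"Application runtime not found"}'
try:
    runtime_or_error(body)
    outcome = "runtime found"
except LookupError as exc:
    outcome = str(exc)
print(outcome)
```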
I have also verified the undeployment with group scaling and didn't find any issues with the above fixes.
Please find the latest commit below:

0a969200d11228158606f011ca7e5e795f336d92
Please note that only the error below [1] was observed, and it is harmless for now. I have verified that a workaround resolves it, but I will check the severity and decide whether to implement a proper fix or keep the workaround.

[1]. TID: [0] [STRATOS] [2015-06-22 14:22:01,872] ERROR {org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor} -  System error, lock has not released for 30 seconds: [lock-name] topology [lock-type] Write [thread-id] 117 [thread-name] pool-24-thread-2 [stack-trace]
java.lang.Thread.getStackTrace(Thread.java:1589)
org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:123)
org.apache.stratos.messaging.message.processor.topology.updater.TopologyUpdater.acquireWriteLockForService(TopologyUpdater.java:123)
org.apache.stratos.messaging.message.processor.topology.ApplicationClustersCreatedMessageProcessor.doProcess(ApplicationClustersCreatedMessageProcessor.java:78)
org.apache.stratos.messaging.message.processor.topology.ApplicationClustersCreatedMessageProcessor.process(ApplicationClustersCreatedMessageProcessor.java:59)
org.apache.stratos.messaging.message.processor.topology.ServiceRemovedMessageProcessor.process(ServiceRemovedMessageProcessor.java:64)
org.apache.stratos.messaging.message.processor.topology.ServiceCreatedMessageProcessor.process(ServiceCreatedMessageProcessor.java:65)
org.apache.stratos.messaging.message.processor.topology.CompleteTopologyMessageProcessor.process(CompleteTopologyMessageProcessor.java:74)
org.apache.stratos.messaging.message.processor.MessageProcessorChain.process(MessageProcessorChain.java:61)
org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator.run(TopologyEventMessageDelegator.java:73)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
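
For readers unfamiliar with this kind of check, the sketch below shows one generic way to detect locks that have been held longer than a threshold, such as the 30 seconds reported in the log line above. It is written in Python, and the class and method names are hypothetical; it is not taken from org.apache.stratos.common.concurrent.locks.

```python
import time
from typing import Callable, Dict, List, Tuple


class LockHoldMonitor:
    """Track when locks were acquired and report those held too long."""

    def __init__(self, threshold_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold_seconds
        self.clock = clock
        # (lock name, thread name) -> time of acquisition
        self._held: Dict[Tuple[str, str], float] = {}

    def acquired(self, lock_name: str, thread_name: str) -> None:
        self._held[(lock_name, thread_name)] = self.clock()

    def released(self, lock_name: str, thread_name: str) -> None:
        self._held.pop((lock_name, thread_name), None)

    def overdue(self) -> List[Tuple[str, str]]:
        """Return the (lock, thread) pairs held for at least the threshold."""
        now = self.clock()
        return sorted(key for key, since in self._held.items()
                      if now - since >= self.threshold)


# A fake clock keeps the example deterministic.
fake_now = [0.0]
monitor = LockHoldMonitor(threshold_seconds=30.0, clock=lambda: fake_now[0])
monitor.acquired("topology", "pool-24-thread-2")
monitor.acquired("application", "pool-24-thread-3")
fake_now[0] = 10.0
monitor.released("application", "pool-24-thread-3")
fake_now[0] = 31.0
overdue_locks = monitor.overdue()
print(overdue_locks)
```

Bookkeeping of this kind costs memory and time on every acquire and release, which is consistent with the advice in this thread to keep the monitor disabled in production.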
Thanks,
Reka



On Mon, Jun 22, 2015 at 12:24 PM, Reka Thirunavukkarasu <re...@wso2.com>> wrote:
Hi Martin,
I found the reason why we didn't encounter these locking issues: we were testing with the default Stratos pack, which has read.write.lock.monitor.enabled=false. The locking warning is raised only when read.write.lock.monitor.enabled=true. That's why only you were facing these locking issues, as you use this configuration in your setup.
Since I'm able to reproduce the issue, I will test with the fix that I already pushed and update the thread.
We will discuss making read.write.lock.monitor.enabled=true the default in Stratos, so that we can find such issues early and fix them.

Thanks,
Reka

On Mon, Jun 22, 2015 at 12:16 AM, Reka Thirunavukkarasu <re...@wso2.com>> wrote:
Sorry Martin, I had only fixed the issue locally; I have just pushed it now. Can you test with 1c21daaeea7b27ad0a0141a70b32e9443e78e309 when you get a chance? I will also continue testing with this fix.
Thanks,
Reka

On Mon, Jun 22, 2015 at 12:07 AM, Martin Eppel (meppel) <me...@cisco.com>> wrote:
Btw,

This is my last commit I picked up from the stratos master:

commit 58bea52be814269416f70391fef50859aa5ae0a1
Author: lasinducharith <la...@gmail.com>>
Date:   Fri Jun 19 19:40:27 2015 +0530

From: Martin Eppel (meppel)
Sent: Sunday, June 21, 2015 10:28 AM
To: dev@stratos.apache.org<ma...@stratos.apache.org>; Reka Thirunavukkarasu
Cc: Lasindu Charith (lasindu@wso2.com<ma...@wso2.com>); Ryan Du Plessis (rdupless)
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Reka,

Here is another example that fails; see the application at [1.] and the attached log files and JSONs. I ran a few scenarios; the one that fails is the application named “s-g-c1-c2-c3” (last scenario). All members get removed but the application remains deployed:

“s-g-c1-c2-c3: applicationInstances 0, groupInstances 0, clusterInstances 0, members 0 ()”


Thanks


Martin




[cid:image001.png@01D0AE01.285ABD20]




From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: Sunday, June 21, 2015 1:32 AM
To: Reka Thirunavukkarasu
Cc: dev; Lasindu Charith (lasindu@wso2.com<ma...@wso2.com>); Ryan Du Plessis (rdupless)
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Great! Thanks Reka!

On Sun, Jun 21, 2015 at 8:34 AM, Reka Thirunavukkarasu <re...@wso2.com>> wrote:
Hi Martin/Imesh,
Sure, I will have a look at the logs. I will also go through the recent commits and try to revert the fix that was added for nested group scaling, as it is not needed for this release, since Martin mentioned that there are more issues after the fixes. Otherwise, we will have to go through another major testing effort.
I will update the thread on the progress.

Thanks,
Reka

On Sun, Jun 21, 2015 at 8:14 AM, Imesh Gunaratne <im...@apache.org>> wrote:
Hi Martin,

Thanks for the quick response. Yes we will definitely go through the logs and investigate this.

Thanks

On Sun, Jun 21, 2015 at 8:09 AM, Martin Eppel (meppel) <me...@cisco.com>> wrote:
Hi Isuru,

No, the issue does not seem to be resolved. With the latest code I see issues in test cases that used to work before (beyond the latest example I posted the log files for, see below); I am not sure yet what is going on. I will investigate further (making sure I am not mistaken) and follow up with some examples after the weekend, but it would be great if you could take a look on Monday at the log files I provided with the previous email.

Thanks

Martin

From: Imesh Gunaratne [mailto:imesh@apache.org<ma...@apache.org>]
Sent: Saturday, June 20, 2015 7:29 PM
To: dev
Cc: Lasindu Charith (lasindu@wso2.com<ma...@wso2.com>); Reka Thirunavukkarasu (reka@wso2.com<ma...@wso2.com>); Ryan Du Plessis (rdupless)
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi All,

I'm sorry I could not follow the entire discussion.
Can someone explain the latest status please? Have we resolved the initial group scaling issue, and are we now seeing an application removal problem?

Thanks

On Sat, Jun 20, 2015 at 2:06 AM, Martin Eppel (meppel) <me...@cisco.com>> wrote:
Hi Lasindu, Reka,


Just ran into the issue with removing the application again (with the fix for the issue included):

Please see [1a., 1b.] for the application structure (group scaling defined at only one group level). See also the respective artifacts and log file attached.

Please advise if we should reopen the JIRA.

Thanks

Martin


Application [1a.]

[cid:image002.png@01D0AE01.285ABD20]

[1b.] application after “starting application remove”

[cid:image003.png@01D0AE01.285ABD20]









--
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos



--
Lasindu Charith
Software Engineer, WSO2 Inc.
Committer & PMC Member, Apache Stratos
Mobile: +94714427192<tel:%2B94714427192> | Web: blog.lasindu.com<http://blog.lasindu.com>



--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007


RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by "Martin Eppel (meppel)" <me...@cisco.com>.
Btw, an additional pointer:

Another example application (below) deploys just fine; the issue seems to occur only in apps that define both group and cartridge deployment policies.


{"alias": "subscription-c1-in-group", "applicationId": "subscription-c1-in-group", "components": {"cartridges": [], "groups": [{"name": "subscription-c1-in-group", "groupMaxInstances": 1, "groupMinInstances": 1, "alias": "subscription-c1-in-group-x0x", "cartridges": [{"cartridgeMin": 1, "cartridgeMax": 1, "type": "c1", "subscribableInfo": {"alias": "c1-0x0", "deploymentPolicy": "static-1", "artifactRepository": {"repoUsername": "user", "repoUrl": "http://octl.qmog.cisco.com:10080/git/default.git", "privateRepo": true, "repoPassword": "c-policy"}, "autoscalingPolicy": "economyPolicy"}}], "groups": []}]}}

Static1.json

{
    "id": "static",
    "regions":
    [
        {
            "id": "RegionOne",
            "algorithm": "one-after-another",
            "partitions":
            [
                {
                    "id": "whole-region",
                    "max": 100
                }
            ]
        }
    ]
}




From: Reka Thirunavukkarasu [mailto:reka@wso2.com]
Sent: Tuesday, June 23, 2015 7:00 AM
To: Martin Eppel (meppel)
Cc: dev@stratos.apache.org; Lasindu Charith (lasindu@wso2.com); Ryan Du Plessis (rdupless)
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,
Since we have merged all the thread pools as discussed in [1], the default pool size is now 100 for all kinds of monitors (application, group and cluster). If you need to increase it, please add the parameter below to stratos.sh:
-Dmonitor.thread.pool.size=xxxx
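
As a rough illustration of the parameter above, the following shell sketch builds the JVM option from an optional size argument, falling back to the default of 100 mentioned in this mail. The function name monitor_pool_opt is hypothetical, and how stratos.sh actually assembles its Java command line is not shown in this thread, so the wiring is an assumption.

```shell
# Hypothetical helper (not part of stratos.sh): prints the JVM system
# property for the merged monitor thread pool.
# Usage: monitor_pool_opt [size]   (size defaults to 100, per this thread)
monitor_pool_opt() {
    size="${1:-100}"
    case "$size" in
        *[!0-9]*)
            # Reject anything that is not a plain non-negative integer.
            echo "invalid pool size: $size" >&2
            return 1
            ;;
    esac
    printf '%s%s\n' '-Dmonitor.thread.pool.size=' "$size"
}
```

For example, "monitor_pool_opt 150" prints -Dmonitor.thread.pool.size=150, which could then be added to the Java options that stratos.sh passes to the JVM.
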
We have also fixed an application instance termination issue and a Stratos restart issue with group scaling, so you can now restart Stratos even while group scaling is in progress. The latest commit is:

bb6e102986ad8e54556d9f6de47cc6eaa077e775
Do let us know how your testing goes with these fixes.
[1] Merging all the threading pools used in autoscaler to one thread pool
Thanks,
Reka

On Tue, Jun 23, 2015 at 10:49 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Thanks Reka

From: Reka Thirunavukkarasu [mailto:reka@wso2.com]
Sent: Monday, June 22, 2015 9:59 PM
To: Martin Eppel (meppel)
Cc: dev@stratos.apache.org; Lasindu Charith (lasindu@wso2.com); Ryan Du Plessis (rdupless)
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,
These are actually configurable parameters. In the Stratos code, these thread pool sizes are set to 20 by default. If we need to change them, we can pass them as system properties in stratos.sh. Since the defaults come from the Stratos code, we don't need to provide them in the standalone pack. A complex application with more groups and clusters uses more threads, so the default pool size of 20 might get exhausted. It is therefore better to customize these properties according to the application structure. When I used the application sample that you attached to this thread, I faced issues such as event listeners not being triggered properly because the thread pool was exhausted. After I increased the thread pool size to 50, I didn't see any issues.
I'm in the process of analyzing the thread usage in order to decide on a recommended pool size for a given application structure, so that anyone can calculate the pool size their application requires and configure this parameter accordingly.
I hope this helps you understand those parameters.

Thanks,
Reka

On Mon, Jun 22, 2015 at 11:50 PM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi Reka,

I am not clear on the two properties you mention below. Are they supposed to be set in stratos.sh? I just picked up the latest code from the Apache Stratos repo and don’t see them.

Btw, read.write.lock.monitor.enabled is set to false in our production code (I assume it defaults to false if not specified); I only enable it to provide additional information.

Thanks

Martin

From: Reka Thirunavukkarasu [mailto:reka@wso2.com]
Sent: Monday, June 22, 2015 7:30 AM
To: Martin Eppel (meppel)
Cc: dev@stratos.apache.org; Lasindu Charith (lasindu@wso2.com); Ryan Du Plessis (rdupless)
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,
I have verified the fix with read.write.lock.monitor.enabled=true, and it worked fine. Since we use concurrency and have delegated some of the flow to threads, I had to set the thread pool sizes to the values below in stratos.sh:

    -Dapplication.monitor.thread.pool.size=50 \
    -Dgroup.monitor.thread.pool.size=50 \
Please note that it is recommended to keep read.write.lock.monitor.enabled=false in production, as enabling it increases the footprint. This property was introduced only for testing purposes.

We are in the process of analyzing the thread pool sizes and will come up with recommended values.
Also, I have fixed a small issue in the REST endpoint, which used to return a default value whenever the application runtime was not found. Now, if the runtime is not found, the message below is returned:

{"status":"error","message":"Application runtime not found"}
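
A client script could test for this response as sketched below. The function name is_runtime_missing is hypothetical, and the match assumes the compact JSON formatting shown above (no extra whitespace). This mail does not say which endpoint returns the message, so the response body is simply passed in as an argument.

```shell
# Hypothetical helper: succeeds (exit status 0) when the given response
# body is the "Application runtime not found" error quoted in this mail.
# Assumes the compact JSON form shown above; a body with extra whitespace
# between keys and values would not match.
is_runtime_missing() {
    case "$1" in
        *'"status":"error"'*'"message":"Application runtime not found"'*)
            return 0
            ;;
        *)
            return 1
            ;;
    esac
}
```

A script polling after an undeploy request could, for instance, treat a successful is_runtime_missing check as the signal that the application runtime is gone.
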
I have also verified undeployment with group scaling and didn't find any issues with the above fixes.
The latest commit is:

0a969200d11228158606f011ca7e5e795f336d92.
Please note that the only error observed was the one below [1], which is harmless for now. I have verified that a workaround works fine, but I will check the severity and decide whether to implement a proper fix or go with the workaround.

[1]. TID: [0] [STRATOS] [2015-06-22 14:22:01,872] ERROR {org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor} -  System error, lock has not released for 30 seconds: [lock-name] topology [lock-type] Write [thread-id] 117 [thread-name] pool-24-thread-2 [stack-trace]
java.lang.Thread.getStackTrace(Thread.java:1589)
org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:123)
org.apache.stratos.messaging.message.processor.topology.updater.TopologyUpdater.acquireWriteLockForService(TopologyUpdater.java:123)
org.apache.stratos.messaging.message.processor.topology.ApplicationClustersCreatedMessageProcessor.doProcess(ApplicationClustersCreatedMessageProcessor.java:78)
org.apache.stratos.messaging.message.processor.topology.ApplicationClustersCreatedMessageProcessor.process(ApplicationClustersCreatedMessageProcessor.java:59)
org.apache.stratos.messaging.message.processor.topology.ServiceRemovedMessageProcessor.process(ServiceRemovedMessageProcessor.java:64)
org.apache.stratos.messaging.message.processor.topology.ServiceCreatedMessageProcessor.process(ServiceCreatedMessageProcessor.java:65)
org.apache.stratos.messaging.message.processor.topology.CompleteTopologyMessageProcessor.process(CompleteTopologyMessageProcessor.java:74)
org.apache.stratos.messaging.message.processor.MessageProcessorChain.process(MessageProcessorChain.java:61)
org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator.run(TopologyEventMessageDelegator.java:73)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
Thanks,
Reka



On Mon, Jun 22, 2015 at 12:24 PM, Reka Thirunavukkarasu <re...@wso2.com> wrote:
Hi Martin,
I found the reason why we didn't encounter these locking issues: we were testing with the default Stratos pack, which has read.write.lock.monitor.enabled=false. The locking warning is raised only when read.write.lock.monitor.enabled=true is used. That's why only you were facing these locking issues, as you use this configuration in your setup.
Since I'm able to reproduce the issue, I will test with the fix that I already pushed and update the thread.
We will discuss and try to make read.write.lock.monitor.enabled=true the default in Stratos, so that we can find issues early and fix them.
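
To see how often the lock monitor reports something in a given log, a small shell sketch like the one below could be used. The function name count_lock_messages is hypothetical; the two search patterns are copied from log lines quoted in this thread, and other lock-related messages may exist that it does not catch.

```shell
# Hypothetical helper: counts lines in a log file (e.g. wso2carbon.log)
# that contain one of the lock-monitor messages quoted in this thread.
count_lock_messages() {
    logfile="$1"
    [ -r "$logfile" ] || { echo "cannot read $logfile" >&2; return 1; }
    # grep -c prints the number of matching lines; it exits with status 1
    # when that number is 0, so "|| :" keeps the function status at 0.
    grep -c -e 'lock has not released for' \
            -e 'Trying to release a lock which has not been taken by the same thread' \
            "$logfile" || :
}
```

Running it against a log captured with read.write.lock.monitor.enabled=true before and after a fix gives a quick way to compare the two runs.
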

Thanks,
Reka

On Mon, Jun 22, 2015 at 12:16 AM, Reka Thirunavukkarasu <re...@wso2.com> wrote:
Sorry Martin, I had only fixed the issue locally; I have just pushed it now. Can you test with 1c21daaeea7b27ad0a0141a70b32e9443e78e309 when you get a chance? I will also continue testing with this fix.
Thanks,
Reka

On Mon, Jun 22, 2015 at 12:07 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Btw,

This is my last commit I picked up from the stratos master:

commit 58bea52be814269416f70391fef50859aa5ae0a1
Author: lasinducharith <la...@gmail.com>
Date:   Fri Jun 19 19:40:27 2015 +0530

From: Martin Eppel (meppel)
Sent: Sunday, June 21, 2015 10:28 AM
To: dev@stratos.apache.org; Reka Thirunavukkarasu
Cc: Lasindu Charith (lasindu@wso2.com); Ryan Du Plessis (rdupless)
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Reka,

Here is another example which fails; see the application at [1.] and the attached log files and JSON files. I ran a few scenarios; the one that fails is the application named “s-g-c1-c2-c3” (last scenario). All members get removed but the application remains deployed:

“s-g-c1-c2-c3: applicationInstances 0, groupInstances 0, clusterInstances 0, members 0 ()”


Thanks


Martin




[cid:image001.png@01D0AE00.C372D460]




From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: Sunday, June 21, 2015 1:32 AM
To: Reka Thirunavukkarasu
Cc: dev; Lasindu Charith (lasindu@wso2.com); Ryan Du Plessis (rdupless)
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Great! Thanks Reka!

On Sun, Jun 21, 2015 at 8:34 AM, Reka Thirunavukkarasu <re...@wso2.com> wrote:
Hi Martin/Imesh,
Sure, I will have a look at the logs. I will also go through the recent commits and try to revert the fix that was added for nested group scaling, as it is not needed for this release. As Martin mentioned, there are more issues after the fixes; otherwise, we will have to go through another major testing effort.
I will post updates on the progress.

Thanks,
Reka

On Sun, Jun 21, 2015 at 8:14 AM, Imesh Gunaratne <im...@apache.org> wrote:
Hi Martin,

Thanks for the quick response. Yes we will definitely go through the logs and investigate this.

Thanks

On Sun, Jun 21, 2015 at 8:09 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi Isuru,

No, the issue does not seem to be resolved. With the latest code I see issues in test cases that used to work before (beyond the latest example I posted the log files for, see below); I am not sure yet what is going on. I will investigate further (to make sure I am not mistaken) and follow up with some examples after the weekend, but it would be great if you could take a look on Monday at the log files I provided with the previous email.

Thanks

Martin

From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: Saturday, June 20, 2015 7:29 PM
To: dev
Cc: Lasindu Charith (lasindu@wso2.com); Reka Thirunavukkarasu (reka@wso2.com); Ryan Du Plessis (rdupless)
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi All,

I'm sorry I could not follow the entire discussion.
Can someone explain the latest status please? Have we resolved the initial group scaling issue, and are we now seeing an application removal problem?

Thanks

On Sat, Jun 20, 2015 at 2:06 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi Lasindu, Reka,


I just ran into the issue with removing the application again (with the fix for the issue included):

Please see [1a., 1b.] for the application structure (group scaling defined at only one group level). See also the respective artifacts and log file attached.

Please advise if we should reopen the JIRA

Thanks

Martin


Application [1a.]

[cid:image002.png@01D0AE00.C372D460]

[1b.] application after “starting application remove”

[cid:image003.png@01D0AE00.C372D460]









--
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos



--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007




--
Lasindu Charith
Software Engineer, WSO2 Inc.
Committer & PMC Member, Apache Stratos
Mobile: +94714427192 | Web: blog.lasindu.com



--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007


Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by Reka Thirunavukkarasu <re...@wso2.com>.
Hi Martin,

Before deploying an application, you have to add an application policy and
then use that policy to deploy the application. A sample application policy
and a screenshot of the UI are attached. The sample commands below add the
application policy, add the application and deploy the application. The
application policy has existed since we added support for group-level
deployment policies. In some of your earlier logs I could see that you used
an application policy with the id "default-iaas". Can you check your
deployment and share all the artifacts that you are using, so that we can
debug this further? Alternatively, you can browse the UI
(home-->configure-->application-policies section) to find out which
application policy you are using.

echo "Adding application policy..."
curl -X POST -H "Content-Type: application/json" \
    -d "@application-policy-1.json" -k -v -u admin:admin \
    https://127.0.0.1:9443/api/applicationPolicies


echo "Creating application..."
curl -X POST -H "Content-Type: application/json" \
    -d "@application-s-g-c1-c2-c3-s.json" -k -v -u admin:admin \
    https://127.0.0.1:9443/api/applications

sleep 1

echo "Deploying application..."
curl -X POST -H "Content-Type: application/json" -k -v -u admin:admin \
    https://127.0.0.1:9443/api/applications/g-sc-G12-1/deploy/application-policy-1

Here, application-policy-1 is the application policy name.
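
The deploy step could also be wrapped in a small function so that the application id and the application policy id are passed explicitly. This is only a sketch: the names deploy_url and deploy_application are hypothetical, the base URL and admin:admin credentials are copied from the sample commands, and whether Stratos answers a failed policy validation with an HTTP error status is an assumption not confirmed in this thread.

```shell
# Hypothetical helpers wrapping the deploy call from the sample commands.
# BASE_URL defaults to the address used in the samples.
BASE_URL="${BASE_URL:-https://127.0.0.1:9443}"

# Prints the deploy URL for an application id and an application policy id.
deploy_url() {
    printf '%s/api/applications/%s/deploy/%s\n' "$BASE_URL" "$1" "$2"
}

# Sends the deploy request. With -f, curl exits non-zero when the server
# answers with an HTTP status of 400 or above, so a calling script can stop.
deploy_application() {
    [ "$#" -eq 2 ] || {
        echo "usage: deploy_application APP_ID POLICY_ID" >&2
        return 2
    }
    curl -k -f -sS -X POST -H "Content-Type: application/json" \
        -u admin:admin "$(deploy_url "$1" "$2")"
}
```

With these definitions, "deploy_application g-sc-G12-1 application-policy-1" issues the same request as the last sample command.
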

On Wed, Jun 24, 2015 at 9:09 AM, Martin Eppel (meppel) <me...@cisco.com>
wrote:

>  One more comment,
>
>
>
> I have used a very similar application with the same deployment policies
> just a few days ago without any issues before I picked up the latest
> stratos commits, something changed (for once added validation of multiple
> group deployment polices in nested scenarios as it seems). Again, either
> validation is broken or configuration requirements have changed, but how
> can I fix it ?
>
>
>
> To point it out, RegionOne is defined as network partition in the
> deployment policies but is  listed in the exception (see below) as not being
> used ?
>
>
>
>
>
> Thanks
>
>
>
> Martin
>
>
>
> Reposting snippet of exception:
>
>
>
> TID: [0] [STRATOS] [2015-06-23 20:48:09,441] ERROR
> {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -
> Application deployment failed: [application-id]g-sc-G12-1
>
> org.apache.stratos.autoscaler.exception.application.InvalidApplicationPolicyException:
> Invalid Application Policy: *Network partition [network-partition-id]
> RegionOne is not used in application* [application-id] g-sc-G12-1. Hence
> application bursting will fail. Either remove RegionOne from application
> policy or make all the cartridges available in RegionOne
>
>
>
>
>
> *From:* Martin Eppel (meppel)
> *Sent:* Tuesday, June 23, 2015 8:20 PM
> *To:* Lasindu Charith
> *Cc:* Reka Thirunavukkarasu; dev@stratos.apache.org; Ryan Du Plessis
> (rdupless)
> *Subject:* RE: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> Hi Lasindu,
>
>
>
> I am confused, which application policy – beyond the
> deployment-policy-<1,2,3> there is no other deployment policy defined ? The
> network partitions are defined in the zip file which was attached to my
> email ? Did you take a look at the zip file ?
>
>
>
> Thanks
>
>
>
> Martin
>
>
>
> e.g. deployment-policy-1.json
>
>
>
> {
>
>     "id": "deployment-policy-1",
>
>     "networkPartitions":
>
>     [
>
>         {
>
>             "id": "RegionOne",
>
>             "partitionAlgo": "one-after-another",
>
>             "partitions":
>
>             [
>
>                 {
>
>                     "id": "whole-region",
>
>                     "partitionMax": 5
>
>                 }
>
>             ]
>
>         }
>
>     ]
>
> }
>
>
>
> *From:* Lasindu Charith [mailto:lasindu@wso2.com]
> *Sent:* Tuesday, June 23, 2015 7:29 PM
> *To:* Martin Eppel (meppel)
> *Cc:* Reka Thirunavukkarasu; dev@stratos.apache.org; Ryan Du Plessis
> (rdupless)
> *Subject:* Re: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> Hi Martin,
>
>
>
> Could you please share the application policy json as well as network
> partition jsons you have used?
>
>
>
> Thanks,
>
>
>
> On Wed, Jun 24, 2015 at 2:44 AM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> Hi Reka,
>
>
>
> I picked up the latest code but run into an issue with the deployment
> policy (I also noticed that validation seem to enforce that a deployment
> policy can only be configured once in the application at the group level,
> enabling group scaling ?!):
>
>
>
> In the scenario I define a group level deployment policy at the bottom
> level group ("g-sc-G3-1") and also define cartridge deployment policies for
> all other cartridges (c1, c3) in the parent groups, each defining a network
> partition “RegionOne”. However, deploying the app causes the following
> exception below, not sure I missed a configuration or if there is an issue
> in the deployment policy validation ? This used to work before I picked up
> the latest changes. My current latest commit id is:
>
>
>
> commit bb6e102986ad8e54556d9f6de47cc6eaa077e775
>
> Author: reka <rt...@gmail.com>
>
> Date:   Tue Jun 23 19:22:04 2015 +0530
>
>
>
> Application json and deployment policies are attached to the email,
>
>
>
> Thanks
>
>
>
> Martin
>
>
>
> TID: [0] [STRATOS] [2015-06-23 20:48:09,428] ERROR
> {org.apache.stratos.autoscaler.util.AutoscalerUtil} -  Invalid Application
> Policy: Network partition [network-partition-id] RegionOne is not used in
> application [application-id] g-sc-G12-1. Hence application bursting will
> fail. Either remove RegionOne from application policy or make all the
> cartridges available in RegionOne
>
> TID: [0] [STRATOS] [2015-06-23 20:48:09,441] DEBUG
> {org.apache.stratos.autoscaler.registry.RegistryManager} -  Application
> context [g-sc-G12-1] persisted successfully in the autoscaler registry
>
> TID: [0] [STRATOS] [2015-06-23 20:48:09,441] ERROR
> {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -
> Application deployment failed: [application-id]g-sc-G12-1
>
> org.apache.stratos.autoscaler.exception.application.InvalidApplicationPolicyException:
> Invalid Application Policy: *Network partition [network-partition-id]
> RegionOne is not used in application* [application-id] g-sc-G12-1. Hence
> application bursting will fail. Either remove RegionOne from application
> policy or make all the cartridges available in RegionOne
>
>         at
> org.apache.stratos.autoscaler.util.AutoscalerUtil.validateApplicationPolicyAgainstApplication(AutoscalerUtil.java:746)
>
>         at
> org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl.deployApplication(AutoscalerServiceImpl.java:279)
>
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>
>         at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>
>         at java.lang.reflect.Method.invoke(Method.java:606)
>
>         at
> org.apache.axis2.rpc.receivers.RPCUtil.invokeServiceClass(RPCUtil.java:212)
>
>         at
> org.apache.axis2.rpc.receivers.RPCMessageReceiver.invokeBusinessLogic(RPCMessageReceiver.java:117)
>
>         at
> org.apache.axis2.receivers.AbstractInOutMessageReceiver.invokeBusinessLogic(AbstractInOutMessageReceiver.java:40)
>
>         at
> org.apache.axis2.receivers.AbstractMessageReceiver.receive(AbstractMessageReceiver.java:110)
>
>         at org.apache.axis2.engine.AxisEngine.receive(AxisEngine.java:180)
>
>         at
> org.apache.axis2.transport.http.HTTPTransportUtils.processHTTPPostRequest(HTTPTransportUtils.java:172)
>
>         at
> org.apache.axis2.transport.http.AxisServlet.doPost(AxisServlet.java:146)
>
>         at
> org.wso2.carbon.core.transports.CarbonServlet.doPost(CarbonServlet.java:231)
>
>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:755)
>
>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
>
>         at
> org.eclipse.equinox.http.servlet.internal.ServletRegistration.service(ServletRegistration.java:61)
>
>         at
> org.eclipse.equinox.http.servlet.internal.ProxyServlet.processAlias(ProxyServlet.java:128)
>
>         at
> org.eclipse.equinox.http.servlet.internal.ProxyServlet.service(ProxyServlet.java:68)
>
>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
>
> *From:* Reka Thirunavukkarasu [mailto:reka@wso2.com]
> *Sent:* Tuesday, June 23, 2015 7:00 AM
>
>
> *To:* Martin Eppel (meppel)
> *Cc:* dev@stratos.apache.org; Lasindu Charith (lasindu@wso2.com); Ryan Du
> Plessis (rdupless)
> *Subject:* Re: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> Hi Martin,
>
> As we have merged all the thread pools according to the discussion in [1],
> the default pool size is taken as 100 for all kind of monitors(application,
> group and cluster). If you need to increase this, then please add below
> parameter to the stratos.sh
>
> -Dmonitor.thread.pool.size=xxxx
>
> Also, we have fixed an application instance termination and stratos
> restart issue with group scaling as well. So that now you can restart
> stratos even when the group scaling happens. The latest commit is:
>
> bb6e102986ad8e54556d9f6de47cc6eaa077e775
>
> Do let us know how your testing goes with these fixes.
> [1] Merging all the threading pools used in autoscaler to one thread pool
>
> Thanks,
>
> Reka
>
>
>
> On Tue, Jun 23, 2015 at 10:49 AM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> Thanks Reka
>
>
>
> *From:* Reka Thirunavukkarasu [mailto:reka@wso2.com]
> *Sent:* Monday, June 22, 2015 9:59 PM
>
>
> *To:* Martin Eppel (meppel)
> *Cc:* dev@stratos.apache.org; Lasindu Charith (lasindu@wso2.com); Ryan Du
> Plessis (rdupless)
> *Subject:* Re: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> Hi Martin,
>
> These are actually configurable parameters. In the stratos code, these
> thread pool sizes are set to 20 by default. If we need to change it, then
> we can pass those as system properties in our stratos.sh. Since default
> values are taken by stratos code, we don't need to provide this in the
> standalone pack. When there is a complex application with more groups and
> clusters, there will be more use of threads. In that case, the default pool
> size of 20 might get exhausted. So, it would be better to have this
> properties customized according to the application structure. I faced some
> issues like events listeners didn't get triggered properly due to thread
> pool got exhausted with threads when i used the application sample that you
> have attached to this thread. After i increase the thread pool size to 50,
> i didn't get any issues.
>
> I'm in the process of analyzing the thread usage in order to decide on the
> recommended pool size along with application structure. So that anyone can
> calculate the correct pool size that they require according to the
> application and configure this parameter.
>
> Hope this will help you to understand on those parameters.
>
>
>
> Thanks,
>
> Reka
>
>
>
> On Mon, Jun 22, 2015 at 11:50 PM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> Hi Reka,
>
>
>
> I am not clear on the 2 properties you mention below, are they supposed to
> be set in the stratos.sh ? I just picked up the latest code and from the
> apache stratos repo and don’t see them ?
>
>
>
> Btw,  *read.write.lock.monitor.enabled=false * is disabled in our
> production code (I assume it is set to false by default if not specified) ,
> I only enable it to provide additional information
>
>
>
> Thanks
>
>
>
> Martin
>
>
>
> *From:* Reka Thirunavukkarasu [mailto:reka@wso2.com]
> *Sent:* Monday, June 22, 2015 7:30 AM
> *To:* Martin Eppel (meppel)
> *Cc:* dev@stratos.apache.org; Lasindu Charith (lasindu@wso2.com); Ryan Du
> Plessis (rdupless)
>
>
> *Subject:* Re: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> Hi Martin,
>
> I have verified the fix by enabling read.write.lock.monitor.enabled=true.
> The fix worked fine with it. Since we are using concurrency and delegated
> some flow to Threads, i had to provide the thread values to below values in
> the stratos.sh.
>
>     -Dapplication.monitor.thread.pool.size=50 \
>     -Dgroup.monitor.thread.pool.size=50 \
>
> Please note that *it is recommended to have
> read.write.lock.monitor.enabled=false as it will consume more footprint in
> the production*. This property introduce only for the testing purpose.
>
>
>
> We are in the process of analyzing the thread size and will come up with a
> recommended values for it.
>
> Also, i have fixed a small issue in the REST endpoint as it returns some
> default value whenever application run time is not found. Now that if
> runtime is not found, the below message will get populated.
>
> {"status":"error","message":"Application runtime not found"}
>
> I have also verified the undeployment with group scaling. Didn't find any
> issues with the above fixes.
>
> Please find the latest commit as below:
>
> 0a969200d11228158606f011ca7e5e795f336d92.
>
> Please note that below error was only observed which is harmless for now.
> I have verified it with a workaround and working fine. But will check on
> the severity and decide on a proper fix or will go with the workaround.
>
> [1]. TID: [0] [STRATOS] [2015-06-22 14:22:01,872] ERROR
> {org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor} -  System
> error, lock has not released for 30 seconds: [lock-name] topology
> [lock-type] Write [thread-id] 117 [thread-name] pool-24-thread-2
> [stack-trace]
> java.lang.Thread.getStackTrace(Thread.java:1589)
>
> org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:123)
>
> org.apache.stratos.messaging.message.processor.topology.updater.TopologyUpdater.acquireWriteLockForService(TopologyUpdater.java:123)
>
> org.apache.stratos.messaging.message.processor.topology.ApplicationClustersCreatedMessageProcessor.doProcess(ApplicationClustersCreatedMessageProcessor.java:78)
>
> org.apache.stratos.messaging.message.processor.topology.ApplicationClustersCreatedMessageProcessor.process(ApplicationClustersCreatedMessageProcessor.java:59)
>
> org.apache.stratos.messaging.message.processor.topology.ServiceRemovedMessageProcessor.process(ServiceRemovedMessageProcessor.java:64)
>
> org.apache.stratos.messaging.message.processor.topology.ServiceCreatedMessageProcessor.process(ServiceCreatedMessageProcessor.java:65)
>
> org.apache.stratos.messaging.message.processor.topology.CompleteTopologyMessageProcessor.process(CompleteTopologyMessageProcessor.java:74)
>
> org.apache.stratos.messaging.message.processor.MessageProcessorChain.process(MessageProcessorChain.java:61)
>
> org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator.run(TopologyEventMessageDelegator.java:73)
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> java.lang.Thread.run(Thread.java:745)
>
> Thanks,
>
> Reka
>
>
>
>
>
>
>
> On Mon, Jun 22, 2015 at 12:24 PM, Reka Thirunavukkarasu <re...@wso2.com>
> wrote:
>
> Hi Martin,
>
> Found the reason why we didn't encounter these locking issue as we were
> testing with default stratos pack which has
> read.write.lock.monitor.enabled=false. The locking warning or issue is
> raised only when you use read.write.lock.monitor.enabled=true. That's why
> you were only facing these locking issue as you use this configuration in
> your setup.
>
> Since I'm able to reproduce the issue, i will test with the fix that i
> already pushed and update the thread.
>
> We will discuss and try to make this read.write.lock.monitor.enabled=true
> by default with stratos. So that we can find issues as early and fix them.
>
>
>
> Thanks,
>
> Reka
>
>
>
> On Mon, Jun 22, 2015 at 12:16 AM, Reka Thirunavukkarasu <re...@wso2.com>
> wrote:
>
> Sorry Martin..I have only locally fixed the issue. I have pushed it now
> only. Can you test with 1c21daaeea7b27ad0a0141a70b32e9443e78e309 when you
> get chance? I will also continue testing with this fix.
>
> Thanks,
>
> Reka
>
>
>
> On Mon, Jun 22, 2015 at 12:07 AM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> Btw,
>
>
>
> This is my last commit I picked up from the stratos master:
>
>
>
> commit 58bea52be814269416f70391fef50859aa5ae0a1
>
> Author: lasinducharith <la...@gmail.com>
>
> Date:   Fri Jun 19 19:40:27 2015 +0530
>
>
>
> *From:* Martin Eppel (meppel)
> *Sent:* Sunday, June 21, 2015 10:28 AM
> *To:* dev@stratos.apache.org; Reka Thirunavukkarasu
> *Cc:* Lasindu Charith (lasindu@wso2.com); Ryan Du Plessis (rdupless)
> *Subject:* RE: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> Hi Reka,
>
>
>
> Here is *another* example which fails, see application at [1.], attached
> log files and jsons.  I ran a few scenarios; the one which is failing is the
> application with the name “s-g-c1-c2-c3” (last scenario). All members get
> removed but the application remains deployed:
>
> “s-g-c1-c2-c3: applicationInstances 0, groupInstances 0, clusterInstances
> 0, members 0 ()”
>
>
>
>
>
> Thanks
>
>
>
>
>
> Martin
>
> *From:* Imesh Gunaratne [mailto:imesh@apache.org <im...@apache.org>]
> *Sent:* Sunday, June 21, 2015 1:32 AM
> *To:* Reka Thirunavukkarasu
> *Cc:* dev; Lasindu Charith (lasindu@wso2.com); Ryan Du Plessis (rdupless)
> *Subject:* Re: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> Great! Thanks Reka!
>
>
>
> On Sun, Jun 21, 2015 at 8:34 AM, Reka Thirunavukkarasu <re...@wso2.com>
> wrote:
>
> Hi Martin/Imesh,
>
> Sure, I will have a look at the logs. I will also go through the recent
> commits and try to revert the fix which was added for nested group scaling,
> as it is not needed for this release and, as Martin mentioned, there are
> more issues after the fixes. Otherwise, we will have to go through another
> major effort in testing it.
>
> I will update the progress of it...
>
>
>
> Thanks,
>
> Reka
>
>
>
> On Sun, Jun 21, 2015 at 8:14 AM, Imesh Gunaratne <im...@apache.org> wrote:
>
> Hi Martin,
>
>
>
> Thanks for the quick response. Yes we will definitely go through the logs
> and investigate this.
>
>
>
> Thanks
>
>
>
> On Sun, Jun 21, 2015 at 8:09 AM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> Hi Isuru,
>
>
>
> No, the issue does not seem to be resolved. With the latest code I see
> issues in test cases which used to work before (beyond the latest example
> for which I posted the log files, see below); I am not sure yet what is
> going on. I will investigate further (to make sure I am not mistaken) and
> follow up with some examples after the weekend, but if you guys can take a
> look on Monday at the log files I provided with the previous email, that
> would be great.
>
>
>
> Thanks
>
>
>
> Martin
>
>
>
> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
> *Sent:* Saturday, June 20, 2015 7:29 PM
> *To:* dev
> *Cc:* Lasindu Charith (lasindu@wso2.com); Reka Thirunavukkarasu (
> reka@wso2.com); Ryan Du Plessis (rdupless)
> *Subject:* Re: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> Hi All,
>
>
>
> I'm sorry I could not follow the entire discussion.
>
> Can someone explain the latest status please? Have we resolved the initial
> group scaling issue, and are we now seeing an application removal problem?
>
>
>
> Thanks
>
>
>
> On Sat, Jun 20, 2015 at 2:06 AM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> Hi Lasindu, Reka,
>
>
>
>
>
> I just ran into the issue with removing the application *again* (with the
> fix for the issue included):
>
>
>
> Please see [1a., 1b.] for the application structure (group scaling defined
> at only one group level). See also the respective artifacts and log file
> attached.
>
>
>
> Please advise if we should reopen the JIRA
>
>
>
> Thanks
>
>
>
> Martin
>
>
>
>
>
> Application [1a.]
>
>
>
>
>
> [1b.] application after “starting application remove”
>
> --
>
> Imesh Gunaratne
>
>
>
> Senior Technical Lead, WSO2
>
> Committer & PMC Member, Apache Stratos
>
>
>
>
>
> --
>
> Imesh Gunaratne
>
>
>
> Senior Technical Lead, WSO2
>
> Committer & PMC Member, Apache Stratos
>
>
>
>   --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>
>
>
>
>
> --
>
> Imesh Gunaratne
>
>
>
> Senior Technical Lead, WSO2
>
> Committer & PMC Member, Apache Stratos
>
>
>
>
> --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
> --
>
> *Lasindu Charith*
>
> Software Engineer, WSO2 Inc.
>
> Committer & PMC Member, Apache Stratos
>
> Mobile: +94714427192 | Web: blog.lasindu.com
>



-- 
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007

RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by "Martin Eppel (meppel)" <me...@cisco.com>.
One more comment,

I used a very similar application with the same deployment policies just a few days ago without any issues, before I picked up the latest Stratos commits, so something has changed (for one, validation of multiple group deployment policies in nested scenarios seems to have been added). Again, either the validation is broken or the configuration requirements have changed, but how can I fix it?

To point it out: RegionOne is defined as a network partition in the deployment policies, yet the exception (see below) lists it as not being used.


Thanks

Martin

Reposting snippet of exception:

TID: [0] [STRATOS] [2015-06-23 20:48:09,441] ERROR {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Application deployment failed: [application-id]g-sc-G12-1
org.apache.stratos.autoscaler.exception.application.InvalidApplicationPolicyException: Invalid Application Policy: Network partition [network-partition-id] RegionOne is not used in application [application-id] g-sc-G12-1. Hence application bursting will fail. Either remove RegionOne from application policy or make all the cartridges available in RegionOne
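
The wording of the exception suggests a check that compares the network partitions named in the application policy with the network partitions referenced by the deployment policies used in the application. A minimal sketch of such a check follows, under stated assumptions: the class and method names are hypothetical, plain collections stand in for the Stratos policy objects, and this is not the code behind AutoscalerUtil.validateApplicationPolicyAgainstApplication.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch, not Stratos code: plain collections stand in for the
// application policy and the deployment policies referenced by an application.
final class ApplicationPolicyCheckSketch {

    // Returns the application-policy network partitions that no deployment
    // policy in the application references (sorted for stable output).
    static Set<String> unusedNetworkPartitions(
            Set<String> applicationPolicyPartitions,
            Map<String, Set<String>> partitionsByDeploymentPolicy) {
        Set<String> used = new HashSet<>();
        for (Set<String> partitions : partitionsByDeploymentPolicy.values()) {
            used.addAll(partitions);
        }
        Set<String> unused = new TreeSet<>(applicationPolicyPartitions);
        unused.removeAll(used);
        return unused;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> byPolicy = new HashMap<>();
        byPolicy.put("deployment-policy-1", Collections.singleton("RegionOne"));
        // RegionOne is referenced by deployment-policy-1, so it is not reported.
        System.out.println(unusedNetworkPartitions(
                Collections.singleton("RegionOne"), byPolicy));
    }
}
```

Under these assumptions, a deployment policy that lists RegionOne would not be reported as unused, which is why the exception looks unexpected if the deployment policies are being picked up at all.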


From: Martin Eppel (meppel)
Sent: Tuesday, June 23, 2015 8:20 PM
To: Lasindu Charith
Cc: Reka Thirunavukkarasu; dev@stratos.apache.org; Ryan Du Plessis (rdupless)
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Lasindu,

I am confused: which application policy? Beyond deployment-policy-<1,2,3> there is no other deployment policy defined. The network partitions are defined in the zip file which was attached to my email. Did you take a look at the zip file?

Thanks

Martin

e.g. deployment-policy-1.json

{
    "id": "deployment-policy-1",
    "networkPartitions":
    [
        {
            "id": "RegionOne",
            "partitionAlgo": "one-after-another",
            "partitions":
            [
                {
                    "id": "whole-region",
                    "partitionMax": 5
                }
            ]
        }
    ]
}

From: Lasindu Charith [mailto:lasindu@wso2.com]
Sent: Tuesday, June 23, 2015 7:29 PM
To: Martin Eppel (meppel)
Cc: Reka Thirunavukkarasu; dev@stratos.apache.org; Ryan Du Plessis (rdupless)
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,

Could you please share the application policy json as well as network partition jsons you have used?

Thanks,

On Wed, Jun 24, 2015 at 2:44 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi Reka,

I picked up the latest code but ran into an issue with the deployment policy (I also noticed that validation seems to enforce that a deployment policy can only be configured once in the application at the group level, enabling group scaling?!):

In this scenario I define a group-level deployment policy at the bottom-level group ("g-sc-G3-1") and also define cartridge deployment policies for all other cartridges (c1, c3) in the parent groups, each defining a network partition “RegionOne”. However, deploying the app causes the exception below; I am not sure whether I missed a configuration or whether there is an issue in the deployment policy validation. This used to work before I picked up the latest changes. My current latest commit id is:

commit bb6e102986ad8e54556d9f6de47cc6eaa077e775
Author: reka <rt...@gmail.com>
Date:   Tue Jun 23 19:22:04 2015 +0530

Application json and deployment policies are attached to the email,

Thanks

Martin

TID: [0] [STRATOS] [2015-06-23 20:48:09,428] ERROR {org.apache.stratos.autoscaler.util.AutoscalerUtil} -  Invalid Application Policy: Network partition [network-partition-id] RegionOne is not used in application [application-id] g-sc-G12-1. Hence application bursting will fail. Either remove RegionOne from application policy or make all the cartridges available in RegionOne
TID: [0] [STRATOS] [2015-06-23 20:48:09,441] DEBUG {org.apache.stratos.autoscaler.registry.RegistryManager} -  Application context [g-sc-G12-1] persisted successfully in the autoscaler registry
TID: [0] [STRATOS] [2015-06-23 20:48:09,441] ERROR {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Application deployment failed: [application-id]g-sc-G12-1
org.apache.stratos.autoscaler.exception.application.InvalidApplicationPolicyException: Invalid Application Policy: Network partition [network-partition-id] RegionOne is not used in application [application-id] g-sc-G12-1. Hence application bursting will fail. Either remove RegionOne from application policy or make all the cartridges available in RegionOne
        at org.apache.stratos.autoscaler.util.AutoscalerUtil.validateApplicationPolicyAgainstApplication(AutoscalerUtil.java:746)
        at org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl.deployApplication(AutoscalerServiceImpl.java:279)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.axis2.rpc.receivers.RPCUtil.invokeServiceClass(RPCUtil.java:212)
        at org.apache.axis2.rpc.receivers.RPCMessageReceiver.invokeBusinessLogic(RPCMessageReceiver.java:117)
        at org.apache.axis2.receivers.AbstractInOutMessageReceiver.invokeBusinessLogic(AbstractInOutMessageReceiver.java:40)
        at org.apache.axis2.receivers.AbstractMessageReceiver.receive(AbstractMessageReceiver.java:110)
        at org.apache.axis2.engine.AxisEngine.receive(AxisEngine.java:180)
        at org.apache.axis2.transport.http.HTTPTransportUtils.processHTTPPostRequest(HTTPTransportUtils.java:172)
        at org.apache.axis2.transport.http.AxisServlet.doPost(AxisServlet.java:146)
        at org.wso2.carbon.core.transports.CarbonServlet.doPost(CarbonServlet.java:231)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:755)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
        at org.eclipse.equinox.http.servlet.internal.ServletRegistration.service(ServletRegistration.java:61)
        at org.eclipse.equinox.http.servlet.internal.ProxyServlet.processAlias(ProxyServlet.java:128)
        at org.eclipse.equinox.http.servlet.internal.ProxyServlet.service(ProxyServlet.java:68)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
From: Reka Thirunavukkarasu [mailto:reka@wso2.com]
Sent: Tuesday, June 23, 2015 7:00 AM
To: Martin Eppel (meppel)
Cc: dev@stratos.apache.org; Lasindu Charith (lasindu@wso2.com); Ryan Du Plessis (rdupless)
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,
As we have merged all the thread pools according to the discussion in [1], the default pool size is now 100 for all kinds of monitors (application, group and cluster). If you need to increase this, please add the parameter below to stratos.sh:
-Dmonitor.thread.pool.size=xxxx
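
For illustration, a JVM system property like the one above is typically read with a default on the Java side. The following is a minimal sketch only: the class name is hypothetical, the fallback for non-positive values is an assumption, and only the property name and the default of 100 come from this thread.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch, not Stratos code.
final class MonitorPoolSizeSketch {
    // Property name and default value as quoted in this thread.
    static final String POOL_SIZE_KEY = "monitor.thread.pool.size";
    static final int DEFAULT_POOL_SIZE = 100;

    // Reads -Dmonitor.thread.pool.size; falls back to the default when the
    // property is missing, not a number, or not positive (the last rule is
    // an assumption of this sketch).
    static int resolvePoolSize() {
        int size = Integer.getInteger(POOL_SIZE_KEY, DEFAULT_POOL_SIZE);
        return size > 0 ? size : DEFAULT_POOL_SIZE;
    }

    public static void main(String[] args) {
        int size = resolvePoolSize();
        // One shared pool for application, group and cluster monitors.
        ExecutorService monitorPool = Executors.newFixedThreadPool(size);
        System.out.println("monitor pool size: " + size);
        monitorPool.shutdown();
    }
}
```

Running it with java -Dmonitor.thread.pool.size=150 MonitorPoolSizeSketch would pick up 150 instead of the default.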
Also, we have fixed an application instance termination issue and a Stratos restart issue with group scaling, so you can now restart Stratos even while group scaling is happening. The latest commit is:

bb6e102986ad8e54556d9f6de47cc6eaa077e775
Do let us know how your testing goes with these fixes.
[1] Merging all the threading pools used in autoscaler to one thread pool
Thanks,
Reka

On Tue, Jun 23, 2015 at 10:49 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Thanks Reka

From: Reka Thirunavukkarasu [mailto:reka@wso2.com]
Sent: Monday, June 22, 2015 9:59 PM
To: Martin Eppel (meppel)
Cc: dev@stratos.apache.org; Lasindu Charith (lasindu@wso2.com); Ryan Du Plessis (rdupless)
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,
These are actually configurable parameters. In the Stratos code, these thread pool sizes are set to 20 by default. If we need to change them, we can pass them as system properties in stratos.sh. Since the default values are taken from the Stratos code, we don't need to provide them in the standalone pack. A complex application with more groups and clusters uses more threads, so the default pool size of 20 might get exhausted. It is therefore better to customize these properties according to the application structure. With the application sample that you attached to this thread, I faced issues such as event listeners not being triggered properly because the thread pool was exhausted. After I increased the thread pool size to 50, I didn't get any issues.
I'm in the process of analyzing the thread usage in order to recommend a pool size based on the application structure, so that anyone can calculate the pool size their application requires and configure this parameter accordingly.
Hope this will help you to understand on those parameters.
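
The exhaustion effect described above can be reproduced in isolation. The sketch below is illustrative only and is not Stratos code: the class and method names are hypothetical, long-running monitor work is simulated by tasks that wait on a latch, and the "event listener" is a trivial task queued behind them.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: shows how a fixed-size pool delays later tasks once
// every worker thread is occupied.
final class PoolExhaustionSketch {

    // Submits blockedMonitors tasks that wait on a latch, then one short
    // "listener" task. Returns true if the listener finished within waitMillis.
    static boolean listenerRunsInTime(int poolSize, int blockedMonitors, long waitMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        final CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < blockedMonitors; i++) {
                pool.submit(() -> {
                    release.await();   // occupies one worker until released
                    return null;
                });
            }
            Future<String> listener = pool.submit(() -> "event handled");
            try {
                listener.get(waitMillis, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                return false;          // listener is still queued behind the blockers
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException(e);
            }
        } finally {
            release.countDown();       // let the blocked tasks finish
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Pool fully occupied: the listener has to wait.
        System.out.println(listenerRunsInTime(2, 2, 200));
        // One spare worker: the listener runs right away.
        System.out.println(listenerRunsInTime(3, 2, 2000));
    }
}
```

The point of the sketch is only that the required pool size grows with the number of concurrently busy monitors; it does not claim anything about how many threads a given Stratos application actually needs.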

Thanks,
Reka

On Mon, Jun 22, 2015 at 11:50 PM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi Reka,

I am not clear on the 2 properties you mention below: are they supposed to be set in stratos.sh? I just picked up the latest code from the Apache Stratos repo and don’t see them.

Btw, the read/write lock monitor is disabled in our production code (read.write.lock.monitor.enabled=false; I assume it defaults to false if not specified). I only enable it to provide additional information.

Thanks

Martin

From: Reka Thirunavukkarasu [mailto:reka@wso2.com]
Sent: Monday, June 22, 2015 7:30 AM
To: Martin Eppel (meppel)
Cc: dev@stratos.apache.org; Lasindu Charith (lasindu@wso2.com); Ryan Du Plessis (rdupless)
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,
I have verified the fix with read.write.lock.monitor.enabled=true, and it worked fine. Since we use concurrency and have delegated some flows to threads, I had to set the thread pool sizes to the values below in stratos.sh:

    -Dapplication.monitor.thread.pool.size=50 \
    -Dgroup.monitor.thread.pool.size=50 \
Please note that it is recommended to keep read.write.lock.monitor.enabled=false in production, as enabling it increases the footprint. This property was introduced only for testing purposes.

We are in the process of analyzing the thread pool sizes and will come up with recommended values.
Also, I have fixed a small issue in the REST endpoint, which returned a default value whenever the application runtime was not found. Now, if the runtime is not found, the message below is returned:

{"status":"error","message":"Application runtime not found"}
I have also verified the undeployment with group scaling and didn't find any issues with the above fixes.
Please find the latest commit below:

0a969200d11228158606f011ca7e5e795f336d92.
Please note that only the error below [1] was observed, which is harmless for now. I have verified a workaround and it is working fine, but I will check on the severity and decide whether to implement a proper fix or go with the workaround.

[1]. TID: [0] [STRATOS] [2015-06-22 14:22:01,872] ERROR {org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor} -  System error, lock has not released for 30 seconds: [lock-name] topology [lock-type] Write [thread-id] 117 [thread-name] pool-24-thread-2 [stack-trace]
java.lang.Thread.getStackTrace(Thread.java:1589)
org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:123)
org.apache.stratos.messaging.message.processor.topology.updater.TopologyUpdater.acquireWriteLockForService(TopologyUpdater.java:123)
org.apache.stratos.messaging.message.processor.topology.ApplicationClustersCreatedMessageProcessor.doProcess(ApplicationClustersCreatedMessageProcessor.java:78)
org.apache.stratos.messaging.message.processor.topology.ApplicationClustersCreatedMessageProcessor.process(ApplicationClustersCreatedMessageProcessor.java:59)
org.apache.stratos.messaging.message.processor.topology.ServiceRemovedMessageProcessor.process(ServiceRemovedMessageProcessor.java:64)
org.apache.stratos.messaging.message.processor.topology.ServiceCreatedMessageProcessor.process(ServiceCreatedMessageProcessor.java:65)
org.apache.stratos.messaging.message.processor.topology.CompleteTopologyMessageProcessor.process(CompleteTopologyMessageProcessor.java:74)
org.apache.stratos.messaging.message.processor.MessageProcessorChain.process(MessageProcessorChain.java:61)
org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator.run(TopologyEventMessageDelegator.java:73)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
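
To make the "lock has not released for 30 seconds" report above easier to follow, here is a minimal sketch of how a monitor can track how long a write lock has been held. It is not the Stratos ReadWriteLockMonitor: the class and method names are hypothetical, and timestamps are passed in as arguments so the behaviour can be checked without waiting.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch, not the Stratos ReadWriteLockMonitor.
final class WriteLockAgeSketch {
    private final String name;
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private volatile long acquiredAtMillis = -1;   // -1 means "not held"
    private volatile String ownerThreadName;

    WriteLockAgeSketch(String name) {
        this.name = name;
    }

    void acquireWriteLock(long nowMillis) {
        lock.writeLock().lock();
        if (lock.getWriteHoldCount() == 1) {       // record only the outermost acquire
            ownerThreadName = Thread.currentThread().getName();
            acquiredAtMillis = nowMillis;
        }
    }

    void releaseWriteLock() {
        if (lock.getWriteHoldCount() == 1) {       // outermost release clears the record
            acquiredAtMillis = -1;
            ownerThreadName = null;
        }
        lock.writeLock().unlock();
    }

    // Returns a report line if the write lock has been held for longer than
    // timeoutMillis, otherwise null.
    String checkWriteLock(long nowMillis, long timeoutMillis) {
        long since = acquiredAtMillis;
        if (since < 0 || nowMillis - since <= timeoutMillis) {
            return null;
        }
        return "lock has not released for " + (timeoutMillis / 1000)
                + " seconds: [lock-name] " + name
                + " [lock-type] Write [thread-name] " + ownerThreadName;
    }

    public static void main(String[] args) {
        WriteLockAgeSketch topology = new WriteLockAgeSketch("topology");
        topology.acquireWriteLock(0L);
        System.out.println(topology.checkWriteLock(31_000L, 30_000L));
        topology.releaseWriteLock();
    }
}
```

A real monitor would run the check periodically from its own thread; the sketch leaves that out to keep the bookkeeping visible.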
Thanks,
Reka



On Mon, Jun 22, 2015 at 12:24 PM, Reka Thirunavukkarasu <re...@wso2.com> wrote:
Hi Martin,
I found the reason why we didn't encounter these locking issues: we were testing with the default Stratos pack, which has read.write.lock.monitor.enabled=false. The locking warning or issue is raised only when you use read.write.lock.monitor.enabled=true. That's why only you were facing these locking issues, as you use this configuration in your setup.
Since I'm now able to reproduce the issue, I will test with the fix that I already pushed and update the thread.
We will discuss making read.write.lock.monitor.enabled=true the default in Stratos, so that we can find such issues early and fix them.

Thanks,
Reka
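
As an illustration of why the warning only shows up with the flag enabled, the sketch below gates a "released by a thread that does not hold the lock" check on the system property. It is a sketch under stated assumptions, not the Stratos ReadWriteLock class: the class and method names are hypothetical, and silently skipping the release when the monitor is disabled is an assumption made for the example. Only the property name and the warning text come from this thread.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch, not the Stratos ReadWriteLock implementation.
final class LockReleaseCheckSketch {
    // Property name as used in this thread.
    static final String MONITOR_FLAG = "read.write.lock.monitor.enabled";
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Boolean.getBoolean returns false when the property is not set, so the
    // monitor is off by default in this sketch.
    static boolean monitorEnabled() {
        return Boolean.getBoolean(MONITOR_FLAG);
    }

    void acquireWriteLock() {
        lock.writeLock().lock();
    }

    // Releases the write lock if the current thread holds it and returns null.
    // Otherwise nothing is unlocked; a warning is returned only when the
    // monitor flag is enabled (assumption of this sketch).
    String releaseWriteLock(String lockName) {
        if (lock.isWriteLockedByCurrentThread()) {
            lock.writeLock().unlock();
            return null;
        }
        return monitorEnabled()
                ? "Trying to release a lock which has not been taken by the same thread: [lock-name] "
                        + lockName
                : null;
    }

    public static void main(String[] args) {
        System.setProperty(MONITOR_FLAG, "true");
        LockReleaseCheckSketch sketch = new LockReleaseCheckSketch();
        // No acquire happened on this thread, so a warning is produced.
        System.out.println(sketch.releaseWriteLock("topology"));
    }
}
```

For comparison, calling unlock() directly on a ReentrantReadWriteLock write lock from a thread that does not hold it throws IllegalMonitorStateException, so some form of ownership check is needed before releasing.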

On Mon, Jun 22, 2015 at 12:16 AM, Reka Thirunavukkarasu <re...@wso2.com> wrote:
Sorry Martin, I had only fixed the issue locally. I have pushed it just now. Can you test with 1c21daaeea7b27ad0a0141a70b32e9443e78e309 when you get a chance? I will also continue testing with this fix.
Thanks,
Reka

On Mon, Jun 22, 2015 at 12:07 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Btw,

This is the last commit I picked up from the Stratos master:

commit 58bea52be814269416f70391fef50859aa5ae0a1
Author: lasinducharith <la...@gmail.com>
Date:   Fri Jun 19 19:40:27 2015 +0530

From: Martin Eppel (meppel)
Sent: Sunday, June 21, 2015 10:28 AM
To: dev@stratos.apache.org; Reka Thirunavukkarasu
Cc: Lasindu Charith (lasindu@wso2.com); Ryan Du Plessis (rdupless)
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Reka,

Here is another example which fails, see application at [1.], attached log files and jsons.  I ran a few scenarios; the one which is failing is the application with the name “s-g-c1-c2-c3” (last scenario). All members get removed but the application remains deployed:

“s-g-c1-c2-c3: applicationInstances 0, groupInstances 0, clusterInstances 0, members 0 ()”


Thanks


Martin




[cid:image001.png@01D0ADF4.B9C02FF0]




From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: Sunday, June 21, 2015 1:32 AM
To: Reka Thirunavukkarasu
Cc: dev; Lasindu Charith (lasindu@wso2.com); Ryan Du Plessis (rdupless)
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Great! Thanks Reka!

On Sun, Jun 21, 2015 at 8:34 AM, Reka Thirunavukkarasu <re...@wso2.com> wrote:
Hi Martin/Imesh,
Sure, I will have a look at the logs. I will also go through the recent commits and try to revert the fix which was added for nested group scaling, as it is not needed for this release and, as Martin mentioned, there are more issues after the fixes. Otherwise, we will have to go through another major effort in testing it.
I will update the thread on the progress.

Thanks,
Reka

On Sun, Jun 21, 2015 at 8:14 AM, Imesh Gunaratne <im...@apache.org> wrote:
Hi Martin,

Thanks for the quick response. Yes we will definitely go through the logs and investigate this.

Thanks

On Sun, Jun 21, 2015 at 8:09 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi Isuru,

No, the issue does not seem to be resolved. With the latest code I see issues in test cases which used to work before (beyond the latest example for which I posted the log files, see below); I am not sure yet what is going on. I will investigate further (to make sure I am not mistaken) and follow up with some examples after the weekend, but if you guys can take a look on Monday at the log files I provided with the previous email, that would be great.

Thanks

Martin

From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: Saturday, June 20, 2015 7:29 PM
To: dev
Cc: Lasindu Charith (lasindu@wso2.com); Reka Thirunavukkarasu (reka@wso2.com); Ryan Du Plessis (rdupless)
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi All,

I'm sorry I could not follow the entire discussion.
Can someone explain the latest status please? Have we resolved the initial group scaling issue, and are we now seeing an application removal problem?

Thanks

On Sat, Jun 20, 2015 at 2:06 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi Lasindu, Reka,

I just ran into the issue with removing the application again (with the fix for the issue included):

Please see [1a., 1b.] for the application structure (group scaling defined at only one group level). See also the respective artifacts and log file attached.

Please advise if we should reopen the JIRA

Thanks

Martin


Application [1a.]

[cid:image002.png@01D0ADF4.B9C02FF0]

[1b.] application after “starting application remove”

[cid:image003.png@01D0ADF4.B9C02FF0]









--
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos



--
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos


--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007




--
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos



--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007




--
Lasindu Charith
Software Engineer, WSO2 Inc.
Committer & PMC Member, Apache Stratos
Mobile: +94714427192 | Web: blog.lasindu.com

RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by "Martin Eppel (meppel)" <me...@cisco.com>.
Hi Lasindu,

I am confused: which application policy? Beyond deployment-policy-<1,2,3> there is no other deployment policy defined. The network partitions are defined in the zip file which was attached to my email. Did you take a look at the zip file?

Thanks

Martin

e.g. deployment-policy-1.json

{
    "id": "deployment-policy-1",
    "networkPartitions":
    [
        {
            "id": "RegionOne",
            "partitionAlgo": "one-after-another",
            "partitions":
            [
                {
                    "id": "whole-region",
                    "partitionMax": 5
                }
            ]
        }
    ]
}

From: Lasindu Charith [mailto:lasindu@wso2.com]
Sent: Tuesday, June 23, 2015 7:29 PM
To: Martin Eppel (meppel)
Cc: Reka Thirunavukkarasu; dev@stratos.apache.org; Ryan Du Plessis (rdupless)
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,

Could you please share the application policy json as well as network partition jsons you have used?

Thanks,

On Wed, Jun 24, 2015 at 2:44 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi Reka,

I picked up the latest code but ran into an issue with the deployment policy (I also noticed that validation seems to enforce that a deployment policy can only be configured once in the application at the group level, enabling group scaling?!):

In this scenario I define a group-level deployment policy at the bottom-level group ("g-sc-G3-1") and also define cartridge deployment policies for all other cartridges (c1, c3) in the parent groups, each defining a network partition “RegionOne”. However, deploying the app causes the exception below; I am not sure whether I missed a configuration or whether there is an issue in the deployment policy validation. This used to work before I picked up the latest changes. My current latest commit id is:

commit bb6e102986ad8e54556d9f6de47cc6eaa077e775
Author: reka <rt...@gmail.com>
Date:   Tue Jun 23 19:22:04 2015 +0530

Application json and deployment policies are attached to the email,

Thanks

Martin

TID: [0] [STRATOS] [2015-06-23 20:48:09,428] ERROR {org.apache.stratos.autoscaler.util.AutoscalerUtil} -  Invalid Application Policy: Network partition [network-partition-id] RegionOne is not used in application [application-id] g-sc-G12-1. Hence application bursting will fail. Either remove RegionOne from application policy or make all the cartridges available in RegionOne
TID: [0] [STRATOS] [2015-06-23 20:48:09,441] DEBUG {org.apache.stratos.autoscaler.registry.RegistryManager} -  Application context [g-sc-G12-1] persisted successfully in the autoscaler registry
TID: [0] [STRATOS] [2015-06-23 20:48:09,441] ERROR {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Application deployment failed: [application-id]g-sc-G12-1
org.apache.stratos.autoscaler.exception.application.InvalidApplicationPolicyException: Invalid Application Policy: Network partition [network-partition-id] RegionOne is not used in application [application-id] g-sc-G12-1. Hence application bursting will fail. Either remove RegionOne from application policy or make all the cartridges available in RegionOne
        at org.apache.stratos.autoscaler.util.AutoscalerUtil.validateApplicationPolicyAgainstApplication(AutoscalerUtil.java:746)
        at org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl.deployApplication(AutoscalerServiceImpl.java:279)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.axis2.rpc.receivers.RPCUtil.invokeServiceClass(RPCUtil.java:212)
        at org.apache.axis2.rpc.receivers.RPCMessageReceiver.invokeBusinessLogic(RPCMessageReceiver.java:117)
        at org.apache.axis2.receivers.AbstractInOutMessageReceiver.invokeBusinessLogic(AbstractInOutMessageReceiver.java:40)
        at org.apache.axis2.receivers.AbstractMessageReceiver.receive(AbstractMessageReceiver.java:110)
        at org.apache.axis2.engine.AxisEngine.receive(AxisEngine.java:180)
        at org.apache.axis2.transport.http.HTTPTransportUtils.processHTTPPostRequest(HTTPTransportUtils.java:172)
        at org.apache.axis2.transport.http.AxisServlet.doPost(AxisServlet.java:146)
        at org.wso2.carbon.core.transports.CarbonServlet.doPost(CarbonServlet.java:231)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:755)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
        at org.eclipse.equinox.http.servlet.internal.ServletRegistration.service(ServletRegistration.java:61)
        at org.eclipse.equinox.http.servlet.internal.ProxyServlet.processAlias(ProxyServlet.java:128)
        at org.eclipse.equinox.http.servlet.internal.ProxyServlet.service(ProxyServlet.java:68)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
From: Reka Thirunavukkarasu [mailto:reka@wso2.com]
Sent: Tuesday, June 23, 2015 7:00 AM
To: Martin Eppel (meppel)
Cc: dev@stratos.apache.org; Lasindu Charith (lasindu@wso2.com); Ryan Du Plessis (rdupless)
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,
As we have merged all the thread pools according to the discussion in [1], the default pool size is now 100 for all kinds of monitors (application, group and cluster). If you need to increase this, please add the parameter below to stratos.sh:
-Dmonitor.thread.pool.size=xxxx
Also, we have fixed an application instance termination issue and a Stratos restart issue with group scaling, so you can now restart Stratos even while group scaling is happening. The latest commit is:

bb6e102986ad8e54556d9f6de47cc6eaa077e775
Do let us know how your testing goes with these fixes.
[1] Merging all the threading pools used in autoscaler to one thread pool
Thanks,
Reka

On Tue, Jun 23, 2015 at 10:49 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Thanks Reka

From: Reka Thirunavukkarasu [mailto:reka@wso2.com]
Sent: Monday, June 22, 2015 9:59 PM
To: Martin Eppel (meppel)
Cc: dev@stratos.apache.org; Lasindu Charith (lasindu@wso2.com); Ryan Du Plessis (rdupless)
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,
These are actually configurable parameters. In the Stratos code, these thread pool sizes are set to 20 by default. If we need to change them, we can pass them as system properties in stratos.sh. Since the default values are taken from the Stratos code, we don't need to provide them in the standalone pack. A complex application with more groups and clusters uses more threads, so the default pool size of 20 might get exhausted. It is therefore better to customize these properties according to the application structure. With the application sample that you attached to this thread, I faced issues such as event listeners not being triggered properly because the thread pool was exhausted. After I increased the thread pool size to 50, I didn't get any issues.
I'm in the process of analyzing the thread usage in order to recommend a pool size based on the application structure, so that anyone can calculate the pool size their application requires and configure this parameter accordingly.
Hope this will help you to understand on those parameters.

Thanks,
Reka

On Mon, Jun 22, 2015 at 11:50 PM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi Reka,

I am not clear on the 2 properties you mention below. Are they supposed to be set in stratos.sh? I just picked up the latest code from the Apache Stratos repo and don’t see them.

Btw, the lock monitor is disabled in our production code (read.write.lock.monitor.enabled=false; I assume it defaults to false if not specified). I only enable it to provide additional information.

Thanks

Martin

From: Reka Thirunavukkarasu [mailto:reka@wso2.com]
Sent: Monday, June 22, 2015 7:30 AM
To: Martin Eppel (meppel)
Cc: dev@stratos.apache.org; Lasindu Charith (lasindu@wso2.com); Ryan Du Plessis (rdupless)
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,
I have verified the fix with read.write.lock.monitor.enabled=true, and it worked fine. Since we use concurrency and have delegated some flows to threads, I had to set the thread pool sizes to the values below in stratos.sh.

    -Dapplication.monitor.thread.pool.size=50 \
    -Dgroup.monitor.thread.pool.size=50 \
Please note that it is recommended to keep read.write.lock.monitor.enabled=false in production, as enabling it increases the footprint. This property was introduced only for testing purposes.

We are in the process of analyzing the thread pool sizes and will come up with recommended values.
Also, I have fixed a small issue in the REST endpoint, which used to return a default value whenever the application runtime was not found. Now, if the runtime is not found, the message below is returned.

{"status":"error","message":"Application runtime not found"}
I have also verified undeployment with group scaling and did not find any issues with the above fixes.
Please find the latest commit below:

0a969200d11228158606f011ca7e5e795f336d92.
Please note that only the error below was observed, and it is harmless for now. I have verified a workaround, and it works fine. I will check the severity and decide whether to implement a proper fix or keep the workaround.

[1]. TID: [0] [STRATOS] [2015-06-22 14:22:01,872] ERROR {org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor} -  System error, lock has not released for 30 seconds: [lock-name] topology [lock-type] Write [thread-id] 117 [thread-name] pool-24-thread-2 [stack-trace]
java.lang.Thread.getStackTrace(Thread.java:1589)
org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:123)
org.apache.stratos.messaging.message.processor.topology.updater.TopologyUpdater.acquireWriteLockForService(TopologyUpdater.java:123)
org.apache.stratos.messaging.message.processor.topology.ApplicationClustersCreatedMessageProcessor.doProcess(ApplicationClustersCreatedMessageProcessor.java:78)
org.apache.stratos.messaging.message.processor.topology.ApplicationClustersCreatedMessageProcessor.process(ApplicationClustersCreatedMessageProcessor.java:59)
org.apache.stratos.messaging.message.processor.topology.ServiceRemovedMessageProcessor.process(ServiceRemovedMessageProcessor.java:64)
org.apache.stratos.messaging.message.processor.topology.ServiceCreatedMessageProcessor.process(ServiceCreatedMessageProcessor.java:65)
org.apache.stratos.messaging.message.processor.topology.CompleteTopologyMessageProcessor.process(CompleteTopologyMessageProcessor.java:74)
org.apache.stratos.messaging.message.processor.MessageProcessorChain.process(MessageProcessorChain.java:61)
org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator.run(TopologyEventMessageDelegator.java:73)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
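
The "lock has not released for 30 seconds" message above suggests a monitor that remembers when each lock was taken and flags the ones held too long. The following is only a sketch of that idea under that assumption; LockHoldTracker and its methods are hypothetical names and are not the ReadWriteLockMonitor implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LockHoldTracker {
    // Lock name -> timestamp (ms) at which it was acquired. Hypothetical bookkeeping.
    private final Map<String, Long> acquiredAtMillis = new TreeMap<>();
    private final long thresholdMillis;

    public LockHoldTracker(long thresholdMillis) {
        this.thresholdMillis = thresholdMillis;
    }

    public synchronized void acquired(String lockName, long nowMillis) {
        acquiredAtMillis.put(lockName, nowMillis);
    }

    public synchronized void released(String lockName) {
        acquiredAtMillis.remove(lockName);
    }

    // Names of locks held strictly longer than the threshold, in sorted order.
    public synchronized List<String> overdue(long nowMillis) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> entry : acquiredAtMillis.entrySet()) {
            if (nowMillis - entry.getValue() > thresholdMillis) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        LockHoldTracker tracker = new LockHoldTracker(30_000L); // 30 seconds, as in the log above
        tracker.acquired("topology", 0L);
        System.out.println("overdue after 31s: " + tracker.overdue(31_000L));
        tracker.released("topology");
        System.out.println("overdue after release: " + tracker.overdue(31_000L));
    }
}
```

Timestamps are passed in explicitly so the check can be exercised without waiting; a real monitor would presumably read the clock itself and also record the acquiring thread and stack trace, as the log line does.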
Thanks,
Reka



On Mon, Jun 22, 2015 at 12:24 PM, Reka Thirunavukkarasu <re...@wso2.com> wrote:
Hi Martin,
I found the reason why we didn't encounter these locking issues: we were testing with the default Stratos pack, which has read.write.lock.monitor.enabled=false. The locking warning is raised only when read.write.lock.monitor.enabled=true, which is why only you were seeing it with the configuration in your setup.
Since I'm able to reproduce the issue, I will test with the fix that I already pushed and update the thread.
We will discuss making read.write.lock.monitor.enabled=true the default in Stratos, so that we can find and fix such issues early.
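
For context on the warning reported earlier in this thread ("Trying to release a lock which has not been taken by the same thread"), the sketch below shows the same situation with the JDK's own ReentrantReadWriteLock, which rejects such a release with an IllegalMonitorStateException. It only illustrates the failure mode; it does not use the Stratos ReadWriteLock class, and CrossThreadReleaseDemo is a hypothetical name.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class CrossThreadReleaseDemo {

    // Returns true if releasing the write lock from a thread that never
    // acquired it was rejected by the JDK lock.
    public static boolean releaseFromOtherThreadFails() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.writeLock().lock(); // acquired by the calling thread
        final boolean[] rejected = {false};

        Thread other = new Thread(() -> {
            try {
                lock.writeLock().unlock(); // this thread does not hold the lock
            } catch (IllegalMonitorStateException e) {
                rejected[0] = true;
            }
        });
        other.start();
        try {
            other.join(); // join makes the write to rejected[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }

        lock.writeLock().unlock(); // release from the owning thread succeeds
        return rejected[0];
    }

    public static void main(String[] args) {
        System.out.println("cross-thread release rejected: " + releaseFromOtherThreadFails());
    }
}
```

When work is handed off to pool threads, acquiring in one thread and releasing in another is an easy mistake to make, which may be why the warning shows up once the monitor is enabled.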

Thanks,
Reka

On Mon, Jun 22, 2015 at 12:16 AM, Reka Thirunavukkarasu <re...@wso2.com> wrote:
Sorry Martin, I had only fixed the issue locally. I have pushed it just now. Can you test with 1c21daaeea7b27ad0a0141a70b32e9443e78e309 when you get a chance? I will also continue testing with this fix.
Thanks,
Reka

On Mon, Jun 22, 2015 at 12:07 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Btw,

This is the last commit I picked up from the Stratos master:

commit 58bea52be814269416f70391fef50859aa5ae0a1
Author: lasinducharith <la...@gmail.com>
Date:   Fri Jun 19 19:40:27 2015 +0530

From: Martin Eppel (meppel)
Sent: Sunday, June 21, 2015 10:28 AM
To: dev@stratos.apache.org; Reka Thirunavukkarasu
Cc: Lasindu Charith (lasindu@wso2.com); Ryan Du Plessis (rdupless)
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Reka,

Here is another example which fails; see the application at [1.] and the attached log files and JSONs. I ran a few scenarios; the one that fails is the application named “s-g-c1-c2-c3” (last scenario). All members get removed, but the application remains deployed:

“s-g-c1-c2-c3: applicationInstances 0, groupInstances 0, clusterInstances 0, members 0 ()”


Thanks


Martin




[cid:image001.png@01D0ADF1.83CE3FC0]




From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: Sunday, June 21, 2015 1:32 AM
To: Reka Thirunavukkarasu
Cc: dev; Lasindu Charith (lasindu@wso2.com); Ryan Du Plessis (rdupless)
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Great! Thanks Reka!

On Sun, Jun 21, 2015 at 8:34 AM, Reka Thirunavukkarasu <re...@wso2.com> wrote:
Hi Martin/Imesh,
Sure, I will have a look at the logs. I will also go through the recent commits and try to revert the fix that was added for nested group scaling, as it is not needed for this release. As Martin mentioned, there are more issues after the fixes. Otherwise, we will have to go through another major testing effort.
I will keep you updated on the progress.

Thanks,
Reka

On Sun, Jun 21, 2015 at 8:14 AM, Imesh Gunaratne <im...@apache.org> wrote:
Hi Martin,

Thanks for the quick response. Yes we will definitely go through the logs and investigate this.

Thanks

On Sun, Jun 21, 2015 at 8:09 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi Isuru,

No, the issue does not seem to be resolved. With the latest code I see issues in test cases that used to work before (beyond the latest example I posted the log files for, see below); I am not sure yet what is going on. I will investigate further (to make sure I am not mistaken) and follow up with some examples after the weekend, but it would be great if you could take a look on Monday at the log files I provided with the previous email.

Thanks

Martin

From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: Saturday, June 20, 2015 7:29 PM
To: dev
Cc: Lasindu Charith (lasindu@wso2.com); Reka Thirunavukkarasu (reka@wso2.com); Ryan Du Plessis (rdupless)
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi All,

I'm sorry I could not follow the entire discussion.
Can someone explain the latest status please? Have we resolved the initial group scaling issue, and are we now seeing an application removal problem?

Thanks

On Sat, Jun 20, 2015 at 2:06 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi Lasindu, Reka,

Just ran into the issue with removing the application again (with the fix for the issue included):

Please see [1a., 1b.] for the application structure (group scaling defined at only one group level). See also the respective artifacts and log file attached.

Please advise if we should reopen the JIRA

Thanks

Martin


Application [1a.]

[cid:image002.png@01D0ADF1.83CE3FC0]

[1b.] application after “starting application remove”

[cid:image003.png@01D0ADF1.83CE3FC0]









--
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos



--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007<tel:%2B94776442007>




--
Lasindu Charith
Software Engineer, WSO2 Inc.
Committer & PMC Member, Apache Stratos
Mobile: +94714427192 | Web: blog.lasindu.com<http://blog.lasindu.com>

Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by Lasindu Charith <la...@wso2.com>.
Hi Martin,

Could you please share the application policy JSON as well as the network
partition JSONs you used?

Thanks,

On Wed, Jun 24, 2015 at 2:44 AM, Martin Eppel (meppel) <me...@cisco.com>
wrote:

> Hi Reka,
>
> I picked up the latest code but ran into an issue with the deployment
> policy. (I also noticed that validation seems to enforce that a deployment
> policy can only be configured once in the application at the group level,
> enabling group scaling?!)
>
> In the scenario, I define a group-level deployment policy at the bottom
> level group ("g-sc-G3-1") and also define cartridge deployment policies for
> all other cartridges (c1, c3) in the parent groups, each defining a network
> partition “RegionOne”. However, deploying the app causes the exception
> below. I am not sure whether I missed a configuration or whether there is
> an issue in the deployment policy validation. This used to work before I
> picked up the latest changes. My current latest commit id is:
>
>
>
> commit bb6e102986ad8e54556d9f6de47cc6eaa077e775
>
> Author: reka <rt...@gmail.com>
>
> Date:   Tue Jun 23 19:22:04 2015 +0530
>
>
>
> Application json and deployment policies are attached to the email,
>
>
>
> Thanks
>
>
>
> Martin
>
>
>
> TID: [0] [STRATOS] [2015-06-23 20:48:09,428] ERROR {org.apache.stratos.autoscaler.util.AutoscalerUtil} -  Invalid Application Policy: Network partition [network-partition-id] RegionOne is not used in application [application-id] g-sc-G12-1. Hence application bursting will fail. Either remove RegionOne from application policy or make all the cartridges available in RegionOne
> TID: [0] [STRATOS] [2015-06-23 20:48:09,441] DEBUG {org.apache.stratos.autoscaler.registry.RegistryManager} -  Application context [g-sc-G12-1] persisted successfully in the autoscaler registry
> TID: [0] [STRATOS] [2015-06-23 20:48:09,441] ERROR {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Application deployment failed: [application-id]g-sc-G12-1
> org.apache.stratos.autoscaler.exception.application.InvalidApplicationPolicyException: Invalid Application Policy: *Network partition [network-partition-id] RegionOne is not used in application* [application-id] g-sc-G12-1. Hence application bursting will fail. Either remove RegionOne from application policy or make all the cartridges available in RegionOne
>         at org.apache.stratos.autoscaler.util.AutoscalerUtil.validateApplicationPolicyAgainstApplication(AutoscalerUtil.java:746)
>         at org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl.deployApplication(AutoscalerServiceImpl.java:279)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.axis2.rpc.receivers.RPCUtil.invokeServiceClass(RPCUtil.java:212)
>         at org.apache.axis2.rpc.receivers.RPCMessageReceiver.invokeBusinessLogic(RPCMessageReceiver.java:117)
>         at org.apache.axis2.receivers.AbstractInOutMessageReceiver.invokeBusinessLogic(AbstractInOutMessageReceiver.java:40)
>         at org.apache.axis2.receivers.AbstractMessageReceiver.receive(AbstractMessageReceiver.java:110)
>         at org.apache.axis2.engine.AxisEngine.receive(AxisEngine.java:180)
>         at org.apache.axis2.transport.http.HTTPTransportUtils.processHTTPPostRequest(HTTPTransportUtils.java:172)
>         at org.apache.axis2.transport.http.AxisServlet.doPost(AxisServlet.java:146)
>         at org.wso2.carbon.core.transports.CarbonServlet.doPost(CarbonServlet.java:231)
>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:755)
>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
>         at org.eclipse.equinox.http.servlet.internal.ServletRegistration.service(ServletRegistration.java:61)
>         at org.eclipse.equinox.http.servlet.internal.ProxyServlet.processAlias(ProxyServlet.java:128)
>         at org.eclipse.equinox.http.servlet.internal.ProxyServlet.service(ProxyServlet.java:68)
>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
>



-- 
*Lasindu Charith*
Software Engineer, WSO2 Inc.
Committer & PMC Member, Apache Stratos
Mobile: +94714427192 | Web: blog.lasindu.com

RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by "Martin Eppel (meppel)" <me...@cisco.com>.
Hi Reka,

I picked up the latest code but ran into an issue with the deployment policy. (I also noticed that validation seems to enforce that a deployment policy can only be configured once in the application at the group level, enabling group scaling?!)

In the scenario, I define a group-level deployment policy at the bottom level group ("g-sc-G3-1") and also define cartridge deployment policies for all other cartridges (c1, c3) in the parent groups, each defining a network partition “RegionOne”. However, deploying the app causes the exception below. I am not sure whether I missed a configuration or whether there is an issue in the deployment policy validation. This used to work before I picked up the latest changes. My current latest commit id is:

commit bb6e102986ad8e54556d9f6de47cc6eaa077e775
Author: reka <rt...@gmail.com>
Date:   Tue Jun 23 19:22:04 2015 +0530

Application json and deployment policies are attached to the email,

Thanks

Martin

TID: [0] [STRATOS] [2015-06-23 20:48:09,428] ERROR {org.apache.stratos.autoscaler.util.AutoscalerUtil} -  Invalid Application Policy: Network partition [network-partition-id] RegionOne is not used in application [application-id] g-sc-G12-1. Hence application bursting will fail. Either remove RegionOne from application policy or make all the cartridges available in RegionOne
TID: [0] [STRATOS] [2015-06-23 20:48:09,441] DEBUG {org.apache.stratos.autoscaler.registry.RegistryManager} -  Application context [g-sc-G12-1] persisted successfully in the autoscaler registry
TID: [0] [STRATOS] [2015-06-23 20:48:09,441] ERROR {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Application deployment failed: [application-id]g-sc-G12-1
org.apache.stratos.autoscaler.exception.application.InvalidApplicationPolicyException: Invalid Application Policy: Network partition [network-partition-id] RegionOne is not used in application [application-id] g-sc-G12-1. Hence application bursting will fail. Either remove RegionOne from application policy or make all the cartridges available in RegionOne
        at org.apache.stratos.autoscaler.util.AutoscalerUtil.validateApplicationPolicyAgainstApplication(AutoscalerUtil.java:746)
        at org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl.deployApplication(AutoscalerServiceImpl.java:279)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.axis2.rpc.receivers.RPCUtil.invokeServiceClass(RPCUtil.java:212)
        at org.apache.axis2.rpc.receivers.RPCMessageReceiver.invokeBusinessLogic(RPCMessageReceiver.java:117)
        at org.apache.axis2.receivers.AbstractInOutMessageReceiver.invokeBusinessLogic(AbstractInOutMessageReceiver.java:40)
        at org.apache.axis2.receivers.AbstractMessageReceiver.receive(AbstractMessageReceiver.java:110)
        at org.apache.axis2.engine.AxisEngine.receive(AxisEngine.java:180)
        at org.apache.axis2.transport.http.HTTPTransportUtils.processHTTPPostRequest(HTTPTransportUtils.java:172)
        at org.apache.axis2.transport.http.AxisServlet.doPost(AxisServlet.java:146)
        at org.wso2.carbon.core.transports.CarbonServlet.doPost(CarbonServlet.java:231)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:755)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
        at org.eclipse.equinox.http.servlet.internal.ServletRegistration.service(ServletRegistration.java:61)
        at org.eclipse.equinox.http.servlet.internal.ProxyServlet.processAlias(ProxyServlet.java:128)
        at org.eclipse.equinox.http.servlet.internal.ProxyServlet.service(ProxyServlet.java:68)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
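
Going only by the wording of the exception above ("make all the cartridges available in RegionOne"), the check appears to require that every network partition named in the application policy is reachable by every cartridge. The sketch below encodes that reading as an assumption. It is not the code of AutoscalerUtil.validateApplicationPolicyAgainstApplication, and ApplicationPolicyCheck, unusablePartitions and the map layout are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ApplicationPolicyCheck {

    // Returns the application-policy partitions that at least one cartridge
    // cannot use, given the partitions each cartridge's deployment policy lists.
    public static List<String> unusablePartitions(Set<String> appPolicyPartitions,
                                                  Map<String, Set<String>> partitionsByCartridge) {
        List<String> unusable = new ArrayList<>();
        for (String partition : new TreeSet<>(appPolicyPartitions)) { // sorted for stable output
            for (Set<String> cartridgePartitions : partitionsByCartridge.values()) {
                if (!cartridgePartitions.contains(partition)) {
                    unusable.add(partition);
                    break;
                }
            }
        }
        return unusable;
    }

    public static void main(String[] args) {
        Set<String> appPolicy = new HashSet<>(Arrays.asList("RegionOne"));
        Map<String, Set<String>> byCartridge = new HashMap<>();
        byCartridge.put("c1", new HashSet<>(Arrays.asList("RegionOne")));
        byCartridge.put("c3", new HashSet<>(Arrays.asList("RegionOne")));
        // Illustrative gap: a cartridge for which no partition was resolved.
        byCartridge.put("c2", new HashSet<String>());
        System.out.println("unusable partitions: " + unusablePartitions(appPolicy, byCartridge));
    }
}
```

Under this reading, a cartridge that only inherits its partitions from a group-level deployment policy would trip the check if the validation looks at cartridge-level policies alone; whether that is what happens here is not confirmed by the logs.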
From: Reka Thirunavukkarasu [mailto:reka@wso2.com]
Sent: Tuesday, June 23, 2015 7:00 AM
To: Martin Eppel (meppel)
Cc: dev@stratos.apache.org; Lasindu Charith (lasindu@wso2.com); Ryan Du Plessis (rdupless)
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,
As we have merged all the thread pools according to the discussion in [1], the default pool size is now 100 for all kinds of monitors (application, group and cluster). If you need to increase this, please add the parameter below to stratos.sh:
-Dmonitor.thread.pool.size=xxxx
We have also fixed an application instance termination and Stratos restart issue with group scaling, so you can now restart Stratos even while group scaling is happening. The latest commit is:

bb6e102986ad8e54556d9f6de47cc6eaa077e775
Do let us know how your testing goes with these fixes.
[1] Merging all the threading pools used in autoscaler to one thread pool
Thanks,
Reka

On Tue, Jun 23, 2015 at 10:49 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Thanks Reka

From: Reka Thirunavukkarasu [mailto:reka@wso2.com]
Sent: Monday, June 22, 2015 9:59 PM
To: Martin Eppel (meppel)
Cc: dev@stratos.apache.org; Lasindu Charith (lasindu@wso2.com); Ryan Du Plessis (rdupless)
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,
These are actually configurable parameters. In the Stratos code, these thread pool sizes are set to 20 by default. If we need to change them, we can pass them as system properties in stratos.sh. Since the Stratos code supplies default values, we don't need to provide them in the standalone pack. A complex application with more groups and clusters uses more threads, so the default pool size of 20 might get exhausted. It is therefore better to customize these properties according to the application structure. When I used the application sample that you attached to this thread, I faced issues such as event listeners not being triggered properly because the thread pool was exhausted. After I increased the thread pool size to 50, I didn't see any issues.
I'm in the process of analyzing the thread usage in order to decide on a recommended pool size for a given application structure, so that anyone can calculate the pool size their application requires and configure this parameter accordingly.
Hope this helps you understand those parameters.
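
To illustrate the exhaustion described above, here is a minimal, self-contained Java sketch. It is not Stratos code: the class PoolExhaustionDemo and its method are hypothetical, and the "listener" is just a stand-in task. It shows how a task stays queued, and so never fires, while every thread of a fixed-size pool is occupied by long-running work.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; names are illustrative and not part of Stratos.
class PoolExhaustionDemo {

    // Returns true if a "listener" task gets to run within waitMillis while
    // `blockers` long-running tasks occupy a fixed pool of `poolSize` threads.
    static boolean listenerRuns(int poolSize, int blockers, long waitMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch listenerRan = new CountDownLatch(1);
        try {
            // Each blocker holds one pool thread until `release` is counted down.
            for (int i = 0; i < blockers; i++) {
                pool.execute(() -> {
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // The listener is submitted last; it is queued if no thread is free.
            pool.execute(listenerRan::countDown);
            return listenerRan.await(waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            release.countDown(); // let the blockers finish
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("pool=2, blockers=2 -> listener ran: " + listenerRuns(2, 2, 200));
        System.out.println("pool=4, blockers=2 -> listener ran: " + listenerRuns(4, 2, 2000));
    }
}
```

In this sketch, raising the pool size above the number of concurrently blocked tasks is what lets the listener run, which mirrors the effect of raising the pool size from 20 to 50 reported above.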

Thanks,
Reka

On Mon, Jun 22, 2015 at 11:50 PM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi Reka,

I am not clear on the two properties you mention below. Are they supposed to be set in stratos.sh? I just picked up the latest code from the Apache Stratos repo and don’t see them.

Btw, read.write.lock.monitor.enabled=false is what we use in our production code (I assume it defaults to false if not specified); I only enable it to provide additional information.

Thanks

Martin

From: Reka Thirunavukkarasu [mailto:reka@wso2.com]
Sent: Monday, June 22, 2015 7:30 AM
To: Martin Eppel (meppel)
Cc: dev@stratos.apache.org; Lasindu Charith (lasindu@wso2.com); Ryan Du Plessis (rdupless)
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,
I have verified the fix with read.write.lock.monitor.enabled=true, and it worked fine. Since we use concurrency and have delegated some flows to threads, I had to set the thread pool sizes to the values below in stratos.sh:

    -Dapplication.monitor.thread.pool.size=50 \
    -Dgroup.monitor.thread.pool.size=50 \
Please note that it is recommended to keep read.write.lock.monitor.enabled=false in production, as enabling it increases the footprint. This property was introduced only for testing purposes.

We are in the process of analyzing the thread usage and will come up with recommended values for it.
Also, I have fixed a small issue in the REST endpoint, which used to return a default value whenever the application runtime was not found. Now, if the runtime is not found, the message below is returned:

{"status":"error","message":"Application runtime not found"}
I have also verified undeployment with group scaling and didn't find any issues with the above fixes.
Please find the latest commit below:

0a969200d11228158606f011ca7e5e795f336d92
Please note that only the error in [1] below was observed, and it is harmless for now. I have verified that things work fine with a workaround, but I will check its severity and decide whether to implement a proper fix or go with the workaround.

[1]. TID: [0] [STRATOS] [2015-06-22 14:22:01,872] ERROR {org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor} -  System error, lock has not released for 30 seconds: [lock-name] topology [lock-type] Write [thread-id] 117 [thread-name] pool-24-thread-2 [stack-trace]
java.lang.Thread.getStackTrace(Thread.java:1589)
org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:123)
org.apache.stratos.messaging.message.processor.topology.updater.TopologyUpdater.acquireWriteLockForService(TopologyUpdater.java:123)
org.apache.stratos.messaging.message.processor.topology.ApplicationClustersCreatedMessageProcessor.doProcess(ApplicationClustersCreatedMessageProcessor.java:78)
org.apache.stratos.messaging.message.processor.topology.ApplicationClustersCreatedMessageProcessor.process(ApplicationClustersCreatedMessageProcessor.java:59)
org.apache.stratos.messaging.message.processor.topology.ServiceRemovedMessageProcessor.process(ServiceRemovedMessageProcessor.java:64)
org.apache.stratos.messaging.message.processor.topology.ServiceCreatedMessageProcessor.process(ServiceCreatedMessageProcessor.java:65)
org.apache.stratos.messaging.message.processor.topology.CompleteTopologyMessageProcessor.process(CompleteTopologyMessageProcessor.java:74)
org.apache.stratos.messaging.message.processor.MessageProcessorChain.process(MessageProcessorChain.java:61)
org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator.run(TopologyEventMessageDelegator.java:73)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
Thanks,
Reka



On Mon, Jun 22, 2015 at 12:24 PM, Reka Thirunavukkarasu <re...@wso2.com> wrote:
Hi Martin,
I found the reason why we didn't encounter these locking issues: we were testing with the default Stratos pack, which has read.write.lock.monitor.enabled=false. The locking warning is raised only when read.write.lock.monitor.enabled=true. That is why only you were facing these locking issues, as you use this configuration in your setup.
Since I'm now able to reproduce the issue, I will test with the fix that I already pushed and update the thread.
We will discuss and try to make read.write.lock.monitor.enabled=true the default in Stratos, so that we can find such issues early and fix them.
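
As a rough illustration of why such warnings only show up when the monitor is switched on, the following Java sketch records when a write lock was taken and reports holders that exceed a timeout. It is not the Stratos ReadWriteLockMonitor: the class MonitoredWriteLock, its methods and the explicit clock parameters are all hypothetical and chosen only to keep the example deterministic.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of a lock with optional hold-time tracking (not Stratos code).
class MonitoredWriteLock {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // thread id -> time (ms) at which that thread acquired the write lock
    private final Map<Long, Long> acquiredAt = new ConcurrentHashMap<>();
    private final boolean monitorEnabled;

    MonitoredWriteLock(boolean monitorEnabled) {
        this.monitorEnabled = monitorEnabled;
    }

    // Acquire the write lock; bookkeeping happens only when monitoring is on.
    void acquire(long nowMillis) {
        lock.writeLock().lock();
        if (monitorEnabled) {
            acquiredAt.put(Thread.currentThread().getId(), nowMillis);
        }
    }

    // Release the write lock and drop the bookkeeping entry for this thread.
    void release() {
        if (monitorEnabled) {
            acquiredAt.remove(Thread.currentThread().getId());
        }
        lock.writeLock().unlock();
    }

    // Number of threads that have held the lock for longer than timeoutMillis.
    int overdueHolders(long nowMillis, long timeoutMillis) {
        int count = 0;
        for (long takenAt : acquiredAt.values()) {
            if (nowMillis - takenAt > timeoutMillis) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        MonitoredWriteLock topology = new MonitoredWriteLock(true);
        topology.acquire(0L);
        // 31 s later the lock is still held, so one holder exceeds a 30 s timeout.
        System.out.println("overdue holders: " + topology.overdueHolders(31_000L, 30_000L));
        topology.release();
    }
}
```

With monitoring disabled, the sketch keeps no bookkeeping at all, so a long-held lock goes unreported; the extra map is also a small example of the additional footprint that monitoring adds.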

Thanks,
Reka

On Mon, Jun 22, 2015 at 12:16 AM, Reka Thirunavukkarasu <re...@wso2.com> wrote:
Sorry Martin, I had only fixed the issue locally. I have pushed it just now. Can you test with 1c21daaeea7b27ad0a0141a70b32e9443e78e309 when you get a chance? I will also continue testing with this fix.
Thanks,
Reka

On Mon, Jun 22, 2015 at 12:07 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Btw,

This is the last commit I picked up from the Stratos master:

commit 58bea52be814269416f70391fef50859aa5ae0a1
Author: lasinducharith <la...@gmail.com>
Date:   Fri Jun 19 19:40:27 2015 +0530

From: Martin Eppel (meppel)
Sent: Sunday, June 21, 2015 10:28 AM
To: dev@stratos.apache.org; Reka Thirunavukkarasu
Cc: Lasindu Charith (lasindu@wso2.com); Ryan Du Plessis (rdupless)
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Reka,

Here is another example which fails; see the application at [1.] and the attached log files and JSON files. I ran a few scenarios; the one that is failing is the application named “s-g-c1-c2-c3” (last scenario). All members get removed but the application remains deployed:

“s-g-c1-c2-c3: applicationInstances 0, groupInstances 0, clusterInstances 0, members 0 ()”


Thanks


Martin




[cid:image001.png@01D0ADB8.72D20AA0]




From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: Sunday, June 21, 2015 1:32 AM
To: Reka Thirunavukkarasu
Cc: dev; Lasindu Charith (lasindu@wso2.com); Ryan Du Plessis (rdupless)
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Great! Thanks Reka!

On Sun, Jun 21, 2015 at 8:34 AM, Reka Thirunavukkarasu <re...@wso2.com> wrote:
Hi Martin/Imesh,
Sure, I will have a look at the logs. I will also go through the recent commits and try to revert the fix that was added for nested group scaling, as it is not needed for this release and, as Martin mentioned, there are more issues after the fixes. Otherwise, we will have to go through another major testing effort.
I will keep you updated on the progress.

Thanks,
Reka

On Sun, Jun 21, 2015 at 8:14 AM, Imesh Gunaratne <im...@apache.org> wrote:
Hi Martin,

Thanks for the quick response. Yes we will definitely go through the logs and investigate this.

Thanks

On Sun, Jun 21, 2015 at 8:09 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi Isuru,

No, the issue does not seem to be resolved. With the latest code I see issues in test cases which used to work before (beyond the latest example I posted the log files for, see below); I am not sure yet what is going on. I will investigate further (to make sure I am not mistaken) and follow up with some examples after the weekend, but if you could take a look on Monday at the log files I provided with the previous email, that would be great.

Thanks

Martin

From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: Saturday, June 20, 2015 7:29 PM
To: dev
Cc: Lasindu Charith (lasindu@wso2.com); Reka Thirunavukkarasu (reka@wso2.com); Ryan Du Plessis (rdupless)
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi All,

I'm sorry I could not follow the entire discussion.
Can someone explain the latest status please? Have we resolved the initial group scaling issue, and are we now seeing an application removal problem?

Thanks

On Sat, Jun 20, 2015 at 2:06 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi Lasindu, Reka,

I just ran into the issue with removing the application again (with the fix for the issue included).

Please see [1a., 1b.] for the application structure (group scaling defined at only one group level). See also the respective artifacts and log file attached.

Please advise if we should reopen the JIRA.

Thanks

Martin


Application [1a.]

[cid:image002.png@01D0ADB8.72D20AA0]

[1b.] application after “starting application remove”

[cid:image003.png@01D0ADB8.72D20AA0]









--
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos



--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007






Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by Reka Thirunavukkarasu <re...@wso2.com>.
Hi Martin,

As we have merged all the thread pools according to the discussion in [1],
the default pool size is now 100 for all kinds of monitors (application,
group and cluster). If you need to increase this, please add the parameter
below to stratos.sh:

-Dmonitor.thread.pool.size=xxxx
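
For readers unfamiliar with -D options: a value passed this way becomes a JVM system property. The sketch below shows one way such a property could be read and turned into a pool size. It is only an illustration and not the Stratos implementation: the class MonitorPoolConfig, its method and the fallback rules for invalid values are assumptions; only the property name and the default of 100 come from the mail above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical helper; class and method names are not taken from Stratos.
class MonitorPoolConfig {
    static final String POOL_SIZE_KEY = "monitor.thread.pool.size"; // property named in the mail
    static final int DEFAULT_POOL_SIZE = 100;                       // default quoted in the mail

    // Parse the raw property value. Falling back to the default for missing,
    // non-numeric or non-positive values is an assumption of this sketch.
    static int resolvePoolSize(String raw) {
        if (raw == null) {
            return DEFAULT_POOL_SIZE;
        }
        try {
            int size = Integer.parseInt(raw.trim());
            return size > 0 ? size : DEFAULT_POOL_SIZE;
        } catch (NumberFormatException e) {
            return DEFAULT_POOL_SIZE;
        }
    }

    public static void main(String[] args) {
        // System.getProperty returns null when the -D option was not supplied.
        int size = resolvePoolSize(System.getProperty(POOL_SIZE_KEY));
        ExecutorService pool = Executors.newFixedThreadPool(size);
        System.out.println("monitor pool size = " + size);
        pool.shutdown();
    }
}
```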

We have also fixed an application instance termination and Stratos restart
issue with group scaling, so you can now restart Stratos even while group
scaling is happening. The latest commit is:

bb6e102986ad8e54556d9f6de47cc6eaa077e775

Do let us know how your testing goes with these fixes.
[1] Merging all the threading pools used in autoscaler to one thread pool
Thanks,
Reka

On Tue, Jun 23, 2015 at 10:49 AM, Martin Eppel (meppel) <me...@cisco.com>
wrote:

>  Thanks Reka
>
>
>
> *From:* Reka Thirunavukkarasu [mailto:reka@wso2.com]
> *Sent:* Monday, June 22, 2015 9:59 PM
>
> *To:* Martin Eppel (meppel)
> *Cc:* dev@stratos.apache.org; Lasindu Charith (lasindu@wso2.com); Ryan Du
> Plessis (rdupless)
> *Subject:* Re: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> Hi Martin,
>
> These are actually configurable parameters. In the Stratos code, these
> thread pool sizes are set to 20 by default. If we need to change them, we
> can pass them as system properties in stratos.sh. Since the Stratos code
> supplies default values, we don't need to provide them in the standalone
> pack. A complex application with more groups and clusters uses more
> threads, so the default pool size of 20 might get exhausted. It is
> therefore better to customize these properties according to the
> application structure. When I used the application sample that you
> attached to this thread, I faced issues such as event listeners not being
> triggered properly because the thread pool was exhausted. After I
> increased the thread pool size to 50, I didn't see any issues.
>
> I'm in the process of analyzing the thread usage in order to decide on a
> recommended pool size for a given application structure, so that anyone
> can calculate the pool size their application requires and configure this
> parameter accordingly.
>
> Hope this helps you understand those parameters.
>
>
>
> Thanks,
>
> Reka
>
>
>
> On Mon, Jun 22, 2015 at 11:50 PM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> Hi Reka,
>
>
>
> I am not clear on the two properties you mention below. Are they supposed
> to be set in stratos.sh? I just picked up the latest code from the Apache
> Stratos repo and don’t see them.
>
>
>
> Btw, *read.write.lock.monitor.enabled=false* is what we use in our
> production code (I assume it defaults to false if not specified); I only
> enable it to provide additional information.
>
>
>
> Thanks
>
>
>
> Martin
>
>
>
> *From:* Reka Thirunavukkarasu [mailto:reka@wso2.com]
> *Sent:* Monday, June 22, 2015 7:30 AM
> *To:* Martin Eppel (meppel)
> *Cc:* dev@stratos.apache.org; Lasindu Charith (lasindu@wso2.com); Ryan Du
> Plessis (rdupless)
>
>
> *Subject:* Re: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> Hi Martin,
>
> I have verified the fix with read.write.lock.monitor.enabled=true, and it
> worked fine. Since we use concurrency and have delegated some flows to
> threads, I had to set the thread pool sizes to the values below in
> stratos.sh:
>
>     -Dapplication.monitor.thread.pool.size=50 \
>     -Dgroup.monitor.thread.pool.size=50 \
>
> Please note that *it is recommended to keep
> read.write.lock.monitor.enabled=false in production, as enabling it
> increases the footprint*. This property was introduced only for testing
> purposes.
>
> We are in the process of analyzing the thread usage and will come up with
> recommended values for it.
>
> Also, I have fixed a small issue in the REST endpoint, which used to return
> a default value whenever the application runtime was not found. Now, if the
> runtime is not found, the message below is returned:
>
> {"status":"error","message":"Application runtime not found"}
>
> I have also verified undeployment with group scaling and didn't find any
> issues with the above fixes.
>
> Please find the latest commit below:
>
> 0a969200d11228158606f011ca7e5e795f336d92
>
> Please note that only the error in [1] below was observed, and it is
> harmless for now. I have verified that things work fine with a workaround,
> but I will check its severity and decide whether to implement a proper fix
> or go with the workaround.
>
> [1]. TID: [0] [STRATOS] [2015-06-22 14:22:01,872] ERROR
> {org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor} -  System
> error, lock has not released for 30 seconds: [lock-name] topology
> [lock-type] Write [thread-id] 117 [thread-name] pool-24-thread-2
> [stack-trace]
> java.lang.Thread.getStackTrace(Thread.java:1589)
> org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:123)
> org.apache.stratos.messaging.message.processor.topology.updater.TopologyUpdater.acquireWriteLockForService(TopologyUpdater.java:123)
> org.apache.stratos.messaging.message.processor.topology.ApplicationClustersCreatedMessageProcessor.doProcess(ApplicationClustersCreatedMessageProcessor.java:78)
> org.apache.stratos.messaging.message.processor.topology.ApplicationClustersCreatedMessageProcessor.process(ApplicationClustersCreatedMessageProcessor.java:59)
> org.apache.stratos.messaging.message.processor.topology.ServiceRemovedMessageProcessor.process(ServiceRemovedMessageProcessor.java:64)
> org.apache.stratos.messaging.message.processor.topology.ServiceCreatedMessageProcessor.process(ServiceCreatedMessageProcessor.java:65)
> org.apache.stratos.messaging.message.processor.topology.CompleteTopologyMessageProcessor.process(CompleteTopologyMessageProcessor.java:74)
> org.apache.stratos.messaging.message.processor.MessageProcessorChain.process(MessageProcessorChain.java:61)
> org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator.run(TopologyEventMessageDelegator.java:73)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> java.lang.Thread.run(Thread.java:745)
>
> Thanks,
>
> Reka
>
>
>
>
>
>
>
> On Mon, Jun 22, 2015 at 12:24 PM, Reka Thirunavukkarasu <re...@wso2.com>
> wrote:
>
> Hi Martin,
>
> I found the reason why we didn't encounter these locking issues: we were
> testing with the default Stratos pack, which has
> read.write.lock.monitor.enabled=false. The locking warning is raised only
> when read.write.lock.monitor.enabled=true. That is why only you were
> facing these locking issues, as you use this configuration in your setup.
>
> Since I'm now able to reproduce the issue, I will test with the fix that I
> already pushed and update the thread.
>
> We will discuss and try to make read.write.lock.monitor.enabled=true the
> default in Stratos, so that we can find such issues early and fix them.
>
>
>
> Thanks,
>
> Reka
>
>
>
> On Mon, Jun 22, 2015 at 12:16 AM, Reka Thirunavukkarasu <re...@wso2.com>
> wrote:
>
> Sorry Martin, I had only fixed the issue locally. I have pushed it just
> now. Can you test with 1c21daaeea7b27ad0a0141a70b32e9443e78e309 when you
> get a chance? I will also continue testing with this fix.
>
> Thanks,
>
> Reka
>
>
>
> On Mon, Jun 22, 2015 at 12:07 AM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> Btw,
>
>
>
> This is the last commit I picked up from the Stratos master:
>
>
>
> commit 58bea52be814269416f70391fef50859aa5ae0a1
>
> Author: lasinducharith <la...@gmail.com>
>
> Date:   Fri Jun 19 19:40:27 2015 +0530
>
>
>
> *From:* Martin Eppel (meppel)
> *Sent:* Sunday, June 21, 2015 10:28 AM
> *To:* dev@stratos.apache.org; Reka Thirunavukkarasu
> *Cc:* Lasindu Charith (lasindu@wso2.com); Ryan Du Plessis (rdupless)
> *Subject:* RE: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> Hi Reka,
>
>
>
> Here is *another* example which fails; see the application at [1.] and the
> attached log files and JSON files. I ran a few scenarios; the one that is
> failing is the application named “s-g-c1-c2-c3” (last scenario). All
> members get removed but the application remains deployed:
>
> “s-g-c1-c2-c3: applicationInstances 0, groupInstances 0, clusterInstances
> 0, members 0 ()”
>
>
>
>
>
> Thanks
>
>
>
>
>
> Martin
>
>
> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
> *Sent:* Sunday, June 21, 2015 1:32 AM
> *To:* Reka Thirunavukkarasu
> *Cc:* dev; Lasindu Charith (lasindu@wso2.com); Ryan Du Plessis (rdupless)
> *Subject:* Re: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> Great! Thanks Reka!
>
>
>
> On Sun, Jun 21, 2015 at 8:34 AM, Reka Thirunavukkarasu <re...@wso2.com>
> wrote:
>
> Hi Martin/Imesh,
>
> Sure, I will have a look at the logs. I will also go through the recent
> commits and try to revert the fix that was added for nested group scaling,
> as it is not needed for this release and, as Martin mentioned, there are
> more issues after the fixes. Otherwise, we will have to go through another
> major testing effort.
>
> I will keep you updated on the progress.
>
>
>
> Thanks,
>
> Reka
>
>
>
> On Sun, Jun 21, 2015 at 8:14 AM, Imesh Gunaratne <im...@apache.org> wrote:
>
> Hi Martin,
>
>
>
> Thanks for the quick response. Yes we will definitely go through the logs
> and investigate this.
>
>
>
> Thanks
>
>
>
> On Sun, Jun 21, 2015 at 8:09 AM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> Hi Isuru,
>
>
>
> No, the issue does not seem to be resolved. With the latest code I see
> issues in test cases which used to work before (beyond the latest example
> I posted the log files for, see below); I am not sure yet what is going
> on. I will investigate further (to make sure I am not mistaken) and follow
> up with some examples after the weekend, but if you could take a look on
> Monday at the log files I provided with the previous email, that would be
> great.
>
>
>
> Thanks
>
>
>
> Martin
>
>
>
> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
> *Sent:* Saturday, June 20, 2015 7:29 PM
> *To:* dev
> *Cc:* Lasindu Charith (lasindu@wso2.com); Reka Thirunavukkarasu (
> reka@wso2.com); Ryan Du Plessis (rdupless)
> *Subject:* Re: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> Hi All,
>
>
>
> I'm sorry I could not follow the entire discussion.
>
> Can someone explain the latest status please? Have we resolved the initial
> group scaling issue, and are we now seeing an application removal problem?
>
>
>
> Thanks
>
>
>
> On Sat, Jun 20, 2015 at 2:06 AM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> Hi Lasindu, Reka,
>
>
>
>
>
> I just ran into the issue with removing the application *again* (with the
> fix for the issue included).
>
>
>
> Please see [1a., 1b.] for the application structure (group scaling defined
> at only one group level). See also the respective artifacts and log file
> attached.
>
>
>
> Please advise if we should reopen the JIRA
>
>
>
> Thanks
>
>
>
> Martin
>
>
>
>
>
> Application [1a.]
>
>
>
>
>
> [1b.] application after “starting application remove”
>
> --
>
> Imesh Gunaratne
>
>
>
> Senior Technical Lead, WSO2
>
> Committer & PMC Member, Apache Stratos
>
>
>
>
>
>   --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>
>
>
>
>



-- 
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007

RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by "Martin Eppel (meppel)" <me...@cisco.com>.
Thanks Reka

From: Reka Thirunavukkarasu [mailto:reka@wso2.com]
Sent: Monday, June 22, 2015 9:59 PM
To: Martin Eppel (meppel)
Cc: dev@stratos.apache.org; Lasindu Charith (lasindu@wso2.com); Ryan Du Plessis (rdupless)
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,
These are actually configurable parameters. In the Stratos code, these thread pool sizes are set to 20 by default. If we need to change them, we can pass them as system properties in stratos.sh. Since the Stratos code supplies default values, we don't need to provide them in the standalone pack. A complex application with more groups and clusters uses more threads, so the default pool size of 20 might get exhausted. It is therefore better to customize these properties according to the application structure. When I used the application sample that you attached to this thread, I faced issues such as event listeners not being triggered properly because the thread pool was exhausted. After I increased the thread pool size to 50, I didn't see any issues.
I'm in the process of analyzing the thread usage in order to decide on a recommended pool size for a given application structure, so that anyone can calculate the pool size their application requires and configure this parameter accordingly.
Hope this helps you understand those parameters.

Thanks,
Reka

On Mon, Jun 22, 2015 at 11:50 PM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi Reka,

I am not clear on the two properties you mention below. Are they supposed to be set in stratos.sh? I just picked up the latest code from the Apache Stratos repo and don’t see them.

Btw, read.write.lock.monitor.enabled=false is what we use in our production code (I assume it defaults to false if not specified); I only enable it to provide additional information.

Thanks

Martin

From: Reka Thirunavukkarasu [mailto:reka@wso2.com]
Sent: Monday, June 22, 2015 7:30 AM
To: Martin Eppel (meppel)
Cc: dev@stratos.apache.org; Lasindu Charith (lasindu@wso2.com); Ryan Du Plessis (rdupless)
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,
I have verified the fix with read.write.lock.monitor.enabled=true, and it worked fine. Since we use concurrency and have delegated some flows to threads, I had to set the thread pool sizes to the values below in stratos.sh:

    -Dapplication.monitor.thread.pool.size=50 \
    -Dgroup.monitor.thread.pool.size=50 \
Please note that it is recommended to keep read.write.lock.monitor.enabled=false in production, as enabling it increases the footprint. This property was introduced only for testing purposes.

We are in the process of analyzing the thread usage and will come up with recommended values for it.
Also, I have fixed a small issue in the REST endpoint, which used to return a default value whenever the application runtime was not found. Now, if the runtime is not found, the message below is returned:

{"status":"error","message":"Application runtime not found"}
I have also verified undeployment with group scaling and didn't find any issues with the above fixes.
Please find the latest commit below:

0a969200d11228158606f011ca7e5e795f336d92
Please note that only the error in [1] below was observed, and it is harmless for now. I have verified that things work fine with a workaround, but I will check its severity and decide whether to implement a proper fix or go with the workaround.

[1]. TID: [0] [STRATOS] [2015-06-22 14:22:01,872] ERROR {org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor} -  System error, lock has not released for 30 seconds: [lock-name] topology [lock-type] Write [thread-id] 117 [thread-name] pool-24-thread-2 [stack-trace]
java.lang.Thread.getStackTrace(Thread.java:1589)
org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:123)
org.apache.stratos.messaging.message.processor.topology.updater.TopologyUpdater.acquireWriteLockForService(TopologyUpdater.java:123)
org.apache.stratos.messaging.message.processor.topology.ApplicationClustersCreatedMessageProcessor.doProcess(ApplicationClustersCreatedMessageProcessor.java:78)
org.apache.stratos.messaging.message.processor.topology.ApplicationClustersCreatedMessageProcessor.process(ApplicationClustersCreatedMessageProcessor.java:59)
org.apache.stratos.messaging.message.processor.topology.ServiceRemovedMessageProcessor.process(ServiceRemovedMessageProcessor.java:64)
org.apache.stratos.messaging.message.processor.topology.ServiceCreatedMessageProcessor.process(ServiceCreatedMessageProcessor.java:65)
org.apache.stratos.messaging.message.processor.topology.CompleteTopologyMessageProcessor.process(CompleteTopologyMessageProcessor.java:74)
org.apache.stratos.messaging.message.processor.MessageProcessorChain.process(MessageProcessorChain.java:61)
org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator.run(TopologyEventMessageDelegator.java:73)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
Thanks,
Reka



On Mon, Jun 22, 2015 at 12:24 PM, Reka Thirunavukkarasu <re...@wso2.com> wrote:
Hi Martin,
I found the reason why we didn't encounter these locking issues: we were testing with the default Stratos pack, which has read.write.lock.monitor.enabled=false. The locking warning is raised only when read.write.lock.monitor.enabled=true. That is why only you were facing these locking issues, as you use this configuration in your setup.
Since I'm now able to reproduce the issue, I will test with the fix that I already pushed and update the thread.
We will discuss and try to make read.write.lock.monitor.enabled=true the default in Stratos, so that we can find such issues early and fix them.

Thanks,
Reka

On Mon, Jun 22, 2015 at 12:16 AM, Reka Thirunavukkarasu <re...@wso2.com> wrote:
Sorry Martin, I had only fixed the issue locally. I have pushed it just now. Can you test with 1c21daaeea7b27ad0a0141a70b32e9443e78e309 when you get a chance? I will also continue testing with this fix.
Thanks,
Reka

On Mon, Jun 22, 2015 at 12:07 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Btw,

This is the last commit I picked up from the Stratos master:

commit 58bea52be814269416f70391fef50859aa5ae0a1
Author: lasinducharith <la...@gmail.com>
Date:   Fri Jun 19 19:40:27 2015 +0530

From: Martin Eppel (meppel)
Sent: Sunday, June 21, 2015 10:28 AM
To: dev@stratos.apache.org; Reka Thirunavukkarasu
Cc: Lasindu Charith (lasindu@wso2.com); Ryan Du Plessis (rdupless)
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Reka,

Here is another example which fails; see the application at [1.] and the attached log files and JSON files. I ran a few scenarios; the one that is failing is the application named “s-g-c1-c2-c3” (last scenario). All members get removed but the application remains deployed:

“s-g-c1-c2-c3: applicationInstances 0, groupInstances 0, clusterInstances 0, members 0 ()”


Thanks


Martin




[cid:image001.png@01D0AD39.674B17A0]




From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: Sunday, June 21, 2015 1:32 AM
To: Reka Thirunavukkarasu
Cc: dev; Lasindu Charith (lasindu@wso2.com<ma...@wso2.com>); Ryan Du Plessis (rdupless)
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Great! Thanks Reka!

On Sun, Jun 21, 2015 at 8:34 AM, Reka Thirunavukkarasu <re...@wso2.com>> wrote:
Hi Martin/Imesh,
Sure..I will have a look on the logs. I will also go through the recent commits and try to revert the fix which added for nested group scaling as it is not needed for this release.  As Martin mentioned that after the fixes, there are more issues. Otherwise, we will have to go through another major effort in testing it.
I will update the progress of it...

Thanks,
Reka

On Sun, Jun 21, 2015 at 8:14 AM, Imesh Gunaratne <im...@apache.org>> wrote:
Hi Martin,

Thanks for the quick response. Yes we will definitely go through the logs and investigate this.

Thanks

On Sun, Jun 21, 2015 at 8:09 AM, Martin Eppel (meppel) <me...@cisco.com>> wrote:
Hi Isuru,

No, the issue does not seem to be resolved. With the latest code I see issues in test cases which used to work before  (beyond the latest example I posted the log files for - see below), not sure yet what is going on.  I will be investigating further (making sure I am not mistaken) and following up with some examples after the weekend but if you guys can take a look at the log files on Monday I provided with the previous email that would be great,

Thanks

Martin

From: Imesh Gunaratne [mailto:imesh@apache.org<ma...@apache.org>]
Sent: Saturday, June 20, 2015 7:29 PM
To: dev
Cc: Lasindu Charith (lasindu@wso2.com<ma...@wso2.com>); Reka Thirunavukkarasu (reka@wso2.com<ma...@wso2.com>); Ryan Du Plessis (rdupless)
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi All,

I'm sorry I could not follow the entire discussion.
Can someone explain the latest status please? Have we resolved the initial group scaling issue and now seeing an application removal problem?

Thanks

On Sat, Jun 20, 2015 at 2:06 AM, Martin Eppel (meppel) <me...@cisco.com>> wrote:
Hi Lasindu, Reka,


Just run into the issue with removing the application again: (with the fix for the issue included)

Please see [1a., 1b.] for the application structure (group scaling defined at only one group level). See also the respective artifacts and log file attached.

Please advise if we should reopen the JIRA

Thanks

Martin


Application [1a.]

[cid:image002.png@01D0AD39.674B17A0]

[1b.] application after “starting application remove”

[cid:image003.png@01D0AD39.674B17A0]









--
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos



--
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos


--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007




--
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos



--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007




--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007




--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007




--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007


Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by Reka Thirunavukkarasu <re...@wso2.com>.
Hi Martin,

These are configurable parameters. In the Stratos code, these thread pool
sizes default to 20. To change them, pass them as system properties in
stratos.sh. Because the Stratos code supplies the defaults, the standalone
pack does not need to set them. A complex application with more groups and
clusters uses more threads, so the default pool size of 20 can be
exhausted. It is therefore better to tune these properties to the
application structure. With the application sample you attached to this
thread, I saw event listeners fail to trigger properly because the thread
pool was exhausted. After I increased the thread pool size to 50, I did
not see any issues.

I'm analyzing the thread usage in order to recommend a pool size for a
given application structure, so that anyone can calculate the pool size
their application requires and configure this parameter accordingly.
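
As a rough illustration of the behaviour described above, here is a minimal
sketch of how a pool size could be read from a system property with a
fallback of 20. The class and method names (MonitorPoolConfig, poolSize,
newMonitorPool) are hypothetical and are not taken from the Stratos code
base; only the property names and the default of 20 come from this thread.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical helper for illustration only; not the actual Stratos code.
final class MonitorPoolConfig {
    // Default pool size mentioned in this thread.
    static final int DEFAULT_POOL_SIZE = 20;

    // Reads e.g. -Dapplication.monitor.thread.pool.size=50 and falls back
    // to the default when the property is missing, invalid, or not positive.
    static int poolSize(String propertyName) {
        Integer configured = Integer.getInteger(propertyName);
        return (configured != null && configured > 0) ? configured : DEFAULT_POOL_SIZE;
    }

    // Creates a fixed-size pool using the resolved size.
    static ExecutorService newMonitorPool(String propertyName) {
        return Executors.newFixedThreadPool(poolSize(propertyName));
    }

    public static void main(String[] args) {
        String name = "application.monitor.thread.pool.size";
        System.out.println(name + " resolves to " + poolSize(name));
        ExecutorService pool = newMonitorPool(name);
        pool.shutdown();
    }
}
```

With such a scheme, nothing has to be set in the standalone pack unless the
default of 20 turns out to be too small for the application structure.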

I hope this helps you understand these parameters.

Thanks,
Reka

On Mon, Jun 22, 2015 at 11:50 PM, Martin Eppel (meppel) <me...@cisco.com>
wrote:

>  Hi Reka,
>
>
>
> I am not clear on the 2 properties you mention below, are they supposed to
> be set in the stratos.sh ? I just picked up the latest code and from the
> apache stratos repo and don’t see them ?
>
>
>
> Btw,  *read.write.lock.monitor.enabled=false * is disabled in our
> production code (I assume it is set to false by default if not specified) ,
> I only enable it to provide additional information
>
>
>
> Thanks
>
>
>
> Martin
>
>
>
> *From:* Reka Thirunavukkarasu [mailto:reka@wso2.com]
> *Sent:* Monday, June 22, 2015 7:30 AM
> *To:* Martin Eppel (meppel)
> *Cc:* dev@stratos.apache.org; Lasindu Charith (lasindu@wso2.com); Ryan Du
> Plessis (rdupless)
>
> *Subject:* Re: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> Hi Martin,
>
> I have verified the fix by enabling read.write.lock.monitor.enabled=true.
> The fix worked fine with it. Since we are using concurrency and delegated
> some flow to Threads, i had to provide the thread values to below values in
> the stratos.sh.
>
>     -Dapplication.monitor.thread.pool.size=50 \
>     -Dgroup.monitor.thread.pool.size=50 \
>
> Please note that *it is recommended to have
> read.write.lock.monitor.enabled=false as it will consume more footprint in
> the production*. This property introduce only for the testing purpose.
>
>
>
> We are in the process of analyzing the thread size and will come up with a
> recommended values for it.
>
> Also, i have fixed a small issue in the REST endpoint as it returns some
> default value whenever application run time is not found. Now that if
> runtime is not found, the below message will get populated.
>
> {"status":"error","message":"Application runtime not found"}
>
> I have also verified the undeployment with group scaling. Didn't find any
> issues with the above fixes.
>
> Please find the latest commit as below:
>
> 0a969200d11228158606f011ca7e5e795f336d92.
>
> Please note that below error was only observed which is harmless for now.
> I have verified it with a workaround and working fine. But will check on
> the severity and decide on a proper fix or will go with the workaround.
>
> [1]. TID: [0] [STRATOS] [2015-06-22 14:22:01,872] ERROR
> {org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor} -  System
> error, lock has not released for 30 seconds: [lock-name] topology
> [lock-type] Write [thread-id] 117 [thread-name] pool-24-thread-2
> [stack-trace]
> java.lang.Thread.getStackTrace(Thread.java:1589)
>
> org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:123)
>
> org.apache.stratos.messaging.message.processor.topology.updater.TopologyUpdater.acquireWriteLockForService(TopologyUpdater.java:123)
>
> org.apache.stratos.messaging.message.processor.topology.ApplicationClustersCreatedMessageProcessor.doProcess(ApplicationClustersCreatedMessageProcessor.java:78)
>
> org.apache.stratos.messaging.message.processor.topology.ApplicationClustersCreatedMessageProcessor.process(ApplicationClustersCreatedMessageProcessor.java:59)
>
> org.apache.stratos.messaging.message.processor.topology.ServiceRemovedMessageProcessor.process(ServiceRemovedMessageProcessor.java:64)
>
> org.apache.stratos.messaging.message.processor.topology.ServiceCreatedMessageProcessor.process(ServiceCreatedMessageProcessor.java:65)
>
> org.apache.stratos.messaging.message.processor.topology.CompleteTopologyMessageProcessor.process(CompleteTopologyMessageProcessor.java:74)
>
> org.apache.stratos.messaging.message.processor.MessageProcessorChain.process(MessageProcessorChain.java:61)
>
> org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator.run(TopologyEventMessageDelegator.java:73)
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> java.lang.Thread.run(Thread.java:745)
>
> Thanks,
>
> Reka
>
>
>
>
>
>
>
> On Mon, Jun 22, 2015 at 12:24 PM, Reka Thirunavukkarasu <re...@wso2.com>
> wrote:
>
> Hi Martin,
>
> Found the reason why we didn't encounter these locking issue as we were
> testing with default stratos pack which has
> read.write.lock.monitor.enabled=false. The locking warning or issue is
> raised only when you use read.write.lock.monitor.enabled=true. That's why
> you were only facing these locking issue as you use this configuration in
> your setup.
>
> Since I'm able to reproduce the issue, i will test with the fix that i
> already pushed and update the thread.
>
> We will discuss and try to make this read.write.lock.monitor.enabled=true
> by default with stratos. So that we can find issues as early and fix them.
>
>
>
> Thanks,
>
> Reka
>
>
>
> On Mon, Jun 22, 2015 at 12:16 AM, Reka Thirunavukkarasu <re...@wso2.com>
> wrote:
>
> Sorry Martin..I have only locally fixed the issue. I have pushed it now
> only. Can you test with 1c21daaeea7b27ad0a0141a70b32e9443e78e309 when you
> get chance? I will also continue testing with this fix.
>
> Thanks,
>
> Reka
>
>
>
> On Mon, Jun 22, 2015 at 12:07 AM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> Btw,
>
>
>
> This is my last commit I picked up from the stratos master:
>
>
>
> commit 58bea52be814269416f70391fef50859aa5ae0a1
>
> Author: lasinducharith <la...@gmail.com>
>
> Date:   Fri Jun 19 19:40:27 2015 +0530
>
>
>
> *From:* Martin Eppel (meppel)
> *Sent:* Sunday, June 21, 2015 10:28 AM
> *To:* dev@stratos.apache.org; Reka Thirunavukkarasu
> *Cc:* Lasindu Charith (lasindu@wso2.com); Ryan Du Plessis (rdupless)
> *Subject:* RE: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> Hi Reka,
>
>
>
> Here is *another* example which fails, see application at [1.], attached
> log files and jsons.  I run a few scenarios, the one which is failing is
> application with the name “s-g-c1-c2-c3” (last scenario). All members get
> removed but application remains deployed,
>
> “s-g-c1-c2-c3: applicationInstances 0, groupInstances 0, clusterInstances
> 0, members 0 ()”
>
>
>
>
>
> Thanks
>
>
>
>
>
> Martin
>
> *From:* Imesh Gunaratne [mailto:imesh@apache.org <im...@apache.org>]
> *Sent:* Sunday, June 21, 2015 1:32 AM
> *To:* Reka Thirunavukkarasu
> *Cc:* dev; Lasindu Charith (lasindu@wso2.com); Ryan Du Plessis (rdupless)
> *Subject:* Re: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> Great! Thanks Reka!
>
>
>
> On Sun, Jun 21, 2015 at 8:34 AM, Reka Thirunavukkarasu <re...@wso2.com>
> wrote:
>
> Hi Martin/Imesh,
>
> Sure..I will have a look on the logs. I will also go through the recent
> commits and try to revert the fix which added for nested group scaling as
> it is not needed for this release.  As Martin mentioned that after the
> fixes, there are more issues. Otherwise, we will have to go through another
> major effort in testing it.
>
> I will update the progress of it...
>
>
>
> Thanks,
>
> Reka
>
>
>
> On Sun, Jun 21, 2015 at 8:14 AM, Imesh Gunaratne <im...@apache.org> wrote:
>
> Hi Martin,
>
>
>
> Thanks for the quick response. Yes we will definitely go through the logs
> and investigate this.
>
>
>
> Thanks
>
>
>
> On Sun, Jun 21, 2015 at 8:09 AM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> Hi Isuru,
>
>
>
> No, the issue does not seem to be resolved. With the latest code I see
> issues in test cases which used to work before  (beyond the latest example
> I posted the log files for - see below), not sure yet what is going on.  I
> will be investigating further (making sure I am not mistaken) and following
> up with some examples after the weekend but if you guys can take a look at
> the log files on Monday I provided with the previous email that would be
> great,
>
>
>
> Thanks
>
>
>
> Martin
>
>
>
> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
> *Sent:* Saturday, June 20, 2015 7:29 PM
> *To:* dev
> *Cc:* Lasindu Charith (lasindu@wso2.com); Reka Thirunavukkarasu (
> reka@wso2.com); Ryan Du Plessis (rdupless)
> *Subject:* Re: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> Hi All,
>
>
>
> I'm sorry I could not follow the entire discussion.
>
> Can someone explain the latest status please? Have we resolved the initial
> group scaling issue and now seeing an application removal problem?
>
>
>
> Thanks
>
>
>
> On Sat, Jun 20, 2015 at 2:06 AM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> Hi Lasindu, Reka,
>
>
>
>
>
> Just run into the issue with removing the application *again*: (with the
> fix for the issue included)
>
>
>
> Please see [1a., 1b.] for the application structure (group scaling defined
> at only one group level). See also the respective artifacts and log file
> attached.
>
>
>
> Please advise if we should reopen the JIRA
>
>
>
> Thanks
>
>
>
> Martin
>
>
>
>
>
> Application [1a.]
>
>
>
>
>
> [1b.] application after “starting application remove”
>
> --
>
> Imesh Gunaratne
>
>
>
> Senior Technical Lead, WSO2
>
> Committer & PMC Member, Apache Stratos
>
>
>
>
>
> --
>
> Imesh Gunaratne
>
>
>
> Senior Technical Lead, WSO2
>
> Committer & PMC Member, Apache Stratos
>
>
>
>   --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>
>
>
>
>
> --
>
> Imesh Gunaratne
>
>
>
> Senior Technical Lead, WSO2
>
> Committer & PMC Member, Apache Stratos
>
>
>
>
> --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>
>
>
>
> --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>
>
>
>
> --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>



-- 
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007

RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by "Martin Eppel (meppel)" <me...@cisco.com>.
Hi Reka,

I am not clear on the two properties you mention below. Are they supposed to be set in stratos.sh? I just picked up the latest code from the Apache Stratos repo and don’t see them.

Btw, read.write.lock.monitor.enabled is set to false in our production code (I assume it defaults to false if not specified). I only enable it to provide additional information.
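
To illustrate the assumption about the default: if the flag is read as a
plain Java system property, an unset property naturally resolves to false.
The sketch below is only a guess at that pattern; the class name
LockMonitorFlag is hypothetical and this is not the actual Stratos code.

```java
// Hypothetical helper for illustration only; not the actual Stratos code.
final class LockMonitorFlag {
    static final String PROPERTY = "read.write.lock.monitor.enabled";

    // Returns true only when the property is explicitly set to "true"
    // (case-insensitive); an unset property yields false.
    static boolean isEnabled() {
        return Boolean.parseBoolean(System.getProperty(PROPERTY, "false"));
    }

    public static void main(String[] args) {
        System.out.println(PROPERTY + "=" + isEnabled());
    }
}
```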

Thanks

Martin

From: Reka Thirunavukkarasu [mailto:reka@wso2.com]
Sent: Monday, June 22, 2015 7:30 AM
To: Martin Eppel (meppel)
Cc: dev@stratos.apache.org; Lasindu Charith (lasindu@wso2.com); Ryan Du Plessis (rdupless)
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,
I have verified the fix by enabling read.write.lock.monitor.enabled=true. The fix worked fine with it. Since we are using concurrency and delegated some flow to Threads, i had to provide the thread values to below values in the stratos.sh.

    -Dapplication.monitor.thread.pool.size=50 \
    -Dgroup.monitor.thread.pool.size=50 \
Please note that it is recommended to have read.write.lock.monitor.enabled=false as it will consume more footprint in the production. This property introduce only for the testing purpose.

We are in the process of analyzing the thread size and will come up with a recommended values for it.
Also, i have fixed a small issue in the REST endpoint as it returns some default value whenever application run time is not found. Now that if runtime is not found, the below message will get populated.

{"status":"error","message":"Application runtime not found"}
I have also verified the undeployment with group scaling. Didn't find any issues with the above fixes.
Please find the latest commit as below:

0a969200d11228158606f011ca7e5e795f336d92.
Please note that below error was only observed which is harmless for now. I have verified it with a workaround and working fine. But will check on the severity and decide on a proper fix or will go with the workaround.

[1]. TID: [0] [STRATOS] [2015-06-22 14:22:01,872] ERROR {org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor} -  System error, lock has not released for 30 seconds: [lock-name] topology [lock-type] Write [thread-id] 117 [thread-name] pool-24-thread-2 [stack-trace]
java.lang.Thread.getStackTrace(Thread.java:1589)
org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:123)
org.apache.stratos.messaging.message.processor.topology.updater.TopologyUpdater.acquireWriteLockForService(TopologyUpdater.java:123)
org.apache.stratos.messaging.message.processor.topology.ApplicationClustersCreatedMessageProcessor.doProcess(ApplicationClustersCreatedMessageProcessor.java:78)
org.apache.stratos.messaging.message.processor.topology.ApplicationClustersCreatedMessageProcessor.process(ApplicationClustersCreatedMessageProcessor.java:59)
org.apache.stratos.messaging.message.processor.topology.ServiceRemovedMessageProcessor.process(ServiceRemovedMessageProcessor.java:64)
org.apache.stratos.messaging.message.processor.topology.ServiceCreatedMessageProcessor.process(ServiceCreatedMessageProcessor.java:65)
org.apache.stratos.messaging.message.processor.topology.CompleteTopologyMessageProcessor.process(CompleteTopologyMessageProcessor.java:74)
org.apache.stratos.messaging.message.processor.MessageProcessorChain.process(MessageProcessorChain.java:61)
org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator.run(TopologyEventMessageDelegator.java:73)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
Thanks,
Reka



On Mon, Jun 22, 2015 at 12:24 PM, Reka Thirunavukkarasu <re...@wso2.com>> wrote:
Hi Martin,
Found the reason why we didn't encounter these locking issue as we were testing with default stratos pack which has read.write.lock.monitor.enabled=false. The locking warning or issue is raised only when you use read.write.lock.monitor.enabled=true. That's why you were only facing these locking issue as you use this configuration in your setup.
Since I'm able to reproduce the issue, i will test with the fix that i already pushed and update the thread.
We will discuss and try to make this read.write.lock.monitor.enabled=true by default with stratos. So that we can find issues as early and fix them.

Thanks,
Reka

On Mon, Jun 22, 2015 at 12:16 AM, Reka Thirunavukkarasu <re...@wso2.com>> wrote:
Sorry Martin..I have only locally fixed the issue. I have pushed it now only. Can you test with 1c21daaeea7b27ad0a0141a70b32e9443e78e309 when you get chance? I will also continue testing with this fix.
Thanks,
Reka

On Mon, Jun 22, 2015 at 12:07 AM, Martin Eppel (meppel) <me...@cisco.com>> wrote:
Btw,

This is my last commit I picked up from the stratos master:

commit 58bea52be814269416f70391fef50859aa5ae0a1
Author: lasinducharith <la...@gmail.com>>
Date:   Fri Jun 19 19:40:27 2015 +0530

From: Martin Eppel (meppel)
Sent: Sunday, June 21, 2015 10:28 AM
To: dev@stratos.apache.org<ma...@stratos.apache.org>; Reka Thirunavukkarasu
Cc: Lasindu Charith (lasindu@wso2.com<ma...@wso2.com>); Ryan Du Plessis (rdupless)
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Reka,

Here is another example which fails, see application at [1.], attached log files and jsons.  I run a few scenarios, the one which is failing is application with the name “s-g-c1-c2-c3” (last scenario). All members get removed but application remains deployed,

“s-g-c1-c2-c3: applicationInstances 0, groupInstances 0, clusterInstances 0, members 0 ()”


Thanks


Martin




[cid:image001.png@01D0ACDD.5F28BE80]




From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: Sunday, June 21, 2015 1:32 AM
To: Reka Thirunavukkarasu
Cc: dev; Lasindu Charith (lasindu@wso2.com<ma...@wso2.com>); Ryan Du Plessis (rdupless)
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Great! Thanks Reka!

On Sun, Jun 21, 2015 at 8:34 AM, Reka Thirunavukkarasu <re...@wso2.com>> wrote:
Hi Martin/Imesh,
Sure..I will have a look on the logs. I will also go through the recent commits and try to revert the fix which added for nested group scaling as it is not needed for this release.  As Martin mentioned that after the fixes, there are more issues. Otherwise, we will have to go through another major effort in testing it.
I will update the progress of it...

Thanks,
Reka

On Sun, Jun 21, 2015 at 8:14 AM, Imesh Gunaratne <im...@apache.org>> wrote:
Hi Martin,

Thanks for the quick response. Yes we will definitely go through the logs and investigate this.

Thanks

On Sun, Jun 21, 2015 at 8:09 AM, Martin Eppel (meppel) <me...@cisco.com>> wrote:
Hi Isuru,

No, the issue does not seem to be resolved. With the latest code I see issues in test cases which used to work before  (beyond the latest example I posted the log files for - see below), not sure yet what is going on.  I will be investigating further (making sure I am not mistaken) and following up with some examples after the weekend but if you guys can take a look at the log files on Monday I provided with the previous email that would be great,

Thanks

Martin

From: Imesh Gunaratne [mailto:imesh@apache.org<ma...@apache.org>]
Sent: Saturday, June 20, 2015 7:29 PM
To: dev
Cc: Lasindu Charith (lasindu@wso2.com<ma...@wso2.com>); Reka Thirunavukkarasu (reka@wso2.com<ma...@wso2.com>); Ryan Du Plessis (rdupless)
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi All,

I'm sorry I could not follow the entire discussion.
Can someone explain the latest status please? Have we resolved the initial group scaling issue and now seeing an application removal problem?

Thanks

On Sat, Jun 20, 2015 at 2:06 AM, Martin Eppel (meppel) <me...@cisco.com>> wrote:
Hi Lasindu, Reka,


Just run into the issue with removing the application again: (with the fix for the issue included)

Please see [1a., 1b.] for the application structure (group scaling defined at only one group level). See also the respective artifacts and log file attached.

Please advise if we should reopen the JIRA

Thanks

Martin


Application [1a.]

[cid:image002.png@01D0ACDD.5F28BE80]

[1b.] application after “starting application remove”

[cid:image003.png@01D0ACDD.5F28BE80]









--
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos



--
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos


--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007




--
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos



--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007




--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007




--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007


Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by Reka Thirunavukkarasu <re...@wso2.com>.
Hi Martin,

I have verified the fix with read.write.lock.monitor.enabled=true, and it
worked fine. Since we use concurrency and delegate some of the flow to
threads, I had to set the thread pool sizes to the values below in
stratos.sh.

    -Dapplication.monitor.thread.pool.size=50 \
    -Dgroup.monitor.thread.pool.size=50 \

Please note that *it is recommended to keep
read.write.lock.monitor.enabled=false in production, as enabling it
increases the footprint*. This property was introduced only for testing
purposes.

We are analyzing the thread pool sizes and will come up with recommended
values.

Also, I have fixed a small issue in the REST endpoint, which returned a
default value whenever the application runtime was not found. Now, if the
runtime is not found, the message below is returned.

{"status":"error","message":"Application runtime not found"}

I have also verified undeployment with group scaling and didn't find any
issues with the above fixes.

The latest commit is:

0a969200d11228158606f011ca7e5e795f336d92

Please note that the error below [1] was the only one observed, and it is
harmless for now. I have verified that a workaround works fine, but I will
check the severity and decide whether to make a proper fix or go with the
workaround.

[1]. TID: [0] [STRATOS] [2015-06-22 14:22:01,872] ERROR
{org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor} -  System
error, lock has not released for 30 seconds: [lock-name] topology
[lock-type] Write [thread-id] 117 [thread-name] pool-24-thread-2
[stack-trace]
java.lang.Thread.getStackTrace(Thread.java:1589)
org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:123)
org.apache.stratos.messaging.message.processor.topology.updater.TopologyUpdater.acquireWriteLockForService(TopologyUpdater.java:123)
org.apache.stratos.messaging.message.processor.topology.ApplicationClustersCreatedMessageProcessor.doProcess(ApplicationClustersCreatedMessageProcessor.java:78)
org.apache.stratos.messaging.message.processor.topology.ApplicationClustersCreatedMessageProcessor.process(ApplicationClustersCreatedMessageProcessor.java:59)
org.apache.stratos.messaging.message.processor.topology.ServiceRemovedMessageProcessor.process(ServiceRemovedMessageProcessor.java:64)
org.apache.stratos.messaging.message.processor.topology.ServiceCreatedMessageProcessor.process(ServiceCreatedMessageProcessor.java:65)
org.apache.stratos.messaging.message.processor.topology.CompleteTopologyMessageProcessor.process(CompleteTopologyMessageProcessor.java:74)
org.apache.stratos.messaging.message.processor.MessageProcessorChain.process(MessageProcessorChain.java:61)
org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator.run(TopologyEventMessageDelegator.java:73)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:745)
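
For readers unfamiliar with this kind of check, here is a minimal sketch of
how a lock monitor could detect a lock held longer than a threshold such as
the 30 seconds in the error above. It is an illustration under assumptions,
not the actual ReadWriteLockMonitor implementation: the class name
LockHoldTracker and its methods are hypothetical, and timestamps are passed
in explicitly to keep the example deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch for illustration only; not the actual Stratos code.
final class LockHoldTracker {
    private final long thresholdMillis;
    // Maps "lockName#threadId" to the time the lock was acquired.
    private final Map<String, Long> acquiredAt = new ConcurrentHashMap<>();

    LockHoldTracker(long thresholdMillis) {
        this.thresholdMillis = thresholdMillis;
    }

    private static String key(String lockName, long threadId) {
        return lockName + "#" + threadId;
    }

    // Record that the given thread acquired the named lock at nowMillis.
    void onAcquire(String lockName, long threadId, long nowMillis) {
        acquiredAt.put(key(lockName, threadId), nowMillis);
    }

    // Returns false when the releasing thread never recorded an acquire,
    // i.e. a release by a thread that did not take the lock.
    boolean onRelease(String lockName, long threadId) {
        return acquiredAt.remove(key(lockName, threadId)) != null;
    }

    // Lists the lock/thread pairs held for at least the threshold.
    List<String> overdue(long nowMillis) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> entry : acquiredAt.entrySet()) {
            if (nowMillis - entry.getValue() >= thresholdMillis) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

Keeping such bookkeeping for every acquire and release is extra work and
memory, which is consistent with the advice to leave the monitor disabled
in production.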

Thanks,
Reka



On Mon, Jun 22, 2015 at 12:24 PM, Reka Thirunavukkarasu <re...@wso2.com>
wrote:

> Hi Martin,
>
> Found the reason why we didn't encounter these locking issue as we were
> testing with default stratos pack which has
> read.write.lock.monitor.enabled=false. The locking warning or issue is
> raised only when you use read.write.lock.monitor.enabled=true. That's why
> you were only facing these locking issue as you use this configuration in
> your setup.
>
> Since I'm able to reproduce the issue, i will test with the fix that i
> already pushed and update the thread.
>
> We will discuss and try to make this read.write.lock.monitor.enabled=true
> by default with stratos. So that we can find issues as early and fix them.
>
> Thanks,
> Reka
>
> On Mon, Jun 22, 2015 at 12:16 AM, Reka Thirunavukkarasu <re...@wso2.com>
> wrote:
>
>> Sorry Martin..I have only locally fixed the issue. I have pushed it now
>> only. Can you test with 1c21daaeea7b27ad0a0141a70b32e9443e78e309 when you
>> get chance? I will also continue testing with this fix.
>>
>> Thanks,
>> Reka
>>
>> On Mon, Jun 22, 2015 at 12:07 AM, Martin Eppel (meppel) <meppel@cisco.com
>> > wrote:
>>
>>>  Btw,
>>>
>>>
>>>
>>> This is my last commit I picked up from the stratos master:
>>>
>>>
>>>
>>> commit 58bea52be814269416f70391fef50859aa5ae0a1
>>>
>>> Author: lasinducharith <la...@gmail.com>
>>>
>>> Date:   Fri Jun 19 19:40:27 2015 +0530
>>>
>>>
>>>
>>> *From:* Martin Eppel (meppel)
>>> *Sent:* Sunday, June 21, 2015 10:28 AM
>>> *To:* dev@stratos.apache.org; Reka Thirunavukkarasu
>>> *Cc:* Lasindu Charith (lasindu@wso2.com); Ryan Du Plessis (rdupless)
>>> *Subject:* RE: Testing Stratos 4.1: Application undeployment:
>>> application fails to undeploy (nested grouping, group scaling)
>>>
>>>
>>>
>>> Hi Reka,
>>>
>>>
>>>
>>> Here is *another* example which fails, see application at [1.],
>>> attached log files and jsons.  I run a few scenarios, the one which is
>>> failing is application with the name “s-g-c1-c2-c3” (last scenario). All
>>> members get removed but application remains deployed,
>>>
>>> “s-g-c1-c2-c3: applicationInstances 0, groupInstances 0,
>>> clusterInstances 0, members 0 ()”
>>>
>>>
>>>
>>>
>>>
>>> Thanks
>>>
>>>
>>>
>>>
>>>
>>> Martin
>>>
>>> *From:* Imesh Gunaratne [mailto:imesh@apache.org <im...@apache.org>]
>>> *Sent:* Sunday, June 21, 2015 1:32 AM
>>> *To:* Reka Thirunavukkarasu
>>> *Cc:* dev; Lasindu Charith (lasindu@wso2.com); Ryan Du Plessis
>>> (rdupless)
>>> *Subject:* Re: Testing Stratos 4.1: Application undeployment:
>>> application fails to undeploy (nested grouping, group scaling)
>>>
>>>
>>>
>>> Great! Thanks Reka!
>>>
>>>
>>>
>>> On Sun, Jun 21, 2015 at 8:34 AM, Reka Thirunavukkarasu <re...@wso2.com>
>>> wrote:
>>>
>>> Hi Martin/Imesh,
>>>
>>> Sure..I will have a look on the logs. I will also go through the recent
>>> commits and try to revert the fix which added for nested group scaling as
>>> it is not needed for this release.  As Martin mentioned that after the
>>> fixes, there are more issues. Otherwise, we will have to go through another
>>> major effort in testing it.
>>>
>>> I will update the progress of it...
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Reka
>>>
>>>
>>>
>>> On Sun, Jun 21, 2015 at 8:14 AM, Imesh Gunaratne <im...@apache.org>
>>> wrote:
>>>
>>> Hi Martin,
>>>
>>>
>>>
>>> Thanks for the quick response. Yes we will definitely go through the
>>> logs and investigate this.
>>>
>>>
>>>
>>> Thanks
>>>
>>>
>>>
>>> On Sun, Jun 21, 2015 at 8:09 AM, Martin Eppel (meppel) <me...@cisco.com>
>>> wrote:
>>>
>>> Hi Isuru,
>>>
>>>
>>>
>>> No, the issue does not seem to be resolved. With the latest code I see
>>> issues in test cases which used to work before  (beyond the latest example
>>> I posted the log files for - see below), not sure yet what is going on.  I
>>> will be investigating further (making sure I am not mistaken) and following
>>> up with some examples after the weekend but if you guys can take a look at
>>> the log files on Monday I provided with the previous email that would be
>>> great,
>>>
>>>
>>>
>>> Thanks
>>>
>>>
>>>
>>> Martin
>>>
>>>
>>>
>>> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
>>> *Sent:* Saturday, June 20, 2015 7:29 PM
>>> *To:* dev
>>> *Cc:* Lasindu Charith (lasindu@wso2.com); Reka Thirunavukkarasu (
>>> reka@wso2.com); Ryan Du Plessis (rdupless)
>>> *Subject:* Re: Testing Stratos 4.1: Application undeployment:
>>> application fails to undeploy (nested grouping, group scaling)
>>>
>>>
>>>
>>> Hi All,
>>>
>>>
>>>
>>> I'm sorry I could not follow the entire discussion.
>>>
>>> Can someone explain the latest status please? Have we resolved the
>>> initial group scaling issue and now seeing an application removal problem?
>>>
>>>
>>>
>>> Thanks
>>>
>>>
>>>
>>> On Sat, Jun 20, 2015 at 2:06 AM, Martin Eppel (meppel) <me...@cisco.com>
>>> wrote:
>>>
>>> Hi Lasindu, Reka,
>>>
>>>
>>>
>>>
>>>
>>> Just run into the issue with removing the application *again*: (with
>>> the fix for the issue included)
>>>
>>>
>>>
>>> Please see [1a., 1b.] for the application structure (group scaling
>>> defined at only one group level). See also the respective artifacts and log
>>> file attached.
>>>
>>>
>>>
>>> Please advise if we should reopen the JIRA
>>>
>>>
>>>
>>> Thanks
>>>
>>>
>>>
>>> Martin
>>>
>>>
>>>
>>>
>>>
>>> Application [1a.]
>>>
>>>
>>>
>>>
>>>
>>> [1b.] application after “starting application remove”
>>>
>>> --
>>>
>>> Imesh Gunaratne
>>>
>>>
>>>
>>> Senior Technical Lead, WSO2
>>>
>>> Committer & PMC Member, Apache Stratos
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Imesh Gunaratne
>>>
>>>
>>>
>>> Senior Technical Lead, WSO2
>>>
>>> Committer & PMC Member, Apache Stratos
>>>
>>>
>>>
>>>   --
>>>
>>> Reka Thirunavukkarasu
>>> Senior Software Engineer,
>>> WSO2, Inc.:http://wso2.com,
>>>
>>> Mobile: +94776442007
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Imesh Gunaratne
>>>
>>>
>>>
>>> Senior Technical Lead, WSO2
>>>
>>> Committer & PMC Member, Apache Stratos
>>>
>>
>>
>>
>> --
>> Reka Thirunavukkarasu
>> Senior Software Engineer,
>> WSO2, Inc.:http://wso2.com,
>> Mobile: +94776442007
>>
>>
>>
>
>
> --
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
> Mobile: +94776442007
>
>
>


-- 
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007

Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by Reka Thirunavukkarasu <re...@wso2.com>.
Hi Martin,

I found the reason why we didn't encounter these locking issues: we were
testing with the default Stratos pack, which has
read.write.lock.monitor.enabled=false. The locking warning is raised only
when read.write.lock.monitor.enabled=true is set. That's why only you were
facing these locking issues, since you use this configuration in your
setup.
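
For readers unfamiliar with this kind of warning, here is a minimal sketch of the
underlying condition, a lock being released by a thread that did not take it.
It uses only the standard java.util.concurrent ReentrantReadWriteLock and is
not the Stratos ReadWriteLock or its monitor; the class and method names
(LockOwnerCheck, releaseByNonOwnerIsRejected) are hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration only; not Stratos code.
public class LockOwnerCheck {

    // Returns true if an unlock attempted by a thread that does not hold
    // the write lock is rejected by the JDK lock implementation.
    static boolean releaseByNonOwnerIsRejected() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        AtomicBoolean rejected = new AtomicBoolean(false);

        lock.writeLock().lock(); // the calling thread takes the write lock
        try {
            Thread other = new Thread(() -> {
                try {
                    // A different thread tries to release a lock it never took.
                    lock.writeLock().unlock();
                } catch (IllegalMonitorStateException e) {
                    rejected.set(true);
                }
            });
            other.start();
            other.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            lock.writeLock().unlock(); // the owning thread releases normally
        }
        return rejected.get();
    }

    public static void main(String[] args) {
        System.out.println("release by non-owner rejected: "
                + releaseByNonOwnerIsRejected());
    }
}
```

With the JDK lock the mismatch surfaces as an IllegalMonitorStateException; a
monitoring wrapper could presumably log a warning at that point instead, which
would match the behaviour described above.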

Since I'm now able to reproduce the issue, I will test with the fix that I
already pushed and update the thread.

We will discuss making read.write.lock.monitor.enabled=true the default in
Stratos, so that we can find such issues early and fix them.

Thanks,
Reka

On Mon, Jun 22, 2015 at 12:16 AM, Reka Thirunavukkarasu <re...@wso2.com>
wrote:

> Sorry Martin..I have only locally fixed the issue. I have pushed it now
> only. Can you test with 1c21daaeea7b27ad0a0141a70b32e9443e78e309 when you
> get chance? I will also continue testing with this fix.
>
> Thanks,
> Reka
>
> On Mon, Jun 22, 2015 at 12:07 AM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
>>  Btw,
>>
>>
>>
>> This is my last commit I picked up from the stratos master:
>>
>>
>>
>> commit 58bea52be814269416f70391fef50859aa5ae0a1
>>
>> Author: lasinducharith <la...@gmail.com>
>>
>> Date:   Fri Jun 19 19:40:27 2015 +0530
>>
>>
>>
>> *From:* Martin Eppel (meppel)
>> *Sent:* Sunday, June 21, 2015 10:28 AM
>> *To:* dev@stratos.apache.org; Reka Thirunavukkarasu
>> *Cc:* Lasindu Charith (lasindu@wso2.com); Ryan Du Plessis (rdupless)
>> *Subject:* RE: Testing Stratos 4.1: Application undeployment:
>> application fails to undeploy (nested grouping, group scaling)
>>
>>
>>
>> Hi Reka,
>>
>>
>>
>> Here is *another* example which fails, see application at [1.], attached
>> log files and jsons.  I run a few scenarios, the one which is failing is
>> application with the name “s-g-c1-c2-c3” (last scenario). All members get
>> removed but application remains deployed,
>>
>> “s-g-c1-c2-c3: applicationInstances 0, groupInstances 0, clusterInstances
>> 0, members 0 ()”
>>
>>
>>
>>
>>
>> Thanks
>>
>>
>>
>>
>>
>> Martin
>>
>> *From:* Imesh Gunaratne [mailto:imesh@apache.org <im...@apache.org>]
>> *Sent:* Sunday, June 21, 2015 1:32 AM
>> *To:* Reka Thirunavukkarasu
>> *Cc:* dev; Lasindu Charith (lasindu@wso2.com); Ryan Du Plessis (rdupless)
>> *Subject:* Re: Testing Stratos 4.1: Application undeployment:
>> application fails to undeploy (nested grouping, group scaling)
>>
>>
>>
>> Great! Thanks Reka!
>>
>>
>>
>> On Sun, Jun 21, 2015 at 8:34 AM, Reka Thirunavukkarasu <re...@wso2.com>
>> wrote:
>>
>> Hi Martin/Imesh,
>>
>> Sure..I will have a look on the logs. I will also go through the recent
>> commits and try to revert the fix which added for nested group scaling as
>> it is not needed for this release.  As Martin mentioned that after the
>> fixes, there are more issues. Otherwise, we will have to go through another
>> major effort in testing it.
>>
>> I will update the progress of it...
>>
>>
>>
>> Thanks,
>>
>> Reka
>>
>>
>>
>> On Sun, Jun 21, 2015 at 8:14 AM, Imesh Gunaratne <im...@apache.org>
>> wrote:
>>
>> Hi Martin,
>>
>>
>>
>> Thanks for the quick response. Yes we will definitely go through the logs
>> and investigate this.
>>
>>
>>
>> Thanks
>>
>>
>>
>> On Sun, Jun 21, 2015 at 8:09 AM, Martin Eppel (meppel) <me...@cisco.com>
>> wrote:
>>
>> Hi Isuru,
>>
>>
>>
>> No, the issue does not seem to be resolved. With the latest code I see
>> issues in test cases which used to work before  (beyond the latest example
>> I posted the log files for - see below), not sure yet what is going on.  I
>> will be investigating further (making sure I am not mistaken) and following
>> up with some examples after the weekend but if you guys can take a look at
>> the log files on Monday I provided with the previous email that would be
>> great,
>>
>>
>>
>> Thanks
>>
>>
>>
>> Martin
>>
>>
>>
>> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
>> *Sent:* Saturday, June 20, 2015 7:29 PM
>> *To:* dev
>> *Cc:* Lasindu Charith (lasindu@wso2.com); Reka Thirunavukkarasu (
>> reka@wso2.com); Ryan Du Plessis (rdupless)
>> *Subject:* Re: Testing Stratos 4.1: Application undeployment:
>> application fails to undeploy (nested grouping, group scaling)
>>
>>
>>
>> Hi All,
>>
>>
>>
>> I'm sorry I could not follow the entire discussion.
>>
>> Can someone explain the latest status please? Have we resolved the
>> initial group scaling issue and now seeing an application removal problem?
>>
>>
>>
>> Thanks
>>
>>
>>
>> On Sat, Jun 20, 2015 at 2:06 AM, Martin Eppel (meppel) <me...@cisco.com>
>> wrote:
>>
>> Hi Lasindu, Reka,
>>
>>
>>
>>
>>
>> Just run into the issue with removing the application *again*: (with the
>> fix for the issue included)
>>
>>
>>
>> Please see [1a., 1b.] for the application structure (group scaling
>> defined at only one group level). See also the respective artifacts and log
>> file attached.
>>
>>
>>
>> Please advise if we should reopen the JIRA
>>
>>
>>
>> Thanks
>>
>>
>>
>> Martin
>>
>>
>>
>>
>>
>> Application [1a.]
>>
>>
>>
>>
>>
>> [1b.] application after “starting application remove”
>>
>> --
>>
>> Imesh Gunaratne
>>
>>
>>
>> Senior Technical Lead, WSO2
>>
>> Committer & PMC Member, Apache Stratos
>>
>>
>>
>>
>>
>> --
>>
>> Imesh Gunaratne
>>
>>
>>
>> Senior Technical Lead, WSO2
>>
>> Committer & PMC Member, Apache Stratos
>>
>>
>>
>>   --
>>
>> Reka Thirunavukkarasu
>> Senior Software Engineer,
>> WSO2, Inc.:http://wso2.com,
>>
>> Mobile: +94776442007
>>
>>
>>
>>
>>
>>
>>
>> --
>>
>> Imesh Gunaratne
>>
>>
>>
>> Senior Technical Lead, WSO2
>>
>> Committer & PMC Member, Apache Stratos
>>
>
>
>
> --
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
> Mobile: +94776442007
>
>
>


-- 
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007

Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by Reka Thirunavukkarasu <re...@wso2.com>.
Sorry Martin, I had fixed the issue only locally. I have pushed it just
now. Can you test with 1c21daaeea7b27ad0a0141a70b32e9443e78e309 when you
get a chance? I will also continue testing with this fix.

Thanks,
Reka

On Mon, Jun 22, 2015 at 12:07 AM, Martin Eppel (meppel) <me...@cisco.com>
wrote:

>  Btw,
>
>
>
> This is my last commit I picked up from the stratos master:
>
>
>
> commit 58bea52be814269416f70391fef50859aa5ae0a1
>
> Author: lasinducharith <la...@gmail.com>
>
> Date:   Fri Jun 19 19:40:27 2015 +0530
>
>
>
> *From:* Martin Eppel (meppel)
> *Sent:* Sunday, June 21, 2015 10:28 AM
> *To:* dev@stratos.apache.org; Reka Thirunavukkarasu
> *Cc:* Lasindu Charith (lasindu@wso2.com); Ryan Du Plessis (rdupless)
> *Subject:* RE: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> Hi Reka,
>
>
>
> Here is *another* example which fails, see application at [1.], attached
> log files and jsons.  I run a few scenarios, the one which is failing is
> application with the name “s-g-c1-c2-c3” (last scenario). All members get
> removed but application remains deployed,
>
> “s-g-c1-c2-c3: applicationInstances 0, groupInstances 0, clusterInstances
> 0, members 0 ()”
>
>
>
>
>
> Thanks
>
>
>
>
>
> Martin
>
> *From:* Imesh Gunaratne [mailto:imesh@apache.org <im...@apache.org>]
> *Sent:* Sunday, June 21, 2015 1:32 AM
> *To:* Reka Thirunavukkarasu
> *Cc:* dev; Lasindu Charith (lasindu@wso2.com); Ryan Du Plessis (rdupless)
> *Subject:* Re: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> Great! Thanks Reka!
>
>
>
> On Sun, Jun 21, 2015 at 8:34 AM, Reka Thirunavukkarasu <re...@wso2.com>
> wrote:
>
> Hi Martin/Imesh,
>
> Sure..I will have a look on the logs. I will also go through the recent
> commits and try to revert the fix which added for nested group scaling as
> it is not needed for this release.  As Martin mentioned that after the
> fixes, there are more issues. Otherwise, we will have to go through another
> major effort in testing it.
>
> I will update the progress of it...
>
>
>
> Thanks,
>
> Reka
>
>
>
> On Sun, Jun 21, 2015 at 8:14 AM, Imesh Gunaratne <im...@apache.org> wrote:
>
> Hi Martin,
>
>
>
> Thanks for the quick response. Yes we will definitely go through the logs
> and investigate this.
>
>
>
> Thanks
>
>
>
> On Sun, Jun 21, 2015 at 8:09 AM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> Hi Isuru,
>
>
>
> No, the issue does not seem to be resolved. With the latest code I see
> issues in test cases which used to work before  (beyond the latest example
> I posted the log files for - see below), not sure yet what is going on.  I
> will be investigating further (making sure I am not mistaken) and following
> up with some examples after the weekend but if you guys can take a look at
> the log files on Monday I provided with the previous email that would be
> great,
>
>
>
> Thanks
>
>
>
> Martin
>
>
>
> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
> *Sent:* Saturday, June 20, 2015 7:29 PM
> *To:* dev
> *Cc:* Lasindu Charith (lasindu@wso2.com); Reka Thirunavukkarasu (
> reka@wso2.com); Ryan Du Plessis (rdupless)
> *Subject:* Re: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> Hi All,
>
>
>
> I'm sorry I could not follow the entire discussion.
>
> Can someone explain the latest status please? Have we resolved the initial
> group scaling issue and now seeing an application removal problem?
>
>
>
> Thanks
>
>
>
> On Sat, Jun 20, 2015 at 2:06 AM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> Hi Lasindu, Reka,
>
>
>
>
>
> Just run into the issue with removing the application *again*: (with the
> fix for the issue included)
>
>
>
> Please see [1a., 1b.] for the application structure (group scaling defined
> at only one group level). See also the respective artifacts and log file
> attached.
>
>
>
> Please advise if we should reopen the JIRA
>
>
>
> Thanks
>
>
>
> Martin
>
>
>
>
>
> Application [1a.]
>
>
>
>
>
> [1b.] application after “starting application remove”
>
> --
>
> Imesh Gunaratne
>
>
>
> Senior Technical Lead, WSO2
>
> Committer & PMC Member, Apache Stratos
>
>
>
>
>
> --
>
> Imesh Gunaratne
>
>
>
> Senior Technical Lead, WSO2
>
> Committer & PMC Member, Apache Stratos
>
>
>
>   --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>
>
>
>
>
> --
>
> Imesh Gunaratne
>
>
>
> Senior Technical Lead, WSO2
>
> Committer & PMC Member, Apache Stratos
>



-- 
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007

RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by "Martin Eppel (meppel)" <me...@cisco.com>.
Btw,

This is my last commit I picked up from the stratos master:

commit 58bea52be814269416f70391fef50859aa5ae0a1
Author: lasinducharith <la...@gmail.com>
Date:   Fri Jun 19 19:40:27 2015 +0530

From: Martin Eppel (meppel)
Sent: Sunday, June 21, 2015 10:28 AM
To: dev@stratos.apache.org; Reka Thirunavukkarasu
Cc: Lasindu Charith (lasindu@wso2.com); Ryan Du Plessis (rdupless)
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Reka,

Here is another example which fails, see application at [1.], attached log files and jsons.  I run a few scenarios, the one which is failing is application with the name “s-g-c1-c2-c3” (last scenario). All members get removed but application remains deployed,

“s-g-c1-c2-c3: applicationInstances 0, groupInstances 0, clusterInstances 0, members 0 ()”


Thanks


Martin




[cid:image001.png@01D0AC16.A6B7B660]




From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: Sunday, June 21, 2015 1:32 AM
To: Reka Thirunavukkarasu
Cc: dev; Lasindu Charith (lasindu@wso2.com); Ryan Du Plessis (rdupless)
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Great! Thanks Reka!

On Sun, Jun 21, 2015 at 8:34 AM, Reka Thirunavukkarasu <re...@wso2.com> wrote:
Hi Martin/Imesh,
Sure..I will have a look on the logs. I will also go through the recent commits and try to revert the fix which added for nested group scaling as it is not needed for this release.  As Martin mentioned that after the fixes, there are more issues. Otherwise, we will have to go through another major effort in testing it.
I will update the progress of it...

Thanks,
Reka

On Sun, Jun 21, 2015 at 8:14 AM, Imesh Gunaratne <im...@apache.org> wrote:
Hi Martin,

Thanks for the quick response. Yes we will definitely go through the logs and investigate this.

Thanks

On Sun, Jun 21, 2015 at 8:09 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi Isuru,

No, the issue does not seem to be resolved. With the latest code I see issues in test cases which used to work before  (beyond the latest example I posted the log files for - see below), not sure yet what is going on.  I will be investigating further (making sure I am not mistaken) and following up with some examples after the weekend but if you guys can take a look at the log files on Monday I provided with the previous email that would be great,

Thanks

Martin

From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: Saturday, June 20, 2015 7:29 PM
To: dev
Cc: Lasindu Charith (lasindu@wso2.com); Reka Thirunavukkarasu (reka@wso2.com); Ryan Du Plessis (rdupless)
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi All,

I'm sorry I could not follow the entire discussion.
Can someone explain the latest status please? Have we resolved the initial group scaling issue and now seeing an application removal problem?

Thanks

On Sat, Jun 20, 2015 at 2:06 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi Lasindu, Reka,


Just run into the issue with removing the application again: (with the fix for the issue included)

Please see [1a., 1b.] for the application structure (group scaling defined at only one group level). See also the respective artifacts and log file attached.

Please advise if we should reopen the JIRA

Thanks

Martin


Application [1a.]

[cid:image002.png@01D0AC16.A6B7B660]

[1b.] application after “starting application remove”

[cid:image003.png@01D0AC16.A6B7B660]









--
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos



--
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos


--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007




--
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos

RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by "Martin Eppel (meppel)" <me...@cisco.com>.
Hi Reka,

Here is another example which fails; see the application at [1.] and the attached log files and JSON files. I ran a few scenarios; the one that fails is the application named “s-g-c1-c2-c3” (last scenario). All members get removed but the application remains deployed:

“s-g-c1-c2-c3: applicationInstances 0, groupInstances 0, clusterInstances 0, members 0 ()”


Thanks


Martin




[cid:image001.png@01D0AC0C.59B7B0E0]




From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: Sunday, June 21, 2015 1:32 AM
To: Reka Thirunavukkarasu
Cc: dev; Lasindu Charith (lasindu@wso2.com); Ryan Du Plessis (rdupless)
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Great! Thanks Reka!

On Sun, Jun 21, 2015 at 8:34 AM, Reka Thirunavukkarasu <re...@wso2.com> wrote:
Hi Martin/Imesh,
Sure..I will have a look on the logs. I will also go through the recent commits and try to revert the fix which added for nested group scaling as it is not needed for this release.  As Martin mentioned that after the fixes, there are more issues. Otherwise, we will have to go through another major effort in testing it.
I will update the progress of it...

Thanks,
Reka

On Sun, Jun 21, 2015 at 8:14 AM, Imesh Gunaratne <im...@apache.org> wrote:
Hi Martin,

Thanks for the quick response. Yes we will definitely go through the logs and investigate this.

Thanks

On Sun, Jun 21, 2015 at 8:09 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi Isuru,

No, the issue does not seem to be resolved. With the latest code I see issues in test cases which used to work before  (beyond the latest example I posted the log files for - see below), not sure yet what is going on.  I will be investigating further (making sure I am not mistaken) and following up with some examples after the weekend but if you guys can take a look at the log files on Monday I provided with the previous email that would be great,

Thanks

Martin

From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: Saturday, June 20, 2015 7:29 PM
To: dev
Cc: Lasindu Charith (lasindu@wso2.com); Reka Thirunavukkarasu (reka@wso2.com); Ryan Du Plessis (rdupless)
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi All,

I'm sorry I could not follow the entire discussion.
Can someone explain the latest status please? Have we resolved the initial group scaling issue and now seeing an application removal problem?

Thanks

On Sat, Jun 20, 2015 at 2:06 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi Lasindu, Reka,


Just run into the issue with removing the application again: (with the fix for the issue included)

Please see [1a., 1b.] for the application structure (group scaling defined at only one group level). See also the respective artifacts and log file attached.

Please advise if we should reopen the JIRA

Thanks

Martin


Application [1a.]

[cid:image002.png@01D0AC0C.59B7B0E0]

[1b.] application after “starting application remove”

[cid:image003.png@01D0AC0C.59B7B0E0]









--
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos



--
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos


--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007




--
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos

Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by Imesh Gunaratne <im...@apache.org>.
Great! Thanks Reka!

On Sun, Jun 21, 2015 at 8:34 AM, Reka Thirunavukkarasu <re...@wso2.com>
wrote:

> Hi Martin/Imesh,
>
> Sure..I will have a look on the logs. I will also go through the recent
> commits and try to revert the fix which added for nested group scaling as
> it is not needed for this release.  As Martin mentioned that after the
> fixes, there are more issues. Otherwise, we will have to go through another
> major effort in testing it.
>
> I will update the progress of it...
>
> Thanks,
> Reka
>
> On Sun, Jun 21, 2015 at 8:14 AM, Imesh Gunaratne <im...@apache.org> wrote:
>
>> Hi Martin,
>>
>> Thanks for the quick response. Yes we will definitely go through the logs
>> and investigate this.
>>
>> Thanks
>>
>> On Sun, Jun 21, 2015 at 8:09 AM, Martin Eppel (meppel) <me...@cisco.com>
>> wrote:
>>
>>>  Hi Isuru,
>>>
>>>
>>>
>>> No, the issue does not seem to be resolved. With the latest code I see
>>> issues in test cases which used to work before  (beyond the latest example
>>> I posted the log files for - see below), not sure yet what is going on.  I
>>> will be investigating further (making sure I am not mistaken) and following
>>> up with some examples after the weekend but if you guys can take a look at
>>> the log files on Monday I provided with the previous email that would be
>>> great,
>>>
>>>
>>>
>>> Thanks
>>>
>>>
>>>
>>> Martin
>>>
>>>
>>>
>>> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
>>> *Sent:* Saturday, June 20, 2015 7:29 PM
>>> *To:* dev
>>> *Cc:* Lasindu Charith (lasindu@wso2.com); Reka Thirunavukkarasu (
>>> reka@wso2.com); Ryan Du Plessis (rdupless)
>>> *Subject:* Re: Testing Stratos 4.1: Application undeployment:
>>> application fails to undeploy (nested grouping, group scaling)
>>>
>>>
>>>
>>> Hi All,
>>>
>>>
>>>
>>> I'm sorry I could not follow the entire discussion.
>>>
>>> Can someone explain the latest status please? Have we resolved the
>>> initial group scaling issue and now seeing an application removal problem?
>>>
>>>
>>>
>>> Thanks
>>>
>>>
>>>
>>> On Sat, Jun 20, 2015 at 2:06 AM, Martin Eppel (meppel) <me...@cisco.com>
>>> wrote:
>>>
>>> Hi Lasindu, Reka,
>>>
>>>
>>>
>>>
>>>
>>> Just run into the issue with removing the application *again*: (with
>>> the fix for the issue included)
>>>
>>>
>>>
>>> Please see [1a., 1b.] for the application structure (group scaling
>>> defined at only one group level). See also the respective artifacts and log
>>> file attached.
>>>
>>>
>>>
>>> Please advise if we should reopen the JIRA
>>>
>>>
>>>
>>> Thanks
>>>
>>>
>>>
>>> Martin
>>>
>>>
>>>
>>>
>>>
>>> Application [1a.]
>>>
>>>
>>>
>>>
>>>
>>> [1b.] application after “starting application remove”
>>>
>>> --
>>>
>>> Imesh Gunaratne
>>>
>>>
>>>
>>> Senior Technical Lead, WSO2
>>>
>>> Committer & PMC Member, Apache Stratos
>>>
>>
>>
>>
>> --
>> Imesh Gunaratne
>>
>> Senior Technical Lead, WSO2
>> Committer & PMC Member, Apache Stratos
>>
>
>
>
> --
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
> Mobile: +94776442007
>
>
>


-- 
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos

Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by Reka Thirunavukkarasu <re...@wso2.com>.
Hi Martin/Imesh,

Sure, I will have a look at the logs. I will also go through the recent
commits and try to revert the fix which was added for nested group scaling,
as it is not needed for this release. As Martin mentioned, there are more
issues after the fixes. Otherwise, we will have to go through another major
effort in testing it.

I will post updates on the progress.

Thanks,
Reka

On Sun, Jun 21, 2015 at 8:14 AM, Imesh Gunaratne <im...@apache.org> wrote:

> Hi Martin,
>
> Thanks for the quick response. Yes we will definitely go through the logs
> and investigate this.
>
> Thanks
>
> On Sun, Jun 21, 2015 at 8:09 AM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
>>  Hi Isuru,
>>
>>
>>
>> No, the issue does not seem to be resolved. With the latest code I see
>> issues in test cases which used to work before  (beyond the latest example
>> I posted the log files for - see below), not sure yet what is going on.  I
>> will be investigating further (making sure I am not mistaken) and following
>> up with some examples after the weekend but if you guys can take a look at
>> the log files on Monday I provided with the previous email that would be
>> great,
>>
>>
>>
>> Thanks
>>
>>
>>
>> Martin
>>
>>
>>
>> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
>> *Sent:* Saturday, June 20, 2015 7:29 PM
>> *To:* dev
>> *Cc:* Lasindu Charith (lasindu@wso2.com); Reka Thirunavukkarasu (
>> reka@wso2.com); Ryan Du Plessis (rdupless)
>> *Subject:* Re: Testing Stratos 4.1: Application undeployment:
>> application fails to undeploy (nested grouping, group scaling)
>>
>>
>>
>> Hi All,
>>
>>
>>
>> I'm sorry I could not follow the entire discussion.
>>
>> Can someone explain the latest status please? Have we resolved the
>> initial group scaling issue and now seeing an application removal problem?
>>
>>
>>
>> Thanks
>>
>>
>>
>> On Sat, Jun 20, 2015 at 2:06 AM, Martin Eppel (meppel) <me...@cisco.com>
>> wrote:
>>
>> Hi Lasindu, Reka,
>>
>>
>>
>>
>>
>> Just run into the issue with removing the application *again*: (with the
>> fix for the issue included)
>>
>>
>>
>> Please see [1a., 1b.] for the application structure (group scaling
>> defined at only one group level). See also the respective artifacts and log
>> file attached.
>>
>>
>>
>> Please advise if we should reopen the JIRA
>>
>>
>>
>> Thanks
>>
>>
>>
>> Martin
>>
>>
>>
>>
>>
>> Application [1a.]
>>
>>
>>
>>
>>
>> [1b.] application after “starting application remove”
>>
>> --
>>
>> Imesh Gunaratne
>>
>>
>>
>> Senior Technical Lead, WSO2
>>
>> Committer & PMC Member, Apache Stratos
>>
>
>
>
> --
> Imesh Gunaratne
>
> Senior Technical Lead, WSO2
> Committer & PMC Member, Apache Stratos
>



-- 
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007

Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by Imesh Gunaratne <im...@apache.org>.
Hi Martin,

Thanks for the quick response. Yes we will definitely go through the logs
and investigate this.

Thanks

On Sun, Jun 21, 2015 at 8:09 AM, Martin Eppel (meppel) <me...@cisco.com>
wrote:

>  Hi Isuru,
>
>
>
> No, the issue does not seem to be resolved. With the latest code I see
> issues in test cases which used to work before  (beyond the latest example
> I posted the log files for - see below), not sure yet what is going on.  I
> will be investigating further (making sure I am not mistaken) and following
> up with some examples after the weekend but if you guys can take a look at
> the log files on Monday I provided with the previous email that would be
> great,
>
>
>
> Thanks
>
>
>
> Martin
>
>
>
> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
> *Sent:* Saturday, June 20, 2015 7:29 PM
> *To:* dev
> *Cc:* Lasindu Charith (lasindu@wso2.com); Reka Thirunavukkarasu (
> reka@wso2.com); Ryan Du Plessis (rdupless)
> *Subject:* Re: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> Hi All,
>
>
>
> I'm sorry I could not follow the entire discussion.
>
> Can someone explain the latest status please? Have we resolved the initial
> group scaling issue and now seeing an application removal problem?
>
>
>
> Thanks
>
>
>
> On Sat, Jun 20, 2015 at 2:06 AM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> Hi Lasindu, Reka,
>
>
>
>
>
> Just run into the issue with removing the application *again*: (with the
> fix for the issue included)
>
>
>
> Please see [1a., 1b.] for the application structure (group scaling defined
> at only one group level). See also the respective artifacts and log file
> attached.
>
>
>
> Please advise if we should reopen the JIRA
>
>
>
> Thanks
>
>
>
> Martin
>
>
>
>
>
> Application [1a.]
>
>
>
>
>
> [1b.] application after “starting application remove”
>
> --
>
> Imesh Gunaratne
>
>
>
> Senior Technical Lead, WSO2
>
> Committer & PMC Member, Apache Stratos
>



-- 
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos

RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by "Martin Eppel (meppel)" <me...@cisco.com>.
Hi Isuru,

No, the issue does not seem to be resolved. With the latest code I see issues in test cases which used to work before (beyond the latest example I posted the log files for, see below); I am not sure yet what is going on. I will investigate further (to make sure I am not mistaken) and follow up with some examples after the weekend, but if you could take a look on Monday at the log files I provided with the previous email, that would be great.

Thanks

Martin

From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: Saturday, June 20, 2015 7:29 PM
To: dev
Cc: Lasindu Charith (lasindu@wso2.com); Reka Thirunavukkarasu (reka@wso2.com); Ryan Du Plessis (rdupless)
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi All,

I'm sorry I could not follow the entire discussion.
Can someone explain the latest status please? Have we resolved the initial group scaling issue and now seeing an application removal problem?

Thanks

On Sat, Jun 20, 2015 at 2:06 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi Lasindu, Reka,


Just run into the issue with removing the application again: (with the fix for the issue included)

Please see [1a., 1b.] for the application structure (group scaling defined at only one group level). See also the respective artifacts and log file attached.

Please advise if we should reopen the JIRA

Thanks

Martin


Application [1a.]

[cid:image002.png@01D0AB90.D002D030]

[1b.] application after “starting application remove”

[cid:image003.png@01D0AB90.D002D030]









--
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos

Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by Imesh Gunaratne <im...@apache.org>.
Hi All,

I'm sorry I could not follow the entire discussion.
Can someone explain the latest status please? Have we resolved the initial
group scaling issue, and are we now seeing an application removal problem?

Thanks

On Sat, Jun 20, 2015 at 2:06 AM, Martin Eppel (meppel) <me...@cisco.com>
wrote:

>  Hi Lasindu, Reka,
>
>
>
>
>
> Just ran into the issue with removing the application *again* (with the
> fix for the issue included):
>
>
>
> Please see [1a., 1b.] for the application structure (group scaling defined
> at only one group level). See also the respective artifacts and log file
> attached.
>
>
>
> Please advise if we should reopen the JIRA
>
>
>
> Thanks
>
>
>
> Martin
>
>
>
>
>
> Application [1a.]
>
>
>
>
>
> [1b.] application after “starting application remove”



-- 
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos

RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by "Martin Eppel (meppel)" <me...@cisco.com>.
Hi Lasindu, Reka,


Just ran into the issue with removing the application again (with the fix for the issue included):

Please see [1a., 1b.] for the application structure (group scaling defined at only one group level). See also the respective artifacts and log file attached.

Please advise if we should reopen the JIRA

Thanks

Martin


Application [1a.]

[cid:image002.png@01D0AA92.6F1EEFF0]

[1b.] application after “starting application remove”

[cid:image001.png@01D0AA93.587BB0C0]







RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by "Martin Eppel (meppel)" <me...@cisco.com>.
Update,

Just noticed that the last instance got removed (after a long time) but the application is still there. Attached are the (complete) log and the application picture [1f.].


[1f.]

[cid:image009.png@01D0A861.FB166F70]


From: Martin Eppel (meppel)
Sent: Tuesday, June 16, 2015 5:55 PM
To: dev@stratos.apache.org; Lasindu Charith (lasindu@wso2.com); Reka Thirunavukkarasu (reka@wso2.com)
Cc: Ryan Du Plessis (rdupless)
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)


Hi Lasindu, Reka

After incorporating the latest fixes (which solve some of the “application removal issues”, thanks!), I still have an issue with application removal in the case of a group scaling scenario. Please see the application structure [1a. …] and the attached zip file with the artifacts (wso2carbon.log, cartridge-group.json, application.json).

Please also note that I updated the application accordingly and removed the cartridge deployment policy where a group deployment policy was defined. Please review the application.json for correctness.

Observation: After all VMs go active (including the ones spun up by group scaling) and the “application removal process” is started, the application fails to be removed. The majority of the VMs are terminated, but there are 1 or 2 VMs which seem to get started when the application removal process is initiated. These VMs typically get stuck in the “Started” or “Initialized” state and are not terminated.
Please note the group scaling progression (1.a … 1.c). What I noticed is that the groups which were created by the autoscaler don’t seem to go into active state; see also the attached log “wso2carbon-all-active.log”.

After starting application removal process, see [1d.], logs “wso2carbon-after-remove.log”, “wso2carbon-after-remove-1.log” and final log “wso2carbon-after-remove-2.log”

Thanks

Martin


[1a.]

[cid:image010.png@01D0A861.FB166F70]

[1b.]

[cid:image011.png@01D0A861.FB166F70]

[1c.] all cartridges active, but it seems the “auto scaled” groups are not in active state

[cid:image012.png@01D0A861.FB166F70]

[1d.]

[cid:image013.png@01D0A861.FB166F70]



[1e.] after application removal process started, instance (stuck) in “Starting” state, see log wso2carbon-after-remove-1.log

[cid:image014.png@01D0A861.FB166F70]














Thanks

Martin







From: Martin Eppel (meppel)
Sent: Friday, June 12, 2015 10:16 PM
To: dev
Cc: 'Ryan Du Plessis (rdupless)'
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Lasindu,

I have run some tests, and the issue (failure to remove an application when an instance is terminated and restarted) seems to be resolved.

However, I do seem to see some issue with group scaling and application removal. I still have to run some tests next week to get a better understanding (not sure yet if this is an issue or not) and will keep you posted.

Thanks

Martin

From: Lasindu Charith [mailto:lasindu@wso2.com]
Sent: Friday, June 12, 2015 9:41 AM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,

I have fixed the above issue in commit 03de83172309c2932075fb5284c120ca610bbf0a. Please take a pull from master and try out your scenario again to see if undeployment/redeployment works as expected.

Thanks,


On Thu, Jun 11, 2015 at 11:52 PM, Lasindu Charith <la...@wso2.com> wrote:
Hi Martin,

I guess my previous observation is incorrect. The root cause of the above issue is that ClusterStatusTerminatedProcessor does not send a ClusterTerminatedEvent for all 3 clusters. It only sends 1 and fails to send the other 2 ClusterTerminatedEvents. This is because, when the application is activated again, a ClusterLevelPartitionContext is added twice to the clusterInstanceContext. This makes the if condition at [1] fail when ClusterStatusTerminatedProcessor checks whether the cluster monitor has any non-terminated members before sending the ClusterTerminatedEvent. I will try to find a proper solution and update the thread.


[1] https://github.com/apache/stratos/blob/master/components/org.apache.stratos.autoscaler/src/main/java/org/apache/stratos/autoscaler/status/processor/cluster/ClusterStatusTerminatedProcessor.java#L90
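
The root cause described above can be pictured with a small sketch. This is only an illustration under stated assumptions, not the Stratos code: the class names are hypothetical stand-ins, and the "lookup touches only the first matching context" behaviour is an assumption chosen to show how a duplicate entry can keep a "has non-terminated members" check from ever passing.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a per-partition context; NOT the Stratos ClusterLevelPartitionContext.
class PartitionContextSketch {
    final String partitionId;
    final Set<String> nonTerminatedMembers = new HashSet<>();

    PartitionContextSketch(String partitionId) {
        this.partitionId = partitionId;
    }
}

// Hypothetical stand-in for the cluster instance context that owns the partition contexts.
class ClusterInstanceSketch {
    private final List<PartitionContextSketch> contexts = new ArrayList<>();

    // Unguarded add: a second activation appends another context for the same partition.
    void addUnguarded(PartitionContextSketch ctx) {
        contexts.add(ctx);
    }

    // Guarded add: any existing context for the same partition is replaced.
    void addGuarded(PartitionContextSketch ctx) {
        contexts.removeIf(c -> c.partitionId.equals(ctx.partitionId));
        contexts.add(ctx);
    }

    // Assumption: bookkeeping looks a context up by id and only touches the first match.
    void memberTerminated(String partitionId, String memberId) {
        for (PartitionContextSketch c : contexts) {
            if (c.partitionId.equals(partitionId)) {
                c.nonTerminatedMembers.remove(memberId);
                return;
            }
        }
    }

    // The check that has to pass before a "cluster terminated" event can be sent.
    boolean hasNonTerminatedMembers() {
        for (PartitionContextSketch c : contexts) {
            if (!c.nonTerminatedMembers.isEmpty()) {
                return true;
            }
        }
        return false;
    }
}
```

With the unguarded add, a member tracked in the second (duplicate) context is never removed, so the check keeps reporting non-terminated members; with the guarded add there is only one context per partition and the check passes once the member is terminated.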

Thanks,


On Thu, Jun 11, 2015 at 10:29 PM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Is there any conclusion on how to fix this?

Thanks


Martin

From: Lahiru Sandaruwan [mailto:lahirus@wso2.com]
Sent: Wednesday, June 10, 2015 6:55 PM
To: dev
Cc: Reka Thirunavukkarasu

Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Imesh,

The following could be possible reasons for not un-deploying when a member was auto healed:


  *   The particular cluster in which the member was auto healed is terminated before the others (while the others are in terminating state)
or

  *   The particular cluster in which the member was auto healed is still terminating when the others are in terminated state
One of those two cases could happen even if the member was not auto healed (in the case of groups where one group is very complex and the others are simple). This is because, when the parent group is terminating, we currently check whether all the clusters and groups are in terminating status, which is wrong.

Thanks.

On Thu, Jun 11, 2015 at 5:49 AM, Imesh Gunaratne <im...@apache.org> wrote:
Do we know why this only happens if a member was forcefully terminated and auto-healed?

On Wed, Jun 10, 2015 at 10:01 PM, Lasindu Charith <la...@wso2.com> wrote:
Hi all,

The cause of the above issue seems to be as follows.
GroupStatusTerminatedProcessor and GroupStatusTerminatingProcessor process the event only if all the group instances and cluster instances are in terminated state or in terminating state, respectively [1][2]. But there can be situations (such as the above) where some group instances are in terminated state and some in terminating state by the time GroupStatusProcessorChain is executed. In such scenarios, both the GroupStatusTerminatedProcessor and GroupStatusTerminatingProcessor executions are skipped, and GroupStatusInactiveProcessor prints the "No possible state change found" warning.

I think we need to find a way to properly fix this.

[1] https://github.com/apache/stratos/blob/master/components/org.apache.stratos.autoscaler/src/main/java/org/apache/stratos/autoscaler/status/processor/group/GroupStatusTerminatedProcessor.java#L91
[2] https://github.com/apache/stratos/blob/master/components/org.apache.stratos.autoscaler/src/main/java/org/apache/stratos/autoscaler/status/processor/group/GroupStatusTerminatingProcessor.java#L89
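
To make the gap concrete, here is a minimal sketch of the two "all children in the same state" checks described above, next to one possible relaxation. It is not the Stratos processor chain and not necessarily the fix that was later committed; the enum, class and method names are hypothetical.

```java
import java.util.List;

// Hypothetical child states; the real Stratos status types are not used here.
enum ChildStatusSketch { ACTIVE, TERMINATING, TERMINATED }

class GroupStatusSketch {
    // True only if every child is in exactly the expected state.
    static boolean allIn(List<ChildStatusSketch> children, ChildStatusSketch expected) {
        for (ChildStatusSketch c : children) {
            if (c != expected) {
                return false;
            }
        }
        return true;
    }

    // Strict chain as described in the mail: a mix of states matches neither check.
    static String strictDecision(List<ChildStatusSketch> children) {
        if (allIn(children, ChildStatusSketch.TERMINATED)) {
            return "TERMINATED";
        }
        if (allIn(children, ChildStatusSketch.TERMINATING)) {
            return "TERMINATING";
        }
        return "NO_CHANGE";
    }

    // Relaxed variant (an assumption): a mix of TERMINATING and TERMINATED
    // children is still treated as TERMINATING instead of falling through.
    static String relaxedDecision(List<ChildStatusSketch> children) {
        if (allIn(children, ChildStatusSketch.TERMINATED)) {
            return "TERMINATED";
        }
        for (ChildStatusSketch c : children) {
            if (c != ChildStatusSketch.TERMINATING && c != ChildStatusSketch.TERMINATED) {
                return "NO_CHANGE";
            }
        }
        return "TERMINATING";
    }
}
```

For children in the states TERMINATED, TERMINATING, TERMINATED, the strict chain reports no state change (matching the warning in the log), while the relaxed variant keeps the group in the terminating state until the last child has terminated.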

On Tue, Jun 9, 2015 at 8:09 PM, Lasindu Charith <la...@wso2.com> wrote:
Hi Martin,

I was able to reproduce this issue in the latest build with PCA in OpenStack. Even after Stratos is restarted, the application is not undeployed, which makes it impossible to undeploy the application (even the forceful undeployment failed for the above obsolete application).

Currently I'm looking at possible causes for this and will update with the progress.

Thanks,

On Tue, Jun 9, 2015 at 5:59 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Here is another example where the removal fails:

For application see [1.], log file (with debug enabled) and jsons are attached.

Scenario:


•        Deploy application and wait for all cartridges to become active

•        Kill a VM (2nd in startup sequence)

•        Wait for it to restart and become active

•        Un-deploy application

a.      Un-deploy forcefully will succeed
([2015-06-08 20:38:21,487]  INFO {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Forcefully un-deploying the application s-g-c1-c2-c3-s)
and

b.      Un-deploy gracefully will fail to remove app completely (although VMs are terminated successfully)
([2015-06-08 20:54:16,372]  INFO {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Starting to undeploy application: [application-id])

•        Both scenarios are recorded in the same log file wso2carbon-s-g-c1-c2-c3-s.log

•        Btw, I retested the scenario and the issue is easily reproducible following the steps listed above:
graceful application un-deploy succeeds if no VM has been restarted (terminated and restarted by the autoscaler).
Once a VM has been terminated, graceful application un-deploy will fail.
I attached a log file which demonstrates this case (wso2carbon-s-g-c1-c2-c3-s-scen-2.log). In this scenario, the same application is deployed, becomes active and is then removed (repeated 2 times); then a VM is terminated and restarted by the autoscaler. Afterwards, graceful application un-deploy fails.


Other Observations:

When the application is successfully removed, some events, e.g. “cluster removed event” and “Application deleted event received:”, are published (see [2.]), while when the application fails to be removed no such event is observed.

[2.] cluster removed event when application is un-deployed forcefully
TID: [0] [STRATOS] [2015-06-08 20:38:34,187]  INFO {org.apache.stratos.cloud.controller.messaging.receiver.application.ApplicationEventReceiver} -  Application deleted event received: [application-id] s-g-c1-c2-c3-s
TID: [0] [STRATOS] [2015-06-08 20:38:34,220]  INFO {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher} -  Publishing application clusters removed event: [application-id] s-g-c1-c2-c3-s


I analyzed the differences in the successful application removal and unsuccessful log sequence and noticed a difference (see also highlighted areas):

Successful (see logs in wso2carbon-s-g-c1-c2-c3-s-scen-2.log)

TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor} -  StatusChecker calculating the terminated status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [ s-g-c1-c2-c3-s-1 ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock acquired
TID: [0] [STRATOS] [2015-06-08 22:18:41,527]  INFO {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor} -  Sending group instance terminated for [group] s-g-c1-c2-c3-s-x0x [instance] s-g-c1-c2-c3-s-1
TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG {org.apache.stratos.autoscaler.applications.topic.ApplicationBuilder} -  Handling group terminated event: [group-id] s-g-c1-c2-c3-s-x0x [application-id] s-g-c1-c2-c3-s [instance] s-g-c1-c2-c3-s-1
TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -  ClusterMonitor Drools session has been disposed. ClusterMonitor [clusterId=s-g-c1-c2-c3-s.c3-0x0.c3.domain, hasPrimary=false ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -  ClusterMonitor Drools session has been disposed. ClusterMonitor [clusterId=s-g-c1-c2-c3-s.c1-0x0.c1.domain, hasPrimary=false ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -  ClusterMonitor Drools session has been disposed. ClusterMonitor [clusterId=s-g-c1-c2-c3-s.c2-0x0.c2.domain, hasPrimary=false ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,529] DEBUG {org.apache.stratos.cloud.controller.messaging.topology.TopologyManager} -  Write lock released
TID: [0] [STRATOS] [2015-06-08 22:18:41,536] DEBUG {org.apache.stratos.autoscaler.registry.RegistryManager} -  Application [ s-g-c1-c2-c3-s ] persisted successfully in the Autoscaler Registry
TID: [0] [STRATOS] [2015-06-08 22:18:41,538] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Applications updated: {"applicationMap":{"s-g-c1-c2-c3-s":{"id":"s-g-c1-c2-c3-s","key":"l8V7OpRqOfBcWbBw","tenantId":-1234,"tenantDomain":"carbon.super","tenantAdminUserName":"admin","applicationPolicyId":"default-iaas","aliasToGroupMap":{"s-g-c1-c2-c3-s-x0x":{"name":"s-g-c1-c2-c3-s","alias":"s-g-c1-c2-c3-s-x0x","groupMinInstances":1,"groupMaxInstances":1,"applicationId":"s-g-c1-c2-c3-s","aliasToGroupMap":{},"aliasToClusterDataMap":{"c2-0x0":{"serviceType":"c2","clusterId":"s-g-c1-c2-c3-s.c2-0x0.c2.domain","minInstances":1,"maxInstances":1},"c3-0x0":{"serviceType":"c3","clusterId":"s-g-c1-c2-c3-s.c3-0x0.c3.domain","minInstances":1,"maxInstances":1},"c1-0x0":{"serviceType":"c1","clusterId":"s-g-c1-c2-c3-s.c1-0x0.c1.domain","minInstances":1,"maxInstances":1}},"typeToClusterDataMap":{"c1":{"serviceType":"c1","clusterId":"s-g-c1-c2-c3-s.c1-0x0.c1.domain","minInstances":1,"maxInstances":1},"c2":{"serviceType":"c2","clusterId":"s-g-c1-c2-c3-s.c2-0x0.c2.domain","minInstances":1,"maxInstances":1},"c3":{"serviceType":"c3","clusterId":"s-g-c1-c2-c3-s.c3-0x0.c3.domain","minInstances":1,"maxInstances":1}},"instanceIdToInstanceContextMap":{},"dependencyOrder":{"startupOrders":[{"startupOrderComponentList":["cartridge.c3-0x0","cartridge.c2-0x0"]},{"startupOrderComponentList":["cartridge.c2-0x0","cartridge.c1-0x0"]}],"terminationBehaviour":"terminate-none"},"isGroupScalingEnabled":false,"isGroupInstanceMonitoringEnabled":false,"instanceIdSequence":{"value":0}}},"aliasToClusterDataMap":{},"typeToClusterDataMap":{},"aliasToDeploymentPolicyIdMap":{"c3-0x0":"static-1","c2-0x0":"static-1","c1-0x0":"static-1"},"instanceIdToInstanceContextMap":{"s-g-c1-c2-c3-s-1":{"alias":"s-g-c1-c2-c3-s","instanceId":"s-g-c1-c2-c3-s-1","instanceProperties":{},"lifeCycleStateManager":{"stateStack":["Created","Active","Terminating"],"identifier":"s-g-c1-c2-c3-s_s-g-c1-c2-c3-s-1"},"networkPartitionId":"RegionOne"}},"dependencyOrder":{"startupOrders":[]},"isGroupScalingEnabled":false,"isGroupInstanceMonitoringEnabled":false,"instanceIdSequence":{"value":1}}},"initialized":false}
TID: [0] [STRATOS] [2015-06-08 22:18:41,539]  INFO {org.apache.stratos.autoscaler.applications.topic.ApplicationsEventPublisher} -  Publishing group instance terminated event: [application] s-g-c1-c2-c3-s [group] s-g-c1-c2-c3-s-x0x [instance] s-g-c1-c2-c3-s-1
TID: [0] [STRATOS] [2015-06-08 22:18:41,545]  INFO {org.apache.stratos.autoscaler.monitor.component.GroupMonitor} -  [Group] s-g-c1-c2-c3-s-x0x is notifying the [parent] s-g-c1-c2-c3-s [instance] s-g-c1-c2-c3-s-1
TID: [0] [STRATOS] [2015-06-08 22:18:41,545] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusActiveProcessor} -  GroupProcessor chain calculating the status for the group [ s-g-c1-c2-c3-s ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusActiveProcessor} -  StatusChecker calculating the active status for the group [ s-g-c1-c2-c3-s ]  for the instance  [ s-g-c1-c2-c3-s-1 ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock acquired
TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock released
TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor} -  StatusChecker calculating the terminated status for the group [ s-g-c1-c2-c3-s ]  for the instance  [ s-g-c1-c2-c3-s-1 ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock acquired
TID: [0] [STRATOS] [2015-06-08 22:18:41,546]  INFO {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor} -  Sending application instance terminated for [application] s-g-c1-c2-c3-s [instance] s-g-c1-c2-c3-s-1

Unsuccessful:

TID: [0] [STRATOS] [2015-06-08 22:33:25,404] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor} -  StatusChecker calculating the terminated status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [ s-g-c1-c2-c3-s-1 ]
TID: [0] [STRATOS] [2015-06-08 22:33:25,404] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock acquired
TID: [0] [STRATOS] [2015-06-08 22:33:25,405] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusProcessor} -  Checking the status of cluster s-g-c1-c2-c3-s.c1-0x0.c1.domain instance status is: Terminating
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.cloud.controller.messaging.topology.TopologyManager} -  Write lock released
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusProcessor} -  Checking the status of cluster s-g-c1-c2-c3-s.c1-0x0.c1.domain instance status is: Terminating
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock released
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatingProcessor} -  StatusChecker calculating the terminating status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [ s-g-c1-c2-c3-s-1 ]
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock acquired
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock released
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusInactiveProcessor} -  StatusChecker calculating the inactive status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [ s-g-c1-c2-c3-s-1 ]
TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock acquired
TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock released
TID: [0] [STRATOS] [2015-06-08 22:33:25,407]  WARN {org.apache.stratos.autoscaler.status.processor.group.GroupStatusInactiveProcessor} -  No possible state change found for [component] s-g-c1-c2-c3-s-x0x [instance] s-g-c1-c2-c3-s-1
TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -  ClusterMonitor Drools session has been disposed. ClusterMonitor [clusterId=s-g-c1-c2-c3-s.c2-0x0.c2.domain, hasPrimary=false ]
TID: [0] [STRATOS] [2015-06-08 22:33:25,481] ERROR {org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor} -  System error, lock has not released for 30 seconds: [lock-name] application [lock-type] Write [thread-id] 99 [thread-name] pool-26-thread-2 [stack-trace]
java.lang.Thread.getStackTrace(Thread.java:1589)




[1.] Application Structure
[cid:image015.png@01D0A861.FB166F70]






From: Martin Eppel (meppel)
Sent: Friday, June 05, 2015 4:38 PM

To: dev@stratos.apache.org
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

This is another application, see [1.] which fails to get completely removed:

Scenario / Observation:

•        After all instances / application go active, one instance is being terminated (to verify termination behavior). Once the terminated instance is restored the application is undeployed.

•        After the Application undeployment process is started, all instances are being terminated

•        Application still shows up in stratos admin, subsequent deployments fail

stratos> list-applications
Applications found:
+---------------------+---------------------+----------+
| Application ID      | Alias               | Status   |
+---------------------+---------------------+----------+
| s-n-gr-s-G123-t-a-4 | s-n-gr-s-G123-t-a-4 | Deployed |
+---------------------+---------------------+----------+


[1.] Application:

[cid:image016.png@01D0A861.FB166F70]




From: Martin Eppel (meppel)
Sent: Friday, June 05, 2015 3:26 PM
To: dev@stratos.apache.org
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

After re-running it, these are my observations:


•        After the “Application undeployment process started” message is logged, there is a likelihood that (a few) VMs are still launched. I suspect this is due to a race condition between the application undeployment process and the autoscaler.

•        All VMs which were launched before “Application undeployment process started” get terminated as part of the undeployment process.

•        VMs which were launched after “Application undeployment process started” eventually get moved to the obsolete / pending state and cleaned up; this can take up to 15-20 minutes.

•        The application never gets completely removed.

•        The following exception is consistently observed:

TID: [0] [STRATOS] [2015-06-05 20:47:07,237]  WARN {org.apache.stratos.common.concurrent.locks.ReadWriteLock} -  System warning! Trying to release a lock which has not been taken by the same thread: [lock-name] application-holder [thread-id] 114 [thread-name] pool-24-thread-2

TID: [0] [STRATOS] [2015-06-05 20:47:07,237] ERROR {org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator} -  Failed to retrieve topology event message

org.apache.stratos.common.exception.InvalidLockRequestedException: System error, cannot acquire a write lock while having a read lock on the same thread: [lock-name] application-holder [thread-id] 114 [thread-name] pool-24-thread-2

                    at org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:114)

                    at org.apache.stratos.autoscaler.applications.ApplicationHolder.acquireWriteLock(ApplicationHolder.java:60)

•        Initiating the “Application undeployment process” again only produces the following INFO statement, without any further action (see the log):
TID: [0] [STRATOS] [2015-06-05 21:34:34,509]  INFO {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Application monitor is already in terminating, graceful un-deployment is has already been attempted thus not invoking again

•        Other exceptions observed after the “Application undeployment process started”
TID: [0] [STRATOS] [2015-06-05 21:36:29,458] ERROR {org.apache.stratos.autoscaler.rule.RuleTasksDelegator} -  Cannot terminate instance
org.apache.stratos.cloud.controller.stub.CloudControllerServiceInvalidMemberExceptionException: CloudControllerServiceInvalidMemberExceptionException
        at sun.reflect.GeneratedConstructorAccessor219.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at java.lang.Class.newInstance(Class.java:374)
        at org.apache.stratos.cloud.controller.stub.CloudControllerServiceStub.terminateInstance(CloudControllerServiceStub.java:8633)
        at org.apache.stratos.common.client.CloudControllerServiceClient.terminateInstance(CloudControllerServiceClient.java:120)
        at org.apache.stratos.autoscaler.rule.RuleTasksDelegator.terminateObsoleteInstance(RuleTasksDelegator.java:298)
        at sun.reflect.GeneratedMethodAccessor413.invoke(Unknown Source)


•        Created a jira to track this issue: https://issues.apache.org/jira/browse/STRATOS-1430
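
The suspected race in the first observation above (VMs still being launched after undeployment has started) can be sketched as follows. This is a hypothetical guard, not Stratos code: the class and method names are invented, and it only shows the idea that the "is the application terminating?" check and the launch decision need to happen under the same lock as the state change.

```java
// Hypothetical guard; NOT the Stratos autoscaler or application monitor.
class UndeployGuardSketch {
    private boolean terminating = false;
    private int launched = 0;

    // Called when the undeployment process starts.
    synchronized void startUndeployment() {
        terminating = true;
    }

    // Called by the scaling logic; the check and the launch share one lock,
    // so no spawn can slip in after startUndeployment() has returned.
    synchronized boolean requestSpawn() {
        if (terminating) {
            return false;
        }
        launched++;
        return true;
    }

    synchronized int launchedCount() {
        return launched;
    }
}
```

A spawn requested before `startUndeployment()` is accepted (and would be cleaned up by the undeployment itself), while any request arriving afterwards is rejected instead of producing a VM that lingers in a pending or obsolete state.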







Regards



Martin



Attached the log file of the last test







From: Martin Eppel (meppel)
Sent: Friday, June 05, 2015 12:59 PM
To: dev@stratos.apache.org
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

For this latest test I got the latest source from stratos repo so I have this commit (see below), but the un-deployment still fails (to some extent).
As mentioned below, it seems that all the members get terminated eventually, including the ones which got started after the “application un-deployment” process started.
What is still left in stratos (even after all members got terminated) is the application (see the stratos> list-applications command result below in email thread). This would still be an issue when re-deploying the application !
I will do a few reruns to verify the removal of the VMs (members) is consistent.
Thanks

Martin

git show 2fe84b91843b20e91e8cafd06011f42d218f231c
commit 2fe84b91843b20e91e8cafd06011f42d218f231c
Author: anuruddhal <an...@gmail.com>
Date:   Wed Jun 3 14:41:12 2015 +0530

From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: Friday, June 05, 2015 12:46 PM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,

I also encountered a similar issue with application un-deployment with PCA, but I guess you are using JCA.

I can see that Anuruddha has done a fix for the issue I'm referring to with the commit below:
https://github.com/apache/stratos/commit/2fe84b91843b20e91e8cafd06011f42d218f231c

Regarding the "member context not found" error, this could occur if the termination request was made for an already terminated member. It is possible that the Autoscaler makes a second terminate request if the first request takes some time to execute; by the time the second request hits the Cloud Controller, the member has already been terminated by the first request.

Can you please confirm whether the members were properly terminated and it's just these exceptions that you are seeing?
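
The double-terminate explanation above can be illustrated with a minimal sketch. It does not use the Cloud Controller API; the class, the map of member contexts and both method names are hypothetical. It contrasts a strict terminate, which fails on the second request as seen in the logs, with an idempotent variant that reports "already terminated" instead.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; NOT the Stratos CloudControllerServiceImpl.
class TerminationSketch {
    enum Result { TERMINATED, ALREADY_TERMINATED }

    // member-id -> instance-id, standing in for the member contexts.
    private final Map<String, String> memberContexts = new ConcurrentHashMap<>();

    void register(String memberId, String instanceId) {
        memberContexts.put(memberId, instanceId);
    }

    // Strict variant: a missing member context is treated as an error,
    // so a repeated request for the same member throws.
    void terminateStrict(String memberId) {
        if (memberContexts.remove(memberId) == null) {
            throw new IllegalStateException("member context not found: " + memberId);
        }
    }

    // Idempotent variant: a repeated request is reported, not raised.
    Result terminateIdempotent(String memberId) {
        return memberContexts.remove(memberId) != null
                ? Result.TERMINATED
                : Result.ALREADY_TERMINATED;
    }
}
```

Whether a repeated request should be an error or a no-op is a design choice; the sketch only shows that the exception by itself does not prove the member was left running.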

Thanks


On Sat, Jun 6, 2015 at 12:36 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi Udara,

Picked up your commit and reran the test case:

Attached is the log file (artifacts are the same as before).

Didn’t see the issue with “Member is in the wrong list” …

but see the following exception after the undeploy application message:
TID: [0] [STRATOS] [2015-06-05 18:09:46,836] ERROR {org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator} -  Failed to retrieve topology event message
org.apache.stratos.common.exception.InvalidLockRequestedException: System error, cannot acquire a write lock while having a read lock on the same thread: [lock-name] application-holder [thread-id] 114 [thread-name] pool-24-thread-2
                    at org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:114)
                    at org.apache.stratos.autoscaler.applications.ApplicationHolder.acquireWriteLock(ApplicationHolder.java:60)
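
For readers unfamiliar with this class of error: the exception above says a thread asked for a write lock while still holding a read lock. Stratos uses its own ReadWriteLock wrapper, which is not reproduced here; as an analogy only, the JDK's ReentrantReadWriteLock has the same restriction (a read lock cannot be upgraded to a write lock) and also rejects releasing a lock the thread does not hold. The class and method names below are hypothetical.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Analogy using the JDK lock; NOT the Stratos ReadWriteLock wrapper.
class LockUpgradeSketch {
    // Tries to take the write lock while this thread still holds the read lock.
    static boolean tryUpgradeWhileReading() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.readLock().lock();
        try {
            // tryLock() is used so the attempt returns instead of blocking forever.
            boolean acquired = lock.writeLock().tryLock();
            if (acquired) {
                lock.writeLock().unlock();
            }
            return acquired;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Releases the read lock first, then takes the write lock.
    static boolean writeAfterReleasingRead() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.readLock().lock();
        lock.readLock().unlock();
        boolean acquired = lock.writeLock().tryLock();
        if (acquired) {
            lock.writeLock().unlock();
        }
        return acquired;
    }

    // Releasing a write lock that this thread never took is rejected.
    static boolean unlockWithoutHoldingThrows() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        try {
            lock.writeLock().unlock();
            return false;
        } catch (IllegalMonitorStateException e) {
            return true;
        }
    }
}
```

In this analogy the usual remedy is to release the read lock before requesting the write lock, and to pair every acquire with a release on the same thread, typically in a try/finally block.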


Also, after the “Application undeployment process started” message is logged, new members are still being instantiated:

TID: [0] [STRATOS] [2015-06-05 18:07:46,545]  INFO {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher} -  Publishing member created event:


Eventually, these VMs get terminated :

TID: [0] [STRATOS] [2015-06-05 18:42:42,413] ERROR {org.apache.stratos.cloud.controller.services.impl.CloudControllerServiceImpl} -  Could not terminate instance: [member-id] g-sc-G12-1.c1-0x0.c1.domaindd9c1d40-70cc-4950-9757-418afe19ba7f
org.apache.stratos.cloud.controller.exception.InvalidMemberException: Could not terminate instance, member context not found: [member-id] g-sc-G12-1.c1-0x0.c1.domaindd9c1d40-70cc-4950-9757-418afe19ba7f
                    at org.apache.stratos.cloud.controller.services.impl.CloudControllerServiceImpl.terminateInstance(CloudControllerServiceImpl.java:595)
                    at sun.reflect.GeneratedMethodAccessor408.invoke(Unknown Source)
                    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                    at java.lang.reflect.Method.invoke(Method.java:606)


but the application remains:

stratos> list-applications
Applications found:
+----------------+------------+----------+
| Application ID | Alias      | Status   |
+----------------+------------+----------+
| g-sc-G12-1     | g-sc-G12-1 | Deployed |
+----------------+------------+----------+

['g-sc-G12-1: applicationInstances 1, groupInstances 2, clusterInstances 3, members 0 ()\n']



From: Martin Eppel (meppel)
Sent: Friday, June 05, 2015 10:04 AM
To: dev@stratos.apache.org
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Ok:

log4j.logger.org.apache.stratos.manager=DEBUG
log4j.logger.org.apache.stratos.autoscaler=DEBUG
log4j.logger.org.apache.stratos.messaging=INFO
log4j.logger.org.apache.stratos.cloud.controller=DEBUG
log4j.logger.org.wso2.andes.client=ERROR
# Autoscaler rule logs
log4j.logger.org.apache.stratos.autoscaler.rule.RuleLog=DEBUG

From: Udara Liyanage [mailto:udara@wso2.com]
Sent: Friday, June 05, 2015 10:00 AM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,

It would be better if you could enable debug logs for AS, CC and the cartridge agent.

On Fri, Jun 5, 2015 at 10:23 PM, Udara Liyanage <ud...@wso2.com> wrote:
Hi,

Please enable AS debug logs.

On Fri, Jun 5, 2015 at 9:38 PM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi Udara,

Yes, this issue seems to be fairly well reproducible. Which debug log do you want me to enable, the cartridge agent logs?

Thanks

Martin

From: Udara Liyanage [mailto:udara@wso2.com]
Sent: Thursday, June 04, 2015 11:11 PM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi,

This might be possible if AS did not receive the member activated event published by CC. Is it possible to enable debug logs if this is reproducible?
Or else I can add some INFO logs and commit.


On Fri, Jun 5, 2015 at 9:11 AM, Udara Liyanage <ud...@wso2.com>> wrote:
Hi,


For the first issue you mentioned: the particular member is activated, but it is still identified as an obsolete member and is being marked for termination since the pending time expired. Does that mean the member is still in the Obsolete list even though it has been activated?

//member started
TID: [0] [STRATOS] [2015-06-04 19:53:04,706]  INFO {org.apache.stratos.autoscaler.context.cluster.ClusterContext} -  Member stat context has been added: [application] g-sc-G12-1 [cluster] g-sc-G12-1.c1-0x0.c1.domain [clusterInstanceContext] g-sc-G12-1-1 [partitionContext] whole-region [member-id] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1

//member activated
TID: [0] [STRATOS] [2015-06-04 19:56:00,907]  INFO {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher} -  Publishing member activated event: [service-name] c1 [cluster-id] g-sc-G12-1.c1-0x0.c1.domain [cluster-instance-id] g-sc-G12-1-1 [member-id] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1 [network-partition-id] RegionOne [partition-id] whole-region
TID: [0] [STRATOS] [2015-06-04 19:56:00,916]  INFO {org.apache.stratos.messaging.message.processor.topology.MemberActivatedMessageProcessor} -  Member activated: [service] c1 [cluster] g-sc-G12-1.c1-0x0.c1.domain [member] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1

//after 15 minutes ---member is still in pending state, pending timeout expired
TID: [0] [STRATOS] [2015-06-04 20:08:04,713]  INFO {org.apache.stratos.autoscaler.context.partition.ClusterLevelPartitionContext$PendingMemberWatcher} -  Pending state of member expired, member will be moved to obsolete list. [pending member] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1 [expiry time] 900000 [cluster] g-sc-G12-1.c1-0x0.c1.domain [cluster instance] null
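
The sequence in the log lines above (started, activated, then expired from pending after 900000 ms) can be modelled with a small sketch. This is only an illustration under assumptions: the class and method names below are hypothetical and are not the Stratos ClusterLevelPartitionContext API. It shows that a member ends up in the obsolete list when the activation event never removes it from the pending list.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of a partition context; not the Stratos class.
class PendingMemberSketch {
    // member-id -> time (ms) at which the member entered the pending list
    private final Map<String, Long> pendingSince = new LinkedHashMap<>();
    private final List<String> active = new ArrayList<>();
    private final List<String> obsolete = new ArrayList<>();
    private final long expiryMillis;

    PendingMemberSketch(long expiryMillis) {
        this.expiryMillis = expiryMillis;
    }

    void memberStarted(String memberId, long now) {
        pendingSince.put(memberId, now);
    }

    // Meant to run when the "member activated" event arrives. If that event
    // is lost, the member stays pending and is expired later.
    void memberActivated(String memberId) {
        if (pendingSince.remove(memberId) != null) {
            active.add(memberId);
        }
    }

    // Periodic check: move members whose pending time has expired to obsolete.
    void expirePending(long now) {
        Iterator<Map.Entry<String, Long>> it = pendingSince.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (now - entry.getValue() >= expiryMillis) {
                obsolete.add(entry.getKey());
                it.remove();
            }
        }
    }

    List<String> active() { return active; }

    List<String> obsolete() { return obsolete; }
}
```

In this model a member that was activated in time never reaches the obsolete list, so an activated member showing up there points at the activation not being applied on the autoscaler side.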

On Fri, Jun 5, 2015 at 5:14 AM, Martin Eppel (meppel) <me...@cisco.com>> wrote:
Hi,

I am running into a scenario where application un-deployment fails (using stratos with latest commit  b1b6bca3f99b6127da24c9af0a6b20faff2907be).

For application structure see [1.], (debug enabled) wso2carbon.log, application.json, cartridge-group.json, deployment-policy, auto-scaling policies see attached zip file.

It is noteworthy, that while the application is running the following log statements /exceptions are observed:

…
Member is in the wrong list and it is removed from active members list: g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
…
TID: [0] [STRATOS] [2015-06-04 20:11:03,425] ERROR {org.apache.stratos.autoscaler.rule.RuleTasksDelegator} -  Cannot terminate instance
…
// after receiving the application undeploy event:
[2015-06-04 20:12:39,465]  INFO {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Application undeployment process started: [application-id] g-sc-G12-1
// a new instance is being started up
…
[2015-06-04 20:13:13,445]  INFO {org.apache.stratos.cloud.controller.services.impl.InstanceCreator} -  Instance started successfully: [cartridge-type] c2 [cluster-id] g-sc-G12-1.c2-1x0.c2.domain [instance-id] RegionOne/5d4699f7-b00b-42eb-b565-b48fc8f20407

// Also noteworthy seems the following warning which is seen repeatedly in the logs:
ReadWriteLock} -  System warning! Trying to release a lock which has not been taken by the same thread: [lock-name]


[1.] Application structure

[cid:image017.png@01D0A861.FB166F70]




...

[Message clipped]



--
Lasindu Charith
Software Engineer, WSO2 Inc.
Mobile: +94714427192<tel:%2B94714427192>
Web: blog.lasindu.com<http://blog.lasindu.com>




RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by "Martin Eppel (meppel)" <me...@cisco.com>.
Hi Lasindu, Reka

After incorporating the latest fixes (which solve part of the “application removal issues”, thanks!), I still have an issue with application removal in the case of a group scaling scenario. Please see the application structure [1a. …] and the attached zip file with the artifacts (wso2carbon.log, cartridge-group.json, application.json).

Please also note that I updated the application accordingly and removed the cartridge deployment policy where a group deployment policy was defined. Please review the application.json for correctness.

Observation: After all VMs go active (including the ones spun up by group scaling) and the “application removal process” is started, the application fails to be removed. The majority of the VMs are terminated, but there are 1 or 2 VMs which seem to get started when the application removal process is initiated. These VMs typically get stuck in the “Started” or “Initialized” state and are not being terminated.
Please note the group scaling progression (1.a … 1.c). What I noticed is that the groups which were created by the autoscaler don’t seem to go into active state, see also the attached log “wso2carbon-all-active.log”.

After starting the application removal process, see [1d.], the logs “wso2carbon-after-remove.log” and “wso2carbon-after-remove-1.log”, and the final log “wso2carbon-after-remove-2.log”.

Thanks

Martin


[1a.]

[cid:image001.png@01D0A857.79928150]

[1b.]

[cid:image002.png@01D0A857.B722B2B0]

[1c.] all cartridges active, but it seems the “auto scaled” groups are not in active state

[cid:image003.png@01D0A85B.2DB28970]

[1d.]

[cid:image004.png@01D0A85B.AF6D71A0]



[1e.] after application removal process started, instance (stuck) in “Starting” state, see log wso2carbon-after-remove-1.log

[cid:image008.png@01D0A85C.8D6A5CC0]














Thanks

Martin







From: Martin Eppel (meppel)
Sent: Friday, June 12, 2015 10:16 PM
To: dev
Cc: 'Ryan Du Plessis (rdupless)'
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Lasindu,

I have run some tests, and the issue (failure to remove an application after an instance is terminated and restarted) seems to be resolved.

However, I do seem to see some issue with group scaling and application removal. I still have to run some tests next week to get a better understanding (I am not sure yet whether this is an issue), and will keep you posted.

Thanks

Martin

From: Lasindu Charith [mailto:lasindu@wso2.com]
Sent: Friday, June 12, 2015 9:41 AM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,

I have fixed the above issue in commit 03de83172309c2932075fb5284c120ca610bbf0a. Please pull from master and try out your scenario again to see whether undeployment/redeployment works as expected.

Thanks,


On Thu, Jun 11, 2015 at 11:52 PM, Lasindu Charith <la...@wso2.com>> wrote:
Hi Martin,

I guess my previous observation was incorrect. The root cause of the above issue is that ClusterStatusTerminatedProcessor does not send a ClusterTerminatedEvent for all 3 clusters: it sends only 1 and fails to send the other 2. This is because, when the application is activated again, ClusterLevelPartitionContext is added twice to the clusterInstanceContext. This makes the if condition at [1] fail when ClusterStatusTerminatedProcessor checks whether the cluster monitor has any non-terminated members before sending the clusterTerminated event. I will try to find a proper solution and update the thread.


[1] https://github.com/apache/stratos/blob/master/components/org.apache.stratos.autoscaler/src/main/java/org/apache/stratos/autoscaler/status/processor/cluster/ClusterStatusTerminatedProcessor.java#L90
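
To illustrate the kind of failure described above, here is a minimal sketch. It rests on assumptions: the classes below are hypothetical stand-ins (not the Stratos ClusterLevelPartitionContext or ClusterInstanceContext), and the thread does not say exactly how the duplicate entry breaks the check. The sketch assumes that only one of the two copies is updated as members terminate, so the "any non-terminated members" check never clears. It is not the fix that was committed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a partition-level context; not Stratos code.
class PartitionCtx {
    final String partitionId;
    int nonTerminatedMembers;

    PartitionCtx(String partitionId, int nonTerminatedMembers) {
        this.partitionId = partitionId;
        this.nonTerminatedMembers = nonTerminatedMembers;
    }
}

// Hypothetical stand-in for a cluster instance context; not Stratos code.
class ClusterInstanceCtxSketch {
    private final List<PartitionCtx> partitionCtxs = new ArrayList<>();

    // Unguarded add: re-activation can register the same partition twice.
    void addUnguarded(PartitionCtx ctx) {
        partitionCtxs.add(ctx);
    }

    // Guarded add: ignore a context whose partition id is already registered.
    boolean addIfAbsent(PartitionCtx ctx) {
        for (PartitionCtx existing : partitionCtxs) {
            if (existing.partitionId.equals(ctx.partitionId)) {
                return false;
            }
        }
        partitionCtxs.add(ctx);
        return true;
    }

    // The kind of check made before a cluster-terminated event is sent.
    boolean hasNonTerminatedMembers() {
        for (PartitionCtx ctx : partitionCtxs) {
            if (ctx.nonTerminatedMembers > 0) {
                return true;
            }
        }
        return false;
    }
}
```

With the unguarded add, a stale second copy keeps reporting a live member after the first copy has dropped to zero, so the terminated event would be held back indefinitely.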

Thanks,


On Thu, Jun 11, 2015 at 10:29 PM, Martin Eppel (meppel) <me...@cisco.com>> wrote:
Is there any conclusion on how to fix this?

Thanks


Martin

From: Lahiru Sandaruwan [mailto:lahirus@wso2.com<ma...@wso2.com>]
Sent: Wednesday, June 10, 2015 6:55 PM
To: dev
Cc: Reka Thirunavukkarasu

Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Imesh,

The following could be the reason for the application not un-deploying when a member was auto-healed:


  *   The particular cluster in which the member was auto-healed is terminated before the others (while the others are in terminating state)
or

  *   The particular cluster in which the member was auto-healed is still terminating when the others are in terminated state
One of those two cases could happen even if the member was not auto-healed (in the case of groups, where one group is very complex and the others are simple). This is because, when the parent group is terminating, we currently check whether all the clusters and groups are in terminating status, which is wrong.

Thanks.

On Thu, Jun 11, 2015 at 5:49 AM, Imesh Gunaratne <im...@apache.org>> wrote:
Do we know why this only happens if a member was forcefully terminated and auto-healed?

On Wed, Jun 10, 2015 at 10:01 PM, Lasindu Charith <la...@wso2.com>> wrote:
Hi  all,

The cause of the above issue seems to be as follows.
GroupStatusTerminatedProcessor and GroupStatusTerminatingProcessor process the event only if all the group instances and cluster instances are in terminated state or in terminating state, respectively [1][2]. But there can be situations (such as the above) where some group instances are in terminated state and some in terminating state by the time GroupStatusProcessorChain is executed. In such scenarios, both the GroupStatusTerminatedProcessor and GroupStatusTerminatingProcessor executions are skipped, and GroupStatusInactiveProcessor prints the "No possible state change found" warning.

I think we need to find a way to properly fix this.

[1] https://github.com/apache/stratos/blob/master/components/org.apache.stratos.autoscaler/src/main/java/org/apache/stratos/autoscaler/status/processor/group/GroupStatusTerminatedProcessor.java#L91
[2] https://github.com/apache/stratos/blob/master/components/org.apache.stratos.autoscaler/src/main/java/org/apache/stratos/autoscaler/status/processor/group/GroupStatusTerminatingProcessor.java#L89
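
The gap described above can be sketched as follows. This is a simplified model under assumptions: the enum, class and method names are hypothetical, the active and inactive processors are left out, and the relaxed variant is only one possible way to close the gap, not the fix that went into Stratos.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the status chain; not Stratos code.
class GroupStatusSketch {
    enum Status { ACTIVE, INACTIVE, TERMINATING, TERMINATED }

    static boolean all(List<Status> children, Status wanted) {
        for (Status s : children) {
            if (s != wanted) {
                return false;
            }
        }
        return true;
    }

    // Strict chain: each step requires every child to be in exactly one state,
    // so a mix of TERMINATED and TERMINATING children matches neither step.
    static String strictChain(List<Status> children) {
        if (all(children, Status.TERMINATED)) {
            return "Terminated";
        }
        if (all(children, Status.TERMINATING)) {
            return "Terminating";
        }
        return "No possible state change found";
    }

    // Relaxed chain (an assumption): the parent counts as terminating as long
    // as every child is either terminating or already terminated.
    static String relaxedChain(List<Status> children) {
        if (all(children, Status.TERMINATED)) {
            return "Terminated";
        }
        boolean onlyTerminal = true;
        for (Status s : children) {
            if (s != Status.TERMINATING && s != Status.TERMINATED) {
                onlyTerminal = false;
            }
        }
        return onlyTerminal ? "Terminating" : "No possible state change found";
    }

    public static void main(String[] args) {
        List<Status> mixed = Arrays.asList(
                Status.TERMINATED, Status.TERMINATING, Status.TERMINATED);
        System.out.println(strictChain(mixed));
        System.out.println(relaxedChain(mixed));
    }
}
```

For the mixed input in main, the strict chain falls through to the warning text while the relaxed chain reports the parent as terminating.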

On Tue, Jun 9, 2015 at 8:09 PM, Lasindu Charith <la...@wso2.com>> wrote:
Hi Martin,

I was able to reproduce this issue in the latest build with PCA in Openstack. Even after stratos is restarted, the Application is not undeployed, which makes it impossible to undeploy the application (even the forceful undeployment failed for the above obsolete application).

Currently I'm looking at possible causes for this and will update with the progress.

Thanks,

On Tue, Jun 9, 2015 at 5:59 AM, Martin Eppel (meppel) <me...@cisco.com>> wrote:
Here is another example where the removal fails:

For application see [1.], log file (with debug enabled) and jsons are attached.

Scenario:


•        Deploy application and wait for all cartridges to become active

•        Kill a VM (2nd in startup sequence)

•        Wait for it to restart and become active

•        Un-deploy application

a.      Un-deploy forcefully will succeed
([2015-06-08 20:38:21,487]  INFO {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Forcefully un-deploying the application s-g-c1-c2-c3-s)
and

b.      Un-deploy gracefully will fail to remove app completely (although VMs are terminated successfully)
([2015-06-08 20:54:16,372]  INFO {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Starting to undeploy application: [application-id])

•        Both scenarios are recorded in the same log file wso2carbon-s-g-c1-c2-c3-s.log

•        Btw, I retested the scenario and the issue is easily reproducible following the steps listed above:
graceful application un-deploy succeeds if no VM has been restarted (terminated and restarted by the autoscaler).
Once a VM has been terminated, graceful application un-deploy will fail.
I attached a log file which demonstrates this case (wso2carbon-s-g-c1-c2-c3-s-scen-2.log). In this scenario, the same application is deployed, becomes active and is then removed (repeated 2 times); then a VM is terminated and restarted by the autoscaler. Afterwards, graceful application un-deploy fails.


Other Observations:

When the application is successfully removed, some events, e.g. “cluster removed event” and “Application deleted event received:”, are published (see [2.]), while no such events are observed when the application fails to be removed.

[2.] cluster removed event when application is un-deployed forcefully
TID: [0] [STRATOS] [2015-06-08 20:38:34,187]  INFO {org.apache.stratos.cloud.controller.messaging.receiver.application.ApplicationEventReceiver} -  Application deleted event received: [application-id] s-g-c1-c2-c3-s
TID: [0] [STRATOS] [2015-06-08 20:38:34,220]  INFO {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher} -  Publishing application clusters removed event: [application-id] s-g-c1-c2-c3-s


I compared the log sequences of the successful and the unsuccessful application removal and noticed a difference (see also highlighted areas):

Successful (see logs in wso2carbon-s-g-c1-c2-c3-s-scen-2.log)

TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor} -  StatusChecker calculating the terminated status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [ s-g-c1-c2-c3-s-1 ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock acquired
TID: [0] [STRATOS] [2015-06-08 22:18:41,527]  INFO {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor} -  Sending group instance terminated for [group] s-g-c1-c2-c3-s-x0x [instance] s-g-c1-c2-c3-s-1
TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG {org.apache.stratos.autoscaler.applications.topic.ApplicationBuilder} -  Handling group terminated event: [group-id] s-g-c1-c2-c3-s-x0x [application-id] s-g-c1-c2-c3-s [instance] s-g-c1-c2-c3-s-1
TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -  ClusterMonitor Drools session has been disposed. ClusterMonitor [clusterId=s-g-c1-c2-c3-s.c3-0x0.c3.domain, hasPrimary=false ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -  ClusterMonitor Drools session has been disposed. ClusterMonitor [clusterId=s-g-c1-c2-c3-s.c1-0x0.c1.domain, hasPrimary=false ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -  ClusterMonitor Drools session has been disposed. ClusterMonitor [clusterId=s-g-c1-c2-c3-s.c2-0x0.c2.domain, hasPrimary=false ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,529] DEBUG {org.apache.stratos.cloud.controller.messaging.topology.TopologyManager} -  Write lock released
TID: [0] [STRATOS] [2015-06-08 22:18:41,536] DEBUG {org.apache.stratos.autoscaler.registry.RegistryManager} -  Application [ s-g-c1-c2-c3-s ] persisted successfully in the Autoscaler Registry
TID: [0] [STRATOS] [2015-06-08 22:18:41,538] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Applications updated: {"applicationMap":{"s-g-c1-c2-c3-s":{"id":"s-g-c1-c2-c3-s","key":"l8V7OpRqOfBcWbBw","tenantId":-1234,"tenantDomain":"carbon.super","tenantAdminUserName":"admin","applicationPolicyId":"default-iaas","aliasToGroupMap":{"s-g-c1-c2-c3-s-x0x":{"name":"s-g-c1-c2-c3-s","alias":"s-g-c1-c2-c3-s-x0x","groupMinInstances":1,"groupMaxInstances":1,"applicationId":"s-g-c1-c2-c3-s","aliasToGroupMap":{},"aliasToClusterDataMap":{"c2-0x0":{"serviceType":"c2","clusterId":"s-g-c1-c2-c3-s.c2-0x0.c2.domain","minInstances":1,"maxInstances":1},"c3-0x0":{"serviceType":"c3","clusterId":"s-g-c1-c2-c3-s.c3-0x0.c3.domain","minInstances":1,"maxInstances":1},"c1-0x0":{"serviceType":"c1","clusterId":"s-g-c1-c2-c3-s.c1-0x0.c1.domain","minInstances":1,"maxInstances":1}},"typeToClusterDataMap":{"c1":{"serviceType":"c1","clusterId":"s-g-c1-c2-c3-s.c1-0x0.c1.domain","minInstances":1,"maxInstances":1},"c2":{"serviceType":"c2","clusterId":"s-g-c1-c2-c3-s.c2-0x0.c2.domain","minInstances":1,"maxInstances":1},"c3":{"serviceType":"c3","clusterId":"s-g-c1-c2-c3-s.c3-0x0.c3.domain","minInstances":1,"maxInstances":1}},"instanceIdToInstanceContextMap":{},"dependencyOrder":{"startupOrders":[{"startupOrderComponentList":["cartridge.c3-0x0","cartridge.c2-0x0"]},{"startupOrderComponentList":["cartridge.c2-0x0","cartridge.c1-0x0"]}],"terminationBehaviour":"terminate-none"},"isGroupScalingEnabled":false,"isGroupInstanceMonitoringEnabled":false,"instanceIdSequence":{"value":0}}},"aliasToClusterDataMap":{},"typeToClusterDataMap":{},"aliasToDeploymentPolicyIdMap":{"c3-0x0":"static-1","c2-0x0":"static-1","c1-0x0":"static-1"},"instanceIdToInstanceContextMap":{"s-g-c1-c2-c3-s-1":{"alias":"s-g-c1-c2-c3-s","instanceId":"s-g-c1-c2-c3-s-1","instanceProperties":{},"lifeCycleStateManager":{"stateStack":["Created","Active","Terminating"],"identifier":"s-g-c1-c2-c3-s_s-g-c1-c2-c3-s-1"},"networkPartitionId":"RegionOne"}},"dependencyOrder":{"startupOrders":[]},"isGroupScalingEnabled":false,"isGroupInstanceMonitoringEnabled":false,"instanceIdSequence":{"value":1}}},"initialized":false}
TID: [0] [STRATOS] [2015-06-08 22:18:41,539]  INFO {org.apache.stratos.autoscaler.applications.topic.ApplicationsEventPublisher} -  Publishing group instance terminated event: [application] s-g-c1-c2-c3-s [group] s-g-c1-c2-c3-s-x0x [instance] s-g-c1-c2-c3-s-1
TID: [0] [STRATOS] [2015-06-08 22:18:41,545]  INFO {org.apache.stratos.autoscaler.monitor.component.GroupMonitor} -  [Group] s-g-c1-c2-c3-s-x0x is notifying the [parent] s-g-c1-c2-c3-s [instance] s-g-c1-c2-c3-s-1
TID: [0] [STRATOS] [2015-06-08 22:18:41,545] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusActiveProcessor} -  GroupProcessor chain calculating the status for the group [ s-g-c1-c2-c3-s ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusActiveProcessor} -  StatusChecker calculating the active status for the group [ s-g-c1-c2-c3-s ]  for the instance  [ s-g-c1-c2-c3-s-1 ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock acquired
TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock released
TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor} -  StatusChecker calculating the terminated status for the group [ s-g-c1-c2-c3-s ]  for the instance  [ s-g-c1-c2-c3-s-1 ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock acquired
TID: [0] [STRATOS] [2015-06-08 22:18:41,546]  INFO {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor} -  Sending application instance terminated for [application] s-g-c1-c2-c3-s [instance] s-g-c1-c2-c3-s-1

Unsuccessful:

TID: [0] [STRATOS] [2015-06-08 22:33:25,404] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor} -  StatusChecker calculating the terminated status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [ s-g-c1-c2-c3-s-1 ]
TID: [0] [STRATOS] [2015-06-08 22:33:25,404] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock acquired
TID: [0] [STRATOS] [2015-06-08 22:33:25,405] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusProcessor} -  Checking the status of cluster s-g-c1-c2-c3-s.c1-0x0.c1.domain instance status is: Terminating
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.cloud.controller.messaging.topology.TopologyManager} -  Write lock released
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusProcessor} -  Checking the status of cluster s-g-c1-c2-c3-s.c1-0x0.c1.domain instance status is: Terminating
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock released
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatingProcessor} -  StatusChecker calculating the terminating status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [ s-g-c1-c2-c3-s-1 ]
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock acquired
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock released
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusInactiveProcessor} -  StatusChecker calculating the inactive status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [ s-g-c1-c2-c3-s-1 ]
TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock acquired
TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock released
TID: [0] [STRATOS] [2015-06-08 22:33:25,407]  WARN {org.apache.stratos.autoscaler.status.processor.group.GroupStatusInactiveProcessor} -  No possible state change found for [component] s-g-c1-c2-c3-s-x0x [instance] s-g-c1-c2-c3-s-1
TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -  ClusterMonitor Drools session has been disposed. ClusterMonitor [clusterId=s-g-c1-c2-c3-s.c2-0x0.c2.domain, hasPrimary=false ]
TID: [0] [STRATOS] [2015-06-08 22:33:25,481] ERROR {org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor} -  System error, lock has not released for 30 seconds: [lock-name] application [lock-type] Write [thread-id] 99 [thread-name] pool-26-thread-2 [stack-trace]
java.lang.Thread.getStackTrace(Thread.java:1589)




[1.] Application Structure
[cid:image005.png@01D0A828.5A295200]






From: Martin Eppel (meppel)
Sent: Friday, June 05, 2015 4:38 PM

To: dev@stratos.apache.org<ma...@stratos.apache.org>
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

This is another application, see [1.] which fails to get completely removed:

Scenario / Observation:

•        After all instances / application go active, one instance is being terminated (to verify termination behavior). Once the terminated instance is restored the application is undeployed.

•        After the Application undeployment process is started, all instances are being terminated

•        Application still shows up in stratos admin, subsequent deployments fail

stratos> list-applications
Applications found:
+---------------------+---------------------+----------+
| Application ID      | Alias               | Status   |
+---------------------+---------------------+----------+
| s-n-gr-s-G123-t-a-4 | s-n-gr-s-G123-t-a-4 | Deployed |
+---------------------+---------------------+----------+


[1.] Application:

[cid:image006.png@01D0A828.5A295200]




From: Martin Eppel (meppel)
Sent: Friday, June 05, 2015 3:26 PM
To: dev@stratos.apache.org<ma...@stratos.apache.org>
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

After re-running it, these are my observations:


•        After the “Application undeployment process started” message is logged, there is a likelihood that (a few) VMs are still launched. I suspect this is due to a race condition between the application undeployment process and the autoscaler.

•        All VMs which were launched before the “Application undeployment process started” message get terminated as part of the undeployment process.

•        VMs which were launched after the “Application undeployment process started” message eventually get moved to obsolete / pending state and cleaned up; this can take up to 15-20 minutes.

•        The application never gets completely removed.

•        The following exception is consistently observed:

TID: [0] [STRATOS] [2015-06-05 20:47:07,237]  WARN {org.apache.stratos.common.concurrent.locks.ReadWriteLock} -  System warning! Trying to release a lock which has not been taken by the same thread: [lock-name] application-holder [thread-id] 114 [thread-name] pool-24-thread-2

TID: [0] [STRATOS] [2015-06-05 20:47:07,237] ERROR {org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator} -  Failed to retrieve topology event message

org.apache.stratos.common.exception.InvalidLockRequestedException: System error, cannot acquire a write lock while having a read lock on the same thread: [lock-name] application-holder [thread-id] 114 [thread-name] pool-24-thread-2

                    at org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:114)

                    at org.apache.stratos.autoscaler.applications.ApplicationHolder.acquireWriteLock(ApplicationHolder.java:60)

•        Initiating the “Application undeployment process” again produces the following INFO statement (without any further action, see the log):
TID: [0] [STRATOS] [2015-06-05 21:34:34,509]  INFO {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Application monitor is already in terminating, graceful un-deployment is has already been attempted thus not invoking again

•        Other exceptions observed after the “Application undeployment process started”
TID: [0] [STRATOS] [2015-06-05 21:36:29,458] ERROR {org.apache.stratos.autoscaler.rule.RuleTasksDelegator} -  Cannot terminate instance
org.apache.stratos.cloud.controller.stub.CloudControllerServiceInvalidMemberExceptionException: CloudControllerServiceInvalidMemberExceptionException
        at sun.reflect.GeneratedConstructorAccessor219.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at java.lang.Class.newInstance(Class.java:374)
        at org.apache.stratos.cloud.controller.stub.CloudControllerServiceStub.terminateInstance(CloudControllerServiceStub.java:8633)
        at org.apache.stratos.common.client.CloudControllerServiceClient.terminateInstance(CloudControllerServiceClient.java:120)
        at org.apache.stratos.autoscaler.rule.RuleTasksDelegator.terminateObsoleteInstance(RuleTasksDelegator.java:298)
        at sun.reflect.GeneratedMethodAccessor413.invoke(Unknown Source)


•        Created a jira to track this issue: https://issues.apache.org/jira/browse/STRATOS-1430
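
The suspected race in the first observation above (VMs still being launched after undeployment has started) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the class and method names are hypothetical, and the thread does not confirm that this is where the race occurs or how Stratos guards against it. The idea is only that the "is the application terminating?" check and the spawn decision happen under the same lock as the flag change.

```java
// Hypothetical sketch of guarding scale-up against a concurrent undeploy;
// not the Stratos autoscaler.
class UndeployGuardSketch {
    private boolean terminating = false;
    private int spawned = 0;

    // Called when the application undeployment process starts.
    synchronized void startUndeployment() {
        terminating = true;
    }

    // Called by the scaling logic; the flag check and the spawn decision are
    // made under the same lock, so no spawn can slip in after undeployment
    // has started.
    synchronized boolean requestSpawn() {
        if (terminating) {
            return false;
        }
        spawned++;
        return true;
    }

    synchronized int spawnedCount() {
        return spawned;
    }
}
```

A spawn request issued before startUndeployment() is accepted, and any request issued afterwards is refused.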







Regards



Martin



Attached the log file of the last test







From: Martin Eppel (meppel)
Sent: Friday, June 05, 2015 12:59 PM
To: dev@stratos.apache.org<ma...@stratos.apache.org>
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

For this latest test I got the latest source from stratos repo so I have this commit (see below), but the un-deployment still fails (to some extent).
As mentioned below, it seems that all the members get terminated eventually, including the ones which got started after the “application un-deployment” process started.
What is still left in stratos (even after all members got terminated) is the application (see the stratos> list-applications command result below in email thread). This would still be an issue when re-deploying the application !
I will do a few reruns to verify the removal of the VMs (members) is consistent.
Thanks

Martin

git show 2fe84b91843b20e91e8cafd06011f42d218f231c
commit 2fe84b91843b20e91e8cafd06011f42d218f231c
Author: anuruddhal <an...@gmail.com>>
Date:   Wed Jun 3 14:41:12 2015 +0530

From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: Friday, June 05, 2015 12:46 PM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,

I also encountered a similar issue with application un-deployment with PCA, but I guess you are using JCA.

I can see that Anuruddha has done a fix for the issue I'm referring to with the commit below:
https://github.com/apache/stratos/commit/2fe84b91843b20e91e8cafd06011f42d218f231c

Regarding the member context not found error, this could occur if the termination request was made for an already terminated member. It is possible that the Autoscaler makes a second terminate request if the first request takes some time to execute, and by the time the second request hits the Cloud Controller the member has already been terminated by the first request.
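
The double-request case can be sketched like this. It is a minimal illustration with hypothetical names (not the CloudControllerServiceImpl API): a strict terminate fails on the second request because the member context is already gone, while an idempotent variant treats an unknown member as already terminated. Whether Stratos should behave this way is a design question the thread leaves open.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; class and method names are illustrative only.
class TerminationSketch {
    static class InvalidMemberException extends Exception {
        InvalidMemberException(String message) {
            super(message);
        }
    }

    private final Set<String> memberContexts = new HashSet<>();

    void register(String memberId) {
        memberContexts.add(memberId);
    }

    // Strict variant: a second request for the same member fails because the
    // first request has already removed the member context.
    void terminateStrict(String memberId) throws InvalidMemberException {
        if (!memberContexts.remove(memberId)) {
            throw new InvalidMemberException(
                    "member context not found: " + memberId);
        }
    }

    // Idempotent variant: an unknown member is treated as already terminated.
    // Returns true only if this call actually removed the member.
    boolean terminateIdempotent(String memberId) {
        return memberContexts.remove(memberId);
    }
}
```

With the idempotent variant, a late duplicate request becomes a no-op instead of an ERROR entry in the log.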

Can you please confirm whether the members were properly terminated and it's just these exceptions that you are seeing?

Thanks


On Sat, Jun 6, 2015 at 12:36 AM, Martin Eppel (meppel) <me...@cisco.com>> wrote:
Hi Udara,

Picked up your commit and reran the test case:

Attached is the log file (artifacts are the same as before).

Didn’t see the issue with “Member is in the wrong list” …

but see the following exception after the undeploy application message:
TID: [0] [STRATOS] [2015-06-05 18:09:46,836] ERROR {org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator} -  Failed to retrieve topology event message
org.apache.stratos.common.exception.InvalidLockRequestedException: System error, cannot acquire a write lock while having a read lock on the same thread: [lock-name] application-holder [thread-id] 114 [thread-name] pool-24-thread-2
                    at org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:114)
                    at org.apache.stratos.autoscaler.applications.ApplicationHolder.acquireWriteLock(ApplicationHolder.java:60)
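
The exception above says a write lock cannot be acquired while the same thread holds a read lock. The JDK's own ReentrantReadWriteLock has the same restriction (no upgrade from read to write), which the following sketch demonstrates. Note that it uses the JDK class as a stand-in; Stratos' org.apache.stratos.common.concurrent.locks.ReadWriteLock is a separate wrapper whose internals are not shown in this thread.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Uses the JDK lock as a stand-in, not Stratos' own ReadWriteLock wrapper.
class LockUpgradeSketch {
    // Tries to take the write lock while this thread still holds the read
    // lock. tryLock() does not block, so the result can be observed safely.
    static boolean tryUpgrade(ReentrantReadWriteLock lock) {
        lock.readLock().lock();
        try {
            boolean gotWrite = lock.writeLock().tryLock();
            if (gotWrite) {
                lock.writeLock().unlock();
            }
            return gotWrite;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Safe ordering: release the read lock first, then take the write lock.
    static boolean releaseThenWrite(ReentrantReadWriteLock lock) {
        lock.readLock().lock();
        lock.readLock().unlock();
        boolean gotWrite = lock.writeLock().tryLock();
        if (gotWrite) {
            lock.writeLock().unlock();
        }
        return gotWrite;
    }

    public static void main(String[] args) {
        System.out.println(tryUpgrade(new ReentrantReadWriteLock()));
        System.out.println(releaseThenWrite(new ReentrantReadWriteLock()));
    }
}
```

The first call cannot obtain the write lock while the read lock is held; the second succeeds because the read lock was released first. A code path that reads the application holder and then needs to modify it would have to follow the second ordering.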


Also, after the “Application undeployment process started” is started, new members are being instantiated:

TID: [0] [STRATOS] [2015-06-05 18:07:46,545]  INFO {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher} -  Publishing member created event:


Eventually, these VMs get terminated :

TID: [0] [STRATOS] [2015-06-05 18:42:42,413] ERROR {org.apache.stratos.cloud.controller.services.impl.CloudControllerServiceImpl} -  Could not terminate instance: [member-id] g-sc-G12-1.c1-0x0.c1.domaindd9c1d40-70cc-4950-9757-418afe19ba7f
org.apache.stratos.cloud.controller.exception.InvalidMemberException: Could not terminate instance, member context not found: [member-id] g-sc-G12-1.c1-0x0.c1.domaindd9c1d40-70cc-4950-9757-418afe19ba7f
                    at org.apache.stratos.cloud.controller.services.impl.CloudControllerServiceImpl.terminateInstance(CloudControllerServiceImpl.java:595)
                    at sun.reflect.GeneratedMethodAccessor408.invoke(Unknown Source)
                    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                    at java.lang.reflect.Method.invoke(Method.java:606)


but the application remains:

stratos> list-applications
Applications found:
+----------------+------------+----------+
| Application ID | Alias      | Status   |
+----------------+------------+----------+
| g-sc-G12-1     | g-sc-G12-1 | Deployed |
+----------------+------------+----------+

['g-sc-G12-1: applicationInstances 1, groupInstances 2, clusterInstances 3, members 0 ()\n']



From: Martin Eppel (meppel)
Sent: Friday, June 05, 2015 10:04 AM
To: dev@stratos.apache.org<ma...@stratos.apache.org>
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Ok:

log4j.logger.org.apache.stratos.manager=DEBUG
log4j.logger.org.apache.stratos.autoscaler=DEBUG
log4j.logger.org.apache.stratos.messaging=INFO
log4j.logger.org.apache.stratos.cloud.controller=DEBUG
log4j.logger.org.wso2.andes.client=ERROR
# Autoscaler rule logs
log4j.logger.org.apache.stratos.autoscaler.rule.RuleLog=DEBUG

From: Udara Liyanage [mailto:udara@wso2.com]
Sent: Friday, June 05, 2015 10:00 AM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,

It would be better if you could enable debug logs for the AS, CC and cartridge agent.

On Fri, Jun 5, 2015 at 10:23 PM, Udara Liyanage <ud...@wso2.com> wrote:
Hi,

Please enable AS debug logs.

On Fri, Jun 5, 2015 at 9:38 PM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi Udara,

Yes, this issue seems to be fairly easy to reproduce. Which debug logs do you want me to enable? The cartridge agent logs?

Thanks

Martin

From: Udara Liyanage [mailto:udara@wso2.com<ma...@wso2.com>]
Sent: Thursday, June 04, 2015 11:11 PM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi,

This could happen if the AS did not receive the member activated event published by the CC. If this is reproducible, is it possible to enable debug logs?
Otherwise I can add some INFO logs and commit.


On Fri, Jun 5, 2015 at 9:11 AM, Udara Liyanage <ud...@wso2.com> wrote:
Hi,


For the first issue you mentioned, the particular member is activated, but it is still identified as an obsolete member and is marked for termination because the pending time expired. Does that mean the member is still in the obsolete list even though it has been activated?

//member started
TID: [0] [STRATOS] [2015-06-04 19:53:04,706]  INFO {org.apache.stratos.autoscaler.context.cluster.ClusterContext} -  Member stat context has been added: [application] g-sc-G12-1 [cluster] g-sc-G12-1.c1-0x0.c1.domain [clusterInstanceContext] g-sc-G12-1-1 [partitionContext] whole-region [member-id] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1

//member activated
TID: [0] [STRATOS] [2015-06-04 19:56:00,907]  INFO {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher} -  Publishing member activated event: [service-name] c1 [cluster-id] g-sc-G12-1.c1-0x0.c1.domain [cluster-instance-id] g-sc-G12-1-1 [member-id] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1 [network-partition-id] RegionOne [partition-id] whole-region
TID: [0] [STRATOS] [2015-06-04 19:56:00,916]  INFO {org.apache.stratos.messaging.message.processor.topology.MemberActivatedMessageProcessor} -  Member activated: [service] c1 [cluster] g-sc-G12-1.c1-0x0.c1.domain [member] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1

//after 15 minutes ---member is still in pending state, pending timeout expired
TID: [0] [STRATOS] [2015-06-04 20:08:04,713]  INFO {org.apache.stratos.autoscaler.context.partition.ClusterLevelPartitionContext$PendingMemberWatcher} -  Pending state of member expired, member will be moved to obsolete list. [pending member] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1 [expiry time] 900000 [cluster] g-sc-G12-1.c1-0x0.c1.domain [cluster instance] null
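
The timeline above (member created at 19:53:04, pending state expired at 20:08:04, i.e. after the 900000 ms expiry time) is consistent with a member that was never taken off the pending list. A minimal sketch of that bookkeeping, assuming a hypothetical tracker class (none of these names come from the Stratos code base):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Hypothetical model of pending/active/obsolete member bookkeeping.
class PendingMemberSketch {
    private final long expiryMillis;
    private final Map<String, Long> pendingSince = new HashMap<>();
    private final Set<String> active = new HashSet<>();
    private final Set<String> obsolete = new HashSet<>();

    PendingMemberSketch(long expiryMillis) {
        this.expiryMillis = expiryMillis;
    }

    void memberCreated(String memberId, long nowMillis) {
        pendingSince.put(memberId, nowMillis);
    }

    // Must run when the member activated event is processed; if the event is
    // missed, the member stays pending and is later expired by the watcher.
    void memberActivated(String memberId) {
        if (pendingSince.remove(memberId) != null) {
            active.add(memberId);
        }
    }

    // Watcher step: moves members whose pending time has expired to the obsolete set.
    void expirePending(long nowMillis) {
        Iterator<Map.Entry<String, Long>> it = pendingSince.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() >= expiryMillis) {
                String memberId = entry.getKey();
                it.remove();
                obsolete.add(memberId);
            }
        }
    }

    boolean isActive(String memberId) {
        return active.contains(memberId);
    }

    boolean isObsolete(String memberId) {
        return obsolete.contains(memberId);
    }
}
```

In this model, a member whose activation is processed never reaches the obsolete set, while a member whose activated event is lost is expired after the timeout even though the instance itself is up.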

On Fri, Jun 5, 2015 at 5:14 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi,

I am running into a scenario where application un-deployment fails (using stratos with latest commit  b1b6bca3f99b6127da24c9af0a6b20faff2907be).

For the application structure see [1.]; for the (debug enabled) wso2carbon.log, application.json, cartridge-group.json, deployment-policy and auto-scaling policies see the attached zip file.

It is noteworthy that, while the application is running, the following log statements / exceptions are observed:

…
Member is in the wrong list and it is removed from active members list: g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
…
TID: [0] [STRATOS] [2015-06-04 20:11:03,425] ERROR {org.apache.stratos.autoscaler.rule.RuleTasksDelegator} -  Cannot terminate instance
…
// after receiving the application undeploy event:
[2015-06-04 20:12:39,465]  INFO {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Application undeployment process started: [application-id] g-sc-G12-1
// a new instance is being started up
…
[2015-06-04 20:13:13,445]  INFO {org.apache.stratos.cloud.controller.services.impl.InstanceCreator} -  Instance started successfully: [cartridge-type] c2 [cluster-id] g-sc-G12-1.c2-1x0.c2.domain [instance-id] RegionOne/5d4699f7-b00b-42eb-b565-b48fc8f20407

// Also noteworthy seems the following warning which is seen repeatedly in the logs:
ReadWriteLock} -  System warning! Trying to release a lock which has not been taken by the same thread: [lock-name]


[1.] Application structure

[cid:image007.png@01D0A828.5A295200]







--
Lasindu Charith
Software Engineer, WSO2 Inc.
Mobile: +94714427192<tel:%2B94714427192>
Web: blog.lasindu.com<http://blog.lasindu.com>




Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by Lasindu Charith <la...@wso2.com>.
Hi Martin,

I have fixed the above issue in
commit 03de83172309c2932075fb5284c120ca610bbf0a. Please pull from
master and try out your scenario again to see if
undeployment/redeployment works as expected.

Thanks,


On Thu, Jun 11, 2015 at 11:52 PM, Lasindu Charith <la...@wso2.com> wrote:

> Hi Martin,
>
> I guess my previous observation was incorrect. The root cause of the above
> issue is that *ClusterStatusTerminatedProcessor* does not send a
> *ClusterTerminatedEvent* for all 3 clusters. It sends only 1 and fails to
> send the other 2 clusterTerminated events. This is because, when the
> application is activated again, a *ClusterLevelPartitionContext* is added
> twice to the *clusterInstanceContext*. This makes the if condition at [1]
> fail when *ClusterStatusTerminatedProcessor* checks whether the cluster
> monitor has any non-terminated members before sending the clusterTerminated
> event. I will try to find a proper solution and update the thread.
>
>
> [1]
> https://github.com/apache/stratos/blob/master/components/org.apache.stratos.autoscaler/src/main/java/org/apache/stratos/autoscaler/status/processor/cluster/ClusterStatusTerminatedProcessor.java#L90
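
To illustrate how a partition context that is added twice can block the "no non-terminated members" check, here is a minimal sketch. All class, field and method names are hypothetical stand-ins, and the guarded add is only one possible remedy, not necessarily what the actual fix does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a partition-level context.
class PartitionContextSketch {
    final String partitionId;
    int nonTerminatedMembers;

    PartitionContextSketch(String partitionId, int nonTerminatedMembers) {
        this.partitionId = partitionId;
        this.nonTerminatedMembers = nonTerminatedMembers;
    }
}

// Hypothetical stand-in for a cluster instance context holding partition contexts.
class ClusterInstanceContextSketch {
    private final List<PartitionContextSketch> partitions = new ArrayList<>();

    // Unconditional add: a second activation adds a duplicate for the same partition.
    void addUnchecked(PartitionContextSketch ctx) {
        partitions.add(ctx);
    }

    // Guarded add: keeps the existing context if the partition is already known.
    void addIfAbsent(PartitionContextSketch ctx) {
        if (find(ctx.partitionId) == null) {
            partitions.add(ctx);
        }
    }

    // Returns the first context for the partition, so a duplicate is never updated.
    PartitionContextSketch find(String partitionId) {
        for (PartitionContextSketch p : partitions) {
            if (p.partitionId.equals(partitionId)) {
                return p;
            }
        }
        return null;
    }

    // The check that has to pass before a terminated event can be sent.
    boolean hasNonTerminatedMembers() {
        for (PartitionContextSketch p : partitions) {
            if (p.nonTerminatedMembers > 0) {
                return true;
            }
        }
        return false;
    }
}
```

With the unchecked add, member updates reach only the first context, so the stale duplicate keeps reporting live members and the terminated event is never sent.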
>
> Thanks,
>
>
> On Thu, Jun 11, 2015 at 10:29 PM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
>>  Is there any conclusion on how to fix this?
>>
>>
>>
>> Thanks
>>
>>
>>
>>
>>
>> Martin
>>
>>
>>
>> *From:* Lahiru Sandaruwan [mailto:lahirus@wso2.com]
>> *Sent:* Wednesday, June 10, 2015 6:55 PM
>> *To:* dev
>> *Cc:* Reka Thirunavukkarasu
>>
>> *Subject:* Re: Testing Stratos 4.1: Application undeployment:
>> application fails to undeploy (nested grouping, group scaling)
>>
>>
>>
>> Hi Imesh,
>>
>>
>>
>> The following could be the reason for the application not un-deploying
>> when a member was auto healed:
>>
>>
>>
>>    - The particular cluster in which the member was auto healed is
>>    terminated before the others (while the others are in terminating state)
>>
>>  or
>>
>>    - The particular cluster in which the member was auto healed is still
>>    terminating when the others are in terminated state
>>
>>  One of those two cases could happen even if the member was not auto
>> healed (for example with groups, where one group is very complex and the
>> others are simple). This is because, when the parent group is
>> *terminating*, we currently check whether all the clusters and groups are
>> in *terminating* status, which is wrong.
>>
>>
>>
>> Thanks.
>>
>>
>>
>> On Thu, Jun 11, 2015 at 5:49 AM, Imesh Gunaratne <im...@apache.org>
>> wrote:
>>
>> Do we know why this only happens if a member was forcefully terminated
>> and auto-healed?
>>
>>
>>
>> On Wed, Jun 10, 2015 at 10:01 PM, Lasindu Charith <la...@wso2.com>
>> wrote:
>>
>> Hi  all,
>>
>>
>>
>> The cause of the above issue seems to be as follows.
>>
>> GroupStatusTerminatedProcessor and GroupStatusTerminatingProcessor
>> process the event only if all the group instances and cluster instances
>> are in terminated state or in terminating state, respectively [1][2]. But
>> there can be situations (such as the above) where some group instances are
>> in terminated state and some in terminating state by the
>> time GroupStatusProcessorChain is executed. In such scenarios, both the
>> GroupStatusTerminatedProcessor and GroupStatusTerminatingProcessor
>> executions are skipped, and GroupStatusInactiveProcessor prints the "No
>> possible state change found" warning.
>>
>>
>>
>> I think we need to find a way to properly fix this.
>>
>>
>>
>> [1]
>> https://github.com/apache/stratos/blob/master/components/org.apache.stratos.autoscaler/src/main/java/org/apache/stratos/autoscaler/status/processor/group/GroupStatusTerminatedProcessor.java#L91
>>
>> [2]
>> https://github.com/apache/stratos/blob/master/components/org.apache.stratos.autoscaler/src/main/java/org/apache/stratos/autoscaler/status/processor/group/GroupStatusTerminatingProcessor.java#L89
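
The gap described above can be sketched as follows: a strict chain where each check requires all children to be in exactly one state finds no transition for a mix of terminated and terminating children, while a more tolerant check still treats that mix as terminating. This is only an illustration under assumed names (ChildStatus, GroupStatusSketch); it is neither the Stratos processor chain nor a proposed patch.

```java
import java.util.List;

// Hypothetical status values; not the Stratos enums.
enum ChildStatus { ACTIVE, INACTIVE, TERMINATING, TERMINATED }

class GroupStatusSketch {
    // Strict chain: each check requires ALL children to be in one single state.
    static String strict(List<ChildStatus> children) {
        if (children.stream().allMatch(s -> s == ChildStatus.TERMINATED)) {
            return "Terminated";
        }
        if (children.stream().allMatch(s -> s == ChildStatus.TERMINATING)) {
            return "Terminating";
        }
        return "NoChange";
    }

    // Tolerant variant: a mix of TERMINATING and TERMINATED children
    // still counts as terminating.
    static String tolerant(List<ChildStatus> children) {
        if (children.stream().allMatch(s -> s == ChildStatus.TERMINATED)) {
            return "Terminated";
        }
        if (children.stream().allMatch(
                s -> s == ChildStatus.TERMINATING || s == ChildStatus.TERMINATED)) {
            return "Terminating";
        }
        return "NoChange";
    }
}
```

For children in the states TERMINATED, TERMINATING, TERMINATED, the strict variant reports no change (matching the warning seen in the logs), whereas the tolerant variant reports terminating and can be re-evaluated once the last child terminates.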
>>
>>
>>
>> On Tue, Jun 9, 2015 at 8:09 PM, Lasindu Charith <la...@wso2.com> wrote:
>>
>> Hi Martin,
>>
>>
>>
>> I was able to reproduce this issue in the latest build with PCA in
>> Openstack. Even after stratos is restarted, the Application is not
>> undeployed, which makes it impossible to undeploy the application (even the
>> forceful undeployment failed for the above obsolete application).
>>
>>
>>
>> Currently I'm looking at possible causes for this and will update with
>> the progress.
>>
>>
>>
>> Thanks,
>>
>>
>>
>> On Tue, Jun 9, 2015 at 5:59 AM, Martin Eppel (meppel) <me...@cisco.com>
>> wrote:
>>
>> Here is another example where the removal fails:
>>
>>
>>
>> For application see [1.], log file (with debug enabled) and jsons are
>> attached.
>>
>>
>>
>> Scenario:
>>
>>
>>
>> ·        Deploy application and wait for all cartridges to become active
>>
>> ·        Kill a VM (2nd in startup sequence)
>>
>> ·        Wait for it to restart and become active
>>
>> ·        Un-deploy application
>>
>> a.      Un-deploy forcefully will succeed
>> ([2015-06-08 20:38:21,487]  INFO
>> {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -
>> Forcefully un-deploying the application s-g-c1-c2-c3-s)
>>
>> b.      Un-deploy gracefully will fail to remove app completely
>> (although VMs are terminated successfully)
>> ([2015-06-08 20:54:16,372]  INFO
>> {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -
>> Starting to undeploy application: [application-id])
>>
>> ·        Both scenarios are recorded in the same log file
>> wso2carbon-s-g-c1-c2-c3-s.log
>>
>> ·        Btw, I retested the scenario and the issue is easily
>>  reproducible following the steps listed above:
>> graceful application un-deploy succeeds if no VM has been restarted
>> (terminated and restarted by autoscaler).
>> Once a VM has been terminated and restarted, graceful application
>> un-deploy fails.
>> I attached a log file which demonstrates this case
>> (wso2carbon-s-g-c1-c2-c3-s-scen-2.log). In this scenario, the same
>> application is deployed, becomes active and is then removed (repeated 2
>> times); then a VM is terminated and restarted by the autoscaler. Afterwards,
>> graceful application un-deploy fails.
>>
>>
>>
>>
>>
>> Other Observations:
>>
>>
>>
>> When the application is successfully removed, some events, e.g. “cluster
>> removed event” and “Application deleted event received:”, are published
>> (see [2.]), while when the application fails to be removed no such events
>> are observed.
>>
>>
>>
>> [2.] cluster removed event when application is un-deployed forcefully
>>
>> TID: [0] [STRATOS] [2015-06-08 20:38:34,187]  INFO
>> {org.apache.stratos.cloud.controller.messaging.receiver.application.ApplicationEventReceiver}
>> -  Application deleted event received: [application-id] s-g-c1-c2-c3-s
>>
>> TID: [0] [STRATOS] [2015-06-08 20:38:34,220]  INFO
>> {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher}
>> -  Publishing application clusters removed event: [application-id]
>> s-g-c1-c2-c3-s
>>
>>
>>
>>
>>
>> I analyzed the differences in the successful application removal and
>> unsuccessful log sequence and noticed a difference (see also highlighted
>> areas):
>>
>>
>>
>> Successful (see logs in wso2carbon-s-g-c1-c2-c3-s-scen-2.log)
>>
>>
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG
>> {org.apache.stratos.autoscaler.status.processor.group.*
>> *GroupStatusTerminatedProcessor**} -  StatusChecker calculating the
>> terminated status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [
>> s-g-c1-c2-c3-s-1 ]*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG
>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>> lock acquired*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,527]  INFO
>> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor}
>> -  Sending group instance terminated for [group] s-g-c1-c2-c3-s-x0x
>> [instance] s-g-c1-c2-c3-s-1*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG
>> {org.apache.stratos.autoscaler.applications.topic.ApplicationBuilder} -
>> Handling group terminated event: [group-id] s-g-c1-c2-c3-s-x0x
>> [application-id] s-g-c1-c2-c3-s [instance] s-g-c1-c2-c3-s-1*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG
>> {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -
>> ClusterMonitor Drools session has been disposed. ClusterMonitor
>> [clusterId=s-g-c1-c2-c3-s.c3-0x0.c3.domain, hasPrimary=false ]*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG
>> {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -
>> ClusterMonitor Drools session has been disposed. ClusterMonitor
>> [clusterId=s-g-c1-c2-c3-s.c1-0x0.c1.domain, hasPrimary=false ]*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG
>> {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -
>> ClusterMonitor Drools session has been disposed. ClusterMonitor
>> [clusterId=s-g-c1-c2-c3-s.c2-0x0.c2.domain, hasPrimary=false ]*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,529] DEBUG
>> {org.apache.stratos.cloud.controller.messaging.topology.TopologyManager} -
>> Write lock released*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,536] DEBUG
>> {org.apache.stratos.autoscaler.registry.RegistryManager} -  Application [
>> s-g-c1-c2-c3-s ] persisted successfully in the Autoscaler Registry*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,538] DEBUG
>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -
>> Applications updated:
>> {"applicationMap":{"s-g-c1-c2-c3-s":{"id":"s-g-c1-c2-c3-s","key":"l8V7OpRqOfBcWbBw","tenantId":-1234,"tenantDomain":"carbon.super","tenantAdminUserName":"admin","applicationPolicyId":"default-iaas","aliasToGroupMap":{"s-g-c1-c2-c3-s-x0x":{"name":"s-g-c1-c2-c3-s","alias":"s-g-c1-c2-c3-s-x0x","groupMinInstances":1,"groupMaxInstances":1,"applicationId":"s-g-c1-c2-c3-s","aliasToGroupMap":{},"aliasToClusterDataMap":{"c2-0x0":{"serviceType":"c2","clusterId":"s-g-c1-c2-c3-s.c2-0x0.c2.domain","minInstances":1,"maxInstances":1},"c3-0x0":{"serviceType":"c3","clusterId":"s-g-c1-c2-c3-s.c3-0x0.c3.domain","minInstances":1,"maxInstances":1},"c1-0x0":{"serviceType":"c1","clusterId":"s-g-c1-c2-c3-s.c1-0x0.c1.domain","minInstances":1,"maxInstances":1}},"typeToClusterDataMap":{"c1":{"serviceType":"c1","clusterId":"s-g-c1-c2-c3-s.c1-0x0.c1.domain","minInstances":1,"maxInstances":1},"c2":{"serviceType":"c2","clusterId":"s-g-c1-c2-c3-s.c2-0x0.c2.domain","minInstances":1,"maxInstances":1},"c3":{"serviceType":"c3","clusterId":"s-g-c1-c2-c3-s.c3-0x0.c3.domain","minInstances":1,"maxInstances":1}},"instanceIdToInstanceContextMap":{},"dependencyOrder":{"startupOrders":[{"startupOrderComponentList":["cartridge.c3-0x0","cartridge.c2-0x0"]},{"startupOrderComponentList":["cartridge.c2-0x0","cartridge.c1-0x0"]}],"terminationBehaviour":"terminate-none"},"isGroupScalingEnabled":false,"isGroupInstanceMonitoringEnabled":false,"instanceIdSequence":{"value":0}}},"aliasToClusterDataMap":{},"typeToClusterDataMap":{},"aliasToDeploymentPolicyIdMap":{"c3-0x0":"static-1","c2-0x0":"static-1","c1-0x0":"static-1"},"instanceIdToInstanceContextMap":{"s-g-c1-c2-c3-s-1":{"alias":"s-g-c1-c2-c3-s","instanceId":"s-g-c1-c2-c3-s-1","instanceProperties":{},"lifeCycleStateManager":{"stateStack":["Created","Active","Terminating"],"identifier":"s-g-c1-c2-c3-s_s-g-c1-c2-c3-s-1"},"networkPartitionId":"RegionOne"}},"dependencyOrder":{"startupOrders":[]},"isGroupScalingEnabled":false,"isGroupInstanceMonitoringEnabled":false,"
instanceIdSequence":{"value":1}}},"initialized":false}*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,539]  INFO
>> {org.apache.stratos.autoscaler.applications.topic.ApplicationsEventPublisher}
>> -  Publishing group instance terminated event: [application] s-g-c1-c2-c3-s
>> [group] s-g-c1-c2-c3-s-x0x [instance] s-g-c1-c2-c3-s-1*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,545]  INFO
>> {org.apache.stratos.autoscaler.monitor.component.GroupMonitor} -  [Group]
>> s-g-c1-c2-c3-s-x0x is notifying the [parent] s-g-c1-c2-c3-s [instance]
>> s-g-c1-c2-c3-s-1*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,545] DEBUG
>> {org.apache.stratos.autoscaler.status.processor.group.*
>> *GroupStatusActiveProcessor}** -  GroupProcessor chain calculating the
>> status for the group [ s-g-c1-c2-c3-s ]*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG
>> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusActiveProcessor}
>> -  StatusChecker calculating the active status for the group [
>> s-g-c1-c2-c3-s ]  for the instance  [ s-g-c1-c2-c3-s-1 ]*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG
>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>> lock acquired*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG
>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>> lock released*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG
>> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor}
>> -  StatusChecker calculating the terminated status for the group [
>> s-g-c1-c2-c3-s ]  for the instance  [ s-g-c1-c2-c3-s-1 ]*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG
>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>> lock acquired*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,546]  INFO
>> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor}
>> -  Sending application instance terminated for [application] s-g-c1-c2-c3-s
>> [instance] s-g-c1-c2-c3-s-1*
>>
>>
>>
>> Unsuccessful:
>>
>>
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,404] DEBUG
>> {org.apache.stratos.autoscaler.status.processor.group.*
>> *GroupStatusTerminatedProcessor**} -  StatusChecker calculating the
>> terminated status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [
>> s-g-c1-c2-c3-s-1 ]*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,404] DEBUG
>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>> lock acquired*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,405] DEBUG
>> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusProcessor}
>> -  Checking the status of cluster s-g-c1-c2-c3-s.c1-0x0.c1.domain instance
>> status is: Terminating*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
>> {org.apache.stratos.cloud.controller.messaging.topology.TopologyManager} -
>> Write lock released*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
>> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusProcessor}
>> -  Checking the status of cluster s-g-c1-c2-c3-s.c1-0x0.c1.domain instance
>> status is: Terminating*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>> lock released*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
>> {org.apache.stratos.autoscaler.status.processor.group.*
>> *GroupStatusTerminatingProcessor**} -  StatusChecker calculating the
>> terminating status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance
>> [ s-g-c1-c2-c3-s-1 ]*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>> lock acquired*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>> lock released*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
>> {org.apache.stratos.autoscaler.status.processor.group.*
>> *GroupStatusInactiveProcessor**} -  StatusChecker calculating the
>> inactive status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [
>> s-g-c1-c2-c3-s-1 ]*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG
>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>> lock acquired*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG
>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>> lock released*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,407]  WARN
>> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusInactiveProcessor}
>> -  **No possible state change found for* *[component] s-g-c1-c2-c3-s-x0x
>> [instance] s-g-c1-c2-c3-s-1*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG
>> {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -
>> ClusterMonitor Drools session has been disposed. ClusterMonitor
>> [clusterId=s-g-c1-c2-c3-s.c2-0x0.c2.domain, hasPrimary=false ]*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,481] ERROR
>> {org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor} -  System
>> error, lock has not released for 30 seconds: [lock-name] application
>> [lock-type] Write [thread-id] 99 [thread-name] pool-26-thread-2
>> [stack-trace] *
>>
>> *java.lang.Thread.getStackTrace(Thread.java:1589)*
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> [1.] Application Structure
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> *From:* Martin Eppel (meppel)
>> *Sent:* Friday, June 05, 2015 4:38 PM
>>
>>
>> *To:* dev@stratos.apache.org
>> *Subject:* RE: Testing Stratos 4.1: Application undeployment:
>> application fails to undeploy (nested grouping, group scaling)
>>
>>
>>
>> This is another application, see [1.] which fails to get completely
>> removed:
>>
>>
>>
>> Scenario / Observation:
>>
>> ·        After all instances / application go active, one instance is
>> being terminated (to verify termination behavior). Once the terminated
>> instance is restored the application is undeployed.
>>
>> ·        After the Application undeployment process is started, all
>> instances are being terminated
>>
>> ·        Application still shows up in stratos admin, subsequent
>> deployments fail
>>
>>
>>
>> stratos> list-applications
>>
>> Applications found:
>>
>> +---------------------+---------------------+----------+
>>
>> | Application ID      | Alias               | Status   |
>>
>> +---------------------+---------------------+----------+
>>
>> | s-n-gr-s-G123-t-a-4 | s-n-gr-s-G123-t-a-4 | Deployed |
>>
>> +---------------------+---------------------+----------+
>>
>>
>>
>>
>>
>> [1.] Application:
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> *From:* Martin Eppel (meppel)
>> *Sent:* Friday, June 05, 2015 3:26 PM
>> *To:* dev@stratos.apache.org
>> *Subject:* RE: Testing Stratos 4.1: Application undeployment:
>> application fails to undeploy (nested grouping, group scaling)
>>
>>
>>
>> After re-running it, these are my observations:
>>
>>
>>
>> ·        After the “Application undeployment process started” message is
>> logged, there is a likelihood that (a few) VMs are still launched – I
>> suspect this is due to some race condition between “Application
>> undeployment process started” and the “autoscaler”.
>>
>> ·        All VMs which were launched before the “Application
>> undeployment process started” message get terminated as part of the
>> undeployment process.
>>
>> ·        VMs which were launched after “Application undeployment process
>> started” eventually get moved to obsolete / pending state and cleaned up;
>> this can take up to 15-20 minutes.
>>
>> ·        The application never gets completely removed.
>>
>> ·        The following exception is consistently observed:
>>
>> TID: [0] [STRATOS] [2015-06-05 20:47:07,237]  WARN
>> {org.apache.stratos.common.concurrent.locks.ReadWriteLock} -  System
>> warning! Trying to release a lock which has not been taken by the same
>> thread: [lock-name] application-holder [thread-id] 114 [thread-name]
>> pool-24-thread-2
>>
>> *TID: [0] [STRATOS] [2015-06-05 20:47:07,237] ERROR
>> {org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator}
>> -  Failed to retrieve topology event message*
>>
>> *org.apache.stratos.common.exception.InvalidLockRequestedException:
>> System error, cannot acquire a write lock while having a read lock on the
>> same thread: [lock-name] application-holder [thread-id] 114 [thread-name]
>> pool-24-thread-2*
>>
>> *                    at
>> org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:114)*
>>
>> *                    at
>> org.apache.stratos.autoscaler.applications.ApplicationHolder.acquireWriteLock(ApplicationHolder.java:60)*
>>
>> ·        Initiating the “Application undeployment process” again only
>> produces the following INFO statement (without any further action, see the log):
>> TID: [0] [STRATOS] [2015-06-05 21:34:34,509]  INFO
>> {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -
>> Application monitor is already in terminating, graceful un-deployment is
>> has already been attempted thus not invoking again
>>
>> ·        Other exceptions observed after the “Application undeployment
>> process started”
>>
>> TID: [0] [STRATOS] [2015-06-05 21:36:29,458] ERROR
>> {org.apache.stratos.autoscaler.rule.RuleTasksDelegator} -  Cannot terminate
>> instance
>>
>> org.apache.stratos.cloud.controller.stub.CloudControllerServiceInvalidMemberExceptionException:
>> CloudControllerServiceInvalidMemberExceptionException
>>
>>         at
>> sun.reflect.GeneratedConstructorAccessor219.newInstance(Unknown Source)
>>
>>         at
>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>
>>         at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>>
>>         at java.lang.Class.newInstance(Class.java:374)
>>
>>         at
>> org.apache.stratos.cloud.controller.stub.CloudControllerServiceStub.terminateInstance(CloudControllerServiceStub.java:8633)
>>
>>         at
>> org.apache.stratos.common.client.CloudControllerServiceClient.terminateInstance(CloudControllerServiceClient.java:120)
>>
>>         at
>> org.apache.stratos.autoscaler.rule.RuleTasksDelegator.terminateObsoleteInstance(RuleTasksDelegator.java:298)
>>
>>         at sun.reflect.GeneratedMethodAccessor413.invoke(Unknown Source)
>>
>>
>>
>> ·        Created a jira to track this issue:
>> https://issues.apache.org/jira/browse/STRATOS-1430
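
The suspected race in the first observation above (VMs still being launched after undeployment has started) is the kind of problem that is usually closed by checking a "terminating" flag under the same lock that guards the spawn decision. A minimal sketch, assuming a hypothetical guard class that does not exist in Stratos:

```java
// Hypothetical guard; both methods synchronize on the same monitor, so a spawn
// decision can never interleave with the switch to the terminating state.
class UndeployGuardSketch {
    private boolean terminating;
    private int spawned;

    // Called when the undeployment process starts.
    synchronized void startUndeployment() {
        terminating = true;
    }

    // Called by the scaling logic; refuses to spawn once undeployment has started.
    synchronized boolean trySpawnMember() {
        if (terminating) {
            return false;
        }
        spawned++;
        return true;
    }

    synchronized int spawnedCount() {
        return spawned;
    }
}
```

Whether the autoscaler rules and the undeployment path actually share such a check is not established in this thread; the sketch only shows what closing the race would look like.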
>>
>>
>>
>>
>>
>>
>>
>> Regards
>>
>>
>>
>> Martin
>>
>>
>>
>> Attached the log file of the last test
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> *From:* Martin Eppel (meppel)
>> *Sent:* Friday, June 05, 2015 12:59 PM
>> *To:* dev@stratos.apache.org
>> *Subject:* RE: Testing Stratos 4.1: Application undeployment:
>> application fails to undeploy (nested grouping, group scaling)
>>
>>
>>
>> For this latest test I got the latest source from stratos repo so I have
>> this commit (see below), but the un-deployment still fails (to some extent).
>>
>> As mentioned below, it seems that all the members get terminated
>> eventually, including the ones which got started after the “application
>> un-deployment” process started.
>>
>> What is still left in stratos (even after all members got terminated) is
>> the application (see the stratos> list-applications command result below in
>> email thread). This would still be an issue when re-deploying the
>> application !
>>
>> I will do a few reruns to verify the removal of the VMs (members) is
>> consistent.
>>
>> Thanks
>>
>>
>>
>> Martin
>>
>>
>>
>> git show 2fe84b91843b20e91e8cafd06011f42d218f231c
>>
>> commit 2fe84b91843b20e91e8cafd06011f42d218f231c
>>
>> Author: anuruddhal <an...@gmail.com>
>>
>> Date:   Wed Jun 3 14:41:12 2015 +0530
>>
>>
>>
>> *From:* Imesh Gunaratne [mailto:imesh@apache.org <im...@apache.org>]
>> *Sent:* Friday, June 05, 2015 12:46 PM
>> *To:* dev
>> *Subject:* Re: Testing Stratos 4.1: Application undeployment:
>> application fails to undeploy (nested grouping, group scaling)
>>
>>
>>
>> Hi Martin,
>>
>>
>>
>> I also encountered a similar issue with the application un-deployment
>> with PCA but I guess you are using JCA.
>>
>>
>>
>> I can see that Anuruddha has done a fix for the issue I'm referring to
>> with the below commit:
>>
>>
>> https://github.com/apache/stratos/commit/2fe84b91843b20e91e8cafd06011f42d218f231c
>>
>>
>>
>> Regarding the member context not found error, this could occur if the
>> termination request was made for an already terminated member. It is
>> possible that the Autoscaler makes a second terminate request if the first
>> request takes some time to execute, so that by the time the second request
>> hits the Cloud Controller the member has already been terminated by the
>> first request.
>>
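
The double-termination case described above can be made harmless by treating "member context not found" as "already terminated" rather than as an error. A minimal sketch under assumed names (TerminationSketch and its methods are hypothetical, not the Cloud Controller API):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry of member contexts keyed by member id.
class TerminationSketch {
    private final Map<String, String> memberContexts = new ConcurrentHashMap<>();

    void register(String memberId, String instanceId) {
        memberContexts.put(memberId, instanceId);
    }

    // Returns true if this call removed the member context, false if the member
    // was already gone (e.g. a second request arriving after the first completed).
    boolean terminate(String memberId) {
        String instanceId = memberContexts.remove(memberId);
        return instanceId != null;
    }
}
```

Because ConcurrentHashMap.remove is atomic, only one of two competing terminate calls sees the context; the other gets false and can log at a lower level instead of raising an exception.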
>>
>>
>> Can you please confirm whether the members were properly terminated and
>> it's just these exceptions that you are seeing?
>>
>>
>>
>> Thanks
>>
>>
>>
>>
>>
>> On Sat, Jun 6, 2015 at 12:36 AM, Martin Eppel (meppel) <me...@cisco.com>
>> wrote:
>>
>> Hi Udara,
>>
>>
>>
>> Picked up your commit and reran the test case:
>>
>>
>>
>> Attached is the log file (artifacts are the same as before).
>>
>>
>>
>> *Didn’t see the issue with* “*Member is in the wrong list” …*
>>
>>
>>
>> but see the following exception after the undeploy application message:
>>
>> *TID: [0] [STRATOS] [2015-06-05 18:09:46,836] ERROR
>> {org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator}
>> -  Failed to retrieve topology event message*
>>
>> *org.apache.stratos.common.exception.InvalidLockRequestedException:
>> System error, cannot acquire a write lock while having a read lock on the
>> same thread: [lock-name] application-holder [thread-id] 114 [thread-name]
>> pool-24-thread-2*
>>
>> *                    at
>> org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:114)*
>>
>> *                    at
>> org.apache.stratos.autoscaler.applications.ApplicationHolder.acquireWriteLock(ApplicationHolder.java:60)*
>>
>>
>>
>>
>>
>> *Also, after the “Application undeployment process started” message is
>> logged, new members are still being instantiated:*
>>
>>
>>
>> *TID: [0] [STRATOS] [2015-06-05 18:07:46,545]  INFO
>> {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher}
>> -  Publishing member created event*:
>>
>>
>>
>>
>>
>> *Eventually, these VMs get terminated:*
>>
>>
>>
>> *TID: [0] [STRATOS] [2015-06-05 18:42:42,413] ERROR
>> {org.apache.stratos.cloud.controller.services.impl.CloudControllerServiceImpl}
>> -  Could not terminate instance: [member-id]
>> g-sc-G12-1.c1-0x0.c1.domaindd9c1d40-70cc-4950-9757-418afe19ba7f*
>>
>> *org.apache.stratos.cloud.controller.exception.InvalidMemberException:
>> Could not terminate instance, member context not found: [member-id]
>> g-sc-G12-1.c1-0x0.c1.domaindd9c1d40-70cc-4950-9757-418afe19ba7f*
>>
>> *                    at
>> org.apache.stratos.cloud.controller.services.impl.CloudControllerServiceImpl.terminateInstance(CloudControllerServiceImpl.java:595)*
>>
>> *                    at
>> sun.reflect.GeneratedMethodAccessor408.invoke(Unknown Source)*
>>
>> *                    at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)*
>>
>> *                    at java.lang.reflect.Method.invoke(Method.java:606)*
>>
>>
>>
>>
>>
>> *but the application remains:*
>>
>>
>>
>> stratos> list-applications
>>
>> Applications found:
>>
>> +----------------+------------+----------+
>>
>> | Application ID | Alias      | Status   |
>>
>> +----------------+------------+----------+
>>
>> | g-sc-G12-1     | g-sc-G12-1 | Deployed |
>>
>> +----------------+------------+----------+
>>
>>
>>
>> ['g-sc-G12-1: applicationInstances 1, groupInstances 2, clusterInstances
>> 3, members 0 ()\n']
>>
>>
>>
>>
>>
>>
>>
>> *From:* Martin Eppel (meppel)
>> *Sent:* Friday, June 05, 2015 10:04 AM
>> *To:* dev@stratos.apache.org
>> *Subject:* RE: Testing Stratos 4.1: Application undeployment:
>> application fails to undeploy (nested grouping, group scaling)
>>
>>
>>
>> Ok:
>>
>>
>>
>> log4j.logger.org.apache.stratos.manager=DEBUG
>>
>> log4j.logger.org.apache.stratos.autoscaler=DEBUG
>>
>> log4j.logger.org.apache.stratos.messaging=INFO
>>
>> log4j.logger.org.apache.stratos.cloud.controller=DEBUG
>>
>> log4j.logger.org.wso2.andes.client=ERROR
>>
>> # Autoscaler rule logs
>>
>> log4j.logger.org.apache.stratos.autoscaler.rule.RuleLog=DEBUG
>>
>>
>>
>> *From:* Udara Liyanage [mailto:udara@wso2.com <ud...@wso2.com>]
>> *Sent:* Friday, June 05, 2015 10:00 AM
>> *To:* dev
>> *Subject:* Re: Testing Stratos 4.1: Application undeployment:
>> application fails to undeploy (nested grouping, group scaling)
>>
>>
>>
>> Hi Martin,
>>
>>
>>
>> It would be better if you could enable debug logs for AS, CC and the cartridge agent.
>>
>>
>>
>> On Fri, Jun 5, 2015 at 10:23 PM, Udara Liyanage <ud...@wso2.com> wrote:
>>
>> Hi,
>>
>>
>>
>> Please enable AS debug logs.
>>
>>
>>
>> On Fri, Jun 5, 2015 at 9:38 PM, Martin Eppel (meppel) <me...@cisco.com>
>> wrote:
>>
>> Hi Udara,
>>
>>
>>
>> Yes, this issue seems to be fairly easy to reproduce. Which debug logs do
>> you want me to enable? The cartridge agent logs?
>>
>>
>>
>> Thanks
>>
>>
>>
>> Martin
>>
>>
>>
>> *From:* Udara Liyanage [mailto:udara@wso2.com]
>> *Sent:* Thursday, June 04, 2015 11:11 PM
>> *To:* dev
>> *Subject:* Re: Testing Stratos 4.1: Application undeployment:
>> application fails to undeploy (nested grouping, group scaling)
>>
>>
>>
>> Hi,
>>
>>
>>
>> This might be possible if AS did not receive the member activated event
>> published by CC. Could you enable debug logs if this is
>> reproducible?
>>
>> Otherwise I can add INFO logs and commit.
>>
>>
>>
>>
>>
>> On Fri, Jun 5, 2015 at 9:11 AM, Udara Liyanage <ud...@wso2.com> wrote:
>>
>> Hi,
>>
>>
>>
>>
>>
>> For the first issue you have mentioned, the particular member is
>> activated, but it is still identified as an obsolete member and is being
>> marked to be terminated since pending time expired. Does that mean member
>> is still in Obsolete list even though it is being activated?
>>
>>
>>
>> //member started
>>
>> TID: [0] [STRATOS] [2015-06-04 19:53:04,706]  INFO
>> {org.apache.stratos.autoscaler.context.cluster.ClusterContext} -  Member
>> stat context has been added: [application] g-sc-G12-1 [cluster]
>> g-sc-G12-1.c1-0x0.c1.domain [clusterInstanceContext] g-sc-G12-1-1
>> [partitionContext] whole-region [member-id]
>> g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
>>
>>
>>
>> //member activated
>>
>> TID: [0] [STRATOS] [2015-06-04 19:56:00,907]  INFO
>> {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher}
>> -  Publishing member activated event: [service-name] c1 [cluster-id]
>> g-sc-G12-1.c1-0x0.c1.domain [cluster-instance-id] g-sc-G12-1-1 [member-id]
>> g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
>> [network-partition-id] RegionOne [partition-id] whole-region
>>
>> TID: [0] [STRATOS] [2015-06-04 19:56:00,916]  INFO
>> {org.apache.stratos.messaging.message.processor.topology.MemberActivatedMessageProcessor}
>> -  Member activated: [service] c1 [cluster] g-sc-G12-1.c1-0x0.c1.domain
>> [member] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
>>
>>
>>
>> //after 15 minutes ---member is still in pending state, pending timeout
>> expired
>>
>> TID: [0] [STRATOS] [2015-06-04 20:08:04,713]  INFO
>> {org.apache.stratos.autoscaler.context.partition.ClusterLevelPartitionContext$PendingMemberWatcher}
>> -  Pending state of member expired, member will be moved to obsolete list.
>> [pending member]
>> g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1 [expiry
>> time] 900000 [cluster] g-sc-G12-1.c1-0x0.c1.domain [cluster instance] null
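
As a rough illustration of the question above (a member that was activated but still expires from the pending list after 900000 ms, i.e. 15 minutes), here is a minimal sketch of a pending-member watcher. All class and method names are hypothetical and are not taken from the Stratos code base; the only value taken from the log is the 900000 ms expiry. The point is that activation has to remove the member from the pending set, otherwise the watcher will later move an already active member to the obsolete list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class PendingMemberSketch {

    // 15 minutes, the expiry time printed in the log line above.
    static final long EXPIRY_MILLIS = 900_000L;

    private final Map<String, Long> pendingSince = new HashMap<>();
    private final List<String> active = new ArrayList<>();
    private final List<String> obsolete = new ArrayList<>();

    /** Records a newly started member as pending (hypothetical hook). */
    void memberStarted(String memberId, long nowMillis) {
        pendingSince.put(memberId, nowMillis);
    }

    /** On activation the member must leave the pending set, or it will expire later. */
    void memberActivated(String memberId) {
        if (pendingSince.remove(memberId) != null) {
            active.add(memberId);
        }
    }

    /** Moves members that stayed pending for the full expiry time to the obsolete list. */
    void expirePending(long nowMillis) {
        Iterator<Map.Entry<String, Long>> it = pendingSince.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() >= EXPIRY_MILLIS) {
                obsolete.add(entry.getKey());
                it.remove();
            }
        }
    }

    boolean isActive(String memberId) { return active.contains(memberId); }

    boolean isObsolete(String memberId) { return obsolete.contains(memberId); }

    public static void main(String[] args) {
        PendingMemberSketch ctx = new PendingMemberSketch();
        ctx.memberStarted("m1", 0L);
        ctx.memberStarted("m2", 0L);
        ctx.memberActivated("m1");          // activation event was received for m1 only
        ctx.expirePending(900_007L);        // watcher runs just after the timeout
        System.out.println("m1 obsolete: " + ctx.isObsolete("m1"));
        System.out.println("m2 obsolete: " + ctx.isObsolete("m2"));
    }
}
```

If the activation event never reaches the component that owns the pending list, the member behaves like m2 in this sketch even though it is running.
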
>>
>>
>>
>> On Fri, Jun 5, 2015 at 5:14 AM, Martin Eppel (meppel) <me...@cisco.com>
>> wrote:
>>
>> Hi,
>>
>>
>>
>> I am running into a scenario where application un-deployment fails (using
>> stratos with latest commit  b1b6bca3f99b6127da24c9af0a6b20faff2907be).
>>
>>
>>
>> For application structure see [1.], (debug enabled) wso2carbon.log,
>> application.json, cartridge-group.json, deployment-policy, auto-scaling
>> policies see attached zip file.
>>
>>
>>
>> *It is noteworthy, that while the application is running the following
>> log statements /exceptions are observed:*
>>
>>
>>
>> *…*
>>
>> *Member is in the wrong list and it is removed from active members list:
>> g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1*
>>
>> *…*
>>
>> *TID: [0] [STRATOS] [2015-06-04 20:11:03,425] ERROR
>> {org.apache.stratos.autoscaler.rule.RuleTasksDelegator} -  Cannot terminate
>> instance*
>>
>> *…*
>>
>> *// **after receiving the application undeploy event:*
>>
>> *[2015-06-04 20:12:39,465]  INFO
>> {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -
>> Application undeployment process started: [application-id] g-sc-G12-1*
>>
>> *// **a new instance is being started up*
>>
>> *…*
>>
>> *[2015-06-04 20:13:13,445]  INFO
>> {org.apache.stratos.cloud.controller.services.impl.InstanceCreator} -
>> Instance started successfully: [cartridge-type] c2 [cluster-id]
>> g-sc-G12-1.c2-1x0.c2.domain [instance-id]
>> RegionOne/5d4699f7-b00b-42eb-b565-b48fc8f20407*
>>
>>
>>
>> *// Also noteworthy seems the following warning which is seen repeatedly
>> in the logs:*
>>
>> *ReadWriteLock} -  System warning! Trying to release a lock which has not
>> been taken by the same thread: [lock-name]*
>>
>>
>>
>>
>>
>> [1.] Application structure
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>
>
>
>
> --
> *Lasindu Charith*
> Software Engineer, WSO2 Inc.
> Mobile: +94714427192
> Web: blog.lasindu.com
>



-- 
*Lasindu Charith*
Software Engineer, WSO2 Inc.
Mobile: +94714427192
Web: blog.lasindu.com

Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by Lasindu Charith <la...@wso2.com>.
Hi Martin,

I guess my previous observation was incorrect. The root cause of the above
issue is that *ClusterStatusTerminatedProcessor* does not send a
*ClusterTerminatedEvent* for all 3 clusters. It sends only 1 and fails to
send the other 2 ClusterTerminated events. This happens because, when the
application is activated again, *ClusterLevelPartitionContext* is added twice
to the *clusterInstanceContext*. As a result, the if condition at [1] fails
when *ClusterStatusTerminatedProcessor* checks whether the cluster monitor has
any non-terminated members before sending the ClusterTerminated event.
I will try to find a proper solution and update the thread.


[1]
https://github.com/apache/stratos/blob/master/components/org.apache.stratos.autoscaler/src/main/java/org/apache/stratos/autoscaler/status/processor/cluster/ClusterStatusTerminatedProcessor.java#L90
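
To make the suspected mechanism concrete, here is a minimal sketch of how a duplicated partition context can keep a "has non-terminated members" check from ever passing. It is an assumption-laden illustration, not the Stratos implementation: PartitionContext, the member counter and both holder methods are hypothetical names, and whether the real code updates only the first matching context is my guess. The map-based variant shows one possible way to avoid the duplicate by keying contexts on the partition id.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DuplicateContextSketch {

    /** Hypothetical stand-in for a partition context that tracks live members. */
    static final class PartitionContext {
        final String partitionId;
        int nonTerminatedMembers;

        PartitionContext(String partitionId, int nonTerminatedMembers) {
            this.partitionId = partitionId;
            this.nonTerminatedMembers = nonTerminatedMembers;
        }
    }

    static boolean hasNonTerminatedMembers(Iterable<PartitionContext> contexts) {
        for (PartitionContext ctx : contexts) {
            if (ctx.nonTerminatedMembers > 0) {
                return true;
            }
        }
        return false;
    }

    /** List-backed holder: nothing stops a second context for the same partition. */
    static boolean listHolderStillBusy() {
        List<PartitionContext> contexts = new ArrayList<>();
        contexts.add(new PartitionContext("whole-region", 1)); // first activation
        contexts.add(new PartitionContext("whole-region", 1)); // re-activation adds a duplicate

        // Member termination updates only the first context found for the partition.
        for (PartitionContext ctx : contexts) {
            if (ctx.partitionId.equals("whole-region")) {
                ctx.nonTerminatedMembers = 0;
                break;
            }
        }
        return hasNonTerminatedMembers(contexts); // the stale duplicate still reports a member
    }

    /** Map-backed holder: one context per partition id, so re-activation adds nothing. */
    static boolean mapHolderStillBusy() {
        Map<String, PartitionContext> contexts = new LinkedHashMap<>();
        contexts.putIfAbsent("whole-region", new PartitionContext("whole-region", 1));
        contexts.putIfAbsent("whole-region", new PartitionContext("whole-region", 1));

        contexts.get("whole-region").nonTerminatedMembers = 0;
        return hasNonTerminatedMembers(contexts.values());
    }

    public static void main(String[] args) {
        System.out.println("list holder still busy: " + listHolderStillBusy());
        System.out.println("map holder still busy:  " + mapHolderStillBusy());
    }
}
```

With the list-backed holder the check keeps reporting a live member, which would match the symptom of the terminated event never being sent for that cluster.
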

Thanks,


On Thu, Jun 11, 2015 at 10:29 PM, Martin Eppel (meppel) <me...@cisco.com>
wrote:

>  Is there any conclusion on how to fix this?
>
>
>
> Thanks
>
>
>
>
>
> Martin
>
>
>
> *From:* Lahiru Sandaruwan [mailto:lahirus@wso2.com]
> *Sent:* Wednesday, June 10, 2015 6:55 PM
> *To:* dev
> *Cc:* Reka Thirunavukkarasu
>
> *Subject:* Re: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> Hi Imesh,
>
>
>
> The following could be the reason for the un-deployment failing when a member
> was auto-healed:
>
>
>
>    - The particular cluster in which the member was auto-healed is
>    terminated before the others (while the others are in terminating state)
>
>  or
>
>    - The particular cluster in which the member was auto-healed is still
>    terminating when the others are in terminated state
>
>  One of those two cases could happen even if the member was not auto-healed
> (in the case of groups where one group is very complex and the others are
> simple), because currently we check whether all the clusters and groups are
> in *terminating* status when the parent group is *terminating*, which
> is wrong.
>
>
>
> Thanks.
>
>
>
> On Thu, Jun 11, 2015 at 5:49 AM, Imesh Gunaratne <im...@apache.org> wrote:
>
> Do we know why this only happens if a member was forcefully terminated and
> auto-healed?
>
>
>
> On Wed, Jun 10, 2015 at 10:01 PM, Lasindu Charith <la...@wso2.com>
> wrote:
>
> Hi  all,
>
>
>
> The cause of the above issue seems to be as follows.
>
> GroupStatusTerminatedProcessor and GroupStatusTerminatingProcessor
> process the event only if all the group instances and cluster instances
> are in terminated state or in terminating state, respectively [1][2]. But
> there can be situations (such as the above) where some group instances are in
> terminated state and some in terminating state by the
> time GroupStatusProcessorChain is executed. In such scenarios, both the
> GroupStatusTerminatedProcessor and GroupStatusTerminatingProcessor
> executions are skipped, and GroupStatusInactiveProcessor prints the "No
> possible state change found" warning.
>
>
>
> I think we need to find a way to properly fix this.
>
>
>
> [1]
> https://github.com/apache/stratos/blob/master/components/org.apache.stratos.autoscaler/src/main/java/org/apache/stratos/autoscaler/status/processor/group/GroupStatusTerminatedProcessor.java#L91
>
> [2]
> https://github.com/apache/stratos/blob/master/components/org.apache.stratos.autoscaler/src/main/java/org/apache/stratos/autoscaler/status/processor/group/GroupStatusTerminatingProcessor.java#L89
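
The mixed-state case described above can be sketched as follows. This is only an illustration under assumptions: the Status enum and both decision methods are hypothetical and do not mirror the real processor classes, and the relaxed rule is one possible direction rather than the agreed fix. The strict chain requires every child to be in exactly one state, so a mix of terminated and terminating children matches neither rule.

```java
import java.util.Arrays;
import java.util.List;

public class GroupStatusSketch {

    enum Status { ACTIVE, TERMINATING, TERMINATED }

    /** Strict check: every child must be in exactly the expected state. */
    static boolean allIn(List<Status> children, Status expected) {
        for (Status s : children) {
            if (s != expected) {
                return false;
            }
        }
        return true;
    }

    /** Strict chain as described in the thread: mixed states match neither rule. */
    static String strictDecision(List<Status> children) {
        if (allIn(children, Status.TERMINATED)) {
            return "TERMINATED";
        }
        if (allIn(children, Status.TERMINATING)) {
            return "TERMINATING";
        }
        return "NO_STATE_CHANGE";
    }

    /** Relaxed chain (an assumption): terminated children also count as "on the way out". */
    static String relaxedDecision(List<Status> children) {
        if (allIn(children, Status.TERMINATED)) {
            return "TERMINATED";
        }
        for (Status s : children) {
            if (s == Status.ACTIVE) {
                return "NO_STATE_CHANGE";
            }
        }
        return "TERMINATING";
    }

    public static void main(String[] args) {
        List<Status> mixed = Arrays.asList(
                Status.TERMINATED, Status.TERMINATING, Status.TERMINATING);
        System.out.println("strict:  " + strictDecision(mixed));
        System.out.println("relaxed: " + relaxedDecision(mixed));
    }
}
```

For the mixed input the strict chain falls through to the "no state change" branch, while the relaxed chain still reports the group as terminating.
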
>
>
>
> On Tue, Jun 9, 2015 at 8:09 PM, Lasindu Charith <la...@wso2.com> wrote:
>
> Hi Martin,
>
>
>
> I was able to reproduce this issue in the latest build with PCA in
> Openstack. Even after stratos is restarted, the Application is not
> undeployed, which makes it impossible to undeploy the application (even the
> forceful undeployment failed for the above obsolete application).
>
>
>
> Currently I'm looking at possible causes for this and will update with the
> progress.
>
>
>
> Thanks,
>
>
>
> On Tue, Jun 9, 2015 at 5:59 AM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> Here is another example where the removal fails:
>
>
>
> For application see [1.], log file (with debug enabled) and jsons are
> attached.
>
>
>
> Scenario:
>
>
>
> ·        Deploy application and wait for all cartridges to become active
>
> ·        Kill a VM (2nd in startup sequence)
>
> ·        Wait for it to restart and become active
>
> ·        Un-deploy application
>
> a.      Un-deploy forcefully will succeed
> ([2015-06-08 20:38:21,487]  INFO
> {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -
> Forcefully un-deploying the application s-g-c1-c2-c3-s)
>
> b.      Un-deploy gracefully will fail to remove app completely (although
> VMs are terminated successfully)
> ([2015-06-08 20:54:16,372]  INFO
> {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -
> Starting to undeploy application: [application-id])
>
> ·        Both scenarios are recorded in the same log file
> wso2carbon-s-g-c1-c2-c3-s.log
>
> ·        Btw, I retested the scenario and the issue is easily
>  reproducible following the steps listed above:
> graceful application un-deploy succeeds if no VM has been restarted
> (terminated and restarted by autoscaler).
> Once a VM has been terminated, graceful application un-deploy will fail.
> I attached a log file which demonstrates this case
> (wso2carbon-s-g-c1-c2-c3-s-scen-2.log). In this scenario, the same
> application is deployed, becomes active and is then removed (repeated 2
> times); then a VM is terminated and restarted by autoscaler. Afterwards,
> graceful application un-deploy fails.
>
>
>
>
>
> Other Observations:
>
>
>
> When the application is removed successfully, some events, e.g. “cluster removed
> event” and “Application deleted event received:”, are published (see [2.]),
> whereas when the application fails to be removed no such event is
> observed.
>
>
>
> [2.] cluster removed event when application is un-deployed forcefully
>
> TID: [0] [STRATOS] [2015-06-08 20:38:34,187]  INFO
> {org.apache.stratos.cloud.controller.messaging.receiver.application.ApplicationEventReceiver}
> -  Application deleted event received: [application-id] s-g-c1-c2-c3-s
>
> TID: [0] [STRATOS] [2015-06-08 20:38:34,220]  INFO
> {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher}
> -  Publishing application clusters removed event: [application-id]
> s-g-c1-c2-c3-s
>
>
>
>
>
> I compared the log sequences of the successful and the unsuccessful
> application removal and noticed a difference (see also the highlighted
> areas):
>
>
>
> Successful (see logs in wso2carbon-s-g-c1-c2-c3-s-scen-2.log)
>
>
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG
> {org.apache.stratos.autoscaler.status.processor.group.*
> *GroupStatusTerminatedProcessor**} -  StatusChecker calculating the
> terminated status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [
> s-g-c1-c2-c3-s-1 ]*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG
> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
> lock acquired*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,527]  INFO
> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor}
> -  Sending group instance terminated for [group] s-g-c1-c2-c3-s-x0x
> [instance] s-g-c1-c2-c3-s-1*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG
> {org.apache.stratos.autoscaler.applications.topic.ApplicationBuilder} -
> Handling group terminated event: [group-id] s-g-c1-c2-c3-s-x0x
> [application-id] s-g-c1-c2-c3-s [instance] s-g-c1-c2-c3-s-1*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG
> {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -
> ClusterMonitor Drools session has been disposed. ClusterMonitor
> [clusterId=s-g-c1-c2-c3-s.c3-0x0.c3.domain, hasPrimary=false ]*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG
> {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -
> ClusterMonitor Drools session has been disposed. ClusterMonitor
> [clusterId=s-g-c1-c2-c3-s.c1-0x0.c1.domain, hasPrimary=false ]*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG
> {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -
> ClusterMonitor Drools session has been disposed. ClusterMonitor
> [clusterId=s-g-c1-c2-c3-s.c2-0x0.c2.domain, hasPrimary=false ]*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,529] DEBUG
> {org.apache.stratos.cloud.controller.messaging.topology.TopologyManager} -
> Write lock released*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,536] DEBUG
> {org.apache.stratos.autoscaler.registry.RegistryManager} -  Application [
> s-g-c1-c2-c3-s ] persisted successfully in the Autoscaler Registry*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,538] DEBUG
> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -
> Applications updated:
> {"applicationMap":{"s-g-c1-c2-c3-s":{"id":"s-g-c1-c2-c3-s","key":"l8V7OpRqOfBcWbBw","tenantId":-1234,"tenantDomain":"carbon.super","tenantAdminUserName":"admin","applicationPolicyId":"default-iaas","aliasToGroupMap":{"s-g-c1-c2-c3-s-x0x":{"name":"s-g-c1-c2-c3-s","alias":"s-g-c1-c2-c3-s-x0x","groupMinInstances":1,"groupMaxInstances":1,"applicationId":"s-g-c1-c2-c3-s","aliasToGroupMap":{},"aliasToClusterDataMap":{"c2-0x0":{"serviceType":"c2","clusterId":"s-g-c1-c2-c3-s.c2-0x0.c2.domain","minInstances":1,"maxInstances":1},"c3-0x0":{"serviceType":"c3","clusterId":"s-g-c1-c2-c3-s.c3-0x0.c3.domain","minInstances":1,"maxInstances":1},"c1-0x0":{"serviceType":"c1","clusterId":"s-g-c1-c2-c3-s.c1-0x0.c1.domain","minInstances":1,"maxInstances":1}},"typeToClusterDataMap":{"c1":{"serviceType":"c1","clusterId":"s-g-c1-c2-c3-s.c1-0x0.c1.domain","minInstances":1,"maxInstances":1},"c2":{"serviceType":"c2","clusterId":"s-g-c1-c2-c3-s.c2-0x0.c2.domain","minInstances":1,"maxInstances":1},"c3":{"serviceType":"c3","clusterId":"s-g-c1-c2-c3-s.c3-0x0.c3.domain","minInstances":1,"maxInstances":1}},"instanceIdToInstanceContextMap":{},"dependencyOrder":{"startupOrders":[{"startupOrderComponentList":["cartridge.c3-0x0","cartridge.c2-0x0"]},{"startupOrderComponentList":["cartridge.c2-0x0","cartridge.c1-0x0"]}],"terminationBehaviour":"terminate-none"},"isGroupScalingEnabled":false,"isGroupInstanceMonitoringEnabled":false,"instanceIdSequence":{"value":0}}},"aliasToClusterDataMap":{},"typeToClusterDataMap":{},"aliasToDeploymentPolicyIdMap":{"c3-0x0":"static-1","c2-0x0":"static-1","c1-0x0":"static-1"},"instanceIdToInstanceContextMap":{"s-g-c1-c2-c3-s-1":{"alias":"s-g-c1-c2-c3-s","instanceId":"s-g-c1-c2-c3-s-1","instanceProperties":{},"lifeCycleStateManager":{"stateStack":["Created","Active","Terminating"],"identifier":"s-g-c1-c2-c3-s_s-g-c1-c2-c3-s-1"},"networkPartitionId":"RegionOne"}},"dependencyOrder":{"startupOrders":[]},"isGroupScalingEnabled":false,"isGroupInstanceMonitoringEnabled":false,"instanceIdSequence":{"value":1}}},"initialized":false}*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,539]  INFO
> {org.apache.stratos.autoscaler.applications.topic.ApplicationsEventPublisher}
> -  Publishing group instance terminated event: [application] s-g-c1-c2-c3-s
> [group] s-g-c1-c2-c3-s-x0x [instance] s-g-c1-c2-c3-s-1*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,545]  INFO
> {org.apache.stratos.autoscaler.monitor.component.GroupMonitor} -  [Group]
> s-g-c1-c2-c3-s-x0x is notifying the [parent] s-g-c1-c2-c3-s [instance]
> s-g-c1-c2-c3-s-1*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,545] DEBUG
> {org.apache.stratos.autoscaler.status.processor.group.*
> *GroupStatusActiveProcessor}** -  GroupProcessor chain calculating the
> status for the group [ s-g-c1-c2-c3-s ]*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG
> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusActiveProcessor}
> -  StatusChecker calculating the active status for the group [
> s-g-c1-c2-c3-s ]  for the instance  [ s-g-c1-c2-c3-s-1 ]*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG
> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
> lock acquired*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG
> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
> lock released*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG
> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor}
> -  StatusChecker calculating the terminated status for the group [
> s-g-c1-c2-c3-s ]  for the instance  [ s-g-c1-c2-c3-s-1 ]*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG
> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
> lock acquired*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,546]  INFO
> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor}
> -  Sending application instance terminated for [application] s-g-c1-c2-c3-s
> [instance] s-g-c1-c2-c3-s-1*
>
>
>
> Unsuccessful:
>
>
>
> *TID: [0] [STRATOS] [2015-06-08 22:33:25,404] DEBUG
> {org.apache.stratos.autoscaler.status.processor.group.*
> *GroupStatusTerminatedProcessor**} -  StatusChecker calculating the
> terminated status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [
> s-g-c1-c2-c3-s-1 ]*
>
> *TID: [0] [STRATOS] [2015-06-08 22:33:25,404] DEBUG
> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
> lock acquired*
>
> *TID: [0] [STRATOS] [2015-06-08 22:33:25,405] DEBUG
> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusProcessor}
> -  Checking the status of cluster s-g-c1-c2-c3-s.c1-0x0.c1.domain instance
> status is: Terminating*
>
> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
> {org.apache.stratos.cloud.controller.messaging.topology.TopologyManager} -
> Write lock released*
>
> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusProcessor}
> -  Checking the status of cluster s-g-c1-c2-c3-s.c1-0x0.c1.domain instance
> status is: Terminating*
>
> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
> lock released*
>
> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
> {org.apache.stratos.autoscaler.status.processor.group.*
> *GroupStatusTerminatingProcessor**} -  StatusChecker calculating the
> terminating status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance
> [ s-g-c1-c2-c3-s-1 ]*
>
> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
> lock acquired*
>
> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
> lock released*
>
> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
> {org.apache.stratos.autoscaler.status.processor.group.*
> *GroupStatusInactiveProcessor**} -  StatusChecker calculating the
> inactive status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [
> s-g-c1-c2-c3-s-1 ]*
>
> *TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG
> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
> lock acquired*
>
> *TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG
> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
> lock released*
>
> *TID: [0] [STRATOS] [2015-06-08 22:33:25,407]  WARN
> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusInactiveProcessor}
> -  **No possible state change found for* *[component] s-g-c1-c2-c3-s-x0x
> [instance] s-g-c1-c2-c3-s-1*
>
> *TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG
> {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -
> ClusterMonitor Drools session has been disposed. ClusterMonitor
> [clusterId=s-g-c1-c2-c3-s.c2-0x0.c2.domain, hasPrimary=false ]*
>
> *TID: [0] [STRATOS] [2015-06-08 22:33:25,481] ERROR
> {org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor} -  System
> error, lock has not released for 30 seconds: [lock-name] application
> [lock-type] Write [thread-id] 99 [thread-name] pool-26-thread-2
> [stack-trace] *
>
> *java.lang.Thread.getStackTrace(Thread.java:1589)*
>
>
>
>
>
>
>
>
>
> [1.] Application Structure
>
>
>
>
>
>
>
>
>
>
>
>
>
> *From:* Martin Eppel (meppel)
> *Sent:* Friday, June 05, 2015 4:38 PM
>
>
> *To:* dev@stratos.apache.org
> *Subject:* RE: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> This is another application, see [1.] which fails to get completely
> removed:
>
>
>
> Scenario / Observation:
>
> ·        After all instances / application go active, one instance is
> being terminated (to verify termination behavior). Once the terminated
> instance is restored the application is undeployed.
>
> ·        After the Application undeployment process is started, all
> instances are being terminated
>
> ·        Application still shows up in stratos admin, subsequent
> deployments fail
>
>
>
> stratos> list-applications
>
> Applications found:
>
> +---------------------+---------------------+----------+
>
> | Application ID      | Alias               | Status   |
>
> +---------------------+---------------------+----------+
>
> | s-n-gr-s-G123-t-a-4 | s-n-gr-s-G123-t-a-4 | Deployed |
>
> +---------------------+---------------------+----------+
>
>
>
>
>
> [1.] Application:
>
>
>
>
>
>
>
>
>
>
>
> *From:* Martin Eppel (meppel)
> *Sent:* Friday, June 05, 2015 3:26 PM
> *To:* dev@stratos.apache.org
> *Subject:* RE: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> After re-running it, these are my observations:
>
>
>
> ·        After the “Application undeployment process started” message is
> logged, there is a likelihood that (a few) VMs are still launched. I suspect
> this is due to a race condition between the application undeployment
> process and the autoscaler.
>
> ·        All VMs which were launched before the “Application undeployment
> process started” message get terminated as part of the undeployment process.
>
> ·        VMs which were launched after “Application undeployment process
> started” eventually get moved to obsolete / pending state and cleaned up;
> this can take up to 15-20 minutes.
>
> ·        The application never gets completely removed.
>
> ·        The following exception is consistently observed:
>
> TID: [0] [STRATOS] [2015-06-05 20:47:07,237]  WARN
> {org.apache.stratos.common.concurrent.locks.ReadWriteLock} -  System
> warning! Trying to release a lock which has not been taken by the same
> thread: [lock-name] application-holder [thread-id] 114 [thread-name]
> pool-24-thread-2
>
> *TID: [0] [STRATOS] [2015-06-05 20:47:07,237] ERROR
> {org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator}
> -  Failed to retrieve topology event message*
>
> *org.apache.stratos.common.exception.InvalidLockRequestedException: System
> error, cannot acquire a write lock while having a read lock on the same
> thread: [lock-name] application-holder [thread-id] 114 [thread-name]
> pool-24-thread-2*
>
> *                    at
> org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:114)*
>
> *                    at
> org.apache.stratos.autoscaler.applications.ApplicationHolder.acquireWriteLock(ApplicationHolder.java:60)*
>
> ·        Initiating the “Application undeployment process” again will
> cause the following INFO statement (without any further actions, see in log)
> TID: [0] [STRATOS] [2015-06-05 21:34:34,509]  INFO
> {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -
> Application monitor is already in terminating, graceful un-deployment is
> has already been attempted thus not invoking again
>
> ·        Other exceptions observed after the “Application undeployment
> process started”
>
> TID: [0] [STRATOS] [2015-06-05 21:36:29,458] ERROR
> {org.apache.stratos.autoscaler.rule.RuleTasksDelegator} -  Cannot terminate
> instance
>
> org.apache.stratos.cloud.controller.stub.CloudControllerServiceInvalidMemberExceptionException:
> CloudControllerServiceInvalidMemberExceptionException
>
>         at sun.reflect.GeneratedConstructorAccessor219.newInstance(Unknown
> Source)
>
>         at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>
>         at java.lang.Class.newInstance(Class.java:374)
>
>         at
> org.apache.stratos.cloud.controller.stub.CloudControllerServiceStub.terminateInstance(CloudControllerServiceStub.java:8633)
>
>         at
> org.apache.stratos.common.client.CloudControllerServiceClient.terminateInstance(CloudControllerServiceClient.java:120)
>
>         at
> org.apache.stratos.autoscaler.rule.RuleTasksDelegator.terminateObsoleteInstance(RuleTasksDelegator.java:298)
>
>         at sun.reflect.GeneratedMethodAccessor413.invoke(Unknown Source)
>
>
>
> ·        Created a jira to track this issue:
> https://issues.apache.org/jira/browse/STRATOS-1430
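
Regarding the InvalidLockRequestedException in the observations above ("cannot acquire a write lock while having a read lock on the same thread"), here is a small sketch of the same constraint. Note the assumption: it uses the JDK's java.util.concurrent.locks.ReentrantReadWriteLock, not the Stratos ReadWriteLock wrapper, whose internals I have not reproduced. The JDK lock does not support upgrading from read to write either, so a non-blocking tryLock() on the write lock fails while the same thread still holds the read lock.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockUpgradeSketch {

    /** Tries to take the write lock while this thread still holds the read lock. */
    static boolean writeWhileHoldingRead() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.readLock().lock();
        try {
            // tryLock() does not block, so the failed upgrade cannot deadlock here.
            boolean acquired = lock.writeLock().tryLock();
            if (acquired) {
                lock.writeLock().unlock();
            }
            return acquired;
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Releases the read lock first, then takes the write lock. */
    static boolean writeAfterReleasingRead() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.readLock().lock();
        lock.readLock().unlock(); // give up read access before asking for write access
        boolean acquired = lock.writeLock().tryLock();
        if (acquired) {
            lock.writeLock().unlock();
        }
        return acquired;
    }

    public static void main(String[] args) {
        System.out.println("write while holding read: " + writeWhileHoldingRead());
        System.out.println("write after releasing read: " + writeAfterReleasingRead());
    }
}
```

In a real fix the state read under the read lock would have to be re-checked after the write lock is taken, since another thread may have changed it in between.
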
>
>
>
>
>
>
>
> Regards
>
>
>
> Martin
>
>
>
> Attached the log file of the last test
>
>
>
>
>
>
>
>
>
> *From:* Martin Eppel (meppel)
> *Sent:* Friday, June 05, 2015 12:59 PM
> *To:* dev@stratos.apache.org
> *Subject:* RE: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> For this latest test I got the latest source from stratos repo so I have
> this commit (see below), but the un-deployment still fails (to some extent).
>
> As mentioned below, it seems that all the members get terminated
> eventually, including the ones which got started after the “application
> un-deployment” process started.
>
> What is still left in stratos (even after all members got terminated) is
> the application (see the stratos> list-applications command result below in
> email thread). This would still be an issue when re-deploying the
> application !
>
> I will do a few reruns to verify the removal of the VMs (members) is
> consistent.
>
> Thanks
>
>
>
> Martin
>
>
>
> git show 2fe84b91843b20e91e8cafd06011f42d218f231c
>
> commit 2fe84b91843b20e91e8cafd06011f42d218f231c
>
> Author: anuruddhal <an...@gmail.com>
>
> Date:   Wed Jun 3 14:41:12 2015 +0530
>
>
>
> *From:* Imesh Gunaratne [mailto:imesh@apache.org <im...@apache.org>]
> *Sent:* Friday, June 05, 2015 12:46 PM
> *To:* dev
> *Subject:* Re: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> Hi Martin,
>
>
>
> I also encountered a similar issue with the application un-deployment with
> PCA but I guess you are using JCA.
>
>
>
> I can see that Anuruddha has done a fix for the issue I'm referring to with
> the below commit:
>
>
> https://github.com/apache/stratos/commit/2fe84b91843b20e91e8cafd06011f42d218f231c
>
>
>
> Regarding the member context not found error, this could occur if the
> termination request was made for an already terminated member. There is a
> possibility that Autoscaler makes a second terminate request if the first
> request takes some time to execute, and by the time the second request hits
> Cloud Controller the member has already been terminated by the first request.
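
The double terminate request described above could be made harmless by treating termination as idempotent. The following is a minimal sketch of that idea, not the Cloud Controller code: the class, the Result enum and the member-to-instance map are hypothetical. The atomic remove() on the map guarantees that only one caller sees the member context, so a repeated request becomes a no-op instead of an exception.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class IdempotentTerminateSketch {

    enum Result { TERMINATED, ALREADY_GONE }

    // Hypothetical registry: member id -> IaaS instance id.
    private final ConcurrentMap<String, String> memberContexts = new ConcurrentHashMap<>();

    void register(String memberId, String instanceId) {
        memberContexts.put(memberId, instanceId);
    }

    /** Terminates a member once; repeated requests are reported, not treated as errors. */
    Result terminate(String memberId) {
        // remove() is atomic, so only one of several concurrent callers gets a non-null value.
        String instanceId = memberContexts.remove(memberId);
        if (instanceId == null) {
            return Result.ALREADY_GONE;
        }
        // The actual IaaS termination call for instanceId would go here.
        return Result.TERMINATED;
    }

    public static void main(String[] args) {
        IdempotentTerminateSketch cc = new IdempotentTerminateSketch();
        cc.register("member-1", "RegionOne/instance-1");
        System.out.println("first request:  " + cc.terminate("member-1"));
        System.out.println("second request: " + cc.terminate("member-1"));
    }
}
```
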
>
>
>
> Can you please confirm whether the members were properly terminated and
> it's just these exceptions that you are seeing?
>
>
>
> Thanks
>
>
>
>
>
> On Sat, Jun 6, 2015 at 12:36 AM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> Hi Udara,
>
>
>
> Picked up your commit and reran the test case:
>
>
>
> Attached is the log file (artifacts are the same as before).
>
>
>
> *Didn’t see the issue with* “*Member is in the wrong list” …*
>
>
>
> but see the following exception after the undeploy application message:
>
> *TID: [0] [STRATOS] [2015-06-05 18:09:46,836] ERROR
> {org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator}
> -  Failed to retrieve topology event message*
>
> *org.apache.stratos.common.exception.InvalidLockRequestedException: System
> error, cannot acquire a write lock while having a read lock on the same
> thread: [lock-name] application-holder [thread-id] 114 [thread-name]
> pool-24-thread-2*
>
> *                    at
> org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:114)*
>
> *                    at
> org.apache.stratos.autoscaler.applications.ApplicationHolder.acquireWriteLock(ApplicationHolder.java:60)*
>
>
>
>
>
> *Also, after the “Application undeployment process started” is started,
> new members are being instantiated:*
>
>
>
> *TID: [0] [STRATOS] [2015-06-05 18:07:46,545]  INFO
> {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher}
> -  Publishing member created event*:
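
Members being created after the undeployment has started looks like a check-then-act race between the scaling logic and the undeploy request. A minimal sketch of one way to close such a window is shown below; it is an assumption about the general pattern, not a description of how the autoscaler is structured, and all names are hypothetical. The "terminating" flag and the member list are guarded by the same lock, so a spawn request either completes before undeployment takes its snapshot or is refused.

```java
import java.util.ArrayList;
import java.util.List;

public class UndeployGuardSketch {

    private final Object lock = new Object();
    private boolean terminating = false;
    private final List<String> members = new ArrayList<>();

    /** Called by the (hypothetical) scaling rule; refuses to spawn once undeployment began. */
    boolean spawnMember(String memberId) {
        synchronized (lock) {
            if (terminating) {
                return false;
            }
            members.add(memberId);
            return true;
        }
    }

    /** Marks the application as terminating and returns the members to terminate. */
    List<String> startUndeployment() {
        synchronized (lock) {
            terminating = true;
            List<String> toTerminate = new ArrayList<>(members);
            members.clear();
            return toTerminate;
        }
    }

    public static void main(String[] args) {
        UndeployGuardSketch app = new UndeployGuardSketch();
        System.out.println("spawn before undeploy: " + app.spawnMember("m1"));
        System.out.println("members to terminate:  " + app.startUndeployment());
        System.out.println("spawn after undeploy:  " + app.spawnMember("m2"));
    }
}
```
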
>
>
>
>
>
> *Eventually, these VMs get terminated :*
>
>
>
> *TID: [0] [STRATOS] [2015-06-05 18:42:42,413] ERROR
> {org.apache.stratos.cloud.controller.services.impl.CloudControllerServiceImpl}
> -  Could not terminate instance: [member-id]
> g-sc-G12-1.c1-0x0.c1.domaindd9c1d40-70cc-4950-9757-418afe19ba7f*
>
> *org.apache.stratos.cloud.controller.exception.InvalidMemberException:
> Could not terminate instance, member context not found: [member-id]
> g-sc-G12-1.c1-0x0.c1.domaindd9c1d40-70cc-4950-9757-418afe19ba7f*
>
> *                    at
> org.apache.stratos.cloud.controller.services.impl.CloudControllerServiceImpl.terminateInstance(CloudControllerServiceImpl.java:595)*
>
> *                    at
> sun.reflect.GeneratedMethodAccessor408.invoke(Unknown Source)*
>
> *                    at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)*
>
> *                    at java.lang.reflect.Method.invoke(Method.java:606)*
>
>
>
>
>
> *but the application remains:*
>
>
>
> stratos> list-applications
>
> Applications found:
>
> +----------------+------------+----------+
>
> | Application ID | Alias      | Status   |
>
> +----------------+------------+----------+
>
> | g-sc-G12-1     | g-sc-G12-1 | Deployed |
>
> +----------------+------------+----------+
>
>
>
> ['g-sc-G12-1: applicationInstances 1, groupInstances 2, clusterInstances
> 3, members 0 ()\n']
>
>
>
>
>
>
>
> *From:* Martin Eppel (meppel)
> *Sent:* Friday, June 05, 2015 10:04 AM
> *To:* dev@stratos.apache.org
> *Subject:* RE: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> Ok:
>
>
>
> log4j.logger.org.apache.stratos.manager=DEBUG
>
> log4j.logger.org.apache.stratos.autoscaler=DEBUG
>
> log4j.logger.org.apache.stratos.messaging=INFO
>
> log4j.logger.org.apache.stratos.cloud.controller=DEBUG
>
> log4j.logger.org.wso2.andes.client=ERROR
>
> # Autoscaler rule logs
>
> log4j.logger.org.apache.stratos.autoscaler.rule.RuleLog=DEBUG
>
>
>
> *From:* Udara Liyanage [mailto:udara@wso2.com]
> *Sent:* Friday, June 05, 2015 10:00 AM
> *To:* dev
> *Subject:* Re: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> Hi Martin,
>
>
>
> Better if you can enable debug logs for all of AS, CC and the cartridge agent.
>
>
>
> On Fri, Jun 5, 2015 at 10:23 PM, Udara Liyanage <ud...@wso2.com> wrote:
>
> Hi,
>
>
>
> Please enable AS debug logs.
>
>
>
> On Fri, Jun 5, 2015 at 9:38 PM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> Hi Udara,
>
>
>
> Yes, this issue seems to be fairly well reproducible, which debug log do
> you want me to enable, cartridge agent logs ?
>
>
>
> Thanks
>
>
>
> Martin
>
>
>
> *From:* Udara Liyanage [mailto:udara@wso2.com]
> *Sent:* Thursday, June 04, 2015 11:11 PM
> *To:* dev
> *Subject:* Re: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> Hi,
>
>
>
> This might be possible if AS did not receive the member activated event
> published by CC. Is it possible to enable debug logs if this is
> reproducible?
>
> Or else I can add some INFO logs and commit.
>
>
>
>
>
> On Fri, Jun 5, 2015 at 9:11 AM, Udara Liyanage <ud...@wso2.com> wrote:
>
> Hi,
>
>
>
>
>
> For the first issue you have mentioned, the particular member is
> activated, but it is still identified as an obsolete member and is being
> marked to be terminated since pending time expired. Does that mean member
> is still in Obsolete list even though it is being activated?
>
>
>
> //member started
>
> TID: [0] [STRATOS] [2015-06-04 19:53:04,706]  INFO
> {org.apache.stratos.autoscaler.context.cluster.ClusterContext} -  Member
> stat context has been added: [application] g-sc-G12-1 [cluster]
> g-sc-G12-1.c1-0x0.c1.domain [clusterInstanceContext] g-sc-G12-1-1
> [partitionContext] whole-region [member-id]
> g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
>
>
>
> //member activated
>
> TID: [0] [STRATOS] [2015-06-04 19:56:00,907]  INFO
> {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher}
> -  Publishing member activated event: [service-name] c1 [cluster-id]
> g-sc-G12-1.c1-0x0.c1.domain [cluster-instance-id] g-sc-G12-1-1 [member-id]
> g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
> [network-partition-id] RegionOne [partition-id] whole-region
>
> TID: [0] [STRATOS] [2015-06-04 19:56:00,916]  INFO
> {org.apache.stratos.messaging.message.processor.topology.MemberActivatedMessageProcessor}
> -  Member activated: [service] c1 [cluster] g-sc-G12-1.c1-0x0.c1.domain
> [member] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
>
>
>
> //after 15 minutes ---member is still in pending state, pending timeout
> expired
>
> TID: [0] [STRATOS] [2015-06-04 20:08:04,713]  INFO
> {org.apache.stratos.autoscaler.context.partition.ClusterLevelPartitionContext$PendingMemberWatcher}
> -  Pending state of member expired, member will be moved to obsolete list.
> [pending member]
> g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1 [expiry
> time] 900000 [cluster] g-sc-G12-1.c1-0x0.c1.domain [cluster instance] null
>
>
>
> On Fri, Jun 5, 2015 at 5:14 AM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> Hi,
>
>
>
> I am running into a scenario where application un-deployment fails (using
> stratos with latest commit  b1b6bca3f99b6127da24c9af0a6b20faff2907be).
>
>
>
> For application structure see [1.], (debug enabled) wso2carbon.log,
> application.json, cartridge-group.json, deployment-policy, auto-scaling
> policies see attached zip file.
>
>
>
> *It is noteworthy, that while the application is running the following log
> statements /exceptions are observed:*
>
>
>
> *…*
>
> *Member is in the wrong list and it is removed from active members list:
> g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1*
>
> *…*
>
> *TID: [0] [STRATOS] [2015-06-04 20:11:03,425] ERROR
> {org.apache.stratos.autoscaler.rule.RuleTasksDelegator} -  Cannot terminate
> instance*
>
> *…*
>
> *// **after receiving the application undeploy event:*
>
> *[2015-06-04 20:12:39,465]  INFO
> {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -
> Application undeployment process started: [application-id] g-sc-G12-1*
>
> *// **a new instance is being started up*
>
> *…*
>
> *[2015-06-04 20:13:13,445]  INFO
> {org.apache.stratos.cloud.controller.services.impl.InstanceCreator} -
> Instance started successfully: [cartridge-type] c2 [cluster-id]
> g-sc-G12-1.c2-1x0.c2.domain [instance-id]
> RegionOne/5d4699f7-b00b-42eb-b565-b48fc8f20407*
>
>
>
> *// Also noteworthy seems the following warning which is seen repeatedly
> in the logs:*
>
> *ReadWriteLock} -  System warning! Trying to release a lock which has not
> been taken by the same thread: [lock-name]*
>
>
>
>
>
> [1.] Application structure
>
>
>
>
>
>
>
>
>
>
>
> ...
>
> [Message clipped]




-- 
*Lasindu Charith*
Software Engineer, WSO2 Inc.
Mobile: +94714427192
Web: blog.lasindu.com

RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by "Martin Eppel (meppel)" <me...@cisco.com>.
Is there any conclusion on how to fix this?

Thanks


Martin

From: Lahiru Sandaruwan [mailto:lahirus@wso2.com]
Sent: Wednesday, June 10, 2015 6:55 PM
To: dev
Cc: Reka Thirunavukkarasu
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Imesh,

The following could be the reason the application does not un-deploy after a member was auto-healed:


  *   The cluster whose member was auto-healed is terminated before the others (while the others are still in terminating state)
or

  *   The cluster whose member was auto-healed is still terminating when the others are already in terminated state
Either of those two cases could happen even if no member was auto-healed (for example with groups, where one group is very complex and the others are simple). This is because, when the parent group is terminating, we currently check whether all the clusters and groups are in terminating status, which is wrong.
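
As an illustration only, here is a minimal Java sketch of a more tolerant check for the terminating case. It is not the Stratos implementation and not a committed fix; the class, enum and method names are hypothetical. The idea is that a parent may stay in terminating as long as every child is either terminating or already terminated, and becomes terminated only once all children are terminated.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical sketch, not Stratos code: a relaxed termination check for a parent group. */
final class TerminationCheckSketch {

    enum Status { ACTIVE, INACTIVE, TERMINATING, TERMINATED }

    // Children in either of these states are considered to be on their way out.
    private static final Set<Status> SHUTTING_DOWN =
            EnumSet.of(Status.TERMINATING, Status.TERMINATED);

    /** True if every child is terminating or terminated (mixed states are accepted). */
    static boolean allShuttingDown(Status... children) {
        for (Status child : children) {
            if (!SHUTTING_DOWN.contains(child)) {
                return false;
            }
        }
        return true;
    }

    /** True only if every child has reached the terminated state. */
    static boolean allTerminated(Status... children) {
        for (Status child : children) {
            if (child != Status.TERMINATED) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // One child finished early, the other is still terminating.
        System.out.println(allShuttingDown(Status.TERMINATED, Status.TERMINATING));
        System.out.println(allTerminated(Status.TERMINATED, Status.TERMINATING));
    }
}
```

With a check like this, the mixed case (one cluster terminated, another still terminating) keeps the parent in terminating instead of leaving it without a matching state.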

Thanks.

On Thu, Jun 11, 2015 at 5:49 AM, Imesh Gunaratne <im...@apache.org> wrote:
Do we know why this only happens if a member was forcefully terminated and auto-healed?

On Wed, Jun 10, 2015 at 10:01 PM, Lasindu Charith <la...@wso2.com> wrote:
Hi  all,

Cause for above issue seems to be as follows.
GroupStatusTerminatedProcessor and GroupStatusTerminatingProcessor process the event only if all the group instances and cluster instances are in terminated state or in terminating state, respectively [1][2]. But there can be situations (such as the one above) where some group instances are in terminated state and some are in terminating state by the time GroupStatusProcessorChain is executed. In such scenarios, both the GroupStatusTerminatedProcessor and GroupStatusTerminatingProcessor executions are skipped, and GroupStatusInactiveProcessor prints the "No possible state change found" warning.

I think we need to find a way to properly fix this.

[1] https://github.com/apache/stratos/blob/master/components/org.apache.stratos.autoscaler/src/main/java/org/apache/stratos/autoscaler/status/processor/group/GroupStatusTerminatedProcessor.java#L91
[2] https://github.com/apache/stratos/blob/master/components/org.apache.stratos.autoscaler/src/main/java/org/apache/stratos/autoscaler/status/processor/group/GroupStatusTerminatingProcessor.java#L89
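
The fall-through described above can be reproduced with a small, simplified model. This is only a sketch under assumptions: the class and method names are hypothetical, the chain order (active, terminated, terminating, inactive) is taken from the debug logs in this thread, and each step uses a strict "all children in exactly this state" check as described.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified model of the group status processor chain (not Stratos code). */
final class GroupStatusChainSketch {

    enum Status { ACTIVE, INACTIVE, TERMINATING, TERMINATED }

    /** Strict check: every child must be in exactly the expected state. */
    static boolean allIn(List<Status> children, Status expected) {
        return children.stream().allMatch(s -> s == expected);
    }

    /** Walks the chain in order and returns the first state that matches. */
    static String evaluate(Status... childStates) {
        List<Status> children = Arrays.asList(childStates);
        if (allIn(children, Status.ACTIVE)) {
            return "ACTIVE";
        }
        if (allIn(children, Status.TERMINATED)) {
            return "TERMINATED";
        }
        if (allIn(children, Status.TERMINATING)) {
            return "TERMINATING";
        }
        if (children.contains(Status.INACTIVE)) {
            return "INACTIVE";
        }
        // Mixed TERMINATED/TERMINATING children end up here.
        return "No possible state change found";
    }

    public static void main(String[] args) {
        System.out.println(evaluate(Status.TERMINATED, Status.TERMINATED));
        System.out.println(evaluate(Status.TERMINATED, Status.TERMINATING));
    }
}
```

In this model the second call matches no step, which corresponds to the warning seen in the unsuccessful undeployment log.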

On Tue, Jun 9, 2015 at 8:09 PM, Lasindu Charith <la...@wso2.com> wrote:
Hi Martin,

I was able to reproduce this issue in the latest build with PCA in Openstack. Even after stratos is restarted, the Application is not undeployed, which makes it impossible to undeploy the application (even the forceful undeployment failed for the above obsolete application).

Currently I'm looking at possible causes for this and will update with the progress.

Thanks,

On Tue, Jun 9, 2015 at 5:59 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Here is another example where the removal fails:

For application see [1.], log file (with debug enabled) and jsons are attached.

Scenario:


•        Deploy application and wait for all cartridges to become active

•        Kill a VM (2nd in startup sequence)

•        Wait for it to restart and become active

•        Un-deploy application

a.      Un-deploy forcefully will succeed
([2015-06-08 20:38:21,487]  INFO {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Forcefully un-deploying the application s-g-c1-c2-c3-s)
and

b.      Un-deploy gracefully will fail to remove app completely (although VMs are terminated successfully)
([2015-06-08 20:54:16,372]  INFO {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Starting to undeploy application: [application-id])

•        Both scenarios are recorded in the same log file wso2carbon-s-g-c1-c2-c3-s.log

•        Btw, I retested the scenario and the issue is easily reproducible following the steps listed above:
Graceful application un-deploy succeeds if no VM had been restarted (terminated and restarted by the autoscaler).
Once a VM is terminated, graceful application un-deploy will fail.
I attached a log file which demonstrates this case (wso2carbon-s-g-c1-c2-c3-s-scen-2.log). In this scenario, the same application is deployed, becomes active and is then removed (repeated 2 times); then a VM is terminated and restarted by the autoscaler. Afterwards, graceful application un-deploy fails.


Other Observations:

When the application is successfully removed, some events, e.g. “cluster removed event” and “Application deleted event received:”, are published (see [2.]), while when the application fails to be removed no such events are observed.

[2.] cluster removed event when application is un-deployed forcefully
TID: [0] [STRATOS] [2015-06-08 20:38:34,187]  INFO {org.apache.stratos.cloud.controller.messaging.receiver.application.ApplicationEventReceiver} -  Application deleted event received: [application-id] s-g-c1-c2-c3-s
TID: [0] [STRATOS] [2015-06-08 20:38:34,220]  INFO {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher} -  Publishing application clusters removed event: [application-id] s-g-c1-c2-c3-s


I analyzed the differences in the successful application removal and unsuccessful log sequence and noticed a difference (see also highlighted areas):

Successful (see logs in wso2carbon-s-g-c1-c2-c3-s-scen-2.log)

TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor} -  StatusChecker calculating the terminated status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [ s-g-c1-c2-c3-s-1 ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock acquired
TID: [0] [STRATOS] [2015-06-08 22:18:41,527]  INFO {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor} -  Sending group instance terminated for [group] s-g-c1-c2-c3-s-x0x [instance] s-g-c1-c2-c3-s-1
TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG {org.apache.stratos.autoscaler.applications.topic.ApplicationBuilder} -  Handling group terminated event: [group-id] s-g-c1-c2-c3-s-x0x [application-id] s-g-c1-c2-c3-s [instance] s-g-c1-c2-c3-s-1
TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -  ClusterMonitor Drools session has been disposed. ClusterMonitor [clusterId=s-g-c1-c2-c3-s.c3-0x0.c3.domain, hasPrimary=false ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -  ClusterMonitor Drools session has been disposed. ClusterMonitor [clusterId=s-g-c1-c2-c3-s.c1-0x0.c1.domain, hasPrimary=false ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -  ClusterMonitor Drools session has been disposed. ClusterMonitor [clusterId=s-g-c1-c2-c3-s.c2-0x0.c2.domain, hasPrimary=false ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,529] DEBUG {org.apache.stratos.cloud.controller.messaging.topology.TopologyManager} -  Write lock released
TID: [0] [STRATOS] [2015-06-08 22:18:41,536] DEBUG {org.apache.stratos.autoscaler.registry.RegistryManager} -  Application [ s-g-c1-c2-c3-s ] persisted successfully in the Autoscaler Registry
TID: [0] [STRATOS] [2015-06-08 22:18:41,538] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Applications updated: {"applicationMap":{"s-g-c1-c2-c3-s":{"id":"s-g-c1-c2-c3-s","key":"l8V7OpRqOfBcWbBw","tenantId":-1234,"tenantDomain":"carbon.super","tenantAdminUserName":"admin","applicationPolicyId":"default-iaas","aliasToGroupMap":{"s-g-c1-c2-c3-s-x0x":{"name":"s-g-c1-c2-c3-s","alias":"s-g-c1-c2-c3-s-x0x","groupMinInstances":1,"groupMaxInstances":1,"applicationId":"s-g-c1-c2-c3-s","aliasToGroupMap":{},"aliasToClusterDataMap":{"c2-0x0":{"serviceType":"c2","clusterId":"s-g-c1-c2-c3-s.c2-0x0.c2.domain","minInstances":1,"maxInstances":1},"c3-0x0":{"serviceType":"c3","clusterId":"s-g-c1-c2-c3-s.c3-0x0.c3.domain","minInstances":1,"maxInstances":1},"c1-0x0":{"serviceType":"c1","clusterId":"s-g-c1-c2-c3-s.c1-0x0.c1.domain","minInstances":1,"maxInstances":1}},"typeToClusterDataMap":{"c1":{"serviceType":"c1","clusterId":"s-g-c1-c2-c3-s.c1-0x0.c1.domain","minInstances":1,"maxInstances":1},"c2":{"serviceType":"c2","clusterId":"s-g-c1-c2-c3-s.c2-0x0.c2.domain","minInstances":1,"maxInstances":1},"c3":{"serviceType":"c3","clusterId":"s-g-c1-c2-c3-s.c3-0x0.c3.domain","minInstances":1,"maxInstances":1}},"instanceIdToInstanceContextMap":{},"dependencyOrder":{"startupOrders":[{"startupOrderComponentList":["cartridge.c3-0x0","cartridge.c2-0x0"]},{"startupOrderComponentList":["cartridge.c2-0x0","cartridge.c1-0x0"]}],"terminationBehaviour":"terminate-none"},"isGroupScalingEnabled":false,"isGroupInstanceMonitoringEnabled":false,"instanceIdSequence":{"value":0}}},"aliasToClusterDataMap":{},"typeToClusterDataMap":{},"aliasToDeploymentPolicyIdMap":{"c3-0x0":"static-1","c2-0x0":"static-1","c1-0x0":"static-1"},"instanceIdToInstanceContextMap":{"s-g-c1-c2-c3-s-1":{"alias":"s-g-c1-c2-c3-s","instanceId":"s-g-c1-c2-c3-s-1","instanceProperties":{},"lifeCycleStateManager":{"stateStack":["Created","Active","Terminating"],"identifier":"s-g-c1-c2-c3-s_s-g-c1-c2-c3-s-1"},"networkPartitionId":"RegionOne"}},"dependencyOrder":{"startupOrders":[]},"isGroupScalingEnabled":false,"isGroupInstanceMonitoringEnabled":false,"instanceIdSequence":{"value":1}}},"initialized":false}
TID: [0] [STRATOS] [2015-06-08 22:18:41,539]  INFO {org.apache.stratos.autoscaler.applications.topic.ApplicationsEventPublisher} -  Publishing group instance terminated event: [application] s-g-c1-c2-c3-s [group] s-g-c1-c2-c3-s-x0x [instance] s-g-c1-c2-c3-s-1
TID: [0] [STRATOS] [2015-06-08 22:18:41,545]  INFO {org.apache.stratos.autoscaler.monitor.component.GroupMonitor} -  [Group] s-g-c1-c2-c3-s-x0x is notifying the [parent] s-g-c1-c2-c3-s [instance] s-g-c1-c2-c3-s-1
TID: [0] [STRATOS] [2015-06-08 22:18:41,545] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusActiveProcessor} -  GroupProcessor chain calculating the status for the group [ s-g-c1-c2-c3-s ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusActiveProcessor} -  StatusChecker calculating the active status for the group [ s-g-c1-c2-c3-s ]  for the instance  [ s-g-c1-c2-c3-s-1 ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock acquired
TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock released
TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor} -  StatusChecker calculating the terminated status for the group [ s-g-c1-c2-c3-s ]  for the instance  [ s-g-c1-c2-c3-s-1 ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock acquired
TID: [0] [STRATOS] [2015-06-08 22:18:41,546]  INFO {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor} -  Sending application instance terminated for [application] s-g-c1-c2-c3-s [instance] s-g-c1-c2-c3-s-1

Unsuccessful:

TID: [0] [STRATOS] [2015-06-08 22:33:25,404] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor} -  StatusChecker calculating the terminated status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [ s-g-c1-c2-c3-s-1 ]
TID: [0] [STRATOS] [2015-06-08 22:33:25,404] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock acquired
TID: [0] [STRATOS] [2015-06-08 22:33:25,405] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusProcessor} -  Checking the status of cluster s-g-c1-c2-c3-s.c1-0x0.c1.domain instance status is: Terminating
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.cloud.controller.messaging.topology.TopologyManager} -  Write lock released
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusProcessor} -  Checking the status of cluster s-g-c1-c2-c3-s.c1-0x0.c1.domain instance status is: Terminating
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock released
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatingProcessor} -  StatusChecker calculating the terminating status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [ s-g-c1-c2-c3-s-1 ]
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock acquired
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock released
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusInactiveProcessor} -  StatusChecker calculating the inactive status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [ s-g-c1-c2-c3-s-1 ]
TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock acquired
TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock released
TID: [0] [STRATOS] [2015-06-08 22:33:25,407]  WARN {org.apache.stratos.autoscaler.status.processor.group.GroupStatusInactiveProcessor} -  No possible state change found for [component] s-g-c1-c2-c3-s-x0x [instance] s-g-c1-c2-c3-s-1
TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -  ClusterMonitor Drools session has been disposed. ClusterMonitor [clusterId=s-g-c1-c2-c3-s.c2-0x0.c2.domain, hasPrimary=false ]
TID: [0] [STRATOS] [2015-06-08 22:33:25,481] ERROR {org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor} -  System error, lock has not released for 30 seconds: [lock-name] application [lock-type] Write [thread-id] 99 [thread-name] pool-26-thread-2 [stack-trace]
java.lang.Thread.getStackTrace(Thread.java:1589)




[1.] Application Structure
[cid:image001.png@01D0A42D.4CB9E840]






From: Martin Eppel (meppel)
Sent: Friday, June 05, 2015 4:38 PM

To: dev@stratos.apache.org
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

This is another application, see [1.] which fails to get completely removed:

Scenario / Observation:

•        After all instances / application go active, one instance is being terminated (to verify termination behavior). Once the terminated instance is restored the application is undeployed.

•        After the Application undeployment process is started, all instances are being terminated

•        Application still shows up in stratos admin, subsequent deployments fail

stratos> list-applications
Applications found:
+---------------------+---------------------+----------+
| Application ID      | Alias               | Status   |
+---------------------+---------------------+----------+
| s-n-gr-s-G123-t-a-4 | s-n-gr-s-G123-t-a-4 | Deployed |
+---------------------+---------------------+----------+


[1.] Application:

[cid:image002.png@01D0A42D.4CB9E840]




From: Martin Eppel (meppel)
Sent: Friday, June 05, 2015 3:26 PM
To: dev@stratos.apache.org
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

After re-running it, these are my observations:


•        After the “Application undeployment process started” message is logged, there is a likelihood that (a few) VMs are still launched – I suspect this is due to some race condition between the application undeployment process and the autoscaler.

•        All VMs which were launched before the “Application undeployment process started” message get terminated as part of the undeployment process.

•        VMs which were launched after “Application undeployment process started” eventually get moved to obsolete / pending state and cleaned up; this can take up to 15-20 minutes.

•        The application never gets completely removed.

•        The following exception is consistently observed:

TID: [0] [STRATOS] [2015-06-05 20:47:07,237]  WARN {org.apache.stratos.common.concurrent.locks.ReadWriteLock} -  System warning! Trying to release a lock which has not been taken by the same thread: [lock-name] application-holder [thread-id] 114 [thread-name] pool-24-thread-2

TID: [0] [STRATOS] [2015-06-05 20:47:07,237] ERROR {org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator} -  Failed to retrieve topology event message

org.apache.stratos.common.exception.InvalidLockRequestedException: System error, cannot acquire a write lock while having a read lock on the same thread: [lock-name] application-holder [thread-id] 114 [thread-name] pool-24-thread-2

                    at org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:114)

                    at org.apache.stratos.autoscaler.applications.ApplicationHolder.acquireWriteLock(ApplicationHolder.java:60)

•        Initiating the “Application undeployment process” again will cause the following INFO statement (without any further actions, see in log)
TID: [0] [STRATOS] [2015-06-05 21:34:34,509]  INFO {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Application monitor is already in terminating, graceful un-deployment is has already been attempted thus not invoking again

•        Other exceptions observed after the “Application undeployment process started”
TID: [0] [STRATOS] [2015-06-05 21:36:29,458] ERROR {org.apache.stratos.autoscaler.rule.RuleTasksDelegator} -  Cannot terminate instance
org.apache.stratos.cloud.controller.stub.CloudControllerServiceInvalidMemberExceptionException: CloudControllerServiceInvalidMemberExceptionException
        at sun.reflect.GeneratedConstructorAccessor219.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at java.lang.Class.newInstance(Class.java:374)
        at org.apache.stratos.cloud.controller.stub.CloudControllerServiceStub.terminateInstance(CloudControllerServiceStub.java:8633)
        at org.apache.stratos.common.client.CloudControllerServiceClient.terminateInstance(CloudControllerServiceClient.java:120)
        at org.apache.stratos.autoscaler.rule.RuleTasksDelegator.terminateObsoleteInstance(RuleTasksDelegator.java:298)
        at sun.reflect.GeneratedMethodAccessor413.invoke(Unknown Source)


•        Created a jira to track this issue: https://issues.apache.org/jira/browse/STRATOS-1430
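
For readers unfamiliar with the InvalidLockRequestedException listed above ("cannot acquire a write lock while having a read lock on the same thread"): the JDK's own java.util.concurrent.locks.ReentrantReadWriteLock has the same restriction, i.e. a read lock cannot be upgraded to a write lock. The sketch below uses the JDK class only to illustrate the restriction; how the Stratos ReadWriteLock wrapper is implemented is not shown in this thread, and the class name LockUpgradeSketch is hypothetical.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Sketch using the JDK lock (not the Stratos wrapper) to show that read-to-write upgrade fails. */
final class LockUpgradeSketch {

    /** Tries to take the write lock while the current thread still holds the read lock. */
    static boolean tryUpgradeWhileReading() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.readLock().lock();
        try {
            // tryLock() is used so the example cannot block forever.
            boolean acquired = lock.writeLock().tryLock();
            if (acquired) {
                lock.writeLock().unlock();
            }
            return acquired;
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Releases the read lock first and only then asks for the write lock. */
    static boolean writeAfterReleasingRead() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.readLock().lock();
        lock.readLock().unlock();
        boolean acquired = lock.writeLock().tryLock();
        if (acquired) {
            lock.writeLock().unlock();
        }
        return acquired;
    }

    public static void main(String[] args) {
        System.out.println("upgrade while holding read lock: " + tryUpgradeWhileReading());   // false
        System.out.println("write after releasing read lock: " + writeAfterReleasingRead()); // true
    }
}
```

If the same pattern applies here, the code path that handles the topology event would need to release the application-holder read lock before requesting the write lock.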







Regards



Martin



Attached the log file of the last test







From: Martin Eppel (meppel)
Sent: Friday, June 05, 2015 12:59 PM
To: dev@stratos.apache.org
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

For this latest test I got the latest source from stratos repo so I have this commit (see below), but the un-deployment still fails (to some extent).
As mentioned below, it seems that all the members get terminated eventually, including the ones which got started after the “application un-deployment” process started.
What is still left in stratos (even after all members got terminated) is the application (see the stratos> list-applications command result below in email thread). This would still be an issue when re-deploying the application !
I will do a few reruns to verify the removal of the VMs (members) is consistent.
Thanks

Martin

git show 2fe84b91843b20e91e8cafd06011f42d218f231c
commit 2fe84b91843b20e91e8cafd06011f42d218f231c
Author: anuruddhal <an...@gmail.com>
Date:   Wed Jun 3 14:41:12 2015 +0530

From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: Friday, June 05, 2015 12:46 PM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,

I also encountered a similar issue with the application un-deployment with PCA but I guess you are using JCA.

I can see that Anuruddha has done a fix for the issue I'm referring to with the commit below:
https://github.com/apache/stratos/commit/2fe84b91843b20e91e8cafd06011f42d218f231c

Regarding the member context not found error, this could occur if the termination request was made for an already terminated member. It is possible that the Autoscaler makes a second terminate request if the first request takes some time to execute, so that by the time the second request hits the Cloud Controller the member has already been terminated by the first request.
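
To illustrate the double-request case, here is a minimal Java sketch. It is not the Cloud Controller code; the class and method names are hypothetical, and the member context store is reduced to a map. The first terminate call removes the member context; a second call for the same member finds nothing and is reported as a no-op rather than an error.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch (not Stratos code): terminating the same member twice. */
final class TerminateOnceSketch {

    // member-id -> instance-id; stands in for the member context store.
    private final Map<String, String> memberContexts = new ConcurrentHashMap<>();

    void register(String memberId, String instanceId) {
        memberContexts.put(memberId, instanceId);
    }

    /**
     * Returns true if this call removed the member context (and would terminate the instance),
     * false if the context was already gone, e.g. because an earlier request handled it.
     */
    boolean terminate(String memberId) {
        String instanceId = memberContexts.remove(memberId);
        return instanceId != null;
    }

    public static void main(String[] args) {
        TerminateOnceSketch controller = new TerminateOnceSketch();
        controller.register("member-1", "instance-1");
        System.out.println(controller.terminate("member-1")); // first request
        System.out.println(controller.terminate("member-1")); // duplicate request
    }
}
```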

Can you please confirm whether the members were properly terminated and it's just these exceptions that you are seeing?

Thanks


On Sat, Jun 6, 2015 at 12:36 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi Udara,

Picked up your commit and reran the test case:

Attached is the log file (artifacts are the same as before).

Didn’t see the issue with “Member is in the wrong list” …

but see the following exception after the undeploy application message:
TID: [0] [STRATOS] [2015-06-05 18:09:46,836] ERROR {org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator} -  Failed to retrieve topology event message
org.apache.stratos.common.exception.InvalidLockRequestedException: System error, cannot acquire a write lock while having a read lock on the same thread: [lock-name] application-holder [thread-id] 114 [thread-name] pool-24-thread-2
                    at org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:114)
                    at org.apache.stratos.autoscaler.applications.ApplicationHolder.acquireWriteLock(ApplicationHolder.java:60)


Also, after the “Application undeployment process started” message is logged, new members are being instantiated:

TID: [0] [STRATOS] [2015-06-05 18:07:46,545]  INFO {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher} -  Publishing member created event:


Eventually, these VMs get terminated :

TID: [0] [STRATOS] [2015-06-05 18:42:42,413] ERROR {org.apache.stratos.cloud.controller.services.impl.CloudControllerServiceImpl} -  Could not terminate instance: [member-id] g-sc-G12-1.c1-0x0.c1.domaindd9c1d40-70cc-4950-9757-418afe19ba7f
org.apache.stratos.cloud.controller.exception.InvalidMemberException: Could not terminate instance, member context not found: [member-id] g-sc-G12-1.c1-0x0.c1.domaindd9c1d40-70cc-4950-9757-418afe19ba7f
                    at org.apache.stratos.cloud.controller.services.impl.CloudControllerServiceImpl.terminateInstance(CloudControllerServiceImpl.java:595)
                    at sun.reflect.GeneratedMethodAccessor408.invoke(Unknown Source)
                    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                    at java.lang.reflect.Method.invoke(Method.java:606)


but the application remains:

stratos> list-applications
Applications found:
+----------------+------------+----------+
| Application ID | Alias      | Status   |
+----------------+------------+----------+
| g-sc-G12-1     | g-sc-G12-1 | Deployed |
+----------------+------------+----------+

['g-sc-G12-1: applicationInstances 1, groupInstances 2, clusterInstances 3, members 0 ()\n']



From: Martin Eppel (meppel)
Sent: Friday, June 05, 2015 10:04 AM
To: dev@stratos.apache.org
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Ok:

log4j.logger.org.apache.stratos.manager=DEBUG
log4j.logger.org.apache.stratos.autoscaler=DEBUG
log4j.logger.org.apache.stratos.messaging=INFO
log4j.logger.org.apache.stratos.cloud.controller=DEBUG
log4j.logger.org.wso2.andes.client=ERROR
# Autoscaler rule logs
log4j.logger.org.apache.stratos.autoscaler.rule.RuleLog=DEBUG

From: Udara Liyanage [mailto:udara@wso2.com]
Sent: Friday, June 05, 2015 10:00 AM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,

Better if you can enable debug logs for all of AS, CC and the cartridge agent.

On Fri, Jun 5, 2015 at 10:23 PM, Udara Liyanage <ud...@wso2.com> wrote:
Hi,

Please enable AS debug logs.

On Fri, Jun 5, 2015 at 9:38 PM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi Udara,

Yes, this issue seems to be fairly well reproducible, which debug log do you want me to enable, cartridge agent logs ?

Thanks

Martin

From: Udara Liyanage [mailto:udara@wso2.com]
Sent: Thursday, June 04, 2015 11:11 PM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi,

This might be possible if AS did not receive the member activated event published by CC. Is it possible to enable debug logs if this is reproducible?
Or else I can add some INFO logs and commit.


On Fri, Jun 5, 2015 at 9:11 AM, Udara Liyanage <ud...@wso2.com>> wrote:
Hi,


For the first issue you mentioned, the particular member is activated, but it is still identified as an obsolete member and is marked for termination because the pending time expired. Does that mean the member is still in the obsolete list even though it has been activated?

//member started
TID: [0] [STRATOS] [2015-06-04 19:53:04,706]  INFO {org.apache.stratos.autoscaler.context.cluster.ClusterContext} -  Member stat context has been added: [application] g-sc-G12-1 [cluster] g-sc-G12-1.c1-0x0.c1.domain [clusterInstanceContext] g-sc-G12-1-1 [partitionContext] whole-region [member-id] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1

//member activated
TID: [0] [STRATOS] [2015-06-04 19:56:00,907]  INFO {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher} -  Publishing member activated event: [service-name] c1 [cluster-id] g-sc-G12-1.c1-0x0.c1.domain [cluster-instance-id] g-sc-G12-1-1 [member-id] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1 [network-partition-id] RegionOne [partition-id] whole-region
TID: [0] [STRATOS] [2015-06-04 19:56:00,916]  INFO {org.apache.stratos.messaging.message.processor.topology.MemberActivatedMessageProcessor} -  Member activated: [service] c1 [cluster] g-sc-G12-1.c1-0x0.c1.domain [member] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1

//after 15 minutes ---member is still in pending state, pending timeout expired
TID: [0] [STRATOS] [2015-06-04 20:08:04,713]  INFO {org.apache.stratos.autoscaler.context.partition.ClusterLevelPartitionContext$PendingMemberWatcher} -  Pending state of member expired, member will be moved to obsolete list. [pending member] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1 [expiry time] 900000 [cluster] g-sc-G12-1.c1-0x0.c1.domain [cluster instance] null
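
To illustrate the sequence in the three log excerpts above, here is a minimal sketch. It is not the Stratos implementation: the class, field and method names are hypothetical, and only the 900000 ms expiry value is taken from the log. It shows that a member whose activated event is handled leaves the pending list, while a member whose activated event is missed is moved to the obsolete list once the pending timeout expires.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch only; these names are illustrative, not Stratos classes.
class PendingMemberSketch {

    // Matches the [expiry time] 900000 value shown in the log (15 minutes).
    static final long EXPIRY_MILLIS = 900000L;

    final Map<String, Long> pendingSince = new HashMap<>();
    final Set<String> active = new HashSet<>();
    final Set<String> obsolete = new HashSet<>();

    // Member started: it is tracked as pending from this point in time.
    void memberStarted(String memberId, long nowMillis) {
        pendingSince.put(memberId, nowMillis);
    }

    // Member activated event handled: pending -> active.
    void memberActivated(String memberId) {
        if (pendingSince.remove(memberId) != null) {
            active.add(memberId);
        }
    }

    // Watcher pass: anything pending for EXPIRY_MILLIS or longer becomes obsolete.
    void expirePending(long nowMillis) {
        Iterator<Map.Entry<String, Long>> it = pendingSince.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() >= EXPIRY_MILLIS) {
                obsolete.add(entry.getKey());
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        PendingMemberSketch missedEvent = new PendingMemberSketch();
        missedEvent.memberStarted("member-1", 0L);
        // The activated event never reaches this context, so the watcher expires the member.
        missedEvent.expirePending(EXPIRY_MILLIS);
        System.out.println("obsolete after missed event: " + missedEvent.obsolete);
    }
}
```

If the real pending list behaves like this, the 20:08:04 expiry (15 minutes after the 19:53:04 start) would be consistent with the activated event at 19:56:00 never having been applied on the AS side.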

On Fri, Jun 5, 2015 at 5:14 AM, Martin Eppel (meppel) <me...@cisco.com>> wrote:
Hi,

I am running into a scenario where application un-deployment fails (using stratos with latest commit  b1b6bca3f99b6127da24c9af0a6b20faff2907be).

For the application structure see [1.]; for the (debug enabled) wso2carbon.log, application.json, cartridge-group.json, deployment-policy and auto-scaling policies see the attached zip file.

It is noteworthy that, while the application is running, the following log statements / exceptions are observed:

…
Member is in the wrong list and it is removed from active members list: g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
…
TID: [0] [STRATOS] [2015-06-04 20:11:03,425] ERROR {org.apache.stratos.autoscaler.rule.RuleTasksDelegator} -  Cannot terminate instance
…
// after receiving the application undeploy event:
[2015-06-04 20:12:39,465]  INFO {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Application undeployment process started: [application-id] g-sc-G12-1
// a new instance is being started up
…
[2015-06-04 20:13:13,445]  INFO {org.apache.stratos.cloud.controller.services.impl.InstanceCreator} -  Instance started successfully: [cartridge-type] c2 [cluster-id] g-sc-G12-1.c2-1x0.c2.domain [instance-id] RegionOne/5d4699f7-b00b-42eb-b565-b48fc8f20407

// Also noteworthy is the following warning, which is seen repeatedly in the logs:
ReadWriteLock} -  System warning! Trying to release a lock which has not been taken by the same thread: [lock-name]


[1.] Application structure

[cid:image003.png@01D0A42D.4CB9E840]







--

Udara Liyanage
Software Engineer
WSO2, Inc.: http://wso2.com<http://wso2.com/>
lean. enterprise. middleware
web: http://udaraliyanage.wordpress.com
phone: +94 71 443 6897<tel:%2B94%2071%20443%206897>



--
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos



--
Lasindu Charith
Software Engineer, WSO2 Inc.
Mobile: +94714427192<tel:%2B94714427192>
Web: blog.lasindu.com<http://blog.lasindu.com>


--
--
Lahiru Sandaruwan
Committer and PMC member, Apache Stratos,
Senior Software Engineer,
WSO2 Inc., http://wso2.com
lean.enterprise.middleware

phone: +94773325954<tel:%2B94773325954>
email: lahirus@wso2.com<ma...@wso2.com> blog: http://lahiruwrites.blogspot.com/
linked-in: http://lk.linkedin.com/pub/lahiru-sandaruwan/16/153/146


Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by Lahiru Sandaruwan <la...@wso2.com>.
Hi Imesh,

The following could be the reason for the application not un-deploying when a
member was auto healed:


   - The cluster in which the member was auto healed is terminated
   before the others (while the others are still in terminating state)

or

   - The cluster in which the member was auto healed is still
   terminating when the others are already in terminated state

Either of these two cases could happen even if no member was auto
healed (for example with groups, where one group is very complex and the
others are simple), because currently, when the parent group is
*terminating*, we check whether all the clusters and groups are in
*terminating* status, which is wrong.
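
As a minimal sketch of the check described above (all names here are hypothetical and are not the Stratos classes), the strict variant fails as soon as one child has already reached terminated while another is still terminating. A relaxed variant that accepts both states is shown only as one possible direction, not as the actual fix:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch only; these names are illustrative, not the Stratos API.
class GroupStatusSketch {

    enum Status { ACTIVE, TERMINATING, TERMINATED }

    // Strict check as described above: every child must be TERMINATING.
    // A child that is already TERMINATED makes this return false.
    static boolean allTerminating(List<Status> children) {
        for (Status status : children) {
            if (status != Status.TERMINATING) {
                return false;
            }
        }
        return true;
    }

    // Possible alternative: a child that is TERMINATING or already TERMINATED
    // is treated as consistent with a parent that is terminating.
    static boolean allTerminatingOrTerminated(List<Status> children) {
        for (Status status : children) {
            if (status != Status.TERMINATING && status != Status.TERMINATED) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // One cluster finished early, the other is still shutting down.
        List<Status> mixed = Arrays.asList(Status.TERMINATED, Status.TERMINATING);
        System.out.println("strict check:  " + allTerminating(mixed));
        System.out.println("relaxed check: " + allTerminatingOrTerminated(mixed));
    }
}
```

With the mixed list, neither an "all terminated" nor an "all terminating" condition holds, which would match the "No possible state change found" warning seen in the logs.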

Thanks.

On Thu, Jun 11, 2015 at 5:49 AM, Imesh Gunaratne <im...@apache.org> wrote:

> Do we know why this only happens if a member was forcefully terminated and
> auto-healed?
>
> On Wed, Jun 10, 2015 at 10:01 PM, Lasindu Charith <la...@wso2.com>
> wrote:
>
>> Hi  all,
>>
>> The cause of the above issue seems to be as follows.
>> GroupStatusTerminatedProcessor and GroupStatusTerminatingProcessor
>> process the event only if all the group instances and cluster instances
>> are in terminated state or in terminating state respectively [1][2]. But
>> there can be situations (such as the above) where some group instances are
>> in terminated state and some in terminating state by the
>> time GroupStatusProcessorChain is executed. In such scenarios, both the
>> GroupStatusTerminatedProcessor and GroupStatusTerminatingProcessor
>> executions are skipped and GroupStatusInactiveProcessor prints the "No
>> possible state change found" warning.
>>
>> I think we need to find a way to properly fix this.
>>
>> [1]
>> https://github.com/apache/stratos/blob/master/components/org.apache.stratos.autoscaler/src/main/java/org/apache/stratos/autoscaler/status/processor/group/GroupStatusTerminatedProcessor.java#L91
>> [2]
>> https://github.com/apache/stratos/blob/master/components/org.apache.stratos.autoscaler/src/main/java/org/apache/stratos/autoscaler/status/processor/group/GroupStatusTerminatingProcessor.java#L89
>>
>> On Tue, Jun 9, 2015 at 8:09 PM, Lasindu Charith <la...@wso2.com> wrote:
>>
>>> Hi Martin,
>>>
>>> I was able to reproduce this issue in the latest build with PCA in
>>> Openstack. Even after stratos is restarted, the Application is not
>>> undeployed, which makes it impossible to undeploy the application (even the
>>> forceful undeployment failed for the above obsolete application).
>>>
>>> Currently I'm looking at possible causes for this and will update with
>>> the progress.
>>>
>>> Thanks,
>>>
>>> On Tue, Jun 9, 2015 at 5:59 AM, Martin Eppel (meppel) <me...@cisco.com>
>>> wrote:
>>>
>>>>  Here is another example where the removal fails:
>>>>
>>>>
>>>>
>>>> For application see [1.], log file (with debug enabled) and jsons are
>>>> attached.
>>>>
>>>>
>>>>
>>>> Scenario:
>>>>
>>>>
>>>>
>>>> ·        Deploy application and wait for all cartridges to become
>>>> active
>>>>
>>>> ·        Kill a VM (2nd in startup sequence)
>>>>
>>>> ·        Wait for it to restart and become active
>>>>
>>>> ·        Un-deploy application
>>>>
>>>> a.      Un-deploy forcefully will succeed
>>>> ([2015-06-08 20:38:21,487]  INFO
>>>> {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -
>>>> Forcefully un-deploying the application s-g-c1-c2-c3-s)
>>>> and
>>>>
>>>> b.      Un-deploy gracefully will fail to remove app completely
>>>> (although VMs are terminated successfully)
>>>> ([2015-06-08 20:54:16,372]  INFO
>>>> {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -
>>>> Starting to undeploy application: [application-id])
>>>>
>>>> ·        Both scenarios are recorded in the same log file
>>>> wso2carbon-s-g-c1-c2-c3-s.log
>>>>
>>>> ·        Btw, I retested the scenario and the issue is easily
>>>> reproducible following the steps listed above:
>>>> graceful application un-deploy succeeds if no VM has been restarted
>>>> (terminated and restarted by autoscaler).
>>>> Once a VM has been terminated, graceful application un-deploy will fail.
>>>> I attached a log file which demonstrates this case
>>>> (wso2carbon-s-g-c1-c2-c3-s-scen-2.log). In this scenario, the same
>>>> application is deployed, becomes active and is then removed (repeated 2
>>>> times); then a VM is terminated and restarted by autoscaler. Afterwards,
>>>> graceful application un-deploy fails.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Other Observations:
>>>>
>>>>
>>>>
>>>> When the application is removed successfully, some events, e.g. “cluster
>>>> removed event” and “Application deleted event received:”, are published
>>>> (see [2.]), while when the application fails to be removed no such events
>>>> are observed.
>>>>
>>>>
>>>>
>>>> [2.] cluster removed event when application is un-deployed forcefully
>>>>
>>>> TID: [0] [STRATOS] [2015-06-08 20:38:34,187]  INFO
>>>> {org.apache.stratos.cloud.controller.messaging.receiver.application.ApplicationEventReceiver}
>>>> -  Application deleted event received: [application-id] s-g-c1-c2-c3-s
>>>>
>>>> TID: [0] [STRATOS] [2015-06-08 20:38:34,220]  INFO
>>>> {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher}
>>>> -  Publishing application clusters removed event: [application-id]
>>>> s-g-c1-c2-c3-s
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> I analyzed the differences in the successful application removal and
>>>> unsuccessful log sequence and noticed a difference (see also highlighted
>>>> areas):
>>>>
>>>>
>>>>
>>>> Successful (see logs in wso2carbon-s-g-c1-c2-c3-s-scen-2.log)
>>>>
>>>>
>>>>
>>>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG
>>>> {org.apache.stratos.autoscaler.status.processor.group.*
>>>> *GroupStatusTerminatedProcessor**} -  StatusChecker calculating the
>>>> terminated status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [
>>>> s-g-c1-c2-c3-s-1 ]*
>>>>
>>>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG
>>>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>>>> lock acquired*
>>>>
>>>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,527]  INFO
>>>> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor}
>>>> -  Sending group instance terminated for [group] s-g-c1-c2-c3-s-x0x
>>>> [instance] s-g-c1-c2-c3-s-1*
>>>>
>>>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG
>>>> {org.apache.stratos.autoscaler.applications.topic.ApplicationBuilder} -
>>>> Handling group terminated event: [group-id] s-g-c1-c2-c3-s-x0x
>>>> [application-id] s-g-c1-c2-c3-s [instance] s-g-c1-c2-c3-s-1*
>>>>
>>>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG
>>>> {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -
>>>> ClusterMonitor Drools session has been disposed. ClusterMonitor
>>>> [clusterId=s-g-c1-c2-c3-s.c3-0x0.c3.domain, hasPrimary=false ]*
>>>>
>>>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG
>>>> {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -
>>>> ClusterMonitor Drools session has been disposed. ClusterMonitor
>>>> [clusterId=s-g-c1-c2-c3-s.c1-0x0.c1.domain, hasPrimary=false ]*
>>>>
>>>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG
>>>> {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -
>>>> ClusterMonitor Drools session has been disposed. ClusterMonitor
>>>> [clusterId=s-g-c1-c2-c3-s.c2-0x0.c2.domain, hasPrimary=false ]*
>>>>
>>>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,529] DEBUG
>>>> {org.apache.stratos.cloud.controller.messaging.topology.TopologyManager} -
>>>> Write lock released*
>>>>
>>>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,536] DEBUG
>>>> {org.apache.stratos.autoscaler.registry.RegistryManager} -  Application [
>>>> s-g-c1-c2-c3-s ] persisted successfully in the Autoscaler Registry*
>>>>
>>>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,538] DEBUG
>>>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -
>>>> Applications updated:
>>>> {"applicationMap":{"s-g-c1-c2-c3-s":{"id":"s-g-c1-c2-c3-s","key":"l8V7OpRqOfBcWbBw","tenantId":-1234,"tenantDomain":"carbon.super","tenantAdminUserName":"admin","applicationPolicyId":"default-iaas","aliasToGroupMap":{"s-g-c1-c2-c3-s-x0x":{"name":"s-g-c1-c2-c3-s","alias":"s-g-c1-c2-c3-s-x0x","groupMinInstances":1,"groupMaxInstances":1,"applicationId":"s-g-c1-c2-c3-s","aliasToGroupMap":{},"aliasToClusterDataMap":{"c2-0x0":{"serviceType":"c2","clusterId":"s-g-c1-c2-c3-s.c2-0x0.c2.domain","minInstances":1,"maxInstances":1},"c3-0x0":{"serviceType":"c3","clusterId":"s-g-c1-c2-c3-s.c3-0x0.c3.domain","minInstances":1,"maxInstances":1},"c1-0x0":{"serviceType":"c1","clusterId":"s-g-c1-c2-c3-s.c1-0x0.c1.domain","minInstances":1,"maxInstances":1}},"typeToClusterDataMap":{"c1":{"serviceType":"c1","clusterId":"s-g-c1-c2-c3-s.c1-0x0.c1.domain","minInstances":1,"maxInstances":1},"c2":{"serviceType":"c2","clusterId":"s-g-c1-c2-c3-s.c2-0x0.c2.domain","minInstances":1,"maxInstances":1},"c3":{"serviceType":"c3","clusterId":"s-g-c1-c2-c3-s.c3-0x0.c3.domain","minInstances":1,"maxInstances":1}},"instanceIdToInstanceContextMap":{},"dependencyOrder":{"startupOrders":[{"startupOrderComponentList":["cartridge.c3-0x0","cartridge.c2-0x0"]},{"startupOrderComponentList":["cartridge.c2-0x0","cartridge.c1-0x0"]}],"terminationBehaviour":"terminate-none"},"isGroupScalingEnabled":false,"isGroupInstanceMonitoringEnabled":false,"instanceIdSequence":{"value":0}}},"aliasToClusterDataMap":{},"typeToClusterDataMap":{},"aliasToDeploymentPolicyIdMap":{"c3-0x0":"static-1","c2-0x0":"static-1","c1-0x0":"static-1"},"instanceIdToInstanceContextMap":{"s-g-c1-c2-c3-s-1":{"alias":"s-g-c1-c2-c3-s","instanceId":"s-g-c1-c2-c3-s-1","instanceProperties":{},"lifeCycleStateManager":{"stateStack":["Created","Active","Terminating"],"identifier":"s-g-c1-c2-c3-s_s-g-c1-c2-c3-s-1"},"networkPartitionId":"RegionOne"}},"dependencyOrder":{"startupOrders":[]},"isGroupScalingEnabled":false,"isGroupInstanceMonitoringEnabled":false
,"instanceIdSequence":{"value":1}}},"initialized":false}*
>>>>
>>>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,539]  INFO
>>>> {org.apache.stratos.autoscaler.applications.topic.ApplicationsEventPublisher}
>>>> -  Publishing group instance terminated event: [application] s-g-c1-c2-c3-s
>>>> [group] s-g-c1-c2-c3-s-x0x [instance] s-g-c1-c2-c3-s-1*
>>>>
>>>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,545]  INFO
>>>> {org.apache.stratos.autoscaler.monitor.component.GroupMonitor} -  [Group]
>>>> s-g-c1-c2-c3-s-x0x is notifying the [parent] s-g-c1-c2-c3-s [instance]
>>>> s-g-c1-c2-c3-s-1*
>>>>
>>>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,545] DEBUG
>>>> {org.apache.stratos.autoscaler.status.processor.group.*
>>>> *GroupStatusActiveProcessor}** -  GroupProcessor chain calculating the
>>>> status for the group [ s-g-c1-c2-c3-s ]*
>>>>
>>>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG
>>>> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusActiveProcessor}
>>>> -  StatusChecker calculating the active status for the group [
>>>> s-g-c1-c2-c3-s ]  for the instance  [ s-g-c1-c2-c3-s-1 ]*
>>>>
>>>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG
>>>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>>>> lock acquired*
>>>>
>>>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG
>>>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>>>> lock released*
>>>>
>>>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG
>>>> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor}
>>>> -  StatusChecker calculating the terminated status for the group [
>>>> s-g-c1-c2-c3-s ]  for the instance  [ s-g-c1-c2-c3-s-1 ]*
>>>>
>>>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG
>>>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>>>> lock acquired*
>>>>
>>>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,546]  INFO
>>>> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor}
>>>> -  Sending application instance terminated for [application] s-g-c1-c2-c3-s
>>>> [instance] s-g-c1-c2-c3-s-1*
>>>>
>>>>
>>>>
>>>> Unsuccessful:
>>>>
>>>>
>>>>
>>>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,404] DEBUG
>>>> {org.apache.stratos.autoscaler.status.processor.group.*
>>>> *GroupStatusTerminatedProcessor**} -  StatusChecker calculating the
>>>> terminated status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [
>>>> s-g-c1-c2-c3-s-1 ]*
>>>>
>>>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,404] DEBUG
>>>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>>>> lock acquired*
>>>>
>>>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,405] DEBUG
>>>> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusProcessor}
>>>> -  Checking the status of cluster s-g-c1-c2-c3-s.c1-0x0.c1.domain instance
>>>> status is: Terminating*
>>>>
>>>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
>>>> {org.apache.stratos.cloud.controller.messaging.topology.TopologyManager} -
>>>> Write lock released*
>>>>
>>>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
>>>> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusProcessor}
>>>> -  Checking the status of cluster s-g-c1-c2-c3-s.c1-0x0.c1.domain instance
>>>> status is: Terminating*
>>>>
>>>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
>>>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>>>> lock released*
>>>>
>>>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
>>>> {org.apache.stratos.autoscaler.status.processor.group.*
>>>> *GroupStatusTerminatingProcessor**} -  StatusChecker calculating the
>>>> terminating status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance
>>>> [ s-g-c1-c2-c3-s-1 ]*
>>>>
>>>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
>>>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>>>> lock acquired*
>>>>
>>>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
>>>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>>>> lock released*
>>>>
>>>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
>>>> {org.apache.stratos.autoscaler.status.processor.group.*
>>>> *GroupStatusInactiveProcessor**} -  StatusChecker calculating the
>>>> inactive status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [
>>>> s-g-c1-c2-c3-s-1 ]*
>>>>
>>>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG
>>>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>>>> lock acquired*
>>>>
>>>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG
>>>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>>>> lock released*
>>>>
>>>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,407]  WARN
>>>> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusInactiveProcessor}
>>>> -  **No possible state change found for* *[component]
>>>> s-g-c1-c2-c3-s-x0x [instance] s-g-c1-c2-c3-s-1*
>>>>
>>>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG
>>>> {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -
>>>> ClusterMonitor Drools session has been disposed. ClusterMonitor
>>>> [clusterId=s-g-c1-c2-c3-s.c2-0x0.c2.domain, hasPrimary=false ]*
>>>>
>>>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,481] ERROR
>>>> {org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor} -  System
>>>> error, lock has not released for 30 seconds: [lock-name] application
>>>> [lock-type] Write [thread-id] 99 [thread-name] pool-26-thread-2
>>>> [stack-trace] *
>>>>
>>>> *java.lang.Thread.getStackTrace(Thread.java:1589)*
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> [1.] Application Structure
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *From:* Martin Eppel (meppel)
>>>> *Sent:* Friday, June 05, 2015 4:38 PM
>>>>
>>>> *To:* dev@stratos.apache.org
>>>> *Subject:* RE: Testing Stratos 4.1: Application undeployment:
>>>> application fails to undeploy (nested grouping, group scaling)
>>>>
>>>>
>>>>
>>>> This is another application, see [1.] which fails to get completely
>>>> removed:
>>>>
>>>>
>>>>
>>>> Scenario / Observation:
>>>>
>>>> ·        After all instances / application go active, one instance is
>>>> being terminated (to verify termination behavior). Once the terminated
>>>> instance is restored the application is undeployed.
>>>>
>>>> ·        After the Application undeployment process is started, all
>>>> instances are being terminated
>>>>
>>>> ·        Application still shows up in stratos admin, subsequent
>>>> deployments fail
>>>>
>>>>
>>>>
>>>> stratos> list-applications
>>>>
>>>> Applications found:
>>>>
>>>> +---------------------+---------------------+----------+
>>>>
>>>> | Application ID      | Alias               | Status   |
>>>>
>>>> +---------------------+---------------------+----------+
>>>>
>>>> | s-n-gr-s-G123-t-a-4 | s-n-gr-s-G123-t-a-4 | Deployed |
>>>>
>>>> +---------------------+---------------------+----------+
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> [1.] Application:
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *From:* Martin Eppel (meppel)
>>>> *Sent:* Friday, June 05, 2015 3:26 PM
>>>> *To:* dev@stratos.apache.org
>>>> *Subject:* RE: Testing Stratos 4.1: Application undeployment:
>>>> application fails to undeploy (nested grouping, group scaling)
>>>>
>>>>
>>>>
>>>> After re-running it, these are my observations:
>>>>
>>>>
>>>>
>>>> ·        After the “Application undeployment process started” is
>>>> started, there is a likelihood that (a few) VMs are still launched – I
>>>> suspect this is due to some race condition between “Application
>>>> undeployment process started” and the “autoscaler”.
>>>>
>>>> ·        All Vms which were launched before the “Application
>>>> undeployment process started” get terminated as part of the undeployment
>>>> process.
>>>>
>>>> ·        Vms which were launched after “Application undeployment
>>>> process started” eventually get moved to obsolete / pending state and
>>>> cleaned up, this can take up to 15- 20 minutes.
>>>>
>>>> ·        The application never gets completely removed,
>>>>
>>>> ·        The following exception is consistently observed:
>>>>
>>>> TID: [0] [STRATOS] [2015-06-05 20:47:07,237]  WARN
>>>> {org.apache.stratos.common.concurrent.locks.ReadWriteLock} -  System
>>>> warning! Trying to release a lock which has not been taken by the same
>>>> thread: [lock-name] application-holder [thread-id] 114 [thread-name]
>>>> pool-24-thread-2
>>>>
>>>> *TID: [0] [STRATOS] [2015-06-05 20:47:07,237] ERROR
>>>> {org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator}
>>>> -  Failed to retrieve topology event message*
>>>>
>>>> *org.apache.stratos.common.exception.InvalidLockRequestedException:
>>>> System error, cannot acquire a write lock while having a read lock on the
>>>> same thread: [lock-name] application-holder [thread-id] 114 [thread-name]
>>>> pool-24-thread-2*
>>>>
>>>> *                    at
>>>> org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:114)*
>>>>
>>>> *                    at
>>>> org.apache.stratos.autoscaler.applications.ApplicationHolder.acquireWriteLock(ApplicationHolder.java:60)*
>>>>
>>>> ·        Initiating the “Application undeployment process” again will
>>>> cause the following INFO statement (without any further actions, see in log)
>>>> TID: [0] [STRATOS] [2015-06-05 21:34:34,509]  INFO
>>>> {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -
>>>> Application monitor is already in terminating, graceful un-deployment is
>>>> has already been attempted thus not invoking again
>>>>
>>>> ·        Other exceptions observed after the “Application undeployment
>>>> process started”
>>>>
>>>> TID: [0] [STRATOS] [2015-06-05 21:36:29,458] ERROR
>>>> {org.apache.stratos.autoscaler.rule.RuleTasksDelegator} -  Cannot terminate
>>>> instance
>>>>
>>>> org.apache.stratos.cloud.controller.stub.CloudControllerServiceInvalidMemberExceptionException:
>>>> CloudControllerServiceInvalidMemberExceptionException
>>>>
>>>>         at
>>>> sun.reflect.GeneratedConstructorAccessor219.newInstance(Unknown Source)
>>>>
>>>>         at
>>>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>>>
>>>>         at
>>>> java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>>>>
>>>>         at java.lang.Class.newInstance(Class.java:374)
>>>>
>>>>         at
>>>> org.apache.stratos.cloud.controller.stub.CloudControllerServiceStub.terminateInstance(CloudControllerServiceStub.java:8633)
>>>>
>>>>         at
>>>> org.apache.stratos.common.client.CloudControllerServiceClient.terminateInstance(CloudControllerServiceClient.java:120)
>>>>
>>>>         at
>>>> org.apache.stratos.autoscaler.rule.RuleTasksDelegator.terminateObsoleteInstance(RuleTasksDelegator.java:298)
>>>>
>>>>         at sun.reflect.GeneratedMethodAccessor413.invoke(Unknown Source)
>>>>
>>>>
>>>>
>>>> ·        Created a jira to track this issue:
>>>> https://issues.apache.org/jira/browse/STRATOS-1430
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Regards
>>>>
>>>>
>>>>
>>>> Martin
>>>>
>>>>
>>>>
>>>> Attached the log file of the last test
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *From:* Martin Eppel (meppel)
>>>> *Sent:* Friday, June 05, 2015 12:59 PM
>>>> *To:* dev@stratos.apache.org
>>>> *Subject:* RE: Testing Stratos 4.1: Application undeployment:
>>>> application fails to undeploy (nested grouping, group scaling)
>>>>
>>>>
>>>>
>>>> For this latest test I got the latest source from stratos repo so I
>>>> have this commit (see below), but the un-deployment still fails (to some
>>>> extent).
>>>>
>>>> As mentioned below, it seems that all the members get terminated
>>>> eventually, including the ones which got started after the “application
>>>> un-deployment” process started.
>>>>
>>>> What is still left in stratos (even after all members got terminated)
>>>> is the application (see the stratos> list-applications command result below
>>>> in email thread). This would still be an issue when re-deploying the
>>>> application !
>>>>
>>>> I will do a few reruns to verify the removal of the VMs (members) is
>>>> consistent.
>>>>
>>>> Thanks
>>>>
>>>>
>>>>
>>>> Martin
>>>>
>>>>
>>>>
>>>> git show 2fe84b91843b20e91e8cafd06011f42d218f231c
>>>>
>>>> commit 2fe84b91843b20e91e8cafd06011f42d218f231c
>>>>
>>>> Author: anuruddhal <an...@gmail.com>
>>>>
>>>> Date:   Wed Jun 3 14:41:12 2015 +0530
>>>>
>>>>
>>>>
>>>> *From:* Imesh Gunaratne [mailto:imesh@apache.org <im...@apache.org>]
>>>> *Sent:* Friday, June 05, 2015 12:46 PM
>>>> *To:* dev
>>>> *Subject:* Re: Testing Stratos 4.1: Application undeployment:
>>>> application fails to undeploy (nested grouping, group scaling)
>>>>
>>>>
>>>>
>>>> Hi Martin,
>>>>
>>>>
>>>>
>>>> I also encountered a similar issue with the application un-deployment
>>>> with PCA but I guess you are using JCA.
>>>>
>>>>
>>>>
>>>> I can see that Anuruddha has done a fix for the issue I'm referring to
>>>> with the below commit:
>>>>
>>>>
>>>> https://github.com/apache/stratos/commit/2fe84b91843b20e91e8cafd06011f42d218f231c
>>>>
>>>>
>>>>
>>>> Regarding the member context not found error, this could occur if the
>>>> termination request was made for an already terminated member. There is a
>>>> possibility that the Autoscaler makes a second terminate request if the
>>>> first request takes some time to execute, and by the time the second
>>>> request hits Cloud Controller the member has already been terminated by
>>>> the first request.
>>>>
>>>>
>>>>
>>>> Can you please confirm whether the members were properly terminated and
>>>> it's just these exceptions that you are seeing?
>>>>
>>>>
>>>>
>>>> Thanks
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Sat, Jun 6, 2015 at 12:36 AM, Martin Eppel (meppel) <
>>>> meppel@cisco.com> wrote:
>>>>
>>>> Hi Udara,
>>>>
>>>>
>>>>
>>>> Picked up your commit and reran the test case:
>>>>
>>>>
>>>>
>>>> Attached is the log file (artifacts are the same as before).
>>>>
>>>>
>>>>
>>>> *Didn’t see the issue with* “*Member is in the wrong list” …*
>>>>
>>>>
>>>>
>>>> but see the following exception after the undeploy application message:
>>>>
>>>> *TID: [0] [STRATOS] [2015-06-05 18:09:46,836] ERROR
>>>> {org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator}
>>>> -  Failed to retrieve topology event message*
>>>>
>>>> *org.apache.stratos.common.exception.InvalidLockRequestedException:
>>>> System error, cannot acquire a write lock while having a read lock on the
>>>> same thread: [lock-name] application-holder [thread-id] 114 [thread-name]
>>>> pool-24-thread-2*
>>>>
>>>> *                    at
>>>> org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:114)*
>>>>
>>>> *                    at
>>>> org.apache.stratos.autoscaler.applications.ApplicationHolder.acquireWriteLock(ApplicationHolder.java:60)*
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *Also, after the “Application undeployment process started” is started,
>>>> new members are being instantiated:*
>>>>
>>>>
>>>>
>>>> *TID: [0] [STRATOS] [2015-06-05 18:07:46,545]  INFO
>>>> {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher}
>>>> -  Publishing member created event*:
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *Eventually, these VMs get terminated :*
>>>>
>>>>
>>>>
>>>> *TID: [0] [STRATOS] [2015-06-05 18:42:42,413] ERROR
>>>> {org.apache.stratos.cloud.controller.services.impl.CloudControllerServiceImpl}
>>>> -  Could not terminate instance: [member-id]
>>>> g-sc-G12-1.c1-0x0.c1.domaindd9c1d40-70cc-4950-9757-418afe19ba7f*
>>>>
>>>> *org.apache.stratos.cloud.controller.exception.InvalidMemberException:
>>>> Could not terminate instance, member context not found: [member-id]
>>>> g-sc-G12-1.c1-0x0.c1.domaindd9c1d40-70cc-4950-9757-418afe19ba7f*
>>>>
>>>> *                    at
>>>> org.apache.stratos.cloud.controller.services.impl.CloudControllerServiceImpl.terminateInstance(CloudControllerServiceImpl.java:595)*
>>>>
>>>> *                    at
>>>> sun.reflect.GeneratedMethodAccessor408.invoke(Unknown Source)*
>>>>
>>>> *                    at
>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)*
>>>>
>>>> *                    at
>>>> java.lang.reflect.Method.invoke(Method.java:606)*
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *but the application remains:*
>>>>
>>>>
>>>>
>>>> stratos> list-applications
>>>>
>>>> Applications found:
>>>>
>>>> +----------------+------------+----------+
>>>>
>>>> | Application ID | Alias      | Status   |
>>>>
>>>> +----------------+------------+----------+
>>>>
>>>> | g-sc-G12-1     | g-sc-G12-1 | Deployed |
>>>>
>>>> +----------------+------------+----------+
>>>>
>>>>
>>>>
>>>> ['g-sc-G12-1: applicationInstances 1, groupInstances 2,
>>>> clusterInstances 3, members 0 ()\n']
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *From:* Martin Eppel (meppel)
>>>> *Sent:* Friday, June 05, 2015 10:04 AM
>>>> *To:* dev@stratos.apache.org
>>>> *Subject:* RE: Testing Stratos 4.1: Application undeployment:
>>>> application fails to undeploy (nested grouping, group scaling)
>>>>
>>>>
>>>>
>>>> Ok:
>>>>
>>>>
>>>>
>>>> log4j.logger.org.apache.stratos.manager=DEBUG
>>>>
>>>> log4j.logger.org.apache.stratos.autoscaler=DEBUG
>>>>
>>>> log4j.logger.org.apache.stratos.messaging=INFO
>>>>
>>>> log4j.logger.org.apache.stratos.cloud.controller=DEBUG
>>>>
>>>> log4j.logger.org.wso2.andes.client=ERROR
>>>>
>>>> # Autoscaler rule logs
>>>>
>>>> log4j.logger.org.apache.stratos.autoscaler.rule.RuleLog=DEBUG
>>>>
>>>>
>>>>
>>>> *From:* Udara Liyanage [mailto:udara@wso2.com <ud...@wso2.com>]
>>>> *Sent:* Friday, June 05, 2015 10:00 AM
>>>> *To:* dev
>>>> *Subject:* Re: Testing Stratos 4.1: Application undeployment:
>>>> application fails to undeploy (nested grouping, group scaling)
>>>>
>>>>
>>>>
>>>> Hi Martin,
>>>>
>>>>
>>>>
>>>> Better if you can enable debug logs for AS, CC and the cartridge agent
>>>>
>>>>
>>>>
>>>> On Fri, Jun 5, 2015 at 10:23 PM, Udara Liyanage <ud...@wso2.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>>
>>>>
>>>> Please enable AS debug logs.
>>>>
>>>>
>>>>
>>>> On Fri, Jun 5, 2015 at 9:38 PM, Martin Eppel (meppel) <me...@cisco.com>
>>>> wrote:
>>>>
>>>> Hi Udara,
>>>>
>>>>
>>>>
>>>> Yes, this issue seems to be fairly well reproducible, which debug log
>>>> do you want me to enable, cartridge agent logs ?
>>>>
>>>>
>>>>
>>>> Thanks
>>>>
>>>>
>>>>
>>>> Martin
>>>>
>>>>
>>>>
>>>> *From:* Udara Liyanage [mailto:udara@wso2.com]
>>>> *Sent:* Thursday, June 04, 2015 11:11 PM
>>>> *To:* dev
>>>> *Subject:* Re: Testing Stratos 4.1: Application undeployment:
>>>> application fails to undeploy (nested grouping, group scaling)
>>>>
>>>>
>>>>
>>>> Hi,
>>>>
>>>>
>>>>
>>>> This might be possible if AS did not receive the member activated event
>>>> published by CC. Is it possible to enable debug logs if this is
>>>> reproducible?
>>>>
>>>> Or else I can add INFO logs and commit.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Jun 5, 2015 at 9:11 AM, Udara Liyanage <ud...@wso2.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> For the first issue you have mentioned, the particular member is
>>>> activated, but it is still identified as an obsolete member and is being
>>>> marked to be terminated since pending time expired. Does that mean member
>>>> is still in Obsolete list even though it is being activated?
>>>>
>>>>
>>>>
>>>> //member started
>>>>
>>>> TID: [0] [STRATOS] [2015-06-04 19:53:04,706]  INFO
>>>> {org.apache.stratos.autoscaler.context.cluster.ClusterContext} -  Member
>>>> stat context has been added: [application] g-sc-G12-1 [cluster]
>>>> g-sc-G12-1.c1-0x0.c1.domain [clusterInstanceContext] g-sc-G12-1-1
>>>> [partitionContext] whole-region [member-id]
>>>> g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
>>>>
>>>>
>>>>
>>>> //member activated
>>>>
>>>> TID: [0] [STRATOS] [2015-06-04 19:56:00,907]  INFO
>>>> {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher}
>>>> -  Publishing member activated event: [service-name] c1 [cluster-id]
>>>> g-sc-G12-1.c1-0x0.c1.domain [cluster-instance-id] g-sc-G12-1-1 [member-id]
>>>> g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
>>>> [network-partition-id] RegionOne [partition-id] whole-region
>>>>
>>>> TID: [0] [STRATOS] [2015-06-04 19:56:00,916]  INFO
>>>> {org.apache.stratos.messaging.message.processor.topology.MemberActivatedMessageProcessor}
>>>> -  Member activated: [service] c1 [cluster] g-sc-G12-1.c1-0x0.c1.domain
>>>> [member] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
>>>>
>>>>
>>>>
>>>> //after 15 minutes ---member is still in pending state, pending timeout
>>>> expired
>>>>
>>>> TID: [0] [STRATOS] [2015-06-04 20:08:04,713]  INFO
>>>> {org.apache.stratos.autoscaler.context.partition.ClusterLevelPartitionContext$PendingMemberWatcher}
>>>> -  Pending state of member expired, member will be moved to obsolete list.
>>>> [pending member]
>>>> g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1 [expiry
>>>> time] 900000 [cluster] g-sc-G12-1.c1-0x0.c1.domain [cluster instance] null
>>>>
>>>>
>>>>
>>>> On Fri, Jun 5, 2015 at 5:14 AM, Martin Eppel (meppel) <me...@cisco.com>
>>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>>
>>>>
>>>> I am running into a scenario where application un-deployment fails
>>>> (using stratos with latest commit
>>>>  b1b6bca3f99b6127da24c9af0a6b20faff2907be).
>>>>
>>>>
>>>>
>>>> For application structure see [1.], (debug enabled) wso2carbon.log,
>>>> application.json, cartridge-group.json, deployment-policy, auto-scaling
>>>> policies see attached zip file.
>>>>
>>>>
>>>>
>>>> *It is noteworthy, that while the application is running the following
>>>> log statements /exceptions are observed:*
>>>>
>>>>
>>>>
>>>> *…*
>>>>
>>>> *Member is in the wrong list and it is removed from active members
>>>> list: g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1*
>>>>
>>>> *…*
>>>>
>>>> *TID: [0] [STRATOS] [2015-06-04 20:11:03,425] ERROR
>>>> {org.apache.stratos.autoscaler.rule.RuleTasksDelegator} -  Cannot terminate
>>>> instance*
>>>>
>>>> *…*
>>>>
>>>> *// **after receiving the application undeploy event:*
>>>>
>>>> *[2015-06-04 20:12:39,465]  INFO
>>>> {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -
>>>> Application undeployment process started: [application-id] g-sc-G12-1*
>>>>
>>>> *// **a new instance is being started up*
>>>>
>>>> *…*
>>>>
>>>> *[2015-06-04 20:13:13,445]  INFO
>>>> {org.apache.stratos.cloud.controller.services.impl.InstanceCreator} -
>>>> Instance started successfully: [cartridge-type] c2 [cluster-id]
>>>> g-sc-G12-1.c2-1x0.c2.domain [instance-id]
>>>> RegionOne/5d4699f7-b00b-42eb-b565-b48fc8f20407*
>>>>
>>>>
>>>>
>>>> *// Also noteworthy seems the following warning which is seen
>>>> repeatedly in the logs:*
>>>>
>>>> *ReadWriteLock} -  System warning! Trying to release a lock which has
>>>> not been taken by the same thread: [lock-name]*
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> [1.] Application structure
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>>
>>>> Udara Liyanage
>>>>
>>>> Software Engineer
>>>>
>>>> WSO2, Inc.: http://wso2.com
>>>>
>>>> lean. enterprise. middleware
>>>>
>>>> web: http://udaraliyanage.wordpress.com
>>>>
>>>> phone: +94 71 443 6897
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Imesh Gunaratne
>>>>
>>>>
>>>>
>>>> Senior Technical Lead, WSO2
>>>>
>>>> Committer & PMC Member, Apache Stratos
>>>>
>>>
>>>
>>>
>>> --
>>> *Lasindu Charith*
>>> Software Engineer, WSO2 Inc.
>>> Mobile: +94714427192
>>> Web: blog.lasindu.com
>>>
>>
>>
>> Thanks,
>> --
>> *Lasindu Charith*
>> Software Engineer, WSO2 Inc.
>> Mobile: +94714427192
>> Web: blog.lasindu.com
>>
>
>
>
> --
> Imesh Gunaratne
>
> Senior Technical Lead, WSO2
> Committer & PMC Member, Apache Stratos
>



-- 
--
Lahiru Sandaruwan
Committer and PMC member, Apache Stratos,
Senior Software Engineer,
WSO2 Inc., http://wso2.com
lean.enterprise.middleware

phone: +94773325954
email: lahirus@wso2.com blog: http://lahiruwrites.blogspot.com/
linked-in: http://lk.linkedin.com/pub/lahiru-sandaruwan/16/153/146

Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by Imesh Gunaratne <im...@apache.org>.
Do we know why this only happens if a member was forcefully terminated and
auto-healed?

On Wed, Jun 10, 2015 at 10:01 PM, Lasindu Charith <la...@wso2.com> wrote:

> Hi  all,
>
> Cause for above issue seems to be as follows.
> GroupStatusTerminatedProcessor and GroupStatusTerminatingProcessor
> process the event only if all the group instances and cluster instances
> are in the terminated state or in the terminating state, respectively
> [1][2]. But there can be situations (such as the one above) where some
> group instances are in the terminated state and some are in the
> terminating state by the time GroupStatusProcessorChain is executed. In
> such scenarios, both the GroupStatusTerminatedProcessor and the
> GroupStatusTerminatingProcessor executions are skipped, and
> GroupStatusInactiveProcessor prints the "No possible state change found"
> warning.
>
> I think we need to find a way to properly fix this.
>
> [1]
> https://github.com/apache/stratos/blob/master/components/org.apache.stratos.autoscaler/src/main/java/org/apache/stratos/autoscaler/status/processor/group/GroupStatusTerminatedProcessor.java#L91
> [2]
> https://github.com/apache/stratos/blob/master/components/org.apache.stratos.autoscaler/src/main/java/org/apache/stratos/autoscaler/status/processor/group/GroupStatusTerminatingProcessor.java#L89
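
To make the fall-through described above concrete, here is a minimal Java sketch. It is not the Stratos implementation: the class, enum and method names are hypothetical, and relaxedChain is only one possible way to treat a mix of terminated and terminating children, shown as an assumption rather than as the agreed fix.

```java
import java.util.List;

// Hypothetical sketch; none of these names are taken from the Stratos code base.
final class GroupStatusSketch {

    enum State { ACTIVE, INACTIVE, TERMINATING, TERMINATED }

    // Strict rule as described in the thread: every child must be in exactly the wanted state.
    static boolean allIn(List<State> children, State wanted) {
        for (State s : children) {
            if (s != wanted) {
                return false;
            }
        }
        return true;
    }

    // Chain as described: Terminated check first, then Terminating check, then fall through.
    static String strictChain(List<State> children) {
        if (allIn(children, State.TERMINATED)) {
            return "TERMINATED";
        }
        if (allIn(children, State.TERMINATING)) {
            return "TERMINATING";
        }
        // Corresponds to the "No possible state change found" warning.
        return "NO_STATE_CHANGE";
    }

    // Assumed relaxation: a mix of TERMINATED and TERMINATING children counts as TERMINATING.
    static String relaxedChain(List<State> children) {
        if (allIn(children, State.TERMINATED)) {
            return "TERMINATED";
        }
        boolean anyTerminating = false;
        for (State s : children) {
            if (s == State.TERMINATING) {
                anyTerminating = true;
            } else if (s != State.TERMINATED) {
                // A child that is still active or inactive blocks the transition.
                return "NO_STATE_CHANGE";
            }
        }
        return anyTerminating ? "TERMINATING" : "NO_STATE_CHANGE";
    }
}
```

With one child TERMINATED and another TERMINATING, strictChain falls through to NO_STATE_CHANGE, which matches the warning in the logs, while relaxedChain would report TERMINATING and let a later pass complete the transition once the remaining child terminates.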
>
> On Tue, Jun 9, 2015 at 8:09 PM, Lasindu Charith <la...@wso2.com> wrote:
>
>> Hi Martin,
>>
>> I was able to reproduce this issue in the latest build with PCA in
>> Openstack. Even after stratos is restarted, the Application is not
>> undeployed, which makes it impossible to undeploy the application (even the
>> forceful undeployment failed for the above obsolete application).
>>
>> Currently I'm looking at possible causes for this and will update with
>> the progress.
>>
>> Thanks,
>>
>> On Tue, Jun 9, 2015 at 5:59 AM, Martin Eppel (meppel) <me...@cisco.com>
>> wrote:
>>
>>>  Here is another example where the removal fails:
>>>
>>>
>>>
>>> For application see [1.], log file (with debug enabled) and jsons are
>>> attached.
>>>
>>>
>>>
>>> Scenario:
>>>
>>>
>>>
>>> ·        Deploy application and wait for all cartridges to become active
>>>
>>> ·        Kill a VM (2nd in startup sequence)
>>>
>>> ·        Wait for it to restart and become active
>>>
>>> ·        Un-deploy application
>>>
>>> a.      Un-deploy forcefully will succeed
>>> ([2015-06-08 20:38:21,487]  INFO
>>> {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -
>>> Forcefully un-deploying the application s-g-c1-c2-c3-s)
>>>
>>> b.      Un-deploy gracefully will fail to remove app completely
>>> (although VMs are terminated successfully)
>>> ([2015-06-08 20:54:16,372]  INFO
>>> {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -
>>> Starting to undeploy application: [application-id])
>>>
>>> ·        Both scenarios are recorded in the same log file
>>> wso2carbon-s-g-c1-c2-c3-s.log
>>>
>>> ·        Btw, I retested the scenario and the issue is easily
>>> reproducible following the steps listed above:
>>> graceful application un-deploy succeeds if no VM had been restarted
>>> (terminated and restarted by autoscaler).
>>> Once a VM is terminated, graceful application un-deploy will fail.
>>> I attached a log file which demonstrates this case
>>> (wso2carbon-s-g-c1-c2-c3-s-scen-2.log). In this scenario, the same
>>> application is deployed, becomes active and is then removed (repeated 2
>>> times), then a VM is terminated and restarted by autoscaler. Afterwards,
>>> graceful application un-deploy fails.
>>>
>>>
>>>
>>>
>>>
>>> Other Observations:
>>>
>>>
>>>
>>> When the application is removed successfully, some events, e.g. “cluster
>>> removed event” and “Application deleted event received:”, are published
>>> (see [2.]), while when the application fails to be removed no such event
>>> is observed.
>>>
>>>
>>>
>>> [2.] cluster removed event when application is un-deployed forcefully
>>>
>>> TID: [0] [STRATOS] [2015-06-08 20:38:34,187]  INFO
>>> {org.apache.stratos.cloud.controller.messaging.receiver.application.ApplicationEventReceiver}
>>> -  Application deleted event received: [application-id] s-g-c1-c2-c3-s
>>>
>>> TID: [0] [STRATOS] [2015-06-08 20:38:34,220]  INFO
>>> {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher}
>>> -  Publishing application clusters removed event: [application-id]
>>> s-g-c1-c2-c3-s
>>>
>>>
>>>
>>>
>>>
>>> I analyzed the differences in the successful application removal and
>>> unsuccessful log sequence and noticed a difference (see also highlighted
>>> areas):
>>>
>>>
>>>
>>> Successful (see logs in wso2carbon-s-g-c1-c2-c3-s-scen-2.log)
>>>
>>>
>>>
>>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG
>>> {org.apache.stratos.autoscaler.status.processor.group.*
>>> *GroupStatusTerminatedProcessor**} -  StatusChecker calculating the
>>> terminated status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [
>>> s-g-c1-c2-c3-s-1 ]*
>>>
>>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG
>>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>>> lock acquired*
>>>
>>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,527]  INFO
>>> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor}
>>> -  Sending group instance terminated for [group] s-g-c1-c2-c3-s-x0x
>>> [instance] s-g-c1-c2-c3-s-1*
>>>
>>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG
>>> {org.apache.stratos.autoscaler.applications.topic.ApplicationBuilder} -
>>> Handling group terminated event: [group-id] s-g-c1-c2-c3-s-x0x
>>> [application-id] s-g-c1-c2-c3-s [instance] s-g-c1-c2-c3-s-1*
>>>
>>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG
>>> {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -
>>> ClusterMonitor Drools session has been disposed. ClusterMonitor
>>> [clusterId=s-g-c1-c2-c3-s.c3-0x0.c3.domain, hasPrimary=false ]*
>>>
>>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG
>>> {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -
>>> ClusterMonitor Drools session has been disposed. ClusterMonitor
>>> [clusterId=s-g-c1-c2-c3-s.c1-0x0.c1.domain, hasPrimary=false ]*
>>>
>>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG
>>> {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -
>>> ClusterMonitor Drools session has been disposed. ClusterMonitor
>>> [clusterId=s-g-c1-c2-c3-s.c2-0x0.c2.domain, hasPrimary=false ]*
>>>
>>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,529] DEBUG
>>> {org.apache.stratos.cloud.controller.messaging.topology.TopologyManager} -
>>> Write lock released*
>>>
>>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,536] DEBUG
>>> {org.apache.stratos.autoscaler.registry.RegistryManager} -  Application [
>>> s-g-c1-c2-c3-s ] persisted successfully in the Autoscaler Registry*
>>>
>>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,538] DEBUG
>>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -
>>> Applications updated:
>>> {"applicationMap":{"s-g-c1-c2-c3-s":{"id":"s-g-c1-c2-c3-s","key":"l8V7OpRqOfBcWbBw","tenantId":-1234,"tenantDomain":"carbon.super","tenantAdminUserName":"admin","applicationPolicyId":"default-iaas","aliasToGroupMap":{"s-g-c1-c2-c3-s-x0x":{"name":"s-g-c1-c2-c3-s","alias":"s-g-c1-c2-c3-s-x0x","groupMinInstances":1,"groupMaxInstances":1,"applicationId":"s-g-c1-c2-c3-s","aliasToGroupMap":{},"aliasToClusterDataMap":{"c2-0x0":{"serviceType":"c2","clusterId":"s-g-c1-c2-c3-s.c2-0x0.c2.domain","minInstances":1,"maxInstances":1},"c3-0x0":{"serviceType":"c3","clusterId":"s-g-c1-c2-c3-s.c3-0x0.c3.domain","minInstances":1,"maxInstances":1},"c1-0x0":{"serviceType":"c1","clusterId":"s-g-c1-c2-c3-s.c1-0x0.c1.domain","minInstances":1,"maxInstances":1}},"typeToClusterDataMap":{"c1":{"serviceType":"c1","clusterId":"s-g-c1-c2-c3-s.c1-0x0.c1.domain","minInstances":1,"maxInstances":1},"c2":{"serviceType":"c2","clusterId":"s-g-c1-c2-c3-s.c2-0x0.c2.domain","minInstances":1,"maxInstances":1},"c3":{"serviceType":"c3","clusterId":"s-g-c1-c2-c3-s.c3-0x0.c3.domain","minInstances":1,"maxInstances":1}},"instanceIdToInstanceContextMap":{},"dependencyOrder":{"startupOrders":[{"startupOrderComponentList":["cartridge.c3-0x0","cartridge.c2-0x0"]},{"startupOrderComponentList":["cartridge.c2-0x0","cartridge.c1-0x0"]}],"terminationBehaviour":"terminate-none"},"isGroupScalingEnabled":false,"isGroupInstanceMonitoringEnabled":false,"instanceIdSequence":{"value":0}}},"aliasToClusterDataMap":{},"typeToClusterDataMap":{},"aliasToDeploymentPolicyIdMap":{"c3-0x0":"static-1","c2-0x0":"static-1","c1-0x0":"static-1"},"instanceIdToInstanceContextMap":{"s-g-c1-c2-c3-s-1":{"alias":"s-g-c1-c2-c3-s","instanceId":"s-g-c1-c2-c3-s-1","instanceProperties":{},"lifeCycleStateManager":{"stateStack":["Created","Active","Terminating"],"identifier":"s-g-c1-c2-c3-s_s-g-c1-c2-c3-s-1"},"networkPartitionId":"RegionOne"}},"dependencyOrder":{"startupOrders":[]},"isGroupScalingEnabled":false,"isGroupInstanceMonitoringEnabled":false,
"instanceIdSequence":{"value":1}}},"initialized":false}*
>>>
>>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,539]  INFO
>>> {org.apache.stratos.autoscaler.applications.topic.ApplicationsEventPublisher}
>>> -  Publishing group instance terminated event: [application] s-g-c1-c2-c3-s
>>> [group] s-g-c1-c2-c3-s-x0x [instance] s-g-c1-c2-c3-s-1*
>>>
>>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,545]  INFO
>>> {org.apache.stratos.autoscaler.monitor.component.GroupMonitor} -  [Group]
>>> s-g-c1-c2-c3-s-x0x is notifying the [parent] s-g-c1-c2-c3-s [instance]
>>> s-g-c1-c2-c3-s-1*
>>>
>>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,545] DEBUG
>>> {org.apache.stratos.autoscaler.status.processor.group.*
>>> *GroupStatusActiveProcessor}** -  GroupProcessor chain calculating the
>>> status for the group [ s-g-c1-c2-c3-s ]*
>>>
>>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG
>>> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusActiveProcessor}
>>> -  StatusChecker calculating the active status for the group [
>>> s-g-c1-c2-c3-s ]  for the instance  [ s-g-c1-c2-c3-s-1 ]*
>>>
>>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG
>>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>>> lock acquired*
>>>
>>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG
>>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>>> lock released*
>>>
>>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG
>>> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor}
>>> -  StatusChecker calculating the terminated status for the group [
>>> s-g-c1-c2-c3-s ]  for the instance  [ s-g-c1-c2-c3-s-1 ]*
>>>
>>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG
>>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>>> lock acquired*
>>>
>>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,546]  INFO
>>> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor}
>>> -  Sending application instance terminated for [application] s-g-c1-c2-c3-s
>>> [instance] s-g-c1-c2-c3-s-1*
>>>
>>>
>>>
>>> Unsuccessful:
>>>
>>>
>>>
>>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,404] DEBUG
>>> {org.apache.stratos.autoscaler.status.processor.group.*
>>> *GroupStatusTerminatedProcessor**} -  StatusChecker calculating the
>>> terminated status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [
>>> s-g-c1-c2-c3-s-1 ]*
>>>
>>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,404] DEBUG
>>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>>> lock acquired*
>>>
>>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,405] DEBUG
>>> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusProcessor}
>>> -  Checking the status of cluster s-g-c1-c2-c3-s.c1-0x0.c1.domain instance
>>> status is: Terminating*
>>>
>>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
>>> {org.apache.stratos.cloud.controller.messaging.topology.TopologyManager} -
>>> Write lock released*
>>>
>>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
>>> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusProcessor}
>>> -  Checking the status of cluster s-g-c1-c2-c3-s.c1-0x0.c1.domain instance
>>> status is: Terminating*
>>>
>>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
>>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>>> lock released*
>>>
>>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
>>> {org.apache.stratos.autoscaler.status.processor.group.*
>>> *GroupStatusTerminatingProcessor**} -  StatusChecker calculating the
>>> terminating status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance
>>> [ s-g-c1-c2-c3-s-1 ]*
>>>
>>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
>>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>>> lock acquired*
>>>
>>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
>>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>>> lock released*
>>>
>>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
>>> {org.apache.stratos.autoscaler.status.processor.group.*
>>> *GroupStatusInactiveProcessor**} -  StatusChecker calculating the
>>> inactive status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [
>>> s-g-c1-c2-c3-s-1 ]*
>>>
>>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG
>>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>>> lock acquired*
>>>
>>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG
>>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>>> lock released*
>>>
>>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,407]  WARN
>>> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusInactiveProcessor}
>>> -  **No possible state change found for* *[component]
>>> s-g-c1-c2-c3-s-x0x [instance] s-g-c1-c2-c3-s-1*
>>>
>>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG
>>> {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -
>>> ClusterMonitor Drools session has been disposed. ClusterMonitor
>>> [clusterId=s-g-c1-c2-c3-s.c2-0x0.c2.domain, hasPrimary=false ]*
>>>
>>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,481] ERROR
>>> {org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor} -  System
>>> error, lock has not released for 30 seconds: [lock-name] application
>>> [lock-type] Write [thread-id] 99 [thread-name] pool-26-thread-2
>>> [stack-trace] *
>>>
>>> *java.lang.Thread.getStackTrace(Thread.java:1589)*
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> [1.] Application Structure
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> *From:* Martin Eppel (meppel)
>>> *Sent:* Friday, June 05, 2015 4:38 PM
>>>
>>> *To:* dev@stratos.apache.org
>>> *Subject:* RE: Testing Stratos 4.1: Application undeployment:
>>> application fails to undeploy (nested grouping, group scaling)
>>>
>>>
>>>
>>> This is another application, see [1.] which fails to get completely
>>> removed:
>>>
>>>
>>>
>>> Scenario / Observation:
>>>
>>> ·        After all instances / application go active, one instance is
>>> being terminated (to verify termination behavior). Once the terminated
>>> instance is restored the application is undeployed.
>>>
>>> ·        After the Application undeployment process is started, all
>>> instances are being terminated
>>>
>>> ·        Application still shows up in stratos admin, subsequent
>>> deployments fail
>>>
>>>
>>>
>>> stratos> list-applications
>>>
>>> Applications found:
>>>
>>> +---------------------+---------------------+----------+
>>>
>>> | Application ID      | Alias               | Status   |
>>>
>>> +---------------------+---------------------+----------+
>>>
>>> | s-n-gr-s-G123-t-a-4 | s-n-gr-s-G123-t-a-4 | Deployed |
>>>
>>> +---------------------+---------------------+----------+
>>>
>>>
>>>
>>>
>>>
>>> [1.] Application:
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> *From:* Martin Eppel (meppel)
>>> *Sent:* Friday, June 05, 2015 3:26 PM
>>> *To:* dev@stratos.apache.org
>>> *Subject:* RE: Testing Stratos 4.1: Application undeployment:
>>> application fails to undeploy (nested grouping, group scaling)
>>>
>>>
>>>
>>> After re-running it, these are my observations:
>>>
>>>
>>>
>>> ·        After the “Application undeployment process started” is
>>> started, there is a likelihood that (a few) VMs are still launched – I
>>> suspect this is due to some race condition between “Application
>>> undeployment process started” and the “autoscaler”.
>>>
>>> ·        All Vms which were launched before the “Application
>>> undeployment process started” get terminated as part of the undeployment
>>> process.
>>>
>>> ·        Vms which were launched after “Application undeployment
>>> process started” eventually get moved to obsolete / pending state and
>>> cleaned up, this can take up to 15- 20 minutes.
>>>
>>> ·        The application never gets completely removed.
>>>
>>> ·        The following exception is consistently observed:
>>>
>>> TID: [0] [STRATOS] [2015-06-05 20:47:07,237]  WARN
>>> {org.apache.stratos.common.concurrent.locks.ReadWriteLock} -  System
>>> warning! Trying to release a lock which has not been taken by the same
>>> thread: [lock-name] application-holder [thread-id] 114 [thread-name]
>>> pool-24-thread-2
>>>
>>> *TID: [0] [STRATOS] [2015-06-05 20:47:07,237] ERROR
>>> {org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator}
>>> -  Failed to retrieve topology event message*
>>>
>>> *org.apache.stratos.common.exception.InvalidLockRequestedException:
>>> System error, cannot acquire a write lock while having a read lock on the
>>> same thread: [lock-name] application-holder [thread-id] 114 [thread-name]
>>> pool-24-thread-2*
>>>
>>> *                    at
>>> org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:114)*
>>>
>>> *                    at
>>> org.apache.stratos.autoscaler.applications.ApplicationHolder.acquireWriteLock(ApplicationHolder.java:60)*
>>>
>>> ·        Initiating the “Application undeployment process” again will
>>> cause the following INFO statement (without any further actions, see in log)
>>> TID: [0] [STRATOS] [2015-06-05 21:34:34,509]  INFO
>>> {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -
>>> Application monitor is already in terminating, graceful un-deployment is
>>> has already been attempted thus not invoking again
>>>
>>> ·        Other exceptions observed after the “Application undeployment
>>> process started”
>>>
>>> TID: [0] [STRATOS] [2015-06-05 21:36:29,458] ERROR
>>> {org.apache.stratos.autoscaler.rule.RuleTasksDelegator} -  Cannot terminate
>>> instance
>>>
>>> org.apache.stratos.cloud.controller.stub.CloudControllerServiceInvalidMemberExceptionException:
>>> CloudControllerServiceInvalidMemberExceptionException
>>>
>>>         at
>>> sun.reflect.GeneratedConstructorAccessor219.newInstance(Unknown Source)
>>>
>>>         at
>>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>>
>>>         at
>>> java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>>>
>>>         at java.lang.Class.newInstance(Class.java:374)
>>>
>>>         at
>>> org.apache.stratos.cloud.controller.stub.CloudControllerServiceStub.terminateInstance(CloudControllerServiceStub.java:8633)
>>>
>>>         at
>>> org.apache.stratos.common.client.CloudControllerServiceClient.terminateInstance(CloudControllerServiceClient.java:120)
>>>
>>>         at
>>> org.apache.stratos.autoscaler.rule.RuleTasksDelegator.terminateObsoleteInstance(RuleTasksDelegator.java:298)
>>>
>>>         at sun.reflect.GeneratedMethodAccessor413.invoke(Unknown Source)
>>>
>>>
>>>
>>> ·        Created a jira to track this issue:
>>> https://issues.apache.org/jira/browse/STRATOS-1430
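
On the InvalidLockRequestedException in the list above ("cannot acquire a write lock while having a read lock on the same thread"): the plain JDK ReentrantReadWriteLock does not let a thread upgrade from the read lock to the write lock either. The sketch below uses only java.util.concurrent, not the Stratos ReadWriteLock class, whose internals are not shown in this thread; the class and method names are made up for illustration.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class; it only exercises the standard JDK lock.
final class LockUpgradeDemo {

    // Tries to take the write lock while this thread still holds the read lock.
    static boolean tryUpgrade(ReentrantReadWriteLock lock) {
        lock.readLock().lock();
        try {
            // tryLock() returns immediately instead of blocking forever.
            boolean acquired = lock.writeLock().tryLock();
            if (acquired) {
                lock.writeLock().unlock();
            }
            return acquired;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Releases the read lock first, then asks for the write lock.
    static boolean releaseThenWrite(ReentrantReadWriteLock lock) {
        lock.readLock().lock();
        lock.readLock().unlock();
        boolean acquired = lock.writeLock().tryLock();
        if (acquired) {
            lock.writeLock().unlock();
        }
        return acquired;
    }

    public static void main(String[] args) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        System.out.println("upgrade while holding read lock: " + tryUpgrade(lock));
        System.out.println("write after releasing read lock: " + releaseThenWrite(lock));
    }
}
```

If the code path that handles the topology event reads the application under a read lock and then calls into something that needs the write lock, releasing the read lock before the update (as in releaseThenWrite) is the usual way out. Whether that is what happens on pool-24-thread-2 here still needs to be confirmed from the full stack trace.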
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> Regards
>>>
>>>
>>>
>>> Martin
>>>
>>>
>>>
>>> Attached the log file of the last test
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> *From:* Martin Eppel (meppel)
>>> *Sent:* Friday, June 05, 2015 12:59 PM
>>> *To:* dev@stratos.apache.org
>>> *Subject:* RE: Testing Stratos 4.1: Application undeployment:
>>> application fails to undeploy (nested grouping, group scaling)
>>>
>>>
>>>
>>> For this latest test I got the latest source from stratos repo so I have
>>> this commit (see below), but the un-deployment still fails (to some extent).
>>>
>>> As mentioned below, it seems that all the members get terminated
>>> eventually, including the ones which got started after the “application
>>> un-deployment” process started.
>>>
>>> What is still left in stratos (even after all members got terminated) is
>>> the application (see the stratos> list-applications command result below in
>>> email thread). This would still be an issue when re-deploying the
>>> application !
>>>
>>> I will do a few reruns to verify the removal of the VMs (members) is
>>> consistent.
>>>
>>> Thanks
>>>
>>>
>>>
>>> Martin
>>>
>>>
>>>
>>> git show 2fe84b91843b20e91e8cafd06011f42d218f231c
>>>
>>> commit 2fe84b91843b20e91e8cafd06011f42d218f231c
>>>
>>> Author: anuruddhal <an...@gmail.com>
>>>
>>> Date:   Wed Jun 3 14:41:12 2015 +0530
>>>
>>>
>>>
>>> *From:* Imesh Gunaratne [mailto:imesh@apache.org <im...@apache.org>]
>>> *Sent:* Friday, June 05, 2015 12:46 PM
>>> *To:* dev
>>> *Subject:* Re: Testing Stratos 4.1: Application undeployment:
>>> application fails to undeploy (nested grouping, group scaling)
>>>
>>>
>>>
>>> Hi Martin,
>>>
>>>
>>>
>>> I also encountered a similar issue with the application un-deployment
>>> with PCA but I guess you are using JCA.
>>>
>>>
>>>
>>> I can see that Anuruddha has done a fix for the issue I'm referring to with
>>> the below commit:
>>>
>>>
>>> https://github.com/apache/stratos/commit/2fe84b91843b20e91e8cafd06011f42d218f231c
>>>
>>>
>>>
>>> Regarding the member context not found error, this could occur if the
>>> termination request was made for an already terminated member. There is a
>>> possibility that Autoscaler makes a second terminate request if the first
>>> request takes some time to execute, and by the time the second request hits
>>> Cloud Controller the member has already been terminated by the first request.
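
The double-terminate case described above can be sketched as follows. This is a hypothetical illustration, not the Cloud Controller code: MemberRegistrySketch and its methods are invented names, and treating a repeated request as a no-op instead of an error is an assumption about one possible design.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry of member contexts, keyed by member id.
final class MemberRegistrySketch {

    private final Map<String, String> memberContexts = new ConcurrentHashMap<>();

    void register(String memberId, String instanceId) {
        memberContexts.put(memberId, instanceId);
    }

    // Returns true if this call removed the member, false if it was already gone.
    // A second request for the same member is reported, not thrown as an exception.
    boolean terminate(String memberId) {
        // remove() is atomic, so two concurrent requests cannot both see the context.
        return memberContexts.remove(memberId) != null;
    }
}
```

With this shape the first terminate call for a member returns true and a late duplicate returns false, so the caller can log it at a lower level instead of raising an error like the "member context not found" one.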
>>>
>>>
>>>
>>> Can you please confirm whether the members were properly terminated and
>>> it's just these exceptions that you are seeing?
>>>
>>>
>>>
>>> Thanks
>>>
>>>
>>>
>>>
>>>
>>> On Sat, Jun 6, 2015 at 12:36 AM, Martin Eppel (meppel) <me...@cisco.com>
>>> wrote:
>>>
>>> Hi Udara,
>>>
>>>
>>>
>>> Picked up your commit and reran the test case:
>>>
>>>
>>>
>>> Attached is the log file (artifacts are the same as before).
>>>
>>>
>>>
>>> *Didn’t see the issue with* “*Member is in the wrong list” …*
>>>
>>>
>>>
>>> but see the following exception after the undeploy application message:
>>>
>>> *TID: [0] [STRATOS] [2015-06-05 18:09:46,836] ERROR
>>> {org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator}
>>> -  Failed to retrieve topology event message*
>>>
>>> *org.apache.stratos.common.exception.InvalidLockRequestedException:
>>> System error, cannot acquire a write lock while having a read lock on the
>>> same thread: [lock-name] application-holder [thread-id] 114 [thread-name]
>>> pool-24-thread-2*
>>>
>>> *                    at
>>> org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:114)*
>>>
>>> *                    at
>>> org.apache.stratos.autoscaler.applications.ApplicationHolder.acquireWriteLock(ApplicationHolder.java:60)*
>>>
>>>
>>>
>>>
>>>
>>> *Also, after the “Application undeployment process started” is started,
>>> new members are being instantiated:*
>>>
>>>
>>>
>>> *TID: [0] [STRATOS] [2015-06-05 18:07:46,545]  INFO
>>> {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher}
>>> -  Publishing member created event*:
>>>
>>>
>>>
>>>
>>>
>>> *Eventually, these VMs get terminated :*
>>>
>>>
>>>
>>> *TID: [0] [STRATOS] [2015-06-05 18:42:42,413] ERROR
>>> {org.apache.stratos.cloud.controller.services.impl.CloudControllerServiceImpl}
>>> -  Could not terminate instance: [member-id]
>>> g-sc-G12-1.c1-0x0.c1.domaindd9c1d40-70cc-4950-9757-418afe19ba7f*
>>>
>>> *org.apache.stratos.cloud.controller.exception.InvalidMemberException:
>>> Could not terminate instance, member context not found: [member-id]
>>> g-sc-G12-1.c1-0x0.c1.domaindd9c1d40-70cc-4950-9757-418afe19ba7f*
>>>
>>> *                    at
>>> org.apache.stratos.cloud.controller.services.impl.CloudControllerServiceImpl.terminateInstance(CloudControllerServiceImpl.java:595)*
>>>
>>> *                    at
>>> sun.reflect.GeneratedMethodAccessor408.invoke(Unknown Source)*
>>>
>>> *                    at
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)*
>>>
>>> *                    at java.lang.reflect.Method.invoke(Method.java:606)*
>>>
>>>
>>>
>>>
>>>
>>> *but the application remains:*
>>>
>>>
>>>
>>> stratos> list-applications
>>>
>>> Applications found:
>>>
>>> +----------------+------------+----------+
>>>
>>> | Application ID | Alias      | Status   |
>>>
>>> +----------------+------------+----------+
>>>
>>> | g-sc-G12-1     | g-sc-G12-1 | Deployed |
>>>
>>> +----------------+------------+----------+
>>>
>>>
>>>
>>> ['g-sc-G12-1: applicationInstances 1, groupInstances 2, clusterInstances
>>> 3, members 0 ()\n']
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> *From:* Martin Eppel (meppel)
>>> *Sent:* Friday, June 05, 2015 10:04 AM
>>> *To:* dev@stratos.apache.org
>>> *Subject:* RE: Testing Stratos 4.1: Application undeployment:
>>> application fails to undeploy (nested grouping, group scaling)
>>>
>>>
>>>
>>> Ok:
>>>
>>>
>>>
>>> log4j.logger.org.apache.stratos.manager=DEBUG
>>>
>>> log4j.logger.org.apache.stratos.autoscaler=DEBUG
>>>
>>> log4j.logger.org.apache.stratos.messaging=INFO
>>>
>>> log4j.logger.org.apache.stratos.cloud.controller=DEBUG
>>>
>>> log4j.logger.org.wso2.andes.client=ERROR
>>>
>>> # Autoscaler rule logs
>>>
>>> log4j.logger.org.apache.stratos.autoscaler.rule.RuleLog=DEBUG
>>>
>>>
>>>
>>> *From:* Udara Liyanage [mailto:udara@wso2.com]
>>> *Sent:* Friday, June 05, 2015 10:00 AM
>>> *To:* dev
>>> *Subject:* Re: Testing Stratos 4.1: Application undeployment:
>>> application fails to undeploy (nested grouping, group scaling)
>>>
>>>
>>>
>>> Hi Martin,
>>>
>>>
>>>
>>> It would be better if you could enable debug logs for AS, CC and the
>>> cartridge agent.
>>>
>>>
>>>
>>> On Fri, Jun 5, 2015 at 10:23 PM, Udara Liyanage <ud...@wso2.com> wrote:
>>>
>>> Hi,
>>>
>>>
>>>
>>> Please enable AS debug logs.
>>>
>>>
>>>
>>> On Fri, Jun 5, 2015 at 9:38 PM, Martin Eppel (meppel) <me...@cisco.com>
>>> wrote:
>>>
>>> Hi Udara,
>>>
>>>
>>>
>>> Yes, this issue seems to be fairly well reproducible. Which debug logs do
>>> you want me to enable, the cartridge agent logs?
>>>
>>>
>>>
>>> Thanks
>>>
>>>
>>>
>>> Martin
>>>
>>>
>>>
>>> *From:* Udara Liyanage [mailto:udara@wso2.com]
>>> *Sent:* Thursday, June 04, 2015 11:11 PM
>>> *To:* dev
>>> *Subject:* Re: Testing Stratos 4.1: Application undeployment:
>>> application fails to undeploy (nested grouping, group scaling)
>>>
>>>
>>>
>>> Hi,
>>>
>>>
>>>
>>> This might be possible if AS did not receive the member activated event
>>> published by CC. Is it possible to enable debug logs, if this is
>>> reproducible?
>>>
>>> Otherwise I can add INFO logs and commit.
>>>
>>>
>>>
>>>
>>>
>>> On Fri, Jun 5, 2015 at 9:11 AM, Udara Liyanage <ud...@wso2.com> wrote:
>>>
>>> Hi,
>>>
>>>
>>>
>>>
>>>
>>> For the first issue you have mentioned, the particular member is
>>> activated, but it is still identified as an obsolete member and is being
>>> marked to be terminated since the pending time expired. Does that mean the
>>> member is still in the obsolete list even though it has been activated?
>>>
>>>
>>>
>>> //member started
>>>
>>> TID: [0] [STRATOS] [2015-06-04 19:53:04,706]  INFO
>>> {org.apache.stratos.autoscaler.context.cluster.ClusterContext} -  Member
>>> stat context has been added: [application] g-sc-G12-1 [cluster]
>>> g-sc-G12-1.c1-0x0.c1.domain [clusterInstanceContext] g-sc-G12-1-1
>>> [partitionContext] whole-region [member-id]
>>> g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
>>>
>>>
>>>
>>> //member activated
>>>
>>> TID: [0] [STRATOS] [2015-06-04 19:56:00,907]  INFO
>>> {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher}
>>> -  Publishing member activated event: [service-name] c1 [cluster-id]
>>> g-sc-G12-1.c1-0x0.c1.domain [cluster-instance-id] g-sc-G12-1-1 [member-id]
>>> g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
>>> [network-partition-id] RegionOne [partition-id] whole-region
>>>
>>> TID: [0] [STRATOS] [2015-06-04 19:56:00,916]  INFO
>>> {org.apache.stratos.messaging.message.processor.topology.MemberActivatedMessageProcessor}
>>> -  Member activated: [service] c1 [cluster] g-sc-G12-1.c1-0x0.c1.domain
>>> [member] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
>>>
>>>
>>>
>>> //after 15 minutes ---member is still in pending state, pending timeout
>>> expired
>>>
>>> TID: [0] [STRATOS] [2015-06-04 20:08:04,713]  INFO
>>> {org.apache.stratos.autoscaler.context.partition.ClusterLevelPartitionContext$PendingMemberWatcher}
>>> -  Pending state of member expired, member will be moved to obsolete list.
>>> [pending member]
>>> g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1 [expiry
>>> time] 900000 [cluster] g-sc-G12-1.c1-0x0.c1.domain [cluster instance] null
>>>
>>>
>>>
>>> On Fri, Jun 5, 2015 at 5:14 AM, Martin Eppel (meppel) <me...@cisco.com>
>>> wrote:
>>>
>>> Hi,
>>>
>>>
>>>
>>> I am running into a scenario where application un-deployment fails
>>> (using stratos with latest commit
>>>  b1b6bca3f99b6127da24c9af0a6b20faff2907be).
>>>
>>>
>>>
>>> For application structure see [1.], (debug enabled) wso2carbon.log,
>>> application.json, cartridge-group.json, deployment-policy, auto-scaling
>>> policies see attached zip file.
>>>
>>>
>>>
>>> *It is noteworthy, that while the application is running the following
>>> log statements /exceptions are observed:*
>>>
>>>
>>>
>>> *…*
>>>
>>> *Member is in the wrong list and it is removed from active members list:
>>> g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1*
>>>
>>> *…*
>>>
>>> *TID: [0] [STRATOS] [2015-06-04 20:11:03,425] ERROR
>>> {org.apache.stratos.autoscaler.rule.RuleTasksDelegator} -  Cannot terminate
>>> instance*
>>>
>>> *…*
>>>
>>> *// **after receiving the application undeploy event:*
>>>
>>> *[2015-06-04 20:12:39,465]  INFO
>>> {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -
>>> Application undeployment process started: [application-id] g-sc-G12-1*
>>>
>>> *// **a new instance is being started up*
>>>
>>> *…*
>>>
>>> *[2015-06-04 20:13:13,445]  INFO
>>> {org.apache.stratos.cloud.controller.services.impl.InstanceCreator} -
>>> Instance started successfully: [cartridge-type] c2 [cluster-id]
>>> g-sc-G12-1.c2-1x0.c2.domain [instance-id]
>>> RegionOne/5d4699f7-b00b-42eb-b565-b48fc8f20407*
>>>
>>>
>>>
>>> *// Also noteworthy seems the following warning which is seen repeatedly
>>> in the logs:*
>>>
>>> *ReadWriteLock} -  System warning! Trying to release a lock which has
>>> not been taken by the same thread: [lock-name]*
>>>
>>>
>>>
>>>
>>>
>>> [1.] Application structure
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>>
>>> Udara Liyanage
>>>
>>> Software Engineer
>>>
>>> WSO2, Inc.: http://wso2.com
>>>
>>> lean. enterprise. middleware
>>>
>>> web: http://udaraliyanage.wordpress.com
>>>
>>> phone: +94 71 443 6897
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Imesh Gunaratne
>>>
>>>
>>>
>>> Senior Technical Lead, WSO2
>>>
>>> Committer & PMC Member, Apache Stratos
>>>
>>
>>
>>
>> --
>> *Lasindu Charith*
>> Software Engineer, WSO2 Inc.
>> Mobile: +94714427192
>> Web: blog.lasindu.com
>>
>
>
> Thanks,
> --
> *Lasindu Charith*
> Software Engineer, WSO2 Inc.
> Mobile: +94714427192
> Web: blog.lasindu.com
>



-- 
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos

RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by "Martin Eppel (meppel)" <me...@cisco.com>.
Good finding,

Thanks

Martin

From: Lasindu Charith [mailto:lasindu@wso2.com]
Sent: Wednesday, June 10, 2015 9:32 AM
To: dev; Reka Thirunavukkarasu
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi  all,

The cause of the above issue seems to be as follows.
GroupStatusTerminatedProcessor and GroupStatusTerminatingProcessor process the event only if all the group instances and cluster instances are in the terminated state or in the terminating state, respectively [1][2]. But there can be situations (such as the one above) where some group instances are in the terminated state and some are in the terminating state by the time GroupStatusProcessorChain is executed. In such scenarios, both the GroupStatusTerminatedProcessor and GroupStatusTerminatingProcessor executions are skipped, and GroupStatusInactiveProcessor prints the "No possible state change found" warning.

I think we need to find a way to properly fix this.

[1] https://github.com/apache/stratos/blob/master/components/org.apache.stratos.autoscaler/src/main/java/org/apache/stratos/autoscaler/status/processor/group/GroupStatusTerminatedProcessor.java#L91
[2] https://github.com/apache/stratos/blob/master/components/org.apache.stratos.autoscaler/src/main/java/org/apache/stratos/autoscaler/status/processor/group/GroupStatusTerminatingProcessor.java#L89
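
The gap described above can be illustrated with a small, self-contained sketch. This is not the Stratos code: the class, enum, and method names are hypothetical, and the "relaxed" variant is only one conceivable direction for a fix, not the change that was eventually committed.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the status chain; names do not match the Stratos classes.
class GroupStatusChainSketch {
    enum Status { ACTIVE, TERMINATING, TERMINATED }

    // Strict chain: each check requires ALL children to be in exactly one state.
    static String strictChain(List<Status> children) {
        if (children.stream().allMatch(s -> s == Status.TERMINATED)) {
            return "TERMINATED";
        }
        if (children.stream().allMatch(s -> s == Status.TERMINATING)) {
            return "TERMINATING";
        }
        // A mix of TERMINATED and TERMINATING children falls through to here.
        return "NO_STATE_CHANGE";
    }

    // Assumed relaxation: a mix of TERMINATING and TERMINATED children still
    // counts as "terminating", so the group does not get stuck.
    static String relaxedChain(List<Status> children) {
        if (children.stream().allMatch(s -> s == Status.TERMINATED)) {
            return "TERMINATED";
        }
        boolean noneAlive = children.stream()
                .allMatch(s -> s == Status.TERMINATED || s == Status.TERMINATING);
        return noneAlive ? "TERMINATING" : "NO_STATE_CHANGE";
    }

    public static void main(String[] args) {
        List<Status> mixed = Arrays.asList(
                Status.TERMINATED, Status.TERMINATING, Status.TERMINATED);
        System.out.println("strict:  " + strictChain(mixed));
        System.out.println("relaxed: " + relaxedChain(mixed));
    }
}
```

With a mixed list, the strict chain reports no state change, which matches the warning seen in the logs, while the relaxed variant still reports the group as terminating.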

On Tue, Jun 9, 2015 at 8:09 PM, Lasindu Charith <la...@wso2.com> wrote:
Hi Martin,

I was able to reproduce this issue in the latest build with PCA in Openstack. Even after stratos is restarted, the Application is not undeployed, which makes it impossible to undeploy the application (even the forceful undeployment failed for the above obsolete application).

Currently I'm looking at possible causes for this and will update with the progress.

Thanks,

On Tue, Jun 9, 2015 at 5:59 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Here is another example where the removal fails:

For application see [1.], log file (with debug enabled) and jsons are attached.

Scenario:


•        Deploy application and wait for all cartridges to become active

•        Kill a VM (2nd in startup sequence)

•        Wait for it to restart and become active

•        Un-deploy application

a.      Un-deploy forcefully will succeed
([2015-06-08 20:38:21,487]  INFO {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Forcefully un-deploying the application s-g-c1-c2-c3-s)

b.      Un-deploy gracefully will fail to remove app completely (although VMs are terminated successfully)
([2015-06-08 20:54:16,372]  INFO {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Starting to undeploy application: [application-id])

•        Both scenarios are recorded in the same log file wso2carbon-s-g-c1-c2-c3-s.log

•        Btw, I retested the scenario and the issue is easily reproducible following the steps listed above:
graceful application un-deploy succeeds if no VM has been restarted (terminated and restarted by the autoscaler).
Once a VM has been terminated and restarted, graceful application un-deploy fails.
I attached a log file which demonstrates this case (wso2carbon-s-g-c1-c2-c3-s-scen-2.log). In this scenario, the same application is deployed, becomes active and is then removed (repeated 2 times); then a VM is terminated and restarted by the autoscaler. Afterwards, graceful application un-deploy fails.


Other Observations:

When the application is removed successfully, some events, e.g. “cluster removed event” and “Application deleted event received:”, are published (see [2.]), whereas when the application fails to be removed no such events are observed.

[2.] cluster removed event when application is un-deployed forcefully
TID: [0] [STRATOS] [2015-06-08 20:38:34,187]  INFO {org.apache.stratos.cloud.controller.messaging.receiver.application.ApplicationEventReceiver} -  Application deleted event received: [application-id] s-g-c1-c2-c3-s
TID: [0] [STRATOS] [2015-06-08 20:38:34,220]  INFO {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher} -  Publishing application clusters removed event: [application-id] s-g-c1-c2-c3-s


I compared the log sequences of the successful and the unsuccessful application removal and noticed a difference (see also highlighted areas):

Successful (see logs in wso2carbon-s-g-c1-c2-c3-s-scen-2.log)

TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor} -  StatusChecker calculating the terminated status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [ s-g-c1-c2-c3-s-1 ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock acquired
TID: [0] [STRATOS] [2015-06-08 22:18:41,527]  INFO {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor} -  Sending group instance terminated for [group] s-g-c1-c2-c3-s-x0x [instance] s-g-c1-c2-c3-s-1
TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG {org.apache.stratos.autoscaler.applications.topic.ApplicationBuilder} -  Handling group terminated event: [group-id] s-g-c1-c2-c3-s-x0x [application-id] s-g-c1-c2-c3-s [instance] s-g-c1-c2-c3-s-1
TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -  ClusterMonitor Drools session has been disposed. ClusterMonitor [clusterId=s-g-c1-c2-c3-s.c3-0x0.c3.domain, hasPrimary=false ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -  ClusterMonitor Drools session has been disposed. ClusterMonitor [clusterId=s-g-c1-c2-c3-s.c1-0x0.c1.domain, hasPrimary=false ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -  ClusterMonitor Drools session has been disposed. ClusterMonitor [clusterId=s-g-c1-c2-c3-s.c2-0x0.c2.domain, hasPrimary=false ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,529] DEBUG {org.apache.stratos.cloud.controller.messaging.topology.TopologyManager} -  Write lock released
TID: [0] [STRATOS] [2015-06-08 22:18:41,536] DEBUG {org.apache.stratos.autoscaler.registry.RegistryManager} -  Application [ s-g-c1-c2-c3-s ] persisted successfully in the Autoscaler Registry
TID: [0] [STRATOS] [2015-06-08 22:18:41,538] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Applications updated: {"applicationMap":{"s-g-c1-c2-c3-s":{"id":"s-g-c1-c2-c3-s","key":"l8V7OpRqOfBcWbBw","tenantId":-1234,"tenantDomain":"carbon.super","tenantAdminUserName":"admin","applicationPolicyId":"default-iaas","aliasToGroupMap":{"s-g-c1-c2-c3-s-x0x":{"name":"s-g-c1-c2-c3-s","alias":"s-g-c1-c2-c3-s-x0x","groupMinInstances":1,"groupMaxInstances":1,"applicationId":"s-g-c1-c2-c3-s","aliasToGroupMap":{},"aliasToClusterDataMap":{"c2-0x0":{"serviceType":"c2","clusterId":"s-g-c1-c2-c3-s.c2-0x0.c2.domain","minInstances":1,"maxInstances":1},"c3-0x0":{"serviceType":"c3","clusterId":"s-g-c1-c2-c3-s.c3-0x0.c3.domain","minInstances":1,"maxInstances":1},"c1-0x0":{"serviceType":"c1","clusterId":"s-g-c1-c2-c3-s.c1-0x0.c1.domain","minInstances":1,"maxInstances":1}},"typeToClusterDataMap":{"c1":{"serviceType":"c1","clusterId":"s-g-c1-c2-c3-s.c1-0x0.c1.domain","minInstances":1,"maxInstances":1},"c2":{"serviceType":"c2","clusterId":"s-g-c1-c2-c3-s.c2-0x0.c2.domain","minInstances":1,"maxInstances":1},"c3":{"serviceType":"c3","clusterId":"s-g-c1-c2-c3-s.c3-0x0.c3.domain","minInstances":1,"maxInstances":1}},"instanceIdToInstanceContextMap":{},"dependencyOrder":{"startupOrders":[{"startupOrderComponentList":["cartridge.c3-0x0","cartridge.c2-0x0"]},{"startupOrderComponentList":["cartridge.c2-0x0","cartridge.c1-0x0"]}],"terminationBehaviour":"terminate-none"},"isGroupScalingEnabled":false,"isGroupInstanceMonitoringEnabled":false,"instanceIdSequence":{"value":0}}},"aliasToClusterDataMap":{},"typeToClusterDataMap":{},"aliasToDeploymentPolicyIdMap":{"c3-0x0":"static-1","c2-0x0":"static-1","c1-0x0":"static-1"},"instanceIdToInstanceContextMap":{"s-g-c1-c2-c3-s-1":{"alias":"s-g-c1-c2-c3-s","instanceId":"s-g-c1-c2-c3-s-1","instanceProperties":{},"lifeCycleStateManager":{"stateStack":["Created","Active","Terminating"],"identifier":"s-g-c1-c2-c3-s_s-g-c1-c2-c3-s-1"},"networkPartitionId":"RegionOne"}},"dependencyOrder":{"startupOrders":[]},"isGroupScalingEnabled":false,"isGroupInstanceMonitoringEnabled":false,"instanceIdSequence":{"value":1}}},"initialized":false}
TID: [0] [STRATOS] [2015-06-08 22:18:41,539]  INFO {org.apache.stratos.autoscaler.applications.topic.ApplicationsEventPublisher} -  Publishing group instance terminated event: [application] s-g-c1-c2-c3-s [group] s-g-c1-c2-c3-s-x0x [instance] s-g-c1-c2-c3-s-1
TID: [0] [STRATOS] [2015-06-08 22:18:41,545]  INFO {org.apache.stratos.autoscaler.monitor.component.GroupMonitor} -  [Group] s-g-c1-c2-c3-s-x0x is notifying the [parent] s-g-c1-c2-c3-s [instance] s-g-c1-c2-c3-s-1
TID: [0] [STRATOS] [2015-06-08 22:18:41,545] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusActiveProcessor} -  GroupProcessor chain calculating the status for the group [ s-g-c1-c2-c3-s ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusActiveProcessor} -  StatusChecker calculating the active status for the group [ s-g-c1-c2-c3-s ]  for the instance  [ s-g-c1-c2-c3-s-1 ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock acquired
TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock released
TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor} -  StatusChecker calculating the terminated status for the group [ s-g-c1-c2-c3-s ]  for the instance  [ s-g-c1-c2-c3-s-1 ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock acquired
TID: [0] [STRATOS] [2015-06-08 22:18:41,546]  INFO {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor} -  Sending application instance terminated for [application] s-g-c1-c2-c3-s [instance] s-g-c1-c2-c3-s-1

Unsuccessful:

TID: [0] [STRATOS] [2015-06-08 22:33:25,404] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor} -  StatusChecker calculating the terminated status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [ s-g-c1-c2-c3-s-1 ]
TID: [0] [STRATOS] [2015-06-08 22:33:25,404] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock acquired
TID: [0] [STRATOS] [2015-06-08 22:33:25,405] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusProcessor} -  Checking the status of cluster s-g-c1-c2-c3-s.c1-0x0.c1.domain instance status is: Terminating
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.cloud.controller.messaging.topology.TopologyManager} -  Write lock released
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusProcessor} -  Checking the status of cluster s-g-c1-c2-c3-s.c1-0x0.c1.domain instance status is: Terminating
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock released
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatingProcessor} -  StatusChecker calculating the terminating status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [ s-g-c1-c2-c3-s-1 ]
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock acquired
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock released
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusInactiveProcessor} -  StatusChecker calculating the inactive status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [ s-g-c1-c2-c3-s-1 ]
TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock acquired
TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock released
TID: [0] [STRATOS] [2015-06-08 22:33:25,407]  WARN {org.apache.stratos.autoscaler.status.processor.group.GroupStatusInactiveProcessor} -  No possible state change found for [component] s-g-c1-c2-c3-s-x0x [instance] s-g-c1-c2-c3-s-1
TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -  ClusterMonitor Drools session has been disposed. ClusterMonitor [clusterId=s-g-c1-c2-c3-s.c2-0x0.c2.domain, hasPrimary=false ]
TID: [0] [STRATOS] [2015-06-08 22:33:25,481] ERROR {org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor} -  System error, lock has not released for 30 seconds: [lock-name] application [lock-type] Write [thread-id] 99 [thread-name] pool-26-thread-2 [stack-trace]
java.lang.Thread.getStackTrace(Thread.java:1589)




[1.] Application Structure
[cid:image001.png@01D0A364.89B8A160]






From: Martin Eppel (meppel)
Sent: Friday, June 05, 2015 4:38 PM
To: dev@stratos.apache.org
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

This is another application, see [1.] which fails to get completely removed:

Scenario / Observation:

•        After all instances / application go active, one instance is being terminated (to verify termination behavior). Once the terminated instance is restored the application is undeployed.

•        After the Application undeployment process is started, all instances are being terminated

•        Application still shows up in stratos admin, subsequent deployments fail

stratos> list-applications
Applications found:
+---------------------+---------------------+----------+
| Application ID      | Alias               | Status   |
+---------------------+---------------------+----------+
| s-n-gr-s-G123-t-a-4 | s-n-gr-s-G123-t-a-4 | Deployed |
+---------------------+---------------------+----------+


[1.] Application:

[cid:image002.png@01D0A364.89B8A160]




From: Martin Eppel (meppel)
Sent: Friday, June 05, 2015 3:26 PM
To: dev@stratos.apache.org
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

After re-running it, these are my observations:


•        After “Application undeployment process started” is logged, there is a likelihood that (a few) VMs are still launched. I suspect this is due to a race condition between the application undeployment process and the autoscaler.

•        All VMs which were launched before “Application undeployment process started” get terminated as part of the undeployment process.

•        VMs which were launched after “Application undeployment process started” eventually get moved to obsolete / pending state and cleaned up; this can take up to 15-20 minutes.

•        The application never gets completely removed.

•        The following exception is consistently observed:

TID: [0] [STRATOS] [2015-06-05 20:47:07,237]  WARN {org.apache.stratos.common.concurrent.locks.ReadWriteLock} -  System warning! Trying to release a lock which has not been taken by the same thread: [lock-name] application-holder [thread-id] 114 [thread-name] pool-24-thread-2

TID: [0] [STRATOS] [2015-06-05 20:47:07,237] ERROR {org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator} -  Failed to retrieve topology event message

org.apache.stratos.common.exception.InvalidLockRequestedException: System error, cannot acquire a write lock while having a read lock on the same thread: [lock-name] application-holder [thread-id] 114 [thread-name] pool-24-thread-2

                    at org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:114)

                    at org.apache.stratos.autoscaler.applications.ApplicationHolder.acquireWriteLock(ApplicationHolder.java:60)

•        Initiating the “Application undeployment process” again produces the following INFO statement (with no further action; see the log):
TID: [0] [STRATOS] [2015-06-05 21:34:34,509]  INFO {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Application monitor is already in terminating, graceful un-deployment is has already been attempted thus not invoking again

•        Other exceptions observed after the “Application undeployment process started”
TID: [0] [STRATOS] [2015-06-05 21:36:29,458] ERROR {org.apache.stratos.autoscaler.rule.RuleTasksDelegator} -  Cannot terminate instance
org.apache.stratos.cloud.controller.stub.CloudControllerServiceInvalidMemberExceptionException: CloudControllerServiceInvalidMemberExceptionException
        at sun.reflect.GeneratedConstructorAccessor219.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at java.lang.Class.newInstance(Class.java:374)
        at org.apache.stratos.cloud.controller.stub.CloudControllerServiceStub.terminateInstance(CloudControllerServiceStub.java:8633)
        at org.apache.stratos.common.client.CloudControllerServiceClient.terminateInstance(CloudControllerServiceClient.java:120)
        at org.apache.stratos.autoscaler.rule.RuleTasksDelegator.terminateObsoleteInstance(RuleTasksDelegator.java:298)
        at sun.reflect.GeneratedMethodAccessor413.invoke(Unknown Source)


•        Created a jira to track this issue: https://issues.apache.org/jira/browse/STRATOS-1430
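
The race suspected in the first observation above (VMs launched after undeployment has started) can be sketched as follows. This is only an illustration under assumptions: the class and method names are hypothetical and do not correspond to the Stratos autoscaler code. The point is that the "is the application terminating?" check and the launch have to happen under the same lock, otherwise a scale-up decision taken just before undeployment can still start a VM afterwards.

```java
// Hypothetical guard; not part of Stratos.
class UndeployGuardSketch {
    private boolean terminating = false;
    private int launched = 0;

    // Called when the application undeployment process starts.
    synchronized void startUndeployment() {
        terminating = true;
    }

    // Called by the (hypothetical) scaling rule; the flag check and the
    // launch are one atomic step, so no launch can slip in after undeployment.
    synchronized boolean requestScaleUp() {
        if (terminating) {
            return false; // refuse to launch while the application is being removed
        }
        launched++;
        return true;
    }

    synchronized int launchedCount() {
        return launched;
    }

    public static void main(String[] args) {
        UndeployGuardSketch guard = new UndeployGuardSketch();
        System.out.println("before undeploy: " + guard.requestScaleUp());
        guard.startUndeployment();
        System.out.println("after undeploy:  " + guard.requestScaleUp());
        System.out.println("launched:        " + guard.launchedCount());
    }
}
```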







Regards



Martin



Attached the log file of the last test







From: Martin Eppel (meppel)
Sent: Friday, June 05, 2015 12:59 PM
To: dev@stratos.apache.org
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

For this latest test I got the latest source from the Stratos repo, so I have this commit (see below), but the un-deployment still fails (to some extent).
As mentioned below, it seems that all the members get terminated eventually, including the ones which were started after the “application un-deployment” process began.
What is still left in Stratos (even after all members were terminated) is the application (see the stratos> list-applications command result below in the email thread). This would still be an issue when re-deploying the application!
I will do a few reruns to verify that the removal of the VMs (members) is consistent.
Thanks

Martin

git show 2fe84b91843b20e91e8cafd06011f42d218f231c
commit 2fe84b91843b20e91e8cafd06011f42d218f231c
Author: anuruddhal <an...@gmail.com>
Date:   Wed Jun 3 14:41:12 2015 +0530

From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: Friday, June 05, 2015 12:46 PM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,

I also encountered a similar issue with the application un-deployment with PCA but I guess you are using JCA.

I can see that Anuruddha has done a fix for the issue I'm referring to with the commit below:
https://github.com/apache/stratos/commit/2fe84b91843b20e91e8cafd06011f42d218f231c

Regarding the "member context not found" error, this could occur if the termination request was made for an already terminated member. It is possible that the Autoscaler makes a second terminate request if the first request takes some time to execute, and by the time the second request hits the Cloud Controller the member has already been terminated by the first.
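
A minimal sketch of the duplicate-request situation, assuming a simple in-memory map of member contexts. The class and method names are hypothetical and are not the Cloud Controller API; the sketch only shows that a second terminate call for the same member can be reported as "already terminated" instead of being raised as an error.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the member contexts kept by the Cloud Controller.
class IdempotentTerminateSketch {
    enum Result { TERMINATED, ALREADY_TERMINATED }

    private final Map<String, String> memberContexts = new ConcurrentHashMap<>();

    void register(String memberId, String instanceId) {
        memberContexts.put(memberId, instanceId);
    }

    Result terminate(String memberId) {
        // remove() is atomic, so only the first caller gets the context back.
        String instanceId = memberContexts.remove(memberId);
        if (instanceId == null) {
            // Second (late) request: the context is gone, nothing left to do.
            return Result.ALREADY_TERMINATED;
        }
        // A real implementation would ask the IaaS to destroy instanceId here.
        return Result.TERMINATED;
    }

    public static void main(String[] args) {
        IdempotentTerminateSketch cc = new IdempotentTerminateSketch();
        cc.register("member-1", "RegionOne/instance-1");
        System.out.println(cc.terminate("member-1")); // first request
        System.out.println(cc.terminate("member-1")); // duplicate request
    }
}
```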

Can you please confirm whether the members were properly terminated and it is just these exceptions that you are seeing?

Thanks


On Sat, Jun 6, 2015 at 12:36 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi Udara,

Picked up your commit and reran the test case:

Attached is the log file (artifacts are the same as before).

Didn’t see the issue with “Member is in the wrong list” …

but see the following exception after the undeploy application message:
TID: [0] [STRATOS] [2015-06-05 18:09:46,836] ERROR {org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator} -  Failed to retrieve topology event message
org.apache.stratos.common.exception.InvalidLockRequestedException: System error, cannot acquire a write lock while having a read lock on the same thread: [lock-name] application-holder [thread-id] 114 [thread-name] pool-24-thread-2
                    at org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:114)
                    at org.apache.stratos.autoscaler.applications.ApplicationHolder.acquireWriteLock(ApplicationHolder.java:60)
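
The restriction behind this exception (no write lock while the same thread still holds a read lock) also exists in the JDK's own java.util.concurrent.locks.ReentrantReadWriteLock, which does not support upgrading a read lock to a write lock. The sketch below uses the JDK lock only to illustrate the rule; it is not the Stratos ReadWriteLock class, and the helper names are made up for the example.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

class LockUpgradeSketch {
    // Tries to take the write lock while this thread still holds the read lock.
    static boolean writeWhileHoldingRead() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.readLock().lock();
        try {
            // tryLock() returns immediately instead of blocking forever.
            boolean acquired = lock.writeLock().tryLock();
            if (acquired) {
                lock.writeLock().unlock();
            }
            return acquired;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Releases the read lock first, then takes the write lock.
    static boolean writeAfterReleasingRead() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.readLock().lock();
        lock.readLock().unlock();
        boolean acquired = lock.writeLock().tryLock();
        if (acquired) {
            lock.writeLock().unlock();
        }
        return acquired;
    }

    public static void main(String[] args) {
        System.out.println("write while holding read: " + writeWhileHoldingRead());
        System.out.println("write after releasing read: " + writeAfterReleasingRead());
    }
}
```

The first call reports false and the second true. A code path that reads the application holder and then needs to modify it therefore has to release the read lock before requesting the write lock.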


Also, after “Application undeployment process started” is logged, new members are still being instantiated:

TID: [0] [STRATOS] [2015-06-05 18:07:46,545]  INFO {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher} -  Publishing member created event:


Eventually, these VMs get terminated :

TID: [0] [STRATOS] [2015-06-05 18:42:42,413] ERROR {org.apache.stratos.cloud.controller.services.impl.CloudControllerServiceImpl} -  Could not terminate instance: [member-id] g-sc-G12-1.c1-0x0.c1.domaindd9c1d40-70cc-4950-9757-418afe19ba7f
org.apache.stratos.cloud.controller.exception.InvalidMemberException: Could not terminate instance, member context not found: [member-id] g-sc-G12-1.c1-0x0.c1.domaindd9c1d40-70cc-4950-9757-418afe19ba7f
                    at org.apache.stratos.cloud.controller.services.impl.CloudControllerServiceImpl.terminateInstance(CloudControllerServiceImpl.java:595)
                    at sun.reflect.GeneratedMethodAccessor408.invoke(Unknown Source)
                    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                    at java.lang.reflect.Method.invoke(Method.java:606)


but the application remains:

stratos> list-applications
Applications found:
+----------------+------------+----------+
| Application ID | Alias      | Status   |
+----------------+------------+----------+
| g-sc-G12-1     | g-sc-G12-1 | Deployed |
+----------------+------------+----------+

['g-sc-G12-1: applicationInstances 1, groupInstances 2, clusterInstances 3, members 0 ()\n']



From: Martin Eppel (meppel)
Sent: Friday, June 05, 2015 10:04 AM
To: dev@stratos.apache.org
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Ok:

log4j.logger.org.apache.stratos.manager=DEBUG
log4j.logger.org.apache.stratos.autoscaler=DEBUG
log4j.logger.org.apache.stratos.messaging=INFO
log4j.logger.org.apache.stratos.cloud.controller=DEBUG
log4j.logger.org.wso2.andes.client=ERROR
# Autoscaler rule logs
log4j.logger.org.apache.stratos.autoscaler.rule.RuleLog=DEBUG

From: Udara Liyanage [mailto:udara@wso2.com]
Sent: Friday, June 05, 2015 10:00 AM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,

Better if you can enable debug logs for all of AS, CC and the cartridge agent.

On Fri, Jun 5, 2015 at 10:23 PM, Udara Liyanage <ud...@wso2.com> wrote:
Hi,

Please enable AS debug logs.

On Fri, Jun 5, 2015 at 9:38 PM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi Udara,

Yes, this issue seems to be fairly reproducible. Which debug logs do you want me to enable, the cartridge agent logs?

Thanks

Martin

From: Udara Liyanage [mailto:udara@wso2.com]
Sent: Thursday, June 04, 2015 11:11 PM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi,

This might be possible if AS did not receive the member activated event published by CC. Is it possible to enable debug logs, if this is reproducible?
Or else I can add INFO logs and commit.


On Fri, Jun 5, 2015 at 9:11 AM, Udara Liyanage <ud...@wso2.com> wrote:
Hi,


For the first issue you have mentioned, the particular member is activated, but it is still identified as an obsolete member and is being marked for termination since the pending time expired. Does that mean the member is still in the obsolete list even though it has been activated?

//member started
TID: [0] [STRATOS] [2015-06-04 19:53:04,706]  INFO {org.apache.stratos.autoscaler.context.cluster.ClusterContext} -  Member stat context has been added: [application] g-sc-G12-1 [cluster] g-sc-G12-1.c1-0x0.c1.domain [clusterInstanceContext] g-sc-G12-1-1 [partitionContext] whole-region [member-id] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1

//member activated
TID: [0] [STRATOS] [2015-06-04 19:56:00,907]  INFO {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher} -  Publishing member activated event: [service-name] c1 [cluster-id] g-sc-G12-1.c1-0x0.c1.domain [cluster-instance-id] g-sc-G12-1-1 [member-id] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1 [network-partition-id] RegionOne [partition-id] whole-region
TID: [0] [STRATOS] [2015-06-04 19:56:00,916]  INFO {org.apache.stratos.messaging.message.processor.topology.MemberActivatedMessageProcessor} -  Member activated: [service] c1 [cluster] g-sc-G12-1.c1-0x0.c1.domain [member] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1

//after 15 minutes ---member is still in pending state, pending timeout expired
TID: [0] [STRATOS] [2015-06-04 20:08:04,713]  INFO {org.apache.stratos.autoscaler.context.partition.ClusterLevelPartitionContext$PendingMemberWatcher} -  Pending state of member expired, member will be moved to obsolete list. [pending member] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1 [expiry time] 900000 [cluster] g-sc-G12-1.c1-0x0.c1.domain [cluster instance] null
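
To make the timeline above concrete: the member was added at 19:53:04 and the pending state expired at 20:08:04, i.e. after the 900000 ms expiry time. A simplified sketch of such pending/active/obsolete bookkeeping follows; the class and method names are hypothetical and this is not the ClusterLevelPartitionContext code. It only shows that a member whose activation is never recorded on the Autoscaler side ends up in the obsolete list.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Hypothetical model of pending-member expiry; not the Stratos implementation.
final class PendingMemberSketch {
    private final long expiryMillis;
    private final Map<String, Long> pendingSince = new HashMap<>();
    private final Set<String> active = new HashSet<>();
    private final Set<String> obsolete = new HashSet<>();

    PendingMemberSketch(long expiryMillis) {
        this.expiryMillis = expiryMillis;
    }

    // A newly created member starts in the pending list.
    void memberCreated(String memberId, long nowMillis) {
        pendingSince.put(memberId, nowMillis);
    }

    // The activated event moves a pending member to the active list.
    void memberActivated(String memberId) {
        if (pendingSince.remove(memberId) != null) {
            active.add(memberId);
        }
    }

    // Periodic check: pending members older than the expiry time become obsolete.
    void sweep(long nowMillis) {
        Iterator<Map.Entry<String, Long>> it = pendingSince.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() >= expiryMillis) {
                obsolete.add(entry.getKey());
                it.remove();
            }
        }
    }

    boolean isActive(String memberId) {
        return active.contains(memberId);
    }

    boolean isObsolete(String memberId) {
        return obsolete.contains(memberId);
    }
}
```

In this model, calling memberCreated() and then sweep() 900000 ms later without memberActivated() in between marks the member obsolete, which matches the log sequence if AS never processed the activated event.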

On Fri, Jun 5, 2015 at 5:14 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi,

I am running into a scenario where application un-deployment fails (using stratos with latest commit  b1b6bca3f99b6127da24c9af0a6b20faff2907be).

For application structure see [1.], (debug enabled) wso2carbon.log, application.json, cartridge-group.json, deployment-policy, auto-scaling policies see attached zip file.

It is noteworthy that, while the application is running, the following log statements / exceptions are observed:

…
Member is in the wrong list and it is removed from active members list: g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
…
TID: [0] [STRATOS] [2015-06-04 20:11:03,425] ERROR {org.apache.stratos.autoscaler.rule.RuleTasksDelegator} -  Cannot terminate instance
…
// after receiving the application undeploy event:
[2015-06-04 20:12:39,465]  INFO {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Application undeployment process started: [application-id] g-sc-G12-1
// a new instance is being started up
…
[2015-06-04 20:13:13,445]  INFO {org.apache.stratos.cloud.controller.services.impl.InstanceCreator} -  Instance started successfully: [cartridge-type] c2 [cluster-id] g-sc-G12-1.c2-1x0.c2.domain [instance-id] RegionOne/5d4699f7-b00b-42eb-b565-b48fc8f20407

// Also noteworthy seems the following warning which is seen repeatedly in the logs:
ReadWriteLock} -  System warning! Trying to release a lock which has not been taken by the same thread: [lock-name]


[1.] Application structure

[cid:image003.png@01D0A364.89B8A160]







--

Udara Liyanage
Software Engineer
WSO2, Inc.: http://wso2.com
lean. enterprise. middleware
web: http://udaraliyanage.wordpress.com
phone: +94 71 443 6897



--
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos



--
Lasindu Charith
Software Engineer, WSO2 Inc.
Mobile: +94714427192
Web: blog.lasindu.com


Thanks,
--
Lasindu Charith
Software Engineer, WSO2 Inc.
Mobile: +94714427192
Web: blog.lasindu.com

Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by Lasindu Charith <la...@wso2.com>.
Hi  all,

The cause of the above issue seems to be as follows.
GroupStatusTerminatedProcessor and GroupStatusTerminatingProcessor process
the event only if all the group instances and cluster instances are in
terminated state or in terminating state, respectively [1][2]. But there can
be situations (such as the one above) where some group instances are in
terminated state and some in terminating state by the time
GroupStatusProcessorChain is executed. In such scenarios, both
GroupStatusTerminatedProcessor and GroupStatusTerminatingProcessor are
skipped, and GroupStatusInactiveProcessor prints the "No possible state
change found" warning.

I think we need to find a way to properly fix this.
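
To illustrate the gap, here is a simplified model of the chain (a sketch only, with hypothetical names; it is not the code behind [1] and [2], and the Active processor is left out). strictChain mirrors the all-or-nothing checks described above; relaxedChain is just one possible direction, assuming a mix of terminated and terminating children can be treated as terminating.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Simplified, hypothetical model of the group status chain; not the Stratos code.
final class GroupStatusChainSketch {
    enum Status { ACTIVE, INACTIVE, TERMINATING, TERMINATED }

    // All-or-nothing checks: a mix of TERMINATED and TERMINATING matches no processor.
    static Optional<Status> strictChain(List<Status> children) {
        if (children.stream().allMatch(s -> s == Status.TERMINATED)) {
            return Optional.of(Status.TERMINATED);
        }
        if (children.stream().allMatch(s -> s == Status.TERMINATING)) {
            return Optional.of(Status.TERMINATING);
        }
        if (children.stream().anyMatch(s -> s == Status.INACTIVE)) {
            return Optional.of(Status.INACTIVE);
        }
        return Optional.empty(); // corresponds to "No possible state change found"
    }

    // Assumed relaxation: children that are all shutting down, with at least one
    // still terminating, keep the parent in TERMINATING.
    static Optional<Status> relaxedChain(List<Status> children) {
        boolean allShuttingDown = children.stream()
                .allMatch(s -> s == Status.TERMINATING || s == Status.TERMINATED);
        boolean anyTerminating = children.stream().anyMatch(s -> s == Status.TERMINATING);
        if (allShuttingDown && anyTerminating) {
            return Optional.of(Status.TERMINATING);
        }
        return strictChain(children);
    }

    public static void main(String[] args) {
        List<Status> mixed = Arrays.asList(Status.TERMINATED, Status.TERMINATING);
        System.out.println("strict:  " + strictChain(mixed));
        System.out.println("relaxed: " + relaxedChain(mixed));
    }
}
```

With the mixed list, strictChain finds no state change while relaxedChain reports TERMINATING, so the parent would be evaluated again once the remaining child reaches terminated. Whether the chain is re-triggered at that point in Stratos is something I have not verified.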

[1]
https://github.com/apache/stratos/blob/master/components/org.apache.stratos.autoscaler/src/main/java/org/apache/stratos/autoscaler/status/processor/group/GroupStatusTerminatedProcessor.java#L91
[2]
https://github.com/apache/stratos/blob/master/components/org.apache.stratos.autoscaler/src/main/java/org/apache/stratos/autoscaler/status/processor/group/GroupStatusTerminatingProcessor.java#L89

On Tue, Jun 9, 2015 at 8:09 PM, Lasindu Charith <la...@wso2.com> wrote:

> Hi Martin,
>
> I was able to reproduce this issue in the latest build with PCA in
> Openstack. Even after stratos is restarted, the Application is not
> undeployed, which makes it impossible to undeploy the application (even the
> forceful undeployment failed for the above obsolete application).
>
> Currently I'm looking at possible causes for this and will update with the
> progress.
>
> Thanks,
>
> On Tue, Jun 9, 2015 at 5:59 AM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
>>  Here is another example where the removal fails:
>>
>>
>>
>> For application see [1.], log file (with debug enabled) and jsons are
>> attached.
>>
>>
>>
>> Scenario:
>>
>>
>>
>> ·        Deploy application and wait for all cartridges to become active
>>
>> ·        Kill a VM (2nd in startup sequence)
>>
>> ·        Wait for it to restart and become active
>>
>> ·        Un-deploy application
>>
>> a.      Un-deploy forcefully will succeed
>> ([2015-06-08 20:38:21,487]  INFO
>> {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -
>> Forcefully un-deploying the application s-g-c1-c2-c3-s)
>>
>> b.      Un-deploy gracefully will fail to remove app completely
>> (although VMs are terminated successfully)
>> ([2015-06-08 20:54:16,372]  INFO
>> {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -
>> Starting to undeploy application: [application-id])
>>
>> ·        Both scenarios are recorded in the same log file
>> wso2carbon-s-g-c1-c2-c3-s.log
>>
>> ·        Btw, I retested the scenario and the issue is easily
>> reproducible following the steps listed above:
>> graceful application un-deploy succeeds if no VM had been restarted
>> (terminated and restarted by autoscaler).
>> Once a VM has been terminated, graceful application un-deploy will fail.
>> I attached a log file which demonstrates this case
>> (wso2carbon-s-g-c1-c2-c3-s-scen-2.log). In this scenario, the same
>> application is deployed, becomes active and is then removed (repeated 2
>> times); then a VM is terminated and restarted by autoscaler. Afterwards,
>> graceful application un-deploy fails.
>>
>>
>>
>>
>>
>> Other Observations:
>>
>>
>>
>> When the application is removed successfully, some events, e.g. “cluster
>> removed event” and “Application deleted event received:”, are published
>> (see [2.]), while when the application fails to be removed no such event
>> is observed.
>>
>>
>>
>> [2.] cluster removed event when application is un-deployed forcefully
>>
>> TID: [0] [STRATOS] [2015-06-08 20:38:34,187]  INFO
>> {org.apache.stratos.cloud.controller.messaging.receiver.application.ApplicationEventReceiver}
>> -  Application deleted event received: [application-id] s-g-c1-c2-c3-s
>>
>> TID: [0] [STRATOS] [2015-06-08 20:38:34,220]  INFO
>> {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher}
>> -  Publishing application clusters removed event: [application-id]
>> s-g-c1-c2-c3-s
>>
>>
>>
>>
>>
>> I analyzed the differences in the successful application removal and
>> unsuccessful log sequence and noticed a difference (see also highlighted
>> areas):
>>
>>
>>
>> Successful (see logs in wso2carbon-s-g-c1-c2-c3-s-scen-2.log)
>>
>>
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG
>> {org.apache.stratos.autoscaler.status.processor.group.*
>> *GroupStatusTerminatedProcessor**} -  StatusChecker calculating the
>> terminated status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [
>> s-g-c1-c2-c3-s-1 ]*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG
>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>> lock acquired*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,527]  INFO
>> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor}
>> -  Sending group instance terminated for [group] s-g-c1-c2-c3-s-x0x
>> [instance] s-g-c1-c2-c3-s-1*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG
>> {org.apache.stratos.autoscaler.applications.topic.ApplicationBuilder} -
>> Handling group terminated event: [group-id] s-g-c1-c2-c3-s-x0x
>> [application-id] s-g-c1-c2-c3-s [instance] s-g-c1-c2-c3-s-1*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG
>> {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -
>> ClusterMonitor Drools session has been disposed. ClusterMonitor
>> [clusterId=s-g-c1-c2-c3-s.c3-0x0.c3.domain, hasPrimary=false ]*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG
>> {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -
>> ClusterMonitor Drools session has been disposed. ClusterMonitor
>> [clusterId=s-g-c1-c2-c3-s.c1-0x0.c1.domain, hasPrimary=false ]*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG
>> {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -
>> ClusterMonitor Drools session has been disposed. ClusterMonitor
>> [clusterId=s-g-c1-c2-c3-s.c2-0x0.c2.domain, hasPrimary=false ]*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,529] DEBUG
>> {org.apache.stratos.cloud.controller.messaging.topology.TopologyManager} -
>> Write lock released*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,536] DEBUG
>> {org.apache.stratos.autoscaler.registry.RegistryManager} -  Application [
>> s-g-c1-c2-c3-s ] persisted successfully in the Autoscaler Registry*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,538] DEBUG
>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -
>> Applications updated:
>> {"applicationMap":{"s-g-c1-c2-c3-s":{"id":"s-g-c1-c2-c3-s","key":"l8V7OpRqOfBcWbBw","tenantId":-1234,"tenantDomain":"carbon.super","tenantAdminUserName":"admin","applicationPolicyId":"default-iaas","aliasToGroupMap":{"s-g-c1-c2-c3-s-x0x":{"name":"s-g-c1-c2-c3-s","alias":"s-g-c1-c2-c3-s-x0x","groupMinInstances":1,"groupMaxInstances":1,"applicationId":"s-g-c1-c2-c3-s","aliasToGroupMap":{},"aliasToClusterDataMap":{"c2-0x0":{"serviceType":"c2","clusterId":"s-g-c1-c2-c3-s.c2-0x0.c2.domain","minInstances":1,"maxInstances":1},"c3-0x0":{"serviceType":"c3","clusterId":"s-g-c1-c2-c3-s.c3-0x0.c3.domain","minInstances":1,"maxInstances":1},"c1-0x0":{"serviceType":"c1","clusterId":"s-g-c1-c2-c3-s.c1-0x0.c1.domain","minInstances":1,"maxInstances":1}},"typeToClusterDataMap":{"c1":{"serviceType":"c1","clusterId":"s-g-c1-c2-c3-s.c1-0x0.c1.domain","minInstances":1,"maxInstances":1},"c2":{"serviceType":"c2","clusterId":"s-g-c1-c2-c3-s.c2-0x0.c2.domain","minInstances":1,"maxInstances":1},"c3":{"serviceType":"c3","clusterId":"s-g-c1-c2-c3-s.c3-0x0.c3.domain","minInstances":1,"maxInstances":1}},"instanceIdToInstanceContextMap":{},"dependencyOrder":{"startupOrders":[{"startupOrderComponentList":["cartridge.c3-0x0","cartridge.c2-0x0"]},{"startupOrderComponentList":["cartridge.c2-0x0","cartridge.c1-0x0"]}],"terminationBehaviour":"terminate-none"},"isGroupScalingEnabled":false,"isGroupInstanceMonitoringEnabled":false,"instanceIdSequence":{"value":0}}},"aliasToClusterDataMap":{},"typeToClusterDataMap":{},"aliasToDeploymentPolicyIdMap":{"c3-0x0":"static-1","c2-0x0":"static-1","c1-0x0":"static-1"},"instanceIdToInstanceContextMap":{"s-g-c1-c2-c3-s-1":{"alias":"s-g-c1-c2-c3-s","instanceId":"s-g-c1-c2-c3-s-1","instanceProperties":{},"lifeCycleStateManager":{"stateStack":["Created","Active","Terminating"],"identifier":"s-g-c1-c2-c3-s_s-g-c1-c2-c3-s-1"},"networkPartitionId":"RegionOne"}},"dependencyOrder":{"startupOrders":[]},"isGroupScalingEnabled":false,"isGroupInstanceMonitoringEnabled":false,"
>> instanceIdSequence":{"value":1}}},"initialized":false}*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,539]  INFO
>> {org.apache.stratos.autoscaler.applications.topic.ApplicationsEventPublisher}
>> -  Publishing group instance terminated event: [application] s-g-c1-c2-c3-s
>> [group] s-g-c1-c2-c3-s-x0x [instance] s-g-c1-c2-c3-s-1*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,545]  INFO
>> {org.apache.stratos.autoscaler.monitor.component.GroupMonitor} -  [Group]
>> s-g-c1-c2-c3-s-x0x is notifying the [parent] s-g-c1-c2-c3-s [instance]
>> s-g-c1-c2-c3-s-1*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,545] DEBUG
>> {org.apache.stratos.autoscaler.status.processor.group.*
>> *GroupStatusActiveProcessor}** -  GroupProcessor chain calculating the
>> status for the group [ s-g-c1-c2-c3-s ]*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG
>> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusActiveProcessor}
>> -  StatusChecker calculating the active status for the group [
>> s-g-c1-c2-c3-s ]  for the instance  [ s-g-c1-c2-c3-s-1 ]*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG
>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>> lock acquired*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG
>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>> lock released*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG
>> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor}
>> -  StatusChecker calculating the terminated status for the group [
>> s-g-c1-c2-c3-s ]  for the instance  [ s-g-c1-c2-c3-s-1 ]*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG
>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>> lock acquired*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:18:41,546]  INFO
>> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor}
>> -  Sending application instance terminated for [application] s-g-c1-c2-c3-s
>> [instance] s-g-c1-c2-c3-s-1*
>>
>>
>>
>> Unsuccessful:
>>
>>
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,404] DEBUG
>> {org.apache.stratos.autoscaler.status.processor.group.*
>> *GroupStatusTerminatedProcessor**} -  StatusChecker calculating the
>> terminated status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [
>> s-g-c1-c2-c3-s-1 ]*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,404] DEBUG
>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>> lock acquired*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,405] DEBUG
>> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusProcessor}
>> -  Checking the status of cluster s-g-c1-c2-c3-s.c1-0x0.c1.domain instance
>> status is: Terminating*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
>> {org.apache.stratos.cloud.controller.messaging.topology.TopologyManager} -
>> Write lock released*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
>> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusProcessor}
>> -  Checking the status of cluster s-g-c1-c2-c3-s.c1-0x0.c1.domain instance
>> status is: Terminating*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>> lock released*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
>> {org.apache.stratos.autoscaler.status.processor.group.*
>> *GroupStatusTerminatingProcessor**} -  StatusChecker calculating the
>> terminating status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance
>> [ s-g-c1-c2-c3-s-1 ]*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>> lock acquired*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>> lock released*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
>> {org.apache.stratos.autoscaler.status.processor.group.*
>> *GroupStatusInactiveProcessor**} -  StatusChecker calculating the
>> inactive status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [
>> s-g-c1-c2-c3-s-1 ]*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG
>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>> lock acquired*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG
>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
>> lock released*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,407]  WARN
>> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusInactiveProcessor}
>> -  **No possible state change found for* *[component] s-g-c1-c2-c3-s-x0x
>> [instance] s-g-c1-c2-c3-s-1*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG
>> {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -
>> ClusterMonitor Drools session has been disposed. ClusterMonitor
>> [clusterId=s-g-c1-c2-c3-s.c2-0x0.c2.domain, hasPrimary=false ]*
>>
>> *TID: [0] [STRATOS] [2015-06-08 22:33:25,481] ERROR
>> {org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor} -  System
>> error, lock has not released for 30 seconds: [lock-name] application
>> [lock-type] Write [thread-id] 99 [thread-name] pool-26-thread-2
>> [stack-trace] *
>>
>> *java.lang.Thread.getStackTrace(Thread.java:1589)*
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> [1.] Application Structure
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> *From:* Martin Eppel (meppel)
>> *Sent:* Friday, June 05, 2015 4:38 PM
>>
>> *To:* dev@stratos.apache.org
>> *Subject:* RE: Testing Stratos 4.1: Application undeployment:
>> application fails to undeploy (nested grouping, group scaling)
>>
>>
>>
>> This is another application, see [1.] which fails to get completely
>> removed:
>>
>>
>>
>> Scenario / Observation:
>>
>> ·        After all instances / application go active, one instance is
>> being terminated (to verify termination behavior). Once the terminated
>> instance is restored the application is undeployed.
>>
>> ·        After the Application undeployment process is started, all
>> instances are being terminated
>>
>> ·        Application still shows up in stratos admin, subsequent
>> deployments fail
>>
>>
>>
>> stratos> list-applications
>>
>> Applications found:
>>
>> +---------------------+---------------------+----------+
>>
>> | Application ID      | Alias               | Status   |
>>
>> +---------------------+---------------------+----------+
>>
>> | s-n-gr-s-G123-t-a-4 | s-n-gr-s-G123-t-a-4 | Deployed |
>>
>> +---------------------+---------------------+----------+
>>
>>
>>
>>
>>
>> [1.] Application:
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> *From:* Martin Eppel (meppel)
>> *Sent:* Friday, June 05, 2015 3:26 PM
>> *To:* dev@stratos.apache.org
>> *Subject:* RE: Testing Stratos 4.1: Application undeployment:
>> application fails to undeploy (nested grouping, group scaling)
>>
>>
>>
>> After re-running it, these are my observations:
>>
>>
>>
>> ·        After the “Application undeployment process started” message
>> is logged, there is a likelihood that (a few) VMs are still launched – I
>> suspect this is due to some race condition between the application
>> undeployment process and the autoscaler.
>>
>> ·        All Vms which were launched before the “Application
>> undeployment process started” get terminated as part of the undeployment
>> process.
>>
>> ·        Vms which were launched after “Application undeployment process
>> started” eventually get moved to obsolete / pending state and cleaned up,
>> this can take up to 15- 20 minutes.
>>
>> ·        The application never gets completely removed,
>>
>> ·        The following exception is consistently observed:
>>
>> TID: [0] [STRATOS] [2015-06-05 20:47:07,237]  WARN
>> {org.apache.stratos.common.concurrent.locks.ReadWriteLock} -  System
>> warning! Trying to release a lock which has not been taken by the same
>> thread: [lock-name] application-holder [thread-id] 114 [thread-name]
>> pool-24-thread-2
>>
>> *TID: [0] [STRATOS] [2015-06-05 20:47:07,237] ERROR
>> {org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator}
>> -  Failed to retrieve topology event message*
>>
>> *org.apache.stratos.common.exception.InvalidLockRequestedException:
>> System error, cannot acquire a write lock while having a read lock on the
>> same thread: [lock-name] application-holder [thread-id] 114 [thread-name]
>> pool-24-thread-2*
>>
>> *                    at
>> org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:114)*
>>
>> *                    at
>> org.apache.stratos.autoscaler.applications.ApplicationHolder.acquireWriteLock(ApplicationHolder.java:60)*
>>
>> ·        Initiating the “Application undeployment process” again will
>> cause the following INFO statement (without any further actions, see in log)
>> TID: [0] [STRATOS] [2015-06-05 21:34:34,509]  INFO
>> {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -
>> Application monitor is already in terminating, graceful un-deployment is
>> has already been attempted thus not invoking again
>>
>> ·        Other exceptions observed after the “Application undeployment
>> process started”
>>
>> TID: [0] [STRATOS] [2015-06-05 21:36:29,458] ERROR
>> {org.apache.stratos.autoscaler.rule.RuleTasksDelegator} -  Cannot terminate
>> instance
>>
>> org.apache.stratos.cloud.controller.stub.CloudControllerServiceInvalidMemberExceptionException:
>> CloudControllerServiceInvalidMemberExceptionException
>>
>>         at
>> sun.reflect.GeneratedConstructorAccessor219.newInstance(Unknown Source)
>>
>>         at
>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>
>>         at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>>
>>         at java.lang.Class.newInstance(Class.java:374)
>>
>>         at
>> org.apache.stratos.cloud.controller.stub.CloudControllerServiceStub.terminateInstance(CloudControllerServiceStub.java:8633)
>>
>>         at
>> org.apache.stratos.common.client.CloudControllerServiceClient.terminateInstance(CloudControllerServiceClient.java:120)
>>
>>         at
>> org.apache.stratos.autoscaler.rule.RuleTasksDelegator.terminateObsoleteInstance(RuleTasksDelegator.java:298)
>>
>>         at sun.reflect.GeneratedMethodAccessor413.invoke(Unknown Source)
>>
>>
>>
>> ·        Created a jira to track this issue:
>> https://issues.apache.org/jira/browse/STRATOS-1430
>>
>>
>>
>>
>>
>>
>>
>> Regards
>>
>>
>>
>> Martin
>>
>>
>>
>> Attached the log file of the last test
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> *From:* Martin Eppel (meppel)
>> *Sent:* Friday, June 05, 2015 12:59 PM
>> *To:* dev@stratos.apache.org
>> *Subject:* RE: Testing Stratos 4.1: Application undeployment:
>> application fails to undeploy (nested grouping, group scaling)
>>
>>
>>
>> For this latest test I got the latest source from stratos repo so I have
>> this commit (see below), but the un-deployment still fails (to some extent).
>>
>> As mentioned below, it seems that all the members get terminated
>> eventually, including the ones which got started after the “application
>> un-deployment” process started.
>>
>> What is still left in stratos (even after all members got terminated) is
>> the application (see the stratos> list-applications command result below in
>> email thread). This would still be an issue when re-deploying the
>> application !
>>
>> I will do a few reruns to verify the removal of the VMs (members) is
>> consistent.
>>
>> Thanks
>>
>>
>>
>> Martin
>>
>>
>>
>> git show 2fe84b91843b20e91e8cafd06011f42d218f231c
>>
>> commit 2fe84b91843b20e91e8cafd06011f42d218f231c
>>
>> Author: anuruddhal <an...@gmail.com>
>>
>> Date:   Wed Jun 3 14:41:12 2015 +0530
>>
>>
>>
>> *From:* Imesh Gunaratne [mailto:imesh@apache.org <im...@apache.org>]
>> *Sent:* Friday, June 05, 2015 12:46 PM
>> *To:* dev
>> *Subject:* Re: Testing Stratos 4.1: Application undeployment:
>> application fails to undeploy (nested grouping, group scaling)
>>
>>
>>
>> Hi Martin,
>>
>>
>>
>> I also encountered a similar issue with the application un-deployment
>> with PCA but I guess you are using JCA.
>>
>>
>>
>> I can see that Anuruddha has done a fix for the issue I'm referring to
>> with the below commit:
>>
>>
>> https://github.com/apache/stratos/commit/2fe84b91843b20e91e8cafd06011f42d218f231c
>>
>>
>>
>> Regarding the member context not found error, this could occur if the
>> termination request was made for an already terminated member. There is a
>> possibility that Autoscaler makes a second terminate request if the first
>> request takes some time to execute, and by the time the second request
>> hits Cloud Controller the member has already been terminated by the first
>> request.
>>
>>
>>
>> Can you please confirm whether the members were properly terminated and
>> it's just these exceptions that you are seeing?
>>
>>
>>
>> Thanks
>>
>>
>>
>>
>>
>> On Sat, Jun 6, 2015 at 12:36 AM, Martin Eppel (meppel) <me...@cisco.com>
>> wrote:
>>
>> Hi Udara,
>>
>>
>>
>> Picked up your commit and reran the test case:
>>
>>
>>
>> Attached is the log file (artifacts are the same as before).
>>
>>
>>
>> *Didn’t see the issue with* “*Member is in the wrong list” …*
>>
>>
>>
>> but see the following exception after the undeploy application message:
>>
>> *TID: [0] [STRATOS] [2015-06-05 18:09:46,836] ERROR
>> {org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator}
>> -  Failed to retrieve topology event message*
>>
>> *org.apache.stratos.common.exception.InvalidLockRequestedException:
>> System error, cannot acquire a write lock while having a read lock on the
>> same thread: [lock-name] application-holder [thread-id] 114 [thread-name]
>> pool-24-thread-2*
>>
>> *                    at
>> org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:114)*
>>
>> *                    at
>> org.apache.stratos.autoscaler.applications.ApplicationHolder.acquireWriteLock(ApplicationHolder.java:60)*
>>
>>
>>
>>
>>
>> *Also, after the “Application undeployment process started” message is
>> logged, new members are still being instantiated:*
>>
>>
>>
>> *TID: [0] [STRATOS] [2015-06-05 18:07:46,545]  INFO
>> {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher}
>> -  Publishing member created event*:
>>
>>
>>
>>
>>
>> *Eventually, these VMs get terminated :*
>>
>>
>>
>> *TID: [0] [STRATOS] [2015-06-05 18:42:42,413] ERROR
>> {org.apache.stratos.cloud.controller.services.impl.CloudControllerServiceImpl}
>> -  Could not terminate instance: [member-id]
>> g-sc-G12-1.c1-0x0.c1.domaindd9c1d40-70cc-4950-9757-418afe19ba7f*
>>
>> *org.apache.stratos.cloud.controller.exception.InvalidMemberException:
>> Could not terminate instance, member context not found: [member-id]
>> g-sc-G12-1.c1-0x0.c1.domaindd9c1d40-70cc-4950-9757-418afe19ba7f*
>>
>> *                    at
>> org.apache.stratos.cloud.controller.services.impl.CloudControllerServiceImpl.terminateInstance(CloudControllerServiceImpl.java:595)*
>>
>> *                    at
>> sun.reflect.GeneratedMethodAccessor408.invoke(Unknown Source)*
>>
>> *                    at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)*
>>
>> *                    at java.lang.reflect.Method.invoke(Method.java:606)*
>>
>>
>>
>>
>>
>> *but the application remains:*
>>
>>
>>
>> stratos> list-applications
>>
>> Applications found:
>>
>> +----------------+------------+----------+
>>
>> | Application ID | Alias      | Status   |
>>
>> +----------------+------------+----------+
>>
>> | g-sc-G12-1     | g-sc-G12-1 | Deployed |
>>
>> +----------------+------------+----------+
>>
>>
>>
>> ['g-sc-G12-1: applicationInstances 1, groupInstances 2, clusterInstances
>> 3, members 0 ()\n']
>>
>>
>>
>>
>>
>>
>>
>> *From:* Martin Eppel (meppel)
>> *Sent:* Friday, June 05, 2015 10:04 AM
>> *To:* dev@stratos.apache.org
>> *Subject:* RE: Testing Stratos 4.1: Application undeployment:
>> application fails to undeploy (nested grouping, group scaling)
>>
>>
>>
>> Ok:
>>
>>
>>
>> log4j.logger.org.apache.stratos.manager=DEBUG
>>
>> log4j.logger.org.apache.stratos.autoscaler=DEBUG
>>
>> log4j.logger.org.apache.stratos.messaging=INFO
>>
>> log4j.logger.org.apache.stratos.cloud.controller=DEBUG
>>
>> log4j.logger.org.wso2.andes.client=ERROR
>>
>> # Autoscaler rule logs
>>
>> log4j.logger.org.apache.stratos.autoscaler.rule.RuleLog=DEBUG
>>
>>
>>
>> *From:* Udara Liyanage [mailto:udara@wso2.com <ud...@wso2.com>]
>> *Sent:* Friday, June 05, 2015 10:00 AM
>> *To:* dev
>> *Subject:* Re: Testing Stratos 4.1: Application undeployment:
>> application fails to undeploy (nested grouping, group scaling)
>>
>>
>>
>> Hi Martin,
>>
>>
>>
>> Better if you can enable debug logs for all of AS, CC and the cartridge agent.
>>
>>
>>
>> On Fri, Jun 5, 2015 at 10:23 PM, Udara Liyanage <ud...@wso2.com> wrote:
>>
>> Hi,
>>
>>
>>
>> Please enable AS debug logs.
>>
>>
>>
>> On Fri, Jun 5, 2015 at 9:38 PM, Martin Eppel (meppel) <me...@cisco.com>
>> wrote:
>>
>> Hi Udara,
>>
>>
>>
>> Yes, this issue seems to be fairly well reproducible, which debug log do
>> you want me to enable, cartridge agent logs ?
>>
>>
>>
>> Thanks
>>
>>
>>
>> Martin
>>
>>
>>
>> *From:* Udara Liyanage [mailto:udara@wso2.com]
>> *Sent:* Thursday, June 04, 2015 11:11 PM
>> *To:* dev
>> *Subject:* Re: Testing Stratos 4.1: Application undeployment:
>> application fails to undeploy (nested grouping, group scaling)
>>
>>
>>
>> Hi,
>>
>>
>>
>> This might happen if AS did not receive the member activated event
>> published by CC. Is it possible to enable debug logs if this is
>> reproducible?
>>
>> Otherwise I can add some INFO logs and commit.
>>
>>
>>
>>
>>
>> On Fri, Jun 5, 2015 at 9:11 AM, Udara Liyanage <ud...@wso2.com> wrote:
>>
>> Hi,
>>
>>
>>
>>
>>
>> For the first issue you have mentioned, the particular member is
>> activated, but it is still identified as an obsolete member and is being
>> marked to be terminated since pending time expired. Does that mean member
>> is still in Obsolete list even though it is being activated?
>>
>>
>>
>> //member started
>>
>> TID: [0] [STRATOS] [2015-06-04 19:53:04,706]  INFO
>> {org.apache.stratos.autoscaler.context.cluster.ClusterContext} -  Member
>> stat context has been added: [application] g-sc-G12-1 [cluster]
>> g-sc-G12-1.c1-0x0.c1.domain [clusterInstanceContext] g-sc-G12-1-1
>> [partitionContext] whole-region [member-id]
>> g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
>>
>>
>>
>> //member activated
>>
>> TID: [0] [STRATOS] [2015-06-04 19:56:00,907]  INFO
>> {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher}
>> -  Publishing member activated event: [service-name] c1 [cluster-id]
>> g-sc-G12-1.c1-0x0.c1.domain [cluster-instance-id] g-sc-G12-1-1 [member-id]
>> g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
>> [network-partition-id] RegionOne [partition-id] whole-region
>>
>> TID: [0] [STRATOS] [2015-06-04 19:56:00,916]  INFO
>> {org.apache.stratos.messaging.message.processor.topology.MemberActivatedMessageProcessor}
>> -  Member activated: [service] c1 [cluster] g-sc-G12-1.c1-0x0.c1.domain
>> [member] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
>>
>>
>>
>> //after 15 minutes ---member is still in pending state, pending timeout
>> expired
>>
>> TID: [0] [STRATOS] [2015-06-04 20:08:04,713]  INFO
>> {org.apache.stratos.autoscaler.context.partition.ClusterLevelPartitionContext$PendingMemberWatcher}
>> -  Pending state of member expired, member will be moved to obsolete list.
>> [pending member]
>> g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1 [expiry
>> time] 900000 [cluster] g-sc-G12-1.c1-0x0.c1.domain [cluster instance] null
>>
>>
>>
>> On Fri, Jun 5, 2015 at 5:14 AM, Martin Eppel (meppel) <me...@cisco.com>
>> wrote:
>>
>> Hi,
>>
>>
>>
>> I am running into a scenario where application un-deployment fails (using
>> stratos with latest commit  b1b6bca3f99b6127da24c9af0a6b20faff2907be).
>>
>>
>>
>> For application structure see [1.], (debug enabled) wso2carbon.log,
>> application.json, cartridge-group.json, deployment-policy, auto-scaling
>> policies see attached zip file.
>>
>>
>>
>> *It is noteworthy, that while the application is running the following
>> log statements /exceptions are observed:*
>>
>>
>>
>> *…*
>>
>> *Member is in the wrong list and it is removed from active members list:
>> g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1*
>>
>> *…*
>>
>> *TID: [0] [STRATOS] [2015-06-04 20:11:03,425] ERROR
>> {org.apache.stratos.autoscaler.rule.RuleTasksDelegator} -  Cannot terminate
>> instance*
>>
>> *…*
>>
>> *// **after receiving the application undeploy event:*
>>
>> *[2015-06-04 20:12:39,465]  INFO
>> {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -
>> Application undeployment process started: [application-id] g-sc-G12-1*
>>
>> *// **a new instance is being started up*
>>
>> *…*
>>
>> *[2015-06-04 20:13:13,445]  INFO
>> {org.apache.stratos.cloud.controller.services.impl.InstanceCreator} -
>> Instance started successfully: [cartridge-type] c2 [cluster-id]
>> g-sc-G12-1.c2-1x0.c2.domain [instance-id]
>> RegionOne/5d4699f7-b00b-42eb-b565-b48fc8f20407*
>>
>>
>>
>> *// Also noteworthy seems the following warning which is seen repeatedly
>> in the logs:*
>>
>> *ReadWriteLock} -  System warning! Trying to release a lock which has not
>> been taken by the same thread: [lock-name]*
>>
>>
>>
>>
>>
>> [1.] Application structure
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> --
>>
>>
>> Udara Liyanage
>>
>> Software Engineer
>>
>> WSO2, Inc.: http://wso2.com
>>
>> lean. enterprise. middleware
>>
>> web: http://udaraliyanage.wordpress.com
>>
>> phone: +94 71 443 6897
>>
>>
>>
>>
>>
>> --
>>
>> Imesh Gunaratne
>>
>>
>>
>> Senior Technical Lead, WSO2
>>
>> Committer & PMC Member, Apache Stratos
>>
>
>
>
> --
> *Lasindu Charith*
> Software Engineer, WSO2 Inc.
> Mobile: +94714427192
> Web: blog.lasindu.com
>


Thanks,
-- 
*Lasindu Charith*
Software Engineer, WSO2 Inc.
Mobile: +94714427192
Web: blog.lasindu.com

Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by Lasindu Charith <la...@wso2.com>.
Hi Martin,

I was able to reproduce this issue in the latest build with PCA on
OpenStack. Even after Stratos is restarted, the application remains
deployed and cannot be undeployed (even forceful undeployment failed for
this obsolete application).

I'm currently looking into possible causes and will update the thread with
my progress.

Thanks,

On Tue, Jun 9, 2015 at 5:59 AM, Martin Eppel (meppel) <me...@cisco.com>
wrote:

>  Here is another example where the removal fails:
>
>
>
> For application see [1.], log file (with debug enabled) and jsons are
> attached.
>
>
>
> Scenario:
>
>
>
> ·        Deploy application and wait for all cartridges to become active
>
> ·        Kill a VM (2nd in startup sequence)
>
> ·        Wait for it to restart and become active
>
> ·        Un-deploy application
>
> a.      Un-deploy forcefully will succeed
> ([2015-06-08 20:38:21,487]  INFO
> {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -
> Forcefully un-deploying the application s-g-c1-c2-c3-s)
>
> b.      Un-deploy gracefully will fail to remove app completely (although
> VMs are terminated successfully)
> ([2015-06-08 20:54:16,372]  INFO
> {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -
> Starting to undeploy application: [application-id])
>
> ·        Both scenarios are recorded in the same log file
> wso2carbon-s-g-c1-c2-c3-s.log
>
> ·        Btw, I retested the scenario and the issue is easily
> reproducible following the steps listed above:
> graceful application un-deploy succeeds if no VM had been restarted
> (terminated and restarted by autoscaler).
> Once a VM has been terminated and restarted, graceful application
> un-deploy fails.
> I attached a log file which demonstrates this case
> (wso2carbon-s-g-c1-c2-c3-s-scen-2.log). In this scenario, the same
> application is deployed, becomes active and is then removed (repeated 2
> times); then a VM is terminated and restarted by autoscaler. Afterwards,
> graceful application un-deploy fails.
>
>
>
>
>
> Other Observations:
>
>
>
> When the application is removed successfully, some events, e.g. “cluster
> removed event” and “Application deleted event received:”, are published
> (see [2.]), whereas when the application fails to be removed, no such
> events are observed.
>
>
>
> [2.] cluster removed event when application is un-deployed forcefully
>
> TID: [0] [STRATOS] [2015-06-08 20:38:34,187]  INFO
> {org.apache.stratos.cloud.controller.messaging.receiver.application.ApplicationEventReceiver}
> -  Application deleted event received: [application-id] s-g-c1-c2-c3-s
>
> TID: [0] [STRATOS] [2015-06-08 20:38:34,220]  INFO
> {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher}
> -  Publishing application clusters removed event: [application-id]
> s-g-c1-c2-c3-s
>
>
>
>
>
> I compared the log sequences of the successful and the unsuccessful
> application removal and noticed a difference (see also highlighted
> areas):
>
>
>
> Successful (see logs in wso2carbon-s-g-c1-c2-c3-s-scen-2.log)
>
>
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG
> {org.apache.stratos.autoscaler.status.processor.group.*
> *GroupStatusTerminatedProcessor**} -  StatusChecker calculating the
> terminated status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [
> s-g-c1-c2-c3-s-1 ]*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG
> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
> lock acquired*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,527]  INFO
> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor}
> -  Sending group instance terminated for [group] s-g-c1-c2-c3-s-x0x
> [instance] s-g-c1-c2-c3-s-1*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG
> {org.apache.stratos.autoscaler.applications.topic.ApplicationBuilder} -
> Handling group terminated event: [group-id] s-g-c1-c2-c3-s-x0x
> [application-id] s-g-c1-c2-c3-s [instance] s-g-c1-c2-c3-s-1*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG
> {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -
> ClusterMonitor Drools session has been disposed. ClusterMonitor
> [clusterId=s-g-c1-c2-c3-s.c3-0x0.c3.domain, hasPrimary=false ]*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG
> {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -
> ClusterMonitor Drools session has been disposed. ClusterMonitor
> [clusterId=s-g-c1-c2-c3-s.c1-0x0.c1.domain, hasPrimary=false ]*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG
> {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -
> ClusterMonitor Drools session has been disposed. ClusterMonitor
> [clusterId=s-g-c1-c2-c3-s.c2-0x0.c2.domain, hasPrimary=false ]*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,529] DEBUG
> {org.apache.stratos.cloud.controller.messaging.topology.TopologyManager} -
> Write lock released*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,536] DEBUG
> {org.apache.stratos.autoscaler.registry.RegistryManager} -  Application [
> s-g-c1-c2-c3-s ] persisted successfully in the Autoscaler Registry*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,538] DEBUG
> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -
> Applications updated:
> {"applicationMap":{"s-g-c1-c2-c3-s":{"id":"s-g-c1-c2-c3-s","key":"l8V7OpRqOfBcWbBw","tenantId":-1234,"tenantDomain":"carbon.super","tenantAdminUserName":"admin","applicationPolicyId":"default-iaas","aliasToGroupMap":{"s-g-c1-c2-c3-s-x0x":{"name":"s-g-c1-c2-c3-s","alias":"s-g-c1-c2-c3-s-x0x","groupMinInstances":1,"groupMaxInstances":1,"applicationId":"s-g-c1-c2-c3-s","aliasToGroupMap":{},"aliasToClusterDataMap":{"c2-0x0":{"serviceType":"c2","clusterId":"s-g-c1-c2-c3-s.c2-0x0.c2.domain","minInstances":1,"maxInstances":1},"c3-0x0":{"serviceType":"c3","clusterId":"s-g-c1-c2-c3-s.c3-0x0.c3.domain","minInstances":1,"maxInstances":1},"c1-0x0":{"serviceType":"c1","clusterId":"s-g-c1-c2-c3-s.c1-0x0.c1.domain","minInstances":1,"maxInstances":1}},"typeToClusterDataMap":{"c1":{"serviceType":"c1","clusterId":"s-g-c1-c2-c3-s.c1-0x0.c1.domain","minInstances":1,"maxInstances":1},"c2":{"serviceType":"c2","clusterId":"s-g-c1-c2-c3-s.c2-0x0.c2.domain","minInstances":1,"maxInstances":1},"c3":{"serviceType":"c3","clusterId":"s-g-c1-c2-c3-s.c3-0x0.c3.domain","minInstances":1,"maxInstances":1}},"instanceIdToInstanceContextMap":{},"dependencyOrder":{"startupOrders":[{"startupOrderComponentList":["cartridge.c3-0x0","cartridge.c2-0x0"]},{"startupOrderComponentList":["cartridge.c2-0x0","cartridge.c1-0x0"]}],"terminationBehaviour":"terminate-none"},"isGroupScalingEnabled":false,"isGroupInstanceMonitoringEnabled":false,"instanceIdSequence":{"value":0}}},"aliasToClusterDataMap":{},"typeToClusterDataMap":{},"aliasToDeploymentPolicyIdMap":{"c3-0x0":"static-1","c2-0x0":"static-1","c1-0x0":"static-1"},"instanceIdToInstanceContextMap":{"s-g-c1-c2-c3-s-1":{"alias":"s-g-c1-c2-c3-s","instanceId":"s-g-c1-c2-c3-s-1","instanceProperties":{},"lifeCycleStateManager":{"stateStack":["Created","Active","Terminating"],"identifier":"s-g-c1-c2-c3-s_s-g-c1-c2-c3-s-1"},"networkPartitionId":"RegionOne"}},"dependencyOrder":{"startupOrders":[]},"isGroupScalingEnabled":false,"isGroupInstanceMonitoringEnabled":false,"instanceIdSequence":{"value":1}}},"initialized":false}*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,539]  INFO
> {org.apache.stratos.autoscaler.applications.topic.ApplicationsEventPublisher}
> -  Publishing group instance terminated event: [application] s-g-c1-c2-c3-s
> [group] s-g-c1-c2-c3-s-x0x [instance] s-g-c1-c2-c3-s-1*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,545]  INFO
> {org.apache.stratos.autoscaler.monitor.component.GroupMonitor} -  [Group]
> s-g-c1-c2-c3-s-x0x is notifying the [parent] s-g-c1-c2-c3-s [instance]
> s-g-c1-c2-c3-s-1*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,545] DEBUG
> {org.apache.stratos.autoscaler.status.processor.group.*
> *GroupStatusActiveProcessor}** -  GroupProcessor chain calculating the
> status for the group [ s-g-c1-c2-c3-s ]*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG
> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusActiveProcessor}
> -  StatusChecker calculating the active status for the group [
> s-g-c1-c2-c3-s ]  for the instance  [ s-g-c1-c2-c3-s-1 ]*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG
> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
> lock acquired*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG
> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
> lock released*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG
> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor}
> -  StatusChecker calculating the terminated status for the group [
> s-g-c1-c2-c3-s ]  for the instance  [ s-g-c1-c2-c3-s-1 ]*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG
> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
> lock acquired*
>
> *TID: [0] [STRATOS] [2015-06-08 22:18:41,546]  INFO
> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor}
> -  Sending application instance terminated for [application] s-g-c1-c2-c3-s
> [instance] s-g-c1-c2-c3-s-1*
>
>
>
> Unsuccessful:
>
>
>
> *TID: [0] [STRATOS] [2015-06-08 22:33:25,404] DEBUG
> {org.apache.stratos.autoscaler.status.processor.group.*
> *GroupStatusTerminatedProcessor**} -  StatusChecker calculating the
> terminated status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [
> s-g-c1-c2-c3-s-1 ]*
>
> *TID: [0] [STRATOS] [2015-06-08 22:33:25,404] DEBUG
> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
> lock acquired*
>
> *TID: [0] [STRATOS] [2015-06-08 22:33:25,405] DEBUG
> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusProcessor}
> -  Checking the status of cluster s-g-c1-c2-c3-s.c1-0x0.c1.domain instance
> status is: Terminating*
>
> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
> {org.apache.stratos.cloud.controller.messaging.topology.TopologyManager} -
> Write lock released*
>
> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusProcessor}
> -  Checking the status of cluster s-g-c1-c2-c3-s.c1-0x0.c1.domain instance
> status is: Terminating*
>
> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
> lock released*
>
> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
> {org.apache.stratos.autoscaler.status.processor.group.*
> *GroupStatusTerminatingProcessor**} -  StatusChecker calculating the
> terminating status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance
> [ s-g-c1-c2-c3-s-1 ]*
>
> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
> lock acquired*
>
> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
> lock released*
>
> *TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG
> {org.apache.stratos.autoscaler.status.processor.group.*
> *GroupStatusInactiveProcessor**} -  StatusChecker calculating the
> inactive status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [
> s-g-c1-c2-c3-s-1 ]*
>
> *TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG
> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
> lock acquired*
>
> *TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG
> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write
> lock released*
>
> *TID: [0] [STRATOS] [2015-06-08 22:33:25,407]  WARN
> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusInactiveProcessor}
> -  **No possible state change found for* *[component] s-g-c1-c2-c3-s-x0x
> [instance] s-g-c1-c2-c3-s-1*
>
> *TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG
> {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -
> ClusterMonitor Drools session has been disposed. ClusterMonitor
> [clusterId=s-g-c1-c2-c3-s.c2-0x0.c2.domain, hasPrimary=false ]*
>
> *TID: [0] [STRATOS] [2015-06-08 22:33:25,481] ERROR
> {org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor} -  System
> error, lock has not released for 30 seconds: [lock-name] application
> [lock-type] Write [thread-id] 99 [thread-name] pool-26-thread-2
> [stack-trace] *
>
> *java.lang.Thread.getStackTrace(Thread.java:1589)*
>
>
>
>
>
>
>
>
>
> [1.] Application Structure
>
>
>
>
>
>
>
>
>
>
>
>
>
> *From:* Martin Eppel (meppel)
> *Sent:* Friday, June 05, 2015 4:38 PM
>
> *To:* dev@stratos.apache.org
> *Subject:* RE: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> This is another application, see [1.] which fails to get completely
> removed:
>
>
>
> Scenario / Observation:
>
> ·        After all instances / application go active, one instance is
> being terminated (to verify termination behavior). Once the terminated
> instance is restored the application is undeployed.
>
> ·        After the Application undeployment process is started, all
> instances are being terminated
>
> ·        Application still shows up in stratos admin, subsequent
> deployments fail
>
>
>
> stratos> list-applications
>
> Applications found:
>
> +---------------------+---------------------+----------+
>
> | Application ID      | Alias               | Status   |
>
> +---------------------+---------------------+----------+
>
> | s-n-gr-s-G123-t-a-4 | s-n-gr-s-G123-t-a-4 | Deployed |
>
> +---------------------+---------------------+----------+
>
>
>
>
>
> [1.] Application:
>
>
>
>
>
>
>
>
>
>
>
> *From:* Martin Eppel (meppel)
> *Sent:* Friday, June 05, 2015 3:26 PM
> *To:* dev@stratos.apache.org
> *Subject:* RE: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> After re-running it, these are my observations:
>
>
>
> ·        After the “Application undeployment process started” message
> is logged, there is a likelihood that (a few) VMs are still launched. I
> suspect this is due to a race condition between the application
> undeployment process and the autoscaler.
>
> ·        All VMs which were launched before the “Application undeployment
> process started” message get terminated as part of the undeployment
> process.
>
> ·        VMs which were launched after “Application undeployment process
> started” eventually get moved to obsolete / pending state and cleaned up;
> this can take up to 15-20 minutes.
>
> ·        The application never gets completely removed.
>
> ·        The following exception is consistently observed:
>
> TID: [0] [STRATOS] [2015-06-05 20:47:07,237]  WARN
> {org.apache.stratos.common.concurrent.locks.ReadWriteLock} -  System
> warning! Trying to release a lock which has not been taken by the same
> thread: [lock-name] application-holder [thread-id] 114 [thread-name]
> pool-24-thread-2
>
> *TID: [0] [STRATOS] [2015-06-05 20:47:07,237] ERROR
> {org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator}
> -  Failed to retrieve topology event message*
>
> *org.apache.stratos.common.exception.InvalidLockRequestedException: System
> error, cannot acquire a write lock while having a read lock on the same
> thread: [lock-name] application-holder [thread-id] 114 [thread-name]
> pool-24-thread-2*
>
> *                    at
> org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:114)*
>
> *                    at
> org.apache.stratos.autoscaler.applications.ApplicationHolder.acquireWriteLock(ApplicationHolder.java:60)*
>
> ·        Initiating the “Application undeployment process” again will
> cause the following INFO statement (without any further actions, see in log)
> TID: [0] [STRATOS] [2015-06-05 21:34:34,509]  INFO
> {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -
> Application monitor is already in terminating, graceful un-deployment is
> has already been attempted thus not invoking again
>
> ·        Other exceptions observed after the “Application undeployment
> process started”
>
> TID: [0] [STRATOS] [2015-06-05 21:36:29,458] ERROR
> {org.apache.stratos.autoscaler.rule.RuleTasksDelegator} -  Cannot terminate
> instance
>
> org.apache.stratos.cloud.controller.stub.CloudControllerServiceInvalidMemberExceptionException:
> CloudControllerServiceInvalidMemberExceptionException
>
>         at sun.reflect.GeneratedConstructorAccessor219.newInstance(Unknown
> Source)
>
>         at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>
>         at java.lang.Class.newInstance(Class.java:374)
>
>         at
> org.apache.stratos.cloud.controller.stub.CloudControllerServiceStub.terminateInstance(CloudControllerServiceStub.java:8633)
>
>         at
> org.apache.stratos.common.client.CloudControllerServiceClient.terminateInstance(CloudControllerServiceClient.java:120)
>
>         at
> org.apache.stratos.autoscaler.rule.RuleTasksDelegator.terminateObsoleteInstance(RuleTasksDelegator.java:298)
>
>         at sun.reflect.GeneratedMethodAccessor413.invoke(Unknown Source)
>
>
>
> ·        Created a jira to track this issue:
> https://issues.apache.org/jira/browse/STRATOS-1430
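
The suspected race between undeployment and the autoscaler, described in the list above, can be illustrated with a minimal sketch. This is not the Stratos implementation; the class ScalingGate and its methods are hypothetical names introduced only for illustration. The idea is that the "undeployment started" flag and the decision to launch an instance are checked under the same lock, so that no launch can slip in after undeployment has begun.

```java
// Hypothetical illustration only; not part of Apache Stratos.
class ScalingGate {
    private boolean terminating = false; // set once undeployment has started
    private int launched = 0;            // number of launches that were allowed

    // Called when the application undeployment process starts.
    synchronized void beginUndeployment() {
        terminating = true;
    }

    // Called by the (hypothetical) scaling logic before starting a new VM.
    // The check and the launch bookkeeping happen under the same monitor,
    // so a launch cannot be admitted after beginUndeployment() has returned.
    synchronized boolean tryLaunchInstance() {
        if (terminating) {
            return false; // refuse to scale up while undeploying
        }
        launched++;
        return true;
    }

    synchronized int launchedCount() {
        return launched;
    }
}
```

Whether the VMs observed after the undeploy event are caused by a missing check of this kind is an assumption; the logs in this thread only show that launches happen after the undeployment message.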
>
>
>
>
>
>
>
> Regards
>
>
>
> Martin
>
>
>
> Attached the log file of the last test
>
>
>
>
>
>
>
>
>
> *From:* Martin Eppel (meppel)
> *Sent:* Friday, June 05, 2015 12:59 PM
> *To:* dev@stratos.apache.org
> *Subject:* RE: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> For this latest test I got the latest source from stratos repo so I have
> this commit (see below), but the un-deployment still fails (to some extent).
>
> As mentioned below, it seems that all the members get terminated
> eventually, including the ones which got started after the “application
> un-deployment” process started.
>
> What is still left in Stratos (even after all members got terminated) is
> the application (see the stratos> list-applications command result below in
> the email thread). This would still be an issue when re-deploying the
> application!
>
> I will do a few reruns to verify the removal of the VMs (members) is
> consistent.
>
> Thanks
>
>
>
> Martin
>
>
>
> git show 2fe84b91843b20e91e8cafd06011f42d218f231c
>
> commit 2fe84b91843b20e91e8cafd06011f42d218f231c
>
> Author: anuruddhal <an...@gmail.com>
>
> Date:   Wed Jun 3 14:41:12 2015 +0530
>
>
>
> *From:* Imesh Gunaratne [mailto:imesh@apache.org <im...@apache.org>]
> *Sent:* Friday, June 05, 2015 12:46 PM
> *To:* dev
> *Subject:* Re: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> Hi Martin,
>
>
>
> I also encountered a similar issue with the application un-deployment with
> PCA but I guess you are using JCA.
>
>
>
> I can see that Anuruddha has done a fix for the issue I'm referring to
> in the commit below:
>
>
> https://github.com/apache/stratos/commit/2fe84b91843b20e91e8cafd06011f42d218f231c
>
>
>
> Regarding the member context not found error, this could occur if the
> termination request was made for an already terminated member. It is
> possible that Autoscaler makes a second terminate request if the first
> request takes some time to execute; by the time the second request hits
> Cloud Controller, the member has already been terminated by the first
> request.
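
The duplicate terminate request described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the Cloud Controller code: MemberRegistry, register and terminate are hypothetical names, and the member context is reduced to a single instance-id string. The sketch treats a second terminate call for the same member as a benign no-op instead of an error.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only; not the Stratos CloudControllerServiceImpl.
class MemberRegistry {
    // member-id -> instance-id (stand-in for a real member context)
    private final Map<String, String> memberContexts = new ConcurrentHashMap<>();

    void register(String memberId, String instanceId) {
        memberContexts.put(memberId, instanceId);
    }

    // Returns true if this call removed the member context, false if the
    // member had already been terminated by an earlier request.
    boolean terminate(String memberId) {
        // remove() is atomic, so only one of two concurrent callers sees the context.
        String instanceId = memberContexts.remove(memberId);
        if (instanceId == null) {
            return false; // already terminated: nothing left to do
        }
        // A real implementation would ask the IaaS to destroy instanceId here.
        return true;
    }
}
```

In the logs of this thread the second request surfaces as an InvalidMemberException ("member context not found"); returning false here is a design choice of the sketch, not a description of how Stratos behaves.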
>
>
>
> Can you please confirm whether the members were properly terminated and
> it's just these exceptions that you are seeing?
>
>
>
> Thanks
>
>
>
>
>
> On Sat, Jun 6, 2015 at 12:36 AM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> Hi Udara,
>
>
>
> Picked up your commit and reran the test case:
>
>
>
> Attached is the log file (artifacts are the same as before).
>
>
>
> *Didn’t see the issue with* “*Member is in the wrong list” …*
>
>
>
> but see the following exception after the undeploy application message:
>
> *TID: [0] [STRATOS] [2015-06-05 18:09:46,836] ERROR
> {org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator}
> -  Failed to retrieve topology event message*
>
> *org.apache.stratos.common.exception.InvalidLockRequestedException: System
> error, cannot acquire a write lock while having a read lock on the same
> thread: [lock-name] application-holder [thread-id] 114 [thread-name]
> pool-24-thread-2*
>
> *                    at
> org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:114)*
>
> *                    at
> org.apache.stratos.autoscaler.applications.ApplicationHolder.acquireWriteLock(ApplicationHolder.java:60)*
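
The exception above reports an attempt to take a write lock while the same thread still holds a read lock. A minimal sketch of why such a request has to be rejected, using the JDK's ReentrantReadWriteLock, which does not support upgrading a read lock to a write lock (the writer would block forever). UpgradeGuardedLock is a hypothetical class written for this illustration; it is not the Stratos ReadWriteLock class and only mimics the two messages seen in the logs.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration only; not org.apache.stratos.common.concurrent.locks.ReadWriteLock.
class UpgradeGuardedLock {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final String name;

    UpgradeGuardedLock(String name) {
        this.name = name;
    }

    void acquireReadLock() {
        lock.readLock().lock();
    }

    void releaseReadLock() {
        lock.readLock().unlock();
    }

    void acquireWriteLock() {
        // getReadHoldCount() counts read holds of the *current* thread.
        // Calling writeLock().lock() with a read hold would never return.
        if (lock.getReadHoldCount() > 0) {
            throw new IllegalStateException(
                "cannot acquire a write lock while having a read lock on the same thread: "
                + "[lock-name] " + name);
        }
        lock.writeLock().lock();
    }

    // Returns false instead of unlocking when the current thread does not own
    // the write lock (the case the WARN message in the logs complains about).
    boolean releaseWriteLock() {
        if (!lock.isWriteLockedByCurrentThread()) {
            return false;
        }
        lock.writeLock().unlock();
        return true;
    }
}
```

The usual way out is to release the read lock before requesting the write lock and to re-check any state read earlier, since it may have changed in between.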
>
>
>
>
>
> *Also, after the “Application undeployment process started” message is
> logged, new members are still being instantiated:*
>
>
>
> *TID: [0] [STRATOS] [2015-06-05 18:07:46,545]  INFO
> {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher}
> -  Publishing member created event*:
>
>
>
>
>
> *Eventually, these VMs get terminated :*
>
>
>
> *TID: [0] [STRATOS] [2015-06-05 18:42:42,413] ERROR
> {org.apache.stratos.cloud.controller.services.impl.CloudControllerServiceImpl}
> -  Could not terminate instance: [member-id]
> g-sc-G12-1.c1-0x0.c1.domaindd9c1d40-70cc-4950-9757-418afe19ba7f*
>
> *org.apache.stratos.cloud.controller.exception.InvalidMemberException:
> Could not terminate instance, member context not found: [member-id]
> g-sc-G12-1.c1-0x0.c1.domaindd9c1d40-70cc-4950-9757-418afe19ba7f*
>
> *                    at
> org.apache.stratos.cloud.controller.services.impl.CloudControllerServiceImpl.terminateInstance(CloudControllerServiceImpl.java:595)*
>
> *                    at
> sun.reflect.GeneratedMethodAccessor408.invoke(Unknown Source)*
>
> *                    at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)*
>
> *                    at java.lang.reflect.Method.invoke(Method.java:606)*
>
>
>
>
>
> *but the application remains:*
>
>
>
> stratos> list-applications
>
> Applications found:
>
> +----------------+------------+----------+
>
> | Application ID | Alias      | Status   |
>
> +----------------+------------+----------+
>
> | g-sc-G12-1     | g-sc-G12-1 | Deployed |
>
> +----------------+------------+----------+
>
>
>
> ['g-sc-G12-1: applicationInstances 1, groupInstances 2, clusterInstances
> 3, members 0 ()\n']
>
>
>
>
>
>
>
> *From:* Martin Eppel (meppel)
> *Sent:* Friday, June 05, 2015 10:04 AM
> *To:* dev@stratos.apache.org
> *Subject:* RE: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> Ok:
>
>
>
> log4j.logger.org.apache.stratos.manager=DEBUG
>
> log4j.logger.org.apache.stratos.autoscaler=DEBUG
>
> log4j.logger.org.apache.stratos.messaging=INFO
>
> log4j.logger.org.apache.stratos.cloud.controller=DEBUG
>
> log4j.logger.org.wso2.andes.client=ERROR
>
> # Autoscaler rule logs
>
> log4j.logger.org.apache.stratos.autoscaler.rule.RuleLog=DEBUG
>
>
>
> *From:* Udara Liyanage [mailto:udara@wso2.com <ud...@wso2.com>]
> *Sent:* Friday, June 05, 2015 10:00 AM
> *To:* dev
> *Subject:* Re: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> Hi Martin,
>
>
>
> It would be better if you could enable debug logs for AS, CC and the
> cartridge agent.
>
>
>
> On Fri, Jun 5, 2015 at 10:23 PM, Udara Liyanage <ud...@wso2.com> wrote:
>
> Hi,
>
>
>
> Please enable AS debug logs.
>
>
>
> On Fri, Jun 5, 2015 at 9:38 PM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> Hi Udara,
>
>
>
> Yes, this issue seems to be fairly well reproducible, which debug log do
> you want me to enable, cartridge agent logs ?
>
>
>
> Thanks
>
>
>
> Martin
>
>
>
> *From:* Udara Liyanage [mailto:udara@wso2.com]
> *Sent:* Thursday, June 04, 2015 11:11 PM
> *To:* dev
> *Subject:* Re: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> Hi,
>
>
>
> This might happen if AS did not receive the member activated event
> published by CC. Is it possible to enable debug logs if this is
> reproducible?
>
> Otherwise I can add some INFO logs and commit.
>
>
>
>
>
> On Fri, Jun 5, 2015 at 9:11 AM, Udara Liyanage <ud...@wso2.com> wrote:
>
> Hi,
>
>
>
>
>
> For the first issue you have mentioned, the particular member is
> activated, but it is still identified as an obsolete member and is being
> marked to be terminated since pending time expired. Does that mean member
> is still in Obsolete list even though it is being activated?
>
>
>
> //member started
>
> TID: [0] [STRATOS] [2015-06-04 19:53:04,706]  INFO
> {org.apache.stratos.autoscaler.context.cluster.ClusterContext} -  Member
> stat context has been added: [application] g-sc-G12-1 [cluster]
> g-sc-G12-1.c1-0x0.c1.domain [clusterInstanceContext] g-sc-G12-1-1
> [partitionContext] whole-region [member-id]
> g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
>
>
>
> //member activated
>
> TID: [0] [STRATOS] [2015-06-04 19:56:00,907]  INFO
> {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher}
> -  Publishing member activated event: [service-name] c1 [cluster-id]
> g-sc-G12-1.c1-0x0.c1.domain [cluster-instance-id] g-sc-G12-1-1 [member-id]
> g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
> [network-partition-id] RegionOne [partition-id] whole-region
>
> TID: [0] [STRATOS] [2015-06-04 19:56:00,916]  INFO
> {org.apache.stratos.messaging.message.processor.topology.MemberActivatedMessageProcessor}
> -  Member activated: [service] c1 [cluster] g-sc-G12-1.c1-0x0.c1.domain
> [member] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
>
>
>
> //after 15 minutes ---member is still in pending state, pending timeout
> expired
>
> TID: [0] [STRATOS] [2015-06-04 20:08:04,713]  INFO
> {org.apache.stratos.autoscaler.context.partition.ClusterLevelPartitionContext$PendingMemberWatcher}
> -  Pending state of member expired, member will be moved to obsolete list.
> [pending member]
> g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1 [expiry
> time] 900000 [cluster] g-sc-G12-1.c1-0x0.c1.domain [cluster instance] null
>
>
>
> On Fri, Jun 5, 2015 at 5:14 AM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> Hi,
>
>
>
> I am running into a scenario where application un-deployment fails (using
> stratos with latest commit  b1b6bca3f99b6127da24c9af0a6b20faff2907be).
>
>
>
> For application structure see [1.], (debug enabled) wso2carbon.log,
> application.json, cartridge-group.json, deployment-policy, auto-scaling
> policies see attached zip file.
>
>
>
> *It is noteworthy, that while the application is running the following log
> statements /exceptions are observed:*
>
>
>
> *…*
>
> *Member is in the wrong list and it is removed from active members list:
> g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1*
>
> *…*
>
> *TID: [0] [STRATOS] [2015-06-04 20:11:03,425] ERROR
> {org.apache.stratos.autoscaler.rule.RuleTasksDelegator} -  Cannot terminate
> instance*
>
> *…*
>
> *// **after receiving the application undeploy event:*
>
> *[2015-06-04 20:12:39,465]  INFO
> {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -
> Application undeployment process started: [application-id] g-sc-G12-1*
>
> *// **a new instance is being started up*
>
> *…*
>
> *[2015-06-04 20:13:13,445]  INFO
> {org.apache.stratos.cloud.controller.services.impl.InstanceCreator} -
> Instance started successfully: [cartridge-type] c2 [cluster-id]
> g-sc-G12-1.c2-1x0.c2.domain [instance-id]
> RegionOne/5d4699f7-b00b-42eb-b565-b48fc8f20407*
>
>
>
> *// Also noteworthy seems the following warning which is seen repeatedly
> in the logs:*
>
> *ReadWriteLock} -  System warning! Trying to release a lock which has not
> been taken by the same thread: [lock-name]*
>
>
>
>
>
> [1.] Application structure
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> --
>
>
> Udara Liyanage
>
> Software Engineer
>
> WSO2, Inc.: http://wso2.com
>
> lean. enterprise. middleware
>
> web: http://udaraliyanage.wordpress.com
>
> phone: +94 71 443 6897
>
>
>
>
>
> --
>
> Imesh Gunaratne
>
>
>
> Senior Technical Lead, WSO2
>
> Committer & PMC Member, Apache Stratos
>



-- 
*Lasindu Charith*
Software Engineer, WSO2 Inc.
Mobile: +94714427192
Web: blog.lasindu.com

RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by "Martin Eppel (meppel)" <me...@cisco.com>.
Here is another example where the removal fails:

For application see [1.], log file (with debug enabled) and jsons are attached.

Scenario:


·        Deploy application and wait for all cartridges to become active

·        Kill a VM (2nd in startup sequence)

·        Wait for it to restart and become active

·        Un-deploy application

a.      Un-deploy forcefully will succeed
([2015-06-08 20:38:21,487]  INFO {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Forcefully un-deploying the application s-g-c1-c2-c3-s)

b.      Un-deploy gracefully will fail to remove app completely (although VMs are terminated successfully)
([2015-06-08 20:54:16,372]  INFO {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Starting to undeploy application: [application-id])

·        Both scenarios are recorded in the same log file wso2carbon-s-g-c1-c2-c3-s.log

·        Btw, I retested the scenario and the issue is easily reproducible following the steps listed above:
Graceful application un-deploy succeeds if no VM has been restarted (terminated and restarted by the autoscaler).
Once a VM has been terminated and restarted, graceful application un-deploy fails.
I attached a log file which demonstrates this case (wso2carbon-s-g-c1-c2-c3-s-scen-2.log). In this scenario, the same application is deployed, becomes active and is then removed (repeated 2 times); then a VM is terminated and restarted by the autoscaler. Afterwards, graceful application un-deploy fails.


Other Observations:

When the application is removed successfully, some events, e.g. “cluster removed event” and “Application deleted event received:”, are published (see [2.]), whereas when the application fails to be removed no such events are observed.

[2.] cluster removed event when application is un-deployed forcefully
TID: [0] [STRATOS] [2015-06-08 20:38:34,187]  INFO {org.apache.stratos.cloud.controller.messaging.receiver.application.ApplicationEventReceiver} -  Application deleted event received: [application-id] s-g-c1-c2-c3-s
TID: [0] [STRATOS] [2015-06-08 20:38:34,220]  INFO {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher} -  Publishing application clusters removed event: [application-id] s-g-c1-c2-c3-s


I compared the log sequences of the successful and the unsuccessful application removal and noticed a difference (see also highlighted areas):

Successful (see logs in wso2carbon-s-g-c1-c2-c3-s-scen-2.log)

TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor} -  StatusChecker calculating the terminated status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [ s-g-c1-c2-c3-s-1 ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock acquired
TID: [0] [STRATOS] [2015-06-08 22:18:41,527]  INFO {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor} -  Sending group instance terminated for [group] s-g-c1-c2-c3-s-x0x [instance] s-g-c1-c2-c3-s-1
TID: [0] [STRATOS] [2015-06-08 22:18:41,527] DEBUG {org.apache.stratos.autoscaler.applications.topic.ApplicationBuilder} -  Handling group terminated event: [group-id] s-g-c1-c2-c3-s-x0x [application-id] s-g-c1-c2-c3-s [instance] s-g-c1-c2-c3-s-1
TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -  ClusterMonitor Drools session has been disposed. ClusterMonitor [clusterId=s-g-c1-c2-c3-s.c3-0x0.c3.domain, hasPrimary=false ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -  ClusterMonitor Drools session has been disposed. ClusterMonitor [clusterId=s-g-c1-c2-c3-s.c1-0x0.c1.domain, hasPrimary=false ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,528] DEBUG {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -  ClusterMonitor Drools session has been disposed. ClusterMonitor [clusterId=s-g-c1-c2-c3-s.c2-0x0.c2.domain, hasPrimary=false ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,529] DEBUG {org.apache.stratos.cloud.controller.messaging.topology.TopologyManager} -  Write lock released
TID: [0] [STRATOS] [2015-06-08 22:18:41,536] DEBUG {org.apache.stratos.autoscaler.registry.RegistryManager} -  Application [ s-g-c1-c2-c3-s ] persisted successfully in the Autoscaler Registry
TID: [0] [STRATOS] [2015-06-08 22:18:41,538] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Applications updated: {"applicationMap":{"s-g-c1-c2-c3-s":{"id":"s-g-c1-c2-c3-s","key":"l8V7OpRqOfBcWbBw","tenantId":-1234,"tenantDomain":"carbon.super","tenantAdminUserName":"admin","applicationPolicyId":"default-iaas","aliasToGroupMap":{"s-g-c1-c2-c3-s-x0x":{"name":"s-g-c1-c2-c3-s","alias":"s-g-c1-c2-c3-s-x0x","groupMinInstances":1,"groupMaxInstances":1,"applicationId":"s-g-c1-c2-c3-s","aliasToGroupMap":{},"aliasToClusterDataMap":{"c2-0x0":{"serviceType":"c2","clusterId":"s-g-c1-c2-c3-s.c2-0x0.c2.domain","minInstances":1,"maxInstances":1},"c3-0x0":{"serviceType":"c3","clusterId":"s-g-c1-c2-c3-s.c3-0x0.c3.domain","minInstances":1,"maxInstances":1},"c1-0x0":{"serviceType":"c1","clusterId":"s-g-c1-c2-c3-s.c1-0x0.c1.domain","minInstances":1,"maxInstances":1}},"typeToClusterDataMap":{"c1":{"serviceType":"c1","clusterId":"s-g-c1-c2-c3-s.c1-0x0.c1.domain","minInstances":1,"maxInstances":1},"c2":{"serviceType":"c2","clusterId":"s-g-c1-c2-c3-s.c2-0x0.c2.domain","minInstances":1,"maxInstances":1},"c3":{"serviceType":"c3","clusterId":"s-g-c1-c2-c3-s.c3-0x0.c3.domain","minInstances":1,"maxInstances":1}},"instanceIdToInstanceContextMap":{},"dependencyOrder":{"startupOrders":[{"startupOrderComponentList":["cartridge.c3-0x0","cartridge.c2-0x0"]},{"startupOrderComponentList":["cartridge.c2-0x0","cartridge.c1-0x0"]}],"terminationBehaviour":"terminate-none"},"isGroupScalingEnabled":false,"isGroupInstanceMonitoringEnabled":false,"instanceIdSequence":{"value":0}}},"aliasToClusterDataMap":{},"typeToClusterDataMap":{},"aliasToDeploymentPolicyIdMap":{"c3-0x0":"static-1","c2-0x0":"static-1","c1-0x0":"static-1"},"instanceIdToInstanceContextMap":{"s-g-c1-c2-c3-s-1":{"alias":"s-g-c1-c2-c3-s","instanceId":"s-g-c1-c2-c3-s-1","instanceProperties":{},"lifeCycleStateManager":{"stateStack":["Created","Active","Terminating"],"identifier":"s-g-c1-c2-c3-s_s-g-c1-c2-c3-s-1"},"networkPartitionId":"RegionOne"}},"dependencyOrder":{"startupOrders":[]},"isGroupScalingEnabled":false,"isGroupInstanceMonitoringEnabled":false,"instanceIdSequence":{"value":1}}},"initialized":false}
TID: [0] [STRATOS] [2015-06-08 22:18:41,539]  INFO {org.apache.stratos.autoscaler.applications.topic.ApplicationsEventPublisher} -  Publishing group instance terminated event: [application] s-g-c1-c2-c3-s [group] s-g-c1-c2-c3-s-x0x [instance] s-g-c1-c2-c3-s-1
TID: [0] [STRATOS] [2015-06-08 22:18:41,545]  INFO {org.apache.stratos.autoscaler.monitor.component.GroupMonitor} -  [Group] s-g-c1-c2-c3-s-x0x is notifying the [parent] s-g-c1-c2-c3-s [instance] s-g-c1-c2-c3-s-1
TID: [0] [STRATOS] [2015-06-08 22:18:41,545] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusActiveProcessor} -  GroupProcessor chain calculating the status for the group [ s-g-c1-c2-c3-s ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusActiveProcessor} -  StatusChecker calculating the active status for the group [ s-g-c1-c2-c3-s ]  for the instance  [ s-g-c1-c2-c3-s-1 ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock acquired
TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock released
TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor} -  StatusChecker calculating the terminated status for the group [ s-g-c1-c2-c3-s ]  for the instance  [ s-g-c1-c2-c3-s-1 ]
TID: [0] [STRATOS] [2015-06-08 22:18:41,546] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock acquired
TID: [0] [STRATOS] [2015-06-08 22:18:41,546]  INFO {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor} -  Sending application instance terminated for [application] s-g-c1-c2-c3-s [instance] s-g-c1-c2-c3-s-1

Unsuccessful:

TID: [0] [STRATOS] [2015-06-08 22:33:25,404] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatedProcessor} -  StatusChecker calculating the terminated status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [ s-g-c1-c2-c3-s-1 ]
TID: [0] [STRATOS] [2015-06-08 22:33:25,404] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock acquired
TID: [0] [STRATOS] [2015-06-08 22:33:25,405] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusProcessor} -  Checking the status of cluster s-g-c1-c2-c3-s.c1-0x0.c1.domain instance status is: Terminating
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.cloud.controller.messaging.topology.TopologyManager} -  Write lock released
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusProcessor} -  Checking the status of cluster s-g-c1-c2-c3-s.c1-0x0.c1.domain instance status is: Terminating
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock released
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusTerminatingProcessor} -  StatusChecker calculating the terminating status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [ s-g-c1-c2-c3-s-1 ]
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock acquired
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock released
TID: [0] [STRATOS] [2015-06-08 22:33:25,406] DEBUG {org.apache.stratos.autoscaler.status.processor.group.GroupStatusInactiveProcessor} -  StatusChecker calculating the inactive status for the group [ s-g-c1-c2-c3-s-x0x ]  for the instance  [ s-g-c1-c2-c3-s-1 ]
TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock acquired
TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Write lock released
TID: [0] [STRATOS] [2015-06-08 22:33:25,407]  WARN {org.apache.stratos.autoscaler.status.processor.group.GroupStatusInactiveProcessor} -  No possible state change found for [component] s-g-c1-c2-c3-s-x0x [instance] s-g-c1-c2-c3-s-1
TID: [0] [STRATOS] [2015-06-08 22:33:25,407] DEBUG {org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor} -  ClusterMonitor Drools session has been disposed. ClusterMonitor [clusterId=s-g-c1-c2-c3-s.c2-0x0.c2.domain, hasPrimary=false ]
TID: [0] [STRATOS] [2015-06-08 22:33:25,481] ERROR {org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor} -  System error, lock has not released for 30 seconds: [lock-name] application [lock-type] Write [thread-id] 99 [thread-name] pool-26-thread-2 [stack-trace]
java.lang.Thread.getStackTrace(Thread.java:1589)
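
The ReadWriteLockMonitor error above reports a write lock on [lock-name] application that is still held after 30 seconds. The logs do not confirm that a missed release is the cause, so the following is only a minimal sketch of the usual guard against that class of problem. It uses the JDK's ReentrantReadWriteLock as a stand-in (not Stratos' own ReadWriteLock class), and GuardedApplicationUpdate with its methods is a hypothetical name made up for illustration.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical class for illustration only; not part of Stratos.
class GuardedApplicationUpdate {
    // JDK lock used as a stand-in for the Stratos ReadWriteLock (assumption).
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private String status = "Terminating";

    // Updates the status under the write lock. The finally block releases
    // the lock even when the update throws while holding it.
    void update(String newStatus, boolean failMidway) {
        lock.writeLock().lock();
        try {
            if (failMidway) {
                throw new IllegalStateException("simulated failure while holding the write lock");
            }
            status = newStatus;
        } finally {
            lock.writeLock().unlock();
        }
    }

    boolean isWriteLocked() {
        return lock.isWriteLocked();
    }

    String status() {
        return status;
    }
}
```

With this pattern a failed update leaves the status unchanged but does not leave the write lock held, so later status processors are not blocked by it.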




[1.] Application Structure
[cid:image003.png@01D0A1F8.ECAC2390]






From: Martin Eppel (meppel)
Sent: Friday, June 05, 2015 4:38 PM
To: dev@stratos.apache.org
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

This is another application, see [1.] which fails to get completely removed:

Scenario / Observation:

·        After all instances / application go active, one instance is being terminated (to verify termination behavior). Once the terminated instance is restored the application is undeployed.

·        After the Application undeployment process is started, all instances are being terminated

·        Application still shows up in stratos admin, subsequent deployments fail

stratos> list-applications
Applications found:
+---------------------+---------------------+----------+
| Application ID      | Alias               | Status   |
+---------------------+---------------------+----------+
| s-n-gr-s-G123-t-a-4 | s-n-gr-s-G123-t-a-4 | Deployed |
+---------------------+---------------------+----------+


[1.] Application:

[cid:image004.png@01D0A1F8.ECAC2390]




From: Martin Eppel (meppel)
Sent: Friday, June 05, 2015 3:26 PM
To: dev@stratos.apache.org
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

After re-running it, these are my observations:


·        After the “Application undeployment process started” message is logged, (a few) VMs may still be launched. I suspect this is due to a race condition between the application undeployment process and the autoscaler.

·        All VMs which were launched before the “Application undeployment process started” message get terminated as part of the undeployment process.

·        VMs which were launched after the “Application undeployment process started” message eventually get moved to the obsolete / pending state and cleaned up; this can take up to 15-20 minutes.

·        The application never gets completely removed.

·        The following exception is consistently observed:

TID: [0] [STRATOS] [2015-06-05 20:47:07,237]  WARN {org.apache.stratos.common.concurrent.locks.ReadWriteLock} -  System warning! Trying to release a lock which has not been taken by the same thread: [lock-name] application-holder [thread-id] 114 [thread-name] pool-24-thread-2

TID: [0] [STRATOS] [2015-06-05 20:47:07,237] ERROR {org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator} -  Failed to retrieve topology event message

org.apache.stratos.common.exception.InvalidLockRequestedException: System error, cannot acquire a write lock while having a read lock on the same thread: [lock-name] application-holder [thread-id] 114 [thread-name] pool-24-thread-2

                    at org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:114)

                    at org.apache.stratos.autoscaler.applications.ApplicationHolder.acquireWriteLock(ApplicationHolder.java:60)

·        Initiating the “Application undeployment process” again will cause the following INFO statement (without any further actions, see in log)
TID: [0] [STRATOS] [2015-06-05 21:34:34,509]  INFO {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Application monitor is already in terminating, graceful un-deployment is has already been attempted thus not invoking again

·        Other exceptions observed after the “Application undeployment process started”
TID: [0] [STRATOS] [2015-06-05 21:36:29,458] ERROR {org.apache.stratos.autoscaler.rule.RuleTasksDelegator} -  Cannot terminate instance
org.apache.stratos.cloud.controller.stub.CloudControllerServiceInvalidMemberExceptionException: CloudControllerServiceInvalidMemberExceptionException
        at sun.reflect.GeneratedConstructorAccessor219.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at java.lang.Class.newInstance(Class.java:374)
        at org.apache.stratos.cloud.controller.stub.CloudControllerServiceStub.terminateInstance(CloudControllerServiceStub.java:8633)
        at org.apache.stratos.common.client.CloudControllerServiceClient.terminateInstance(CloudControllerServiceClient.java:120)
        at org.apache.stratos.autoscaler.rule.RuleTasksDelegator.terminateObsoleteInstance(RuleTasksDelegator.java:298)
        at sun.reflect.GeneratedMethodAccessor413.invoke(Unknown Source)


·        Created a jira to track this issue: https://issues.apache.org/jira/browse/STRATOS-1430
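
The suspected race in the first observation above (VMs still being launched after undeployment has started) can be sketched as follows. This is only an illustration of one way to close such a race, assuming the scale-up path and the undeployment path can share a flag. UndeployAwareScaler and its methods are hypothetical names, not Stratos classes.

```java
// Hypothetical class for illustration only; not part of Stratos.
class UndeployAwareScaler {
    private boolean terminating = false;
    private int launched = 0;

    // Called when the application undeployment process starts.
    synchronized void startUndeployment() {
        terminating = true;
    }

    // Called by the scaling rule. The check and the launch happen under the
    // same monitor as startUndeployment(), so no launch can slip in after
    // the flag has been set.
    synchronized boolean requestScaleUp() {
        if (terminating) {
            return false; // undeployment in progress, refuse to launch
        }
        launched++;       // stands in for the actual instance launch
        return true;
    }

    synchronized int launchedCount() {
        return launched;
    }
}
```

A launch requested before startUndeployment() still goes through and has to be terminated by the undeployment itself; only launches requested afterwards are refused.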







Regards



Martin



Attached the log file of the last test







From: Martin Eppel (meppel)
Sent: Friday, June 05, 2015 12:59 PM
To: dev@stratos.apache.org
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

For this latest test I got the latest source from stratos repo so I have this commit (see below), but the un-deployment still fails (to some extent).
As mentioned below, it seems that all the members get terminated eventually, including the ones which got started after the “application un-deployment” process started.
What is still left in stratos (even after all members got terminated) is the application (see the stratos> list-applications command result below in email thread). This would still be an issue when re-deploying the application !
I will do a few reruns to verify the removal of the VMs (members) is consistent.
Thanks

Martin

git show 2fe84b91843b20e91e8cafd06011f42d218f231c
commit 2fe84b91843b20e91e8cafd06011f42d218f231c
Author: anuruddhal <an...@gmail.com>
Date:   Wed Jun 3 14:41:12 2015 +0530

From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: Friday, June 05, 2015 12:46 PM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,

I also encountered a similar issue with the application un-deployment with PCA but I guess you are using JCA.

I can see that Anuruddha has done a fix for the issue I'm referring to with the below commit:
https://github.com/apache/stratos/commit/2fe84b91843b20e91e8cafd06011f42d218f231c

Regarding the member context not found error, this could occur if the termination request was made for an already terminated member. It is possible that the Autoscaler makes a second terminate request if the first request takes some time to execute, and by the time the second request hits the Cloud Controller the member has already been terminated by the first request.
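
The double terminate request described above can be made harmless if termination is idempotent. The following is a minimal sketch under that assumption, not the actual Cloud Controller code. IdempotentTerminator, register and terminate are hypothetical names, and the map stands in for the member context store.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical class for illustration only; not part of Stratos.
class IdempotentTerminator {
    // Stand-in for the member context store (assumption).
    private final Map<String, String> memberContexts = new ConcurrentHashMap<>();

    void register(String memberId) {
        memberContexts.put(memberId, "context-of-" + memberId);
    }

    // Returns true if this call terminated the member, false if the member
    // context was already gone (for example removed by an earlier request).
    // A repeated request is reported to the caller instead of raising an error.
    boolean terminate(String memberId) {
        return memberContexts.remove(memberId) != null;
    }
}
```

Here the second request simply returns false, where the current behaviour logs "member context not found" as an ERROR with a stack trace.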

Can you please confirm whether the members were properly terminated and it's just these exceptions that you are seeing?

Thanks


On Sat, Jun 6, 2015 at 12:36 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi Udara,

Picked up your commit and reran the test case:

Attached is the log file (artifacts are the same as before).

Didn’t see the issue with “Member is in the wrong list” …

but see the following exception after the undeploy application message:
TID: [0] [STRATOS] [2015-06-05 18:09:46,836] ERROR {org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator} -  Failed to retrieve topology event message
org.apache.stratos.common.exception.InvalidLockRequestedException: System error, cannot acquire a write lock while having a read lock on the same thread: [lock-name] application-holder [thread-id] 114 [thread-name] pool-24-thread-2
                    at org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:114)
                    at org.apache.stratos.autoscaler.applications.ApplicationHolder.acquireWriteLock(ApplicationHolder.java:60)
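
The InvalidLockRequestedException above says a write lock was requested while the same thread still held a read lock. The JDK's ReentrantReadWriteLock has the same restriction (a read lock cannot be upgraded to a write lock), so it is used here as a stand-in for the Stratos ReadWriteLock class to sketch the situation. LockUpgradeDemo and its methods are hypothetical names. Unlike the Stratos class, which throws, the JDK's tryLock() just returns false.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical class for illustration only; not part of Stratos.
class LockUpgradeDemo {

    // Tries to take the write lock while the same thread holds the read lock.
    static boolean tryUpgrade() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.readLock().lock();
        try {
            boolean gotWrite = lock.writeLock().tryLock(); // fails: read lock is held
            if (gotWrite) {
                lock.writeLock().unlock();
            }
            return gotWrite;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Releases the read lock first, then takes the write lock.
    static boolean releaseThenAcquire() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.readLock().lock();
        lock.readLock().unlock();
        boolean gotWrite = lock.writeLock().tryLock();
        if (gotWrite) {
            lock.writeLock().unlock();
        }
        return gotWrite;
    }
}
```

If the event handler on pool-24-thread-2 follows the first pattern, the read lock would have to be released before the write lock is requested, as in the second method.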


Also, after the “Application undeployment process started” is started, new members are being instantiated:

TID: [0] [STRATOS] [2015-06-05 18:07:46,545]  INFO {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher} -  Publishing member created event:


Eventually, these VMs get terminated:

TID: [0] [STRATOS] [2015-06-05 18:42:42,413] ERROR {org.apache.stratos.cloud.controller.services.impl.CloudControllerServiceImpl} -  Could not terminate instance: [member-id] g-sc-G12-1.c1-0x0.c1.domaindd9c1d40-70cc-4950-9757-418afe19ba7f
org.apache.stratos.cloud.controller.exception.InvalidMemberException: Could not terminate instance, member context not found: [member-id] g-sc-G12-1.c1-0x0.c1.domaindd9c1d40-70cc-4950-9757-418afe19ba7f
                    at org.apache.stratos.cloud.controller.services.impl.CloudControllerServiceImpl.terminateInstance(CloudControllerServiceImpl.java:595)
                    at sun.reflect.GeneratedMethodAccessor408.invoke(Unknown Source)
                    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                    at java.lang.reflect.Method.invoke(Method.java:606)


but the application remains:

stratos> list-applications
Applications found:
+----------------+------------+----------+
| Application ID | Alias      | Status   |
+----------------+------------+----------+
| g-sc-G12-1     | g-sc-G12-1 | Deployed |
+----------------+------------+----------+

['g-sc-G12-1: applicationInstances 1, groupInstances 2, clusterInstances 3, members 0 ()\n']



From: Martin Eppel (meppel)
Sent: Friday, June 05, 2015 10:04 AM
To: dev@stratos.apache.org
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Ok:

log4j.logger.org.apache.stratos.manager=DEBUG
log4j.logger.org.apache.stratos.autoscaler=DEBUG
log4j.logger.org.apache.stratos.messaging=INFO
log4j.logger.org.apache.stratos.cloud.controller=DEBUG
log4j.logger.org.wso2.andes.client=ERROR
# Autoscaler rule logs
log4j.logger.org.apache.stratos.autoscaler.rule.RuleLog=DEBUG

From: Udara Liyanage [mailto:udara@wso2.com]
Sent: Friday, June 05, 2015 10:00 AM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,

Better if you can enable debug logs for all of AS, CC and the cartridge agent.

On Fri, Jun 5, 2015 at 10:23 PM, Udara Liyanage <ud...@wso2.com> wrote:
Hi,

Please enable AS debug logs.

On Fri, Jun 5, 2015 at 9:38 PM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi Udara,

Yes, this issue seems to be fairly reproducible. Which debug logs do you want me to enable, the cartridge agent logs?

Thanks

Martin

From: Udara Liyanage [mailto:udara@wso2.com]
Sent: Thursday, June 04, 2015 11:11 PM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi,

This might be possible if AS did not receive the member activated event published by CC. Is it possible to enable debug logs if this is reproducible?
Or else I can add some INFO logs and commit.


On Fri, Jun 5, 2015 at 9:11 AM, Udara Liyanage <ud...@wso2.com> wrote:
Hi,


For the first issue you mentioned, the particular member is activated, but it is still identified as an obsolete member and is being marked to be terminated since the pending time expired. Does that mean the member is still in the obsolete list even though it has been activated?

//member started
TID: [0] [STRATOS] [2015-06-04 19:53:04,706]  INFO {org.apache.stratos.autoscaler.context.cluster.ClusterContext} -  Member stat context has been added: [application] g-sc-G12-1 [cluster] g-sc-G12-1.c1-0x0.c1.domain [clusterInstanceContext] g-sc-G12-1-1 [partitionContext] whole-region [member-id] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1

//member activated
TID: [0] [STRATOS] [2015-06-04 19:56:00,907]  INFO {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher} -  Publishing member activated event: [service-name] c1 [cluster-id] g-sc-G12-1.c1-0x0.c1.domain [cluster-instance-id] g-sc-G12-1-1 [member-id] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1 [network-partition-id] RegionOne [partition-id] whole-region
TID: [0] [STRATOS] [2015-06-04 19:56:00,916]  INFO {org.apache.stratos.messaging.message.processor.topology.MemberActivatedMessageProcessor} -  Member activated: [service] c1 [cluster] g-sc-G12-1.c1-0x0.c1.domain [member] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1

//after 15 minutes ---member is still in pending state, pending timeout expired
TID: [0] [STRATOS] [2015-06-04 20:08:04,713]  INFO {org.apache.stratos.autoscaler.context.partition.ClusterLevelPartitionContext$PendingMemberWatcher} -  Pending state of member expired, member will be moved to obsolete list. [pending member] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1 [expiry time] 900000 [cluster] g-sc-G12-1.c1-0x0.c1.domain [cluster instance] null
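
The three log excerpts above (member started at 19:53:04, activated at 19:56:00, pending state expired at 20:08:04, i.e. 900000 ms after the start) fit the following simplified model of the bookkeeping. This is only a sketch under the assumption that a member leaves the pending list solely when the activation event reaches the tracker. PendingMemberTracker and its methods are hypothetical names, not the ClusterLevelPartitionContext implementation.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Hypothetical class for illustration only; not part of Stratos.
class PendingMemberTracker {
    private final long expiryMillis;
    private final Map<String, Long> pendingSince = new HashMap<>();
    private final Set<String> active = new HashSet<>();
    private final Set<String> obsolete = new HashSet<>();

    PendingMemberTracker(long expiryMillis) {
        this.expiryMillis = expiryMillis;
    }

    synchronized void memberStarted(String memberId, long nowMillis) {
        pendingSince.put(memberId, nowMillis);
    }

    // The activation event moves the member from pending to active.
    synchronized void memberActivated(String memberId) {
        if (pendingSince.remove(memberId) != null) {
            active.add(memberId);
        }
    }

    // Periodic watcher: anything still pending after the timeout becomes obsolete.
    synchronized void checkExpiry(long nowMillis) {
        Iterator<Map.Entry<String, Long>> it = pendingSince.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() >= expiryMillis) {
                obsolete.add(entry.getKey());
                it.remove();
            }
        }
    }

    synchronized boolean isActive(String memberId) {
        return active.contains(memberId);
    }

    synchronized boolean isObsolete(String memberId) {
        return obsolete.contains(memberId);
    }
}
```

In this model a member whose activation event is processed never reaches the obsolete list, while a member whose activation event is missed is marked obsolete once the timeout elapses, which matches what the logs show for member b0aa0188-49f1-47f6-a040-c2eab4acb5b1.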

On Fri, Jun 5, 2015 at 5:14 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi,

I am running into a scenario where application un-deployment fails (using stratos with latest commit  b1b6bca3f99b6127da24c9af0a6b20faff2907be).

For application structure see [1.], (debug enabled) wso2carbon.log, application.json, cartridge-group.json, deployment-policy, auto-scaling policies see attached zip file.

It is noteworthy, that while the application is running the following log statements /exceptions are observed:

…
Member is in the wrong list and it is removed from active members list: g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
…
TID: [0] [STRATOS] [2015-06-04 20:11:03,425] ERROR {org.apache.stratos.autoscaler.rule.RuleTasksDelegator} -  Cannot terminate instance
…
// after receiving the application undeploy event:
[2015-06-04 20:12:39,465]  INFO {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Application undeployment process started: [application-id] g-sc-G12-1
// a new instance is being started up
…
[2015-06-04 20:13:13,445]  INFO {org.apache.stratos.cloud.controller.services.impl.InstanceCreator} -  Instance started successfully: [cartridge-type] c2 [cluster-id] g-sc-G12-1.c2-1x0.c2.domain [instance-id] RegionOne/5d4699f7-b00b-42eb-b565-b48fc8f20407

// Also noteworthy seems the following warning which is seen repeatedly in the logs:
ReadWriteLock} -  System warning! Trying to release a lock which has not been taken by the same thread: [lock-name]


[1.] Application structure

[cid:image005.png@01D0A1F8.ECAC2390]







--

Udara Liyanage
Software Engineer
WSO2, Inc.: http://wso2.com
lean. enterprise. middleware
web: http://udaraliyanage.wordpress.com
phone: +94 71 443 6897






--
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos

RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by "Martin Eppel (meppel)" <me...@cisco.com>.
This is another application, see [1.] which fails to get completely removed:

Scenario / Observation:

·        After all instances / application go active, one instance is being terminated (to verify termination behavior). Once the terminated instance is restored the application is undeployed.

·        After the Application undeployment process is started, all instances are being terminated

·        Application still shows up in stratos admin, subsequent deployments fail

stratos> list-applications
Applications found:
+---------------------+---------------------+----------+
| Application ID      | Alias               | Status   |
+---------------------+---------------------+----------+
| s-n-gr-s-G123-t-a-4 | s-n-gr-s-G123-t-a-4 | Deployed |
+---------------------+---------------------+----------+


[1.] Application:

[cid:image002.png@01D09FAD.F375DA10]




From: Martin Eppel (meppel)
Sent: Friday, June 05, 2015 3:26 PM
To: dev@stratos.apache.org
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

After re-running it, these are my observations:


·        After the “Application undeployment process started” message is logged, (a few) VMs may still be launched. I suspect this is due to a race condition between the application undeployment process and the autoscaler.

·        All VMs which were launched before the “Application undeployment process started” message get terminated as part of the undeployment process.

·        VMs which were launched after the “Application undeployment process started” message eventually get moved to the obsolete / pending state and cleaned up; this can take up to 15-20 minutes.

·        The application never gets completely removed.

·        The following exception is consistently observed:

TID: [0] [STRATOS] [2015-06-05 20:47:07,237]  WARN {org.apache.stratos.common.concurrent.locks.ReadWriteLock} -  System warning! Trying to release a lock which has not been taken by the same thread: [lock-name] application-holder [thread-id] 114 [thread-name] pool-24-thread-2
TID: [0] [STRATOS] [2015-06-05 20:47:07,237] ERROR {org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator} -  Failed to retrieve topology event message
org.apache.stratos.common.exception.InvalidLockRequestedException: System error, cannot acquire a write lock while having a read lock on the same thread: [lock-name] application-holder [thread-id] 114 [thread-name] pool-24-thread-2
                    at org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:114)
                    at org.apache.stratos.autoscaler.applications.ApplicationHolder.acquireWriteLock(ApplicationHolder.java:60)

·        Initiating the application undeployment process again only produces the following INFO statement, without any further action (see the log):
TID: [0] [STRATOS] [2015-06-05 21:34:34,509]  INFO {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Application monitor is already in terminating, graceful un-deployment is has already been attempted thus not invoking again

·        Other exceptions observed after the “Application undeployment process started” message:
TID: [0] [STRATOS] [2015-06-05 21:36:29,458] ERROR {org.apache.stratos.autoscaler.rule.RuleTasksDelegator} -  Cannot terminate instance
org.apache.stratos.cloud.controller.stub.CloudControllerServiceInvalidMemberExceptionException: CloudControllerServiceInvalidMemberExceptionException
        at sun.reflect.GeneratedConstructorAccessor219.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at java.lang.Class.newInstance(Class.java:374)
        at org.apache.stratos.cloud.controller.stub.CloudControllerServiceStub.terminateInstance(CloudControllerServiceStub.java:8633)
        at org.apache.stratos.common.client.CloudControllerServiceClient.terminateInstance(CloudControllerServiceClient.java:120)
        at org.apache.stratos.autoscaler.rule.RuleTasksDelegator.terminateObsoleteInstance(RuleTasksDelegator.java:298)
        at sun.reflect.GeneratedMethodAccessor413.invoke(Unknown Source)


·        Created a JIRA issue to track this: https://issues.apache.org/jira/browse/STRATOS-1430







Regards



Martin



The log file of the last test is attached.







From: Martin Eppel (meppel)
Sent: Friday, June 05, 2015 12:59 PM
To: dev@stratos.apache.org
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

For this latest test I pulled the latest source from the Stratos repo, so I have this commit (see below), but the un-deployment still fails (to some extent).
As mentioned below, it seems that all the members eventually get terminated, including the ones that were started after the “application un-deployment” process began.
What is still left in Stratos (even after all members are terminated) is the application itself (see the result of the stratos> list-applications command further down in this email thread). This would still be an issue when re-deploying the application!
I will do a few reruns to verify that the removal of the VMs (members) is consistent.
Thanks

Martin

git show 2fe84b91843b20e91e8cafd06011f42d218f231c
commit 2fe84b91843b20e91e8cafd06011f42d218f231c
Author: anuruddhal <an...@gmail.com>
Date:   Wed Jun 3 14:41:12 2015 +0530

From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: Friday, June 05, 2015 12:46 PM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,

I also encountered a similar issue with application un-deployment when using PCA, but I guess you are using JCA.

I can see that Anuruddha has made a fix for the issue I'm referring to with the commit below:
https://github.com/apache/stratos/commit/2fe84b91843b20e91e8cafd06011f42d218f231c

Regarding the "member context not found" error, this could occur if the termination request was made for an already terminated member. It is possible that the Autoscaler makes a second terminate request if the first request takes some time to execute; by the time the second request hits the Cloud Controller, the member has already been terminated by the first request.

Can you please confirm whether the members were properly terminated and it's just these exceptions that you are seeing?

Thanks


On Sat, Jun 6, 2015 at 12:36 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi Udara,

I picked up your commit and reran the test case.

The log file is attached (the artifacts are the same as before).

I didn’t see the issue with “Member is in the wrong list” …

but I do see the following exception after the undeploy application message:
TID: [0] [STRATOS] [2015-06-05 18:09:46,836] ERROR {org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator} -  Failed to retrieve topology event message
org.apache.stratos.common.exception.InvalidLockRequestedException: System error, cannot acquire a write lock while having a read lock on the same thread: [lock-name] application-holder [thread-id] 114 [thread-name] pool-24-thread-2
                    at org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:114)
                    at org.apache.stratos.autoscaler.applications.ApplicationHolder.acquireWriteLock(ApplicationHolder.java:60)


Also, after the “Application undeployment process started” message is logged, new members are still being instantiated:

TID: [0] [STRATOS] [2015-06-05 18:07:46,545]  INFO {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher} -  Publishing member created event:
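
One way to picture the suspected race between the undeployment request and the scaling logic is the sketch below. It is not Stratos code: the class UndeployGuardSketch and its methods are hypothetical names made up for illustration. The assumption is that a spawn decision and the switch to "terminating" can interleave; guarding both with the same lock means that no member can be created once undeployment has begun.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Stratos.
class UndeployGuardSketch {
    private final Object stateLock = new Object();
    private boolean terminating = false;
    private final List<String> members = new ArrayList<>();

    /** Called by a (hypothetical) scaling rule; refuses to spawn once undeployment has begun. */
    boolean trySpawn(String memberId) {
        synchronized (stateLock) {
            if (terminating) {
                return false; // undeployment already started, do not launch a new VM
            }
            members.add(memberId);
            return true;
        }
    }

    /** Marks the application as terminating and returns the members that must be terminated. */
    List<String> startUndeployment() {
        synchronized (stateLock) {
            terminating = true;
            List<String> toTerminate = new ArrayList<>(members);
            members.clear();
            return toTerminate;
        }
    }

    public static void main(String[] args) {
        UndeployGuardSketch app = new UndeployGuardSketch();
        System.out.println("spawn before undeploy: " + app.trySpawn("member-1"));
        System.out.println("members to terminate: " + app.startUndeployment());
        System.out.println("spawn after undeploy: " + app.trySpawn("member-2"));
    }
}
```

Because the check of the terminating flag and the registration of the new member happen under one lock, a spawn either completes before the undeployment snapshot (and is terminated with the rest) or is rejected.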


Eventually, these VMs get terminated:

TID: [0] [STRATOS] [2015-06-05 18:42:42,413] ERROR {org.apache.stratos.cloud.controller.services.impl.CloudControllerServiceImpl} -  Could not terminate instance: [member-id] g-sc-G12-1.c1-0x0.c1.domaindd9c1d40-70cc-4950-9757-418afe19ba7f
org.apache.stratos.cloud.controller.exception.InvalidMemberException: Could not terminate instance, member context not found: [member-id] g-sc-G12-1.c1-0x0.c1.domaindd9c1d40-70cc-4950-9757-418afe19ba7f
                    at org.apache.stratos.cloud.controller.services.impl.CloudControllerServiceImpl.terminateInstance(CloudControllerServiceImpl.java:595)
                    at sun.reflect.GeneratedMethodAccessor408.invoke(Unknown Source)
                    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                    at java.lang.reflect.Method.invoke(Method.java:606)


but the application remains:

stratos> list-applications
Applications found:
+----------------+------------+----------+
| Application ID | Alias      | Status   |
+----------------+------------+----------+
| g-sc-G12-1     | g-sc-G12-1 | Deployed |
+----------------+------------+----------+

['g-sc-G12-1: applicationInstances 1, groupInstances 2, clusterInstances 3, members 0 ()\n']



From: Martin Eppel (meppel)
Sent: Friday, June 05, 2015 10:04 AM
To: dev@stratos.apache.org
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Ok:

log4j.logger.org.apache.stratos.manager=DEBUG
log4j.logger.org.apache.stratos.autoscaler=DEBUG
log4j.logger.org.apache.stratos.messaging=INFO
log4j.logger.org.apache.stratos.cloud.controller=DEBUG
log4j.logger.org.wso2.andes.client=ERROR
# Autoscaler rule logs
log4j.logger.org.apache.stratos.autoscaler.rule.RuleLog=DEBUG

From: Udara Liyanage [mailto:udara@wso2.com]
Sent: Friday, June 05, 2015 10:00 AM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,

It would be better if you could enable debug logs for all of AS, CC and the cartridge agent.

On Fri, Jun 5, 2015 at 10:23 PM, Udara Liyanage <ud...@wso2.com> wrote:
Hi,

Please enable AS debug logs.

On Fri, Jun 5, 2015 at 9:38 PM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi Udara,

Yes, this issue seems to be fairly reproducible. Which debug logs do you want me to enable, the cartridge agent logs?

Thanks

Martin

From: Udara Liyanage [mailto:udara@wso2.com]
Sent: Thursday, June 04, 2015 11:11 PM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi,

This could happen if AS did not receive the member activated event published by CC. If this is reproducible, is it possible to enable debug logs?
Otherwise I can add some INFO logs and commit.


On Fri, Jun 5, 2015 at 9:11 AM, Udara Liyanage <ud...@wso2.com> wrote:
Hi,


For the first issue you mentioned, the particular member is activated, but it is still identified as an obsolete member and is being marked for termination because its pending time expired. Does that mean the member is still in the obsolete list even though it has been activated?

//member started
TID: [0] [STRATOS] [2015-06-04 19:53:04,706]  INFO {org.apache.stratos.autoscaler.context.cluster.ClusterContext} -  Member stat context has been added: [application] g-sc-G12-1 [cluster] g-sc-G12-1.c1-0x0.c1.domain [clusterInstanceContext] g-sc-G12-1-1 [partitionContext] whole-region [member-id] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1

//member activated
TID: [0] [STRATOS] [2015-06-04 19:56:00,907]  INFO {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher} -  Publishing member activated event: [service-name] c1 [cluster-id] g-sc-G12-1.c1-0x0.c1.domain [cluster-instance-id] g-sc-G12-1-1 [member-id] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1 [network-partition-id] RegionOne [partition-id] whole-region
TID: [0] [STRATOS] [2015-06-04 19:56:00,916]  INFO {org.apache.stratos.messaging.message.processor.topology.MemberActivatedMessageProcessor} -  Member activated: [service] c1 [cluster] g-sc-G12-1.c1-0x0.c1.domain [member] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1

//after 15 minutes ---member is still in pending state, pending timeout expired
TID: [0] [STRATOS] [2015-06-04 20:08:04,713]  INFO {org.apache.stratos.autoscaler.context.partition.ClusterLevelPartitionContext$PendingMemberWatcher} -  Pending state of member expired, member will be moved to obsolete list. [pending member] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1 [expiry time] 900000 [cluster] g-sc-G12-1.c1-0x0.c1.domain [cluster instance] null
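
The sequence in these log lines (started at 19:53:04, pending state expired at 20:08:04, exactly 900000 ms later) can be reproduced with a small model. The sketch below is an assumption about the general mechanism, not the actual ClusterLevelPartitionContext code; the class and method names are hypothetical. It shows that if the activated event never reaches the component that tracks pending members, an otherwise healthy member is moved to the obsolete list once the timeout passes.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Hypothetical model of pending / active / obsolete member tracking; not Stratos code.
class PendingMemberSketch {
    static final long EXPIRY_MS = 900_000L; // matches "[expiry time] 900000" in the log above

    private final Map<String, Long> pending = new HashMap<>(); // member id -> start time (ms)
    private final Set<String> active = new HashSet<>();
    private final Set<String> obsolete = new HashSet<>();

    void onMemberStarted(String memberId, long nowMs) {
        pending.put(memberId, nowMs);
    }

    /** Must be called when the activated event arrives; otherwise the member stays pending. */
    void onMemberActivated(String memberId) {
        if (pending.remove(memberId) != null) {
            active.add(memberId);
        }
    }

    /** Periodic check: pending members older than the timeout are moved to the obsolete set. */
    void expirePending(long nowMs) {
        Iterator<Map.Entry<String, Long>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMs - entry.getValue() >= EXPIRY_MS) {
                obsolete.add(entry.getKey());
                it.remove();
            }
        }
    }

    boolean isActive(String memberId) { return active.contains(memberId); }

    boolean isObsolete(String memberId) { return obsolete.contains(memberId); }

    public static void main(String[] args) {
        PendingMemberSketch ctx = new PendingMemberSketch();
        ctx.onMemberStarted("member-a", 0L);
        ctx.onMemberStarted("member-b", 0L);
        ctx.onMemberActivated("member-a"); // the event for member-b is "lost"
        ctx.expirePending(EXPIRY_MS);
        System.out.println("member-a obsolete: " + ctx.isObsolete("member-a"));
        System.out.println("member-b obsolete: " + ctx.isObsolete("member-b"));
    }
}
```

In this model only the member whose activation was processed survives the expiry check, which is consistent with the question of whether AS missed the member activated event.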

On Fri, Jun 5, 2015 at 5:14 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi,

I am running into a scenario where application un-deployment fails (using Stratos with the latest commit, b1b6bca3f99b6127da24c9af0a6b20faff2907be).

For the application structure see [1.]; for the (debug-enabled) wso2carbon.log, application.json, cartridge-group.json, deployment policy and auto-scaling policies, see the attached zip file.

It is noteworthy that, while the application is running, the following log statements / exceptions are observed:

…
Member is in the wrong list and it is removed from active members list: g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
…
TID: [0] [STRATOS] [2015-06-04 20:11:03,425] ERROR {org.apache.stratos.autoscaler.rule.RuleTasksDelegator} -  Cannot terminate instance
…
// after receiving the application undeploy event:
[2015-06-04 20:12:39,465]  INFO {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Application undeployment process started: [application-id] g-sc-G12-1
// a new instance is being started up
…
[2015-06-04 20:13:13,445]  INFO {org.apache.stratos.cloud.controller.services.impl.InstanceCreator} -  Instance started successfully: [cartridge-type] c2 [cluster-id] g-sc-G12-1.c2-1x0.c2.domain [instance-id] RegionOne/5d4699f7-b00b-42eb-b565-b48fc8f20407

// Also noteworthy is the following warning, which is seen repeatedly in the logs:
ReadWriteLock} -  System warning! Trying to release a lock which has not been taken by the same thread: [lock-name]


[1.] Application structure

[image: application structure diagram (inline attachment image001.png, not preserved in this plain-text archive)]







--

Udara Liyanage
Software Engineer
WSO2, Inc.: http://wso2.com<http://wso2.com/>
lean. enterprise. middleware
web: http://udaraliyanage.wordpress.com
phone: +94 71 443 6897<tel:%2B94%2071%20443%206897>



--
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos

RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by "Martin Eppel (meppel)" <me...@cisco.com>.
After re-running it, these are my observations:


·        After the “Application undeployment process started” message is logged, a few VMs may still be launched. I suspect this is due to a race condition between the application undeployment process and the autoscaler.

·        All VMs that were launched before the “Application undeployment process started” message get terminated as part of the undeployment process.

·        VMs that were launched after the “Application undeployment process started” message eventually get moved to the obsolete / pending state and are cleaned up; this can take up to 15-20 minutes.

·        The application never gets completely removed.

·        The following exception is consistently observed:

TID: [0] [STRATOS] [2015-06-05 20:47:07,237]  WARN {org.apache.stratos.common.concurrent.locks.ReadWriteLock} -  System warning! Trying to release a lock which has not been taken by the same thread: [lock-name] application-holder [thread-id] 114 [thread-name] pool-24-thread-2
TID: [0] [STRATOS] [2015-06-05 20:47:07,237] ERROR {org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator} -  Failed to retrieve topology event message
org.apache.stratos.common.exception.InvalidLockRequestedException: System error, cannot acquire a write lock while having a read lock on the same thread: [lock-name] application-holder [thread-id] 114 [thread-name] pool-24-thread-2
                    at org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:114)
                    at org.apache.stratos.autoscaler.applications.ApplicationHolder.acquireWriteLock(ApplicationHolder.java:60)

·        Initiating the application undeployment process again only produces the following INFO statement, without any further action (see the log):
TID: [0] [STRATOS] [2015-06-05 21:34:34,509]  INFO {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Application monitor is already in terminating, graceful un-deployment is has already been attempted thus not invoking again

·        Other exceptions observed after the “Application undeployment process started” message:
TID: [0] [STRATOS] [2015-06-05 21:36:29,458] ERROR {org.apache.stratos.autoscaler.rule.RuleTasksDelegator} -  Cannot terminate instance
org.apache.stratos.cloud.controller.stub.CloudControllerServiceInvalidMemberExceptionException: CloudControllerServiceInvalidMemberExceptionException
        at sun.reflect.GeneratedConstructorAccessor219.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at java.lang.Class.newInstance(Class.java:374)
        at org.apache.stratos.cloud.controller.stub.CloudControllerServiceStub.terminateInstance(CloudControllerServiceStub.java:8633)
        at org.apache.stratos.common.client.CloudControllerServiceClient.terminateInstance(CloudControllerServiceClient.java:120)
        at org.apache.stratos.autoscaler.rule.RuleTasksDelegator.terminateObsoleteInstance(RuleTasksDelegator.java:298)
        at sun.reflect.GeneratedMethodAccessor413.invoke(Unknown Source)


·        Created a JIRA issue to track this: https://issues.apache.org/jira/browse/STRATOS-1430
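
For reference, the two lock messages in the list above ("cannot acquire a write lock while having a read lock on the same thread" and "Trying to release a lock which has not been taken by the same thread") correspond to restrictions that the JDK's own ReentrantReadWriteLock also enforces. The sketch below uses the JDK class as a stand-in; whether the Stratos ReadWriteLock is built on it is an assumption, and LockUpgradeSketch is a hypothetical name. It shows that a thread holding the read lock cannot obtain the write lock, and that the read lock has to be released first.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Stand-in demonstration with the JDK lock; not the Stratos ReadWriteLock implementation.
class LockUpgradeSketch {

    /** Tries to take the write lock while the same thread still holds the read lock. */
    static boolean canUpgradeReadToWrite() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.readLock().lock();
        try {
            // tryLock() is used so the demo does not block forever on the failed upgrade.
            boolean acquired = lock.writeLock().tryLock();
            if (acquired) {
                lock.writeLock().unlock();
            }
            return acquired;
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Releases the read lock first, then takes the write lock. */
    static boolean canWriteAfterReleasingRead() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.readLock().lock();
        lock.readLock().unlock();
        boolean acquired = lock.writeLock().tryLock();
        if (acquired) {
            lock.writeLock().unlock();
        }
        return acquired;
    }

    /** Releasing a write lock that the current thread does not hold is rejected by the JDK. */
    static boolean releaseWithoutHoldingThrows() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        try {
            lock.writeLock().unlock();
            return false;
        } catch (IllegalMonitorStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("upgrade read -> write possible: " + canUpgradeReadToWrite());
        System.out.println("write after releasing read possible: " + canWriteAfterReleasingRead());
        System.out.println("unlock without holding throws: " + releaseWithoutHoldingThrows());
    }
}
```

If the same pattern applies in the autoscaler, a code path that calls ApplicationHolder.acquireWriteLock while a read lock taken earlier on the same thread has not yet been released would produce exactly the exception shown above.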







Regards



Martin



The log file of the last test is attached.







From: Martin Eppel (meppel)
Sent: Friday, June 05, 2015 12:59 PM
To: dev@stratos.apache.org
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

For this latest test I pulled the latest source from the Stratos repo, so I have this commit (see below), but the un-deployment still fails (to some extent).
As mentioned below, it seems that all the members eventually get terminated, including the ones that were started after the “application un-deployment” process began.
What is still left in Stratos (even after all members are terminated) is the application itself (see the result of the stratos> list-applications command further down in this email thread). This would still be an issue when re-deploying the application!
I will do a few reruns to verify that the removal of the VMs (members) is consistent.
Thanks

Martin

git show 2fe84b91843b20e91e8cafd06011f42d218f231c
commit 2fe84b91843b20e91e8cafd06011f42d218f231c
Author: anuruddhal <an...@gmail.com>
Date:   Wed Jun 3 14:41:12 2015 +0530

From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: Friday, June 05, 2015 12:46 PM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,

I also encountered a similar issue with application un-deployment when using PCA, but I guess you are using JCA.

I can see that Anuruddha has made a fix for the issue I'm referring to with the commit below:
https://github.com/apache/stratos/commit/2fe84b91843b20e91e8cafd06011f42d218f231c

Regarding the "member context not found" error, this could occur if the termination request was made for an already terminated member. It is possible that the Autoscaler makes a second terminate request if the first request takes some time to execute; by the time the second request hits the Cloud Controller, the member has already been terminated by the first request.

Can you please confirm whether the members were properly terminated and it's just these exceptions that you are seeing?

Thanks


On Sat, Jun 6, 2015 at 12:36 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi Udara,

I picked up your commit and reran the test case.

The log file is attached (the artifacts are the same as before).

I didn’t see the issue with “Member is in the wrong list” …

but I do see the following exception after the undeploy application message:
TID: [0] [STRATOS] [2015-06-05 18:09:46,836] ERROR {org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator} -  Failed to retrieve topology event message
org.apache.stratos.common.exception.InvalidLockRequestedException: System error, cannot acquire a write lock while having a read lock on the same thread: [lock-name] application-holder [thread-id] 114 [thread-name] pool-24-thread-2
                    at org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:114)
                    at org.apache.stratos.autoscaler.applications.ApplicationHolder.acquireWriteLock(ApplicationHolder.java:60)


Also, after the “Application undeployment process started” message is logged, new members are still being instantiated:

TID: [0] [STRATOS] [2015-06-05 18:07:46,545]  INFO {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher} -  Publishing member created event:


Eventually, these VMs get terminated:

TID: [0] [STRATOS] [2015-06-05 18:42:42,413] ERROR {org.apache.stratos.cloud.controller.services.impl.CloudControllerServiceImpl} -  Could not terminate instance: [member-id] g-sc-G12-1.c1-0x0.c1.domaindd9c1d40-70cc-4950-9757-418afe19ba7f
org.apache.stratos.cloud.controller.exception.InvalidMemberException: Could not terminate instance, member context not found: [member-id] g-sc-G12-1.c1-0x0.c1.domaindd9c1d40-70cc-4950-9757-418afe19ba7f
                    at org.apache.stratos.cloud.controller.services.impl.CloudControllerServiceImpl.terminateInstance(CloudControllerServiceImpl.java:595)
                    at sun.reflect.GeneratedMethodAccessor408.invoke(Unknown Source)
                    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                    at java.lang.reflect.Method.invoke(Method.java:606)


but the application remains:

stratos> list-applications
Applications found:
+----------------+------------+----------+
| Application ID | Alias      | Status   |
+----------------+------------+----------+
| g-sc-G12-1     | g-sc-G12-1 | Deployed |
+----------------+------------+----------+

['g-sc-G12-1: applicationInstances 1, groupInstances 2, clusterInstances 3, members 0 ()\n']



From: Martin Eppel (meppel)
Sent: Friday, June 05, 2015 10:04 AM
To: dev@stratos.apache.org
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Ok:

log4j.logger.org.apache.stratos.manager=DEBUG
log4j.logger.org.apache.stratos.autoscaler=DEBUG
log4j.logger.org.apache.stratos.messaging=INFO
log4j.logger.org.apache.stratos.cloud.controller=DEBUG
log4j.logger.org.wso2.andes.client=ERROR
# Autoscaler rule logs
log4j.logger.org.apache.stratos.autoscaler.rule.RuleLog=DEBUG

From: Udara Liyanage [mailto:udara@wso2.com]
Sent: Friday, June 05, 2015 10:00 AM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,

It would be better if you could enable debug logs for all of AS, CC and the cartridge agent.

On Fri, Jun 5, 2015 at 10:23 PM, Udara Liyanage <ud...@wso2.com> wrote:
Hi,

Please enable AS debug logs.

On Fri, Jun 5, 2015 at 9:38 PM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi Udara,

Yes, this issue seems to be fairly reproducible. Which debug logs do you want me to enable, the cartridge agent logs?

Thanks

Martin

From: Udara Liyanage [mailto:udara@wso2.com]
Sent: Thursday, June 04, 2015 11:11 PM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi,

This could happen if AS did not receive the member activated event published by CC. If this is reproducible, is it possible to enable debug logs?
Otherwise I can add some INFO logs and commit.


On Fri, Jun 5, 2015 at 9:11 AM, Udara Liyanage <ud...@wso2.com> wrote:
Hi,


For the first issue you mentioned, the particular member is activated, but it is still identified as an obsolete member and is being marked for termination because its pending time expired. Does that mean the member is still in the obsolete list even though it has been activated?

//member started
TID: [0] [STRATOS] [2015-06-04 19:53:04,706]  INFO {org.apache.stratos.autoscaler.context.cluster.ClusterContext} -  Member stat context has been added: [application] g-sc-G12-1 [cluster] g-sc-G12-1.c1-0x0.c1.domain [clusterInstanceContext] g-sc-G12-1-1 [partitionContext] whole-region [member-id] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1

//member activated
TID: [0] [STRATOS] [2015-06-04 19:56:00,907]  INFO {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher} -  Publishing member activated event: [service-name] c1 [cluster-id] g-sc-G12-1.c1-0x0.c1.domain [cluster-instance-id] g-sc-G12-1-1 [member-id] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1 [network-partition-id] RegionOne [partition-id] whole-region
TID: [0] [STRATOS] [2015-06-04 19:56:00,916]  INFO {org.apache.stratos.messaging.message.processor.topology.MemberActivatedMessageProcessor} -  Member activated: [service] c1 [cluster] g-sc-G12-1.c1-0x0.c1.domain [member] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1

//after 15 minutes ---member is still in pending state, pending timeout expired
TID: [0] [STRATOS] [2015-06-04 20:08:04,713]  INFO {org.apache.stratos.autoscaler.context.partition.ClusterLevelPartitionContext$PendingMemberWatcher} -  Pending state of member expired, member will be moved to obsolete list. [pending member] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1 [expiry time] 900000 [cluster] g-sc-G12-1.c1-0x0.c1.domain [cluster instance] null

On Fri, Jun 5, 2015 at 5:14 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi,

I am running into a scenario where application un-deployment fails (using Stratos with the latest commit, b1b6bca3f99b6127da24c9af0a6b20faff2907be).

For the application structure see [1.]; for the (debug-enabled) wso2carbon.log, application.json, cartridge-group.json, deployment policy and auto-scaling policies, see the attached zip file.

It is noteworthy that, while the application is running, the following log statements / exceptions are observed:

…
Member is in the wrong list and it is removed from active members list: g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
…
TID: [0] [STRATOS] [2015-06-04 20:11:03,425] ERROR {org.apache.stratos.autoscaler.rule.RuleTasksDelegator} -  Cannot terminate instance
…
// after receiving the application undeploy event:
[2015-06-04 20:12:39,465]  INFO {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Application undeployment process started: [application-id] g-sc-G12-1
// a new instance is being started up
…
[2015-06-04 20:13:13,445]  INFO {org.apache.stratos.cloud.controller.services.impl.InstanceCreator} -  Instance started successfully: [cartridge-type] c2 [cluster-id] g-sc-G12-1.c2-1x0.c2.domain [instance-id] RegionOne/5d4699f7-b00b-42eb-b565-b48fc8f20407

// Also noteworthy is the following warning, which is seen repeatedly in the logs:
ReadWriteLock} -  System warning! Trying to release a lock which has not been taken by the same thread: [lock-name]


[1.] Application structure

[image: application structure diagram (inline attachment image001.png, not preserved in this plain-text archive)]







--

Udara Liyanage
Software Engineer
WSO2, Inc.: http://wso2.com<http://wso2.com/>
lean. enterprise. middleware
web: http://udaraliyanage.wordpress.com
phone: +94 71 443 6897<tel:%2B94%2071%20443%206897>



--
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos

RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by "Martin Eppel (meppel)" <me...@cisco.com>.
For this latest test I pulled the latest source from the Stratos repo, so I have this commit (see below), but the un-deployment still fails (to some extent).
As mentioned below, it seems that all the members eventually get terminated, including the ones that were started after the “application un-deployment” process began.
What is still left in Stratos (even after all members are terminated) is the application itself (see the result of the stratos> list-applications command further down in this email thread). This would still be an issue when re-deploying the application!
I will do a few reruns to verify that the removal of the VMs (members) is consistent.
Thanks

Martin

git show 2fe84b91843b20e91e8cafd06011f42d218f231c
commit 2fe84b91843b20e91e8cafd06011f42d218f231c
Author: anuruddhal <an...@gmail.com>
Date:   Wed Jun 3 14:41:12 2015 +0530

From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: Friday, June 05, 2015 12:46 PM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,

I also encountered a similar issue with application un-deployment when using PCA, but I guess you are using JCA.

I can see that Anuruddha has made a fix for the issue I'm referring to with the commit below:
https://github.com/apache/stratos/commit/2fe84b91843b20e91e8cafd06011f42d218f231c

Regarding the "member context not found" error, this could occur if the termination request was made for an already terminated member. It is possible that the Autoscaler makes a second terminate request if the first request takes some time to execute; by the time the second request hits the Cloud Controller, the member has already been terminated by the first request.
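
The duplicate-request situation described here can be illustrated with a small model. This is a hedged sketch, not the CloudControllerServiceImpl code: IdempotentTerminateSketch and its methods are hypothetical, and the member context store is reduced to a map. It treats a terminate request for a member whose context is already gone as a harmless no-op, which is one possible way to avoid an error-level exception for the second request.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical model of an idempotent terminate operation; not Stratos code.
class IdempotentTerminateSketch {
    // member id -> instance id; stands in for the member context store
    private final ConcurrentMap<String, String> memberContexts = new ConcurrentHashMap<>();

    void register(String memberId, String instanceId) {
        memberContexts.put(memberId, instanceId);
    }

    /**
     * Returns true if this call removed the member context (and would trigger the IaaS call),
     * false if the member was already terminated by an earlier request.
     */
    boolean terminate(String memberId) {
        // remove() is atomic, so only one of two concurrent requests sees a non-null value
        String instanceId = memberContexts.remove(memberId);
        if (instanceId == null) {
            return false; // duplicate request: nothing left to do
        }
        // A real implementation would ask the IaaS to destroy instanceId here.
        return true;
    }

    public static void main(String[] args) {
        IdempotentTerminateSketch controller = new IdempotentTerminateSketch();
        controller.register("member-1", "instance-1");
        System.out.println("first request terminated member: " + controller.terminate("member-1"));
        System.out.println("second request terminated member: " + controller.terminate("member-1"));
    }
}
```

Whether the second request should be logged at ERROR level, or silently ignored as in this sketch, is a design choice; the sketch only shows that the two requests can be told apart reliably.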

Can you please confirm whether the members were properly terminated and it's just these exceptions that you are seeing?

Thanks


On Sat, Jun 6, 2015 at 12:36 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi Udara,

I picked up your commit and reran the test case.

The log file is attached (the artifacts are the same as before).

I didn’t see the issue with “Member is in the wrong list” …

but I do see the following exception after the undeploy application message:
TID: [0] [STRATOS] [2015-06-05 18:09:46,836] ERROR {org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator} -  Failed to retrieve topology event message
org.apache.stratos.common.exception.InvalidLockRequestedException: System error, cannot acquire a write lock while having a read lock on the same thread: [lock-name] application-holder [thread-id] 114 [thread-name] pool-24-thread-2
                    at org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:114)
                    at org.apache.stratos.autoscaler.applications.ApplicationHolder.acquireWriteLock(ApplicationHolder.java:60)


Also, after the “Application undeployment process started” message is logged, new members are still being instantiated:

TID: [0] [STRATOS] [2015-06-05 18:07:46,545]  INFO {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher} -  Publishing member created event:


Eventually, these VMs get terminated:

TID: [0] [STRATOS] [2015-06-05 18:42:42,413] ERROR {org.apache.stratos.cloud.controller.services.impl.CloudControllerServiceImpl} -  Could not terminate instance: [member-id] g-sc-G12-1.c1-0x0.c1.domaindd9c1d40-70cc-4950-9757-418afe19ba7f
org.apache.stratos.cloud.controller.exception.InvalidMemberException: Could not terminate instance, member context not found: [member-id] g-sc-G12-1.c1-0x0.c1.domaindd9c1d40-70cc-4950-9757-418afe19ba7f
                    at org.apache.stratos.cloud.controller.services.impl.CloudControllerServiceImpl.terminateInstance(CloudControllerServiceImpl.java:595)
                    at sun.reflect.GeneratedMethodAccessor408.invoke(Unknown Source)
                    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                    at java.lang.reflect.Method.invoke(Method.java:606)


but the application remains:

stratos> list-applications
Applications found:
+----------------+------------+----------+
| Application ID | Alias      | Status   |
+----------------+------------+----------+
| g-sc-G12-1     | g-sc-G12-1 | Deployed |
+----------------+------------+----------+

['g-sc-G12-1: applicationInstances 1, groupInstances 2, clusterInstances 3, members 0 ()\n']



From: Martin Eppel (meppel)
Sent: Friday, June 05, 2015 10:04 AM
To: dev@stratos.apache.org
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Ok:

log4j.logger.org.apache.stratos.manager=DEBUG
log4j.logger.org.apache.stratos.autoscaler=DEBUG
log4j.logger.org.apache.stratos.messaging=INFO
log4j.logger.org.apache.stratos.cloud.controller=DEBUG
log4j.logger.org.wso2.andes.client=ERROR
# Autoscaler rule logs
log4j.logger.org.apache.stratos.autoscaler.rule.RuleLog=DEBUG

From: Udara Liyanage [mailto:udara@wso2.com]
Sent: Friday, June 05, 2015 10:00 AM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,

Better if you can enable debugs logs for all AS, CC and cartridge agent

On Fri, Jun 5, 2015 at 10:23 PM, Udara Liyanage <ud...@wso2.com> wrote:
Hi,

Please enable AS debug logs.

On Fri, Jun 5, 2015 at 9:38 PM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi Udara,

Yes, this issue seems to be fairly well reproducible, which debug log do you want me to enable, cartridge agent logs ?

Thanks

Martin

From: Udara Liyanage [mailto:udara@wso2.com]
Sent: Thursday, June 04, 2015 11:11 PM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi,

This might be possible if AS did not receive member activated event published by CC. Is it possible to enable debug logs if this is reproducible.
Or else I can add an INFO logs and commit.


On Fri, Jun 5, 2015 at 9:11 AM, Udara Liyanage <ud...@wso2.com> wrote:
Hi,


For the first issue you have mentioned, the particular member is activated, but it is still identified as an obsolete member and is being marked to be terminated since pending time expired. Does that mean member is still in Obsolete list even though it is being activated?

//member started
TID: [0] [STRATOS] [2015-06-04 19:53:04,706]  INFO {org.apache.stratos.autoscaler.context.cluster.ClusterContext} -  Member stat context has been added: [application] g-sc-G12-1 [cluster] g-sc-G12-1.c1-0x0.c1.domain [clusterInstanceContext] g-sc-G12-1-1 [partitionContext] whole-region [member-id] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1

//member activated
TID: [0] [STRATOS] [2015-06-04 19:56:00,907]  INFO {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher} -  Publishing member activated event: [service-name] c1 [cluster-id] g-sc-G12-1.c1-0x0.c1.domain [cluster-instance-id] g-sc-G12-1-1 [member-id] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1 [network-partition-id] RegionOne [partition-id] whole-region
TID: [0] [STRATOS] [2015-06-04 19:56:00,916]  INFO {org.apache.stratos.messaging.message.processor.topology.MemberActivatedMessageProcessor} -  Member activated: [service] c1 [cluster] g-sc-G12-1.c1-0x0.c1.domain [member] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1

//after 15 minutes ---member is still in pending state, pending timeout expired
TID: [0] [STRATOS] [2015-06-04 20:08:04,713]  INFO {org.apache.stratos.autoscaler.context.partition.ClusterLevelPartitionContext$PendingMemberWatcher} -  Pending state of member expired, member will be moved to obsolete list. [pending member] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1 [expiry time] 900000 [cluster] g-sc-G12-1.c1-0x0.c1.domain [cluster instance] null

On Fri, Jun 5, 2015 at 5:14 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi,

I am running into a scenario where application un-deployment fails (using stratos with latest commit  b1b6bca3f99b6127da24c9af0a6b20faff2907be).

For application structure see [1.], (debug enabled) wso2carbon.log, application.json, cartridge-group.json, deployment-policy, auto-scaling policies see attached zip file.

It is noteworthy, that while the application is running the following log statements /exceptions are observed:

…
Member is in the wrong list and it is removed from active members list: g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
…
TID: [0] [STRATOS] [2015-06-04 20:11:03,425] ERROR {org.apache.stratos.autoscaler.rule.RuleTasksDelegator} -  Cannot terminate instance
…
// after receiving the application undeploy event:
[2015-06-04 20:12:39,465]  INFO {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Application undeployment process started: [application-id] g-sc-G12-1
// a new instance is being started up
…
[2015-06-04 20:13:13,445]  INFO {org.apache.stratos.cloud.controller.services.impl.InstanceCreator} -  Instance started successfully: [cartridge-type] c2 [cluster-id] g-sc-G12-1.c2-1x0.c2.domain [instance-id] RegionOne/5d4699f7-b00b-42eb-b565-b48fc8f20407

// Also noteworthy seems the following warning which is seen repeatedly in the logs:
ReadWriteLock} -  System warning! Trying to release a lock which has not been taken by the same thread: [lock-name]


[1.] Application structure

[cid:image001.png@01D09F8F.6C0C9CD0]







--

Udara Liyanage
Software Engineer
WSO2, Inc.: http://wso2.com
lean. enterprise. middleware
web: http://udaraliyanage.wordpress.com
phone: +94 71 443 6897




--
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos

Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by Imesh Gunaratne <im...@apache.org>.
Hi Martin,

I also encountered a similar issue with application un-deployment when
using PCA, but I guess you are using JCA.

I can see that Anuruddha has fixed the issue I'm referring to with the
commit below:
https://github.com/apache/stratos/commit/2fe84b91843b20e91e8cafd06011f42d218f231c

Regarding the "member context not found" error, this could occur if the
termination request was made for an already terminated member. It is
possible that the Autoscaler makes a second terminate request if the first
request takes some time to execute; by the time the second request hits the
Cloud Controller, the member has already been terminated by the first one.
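
To illustrate the race described above, here is a minimal sketch of a terminate call that tolerates a repeated request. The class and method names (MemberRegistry, register, terminate) are hypothetical and are not taken from the Stratos code base; the sketch only assumes that member contexts are kept in a concurrent map keyed by member id.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical registry of member contexts; a sketch, not the Stratos implementation. */
class MemberRegistry {
    // Maps a member id to its instance id (both are plain strings in this sketch).
    private final ConcurrentMap<String, String> memberContexts = new ConcurrentHashMap<>();

    void register(String memberId, String instanceId) {
        memberContexts.put(memberId, instanceId);
    }

    /**
     * Returns true if this call removed the member context, and false if the
     * context was already gone (for example, removed by an earlier request).
     */
    boolean terminate(String memberId) {
        // remove() is atomic, so only one of two racing callers gets a non-null value.
        String instanceId = memberContexts.remove(memberId);
        return instanceId != null;
    }
}
```

With this shape, the second request simply observes false instead of raising an exception, so a caller can tell a duplicate request apart from a real failure.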

Can you please confirm whether the members were properly terminated and it's
just these exceptions that you are seeing?

Thanks


On Sat, Jun 6, 2015 at 12:36 AM, Martin Eppel (meppel) <me...@cisco.com>
wrote:

>  Hi Udara,
>
>
>
> Picked up your commit and rerun the test case:
>
>
>
> Attached is the log file (artifacts are the same as before).
>
>
>
> *Didn’t see the issue with* “*Member is in the wrong list” …*
>
>
>
> but see the following exception after the undeploy application message:
>
> *TID: [0] [STRATOS] [2015-06-05 18:09:46,836] ERROR
> {org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator}
> -  Failed to retrieve topology event message*
>
> *org.apache.stratos.common.exception.InvalidLockRequestedException: System
> error, cannot acquire a write lock while having a read lock on the same
> thread: [lock-name] application-holder [thread-id] 114 [thread-name]
> pool-24-thread-2*
>
> *                    at
> org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:114)*
>
> *                    at
> org.apache.stratos.autoscaler.applications.ApplicationHolder.acquireWriteLock(ApplicationHolder.java:60)*
>
>
>
>
>
> *Also, after the “Application undeployment process started” is started,
> new members are being instantiated:*
>
>
>
> *TID: [0] [STRATOS] [2015-06-05 18:07:46,545]  INFO
> {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher}
> -  Publishing member created event*:
>
>
>
>
>
> *Eventually, these VMs get terminated :*
>
>
>
> *TID: [0] [STRATOS] [2015-06-05 18:42:42,413] ERROR
> {org.apache.stratos.cloud.controller.services.impl.CloudControllerServiceImpl}
> -  Could not terminate instance: [member-id]
> g-sc-G12-1.c1-0x0.c1.domaindd9c1d40-70cc-4950-9757-418afe19ba7f*
>
> *org.apache.stratos.cloud.controller.exception.InvalidMemberException:
> Could not terminate instance, member context not found: [member-id]
> g-sc-G12-1.c1-0x0.c1.domaindd9c1d40-70cc-4950-9757-418afe19ba7f*
>
> *                    at
> org.apache.stratos.cloud.controller.services.impl.CloudControllerServiceImpl.terminateInstance(CloudControllerServiceImpl.java:595)*
>
> *                    at
> sun.reflect.GeneratedMethodAccessor408.invoke(Unknown Source)*
>
> *                    at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)*
>
> *                    at java.lang.reflect.Method.invoke(Method.java:606)*
>
>
>
>
>
> *but the application remains:*
>
>
>
> stratos> list-applications
>
> Applications found:
>
> +----------------+------------+----------+
>
> | Application ID | Alias      | Status   |
>
> +----------------+------------+----------+
>
> | g-sc-G12-1     | g-sc-G12-1 | Deployed |
>
> +----------------+------------+----------+
>
>
>
> ['g-sc-G12-1: applicationInstances 1, groupInstances 2, clusterInstances
> 3, members 0 ()\n']
>
>
>
>
>
>
>
> *From:* Martin Eppel (meppel)
> *Sent:* Friday, June 05, 2015 10:04 AM
> *To:* dev@stratos.apache.org
> *Subject:* RE: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> Ok:
>
>
>
> log4j.logger.org.apache.stratos.manager=DEBUG
>
> log4j.logger.org.apache.stratos.autoscaler=DEBUG
>
> log4j.logger.org.apache.stratos.messaging=INFO
>
> log4j.logger.org.apache.stratos.cloud.controller=DEBUG
>
> log4j.logger.org.wso2.andes.client=ERROR
>
> # Autoscaler rule logs
>
> log4j.logger.org.apache.stratos.autoscaler.rule.RuleLog=DEBUG
>
>
>
> *From:* Udara Liyanage [mailto:udara@wso2.com]
> *Sent:* Friday, June 05, 2015 10:00 AM
> *To:* dev
> *Subject:* Re: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> Hi Martin,
>
>
>
> Better if you can enable debugs logs for all AS, CC and cartridge agent
>
>
>
> On Fri, Jun 5, 2015 at 10:23 PM, Udara Liyanage <ud...@wso2.com> wrote:
>
> Hi,
>
>
>
> Please enable AS debug logs.
>
>
>
> On Fri, Jun 5, 2015 at 9:38 PM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> Hi Udara,
>
>
>
> Yes, this issue seems to be fairly well reproducible, which debug log do
> you want me to enable, cartridge agent logs ?
>
>
>
> Thanks
>
>
>
> Martin
>
>
>
> *From:* Udara Liyanage [mailto:udara@wso2.com]
> *Sent:* Thursday, June 04, 2015 11:11 PM
> *To:* dev
> *Subject:* Re: Testing Stratos 4.1: Application undeployment: application
> fails to undeploy (nested grouping, group scaling)
>
>
>
> Hi,
>
>
>
> This might be possible if AS did not receive member activated event
> published by CC. Is it possible to enable debug logs if this is
> reproducible.
>
> Or else I can add an INFO logs and commit.
>
>
>
>
>
> On Fri, Jun 5, 2015 at 9:11 AM, Udara Liyanage <ud...@wso2.com> wrote:
>
> Hi,
>
>
>
>
>
> For the first issue you have mentioned, the particular member is
> activated, but it is still identified as an obsolete member and is being
> marked to be terminated since pending time expired. Does that mean member
> is still in Obsolete list even though it is being activated?
>
>
>
> //member started
>
> TID: [0] [STRATOS] [2015-06-04 19:53:04,706]  INFO
> {org.apache.stratos.autoscaler.context.cluster.ClusterContext} -  Member
> stat context has been added: [application] g-sc-G12-1 [cluster]
> g-sc-G12-1.c1-0x0.c1.domain [clusterInstanceContext] g-sc-G12-1-1
> [partitionContext] whole-region [member-id]
> g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
>
>
>
> //member activated
>
> TID: [0] [STRATOS] [2015-06-04 19:56:00,907]  INFO
> {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher}
> -  Publishing member activated event: [service-name] c1 [cluster-id]
> g-sc-G12-1.c1-0x0.c1.domain [cluster-instance-id] g-sc-G12-1-1 [member-id]
> g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
> [network-partition-id] RegionOne [partition-id] whole-region
>
> TID: [0] [STRATOS] [2015-06-04 19:56:00,916]  INFO
> {org.apache.stratos.messaging.message.processor.topology.MemberActivatedMessageProcessor}
> -  Member activated: [service] c1 [cluster] g-sc-G12-1.c1-0x0.c1.domain
> [member] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
>
>
>
> //after 15 minutes ---member is still in pending state, pending timeout
> expired
>
> TID: [0] [STRATOS] [2015-06-04 20:08:04,713]  INFO
> {org.apache.stratos.autoscaler.context.partition.ClusterLevelPartitionContext$PendingMemberWatcher}
> -  Pending state of member expired, member will be moved to obsolete list.
> [pending member]
> g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1 [expiry
> time] 900000 [cluster] g-sc-G12-1.c1-0x0.c1.domain [cluster instance] null
>
>
>
> On Fri, Jun 5, 2015 at 5:14 AM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> Hi,
>
>
>
> I am running into a scenario where application un-deployment fails (using
> stratos with latest commit  b1b6bca3f99b6127da24c9af0a6b20faff2907be).
>
>
>
> For application structure see [1.], (debug enabled) wso2carbon.log,
> application.json, cartridge-group.json, deployment-policy, auto-scaling
> policies see attached zip file.
>
>
>
> *It is noteworthy, that while the application is running the following log
> statements /exceptions are observed:*
>
>
>
> *…*
>
> *Member is in the wrong list and it is removed from active members list:
> g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1*
>
> *…*
>
> *TID: [0] [STRATOS] [2015-06-04 20:11:03,425] ERROR
> {org.apache.stratos.autoscaler.rule.RuleTasksDelegator} -  Cannot terminate
> instance*
>
> *…*
>
> *// **after receiving the application undeploy event:*
>
> *[2015-06-04 20:12:39,465]  INFO
> {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -
> Application undeployment process started: [application-id] g-sc-G12-1*
>
> *// **a new instance is being started up*
>
> *…*
>
> *[2015-06-04 20:13:13,445]  INFO
> {org.apache.stratos.cloud.controller.services.impl.InstanceCreator} -
> Instance started successfully: [cartridge-type] c2 [cluster-id]
> g-sc-G12-1.c2-1x0.c2.domain [instance-id]
> RegionOne/5d4699f7-b00b-42eb-b565-b48fc8f20407*
>
>
>
> *// Also noteworthy seems the following warning which is seen repeatedly
> in the logs:*
>
> *ReadWriteLock} -  System warning! Trying to release a lock which has not
> been taken by the same thread: [lock-name]*
>
>
>
>
>
> [1.] Application structure
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> --
>
>
> Udara Liyanage
>
> Software Engineer
>
> WSO2, Inc.: http://wso2.com
>
> lean. enterprise. middleware
>
> web: http://udaraliyanage.wordpress.com
>
> phone: +94 71 443 6897
>
>
>
>
>



-- 
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos

RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by "Martin Eppel (meppel)" <me...@cisco.com>.
Hi Udara,

I picked up your commit and reran the test case:

Attached is the log file (artifacts are the same as before).

I didn’t see the “Member is in the wrong list” issue this time …

but I do see the following exception after the application undeploy message:
TID: [0] [STRATOS] [2015-06-05 18:09:46,836] ERROR {org.apache.stratos.messaging.message.receiver.topology.TopologyEventMessageDelegator} -  Failed to retrieve topology event message
org.apache.stratos.common.exception.InvalidLockRequestedException: System error, cannot acquire a write lock while having a read lock on the same thread: [lock-name] application-holder [thread-id] 114 [thread-name] pool-24-thread-2
                    at org.apache.stratos.common.concurrent.locks.ReadWriteLock.acquireWriteLock(ReadWriteLock.java:114)
                    at org.apache.stratos.autoscaler.applications.ApplicationHolder.acquireWriteLock(ApplicationHolder.java:60)
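
For context, the exception above reports a write lock being requested on a thread that already holds a read lock. The sketch below shows the same situation with the JDK's ReentrantReadWriteLock, used here only as a stand-in for the Stratos ReadWriteLock class (whose internals are not shown in this thread). GuardedHolder and its methods are hypothetical names.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical holder guarded by a JDK read/write lock; not the Stratos ReadWriteLock. */
class GuardedHolder {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private String state = "Deployed";

    /** Attempts a read-to-write upgrade on the same thread and reports whether it worked. */
    boolean tryUpgradeWhileReading() {
        lock.readLock().lock();
        try {
            // tryLock() fails here: the write lock cannot be taken while any read
            // lock is held, even by the current thread. A plain lock() call would
            // block forever instead.
            boolean upgraded = lock.writeLock().tryLock();
            if (upgraded) {
                lock.writeLock().unlock();
            }
            return upgraded;
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Reads under the read lock, releases it, and only then takes the write lock. */
    String updateSafely(String newState) {
        String previous;
        lock.readLock().lock();
        try {
            previous = state;
        } finally {
            lock.readLock().unlock(); // release before requesting the write lock
        }
        lock.writeLock().lock();
        try {
            state = newState;
        } finally {
            lock.writeLock().unlock();
        }
        return previous;
    }

    String getState() {
        lock.readLock().lock();
        try {
            return state;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Note that in updateSafely another thread may change the state between releasing the read lock and acquiring the write lock, so real code would need to re-check whatever it read.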


Also, after the “Application undeployment process started” message is logged, new members are still being instantiated:

TID: [0] [STRATOS] [2015-06-05 18:07:46,545]  INFO {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher} -  Publishing member created event:


Eventually, these VMs get terminated:

TID: [0] [STRATOS] [2015-06-05 18:42:42,413] ERROR {org.apache.stratos.cloud.controller.services.impl.CloudControllerServiceImpl} -  Could not terminate instance: [member-id] g-sc-G12-1.c1-0x0.c1.domaindd9c1d40-70cc-4950-9757-418afe19ba7f
org.apache.stratos.cloud.controller.exception.InvalidMemberException: Could not terminate instance, member context not found: [member-id] g-sc-G12-1.c1-0x0.c1.domaindd9c1d40-70cc-4950-9757-418afe19ba7f
                    at org.apache.stratos.cloud.controller.services.impl.CloudControllerServiceImpl.terminateInstance(CloudControllerServiceImpl.java:595)
                    at sun.reflect.GeneratedMethodAccessor408.invoke(Unknown Source)
                    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                    at java.lang.reflect.Method.invoke(Method.java:606)


but the application remains:

stratos> list-applications
Applications found:
+----------------+------------+----------+
| Application ID | Alias      | Status   |
+----------------+------------+----------+
| g-sc-G12-1     | g-sc-G12-1 | Deployed |
+----------------+------------+----------+

['g-sc-G12-1: applicationInstances 1, groupInstances 2, clusterInstances 3, members 0 ()\n']



From: Martin Eppel (meppel)
Sent: Friday, June 05, 2015 10:04 AM
To: dev@stratos.apache.org
Subject: RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Ok:

log4j.logger.org.apache.stratos.manager=DEBUG
log4j.logger.org.apache.stratos.autoscaler=DEBUG
log4j.logger.org.apache.stratos.messaging=INFO
log4j.logger.org.apache.stratos.cloud.controller=DEBUG
log4j.logger.org.wso2.andes.client=ERROR
# Autoscaler rule logs
log4j.logger.org.apache.stratos.autoscaler.rule.RuleLog=DEBUG

From: Udara Liyanage [mailto:udara@wso2.com]
Sent: Friday, June 05, 2015 10:00 AM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,

Better if you can enable debugs logs for all AS, CC and cartridge agent

On Fri, Jun 5, 2015 at 10:23 PM, Udara Liyanage <ud...@wso2.com> wrote:
Hi,

Please enable AS debug logs.

On Fri, Jun 5, 2015 at 9:38 PM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi Udara,

Yes, this issue seems to be fairly well reproducible, which debug log do you want me to enable, cartridge agent logs ?

Thanks

Martin

From: Udara Liyanage [mailto:udara@wso2.com]
Sent: Thursday, June 04, 2015 11:11 PM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi,

This might be possible if AS did not receive member activated event published by CC. Is it possible to enable debug logs if this is reproducible.
Or else I can add an INFO logs and commit.


On Fri, Jun 5, 2015 at 9:11 AM, Udara Liyanage <ud...@wso2.com> wrote:
Hi,


For the first issue you have mentioned, the particular member is activated, but it is still identified as an obsolete member and is being marked to be terminated since pending time expired. Does that mean member is still in Obsolete list even though it is being activated?

//member started
TID: [0] [STRATOS] [2015-06-04 19:53:04,706]  INFO {org.apache.stratos.autoscaler.context.cluster.ClusterContext} -  Member stat context has been added: [application] g-sc-G12-1 [cluster] g-sc-G12-1.c1-0x0.c1.domain [clusterInstanceContext] g-sc-G12-1-1 [partitionContext] whole-region [member-id] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1

//member activated
TID: [0] [STRATOS] [2015-06-04 19:56:00,907]  INFO {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher} -  Publishing member activated event: [service-name] c1 [cluster-id] g-sc-G12-1.c1-0x0.c1.domain [cluster-instance-id] g-sc-G12-1-1 [member-id] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1 [network-partition-id] RegionOne [partition-id] whole-region
TID: [0] [STRATOS] [2015-06-04 19:56:00,916]  INFO {org.apache.stratos.messaging.message.processor.topology.MemberActivatedMessageProcessor} -  Member activated: [service] c1 [cluster] g-sc-G12-1.c1-0x0.c1.domain [member] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1

//after 15 minutes ---member is still in pending state, pending timeout expired
TID: [0] [STRATOS] [2015-06-04 20:08:04,713]  INFO {org.apache.stratos.autoscaler.context.partition.ClusterLevelPartitionContext$PendingMemberWatcher} -  Pending state of member expired, member will be moved to obsolete list. [pending member] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1 [expiry time] 900000 [cluster] g-sc-G12-1.c1-0x0.c1.domain [cluster instance] null
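
The sequence in the log lines above (member started, member activated, pending timeout expired after 900000 ms) can be modelled with a small sketch. It is a simplified, hypothetical model of pending, active and obsolete member lists, not the Stratos ClusterLevelPartitionContext; the names PendingMemberTracker, onMemberStarted, onMemberActivated and expirePending are made up for illustration. It shows that a member whose activation event is never processed stays pending and is moved to the obsolete list once the timeout elapses.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

/** Hypothetical tracker of member lists; a simplified model, not Stratos code. */
class PendingMemberTracker {
    private final long expiryMillis;
    private final Map<String, Long> pending = new HashMap<>(); // member id -> start time
    private final Set<String> active = new HashSet<>();
    private final Set<String> obsolete = new HashSet<>();

    PendingMemberTracker(long expiryMillis) {
        this.expiryMillis = expiryMillis;
    }

    synchronized void onMemberStarted(String memberId, long nowMillis) {
        pending.put(memberId, nowMillis);
    }

    /** Moves a member from pending to active; nothing happens if the event never arrives. */
    synchronized void onMemberActivated(String memberId) {
        if (pending.remove(memberId) != null) {
            active.add(memberId);
        }
    }

    /** Moves every member that has been pending for at least expiryMillis to obsolete. */
    synchronized void expirePending(long nowMillis) {
        Iterator<Map.Entry<String, Long>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() >= expiryMillis) {
                obsolete.add(entry.getKey());
                it.remove();
            }
        }
    }

    synchronized boolean isActive(String memberId) {
        return active.contains(memberId);
    }

    synchronized boolean isObsolete(String memberId) {
        return obsolete.contains(memberId);
    }
}
```

In this model, a member that is reported as activated by one component but still expires as pending in another points to the activation event not having been applied to the pending list, which matches the question raised above.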

On Fri, Jun 5, 2015 at 5:14 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi,

I am running into a scenario where application un-deployment fails (using stratos with latest commit  b1b6bca3f99b6127da24c9af0a6b20faff2907be).

For application structure see [1.], (debug enabled) wso2carbon.log, application.json, cartridge-group.json, deployment-policy, auto-scaling policies see attached zip file.

It is noteworthy, that while the application is running the following log statements /exceptions are observed:

…
Member is in the wrong list and it is removed from active members list: g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
…
TID: [0] [STRATOS] [2015-06-04 20:11:03,425] ERROR {org.apache.stratos.autoscaler.rule.RuleTasksDelegator} -  Cannot terminate instance
…
// after receiving the application undeploy event:
[2015-06-04 20:12:39,465]  INFO {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Application undeployment process started: [application-id] g-sc-G12-1
// a new instance is being started up
…
[2015-06-04 20:13:13,445]  INFO {org.apache.stratos.cloud.controller.services.impl.InstanceCreator} -  Instance started successfully: [cartridge-type] c2 [cluster-id] g-sc-G12-1.c2-1x0.c2.domain [instance-id] RegionOne/5d4699f7-b00b-42eb-b565-b48fc8f20407

// Also noteworthy seems the following warning which is seen repeatedly in the logs:
ReadWriteLock} -  System warning! Trying to release a lock which has not been taken by the same thread: [lock-name]


[1.] Application structure

[cid:image001.png@01D09F85.0060B930]







--

Udara Liyanage
Software Engineer
WSO2, Inc.: http://wso2.com
lean. enterprise. middleware
web: http://udaraliyanage.wordpress.com
phone: +94 71 443 6897




RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by "Martin Eppel (meppel)" <me...@cisco.com>.
Ok:

log4j.logger.org.apache.stratos.manager=DEBUG
log4j.logger.org.apache.stratos.autoscaler=DEBUG
log4j.logger.org.apache.stratos.messaging=INFO
log4j.logger.org.apache.stratos.cloud.controller=DEBUG
log4j.logger.org.wso2.andes.client=ERROR
# Autoscaler rule logs
log4j.logger.org.apache.stratos.autoscaler.rule.RuleLog=DEBUG

From: Udara Liyanage [mailto:udara@wso2.com]
Sent: Friday, June 05, 2015 10:00 AM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi Martin,

Better if you can enable debugs logs for all AS, CC and cartridge agent

On Fri, Jun 5, 2015 at 10:23 PM, Udara Liyanage <ud...@wso2.com> wrote:
Hi,

Please enable AS debug logs.

On Fri, Jun 5, 2015 at 9:38 PM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi Udara,

Yes, this issue seems to be fairly well reproducible, which debug log do you want me to enable, cartridge agent logs ?

Thanks

Martin

From: Udara Liyanage [mailto:udara@wso2.com]
Sent: Thursday, June 04, 2015 11:11 PM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi,

This might be possible if AS did not receive member activated event published by CC. Is it possible to enable debug logs if this is reproducible.
Or else I can add an INFO logs and commit.


On Fri, Jun 5, 2015 at 9:11 AM, Udara Liyanage <ud...@wso2.com> wrote:
Hi,


For the first issue you have mentioned, the particular member is activated, but it is still identified as an obsolete member and is being marked to be terminated since pending time expired. Does that mean member is still in Obsolete list even though it is being activated?

//member started
TID: [0] [STRATOS] [2015-06-04 19:53:04,706]  INFO {org.apache.stratos.autoscaler.context.cluster.ClusterContext} -  Member stat context has been added: [application] g-sc-G12-1 [cluster] g-sc-G12-1.c1-0x0.c1.domain [clusterInstanceContext] g-sc-G12-1-1 [partitionContext] whole-region [member-id] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1

//member activated
TID: [0] [STRATOS] [2015-06-04 19:56:00,907]  INFO {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher} -  Publishing member activated event: [service-name] c1 [cluster-id] g-sc-G12-1.c1-0x0.c1.domain [cluster-instance-id] g-sc-G12-1-1 [member-id] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1 [network-partition-id] RegionOne [partition-id] whole-region
TID: [0] [STRATOS] [2015-06-04 19:56:00,916]  INFO {org.apache.stratos.messaging.message.processor.topology.MemberActivatedMessageProcessor} -  Member activated: [service] c1 [cluster] g-sc-G12-1.c1-0x0.c1.domain [member] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1

//after 15 minutes ---member is still in pending state, pending timeout expired
TID: [0] [STRATOS] [2015-06-04 20:08:04,713]  INFO {org.apache.stratos.autoscaler.context.partition.ClusterLevelPartitionContext$PendingMemberWatcher} -  Pending state of member expired, member will be moved to obsolete list. [pending member] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1 [expiry time] 900000 [cluster] g-sc-G12-1.c1-0x0.c1.domain [cluster instance] null

On Fri, Jun 5, 2015 at 5:14 AM, Martin Eppel (meppel) <me...@cisco.com> wrote:
Hi,

I am running into a scenario where application un-deployment fails (using stratos with latest commit  b1b6bca3f99b6127da24c9af0a6b20faff2907be).

For application structure see [1.], (debug enabled) wso2carbon.log, application.json, cartridge-group.json, deployment-policy, auto-scaling policies see attached zip file.

It is noteworthy, that while the application is running the following log statements /exceptions are observed:

…
Member is in the wrong list and it is removed from active members list: g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
…
TID: [0] [STRATOS] [2015-06-04 20:11:03,425] ERROR {org.apache.stratos.autoscaler.rule.RuleTasksDelegator} -  Cannot terminate instance
…
// after receiving the application undeploy event:
[2015-06-04 20:12:39,465]  INFO {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -  Application undeployment process started: [application-id] g-sc-G12-1
// a new instance is being started up
…
[2015-06-04 20:13:13,445]  INFO {org.apache.stratos.cloud.controller.services.impl.InstanceCreator} -  Instance started successfully: [cartridge-type] c2 [cluster-id] g-sc-G12-1.c2-1x0.c2.domain [instance-id] RegionOne/5d4699f7-b00b-42eb-b565-b48fc8f20407

// Also noteworthy seems the following warning which is seen repeatedly in the logs:
ReadWriteLock} -  System warning! Trying to release a lock which has not been taken by the same thread: [lock-name]


[1.] Application structure

[cid:image001.png@01D09F76.E90CEC80]







--

Udara Liyanage
Software Engineer
WSO2, Inc.: http://wso2.com
lean. enterprise. middleware
web: http://udaraliyanage.wordpress.com
phone: +94 71 443 6897




Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by Udara Liyanage <ud...@wso2.com>.
Hi Martin,

It would be better if you could enable debug logs for the AS, the CC and the cartridge agent.

On Fri, Jun 5, 2015 at 10:23 PM, Udara Liyanage <ud...@wso2.com> wrote:

> Hi,
>
> Please enable AS debug logs.
>




Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by Udara Liyanage <ud...@wso2.com>.
Hi,

Please enable AS debug logs.

On Fri, Jun 5, 2015 at 9:38 PM, Martin Eppel (meppel) <me...@cisco.com>
wrote:

>  Hi Udara,
>
>
>
> Yes, this issue seems to be fairly well reproducible, which debug log do
> you want me to enable, cartridge agent logs ?
>
>
>
> Thanks
>
>
>
> Martin
>
>
>




RE: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by "Martin Eppel (meppel)" <me...@cisco.com>.
Hi Udara,

Yes, this issue seems to be fairly reproducible. Which debug logs do you want me to enable? The cartridge agent logs?

Thanks

Martin

From: Udara Liyanage [mailto:udara@wso2.com]
Sent: Thursday, June 04, 2015 11:11 PM
To: dev
Subject: Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Hi,

This might be possible if AS did not receive member activated event published by CC. Is it possible to enable debug logs if this is reproducible.
Or else I can add an INFO logs and commit.



Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by Udara Liyanage <ud...@wso2.com>.
Hi,

This might be possible if AS did not receive the member activated event
published by CC. Is it possible to enable debug logs if this is
reproducible?
Or else I can add some INFO logs and commit.
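
One way to check this from wso2carbon.log is to pull out every line that mentions the member ID and see which components logged about it. The sketch below is a minimal Python example, not part of Stratos: the names trace_member and autoscaler_logged_activation are hypothetical, the regular expression assumes the log layout seen in the excerpts in this thread, and treating "an autoscaler class logged the word activated" as evidence that AS handled the event is only a rough heuristic.

```python
import re

# Assumed layout, taken from the excerpts in this thread:
# TID: [0] [STRATOS] [2015-06-04 19:53:04,706]  INFO {logger.Class} -  message
LINE_RE = re.compile(
    r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\]\s+(\w+)\s+\{([^}]+)\}\s+-\s+(.*)"
)

AUTOSCALER_PKG = "org.apache.stratos.autoscaler."


def trace_member(lines, member_id):
    """Return (timestamp, level, logger, message) for lines mentioning member_id."""
    events = []
    for line in lines:
        if member_id not in line:
            continue
        match = LINE_RE.search(line)
        if match:
            events.append(match.groups())
    return events


def autoscaler_logged_activation(events):
    """Heuristic only: did any autoscaler class log something about activation?"""
    return any(
        logger.startswith(AUTOSCALER_PKG) and "activated" in message.lower()
        for _, _, logger, message in events
    )


MEMBER_ID = "g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1"

# Log lines copied from this thread (unwrapped).
SAMPLE = [
    "TID: [0] [STRATOS] [2015-06-04 19:53:04,706]  INFO "
    "{org.apache.stratos.autoscaler.context.cluster.ClusterContext} -  "
    "Member stat context has been added: [application] g-sc-G12-1 "
    "[cluster] g-sc-G12-1.c1-0x0.c1.domain [clusterInstanceContext] g-sc-G12-1-1 "
    "[partitionContext] whole-region [member-id] "
    "g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1",
    "TID: [0] [STRATOS] [2015-06-04 19:56:00,907]  INFO "
    "{org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher} -  "
    "Publishing member activated event: [service-name] c1 "
    "[cluster-id] g-sc-G12-1.c1-0x0.c1.domain [cluster-instance-id] g-sc-G12-1-1 "
    "[member-id] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1 "
    "[network-partition-id] RegionOne [partition-id] whole-region",
    "TID: [0] [STRATOS] [2015-06-04 19:56:00,916]  INFO "
    "{org.apache.stratos.messaging.message.processor.topology.MemberActivatedMessageProcessor} -  "
    "Member activated: [service] c1 [cluster] g-sc-G12-1.c1-0x0.c1.domain "
    "[member] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1",
    "TID: [0] [STRATOS] [2015-06-04 20:08:04,713]  INFO "
    "{org.apache.stratos.autoscaler.context.partition.ClusterLevelPartitionContext$PendingMemberWatcher} -  "
    "Pending state of member expired, member will be moved to obsolete list. "
    "[pending member] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1 "
    "[expiry time] 900000 [cluster] g-sc-G12-1.c1-0x0.c1.domain [cluster instance] null",
]

events = trace_member(SAMPLE, MEMBER_ID)
for timestamp, level, logger, _ in events:
    # Print only the short class name to keep the timeline readable.
    print(timestamp, level, logger.rsplit(".", 1)[-1])
print("autoscaler logged activation:", autoscaler_logged_activation(events))
```

For the four sample lines, the activation is logged by the cloud controller publisher and the messaging processor, but by no autoscaler class. That is consistent with the suspicion above, though at INFO level the autoscaler may simply not log the event, which is why debug logs would help.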


On Fri, Jun 5, 2015 at 9:11 AM, Udara Liyanage <ud...@wso2.com> wrote:

> Hi,
>
>
> For the first issue you have mentioned, the particular member is
> activated, but it is still identified as an obsolete member and is being
> marked to be terminated since pending time expired. Does that mean member
> is still in Obsolete list even though it is being activated?
>
> //member started
> TID: [0] [STRATOS] [2015-06-04 19:53:04,706]  INFO
> {org.apache.stratos.autoscaler.context.cluster.ClusterContext} -  Member
> stat context has been added: [application] g-sc-G12-1 [cluster]
> g-sc-G12-1.c1-0x0.c1.domain [clusterInstanceContext] g-sc-G12-1-1
> [partitionContext] whole-region [member-id]
> g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
>
> //member activated
> TID: [0] [STRATOS] [2015-06-04 19:56:00,907]  INFO
> {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher}
> -  Publishing member activated event: [service-name] c1 [cluster-id]
> g-sc-G12-1.c1-0x0.c1.domain [cluster-instance-id] g-sc-G12-1-1 [member-id]
> g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
> [network-partition-id] RegionOne [partition-id] whole-region
> TID: [0] [STRATOS] [2015-06-04 19:56:00,916]  INFO
> {org.apache.stratos.messaging.message.processor.topology.MemberActivatedMessageProcessor}
> -  Member activated: [service] c1 [cluster] g-sc-G12-1.c1-0x0.c1.domain
> [member] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
>
> //after 15 minutes ---member is still in pending state, pending timeout
> expired
> TID: [0] [STRATOS] [2015-06-04 20:08:04,713]  INFO
> {org.apache.stratos.autoscaler.context.partition.ClusterLevelPartitionContext$PendingMemberWatcher}
> -  Pending state of member expired, member will be moved to obsolete list.
> [pending member]
> g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1 [expiry
> time] 900000 [cluster] g-sc-G12-1.c1-0x0.c1.domain [cluster instance] null
>




Re: Testing Stratos 4.1: Application undeployment: application fails to undeploy (nested grouping, group scaling)

Posted by Udara Liyanage <ud...@wso2.com>.
Hi,


For the first issue you mentioned, the particular member is activated,
but it is still identified as an obsolete member and is being marked for
termination since the pending time expired. Does that mean the member is
still in the obsolete list even though it has been activated?

//member started
TID: [0] [STRATOS] [2015-06-04 19:53:04,706]  INFO
{org.apache.stratos.autoscaler.context.cluster.ClusterContext} -  Member
stat context has been added: [application] g-sc-G12-1 [cluster]
g-sc-G12-1.c1-0x0.c1.domain [clusterInstanceContext] g-sc-G12-1-1
[partitionContext] whole-region [member-id]
g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1

//member activated
TID: [0] [STRATOS] [2015-06-04 19:56:00,907]  INFO
{org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher}
-  Publishing member activated event: [service-name] c1 [cluster-id]
g-sc-G12-1.c1-0x0.c1.domain [cluster-instance-id] g-sc-G12-1-1 [member-id]
g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1
[network-partition-id] RegionOne [partition-id] whole-region
TID: [0] [STRATOS] [2015-06-04 19:56:00,916]  INFO
{org.apache.stratos.messaging.message.processor.topology.MemberActivatedMessageProcessor}
-  Member activated: [service] c1 [cluster] g-sc-G12-1.c1-0x0.c1.domain
[member] g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1

//after 15 minutes: member is still in pending state, pending timeout
expired
TID: [0] [STRATOS] [2015-06-04 20:08:04,713]  INFO
{org.apache.stratos.autoscaler.context.partition.ClusterLevelPartitionContext$PendingMemberWatcher}
-  Pending state of member expired, member will be moved to obsolete list.
[pending member]
g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1 [expiry
time] 900000 [cluster] g-sc-G12-1.c1-0x0.c1.domain [cluster instance] null
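
The timestamps above can be checked with a few lines of Python. This is only a sketch: elapsed_ms is a hypothetical helper, and reading the "Member stat context has been added" line as the moment the pending timer started is an assumption, not something the log states.

```python
from datetime import datetime, timedelta

# Timestamp layout used in wso2carbon.log, e.g. "2015-06-04 19:53:04,706".
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed_ms(start, end):
    """Whole milliseconds between two log timestamps."""
    delta = datetime.strptime(end, LOG_TS_FORMAT) - datetime.strptime(start, LOG_TS_FORMAT)
    return delta // timedelta(milliseconds=1)


# Timestamps copied from the three log excerpts above.
STARTED = "2015-06-04 19:53:04,706"    # member stat context added
ACTIVATED = "2015-06-04 19:56:00,916"  # member activated (message processor)
EXPIRED = "2015-06-04 20:08:04,713"    # pending state expired

print("started   -> expired  :", elapsed_ms(STARTED, EXPIRED), "ms")
print("started   -> activated:", elapsed_ms(STARTED, ACTIVATED), "ms")
print("activated -> expired  :", elapsed_ms(ACTIVATED, EXPIRED), "ms")
```

With these timestamps the expiry comes 900007 ms after the member was added, which lines up with the logged expiry time of 900000, and about 12 minutes after the activation was logged. So the pending timer apparently kept running after the member had been activated.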

On Fri, Jun 5, 2015 at 5:14 AM, Martin Eppel (meppel) <me...@cisco.com>
wrote:

>  Hi,
>
>
>
> I am running into a scenario where application un-deployment fails (using
> stratos with latest commit  b1b6bca3f99b6127da24c9af0a6b20faff2907be).
>
>
>
> For application structure see [1.], (debug enabled) wso2carbon.log,
> application.json, cartridge-group.json, deployment-policy, auto-scaling
> policies see attached zip file.
>
>
>
> *It is noteworthy, that while the application is running the following log
> statements /exceptions are observed:*
>
>
>
> *…*
>
> *Member is in the wrong list and it is removed from active members list:
> g-sc-G12-1.c1-0x0.c1.domainb0aa0188-49f1-47f6-a040-c2eab4acb5b1*
>
> *…*
>
> *TID: [0] [STRATOS] [2015-06-04 20:11:03,425] ERROR
> {org.apache.stratos.autoscaler.rule.RuleTasksDelegator} -  Cannot terminate
> instance*
>
> *…*
>
> *// **after receiving the application undeploy event:*
>
> *[2015-06-04 20:12:39,465]  INFO
> {org.apache.stratos.autoscaler.services.impl.AutoscalerServiceImpl} -
> Application undeployment process started: [application-id] g-sc-G12-1*
>
> *// **a new instance is being started up*
>
> *…*
>
> *[2015-06-04 20:13:13,445]  INFO
> {org.apache.stratos.cloud.controller.services.impl.InstanceCreator} -
> Instance started successfully: [cartridge-type] c2 [cluster-id]
> g-sc-G12-1.c2-1x0.c2.domain [instance-id]
> RegionOne/5d4699f7-b00b-42eb-b565-b48fc8f20407*
>
>
>
> *// Also noteworthy seems the following warning which is seen repeatedly
> in the logs:*
>
> *ReadWriteLock} -  System warning! Trying to release a lock which has not
> been taken by the same thread: [lock-name]*
>
>
>
>
>
> [1.] Application structure
>
>
>
>
>
>
>
>
>
>
>


