Posted to dev@stratos.apache.org by Reka Thirunavukkarasu <re...@wso2.com> on 2015/05/01 04:10:36 UTC

Re: Testing Stratos 4.1 : nested grouping scenario with startup and termination issues (?)

Hi Martin,

Thanks, Martin, for the detailed information; it helped to isolate the issue.

Going through the logs, this looks like a threading issue. I can see the log
entries below for c4-1x1 and c3-1x1: both c3 and c4 were scheduled to start
their respective ClusterMonitors, but only c3's ClusterMonitor actually
started, not c4's. So the scheduler for c4 apparently never ran the
MonitorAdder thread that creates the ClusterMonitor.

TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO
{org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
Starting dependent monitor: [application] sub-G1-G2-G3-1-G4 [component]
sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO
{org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
Monitor scheduled: [type] cluster [component]
sub-G1-G2-G3-1-G4.c4-1x1.c4.domain

TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO
{org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
Starting monitor: [type] cluster [component]
sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
TID: [0] [STRATOS] [2015-04-30 16:48:57,713]  INFO
{org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
Starting dependent monitor: [application] sub-G1-G2-G3-1-G4 [component]
sub-G1-G2-G3-1-G4.c3-1x1.c3.domain
TID: [0] [STRATOS] [2015-04-30 16:48:57,713]  INFO
{org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
Monitor scheduled: [type] cluster [component]
sub-G1-G2-G3-1-G4.c3-1x1.c3.domain

TID: [0] [STRATOS] [2015-04-30 16:48:57,713]  INFO
{org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
Starting monitor: [type] cluster [component]
sub-G1-G2-G3-1-G4.c3-1x1.c3.domain

The log below shows that the c3 monitor started successfully; there is no
corresponding entry for c4.

TID: [0] [STRATOS] [2015-04-30 16:49:00,760]  INFO
{org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
Monitor started successfully: [type] cluster [component]
sub-G1-G2-G3-1-G4.c3-1x1.c3.domain [dependents] none [startup-time] 3
seconds

@Gayan/Imesh, do you have any input here? Would increasing the thread pool
size solve this issue, or is it related to something else?
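
For illustration only (a generic Java sketch, not the Stratos code): with a
small fixed-size executor, a task can be accepted and logged as "scheduled"
and yet never start while all worker threads stay busy, which would match
what we see for c4.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Generic sketch: the second task never prints while the single worker is blocked.
public class SaturatedPoolSketch {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(1);

        // The only worker is held up, e.g. waiting for a dependency to come up.
        pool.submit(() -> {
            try { Thread.sleep(60_000); } catch (InterruptedException ignored) { }
        });

        // Accepted ("scheduled") but queued; its "started" message does not appear
        // until the first task releases the worker thread.
        pool.submit(() -> System.out.println("Monitor started successfully"));

        Thread.sleep(2_000);
        pool.shutdownNow();
    }
}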

Thanks,
Reka


On Thu, Apr 30, 2015 at 10:54 PM, Martin Eppel (meppel) <me...@cisco.com>
wrote:

>  Hi Reka,
>
>
>
> I re-ran the scenario, making sure the application alias and group alias are
> as suggested and that debug logs are turned on (see the config below).
>
>
>
> log4j.logger.org.apache.stratos.manager=DEBUG
>
> log4j.logger.org.apache.stratos.autoscaler=DEBUG
>
> log4j.logger.org.apache.stratos.messaging=INFO
>
> log4j.logger.org.apache.stratos.cloud.controller=DEBUG
>
> log4j.logger.org.wso2.andes.client=ERROR
>
>
>
> This is the scenario:
>
>
>
> 1.      deployed application – see screenshot A. , debug logs
> wso2carbon-debug.log
> only 3 instances spin up
>
> 2.      removed application
>
> 3.      re-deployed application – see screenshot B. , debug logs
> wso2carbon-debug-2.log
> (after the line “TID: [0] [STRATOS] [2015-04-30 17:05:23,837] DEBUG
> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Read lock
> released”)
> The 2nd time the application is deployed, all instances spin up and go active.
>
>
>
>
>
> Please see attached artifacts and logs.
>
>
>
> A.     Application Status after deploying the application first time
> after stratos start up:
>
> B.     Application Status after re-deploying the application
>
> (see log wso2carbon-debug-2.log after “TID: [0] [STRATOS] [2015-04-30
> 17:05:23,837] DEBUG
> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Read lock
> released”:
>
> *From:* Reka Thirunavukkarasu [mailto:reka@wso2.com]
> *Sent:* Thursday, April 30, 2015 1:40 AM
>
> *To:* dev
> *Subject:* Re: Testing Stratos 4.1 : nested grouping scenario with
> startup and termination issues (?)
>
>
>
> If you keep running into this issue, can you please share logs taken against
> master? We improved some of the logging on master yesterday.
>
> Thanks,
>
> Reka
>
>
>
> On Thu, Apr 30, 2015 at 2:08 PM, Reka Thirunavukkarasu <re...@wso2.com>
> wrote:
>
> Hi Martin,
>
> I have deployed the attached samples, as before, on OpenStack with the latest
> master. All the clusters got created with their members; please see the
> attached diagram. I'm unable to proceed further until my puppet configuration
> is corrected so that the members become active, but I thought of sharing this
> since all the clusters already have members.
>
> Thanks,
>
> Reka
>
>
>
> On Thu, Apr 30, 2015 at 10:25 AM, Reka Thirunavukkarasu <re...@wso2.com>
> wrote:
>
> HI Martin,
>
> Can you please confirm whether you are using a unique applicationId and
> group alias? From the UI, the applicationId and the next group alias appear
> to have the same value, sub-G1-G2-G3-1.
>
> Thanks,
>
> Reka
>
>
>
>
>
> On Thu, Apr 30, 2015 at 10:16 AM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> Hi Reka,
>
>
>
> I have upgraded from beta to the latest stratos code on master and
> retested the scenario from jira STRATOS-1345 but still see the same issue
> (on open stack)
>
>
>
> Thanks
>
>
>
> Martin
>
>
>
>
>
> *From:* Martin Eppel (meppel)
> *Sent:* Wednesday, April 29, 2015 2:54 PM
> *To:* dev@stratos.apache.org
> *Subject:* RE: Testing Stratos 4.1 : nested grouping scenario with
> startup and termination issues (?)
>
>
>
> Hi Reka,
>
>
>
> I will upgrade my system to the latest master and re-test,
>
>
>
> Regards
>
>
>
> Martin
>
>
>
> *From:* Reka Thirunavukkarasu [mailto:reka@wso2.com <re...@wso2.com>]
> *Sent:* Wednesday, April 29, 2015 11:55 AM
> *To:* dev
> *Subject:* Re: Testing Stratos 4.1 : nested grouping scenario with
> startup and termination issues (?)
>
>
>
> Hi Martin,
>
> While I was working on application update, I fixed a few issues with the
> termination behavior. There still seem to be small issues in the logic that
> have to be fixed. I have started to verify this in my local setup. Can you
> create a JIRA so that we can track it? I will update the progress in the
> JIRA.
>
> Thanks,
>
> Reka
>
>
>
> On Tue, Apr 28, 2015 at 10:11 PM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> Hi Reka,
>
>
>
> Thanks for following up - let me know if I should open a JIRA,
>
>
>
> Thanks
>
>
>
> Martin
>
>
>
> *From:* Reka Thirunavukkarasu [mailto:reka@wso2.com]
> *Sent:* Tuesday, April 28, 2015 5:37 AM
> *To:* dev
> *Subject:* Re: Testing Stratos 4.1 : nested grouping scenario with
> startup and termination issues (?)
>
>
>
> Hi Martin,
>
> Thanks for bringing this up. I fixed some issues in this flow while testing
> application update support with instance counts. I will go through your
> scenarios to reproduce the problem and update the thread with the progress.
>
> Thanks,
>
> Reka
>
>
>
> On Tue, Apr 28, 2015 at 7:08 AM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> I am testing a (nested grouping) scenario where a group defines a
> termination behavior “terminate-all”. When terminating the instance (of
> cartridge type c3), no new instance is restarted.
>
> My understanding is that a new instance should be started up.
>
>
>
> The scenario looks like this:
>
>
>
> Group ~G1 has a cartridge member c1 and group member ~G2
>
> Group ~G2 has a cartridge member c2 and group member ~G3
>
> Group ~G3 has a cartridge member c3
>
>
>
> Startup dependencies are: c1 depends on G2, c2 depends on G3
>
>
>
> ~G1 defines termination: none
>
> ~G2 defines termination: dependents
>
> ~G3 defines termination: all
>
>
>
> After startup, when all instances are active, instance c3 is terminated,
> which correctly also terminates instance c2 (since it depends on G3 /
> c3).
>
> *Issue 1:*
>
> However, no new instance for c3 is started up (and consequently no new
> instance for c2 can be started either); see log
> wso2carbon.log.
>
>
>
> The only instance which remains running is c1.
>
> *Issue 2:*
>
> When c1 is subsequently terminated manually, a new instance of c1 is
> started up (in contrast to Issue 1), which I think is incorrect, since c1
> defines a startup dependency (c1 depends on G2) that is not fulfilled at
> that time (G2 should not be active because c2 is still terminated; see log
> wso2carbon-issue2.log, the same log as wso2carbon.log but at a later time).
>
>
>
> WDYT ?
>
>
>
> Please find attached artifacts and logs
>
>
>
> Thanks
>
>
>
> Martin
>
>
>
>
> --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>
>
>
>
> --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>
>
>
>
> --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>
>
>
>
> --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>
>
>
>
> --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>



-- 
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007

RE: Testing Stratos 4.1 : nested grouping scenario with startup and termination issues (?)

Posted by "Martin Eppel (meppel)" <me...@cisco.com>.
+1

Let me know and I’ll pick up the commit and re-test tomorrow

Thanks

Martin



Re: Testing Stratos 4.1 : nested grouping scenario with startup and termination issues (?)

Posted by Reka Thirunavukkarasu <re...@wso2.com>.
Thanks, Gayan, for pointing it out. Yes, something might have gone wrong in
the monitor creation.

+1 for adding more logs to figure out what is going wrong.

Thanks,
Reka

On Fri, May 1, 2015 at 9:51 AM, Gayan Gunarathne <ga...@wso2.com> wrote:

> Hi Reka,
>
> As per the logs, I can see the c4 monitor scheduler starting. That means the
> run method is called via the scheduled executor service.
>
> TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO {org.apache.stratos.
> autoscaler.monitor.component.ParentComponentMonitor} -  Starting monitor:
> [type] cluster [component] sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
>
> Something seems to be going wrong during the execution of the run method,
> since we can't see the "started successfully" log. I think we can check by
> adding some logs in the scheduler's run method itself.
>
> Thanks
> Gayan
>



-- 
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007

Re: Testing Stratos 4.1 : nested grouping scenario with startup and termination issues (?)

Posted by Gayan Gunarathne <ga...@wso2.com>.
Hi Reka,

As per the logs, I can see the c4 monitor scheduler starting. That means the
run method is called via the scheduled executor service.

TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO {org.apache.stratos.
autoscaler.monitor.component.ParentComponentMonitor} -  Starting monitor:
[type] cluster [component] sub-G1-G2-G3-1-G4.c4-1x1.c4.domain

Something seems to be going wrong during the execution of the run method, since
we can't see the "started successfully" log. I think we can check by adding some
logs in the scheduler's run method itself.
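
Something along these lines (just a generic sketch, not the actual MonitorAdder
code; the names are illustrative) would make any hidden failure visible:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Log the start and end of the scheduled task and catch Throwable, so a failure
// inside run() cannot disappear silently into the executor's Future.
public class LoggedMonitorAdder implements Runnable {

    private final String componentId;

    public LoggedMonitorAdder(String componentId) {
        this.componentId = componentId;
    }

    @Override
    public void run() {
        System.out.println("Monitor adder started: " + componentId);
        try {
            // ... create the cluster monitor here ...
            System.out.println("Monitor started successfully: " + componentId);
        } catch (Throwable t) {
            // Without this, an exception thrown by a submitted task is kept in the
            // Future and never logged, which would match the missing success log.
            t.printStackTrace();
        } finally {
            System.out.println("Monitor adder finished: " + componentId);
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.submit(new LoggedMonitorAdder("sub-G1-G2-G3-1-G4.c4-1x1.c4.domain"));
        pool.shutdown();
    }
}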

Thanks
Gayan




-- 

Gayan Gunarathne
Technical Lead
WSO2 Inc. (http://wso2.com)
email  : gayang@wso2.com  | mobile : +94 766819985

RE: Testing Stratos 4.1 : nested grouping scenario with startup and termination issues (?)

Posted by "Martin Eppel (meppel)" <me...@cisco.com>.
Good to hear we found some lead … ☺



Re: Testing Stratos 4.1 : nested grouping scenario with startup and termination issues (?)

Posted by Reka Thirunavukkarasu <re...@wso2.com>.
Thanks Imesh..!!

On Thu, May 7, 2015 at 8:08 PM, Imesh Gunaratne <im...@apache.org> wrote:

> Hi Reka,
>
> Excellent!! Great work!!
>
> I think your fix is correct. As I understand it, in Drools 5 the knowledge
> base is thread safe [1]; in any case, it would be better not to share data
> structures among multiple Drools executions.
>
> [1] http://www.intertech.com/Blog/introducing-drools-5-post-2-of-3/
>
> Thanks
>
> On Thu, May 7, 2015 at 7:37 AM, Reka Thirunavukkarasu <re...@wso2.com>
> wrote:
>
>> Hi Martin,
>>
>> Yes, I have committed the patch. Please let me know if there is any issue.
>>
>> Thanks,
>> Reka
>>
>> On Thu, May 7, 2015 at 2:07 AM, Martin Eppel (meppel) <me...@cisco.com>
>> wrote:
>>
>>>  Reka,
>>>
>>>
>>>
>>> Did you commit the patch?
>>>
>>>
>>>
>>> Thanks
>>>
>>>
>>>
>>> Martin
>>>
>>>
>>>
>>> *From:* Reka Thirunavukkarasu [mailto:reka@wso2.com]
>>> *Sent:* Tuesday, May 05, 2015 9:36 PM
>>> *To:* Martin Eppel (meppel)
>>> *Cc:* dev; Imesh Gunaratne; Lahiru Sandaruwan
>>>
>>> *Subject:* Re: Testing Stratos 4.1 : nested grouping scenario with
>>> startup and termination issues (?)
>>>
>>>
>>>
>>> That's great news. Let me know how the rest of your testing goes with the
>>> patch. I will continue using this patch while verifying other issues.
>>>
>>> Thanks,
>>>
>>> Reka
>>>
>>>
>>>
>>> On Wed, May 6, 2015 at 12:13 AM, Martin Eppel (meppel) <me...@cisco.com>
>>> wrote:
>>>
>>> Hi Reka,
>>>
>>>
>>>
>>> Good news – the patch seems to work fine; please find attached the usual
>>> artifacts / log (debug / thread lock detection enabled) + [1.].
>>>
>>> I'll run some more tests; let's keep our fingers crossed.
>>>
>>>
>>>
>>>
>>>
>>> [1.] Screenshot:
>>>
>>>
>>>
>>>
>>>
>>> *From:* Reka Thirunavukkarasu [mailto:reka@wso2.com]
>>> *Sent:* Tuesday, May 05, 2015 6:17 AM
>>> *To:* dev; Imesh Gunaratne; Lahiru Sandaruwan; Martin Eppel (meppel)
>>>
>>> *Subject:* Re: Testing Stratos 4.1 : nested grouping scenario with
>>> startup and termination issues (?)
>>>
>>>
>>>
>>> Hi Martin,
>>>
>>> I have added a possible fix based on the threading model in
>>> ClusterMonitor. I have verified it locally and it is working fine. Can you
>>> apply the attached patch and continue testing the same scenario? If it
>>> fixes the issue without regressions, I can push it to master.
>>>
>>> What I have fixed is this: AutoscalerRuleEvaluator is now a singleton that
>>> initializes the knowledge-base map only once, when the instance is created.
>>> I also moved the Drools-related variables from ClusterMonitor into
>>> ClusterInstanceContext, since a thread is spawned per ClusterInstanceContext
>>> to improve performance. Those Drools-related variables should not be shared
>>> across ClusterInstanceContexts, because, from what I saw in the code, that
>>> can cause conflicts. With the variables kept per ClusterInstanceContext,
>>> each thread works on its own copy.
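>>>
>>> Roughly, the shape of the change looks like this (an illustrative sketch
>>> using the Drools 5 knowledge API, not the exact patch):
>>>
>>> import java.util.concurrent.ConcurrentHashMap;
>>> import org.drools.KnowledgeBase;
>>> import org.drools.KnowledgeBaseFactory;
>>> import org.drools.builder.KnowledgeBuilder;
>>> import org.drools.builder.KnowledgeBuilderFactory;
>>> import org.drools.builder.ResourceType;
>>> import org.drools.io.ResourceFactory;
>>> import org.drools.runtime.StatefulKnowledgeSession;
>>>
>>> // Sketch: parse each .drl at most once behind a thread-safe map, and hand
>>> // every ClusterInstanceContext its own stateful session instead of a shared one.
>>> public class RuleEvaluatorSketch {
>>>
>>>     private static final ConcurrentHashMap<String, KnowledgeBase> knowledgeBases =
>>>             new ConcurrentHashMap<String, KnowledgeBase>();
>>>
>>>     public static KnowledgeBase getKnowledgeBase(String drlFileName) {
>>>         KnowledgeBase kbase = knowledgeBases.get(drlFileName);
>>>         if (kbase == null) {
>>>             synchronized (RuleEvaluatorSketch.class) {
>>>                 kbase = knowledgeBases.get(drlFileName);
>>>                 if (kbase == null) {
>>>                     KnowledgeBuilder builder = KnowledgeBuilderFactory.newKnowledgeBuilder();
>>>                     builder.add(ResourceFactory.newClassPathResource(drlFileName), ResourceType.DRL);
>>>                     if (builder.hasErrors()) {
>>>                         throw new RuntimeException(builder.getErrors().toString());
>>>                     }
>>>                     kbase = KnowledgeBaseFactory.newKnowledgeBase();
>>>                     kbase.addKnowledgePackages(builder.getKnowledgePackages());
>>>                     knowledgeBases.put(drlFileName, kbase);
>>>                 }
>>>             }
>>>         }
>>>         return kbase;
>>>     }
>>>
>>>     // Sessions are not shared: each ClusterInstanceContext thread gets its own.
>>>     public static StatefulKnowledgeSession newSession(String drlFileName) {
>>>         return getKnowledgeBase(drlFileName).newStatefulKnowledgeSession();
>>>     }
>>> }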
>>>
>>> @Imesh/Lahiru,
>>>
>>> Please correct me, if I'm wrong.
>>>
>>>
>>>
>>> FYI: the patch was created on top of commit
>>> 5c87d5de2ad15788f47907d89641c52dd3d21d53.
>>>
>>> Please let me know if you face any issues with it.
>>>
>>> Thanks,
>>>
>>> Reka
>>>
>>>
>>>
>>> On Tue, May 5, 2015 at 5:51 PM, Reka Thirunavukkarasu <re...@wso2.com>
>>> wrote:
>>>
>>> Hi Martin,
>>>
>>> I will have to implement this solution in a thread-safe manner, as
>>> multiple cluster monitors share the same resource. It will impact the
>>> cluster monitor's monitoring part as well. I'm still working out a
>>> solution for this issue and will keep you updated on the progress.
>>>
>>> Thanks,
>>>
>>> Reka
>>>
>>>
>>>
>>> On Tue, May 5, 2015 at 5:19 PM, Reka Thirunavukkarasu <re...@wso2.com>
>>> wrote:
>>>
>>> Hi
>>>
>>> I suspect the issue is that we use a static knowledgeBases map in
>>> AutoscalerRuleEvaluator, but it gets initialized by every cluster monitor.
>>> We need to fix the cluster monitor creation flow so that the static
>>> knowledgeBases map is initialized only once, or is properly shared across
>>> threads, since each cluster monitor runs in its own thread.
>>>
>>> Since the Drools files only need to be parsed once and can then be used by
>>> all monitors, I will work on a fix that parses each Drools file a single
>>> time. Hopefully that will solve this issue.
>>>
>>> Thanks,
>>>
>>> Reka
>>>
>>>
>>>
>>>
>>>
>>> On Tue, May 5, 2015 at 4:44 PM, Reka Thirunavukkarasu <re...@wso2.com>
>>> wrote:
>>>
>>> Hi Martin/Imesh,
>>>
>>> Thanks, Imesh, for adding the exception handling in the monitor creation;
>>> that helped to narrow down the issue. It was a Drools file parsing issue. I
>>> found the exception below in both samples when creating the relevant
>>> monitors. We will have to identify why the Drools parsing gave an NPE.
>>> @Lahiru, do you have any idea about this? In both samples, the cluster
>>> monitors failed when parsing "obsoletecheck.drl".
>>>
>>> Since I couldn't figure out the root cause, I have added more descriptive
>>> debug logs (in 5c87d5de2ad15788f47907d89641c52dd3d21d53) to isolate the
>>> issue. @Martin, would you get a chance to test it and send us the logs
>>> again for the same scenario, since I'm unable to reproduce this on my side?
>>>
>>>
>>> scenario_c1_c2_c3_c4_cartridges:
>>>
>>> TID: [0] [STRATOS] [2015-05-01 18:24:22,591] DEBUG
>>> {org.apache.stratos.autoscaler.rule.AutoscalerRuleEvaluator} -  Drools file
>>> is parsed successfully: obsoletecheck.drl
>>> TID: [0] [STRATOS] [2015-05-01 18:24:22,594] ERROR
>>> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
>>> An error occurred while starting monitor: [type] cluster [component]
>>> sub-G1-G2-G3-1-Id.c3-1x1.c3.domain
>>> java.lang.NullPointerException
>>>     at
>>> org.apache.stratos.autoscaler.rule.AutoscalerRuleEvaluator.getStatefulSession(AutoscalerRuleEvaluator.java:76)
>>>     at
>>> org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor.<init>(ClusterMonitor.java:137)
>>>     at
>>> org.apache.stratos.autoscaler.monitor.MonitorFactory.getClusterMonitor(MonitorFactory.java:302)
>>>     at
>>> org.apache.stratos.autoscaler.monitor.MonitorFactory.getMonitor(MonitorFactory.java:83)
>>>     at
>>> org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor$MonitorAdder.run(ParentComponentMonitor.java:844)
>>>     at
>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>     at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>     at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>     at java.lang.Thread.run(Thread.java:745)
>>>
>>> scenario_c1_c2_cartridges:
>>>
>>> TID: [0] [STRATOS] [2015-05-01 17:58:50,824] DEBUG
>>> {org.apache.stratos.autoscaler.rule.AutoscalerRuleEvaluator} -  Drools file
>>> is parsed successfully: dependent-scaling.drl
>>> TID: [0] [STRATOS] [2015-05-01 17:58:50,825] ERROR
>>> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
>>> An error occurred while starting monitor: [type] cluster [component]
>>> subscription-G1-G2-G3-Id.c1-1x1.c1.domain
>>> java.lang.NullPointerException
>>>     at
>>> org.apache.stratos.autoscaler.rule.AutoscalerRuleEvaluator.getStatefulSession(AutoscalerRuleEvaluator.java:76)
>>>     at
>>> org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor.<init>(ClusterMonitor.java:135)
>>>     at
>>> org.apache.stratos.autoscaler.monitor.MonitorFactory.getClusterMonitor(MonitorFactory.java:302)
>>>     at
>>> org.apache.stratos.autoscaler.monitor.MonitorFactory.getMonitor(MonitorFactory.java:83)
>>>     at
>>> org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor$MonitorAdder.run(ParentComponentMonitor.java:844)
>>>     at
>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>     at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>     at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>     at java.lang.Thread.run(Thread.java:745)
>>> TID: [0] [STRATOS] [2015-05-01 17:58:50,825] DEBUG
>>> {org.apache.stratos.autoscaler.monitor.
>>>
>>> @Martin, however, there seems to be a separate locking issue. That is not
>>> related to this one. For now, that locking issue seems to be harmless. Can we
>>> track it in a JIRA?
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Reka
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Sat, May 2, 2015 at 12:28 AM, Martin Eppel (meppel) <me...@cisco.com>
>>> wrote:
>>>
>>> Hi Imesh, Reka
>>>
>>>
>>>
>>> As requested, please see the attached artifacts and logs (with debug enabled)
>>> to test for the deadlock – Stratos is running the latest from master,
>>> latest commit:
>>>
>>> commit ae89ba09491891512a9bc89e080577c565ebe8b7
>>>
>>> Author: reka <rt...@gmail.com>
>>>
>>> Date:   Fri May 1 12:30:55 2015 +0530
>>>
>>>
>>>
>>> I ran 2 similar but slightly different scenarios, see [1.], [2.]
>>>
>>>
>>>
>>> Java startup with lock monitor enabled:
>>>
>>>
>>>
>>> /opt/java/bin/java -Xbootclasspath/a: -Xms256m -Xmx2048m
>>> -XX:MaxPermSize=256m -server -XX:+HeapDumpOnOutOfMemoryError
>>> -XX:HeapDumpPath=/opt/wso2/apache-stratos/repository/logs/heap-dump.hprof
>>> -Dcom.sun.management.jmxremote
>>> -classpath /opt/java/lib/tools.jar:/opt/wso2/apache-stratos/bin/org.wso2.carbon.bootstrap-4.2.0.jar:/opt/wso2/apache-stratos/bin/tcpmon-1.0.jar:/opt/wso2/apache-stratos/bin/tomcat-juli-7.0.34.jar:/opt/wso2/apache-stratos/lib/commons-lang-2.6.0.wso2v1.jar
>>> -Djava.endorsed.dirs=/opt/wso2/apache-stratos/lib/endorsed:/opt/java/jre/lib/endorsed:/opt/java/lib/endorsed
>>> -Djava.io.tmpdir=/opt/wso2/apache-stratos/tmp
>>> -Dcatalina.base=/opt/wso2/apache-stratos/lib/tomcat
>>> -Dwso2.server.standalone=true -Dcarbon.registry.root=/
>>> -Djava.command=/opt/java/bin/java
>>> -Dcarbon.home=/opt/wso2/apache-stratos
>>> -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
>>> -Dcarbon.config.dir.path=/opt/wso2/apache-stratos/repository/conf
>>> -Djava.util.logging.config.file=/opt/wso2/apache-stratos/repository/conf/etc/logging-bridge.properties
>>> -Dcomponents.repo=/opt/wso2/apache-stratos/repository/components/plugins
>>> -Dconf.location=/opt/wso2/apache-stratos/repository/conf
>>> -Dcom.atomikos.icatch.file=/opt/wso2/apache-stratos/lib/transactions.properties
>>> -Dcom.atomikos.icatch.hide_init_file_path=true
>>> -Dorg.apache.jasper.runtime.BodyContentImpl.LIMIT_BUFFER=true
>>> -Dcom.sun.jndi.ldap.connect.pool.authentication=simple
>>> -Dcom.sun.jndi.ldap.connect.pool.timeout=3000
>>> -Dorg.terracotta.quartz.skipUpdateCheck=true
>>> -Djava.security.egd=file:/dev/./urandom -Dfile.encoding=UTF8
>>> -Ddisable.cassandra.server.startup=true
>>> -Djndi.properties.dir=/opt/wso2/apache-stratos/repository/conf
>>> -Dthrift.client.config.file.path=/opt/wso2/apache-stratos/repository/conf/thrift-client-config.xml
>>> -DMETADATA_CLIENT_CONFIG_FILE=/opt/wso2/apache-stratos/repository/conf/metadataservice.xml
>>> -Dread.write.lock.monitor.enabled=true
>>> org.wso2.carbon.bootstrap.Bootstrap -Dprofile=default
>>>
>>>
>>>
>>>
>>>
>>> [1.] scenario_c1_c2_cartridges
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> [1b.] exception
>>>
>>>
>>>
>>> org.apache.stratos.common.exception.LockNotReleasedException
>>>         at org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor.checkTimeout(ReadWriteLockMonitor.java:72)
>>>         at org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor.run(ReadWriteLockMonitor.java:55)
>>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
>>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>         at java.lang.Thread.run(Thread.java:745)
>>>
>>>
>>>
>>>
>>>
>>> [2.] scenario_c1_c2_c3_c4_cartridges
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
>>> *Sent:* Thursday, April 30, 2015 10:10 PM
>>>
>>>
>>> *To:* dev
>>> *Subject:* Re: Testing Stratos 4.1 : nested grouping scenario with
>>> startup and termination issues (?)
>>>
>>>
>>>
>>> In addition, we have not added a try-catch block in the MonitorAdder.run()
>>> method to cover its full scope. Therefore, if an exception is raised in the
>>> middle, it can also cause the above problem.
>>>
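>>> (For context: when a Runnable is submitted to an executor with submit(), any
>>> uncaught exception is captured in the returned Future and nothing reaches the
>>> log, so the task fails silently. A minimal, self-contained illustration of
>>> why the try-catch matters - this is not the Stratos code itself:)
>>>
>>> import java.util.concurrent.ExecutorService;
>>> import java.util.concurrent.Executors;
>>>
>>> public class SwallowedExceptionDemo {
>>>     public static void main(String[] args) {
>>>         ExecutorService executor = Executors.newSingleThreadExecutor();
>>>
>>>         // Without a try-catch the NPE stays inside the Future and is never logged.
>>>         executor.submit(new Runnable() {
>>>             public void run() {
>>>                 String clusterId = null;
>>>                 clusterId.length(); // throws NullPointerException
>>>             }
>>>         });
>>>
>>>         // With a try-catch covering the full scope the failure becomes visible.
>>>         executor.submit(new Runnable() {
>>>             public void run() {
>>>                 try {
>>>                     String clusterId = null;
>>>                     clusterId.length();
>>>                 } catch (Exception e) {
>>>                     System.err.println("An error occurred while starting monitor: " + e);
>>>                 }
>>>             }
>>>         });
>>>
>>>         executor.shutdown();
>>>     }
>>> }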
>>>
>>>
>>> I have now fixed this in commit revision:
>>>
>>> 9ec061f44a3189ccd8b509ef4da980687dfbcf62
>>>
>>>
>>>
>>> Martin: Appreciate if you could take this fix and retest.
>>>
>>>
>>>
>>> Thanks
>>>
>>>
>>>
>>> On Fri, May 1, 2015 at 10:32 AM, Imesh Gunaratne <im...@apache.org>
>>> wrote:
>>>
>>> Hi Reka,
>>>
>>>
>>>
>>> It looks like the MonitorAdder.run() method has executed; that's why
>>> we see the following log:
>>>
>>>
>>>
>>> TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO
>>> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
>>> Starting monitor: [type] cluster [component]
>>> sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
>>>
>>>
>>>
>>> However, the thread has not reached its last line:
>>>
>>> log.info(String.format("Monitor started successfully: [type] %s [component] %s [dependents] %s " +
>>>                 "[startup-time] %d seconds", monitorTypeStr, context.getId(),
>>>
>>>
>>>
>>> As we discussed offline, this may have been caused by a deadlock while trying
>>> to acquire the following topology lock:
>>>
>>>
>>>
>>> public static ClusterMonitor getClusterMonitor(ParentComponentMonitor parentMonitor,
>>>                                                ClusterChildContext context,
>>>                                                List<String> parentInstanceIds)
>>>     ...
>>>
>>> // acquire read lock for the service and cluster
>>> TopologyManager.acquireReadLockForCluster(serviceName, clusterId);
>>>
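>>> (Side note: whichever code path acquires this lock should release it in a
>>> finally block so it cannot leak if monitor creation throws; roughly as below.
>>> The release call here is only assumed to mirror the acquire call - this is a
>>> sketch, not the actual Stratos code:)
>>>
>>> TopologyManager.acquireReadLockForCluster(serviceName, clusterId);
>>> try {
>>>     // read the cluster from the topology and build the ClusterMonitor ...
>>> } finally {
>>>     // assumed counterpart of acquireReadLockForCluster
>>>     TopologyManager.releaseReadLockForCluster(serviceName, clusterId);
>>> }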
>>>
>>>
>>> Martin: Will you be able to do another test run with the deadlock
>>> detection logic enabled? You can set the following system property to true in the
>>> stratos.sh file to do this:
>>>
>>> read.write.lock.monitor.enabled=true
>>>
>>>  Thanks
>>>
>>>
>>>
>>>
>>>
>>> On Fri, May 1, 2015 at 7:40 AM, Reka Thirunavukkarasu <re...@wso2.com>
>>> wrote:
>>>
>>> Hi Martin,
>>>
>>>
>>>
>>> Thanks Martin for the detailed information in order to analyze the
>>> issue. It helped to isolate the issue.
>>>
>>> As i went through the logs, it seems that some thread issue. I could see
>>> below log for c4-1x1 and c3-1x1. In that case c3 and c4 got scheduled to be
>>> start a relevant clusterMonitor. After that only c3 got successfully
>>> started with ClusterMonitor not c4. So the scheduler of c4 didn't actually
>>> start a thread for the MonitorAdder to create the ClusterMonitor.
>>>
>>> TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO
>>> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
>>> Starting dependent monitor: [application] sub-G1-G2-G3-1-G4 [component]
>>> sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
>>> TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO
>>> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
>>> Monitor scheduled: [type] cluster [component] sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
>>>
>>> TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO
>>> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
>>> Starting monitor: [type] cluster [component]
>>> sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
>>> TID: [0] [STRATOS] [2015-04-30 16:48:57,713]  INFO
>>> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
>>> Starting dependent monitor: [application] sub-G1-G2-G3-1-G4 [component]
>>> sub-G1-G2-G3-1-G4.c3-1x1.c3.domain
>>> TID: [0] [STRATOS] [2015-04-30 16:48:57,713]  INFO
>>> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
>>> Monitor scheduled: [type] cluster [component] sub-G1-G2-G3-1-G4.c3-1x1.c3.domain
>>>
>>> TID: [0] [STRATOS] [2015-04-30 16:48:57,713]  INFO
>>> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
>>> Starting monitor: [type] cluster [component]
>>> sub-G1-G2-G3-1-G4.c3-1x1.c3.domain
>>>
>>> Found below log for c3 which indicates that c3 monitor got started
>>> successfully. But there is no such log for c4.
>>>
>>> TID: [0] [STRATOS] [2015-04-30 16:49:00,760]  INFO
>>> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
>>> Monitor started successfully: [type] cluster [component]
>>> sub-G1-G2-G3-1-G4.c3-1x1.c3.domain [dependents] none [startup-time] 3
>>> seconds
>>>
>>> @Gayan/Imesh, Do you have any input here? Will increasing the threadpool
>>> solve this issue? Or is it related to something else?
>>>
>>> Thanks,
>>>
>>> Reka
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Apr 30, 2015 at 10:54 PM, Martin Eppel (meppel) <
>>> meppel@cisco.com> wrote:
>>>
>>> Hi Reka,
>>>
>>>
>>>
>>> Re-run the scenario, making sure the application alias and group alias
>>> are as suggested and debug logs are turned on (see config below)
>>>
>>>
>>>
>>> log4j.logger.org.apache.stratos.manager=DEBUG
>>>
>>> log4j.logger.org.apache.stratos.autoscaler=DEBUG
>>>
>>> log4j.logger.org.apache.stratos.messaging=INFO
>>>
>>> log4j.logger.org.apache.stratos.cloud.controller=DEBUG
>>>
>>> log4j.logger.org.wso2.andes.client=ERROR
>>>
>>>
>>>
>>> This is the scenario:
>>>
>>>
>>>
>>> 1.      deployed application – see screenshot A. , debug logs
>>> wso2carbon-debug.log
>>> only 3 instances spin up
>>>
>>> 2.      removed application
>>>
>>> 3.      re-deployed application – see screenshot B. , debug logs
>>> wso2carbon-debug-2.log
>>> (after line “TID: [0] [STRATOS] [2015-04-30 17:05:23,837] DEBUG
>>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Read lock
>>> released”
>>> 2nd time the application gets deployed all instances spin up and go
>>> active
>>>
>>>
>>>
>>>
>>>
>>> Please see attached artifacts and logs.
>>>
>>>
>>>
>>> A.     Application Status after deploying the application first time
>>> after stratos start up:
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> B.     Application Status after re-deploying the application
>>>
>>> (see log wso2carbon-debug-2.log after “TID: [0] [STRATOS] [2015-04-30
>>> 17:05:23,837] DEBUG
>>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Read lock
>>> released”:
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> *From:* Reka Thirunavukkarasu [mailto:reka@wso2.com]
>>> *Sent:* Thursday, April 30, 2015 1:40 AM
>>>
>>>
>>> *To:* dev
>>> *Subject:* Re: Testing Stratos 4.1 : nested grouping scenario with
>>> startup and termination issues (?)
>>>
>>>
>>>
>>> If you keep getting this issue, can you please share the logs
>>> against master, as we improved some of the logging in master yesterday?
>>>
>>> Thanks,
>>>
>>> Reka
>>>
>>>
>>>
>>> On Thu, Apr 30, 2015 at 2:08 PM, Reka Thirunavukkarasu <re...@wso2.com>
>>> wrote:
>>>
>>> Hi Martin,
>>>
>>> I have deployed the attached samples as before on OpenStack with the latest
>>> master. All the clusters got created with their members. Please see the
>>> attached diagram. I'm unable to proceed further, as my puppet configuration
>>> has to be corrected to make the members active. I thought of sharing this since
>>> all the clusters have members.
>>>
>>> Thanks,
>>>
>>> Reka
>>>
>>>
>>>
>>> On Thu, Apr 30, 2015 at 10:25 AM, Reka Thirunavukkarasu <re...@wso2.com>
>>> wrote:
>>>
>>> HI Martin,
>>>
>>> Can you please confirm whether you are using a unique applicationId and
>>> group alias? I can see from the UI that the applicationId and the next group alias
>>> have the same value, sub-G1-G2-G3-1.
>>>
>>> Thanks,
>>>
>>> Reka
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Apr 30, 2015 at 10:16 AM, Martin Eppel (meppel) <
>>> meppel@cisco.com> wrote:
>>>
>>> Hi Reka,
>>>
>>>
>>>
>>> I have upgraded from the beta to the latest Stratos code on master and
>>> retested the scenario from JIRA STRATOS-1345, but I still see the same issue
>>> (on OpenStack).
>>>
>>>
>>>
>>> Thanks
>>>
>>>
>>>
>>> Martin
>>>
>>>
>>>
>>>
>>>
>>> *From:* Martin Eppel (meppel)
>>> *Sent:* Wednesday, April 29, 2015 2:54 PM
>>> *To:* dev@stratos.apache.org
>>> *Subject:* RE: Testing Stratos 4.1 : nested grouping scenario with
>>> startup and termination issues (?)
>>>
>>>
>>>
>>> Hi Reka,
>>>
>>>
>>>
>>> I will upgrade my system to the latest master and re-test,
>>>
>>>
>>>
>>> Regards
>>>
>>>
>>>
>>> Martin
>>>
>>>
>>>
>>> *From:* Reka Thirunavukkarasu [mailto:reka@wso2.com <re...@wso2.com>]
>>> *Sent:* Wednesday, April 29, 2015 11:55 AM
>>> *To:* dev
>>> *Subject:* Re: Testing Stratos 4.1 : nested grouping scenario with
>>> startup and termination issues (?)
>>>
>>>
>>>
>>> Hi Martin,
>>>
>>> While I was working on application update, I fixed a few issues with the
>>> termination behavior. However, there still seem to be small issues in the logic
>>> which have to be fixed. I have started to verify this in my local setup. Can
>>> you create a JIRA so that we can track it? I will update the progress in
>>> the JIRA.
>>>
>>> Thanks,
>>>
>>> Reka
>>>
>>>
>>>
>>> On Tue, Apr 28, 2015 at 10:11 PM, Martin Eppel (meppel) <
>>> meppel@cisco.com> wrote:
>>>
>>> Hi Reka,
>>>
>>>
>>>
>>> Thanks for following up - let me know if I should open a JIRA,
>>>
>>>
>>>
>>> Thanks
>>>
>>>
>>>
>>> Martin
>>>
>>>
>>>
>>> *From:* Reka Thirunavukkarasu [mailto:reka@wso2.com]
>>> *Sent:* Tuesday, April 28, 2015 5:37 AM
>>> *To:* dev
>>> *Subject:* Re: Testing Stratos 4.1 : nested grouping scenario with
>>> startup and termination issues (?)
>>>
>>>
>>>
>>> Hi Martin,
>>>
>>> Thanks for bringing this up. I have fixed some issues in the flow while
>>> testing application update support with instance counts. I will go through
>>> your scenarios to reproduce this and update the thread with the progress.
>>>
>>> Thanks,
>>>
>>> Reka
>>>
>>>
>>>
>>> On Tue, Apr 28, 2015 at 7:08 AM, Martin Eppel (meppel) <me...@cisco.com>
>>> wrote:
>>>
>>> I am testing a (nested grouping) scenario where a group defines a
>>> termination behavior “terminate-all”. When terminating the instance (of
>>> cartridge type c3), no new instance is restarted.
>>>
>>> My understanding is that a new instance should be started up.
>>>
>>>
>>>
>>> The scenario looks like this:
>>>
>>>
>>>
>>> Group ~G1 has a cartridge member c1 and group member ~G2
>>>
>>> Group ~G2 has a cartridge member c2 and group member ~G3
>>>
>>> Group ~G3 has a cartridge member c3
>>>
>>>
>>>
>>> Startup dependencies are: c1 depends on G2, c2 depends on G3
>>>
>>>
>>>
>>> ~G1 defines termination: none
>>>
>>> ~G2 defines termination: dependents
>>>
>>> ~G3 defines termination: all
>>>
>>>
>>>
>>> After startup, when all instances are active, instance c3 is terminated,
>>> which correctly also terminates instance c2 (since it depends on G3 /
>>> c3).
>>>
>>> *Issue 1:*
>>>
>>> However, no new instance for c3 is started up (consequently, no new
>>> instance for c2 should be started up either) (see log
>>> wso2carbon.log)
>>>
>>>
>>>
>>> The only instance which remains running is c1.
>>>
>>> *Issue 2:*
>>>
>>> When c1 is subsequently terminated manually, a new instance of c1
>>> is started up (as opposed to Issue 1), which I think is incorrect since c1
>>> defines a startup dependency (c1 depends on G2) which is not fulfilled at
>>> that time (G2 should not be active since c2 is still terminated; see log
>>> wso2carbon-issue2.log, the same log as wso2carbon.log but at a later time).
>>>
>>>
>>>
>>> WDYT ?
>>>
>>>
>>>
>>> Please find attached artifacts and logs
>>>
>>>
>>>
>>> Thanks
>>>
>>>
>>>
>>> Martin
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Reka Thirunavukkarasu
>>> Senior Software Engineer,
>>> WSO2, Inc.:http://wso2.com,
>>>
>>> Mobile: +94776442007
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Reka Thirunavukkarasu
>>> Senior Software Engineer,
>>> WSO2, Inc.:http://wso2.com,
>>>
>>> Mobile: +94776442007
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Reka Thirunavukkarasu
>>> Senior Software Engineer,
>>> WSO2, Inc.:http://wso2.com,
>>>
>>> Mobile: +94776442007
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Reka Thirunavukkarasu
>>> Senior Software Engineer,
>>> WSO2, Inc.:http://wso2.com,
>>>
>>> Mobile: +94776442007
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Reka Thirunavukkarasu
>>> Senior Software Engineer,
>>> WSO2, Inc.:http://wso2.com,
>>>
>>> Mobile: +94776442007
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Reka Thirunavukkarasu
>>> Senior Software Engineer,
>>> WSO2, Inc.:http://wso2.com,
>>>
>>> Mobile: +94776442007
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Imesh Gunaratne
>>>
>>>
>>>
>>> Senior Technical Lead, WSO2
>>>
>>> Committer & PMC Member, Apache Stratos
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Imesh Gunaratne
>>>
>>>
>>>
>>> Senior Technical Lead, WSO2
>>>
>>> Committer & PMC Member, Apache Stratos
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Reka Thirunavukkarasu
>>> Senior Software Engineer,
>>> WSO2, Inc.:http://wso2.com,
>>>
>>> Mobile: +94776442007
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Reka Thirunavukkarasu
>>> Senior Software Engineer,
>>> WSO2, Inc.:http://wso2.com,
>>>
>>> Mobile: +94776442007
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Reka Thirunavukkarasu
>>> Senior Software Engineer,
>>> WSO2, Inc.:http://wso2.com,
>>>
>>> Mobile: +94776442007
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Reka Thirunavukkarasu
>>> Senior Software Engineer,
>>> WSO2, Inc.:http://wso2.com,
>>>
>>> Mobile: +94776442007
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Reka Thirunavukkarasu
>>> Senior Software Engineer,
>>> WSO2, Inc.:http://wso2.com,
>>>
>>> Mobile: +94776442007
>>>
>>>
>>>
>>
>>
>>
>> --
>> Reka Thirunavukkarasu
>> Senior Software Engineer,
>> WSO2, Inc.:http://wso2.com,
>> Mobile: +94776442007
>>
>>
>>
>
>
> --
> Imesh Gunaratne
>
> Senior Technical Lead, WSO2
> Committer & PMC Member, Apache Stratos
>



-- 
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007

Re: Testing Stratos 4.1 : nested grouping scenario with startup and termination issues (?)

Posted by Imesh Gunaratne <im...@apache.org>.
Hi Reka,

Excellent!! Great work!!

I think your fix is correct. As I see it, in Drools 5 the knowledge base is
thread-safe [1]; still, it is better not to share data structures
among multiple Drools executions.

[1] http://www.intertech.com/Blog/introducing-drools-5-post-2-of-3/
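
For reference, the pattern that [1] describes boils down to building the
knowledge base once and giving each worker thread its own session. A minimal
sketch with the Drools 5 knowledge-api (illustration only, not the Stratos
code):

import org.drools.KnowledgeBase;
import org.drools.KnowledgeBaseFactory;
import org.drools.builder.KnowledgeBuilder;
import org.drools.builder.KnowledgeBuilderFactory;
import org.drools.builder.ResourceType;
import org.drools.io.ResourceFactory;
import org.drools.runtime.StatefulKnowledgeSession;

public class SharedKnowledgeBaseExample {
    public static void main(String[] args) {
        // Parse the rule file once; the resulting KnowledgeBase is thread-safe
        // and can be shared by all monitors.
        KnowledgeBuilder builder = KnowledgeBuilderFactory.newKnowledgeBuilder();
        builder.add(ResourceFactory.newClassPathResource("obsoletecheck.drl"),
                ResourceType.DRL);
        if (builder.hasErrors()) {
            throw new IllegalStateException(builder.getErrors().toString());
        }
        KnowledgeBase knowledgeBase = KnowledgeBaseFactory.newKnowledgeBase();
        knowledgeBase.addKnowledgePackages(builder.getKnowledgePackages());

        // Each thread (in our case each ClusterInstanceContext) creates and
        // disposes its own session; the session is the part that must not be shared.
        StatefulKnowledgeSession session = knowledgeBase.newStatefulKnowledgeSession();
        try {
            // session.insert(...); session.fireAllRules();
        } finally {
            session.dispose();
        }
    }
}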

Thanks

On Thu, May 7, 2015 at 7:37 AM, Reka Thirunavukkarasu <re...@wso2.com> wrote:

> Hi Martin,
>
> Yes, I have committed the patch. Please let me know if there is any issue.
>
> Thanks,
> Reka
>
> On Thu, May 7, 2015 at 2:07 AM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
>>  Reka,
>>
>>
>>
>> Did you commit the patch ?
>>
>>
>>
>> Thanks
>>
>>
>>
>> Martin
>>
>>
>>
>> *From:* Reka Thirunavukkarasu [mailto:reka@wso2.com]
>> *Sent:* Tuesday, May 05, 2015 9:36 PM
>> *To:* Martin Eppel (meppel)
>> *Cc:* dev; Imesh Gunaratne; Lahiru Sandaruwan
>>
>> *Subject:* Re: Testing Stratos 4.1 : nested grouping scenario with
>> startup and termination issues (?)
>>
>>
>>
>> That's great news. Let me know how your other testing goes with the
>> patch. I will continue with this patch when verifying other issues.
>>
>> Thanks,
>>
>> Reka
>>
>>
>>
>> On Wed, May 6, 2015 at 12:13 AM, Martin Eppel (meppel) <me...@cisco.com>
>> wrote:
>>
>> Hi Reka,
>>
>>
>>
>> Good news – the patch seems to work fine; please find attached the usual
>> artifacts / logs (debug / thread lock detection enabled) + [1.].
>>
>> I’ll run some more tests; let’s keep our fingers crossed.
>>
>>
>>
>>
>>
>> [1.] Screenshot:
>>
>>
>>
>>
>>
>> *From:* Reka Thirunavukkarasu [mailto:reka@wso2.com]
>> *Sent:* Tuesday, May 05, 2015 6:17 AM
>> *To:* dev; Imesh Gunaratne; Lahiru Sandaruwan; Martin Eppel (meppel)
>>
>> *Subject:* Re: Testing Stratos 4.1 : nested grouping scenario with
>> startup and termination issues (?)
>>
>>
>>
>> Hi Martin,
>>
>> I have added a possible fix by considering the threading model in
>> ClusterMonitor. I have verified it locally and it is working fine. Can you
>> apply this patch as attached herewith and continue testing the same
>> scenario? If this fixes the issue and no regression, then i can push it to
>> master.
>>
>> What I have fixed is this: I made AutoscalerRuleEvaluator a singleton that
>> initializes the knowledge-base map only when the instance is created. Then I
>> moved the drools-related variables out of ClusterMonitor and into
>> ClusterInstanceContext, since threads are spawned per
>> ClusterInstanceContext in order to increase performance. In that case,
>> the drools-related variables shouldn't be shared across
>> ClusterInstanceContexts; otherwise, as far as I checked the code, it can
>> lead to conflicts. So I have added the drools-related variables per
>> ClusterInstanceContext, so that each thread can work on its own local copy
>> of the variables.
>>
>> @Imesh/Lahiru,
>>
>> Please correct me, if I'm wrong.
>>
>>
>>
>> FYI: The patch was created on top of commit
>> 5c87d5de2ad15788f47907d89641c52dd3d21d53.
>>
>> Please let me know, if you face any issues with it.
>>
>> Thanks,
>>
>> Reka
>>
>>
>>
>> On Tue, May 5, 2015 at 5:51 PM, Reka Thirunavukkarasu <re...@wso2.com>
>> wrote:
>>
>> Hi Martin,
>>
>> I will have to implement this solution in a thread safe manner as
>> multiple cluster monitors are sharing the same resource. It will get
>> impacted the Cluster monitor monitoring part as well. I'm still trying to
>> figure out a solution for this issue. Will keep you updated with the
>> progress..
>>
>> Thanks,
>>
>> Reka
>>
>>
>>
>> On Tue, May 5, 2015 at 5:19 PM, Reka Thirunavukkarasu <re...@wso2.com>
>> wrote:
>>
>> Hi
>>
>> I suspect the issue is that we use static knowledgeBases map in the
>> AutoscalerRuleEvaluator. But this is getting initialized by every cluster
>> monitor. We need to fix this cluster monitor creation flow to use static
>> knowledgeBases map and initialize only once  or properly sharing this map
>> across multiple threads, since each cluster monitors are threads.
>>
>> Since drools file parsing can be done only once and used by all other
>> monitors, i will work on a fix to make drool file parsing only once. Hope
>> that fix would solve this issue.
>>
>> Thanks,
>>
>> Reka
>>
>>
>>
>>
>>
>> On Tue, May 5, 2015 at 4:44 PM, Reka Thirunavukkarasu <re...@wso2.com>
>> wrote:
>>
>> Hi Martin/Imesh,
>>
>> Thanks Imesh for adding the exception handling in the monitor creation.
>> That helped to narrow down the issue. It was a drool file parsed issue. I
>> found below exception in both samples when creating those relevant
>> monitors. We will have to identify why the drool parsing gave NPE. @Lahiru,
>> Do you have any idea on this? In both samples, the cluster Monitors failed
>> when parsing "obsoletecheck.drl".
>>
>> Since i couldn't figure out the root cause, i have added descriptive
>> debug logs (in 5c87d5de2ad15788f47907d89641c52dd3d21d53) in order to
>> isolate the issue. @Martin, Would you get a chance to test it and provide
>> us the logs again with the same scenario, since I'm unable to reproduce
>> this from my side?
>>
>>
>> scenario_c1_c2_c3_c4_cartridges:
>>
>> TID: [0] [STRATOS] [2015-05-01 18:24:22,591] DEBUG
>> {org.apache.stratos.autoscaler.rule.AutoscalerRuleEvaluator} -  Drools file
>> is parsed successfully: obsoletecheck.drl
>> TID: [0] [STRATOS] [2015-05-01 18:24:22,594] ERROR
>> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
>> An error occurred while starting monitor: [type] cluster [component]
>> sub-G1-G2-G3-1-Id.c3-1x1.c3.domain
>> java.lang.NullPointerException
>>     at
>> org.apache.stratos.autoscaler.rule.AutoscalerRuleEvaluator.getStatefulSession(AutoscalerRuleEvaluator.java:76)
>>     at
>> org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor.<init>(ClusterMonitor.java:137)
>>     at
>> org.apache.stratos.autoscaler.monitor.MonitorFactory.getClusterMonitor(MonitorFactory.java:302)
>>     at
>> org.apache.stratos.autoscaler.monitor.MonitorFactory.getMonitor(MonitorFactory.java:83)
>>     at
>> org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor$MonitorAdder.run(ParentComponentMonitor.java:844)
>>     at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>     at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>     at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>     at java.lang.Thread.run(Thread.java:745)
>>
>> scenario_c1_c2_cartridges:
>>
>> TID: [0] [STRATOS] [2015-05-01 17:58:50,824] DEBUG
>> {org.apache.stratos.autoscaler.rule.AutoscalerRuleEvaluator} -  Drools file
>> is parsed successfully: dependent-scaling.drl
>> TID: [0] [STRATOS] [2015-05-01 17:58:50,825] ERROR
>> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
>> An error occurred while starting monitor: [type] cluster [component]
>> subscription-G1-G2-G3-Id.c1-1x1.c1.domain
>> java.lang.NullPointerException
>>     at
>> org.apache.stratos.autoscaler.rule.AutoscalerRuleEvaluator.getStatefulSession(AutoscalerRuleEvaluator.java:76)
>>     at
>> org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor.<init>(ClusterMonitor.java:135)
>>     at
>> org.apache.stratos.autoscaler.monitor.MonitorFactory.getClusterMonitor(MonitorFactory.java:302)
>>     at
>> org.apache.stratos.autoscaler.monitor.MonitorFactory.getMonitor(MonitorFactory.java:83)
>>     at
>> org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor$MonitorAdder.run(ParentComponentMonitor.java:844)
>>     at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>     at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>     at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>     at java.lang.Thread.run(Thread.java:745)
>> TID: [0] [STRATOS] [2015-05-01 17:58:50,825] DEBUG
>> {org.apache.stratos.autoscaler.monitor.
>>
>> @Martin, However, there seems to be separate locking issue. That is not
>> related to this. For now, that locking issue seems to be harmless. Can we
>> track it in a jira?
>>
>>
>>
>> Thanks,
>>
>> Reka
>>
>>
>>
>>
>>
>>
>>
>> On Sat, May 2, 2015 at 12:28 AM, Martin Eppel (meppel) <me...@cisco.com>
>> wrote:
>>
>> Hi Imesh, Reka
>>
>>
>>
>> As request, please see attached artifacts and logs (with debug enabled)
>> to test for the deadlock – stratos is running the latest from master,
>> latest commit :
>>
>> commit ae89ba09491891512a9bc89e080577c565ebe8b7
>>
>> Author: reka <rt...@gmail.com>
>>
>> Date:   Fri May 1 12:30:55 2015 +0530
>>
>>
>>
>> I run 2 similar but slightly scenarios, see [1.], [2.]
>>
>>
>>
>> Java startup with lock monitor enabled:
>>
>>
>>
>> /opt/java/bin/java -Xbootclasspath/a: -Xms256m -Xmx2048m
>> -XX:MaxPermSize=256m -server -XX:+HeapDumpOnOutOfMemoryE
>>
>> rror
>> -XX:HeapDumpPath=/opt/wso2/apache-stratos/repository/logs/heap-dump.hprof
>> -Dcom.sun.management.jmxremote -classpath
>> /opt/java/lib/tools.jar:/opt/wso2/apache-stratos/bin/org.wso2.carbon.bootstrap-4.2.0.jar:/opt/wso2/apache-stratos/bin/tcpmon-1.0.jar:/opt/wso2/apache-stratos/bin/tomcat-juli-7.0.34.jar:/opt/wso2/apache
>>
>> -stratos/lib/commons-lang-2.6.0.wso2v1.jar
>> -Djava.endorsed.dirs=/opt/wso2/apache-stratos/lib/endorsed:/opt/java/jre/lib/endorsed:/opt/java/lib/endorsed
>> -Djava.io
>>
>> .tmpdir=/opt/wso2/apache-stratos/tmp
>> -Dcatalina.base=/opt/wso2/apache-stratos/lib/tomcat
>> -Dwso2.server.standalone=true -Dcarbon.registry.root=/ -Djava.command=/o
>>
>> pt/java/bin/java -Dcarbon.home=/opt/wso2/apache-stratos
>> -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
>> -Dcarbon.config.dir.path=/opt/wso2/apac
>>
>> he-stratos/repository/conf
>> -Djava.util.logging.config.file=/opt/wso2/apache-stratos/repository/conf/etc/logging-bridge.properties
>> -Dcomponents.repo=/opt/wso2/apache-stratos/repository/components/plugins
>> -Dconf.location=/opt/wso2/apache-stratos/repository/conf
>> -Dcom.atomikos.icatch.file=/opt/wso2/apache-stratos/lib/transactions.properties
>> -Dcom.atomikos.icatch.hide_init_file_path=true
>> -Dorg.apache.jasper.runtime.BodyContentImpl.LIMIT_BUFFER=true
>> -Dcom.sun.jndi.ldap.connect.pool.a
>>
>> uthentication=simple -Dcom.sun.jndi.ldap.connect.pool.timeout=3000
>> -Dorg.terracotta.quartz.skipUpdateCheck=true
>> -Djava.security.egd=file:/dev/./urandom -Dfile.encoding=UTF8
>> -Ddisable.cassandra.server.startup=true
>> -Djndi.properties.dir=/opt/wso2/apache-stratos/repository/conf
>> -Dthrift.client.config.file.path=/opt/wso2/apache-stratos/repository/conf/thrift-client-config.xml
>> -DMETADATA_CLIENT_CONFIG_FILE=/opt/wso2/apache-stratos/repository/conf/metadataservice.xml
>> *-Dread.write.lock.monitor.enabled*=true
>> org.wso2.carbon.bootstrap.Bootstrap -Dprofile=default
>>
>>
>>
>>
>>
>> [1.] scenario_c1_c2_cartridges
>>
>>
>>
>>
>>
>>
>>
>> [1b.] exception
>>
>>
>>
>> *org.apache.stratos.common.exception.LockNotReleasedException*
>>
>> *        at
>> org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor.checkTimeout(ReadWriteLockMonitor.java:72)*
>>
>> *        at
>> org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor.run(ReadWriteLockMonitor.java:55)*
>>
>> *        at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)*
>>
>> *        at
>> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)*
>>
>> *        at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)*
>>
>> *        at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)*
>>
>> *        at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)*
>>
>> *        at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)*
>>
>> *        at java.lang.Thread.run(Thread.java:745)*
>>
>>
>>
>>
>>
>> [2.] scenario_c1_c2_c3_c4_cartridges
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
>> *Sent:* Thursday, April 30, 2015 10:10 PM
>>
>>
>> *To:* dev
>> *Subject:* Re: Testing Stratos 4.1 : nested grouping scenario with
>> startup and termination issues (?)
>>
>>
>>
>> In addition we have not added a try catch block in MonitorAdder.run()
>> method to cover its full scope. Therefore if an exception is raised in the
>> middle the above problem also can cause.
>>
>>
>>
>> I have now fixed this in commit revision:
>>
>> 9ec061f44a3189ccd8b509ef4da980687dfbcf62
>>
>>
>>
>> Martin: Appreciate if you could take this fix and retest.
>>
>>
>>
>> Thanks
>>
>>
>>
>> On Fri, May 1, 2015 at 10:32 AM, Imesh Gunaratne <im...@apache.org>
>> wrote:
>>
>> Hi Reka,
>>
>>
>>
>> It looks like the MonitorAdder.run() has executed properly, that's why we
>> see the following log:
>>
>>
>>
>> TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO
>> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
>> Starting monitor: [type] cluster [component]
>> sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
>>
>>
>>
>> However the thread has not come to its last line:
>>
>> *log*.info(String.*format*(*"Monitor started successfully: [type] %s [component] %s [dependents] %s " *+
>>                 *"[startup-time] %d seconds"*, *monitorTypeStr*, *context*.getId(),
>>
>>
>>
>> As we discussed offline this may have caused by a deadlock while trying
>> to get the following topology lock:
>>
>>
>>
>> *public static *ClusterMonitor getClusterMonitor(ParentComponentMonitor parentMonitor,
>>                                                ClusterChildContext context,
>>                                                List<String> parentInstanceIds)
>>     ...
>>
>> *//acquire read lock for the service and cluster    *TopologyManager.*acquireReadLockForCluster*(serviceName, clusterId);
>>
>>
>>
>> Martin: Will you be able to do another test run by enabling deadlock
>> detection logic. You could set the following system property to true in the
>> stratos.sh file to do this:
>>
>> *read.write.lock.monitor.enabled=true*
>>
>>  Thanks
>>
>>
>>
>>
>>
>> On Fri, May 1, 2015 at 7:40 AM, Reka Thirunavukkarasu <re...@wso2.com>
>> wrote:
>>
>> Hi Martin,
>>
>>
>>
>> Thanks Martin for the detailed information in order to analyze the issue.
>> It helped to isolate the issue.
>>
>> As i went through the logs, it seems that some thread issue. I could see
>> below log for c4-1x1 and c3-1x1. In that case c3 and c4 got scheduled to be
>> start a relevant clusterMonitor. After that only c3 got successfully
>> started with ClusterMonitor not c4. So the scheduler of c4 didn't actually
>> start a thread for the MonitorAdder to create the ClusterMonitor.
>>
>> TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO
>> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
>> Starting dependent monitor: [application] sub-G1-G2-G3-1-G4 [component]
>> sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
>> TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO
>> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
>> Monitor scheduled: [type] cluster [component] sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
>>
>> TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO
>> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
>> Starting monitor: [type] cluster [component]
>> sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
>> TID: [0] [STRATOS] [2015-04-30 16:48:57,713]  INFO
>> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
>> Starting dependent monitor: [application] sub-G1-G2-G3-1-G4 [component]
>> sub-G1-G2-G3-1-G4.c3-1x1.c3.domain
>> TID: [0] [STRATOS] [2015-04-30 16:48:57,713]  INFO
>> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
>> Monitor scheduled: [type] cluster [component] sub-G1-G2-G3-1-G4.c3-1x1.c3.domain
>>
>> TID: [0] [STRATOS] [2015-04-30 16:48:57,713]  INFO
>> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
>> Starting monitor: [type] cluster [component]
>> sub-G1-G2-G3-1-G4.c3-1x1.c3.domain
>>
>> Found below log for c3 which indicates that c3 monitor got started
>> successfully. But there is no such log for c4.
>>
>> TID: [0] [STRATOS] [2015-04-30 16:49:00,760]  INFO
>> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
>> Monitor started successfully: [type] cluster [component]
>> sub-G1-G2-G3-1-G4.c3-1x1.c3.domain [dependents] none [startup-time] 3
>> seconds
>>
>> @Gayan/Imesh, Do you have any input here? Will increasing the threadpool
>> solve this issue? Or is it related to something else?
>>
>> Thanks,
>>
>> Reka
>>
>>
>>
>>
>>
>>
>>
>> On Thu, Apr 30, 2015 at 10:54 PM, Martin Eppel (meppel) <me...@cisco.com>
>> wrote:
>>
>> Hi Reka,
>>
>>
>>
>> Re-run the scenario, making sure the application alias and group alias
>> are as suggested and debug logs are turned on (see config below)
>>
>>
>>
>> log4j.logger.org.apache.stratos.manager=DEBUG
>>
>> log4j.logger.org.apache.stratos.autoscaler=DEBUG
>>
>> log4j.logger.org.apache.stratos.messaging=INFO
>>
>> log4j.logger.org.apache.stratos.cloud.controller=DEBUG
>>
>> log4j.logger.org.wso2.andes.client=ERROR
>>
>>
>>
>> This is the scenario:
>>
>>
>>
>> 1.      deployed application – see screenshot A. , debug logs
>> wso2carbon-debug.log
>> only 3 instances spin up
>>
>> 2.      removed application
>>
>> 3.      re-deployed application – see screenshot B. , debug logs
>> wso2carbon-debug-2.log
>> (after line “TID: [0] [STRATOS] [2015-04-30 17:05:23,837] DEBUG
>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Read lock
>> released”
>> 2nd time the application gets deployed all instances spin up and go
>> active
>>
>>
>>
>>
>>
>> Please see attached artifacts and logs.
>>
>>
>>
>> A.     Application Status after deploying the application first time
>> after stratos start up:
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> B.     Application Status after re-deploying the application
>>
>> (see log wso2carbon-debug-2.log after “TID: [0] [STRATOS] [2015-04-30
>> 17:05:23,837] DEBUG
>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Read lock
>> released”:
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> *From:* Reka Thirunavukkarasu [mailto:reka@wso2.com]
>> *Sent:* Thursday, April 30, 2015 1:40 AM
>>
>>
>> *To:* dev
>> *Subject:* Re: Testing Stratos 4.1 : nested grouping scenario with
>> startup and termination issues (?)
>>
>>
>>
>> If you get this issue continuously, can you please share the logs against
>> master as we have improved some logs in the master yesterday?
>>
>> Thanks,
>>
>> Reka
>>
>>
>>
>> On Thu, Apr 30, 2015 at 2:08 PM, Reka Thirunavukkarasu <re...@wso2.com>
>> wrote:
>>
>> Hi Martin,
>>
>> I have deployed the attached samples as earlier in openstack with latest
>> master. All the clusters got created with the members. Please see the
>> attached diagram. I'm unable to proceed further as my puppet configuration
>> has to be corrected to make the member active. Thought of sharing this as
>> all the clusters have members.
>>
>> Thanks,
>>
>> Reka
>>
>>
>>
>> On Thu, Apr 30, 2015 at 10:25 AM, Reka Thirunavukkarasu <re...@wso2.com>
>> wrote:
>>
>> HI Martin,
>>
>> Can you please confirm whether you are using unique applicationId and
>> group alias? I can see from the UI, the applicationID and next group alias
>> are same value as sub-G1-G2-G3-1..
>>
>> Thanks,
>>
>> Reka
>>
>>
>>
>>
>>
>> On Thu, Apr 30, 2015 at 10:16 AM, Martin Eppel (meppel) <me...@cisco.com>
>> wrote:
>>
>> Hi Reka,
>>
>>
>>
>> I have upgraded from beta to the latest stratos code on master and
>> retested the scenario from jira STRATOS-1345 but still see the same issue
>> (on open stack)
>>
>>
>>
>> Thanks
>>
>>
>>
>> Martin
>>
>>
>>
>>
>>
>> *From:* Martin Eppel (meppel)
>> *Sent:* Wednesday, April 29, 2015 2:54 PM
>> *To:* dev@stratos.apache.org
>> *Subject:* RE: Testing Stratos 4.1 : nested grouping scenario with
>> startup and termination issues (?)
>>
>>
>>
>> Hi Reka,
>>
>>
>>
>> I will upgrade my system to the latest master and re-test,
>>
>>
>>
>> Regards
>>
>>
>>
>> Martin
>>
>>
>>
>> *From:* Reka Thirunavukkarasu [mailto:reka@wso2.com <re...@wso2.com>]
>> *Sent:* Wednesday, April 29, 2015 11:55 AM
>> *To:* dev
>> *Subject:* Re: Testing Stratos 4.1 : nested grouping scenario with
>> startup and termination issues (?)
>>
>>
>>
>> Hi Martin,
>>
>> While i was working on Application update, i fixed few issues with the
>> termination behavior. Anyway there seems to be small issues in the logic
>> which has to be fixed. I have started to verify this in my local setup. Can
>> you create a jira? So that we can track it. I will update the progress in
>> the jira..
>>
>> Thanks,
>>
>> Reka
>>
>>
>>
>> On Tue, Apr 28, 2015 at 10:11 PM, Martin Eppel (meppel) <me...@cisco.com>
>> wrote:
>>
>> Hi Reka,
>>
>>
>>
>> Thanks for following up - let me know if I should open a JIRA,
>>
>>
>>
>> Thanks
>>
>>
>>
>> Martin
>>
>>
>>
>> *From:* Reka Thirunavukkarasu [mailto:reka@wso2.com]
>> *Sent:* Tuesday, April 28, 2015 5:37 AM
>> *To:* dev
>> *Subject:* Re: Testing Stratos 4.1 : nested grouping scenario with
>> startup and termination issues (?)
>>
>>
>>
>> Hi Martin,
>>
>> Thanks for bringing this up. I have fixed some issue in the flow while
>> testing application update support with instances count. I will go through
>> your scenarios to reproduce it and update the thread with the progress..
>>
>> Thanks,
>>
>> Reka
>>
>>
>>
>> On Tue, Apr 28, 2015 at 7:08 AM, Martin Eppel (meppel) <me...@cisco.com>
>> wrote:
>>
>> I am testing a (nested grouping) scenario where a group defines a
>> termination behavior “terminate-all”. When terminating the instance (of
>> cartridge type c3), no new instance is restarted.
>>
>> My understanding is that a new instance should be started up.
>>
>>
>>
>> The scenario looks like this:
>>
>>
>>
>> Group ~G1 has a cartridge member c1 and group member ~G2
>>
>> Group ~G2 has a cartridge member c2 and group member ~G3
>>
>> Group ~G3 has a cartridge member c3
>>
>>
>>
>> Startup dependencies are: c1 depends on G2, c2 depends on G3
>>
>>
>>
>> ~G1 defines termination: none
>>
>> ~G2 defines termination: dependents
>>
>> ~G3 defines termination: all
>>
>>
>>
>> After startup, when all instances are active, instance c3 is terminated
>> which correctly also terminates also instance c2 (since it depends on G3 /
>> c3) .
>>
>> *Issue 1:*
>>
>> However, no new instances for c3 is started up (consequently no new
>> instance for c2 should be started up as well) (see log see log
>> wso2carbon.log)
>>
>>
>>
>> Only instance which remains running is c1.
>>
>> *Issue 2:*
>>
>> When subsequently c1 is manually being terminated, a new instance of c1
>> is started up (as opposed to Issue1) which I think is incorrect since it
>> defines a startup dependency (c1 depends on G2) which is not fulfilled at
>> the time (G2 should not be active since c2 is still terminated, see log
>> wso2carbon-issue2.log, same log as wso2carbon.log but at a later time)
>>
>>
>>
>> WDYT ?
>>
>>
>>
>> Please find attached artifacts and logs
>>
>>
>>
>> Thanks
>>
>>
>>
>> Martin
>>
>>
>>
>>
>> --
>>
>> Reka Thirunavukkarasu
>> Senior Software Engineer,
>> WSO2, Inc.:http://wso2.com,
>>
>> Mobile: +94776442007
>>
>>
>>
>>
>>
>>
>> --
>>
>> Reka Thirunavukkarasu
>> Senior Software Engineer,
>> WSO2, Inc.:http://wso2.com,
>>
>> Mobile: +94776442007
>>
>>
>>
>>
>>
>>
>> --
>>
>> Reka Thirunavukkarasu
>> Senior Software Engineer,
>> WSO2, Inc.:http://wso2.com,
>>
>> Mobile: +94776442007
>>
>>
>>
>>
>>
>>
>> --
>>
>> Reka Thirunavukkarasu
>> Senior Software Engineer,
>> WSO2, Inc.:http://wso2.com,
>>
>> Mobile: +94776442007
>>
>>
>>
>>
>>
>>
>> --
>>
>> Reka Thirunavukkarasu
>> Senior Software Engineer,
>> WSO2, Inc.:http://wso2.com,
>>
>> Mobile: +94776442007
>>
>>
>>
>>
>>
>>
>> --
>>
>> Reka Thirunavukkarasu
>> Senior Software Engineer,
>> WSO2, Inc.:http://wso2.com,
>>
>> Mobile: +94776442007
>>
>>
>>
>>
>>
>>
>>
>> --
>>
>> Imesh Gunaratne
>>
>>
>>
>> Senior Technical Lead, WSO2
>>
>> Committer & PMC Member, Apache Stratos
>>
>>
>>
>>
>>
>> --
>>
>> Imesh Gunaratne
>>
>>
>>
>> Senior Technical Lead, WSO2
>>
>> Committer & PMC Member, Apache Stratos
>>
>>
>>
>>
>> --
>>
>> Reka Thirunavukkarasu
>> Senior Software Engineer,
>> WSO2, Inc.:http://wso2.com,
>>
>> Mobile: +94776442007
>>
>>
>>
>>
>>
>>
>> --
>>
>> Reka Thirunavukkarasu
>> Senior Software Engineer,
>> WSO2, Inc.:http://wso2.com,
>>
>> Mobile: +94776442007
>>
>>
>>
>>
>>
>>
>> --
>>
>> Reka Thirunavukkarasu
>> Senior Software Engineer,
>> WSO2, Inc.:http://wso2.com,
>>
>> Mobile: +94776442007
>>
>>
>>
>>
>>
>>
>> --
>>
>> Reka Thirunavukkarasu
>> Senior Software Engineer,
>> WSO2, Inc.:http://wso2.com,
>>
>> Mobile: +94776442007
>>
>>
>>
>>
>>
>>
>> --
>>
>> Reka Thirunavukkarasu
>> Senior Software Engineer,
>> WSO2, Inc.:http://wso2.com,
>>
>> Mobile: +94776442007
>>
>>
>>
>
>
>
> --
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
> Mobile: +94776442007
>
>
>


-- 
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos

Re: Testing Stratos 4.1 : nested grouping scenario with startup and termination issues (?)

Posted by Reka Thirunavukkarasu <re...@wso2.com>.
Hi Martin,

Yes, I have committed the patch. Please let me know if there is any issue.

Thanks,
Reka

On Thu, May 7, 2015 at 2:07 AM, Martin Eppel (meppel) <me...@cisco.com>
wrote:

>  Reka,
>
>
>
> Did you commit the patch ?
>
>
>
> Thanks
>
>
>
> Martin
>
>
>
> *From:* Reka Thirunavukkarasu [mailto:reka@wso2.com]
> *Sent:* Tuesday, May 05, 2015 9:36 PM
> *To:* Martin Eppel (meppel)
> *Cc:* dev; Imesh Gunaratne; Lahiru Sandaruwan
>
> *Subject:* Re: Testing Stratos 4.1 : nested grouping scenario with
> startup and termination issues (?)
>
>
>
> That's great news. Let me know how your other testing goes with the
> patch. I will continue with this patch when verifying other issues.
>
> Thanks,
>
> Reka
>
>
>
> On Wed, May 6, 2015 at 12:13 AM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> Hi Reka,
>
>
>
> Good news – the patch seems to work fine, please find attached the usual
> artifacts / log (debug / thread lock detection enabled) + [1.].
>
> I’ll run some more tests, let’s keep the fingers crossed.
>
>
>
>
>
> [1.] Screenshot:
>
>
>
>
>
> *From:* Reka Thirunavukkarasu [mailto:reka@wso2.com]
> *Sent:* Tuesday, May 05, 2015 6:17 AM
> *To:* dev; Imesh Gunaratne; Lahiru Sandaruwan; Martin Eppel (meppel)
>
> *Subject:* Re: Testing Stratos 4.1 : nested grouping scenario with
> startup and termination issues (?)
>
>
>
> Hi Martin,
>
> I have added a possible fix by considering the threading model in
> ClusterMonitor. I have verified it locally and it is working fine. Can you
> apply this patch as attached herewith and continue testing the same
> scenario? If this fixes the issue and no regression, then i can push it to
> master.
>
> What I have fixed is this: I made AutoscalerRuleEvaluator a singleton that
> initializes the knowledge-base map only when the instance is created. Then I
> moved the drools-related variables out of ClusterMonitor and into
> ClusterInstanceContext, since threads are spawned per
> ClusterInstanceContext in order to increase performance. In that case,
> the drools-related variables shouldn't be shared across
> ClusterInstanceContexts; otherwise, as far as I checked the code, it can
> lead to conflicts. So I have added the drools-related variables per
> ClusterInstanceContext, so that each thread can work on its own local copy
> of the variables.
>
> @Imesh/Lahiru,
>
> Please correct me, if I'm wrong.
>
>
>
> FYI: The patch has created on top of
> 5c87d5de2ad15788f47907d89641c52dd3d21d53 this commit.
>
> Please let me know, if you face any issues with it.
>
> Thanks,
>
> Reka
>
>
>
> On Tue, May 5, 2015 at 5:51 PM, Reka Thirunavukkarasu <re...@wso2.com>
> wrote:
>
> Hi Martin,
>
> I will have to implement this solution in a thread safe manner as multiple
> cluster monitors are sharing the same resource. It will get impacted the
> Cluster monitor monitoring part as well. I'm still trying to figure out a
> solution for this issue. Will keep you updated with the progress..
>
> Thanks,
>
> Reka
>
>
>
> On Tue, May 5, 2015 at 5:19 PM, Reka Thirunavukkarasu <re...@wso2.com>
> wrote:
>
> Hi
>
> I suspect the issue is that we use static knowledgeBases map in the
> AutoscalerRuleEvaluator. But this is getting initialized by every cluster
> monitor. We need to fix this cluster monitor creation flow to use static
> knowledgeBases map and initialize only once  or properly sharing this map
> across multiple threads, since each cluster monitors are threads.
>
> Since drools file parsing can be done only once and used by all other
> monitors, i will work on a fix to make drool file parsing only once. Hope
> that fix would solve this issue.
>
> Thanks,
>
> Reka
>
>
>
>
>
> On Tue, May 5, 2015 at 4:44 PM, Reka Thirunavukkarasu <re...@wso2.com>
> wrote:
>
> Hi Martin/Imesh,
>
> Thanks Imesh for adding the exception handling in the monitor creation.
> That helped to narrow down the issue. It was a drool file parsed issue. I
> found below exception in both samples when creating those relevant
> monitors. We will have to identify why the drool parsing gave NPE. @Lahiru,
> Do you have any idea on this? In both samples, the cluster Monitors failed
> when parsing "obsoletecheck.drl".
>
> Since i couldn't figure out the root cause, i have added descriptive debug
> logs (in 5c87d5de2ad15788f47907d89641c52dd3d21d53) in order to isolate the
> issue. @Martin, Would you get a chance to test it and provide us the logs
> again with the same scenario, since I'm unable to reproduce this from my
> side?
>
>
> scenario_c1_c2_c3_c4_cartridges:
>
> TID: [0] [STRATOS] [2015-05-01 18:24:22,591] DEBUG
> {org.apache.stratos.autoscaler.rule.AutoscalerRuleEvaluator} -  Drools file
> is parsed successfully: obsoletecheck.drl
> TID: [0] [STRATOS] [2015-05-01 18:24:22,594] ERROR
> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
> An error occurred while starting monitor: [type] cluster [component]
> sub-G1-G2-G3-1-Id.c3-1x1.c3.domain
> java.lang.NullPointerException
>     at
> org.apache.stratos.autoscaler.rule.AutoscalerRuleEvaluator.getStatefulSession(AutoscalerRuleEvaluator.java:76)
>     at
> org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor.<init>(ClusterMonitor.java:137)
>     at
> org.apache.stratos.autoscaler.monitor.MonitorFactory.getClusterMonitor(MonitorFactory.java:302)
>     at
> org.apache.stratos.autoscaler.monitor.MonitorFactory.getMonitor(MonitorFactory.java:83)
>     at
> org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor$MonitorAdder.run(ParentComponentMonitor.java:844)
>     at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
>
> scenario_c1_c2_cartridges:
>
> TID: [0] [STRATOS] [2015-05-01 17:58:50,824] DEBUG
> {org.apache.stratos.autoscaler.rule.AutoscalerRuleEvaluator} -  Drools file
> is parsed successfully: dependent-scaling.drl
> TID: [0] [STRATOS] [2015-05-01 17:58:50,825] ERROR
> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
> An error occurred while starting monitor: [type] cluster [component]
> subscription-G1-G2-G3-Id.c1-1x1.c1.domain
> java.lang.NullPointerException
>     at
> org.apache.stratos.autoscaler.rule.AutoscalerRuleEvaluator.getStatefulSession(AutoscalerRuleEvaluator.java:76)
>     at
> org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor.<init>(ClusterMonitor.java:135)
>     at
> org.apache.stratos.autoscaler.monitor.MonitorFactory.getClusterMonitor(MonitorFactory.java:302)
>     at
> org.apache.stratos.autoscaler.monitor.MonitorFactory.getMonitor(MonitorFactory.java:83)
>     at
> org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor$MonitorAdder.run(ParentComponentMonitor.java:844)
>     at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> TID: [0] [STRATOS] [2015-05-01 17:58:50,825] DEBUG
> {org.apache.stratos.autoscaler.monitor.
>
> @Martin, However, there seems to be separate locking issue. That is not
> related to this. For now, that locking issue seems to be harmless. Can we
> track it in a jira?
>
>
>
> Thanks,
>
> Reka
>
>
>
>
>
>
>
> On Sat, May 2, 2015 at 12:28 AM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> Hi Imesh, Reka
>
>
>
> As request, please see attached artifacts and logs (with debug enabled) to
> test for the deadlock – stratos is running the latest from master, latest
> commit :
>
> commit ae89ba09491891512a9bc89e080577c565ebe8b7
>
> Author: reka <rt...@gmail.com>
>
> Date:   Fri May 1 12:30:55 2015 +0530
>
>
>
> I run 2 similar but slightly scenarios, see [1.], [2.]
>
>
>
> Java startup with lock monitor enabled:
>
>
>
> /opt/java/bin/java -Xbootclasspath/a: -Xms256m -Xmx2048m
> -XX:MaxPermSize=256m -server -XX:+HeapDumpOnOutOfMemoryE
>
> rror
> -XX:HeapDumpPath=/opt/wso2/apache-stratos/repository/logs/heap-dump.hprof
> -Dcom.sun.management.jmxremote -classpath
> /opt/java/lib/tools.jar:/opt/wso2/apache-stratos/bin/org.wso2.carbon.bootstrap-4.2.0.jar:/opt/wso2/apache-stratos/bin/tcpmon-1.0.jar:/opt/wso2/apache-stratos/bin/tomcat-juli-7.0.34.jar:/opt/wso2/apache
>
> -stratos/lib/commons-lang-2.6.0.wso2v1.jar
> -Djava.endorsed.dirs=/opt/wso2/apache-stratos/lib/endorsed:/opt/java/jre/lib/endorsed:/opt/java/lib/endorsed
> -Djava.io
>
> .tmpdir=/opt/wso2/apache-stratos/tmp
> -Dcatalina.base=/opt/wso2/apache-stratos/lib/tomcat
> -Dwso2.server.standalone=true -Dcarbon.registry.root=/ -Djava.command=/o
>
> pt/java/bin/java -Dcarbon.home=/opt/wso2/apache-stratos
> -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
> -Dcarbon.config.dir.path=/opt/wso2/apac
>
> he-stratos/repository/conf
> -Djava.util.logging.config.file=/opt/wso2/apache-stratos/repository/conf/etc/logging-bridge.properties
> -Dcomponents.repo=/opt/wso2/apache-stratos/repository/components/plugins
> -Dconf.location=/opt/wso2/apache-stratos/repository/conf
> -Dcom.atomikos.icatch.file=/opt/wso2/apache-stratos/lib/transactions.properties
> -Dcom.atomikos.icatch.hide_init_file_path=true
> -Dorg.apache.jasper.runtime.BodyContentImpl.LIMIT_BUFFER=true
> -Dcom.sun.jndi.ldap.connect.pool.a
>
> uthentication=simple -Dcom.sun.jndi.ldap.connect.pool.timeout=3000
> -Dorg.terracotta.quartz.skipUpdateCheck=true
> -Djava.security.egd=file:/dev/./urandom -Dfile.encoding=UTF8
> -Ddisable.cassandra.server.startup=true
> -Djndi.properties.dir=/opt/wso2/apache-stratos/repository/conf
> -Dthrift.client.config.file.path=/opt/wso2/apache-stratos/repository/conf/thrift-client-config.xml
> -DMETADATA_CLIENT_CONFIG_FILE=/opt/wso2/apache-stratos/repository/conf/metadataservice.xml
> *-Dread.write.lock.monitor.enabled*=true
> org.wso2.carbon.bootstrap.Bootstrap -Dprofile=default
>
>
>
>
>
> [1.] scenario_c1_c2_cartridges
>
>
>
>
>
>
>
> [1b.] exception
>
>
>
> *org.apache.stratos.common.exception.LockNotReleasedException*
>
> *        at
> org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor.checkTimeout(ReadWriteLockMonitor.java:72)*
>
> *        at
> org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor.run(ReadWriteLockMonitor.java:55)*
>
> *        at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)*
>
> *        at
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)*
>
> *        at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)*
>
> *        at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)*
>
> *        at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)*
>
> *        at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)*
>
> *        at java.lang.Thread.run(Thread.java:745)*
>
>
>
>
>
> [2.] scenario_c1_c2_c3_c4_cartridges
>
>
>
>
>
>
>
>
>
>
>
> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
> *Sent:* Thursday, April 30, 2015 10:10 PM
>
>
> *To:* dev
> *Subject:* Re: Testing Stratos 4.1 : nested grouping scenario with
> startup and termination issues (?)
>
>
>
> In addition we have not added a try catch block in MonitorAdder.run()
> method to cover its full scope. Therefore if an exception is raised in the
> middle the above problem also can cause.
>
>
>
> I have now fixed this in commit revision:
>
> 9ec061f44a3189ccd8b509ef4da980687dfbcf62
>
>
>
> Martin: Appreciate if you could take this fix and retest.
>
>
>
> Thanks
>
>
>
> On Fri, May 1, 2015 at 10:32 AM, Imesh Gunaratne <im...@apache.org> wrote:
>
> Hi Reka,
>
>
>
> It looks like the MonitorAdder.run() has executed properly, that's why we
> see the following log:
>
>
>
> TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO
> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
> Starting monitor: [type] cluster [component]
> sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
>
>
>
> However the thread has not come to its last line:
>
> log.info(String.format("Monitor started successfully: [type] %s [component] %s [dependents] %s " +
>                 "[startup-time] %d seconds", monitorTypeStr, context.getId(),
>
>
>
> As we discussed offline this may have caused by a deadlock while trying to
> get the following topology lock:
>
>
>
> public static ClusterMonitor getClusterMonitor(ParentComponentMonitor parentMonitor,
>                                                ClusterChildContext context,
>                                                List<String> parentInstanceIds)
>     ...
>
>     //acquire read lock for the service and cluster
>     TopologyManager.acquireReadLockForCluster(serviceName, clusterId);
>
>
>
> Martin: Will you be able to do another test run by enabling deadlock
> detection logic. You could set the following system property to true in the
> stratos.sh file to do this:
>
> *read.write.lock.monitor.enabled=true*
>
>  Thanks
>
>
>
>
>
> On Fri, May 1, 2015 at 7:40 AM, Reka Thirunavukkarasu <re...@wso2.com>
> wrote:
>
> Hi Martin,
>
>
>
> Thanks Martin for the detailed information in order to analyze the issue.
> It helped to isolate the issue.
>
> As i went through the logs, it seems that some thread issue. I could see
> below log for c4-1x1 and c3-1x1. In that case c3 and c4 got scheduled to be
> start a relevant clusterMonitor. After that only c3 got successfully
> started with ClusterMonitor not c4. So the scheduler of c4 didn't actually
> start a thread for the MonitorAdder to create the ClusterMonitor.
>
> TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO
> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
> Starting dependent monitor: [application] sub-G1-G2-G3-1-G4 [component]
> sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
> TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO
> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
> Monitor scheduled: [type] cluster [component] sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
>
> TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO
> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
> Starting monitor: [type] cluster [component]
> sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
> TID: [0] [STRATOS] [2015-04-30 16:48:57,713]  INFO
> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
> Starting dependent monitor: [application] sub-G1-G2-G3-1-G4 [component]
> sub-G1-G2-G3-1-G4.c3-1x1.c3.domain
> TID: [0] [STRATOS] [2015-04-30 16:48:57,713]  INFO
> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
> Monitor scheduled: [type] cluster [component] sub-G1-G2-G3-1-G4.c3-1x1.c3.domain
>
> TID: [0] [STRATOS] [2015-04-30 16:48:57,713]  INFO
> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
> Starting monitor: [type] cluster [component]
> sub-G1-G2-G3-1-G4.c3-1x1.c3.domain
>
> Found below log for c3 which indicates that c3 monitor got started
> successfully. But there is no such log for c4.
>
> TID: [0] [STRATOS] [2015-04-30 16:49:00,760]  INFO
> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
> Monitor started successfully: [type] cluster [component]
> sub-G1-G2-G3-1-G4.c3-1x1.c3.domain [dependents] none [startup-time] 3
> seconds
>
> @Gayan/Imesh, Do you have any input here? Will increasing the threadpool
> solve this issue? Or is it related to something else?
>
> Thanks,
>
> Reka
>
>
>
>
>
>
>
> On Thu, Apr 30, 2015 at 10:54 PM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> Hi Reka,
>
>
>
> Re-run the scenario, making sure the application alias and group alias are
> as suggested and debug logs are turned on (see config below)
>
>
>
> log4j.logger.org.apache.stratos.manager=DEBUG
>
> log4j.logger.org.apache.stratos.autoscaler=DEBUG
>
> log4j.logger.org.apache.stratos.messaging=INFO
>
> log4j.logger.org.apache.stratos.cloud.controller=DEBUG
>
> log4j.logger.org.wso2.andes.client=ERROR
>
>
>
> This is the scenario:
>
>
>
> 1.      deployed application – see screenshot A. , debug logs
> wso2carbon-debug.log
> only 3 instances spin up
>
> 2.      removed application
>
> 3.      re-deployed application – see screenshot B. , debug logs
> wso2carbon-debug-2.log
> (after line “TID: [0] [STRATOS] [2015-04-30 17:05:23,837] DEBUG
> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Read lock
> released”
> 2nd time the application gets deployed all instances spin up and go active
>
>
>
>
>
> Please see attached artifacts and logs.
>
>
>
> A.     Application Status after deploying the application first time
> after stratos start up:
>
>
>
>
>
>
>
>
>
> B.     Application Status after re-deploying the application
>
> (see log wso2carbon-debug-2.log after “TID: [0] [STRATOS] [2015-04-30
> 17:05:23,837] DEBUG
> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Read lock
> released”:
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> *From:* Reka Thirunavukkarasu [mailto:reka@wso2.com]
> *Sent:* Thursday, April 30, 2015 1:40 AM
>
>
> *To:* dev
> *Subject:* Re: Testing Stratos 4.1 : nested grouping scenario with
> startup and termination issues (?)
>
>
>
> If you get this issue continuously, can you please share the logs against
> master as we have improved some logs in the master yesterday?
>
> Thanks,
>
> Reka
>
>
>
> On Thu, Apr 30, 2015 at 2:08 PM, Reka Thirunavukkarasu <re...@wso2.com>
> wrote:
>
> Hi Martin,
>
> I have deployed the attached samples as earlier in openstack with latest
> master. All the clusters got created with the members. Please see the
> attached diagram. I'm unable to proceed further as my puppet configuration
> has to be corrected to make the member active. Thought of sharing this as
> all the clusters have members.
>
> Thanks,
>
> Reka
>
>
>
> On Thu, Apr 30, 2015 at 10:25 AM, Reka Thirunavukkarasu <re...@wso2.com>
> wrote:
>
> HI Martin,
>
> Can you please confirm whether you are using unique applicationId and
> group alias? I can see from the UI, the applicationID and next group alias
> are same value as sub-G1-G2-G3-1..
>
> Thanks,
>
> Reka
>
>
>
>
>
> On Thu, Apr 30, 2015 at 10:16 AM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> Hi Reka,
>
>
>
> I have upgraded from beta to the latest stratos code on master and
> retested the scenario from jira STRATOS-1345 but still see the same issue
> (on open stack)
>
>
>
> Thanks
>
>
>
> Martin
>
>
>
>
>
> *From:* Martin Eppel (meppel)
> *Sent:* Wednesday, April 29, 2015 2:54 PM
> *To:* dev@stratos.apache.org
> *Subject:* RE: Testing Stratos 4.1 : nested grouping scenario with
> startup and termination issues (?)
>
>
>
> Hi Reka,
>
>
>
> I will upgrade my system to the latest master and re-test,
>
>
>
> Regards
>
>
>
> Martin
>
>
>
> *From:* Reka Thirunavukkarasu [mailto:reka@wso2.com <re...@wso2.com>]
> *Sent:* Wednesday, April 29, 2015 11:55 AM
> *To:* dev
> *Subject:* Re: Testing Stratos 4.1 : nested grouping scenario with
> startup and termination issues (?)
>
>
>
> Hi Martin,
>
> While i was working on Application update, i fixed few issues with the
> termination behavior. Anyway there seems to be small issues in the logic
> which has to be fixed. I have started to verify this in my local setup. Can
> you create a jira? So that we can track it. I will update the progress in
> the jira..
>
> Thanks,
>
> Reka
>
>
>
> On Tue, Apr 28, 2015 at 10:11 PM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> Hi Reka,
>
>
>
> Thanks for following up - let me know if I should open a JIRA,
>
>
>
> Thanks
>
>
>
> Martin
>
>
>
> *From:* Reka Thirunavukkarasu [mailto:reka@wso2.com]
> *Sent:* Tuesday, April 28, 2015 5:37 AM
> *To:* dev
> *Subject:* Re: Testing Stratos 4.1 : nested grouping scenario with
> startup and termination issues (?)
>
>
>
> Hi Martin,
>
> Thanks for bringing this up. I have fixed some issue in the flow while
> testing application update support with instances count. I will go through
> your scenarios to reproduce it and update the thread with the progress..
>
> Thanks,
>
> Reka
>
>
>
> On Tue, Apr 28, 2015 at 7:08 AM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> I am testing a (nested grouping) scenario where a group defines a
> termination behavior “terminate-all”. When terminating the instance (of
> cartridge type c3), no new instance is restarted.
>
> My understanding is that a new instance should be started up.
>
>
>
> The scenario looks like this:
>
>
>
> Group ~G1 has a cartridge member c1 and group member ~G2
>
> Group ~G2 has a cartridge member c2 and group member ~G3
>
> Group ~G3 has a cartridge member c3
>
>
>
> Startup dependencies are: c1 depends on G2, c2 depends on G3
>
>
>
> ~G1 defines termination: none
>
> ~G2 defines termination: dependents
>
> ~G3 defines termination: all
>
>
>
> After startup, when all instances are active, instance c3 is terminated
> which correctly also terminates also instance c2 (since it depends on G3 /
> c3) .
>
> *Issue 1:*
>
> However, no new instances for c3 is started up (consequently no new
> instance for c2 should be started up as well) (see log see log
> wso2carbon.log)
>
>
>
> Only instance which remains running is c1.
>
> *Issue 2:*
>
> When subsequently c1 is manually being terminated, a new instance of c1 is
> started up (as opposed to Issue1) which I think is incorrect since it
> defines a startup dependency (c1 depends on G2) which is not fulfilled at
> the time (G2 should not be active since c2 is still terminated, see log
> wso2carbon-issue2.log, same log as wso2carbon.log but at a later time)
>
>
>
> WDYT ?
>
>
>
> Please find attached artifacts and logs
>
>
>
> Thanks
>
>
>
> Martin
>
>
>
>
> --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>
>
>
>
> --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>
>
>
>
> --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>
>
>
>
> --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>
>
>
>
> --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>
>
>
>
> --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>
>
>
>
>
> --
>
> Imesh Gunaratne
>
>
>
> Senior Technical Lead, WSO2
>
> Committer & PMC Member, Apache Stratos
>
>
>
>
>
> --
>
> Imesh Gunaratne
>
>
>
> Senior Technical Lead, WSO2
>
> Committer & PMC Member, Apache Stratos
>
>
>
>
> --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>
>
>
>
> --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>
>
>
>
> --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>
>
>
>
> --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>
>
>
>
> --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>



-- 
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007

RE: Testing Stratos 4.1 : nested grouping scenario with startup and termination issues (?)

Posted by "Martin Eppel (meppel)" <me...@cisco.com>.
Reka,

Did you commit the patch ?

Thanks

Martin

From: Reka Thirunavukkarasu [mailto:reka@wso2.com]
Sent: Tuesday, May 05, 2015 9:36 PM
To: Martin Eppel (meppel)
Cc: dev; Imesh Gunaratne; Lahiru Sandaruwan
Subject: Re: Testing Stratos 4.1 : nested grouping scenario with startup and termination issues (?)

That's great news. Let me know how your other testing goes with the patch. I will continue with this patch when verifying other issues..
Thanks,
Reka

On Wed, May 6, 2015 at 12:13 AM, Martin Eppel (meppel) <me...@cisco.com>> wrote:
Hi Reka,

Good news – the patch seems to work fine, please find attached the usual artifacts / log (debug / thread lock detection enabled) + [1.].
I’ll run some more tests, let’s keep the fingers crossed.


[1.] Screenshot:


From: Reka Thirunavukkarasu [mailto:reka@wso2.com<ma...@wso2.com>]
Sent: Tuesday, May 05, 2015 6:17 AM
To: dev; Imesh Gunaratne; Lahiru Sandaruwan; Martin Eppel (meppel)
Subject: Re: Testing Stratos 4.1 : nested grouping scenario with startup and termination issues (?)

Hi Martin,
I have added a possible fix by considering the threading model in ClusterMonitor. I have verified it locally and it is working fine. Can you apply the patch attached herewith and continue testing the same scenario? If it fixes the issue with no regression, then I can push it to master.
What I have fixed is this: I made AutoscalerRuleEvaluator a singleton which initializes the knowledge-base map only when the instance is created. Then I removed the drools-related variables from ClusterMonitor and added them to ClusterInstanceContext, since threads are spawned per ClusterInstanceContext in order to increase performance. In that case the drools-related variables shouldn't be shared across ClusterInstanceContexts, because sharing them can cause conflicts, as I checked in the code. So I have added the drools-related variables per ClusterInstanceContext, so that each thread takes its own local copy of the variables from the stack and executes.
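For illustration only, here is a minimal sketch of that shape (not the actual Stratos source); it assumes the Drools 5.x knowledge-api used by the autoscaler, and the class and helper names are made up:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

import org.drools.KnowledgeBase;
import org.drools.KnowledgeBaseFactory;
import org.drools.builder.KnowledgeBuilder;
import org.drools.builder.KnowledgeBuilderFactory;
import org.drools.builder.ResourceType;
import org.drools.io.ResourceFactory;
import org.drools.runtime.StatefulKnowledgeSession;

// Illustrative singleton: the .drl files are parsed exactly once, and each
// ClusterInstanceContext asks for its own stateful session, so no Drools
// session state is shared between monitor threads.
public final class RuleEvaluatorSketch {

    private static final RuleEvaluatorSketch INSTANCE = new RuleEvaluatorSketch();

    private final ConcurrentMap<String, KnowledgeBase> knowledgeBases =
            new ConcurrentHashMap<String, KnowledgeBase>();

    private RuleEvaluatorSketch() {
        // parse once, at singleton creation time
        knowledgeBases.put("obsoletecheck.drl", parse("obsoletecheck.drl"));
        knowledgeBases.put("dependent-scaling.drl", parse("dependent-scaling.drl"));
    }

    public static RuleEvaluatorSketch getInstance() {
        return INSTANCE;
    }

    // called per ClusterInstanceContext; each caller gets its own session
    public StatefulKnowledgeSession newSession(String drlFileName) {
        KnowledgeBase base = knowledgeBases.get(drlFileName);
        if (base == null) {
            throw new IllegalStateException("Unknown drools file: " + drlFileName);
        }
        return base.newStatefulKnowledgeSession();
    }

    private static KnowledgeBase parse(String drlFileName) {
        // error handling (builder.hasErrors()) omitted in this sketch
        KnowledgeBuilder builder = KnowledgeBuilderFactory.newKnowledgeBuilder();
        builder.add(ResourceFactory.newClassPathResource(drlFileName), ResourceType.DRL);
        KnowledgeBase base = KnowledgeBaseFactory.newKnowledgeBase();
        base.addKnowledgePackages(builder.getKnowledgePackages());
        return base;
    }
}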
@Imesh/Lahiru,
Please correct me, if I'm wrong.

FYI: The patch has created on top of 5c87d5de2ad15788f47907d89641c52dd3d21d53 this commit.
Please let me know, if you face any issues with it.
Thanks,
Reka

On Tue, May 5, 2015 at 5:51 PM, Reka Thirunavukkarasu <re...@wso2.com>> wrote:
Hi Martin,
I will have to implement this solution in a thread-safe manner, as multiple cluster monitors are sharing the same resource. It will also impact the cluster monitor's monitoring part. I'm still trying to figure out a solution for this issue. Will keep you updated with the progress..
Thanks,
Reka

On Tue, May 5, 2015 at 5:19 PM, Reka Thirunavukkarasu <re...@wso2.com>> wrote:
Hi
I suspect the issue is that we use a static knowledgeBases map in the AutoscalerRuleEvaluator, but it is getting initialized by every cluster monitor. We need to fix the cluster monitor creation flow to initialize the static knowledgeBases map only once, or to properly share this map across multiple threads, since each cluster monitor is a thread.
Since drools file parsing can be done once and the result used by all other monitors, I will work on a fix to parse the drools files only once. Hope that fix will solve this issue.
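As a rough sketch of that "initialize only once or share safely" idea, kept generic (the names are illustrative and the value type is left abstract rather than guessing the Drools types):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative only: a lazily initialized, shared map where concurrent
// callers cannot clobber each other's entries. putIfAbsent guarantees a
// single winner even if several cluster monitor threads race to parse the
// same .drl, so later readers never observe a half-initialized map.
final class SharedKnowledgeBases {

    private static final ConcurrentMap<String, Object> KNOWLEDGE_BASES =
            new ConcurrentHashMap<String, Object>();

    private SharedKnowledgeBases() {
    }

    static Object getOrParse(String drlFileName) {
        Object existing = KNOWLEDGE_BASES.get(drlFileName);
        if (existing == null) {
            Object parsed = parseOnce(drlFileName);
            Object winner = KNOWLEDGE_BASES.putIfAbsent(drlFileName, parsed);
            existing = (winner != null) ? winner : parsed;
        }
        return existing;
    }

    private static Object parseOnce(String drlFileName) {
        // stands in for the real Drools parsing step
        return new Object();
    }
}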
Thanks,
Reka


On Tue, May 5, 2015 at 4:44 PM, Reka Thirunavukkarasu <re...@wso2.com>> wrote:
Hi Martin/Imesh,
Thanks Imesh for adding the exception handling in the monitor creation. That helped to narrow down the issue. It was a drools file parsing issue. I found the below exception in both samples when creating the relevant monitors. We will have to identify why the drools parsing gave an NPE. @Lahiru, do you have any idea on this? In both samples, the cluster monitors failed when parsing "obsoletecheck.drl".
Since I couldn't figure out the root cause, I have added descriptive debug logs (in 5c87d5de2ad15788f47907d89641c52dd3d21d53) in order to isolate the issue. @Martin, would you get a chance to test it and provide us the logs again for the same scenario, since I'm unable to reproduce this on my side?
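A purely hypothetical, simplified shape of the failing lookup (not the real AutoscalerRuleEvaluator source) shows where such an NPE can surface:

import java.util.HashMap;
import java.util.Map;

import org.drools.KnowledgeBase;
import org.drools.runtime.StatefulKnowledgeSession;

// Illustrative only: if the shared map has no entry for this .drl (for
// example because another monitor re-initialized the map concurrently),
// get() returns null and the next line throws the NullPointerException
// seen in the stack traces below.
public class StatefulSessionLookupSketch {

    private final Map<String, KnowledgeBase> knowledgeBases =
            new HashMap<String, KnowledgeBase>();

    public StatefulKnowledgeSession getStatefulSession(String drlFileName) {
        KnowledgeBase knowledgeBase = knowledgeBases.get(drlFileName); // may be null
        return knowledgeBase.newStatefulKnowledgeSession();            // NPE when null
    }
}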

scenario_c1_c2_c3_c4_cartridges:

TID: [0] [STRATOS] [2015-05-01 18:24:22,591] DEBUG {org.apache.stratos.autoscaler.rule.AutoscalerRuleEvaluator} -  Drools file is parsed successfully: obsoletecheck.drl
TID: [0] [STRATOS] [2015-05-01 18:24:22,594] ERROR {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -  An error occurred while starting monitor: [type] cluster [component] sub-G1-G2-G3-1-Id.c3-1x1.c3.domain
java.lang.NullPointerException
    at org.apache.stratos.autoscaler.rule.AutoscalerRuleEvaluator.getStatefulSession(AutoscalerRuleEvaluator.java:76)
    at org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor.<init>(ClusterMonitor.java:137)
    at org.apache.stratos.autoscaler.monitor.MonitorFactory.getClusterMonitor(MonitorFactory.java:302)
    at org.apache.stratos.autoscaler.monitor.MonitorFactory.getMonitor(MonitorFactory.java:83)
    at org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor$MonitorAdder.run(ParentComponentMonitor.java:844)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

scenario_c1_c2_cartridges:

TID: [0] [STRATOS] [2015-05-01 17:58:50,824] DEBUG {org.apache.stratos.autoscaler.rule.AutoscalerRuleEvaluator} -  Drools file is parsed successfully: dependent-scaling.drl
TID: [0] [STRATOS] [2015-05-01 17:58:50,825] ERROR {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -  An error occurred while starting monitor: [type] cluster [component] subscription-G1-G2-G3-Id.c1-1x1.c1.domain
java.lang.NullPointerException
    at org.apache.stratos.autoscaler.rule.AutoscalerRuleEvaluator.getStatefulSession(AutoscalerRuleEvaluator.java:76)
    at org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor.<init>(ClusterMonitor.java:135)
    at org.apache.stratos.autoscaler.monitor.MonitorFactory.getClusterMonitor(MonitorFactory.java:302)
    at org.apache.stratos.autoscaler.monitor.MonitorFactory.getMonitor(MonitorFactory.java:83)
    at org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor$MonitorAdder.run(ParentComponentMonitor.java:844)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
TID: [0] [STRATOS] [2015-05-01 17:58:50,825] DEBUG {org.apache.stratos.autoscaler.monitor.
@Martin, however, there seems to be a separate locking issue. That is not related to this one. For now, that locking issue seems to be harmless. Can we track it in a jira?

Thanks,
Reka



On Sat, May 2, 2015 at 12:28 AM, Martin Eppel (meppel) <me...@cisco.com>> wrote:
Hi Imesh, Reka

As requested, please see attached artifacts and logs (with debug enabled) to test for the deadlock – stratos is running the latest from master, latest commit:
commit ae89ba09491891512a9bc89e080577c565ebe8b7
Author: reka <rt...@gmail.com>>
Date:   Fri May 1 12:30:55 2015 +0530

I ran 2 similar but slightly different scenarios, see [1.], [2.]

Java startup with lock monitor enabled:

/opt/java/bin/java -Xbootclasspath/a: -Xms256m -Xmx2048m -XX:MaxPermSize=256m -server -XX:+HeapDumpOnOutOfMemoryE
rror -XX:HeapDumpPath=/opt/wso2/apache-stratos/repository/logs/heap-dump.hprof -Dcom.sun.management.jmxremote -classpath /opt/java/lib/tools.jar:/opt/wso2/apache-stratos/bin/org.wso2.carbon.bootstrap-4.2.0.jar:/opt/wso2/apache-stratos/bin/tcpmon-1.0.jar:/opt/wso2/apache-stratos/bin/tomcat-juli-7.0.34.jar:/opt/wso2/apache
-stratos/lib/commons-lang-2.6.0.wso2v1.jar -Djava.endorsed.dirs=/opt/wso2/apache-stratos/lib/endorsed:/opt/java/jre/lib/endorsed:/opt/java/lib/endorsed -Djava.io
.tmpdir=/opt/wso2/apache-stratos/tmp -Dcatalina.base=/opt/wso2/apache-stratos/lib/tomcat -Dwso2.server.standalone=true -Dcarbon.registry.root=/ -Djava.command=/o
pt/java/bin/java -Dcarbon.home=/opt/wso2/apache-stratos -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Dcarbon.config.dir.path=/opt/wso2/apac
he-stratos/repository/conf -Djava.util.logging.config.file=/opt/wso2/apache-stratos/repository/conf/etc/logging-bridge.properties -Dcomponents.repo=/opt/wso2/apache-stratos/repository/components/plugins -Dconf.location=/opt/wso2/apache-stratos/repository/conf -Dcom.atomikos.icatch.file=/opt/wso2/apache-stratos/lib/transactions.properties -Dcom.atomikos.icatch.hide_init_file_path=true -Dorg.apache.jasper.runtime.BodyContentImpl.LIMIT_BUFFER=true -Dcom.sun.jndi.ldap.connect.pool.a
uthentication=simple -Dcom.sun.jndi.ldap.connect.pool.timeout=3000 -Dorg.terracotta.quartz.skipUpdateCheck=true -Djava.security.egd=file:/dev/./urandom -Dfile.encoding=UTF8 -Ddisable.cassandra.server.startup=true -Djndi.properties.dir=/opt/wso2/apache-stratos/repository/conf -Dthrift.client.config.file.path=/opt/wso2/apache-stratos/repository/conf/thrift-client-config.xml -DMETADATA_CLIENT_CONFIG_FILE=/opt/wso2/apache-stratos/repository/conf/metadataservice.xml -Dread.write.lock.monitor.enabled=true org.wso2.carbon.bootstrap.Bootstrap -Dprofile=default


[1.] scenario_c1_c2_cartridges



[1b.] exception

org.apache.stratos.common.exception.LockNotReleasedException
        at org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor.checkTimeout(ReadWriteLockMonitor.java:72)
        at org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor.run(ReadWriteLockMonitor.java:55)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)


[2.] scenario_c1_c2_c3_c4_cartridges





From: Imesh Gunaratne [mailto:imesh@apache.org<ma...@apache.org>]
Sent: Thursday, April 30, 2015 10:10 PM

To: dev
Subject: Re: Testing Stratos 4.1 : nested grouping scenario with startup and termination issues (?)

In addition, we have not added a try catch block in the MonitorAdder.run() method to cover its full scope. Therefore, if an exception is raised in the middle, it can also cause the above problem.
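The shape of that fix, as a hedged sketch (the class and field names here are illustrative stand-ins, not the exact ParentComponentMonitor source):

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

// Illustrative only: a try/catch covering the whole run() body, so an
// exception thrown while building the monitor is logged instead of
// silently killing the executor task.
public class MonitorAdderSketch implements Runnable {

    private static final Log log = LogFactory.getLog(MonitorAdderSketch.class);

    private final Runnable monitorStarter; // stands in for the real monitor creation
    private final String componentId;

    public MonitorAdderSketch(Runnable monitorStarter, String componentId) {
        this.monitorStarter = monitorStarter;
        this.componentId = componentId;
    }

    @Override
    public void run() {
        try {
            monitorStarter.run(); // create and register the monitor
            log.info("Monitor started successfully: [component] " + componentId);
        } catch (Exception e) {
            // without this catch the exception vanishes inside the thread pool
            // and neither a success nor an error log is written for the component
            log.error("An error occurred while starting monitor: [component] "
                    + componentId, e);
        }
    }
}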

I have now fixed this in commit revision:
9ec061f44a3189ccd8b509ef4da980687dfbcf62

Martin: Appreciate if you could take this fix and retest.

Thanks

On Fri, May 1, 2015 at 10:32 AM, Imesh Gunaratne <im...@apache.org>> wrote:
Hi Reka,

It looks like the MonitorAdder.run() has executed properly, that's why we see the following log:

TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -  Starting monitor: [type] cluster [component] sub-G1-G2-G3-1-G4.c4-1x1.c4.domain

However the thread has not come to its last line:

log.info(String.format("Monitor started successfully: [type] %s [component] %s [dependents] %s " +
                "[startup-time] %d seconds", monitorTypeStr, context.getId(),

As we discussed offline, this may have been caused by a deadlock while trying to acquire the following topology lock:


public static ClusterMonitor getClusterMonitor(ParentComponentMonitor parentMonitor,
                                               ClusterChildContext context,
                                               List<String> parentInstanceIds)
    ...
    //acquire read lock for the service and cluster
    TopologyManager.acquireReadLockForCluster(serviceName, clusterId);
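For reference, the usual acquire-in-try / release-in-finally discipline looks like this; a generic java.util.concurrent sketch is shown because the matching TopologyManager release call is not part of the snippet above:

import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Generic illustration of the locking discipline: the read lock is always
// released in a finally block, even when the guarded code throws, which is
// what keeps a lock monitor from reporting a LockNotReleasedException.
public class ReadLockDiscipline {

    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    public void readSafely(Runnable work) {
        lock.readLock().lock();
        try {
            work.run(); // may throw
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        new ReadLockDiscipline().readSafely(new Runnable() {
            public void run() {
                System.out.println("doing work under the read lock");
            }
        });
    }
}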

Martin: Will you be able to do another test run with the deadlock detection logic enabled? You can set the following system property to true in the stratos.sh file to do this:

read.write.lock.monitor.enabled=true
Thanks


On Fri, May 1, 2015 at 7:40 AM, Reka Thirunavukkarasu <re...@wso2.com>> wrote:
Hi Martin,

Thanks Martin for the detailed information in order to analyze the issue. It helped to isolate the issue.
As i went through the logs, it seems that some thread issue. I could see below log for c4-1x1 and c3-1x1. In that case c3 and c4 got scheduled to be start a relevant clusterMonitor. After that only c3 got successfully started with ClusterMonitor not c4. So the scheduler of c4 didn't actually start a thread for the MonitorAdder to create the ClusterMonitor.

TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -  Starting dependent monitor: [application] sub-G1-G2-G3-1-G4 [component] sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -  Monitor scheduled: [type] cluster [component] sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -  Starting monitor: [type] cluster [component] sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
TID: [0] [STRATOS] [2015-04-30 16:48:57,713]  INFO {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -  Starting dependent monitor: [application] sub-G1-G2-G3-1-G4 [component] sub-G1-G2-G3-1-G4.c3-1x1.c3.domain
TID: [0] [STRATOS] [2015-04-30 16:48:57,713]  INFO {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -  Monitor scheduled: [type] cluster [component] sub-G1-G2-G3-1-G4.c3-1x1.c3.domain
TID: [0] [STRATOS] [2015-04-30 16:48:57,713]  INFO {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -  Starting monitor: [type] cluster [component] sub-G1-G2-G3-1-G4.c3-1x1.c3.domain

Found below log for c3 which indicates that c3 monitor got started successfully. But there is no such log for c4.
TID: [0] [STRATOS] [2015-04-30 16:49:00,760]  INFO {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -  Monitor started successfully: [type] cluster [component] sub-G1-G2-G3-1-G4.c3-1x1.c3.domain [dependents] none [startup-time] 3 seconds
@Gayan/Imesh, Do you have any input here? Will increasing the threadpool solve this issue? Or is it related to something else?
Thanks,
Reka



On Thu, Apr 30, 2015 at 10:54 PM, Martin Eppel (meppel) <me...@cisco.com>> wrote:
Hi Reka,

Re-run the scenario, making sure the application alias and group alias are as suggested and debug logs are turned on (see config below)

log4j.logger.org.apache.stratos.manager=DEBUG
log4j.logger.org.apache.stratos.autoscaler=DEBUG
log4j.logger.org.apache.stratos.messaging=INFO
log4j.logger.org.apache.stratos.cloud.controller=DEBUG
log4j.logger.org.wso2.andes.client=ERROR

This is the scenario:


1.      deployed application – see screenshot A. , debug logs wso2carbon-debug.log
only 3 instances spin up

2.      removed application

3.      re-deployed application – see screenshot B. , debug logs wso2carbon-debug-2.log
(after line “TID: [0] [STRATOS] [2015-04-30 17:05:23,837] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Read lock released”
2nd time the application gets deployed all instances spin up and go active


Please see attached artifacts and logs.


A.     Application Status after deploying the application first time after stratos start up:





B.     Application Status after re-deploying the application
(see log wso2carbon-debug-2.log after “TID: [0] [STRATOS] [2015-04-30 17:05:23,837] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Read lock released”:









From: Reka Thirunavukkarasu [mailto:reka@wso2.com<ma...@wso2.com>]
Sent: Thursday, April 30, 2015 1:40 AM

To: dev
Subject: Re: Testing Stratos 4.1 : nested grouping scenario with startup and termination issues (?)

If you get this issue continuously, can you please share the logs against master as we have improved some logs in the master yesterday?
Thanks,
Reka

On Thu, Apr 30, 2015 at 2:08 PM, Reka Thirunavukkarasu <re...@wso2.com>> wrote:
Hi Martin,
I have deployed the attached samples as earlier in openstack with latest master. All the clusters got created with the members. Please see the attached diagram. I'm unable to proceed further as my puppet configuration has to be corrected to make the member active. Thought of sharing this as all the clusters have members.
Thanks,
Reka

On Thu, Apr 30, 2015 at 10:25 AM, Reka Thirunavukkarasu <re...@wso2.com>> wrote:
Hi Martin,
Can you please confirm whether you are using a unique applicationId and group alias? I can see from the UI that the applicationId and the next group alias have the same value, sub-G1-G2-G3-1..
Thanks,
Reka


On Thu, Apr 30, 2015 at 10:16 AM, Martin Eppel (meppel) <me...@cisco.com>> wrote:
Hi Reka,

I have upgraded from beta to the latest stratos code on master and retested the scenario from jira STRATOS-1345 but still see the same issue (on open stack)

Thanks

Martin


From: Martin Eppel (meppel)
Sent: Wednesday, April 29, 2015 2:54 PM
To: dev@stratos.apache.org<ma...@stratos.apache.org>
Subject: RE: Testing Stratos 4.1 : nested grouping scenario with startup and termination issues (?)

Hi Reka,

I will upgrade my system to the latest master and re-test,

Regards

Martin

From: Reka Thirunavukkarasu [mailto:reka@wso2.com]
Sent: Wednesday, April 29, 2015 11:55 AM
To: dev
Subject: Re: Testing Stratos 4.1 : nested grouping scenario with startup and termination issues (?)

Hi Martin,
While I was working on application update, I fixed a few issues with the termination behavior. Anyway, there seem to be small issues in the logic which have to be fixed. I have started to verify this in my local setup. Can you create a jira, so that we can track it? I will update the progress in the jira..
Thanks,
Reka

On Tue, Apr 28, 2015 at 10:11 PM, Martin Eppel (meppel) <me...@cisco.com>> wrote:
Hi Reka,

Thanks for following up - let me know if I should open a JIRA,

Thanks

Martin

From: Reka Thirunavukkarasu [mailto:reka@wso2.com<ma...@wso2.com>]
Sent: Tuesday, April 28, 2015 5:37 AM
To: dev
Subject: Re: Testing Stratos 4.1 : nested grouping scenario with startup and termination issues (?)

Hi Martin,
Thanks for bringing this up. I have fixed some issues in the flow while testing application update support with instance counts. I will go through your scenarios to reproduce it and update the thread with the progress..
Thanks,
Reka

On Tue, Apr 28, 2015 at 7:08 AM, Martin Eppel (meppel) <me...@cisco.com>> wrote:
I am testing a (nested grouping) scenario where a group defines the termination behavior “terminate-all”. When the instance (of cartridge type c3) is terminated, no new instance is started up.
My understanding is that a new instance should be started up.

The scenario looks like this:

Group ~G1 has a cartridge member c1 and group member ~G2
Group ~G2 has a cartridge member c2 and group member ~G3
Group ~G3 has a cartridge member c3

Startup dependencies are: c1 depends on G2, c2 depends on G3

~G1 defines termination: none
~G2 defines termination: dependents
~G3 defines termination: all

After startup, when all instances are active, instance c3 is terminated, which correctly also terminates instance c2 (since it depends on G3 / c3).
Issue 1:
However, no new instance for c3 is started up (and consequently no new instance for c2 would be started up either) (see log wso2carbon.log)

Only instance which remains running is c1.
Issue 2:
When c1 is subsequently terminated manually, a new instance of c1 is started up (as opposed to Issue 1), which I think is incorrect since it defines a startup dependency (c1 depends on G2) that is not fulfilled at the time (G2 should not be active since c2 is still terminated; see log wso2carbon-issue2.log, the same log as wso2carbon.log but at a later time)

WDYT ?

Please find attached artifacts and logs

Thanks

Martin



--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007<tel:%2B94776442007>




--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007<tel:%2B94776442007>




--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007<tel:%2B94776442007>




--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007<tel:%2B94776442007>




--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007<tel:%2B94776442007>




--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007<tel:%2B94776442007>




--
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos



--
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos



--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007<tel:%2B94776442007>




--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007<tel:%2B94776442007>




--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007<tel:%2B94776442007>




--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007<tel:%2B94776442007>




--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007


Re: Testing Stratos 4.1 : nested grouping scenario with startup and termination issues (?)

Posted by Reka Thirunavukkarasu <re...@wso2.com>.
That's great news. Let me know how your other testing goes with the
patch. I will continue with this patch when verifying other issues..

Thanks,
Reka

On Wed, May 6, 2015 at 12:13 AM, Martin Eppel (meppel) <me...@cisco.com>
wrote:

>  Hi Reka,
>
>
>
> Good news – the patch seems to work fine, please find attached the usual
> artifacts / log (debug / thread lock detection enabled) + [1.].
>
> I’ll run some more tests, let’s keep the fingers crossed.
>
>
>
>
>
> [1.] Screenshot:
>
>
>
>
>
> *From:* Reka Thirunavukkarasu [mailto:reka@wso2.com]
> *Sent:* Tuesday, May 05, 2015 6:17 AM
> *To:* dev; Imesh Gunaratne; Lahiru Sandaruwan; Martin Eppel (meppel)
> *Subject:* Re: Testing Stratos 4.1 : nested grouping scenario with
> startup and termination issues (?)
>
>
>
> Hi Martin,
>
> I have added a possible fix by considering the threading model in
> ClusterMonitor. I have verified it locally and it is working fine. Can you
> apply this patch as attached herewith and continue testing the same
> scenario? If this fixes the issue and no regression, then i can push it to
> master.
>
> What i have fixed it that, made AutoscalerRuleEvaluator singleton and
> initializes the map only when creating the instance. Then i have removed
> the drool related variables from ClusterMonitor and added it to
> ClusterInstanceContext as there are threads getting spawned per
> ClusterInstanceContext in order to increase the performance. In that case,
> the drool related variables shouldn't be shared across
> ClusterInstanceContext. Then it can cause any conflict situation as i
> checked the code. So, i have added the drools related variables per
> ClusterInstanceContext. So that each thread can take their own local copy
> of variables from the stack and execute.
>
> @Imesh/Lahiru,
>
> Please correct me, if I'm wrong.
>
>
>
> FYI: The patch has created on top of
> 5c87d5de2ad15788f47907d89641c52dd3d21d53 this commit.
>
> Please let me know, if you face any issues with it.
>
> Thanks,
>
> Reka
>
>
>
> On Tue, May 5, 2015 at 5:51 PM, Reka Thirunavukkarasu <re...@wso2.com>
> wrote:
>
> Hi Martin,
>
> I will have to implement this solution in a thread safe manner as multiple
> cluster monitors are sharing the same resource. It will get impacted the
> Cluster monitor monitoring part as well. I'm still trying to figure out a
> solution for this issue. Will keep you updated with the progress..
>
> Thanks,
>
> Reka
>
>
>
> On Tue, May 5, 2015 at 5:19 PM, Reka Thirunavukkarasu <re...@wso2.com>
> wrote:
>
> Hi
>
> I suspect the issue is that we use static knowledgeBases map in the
> AutoscalerRuleEvaluator. But this is getting initialized by every cluster
> monitor. We need to fix this cluster monitor creation flow to use static
> knowledgeBases map and initialize only once  or properly sharing this map
> across multiple threads, since each cluster monitors are threads.
>
> Since drools file parsing can be done only once and used by all other
> monitors, i will work on a fix to make drool file parsing only once. Hope
> that fix would solve this issue.
>
> Thanks,
>
> Reka
>
>
>
>
>
> On Tue, May 5, 2015 at 4:44 PM, Reka Thirunavukkarasu <re...@wso2.com>
> wrote:
>
> Hi Martin/Imesh,
>
> Thanks Imesh for adding the exception handling in the monitor creation.
> That helped to narrow down the issue. It was a drool file parsed issue. I
> found below exception in both samples when creating those relevant
> monitors. We will have to identify why the drool parsing gave NPE. @Lahiru,
> Do you have any idea on this? In both samples, the cluster Monitors failed
> when parsing "obsoletecheck.drl".
>
> Since i couldn't figure out the root cause, i have added descriptive debug
> logs (in 5c87d5de2ad15788f47907d89641c52dd3d21d53) in order to isolate the
> issue. @Martin, Would you get a chance to test it and provide us the logs
> again with the same scenario, since I'm unable to reproduce this from my
> side?
>
>
> scenario_c1_c2_c3_c4_cartridges:
>
> TID: [0] [STRATOS] [2015-05-01 18:24:22,591] DEBUG
> {org.apache.stratos.autoscaler.rule.AutoscalerRuleEvaluator} -  Drools file
> is parsed successfully: obsoletecheck.drl
> TID: [0] [STRATOS] [2015-05-01 18:24:22,594] ERROR
> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
> An error occurred while starting monitor: [type] cluster [component]
> sub-G1-G2-G3-1-Id.c3-1x1.c3.domain
> java.lang.NullPointerException
>     at
> org.apache.stratos.autoscaler.rule.AutoscalerRuleEvaluator.getStatefulSession(AutoscalerRuleEvaluator.java:76)
>     at
> org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor.<init>(ClusterMonitor.java:137)
>     at
> org.apache.stratos.autoscaler.monitor.MonitorFactory.getClusterMonitor(MonitorFactory.java:302)
>     at
> org.apache.stratos.autoscaler.monitor.MonitorFactory.getMonitor(MonitorFactory.java:83)
>     at
> org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor$MonitorAdder.run(ParentComponentMonitor.java:844)
>     at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
>
> scenario_c1_c2_cartridges:
>
> TID: [0] [STRATOS] [2015-05-01 17:58:50,824] DEBUG
> {org.apache.stratos.autoscaler.rule.AutoscalerRuleEvaluator} -  Drools file
> is parsed successfully: dependent-scaling.drl
> TID: [0] [STRATOS] [2015-05-01 17:58:50,825] ERROR
> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
> An error occurred while starting monitor: [type] cluster [component]
> subscription-G1-G2-G3-Id.c1-1x1.c1.domain
> java.lang.NullPointerException
>     at
> org.apache.stratos.autoscaler.rule.AutoscalerRuleEvaluator.getStatefulSession(AutoscalerRuleEvaluator.java:76)
>     at
> org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor.<init>(ClusterMonitor.java:135)
>     at
> org.apache.stratos.autoscaler.monitor.MonitorFactory.getClusterMonitor(MonitorFactory.java:302)
>     at
> org.apache.stratos.autoscaler.monitor.MonitorFactory.getMonitor(MonitorFactory.java:83)
>     at
> org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor$MonitorAdder.run(ParentComponentMonitor.java:844)
>     at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> TID: [0] [STRATOS] [2015-05-01 17:58:50,825] DEBUG
> {org.apache.stratos.autoscaler.monitor.
>
> @Martin, However, there seems to be separate locking issue. That is not
> related to this. For now, that locking issue seems to be harmless. Can we
> track it in a jira?
>
>
>
> Thanks,
>
> Reka
>
>
>
>
>
>
>
> On Sat, May 2, 2015 at 12:28 AM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> Hi Imesh, Reka
>
>
>
> As request, please see attached artifacts and logs (with debug enabled) to
> test for the deadlock – stratos is running the latest from master, latest
> commit :
>
> commit ae89ba09491891512a9bc89e080577c565ebe8b7
>
> Author: reka <rt...@gmail.com>
>
> Date:   Fri May 1 12:30:55 2015 +0530
>
>
>
> I run 2 similar but slightly scenarios, see [1.], [2.]
>
>
>
> Java startup with lock monitor enabled:
>
>
>
> /opt/java/bin/java -Xbootclasspath/a: -Xms256m -Xmx2048m
> -XX:MaxPermSize=256m -server -XX:+HeapDumpOnOutOfMemoryE
>
> rror
> -XX:HeapDumpPath=/opt/wso2/apache-stratos/repository/logs/heap-dump.hprof
> -Dcom.sun.management.jmxremote -classpath
> /opt/java/lib/tools.jar:/opt/wso2/apache-stratos/bin/org.wso2.carbon.bootstrap-4.2.0.jar:/opt/wso2/apache-stratos/bin/tcpmon-1.0.jar:/opt/wso2/apache-stratos/bin/tomcat-juli-7.0.34.jar:/opt/wso2/apache
>
> -stratos/lib/commons-lang-2.6.0.wso2v1.jar
> -Djava.endorsed.dirs=/opt/wso2/apache-stratos/lib/endorsed:/opt/java/jre/lib/endorsed:/opt/java/lib/endorsed
> -Djava.io
>
> .tmpdir=/opt/wso2/apache-stratos/tmp
> -Dcatalina.base=/opt/wso2/apache-stratos/lib/tomcat
> -Dwso2.server.standalone=true -Dcarbon.registry.root=/ -Djava.command=/o
>
> pt/java/bin/java -Dcarbon.home=/opt/wso2/apache-stratos
> -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
> -Dcarbon.config.dir.path=/opt/wso2/apac
>
> he-stratos/repository/conf
> -Djava.util.logging.config.file=/opt/wso2/apache-stratos/repository/conf/etc/logging-bridge.properties
> -Dcomponents.repo=/opt/wso2/apache-stratos/repository/components/plugins
> -Dconf.location=/opt/wso2/apache-stratos/repository/conf
> -Dcom.atomikos.icatch.file=/opt/wso2/apache-stratos/lib/transactions.properties
> -Dcom.atomikos.icatch.hide_init_file_path=true
> -Dorg.apache.jasper.runtime.BodyContentImpl.LIMIT_BUFFER=true
> -Dcom.sun.jndi.ldap.connect.pool.a
>
> uthentication=simple -Dcom.sun.jndi.ldap.connect.pool.timeout=3000
> -Dorg.terracotta.quartz.skipUpdateCheck=true
> -Djava.security.egd=file:/dev/./urandom -Dfile.encoding=UTF8
> -Ddisable.cassandra.server.startup=true
> -Djndi.properties.dir=/opt/wso2/apache-stratos/repository/conf
> -Dthrift.client.config.file.path=/opt/wso2/apache-stratos/repository/conf/thrift-client-config.xml
> -DMETADATA_CLIENT_CONFIG_FILE=/opt/wso2/apache-stratos/repository/conf/metadataservice.xml
> *-Dread.write.lock.monitor.enabled*=true
> org.wso2.carbon.bootstrap.Bootstrap -Dprofile=default
>
>
>
>
>
> [1.] scenario_c1_c2_cartridges
>
>
>
>
>
>
>
> [1b.] exception
>
>
>
> *org.apache.stratos.common.exception.LockNotReleasedException*
>
> *        at
> org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor.checkTimeout(ReadWriteLockMonitor.java:72)*
>
> *        at
> org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor.run(ReadWriteLockMonitor.java:55)*
>
> *        at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)*
>
> *        at
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)*
>
> *        at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)*
>
> *        at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)*
>
> *        at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)*
>
> *        at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)*
>
> *        at java.lang.Thread.run(Thread.java:745)*
>
>
>
>
>
> [2.] scenario_c1_c2_c3_c4_cartridges
>
>
>
>
>
>
>
>
>
>
>
> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
> *Sent:* Thursday, April 30, 2015 10:10 PM
>
>
> *To:* dev
> *Subject:* Re: Testing Stratos 4.1 : nested grouping scenario with
> startup and termination issues (?)
>
>
>
> In addition we have not added a try catch block in MonitorAdder.run()
> method to cover its full scope. Therefore if an exception is raised in the
> middle the above problem also can cause.
>
>
>
> I have now fixed this in commit revision:
>
> 9ec061f44a3189ccd8b509ef4da980687dfbcf62
>
>
>
> Martin: Appreciate if you could take this fix and retest.
>
>
>
> Thanks
>
>
>
> On Fri, May 1, 2015 at 10:32 AM, Imesh Gunaratne <im...@apache.org> wrote:
>
> Hi Reka,
>
>
>
> It looks like the MonitorAdder.run() has executed properly, that's why we
> see the following log:
>
>
>
> TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO
> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
> Starting monitor: [type] cluster [component]
> sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
>
>
>
> However the thread has not come to its last line:
>
> log.info(String.format("Monitor started successfully: [type] %s [component] %s [dependents] %s " +
>                 "[startup-time] %d seconds", monitorTypeStr, context.getId(),
>
>
>
> As we discussed offline this may have caused by a deadlock while trying to
> get the following topology lock:
>
>
>
> public static ClusterMonitor getClusterMonitor(ParentComponentMonitor parentMonitor,
>                                                ClusterChildContext context,
>                                                List<String> parentInstanceIds)
>     ...
>
>     //acquire read lock for the service and cluster
>     TopologyManager.acquireReadLockForCluster(serviceName, clusterId);
>
>
>
> Martin: Will you be able to do another test run by enabling deadlock
> detection logic. You could set the following system property to true in the
> stratos.sh file to do this:
>
> *read.write.lock.monitor.enabled=true*
>
>  Thanks
>
>
>
>
>
> On Fri, May 1, 2015 at 7:40 AM, Reka Thirunavukkarasu <re...@wso2.com>
> wrote:
>
> Hi Martin,
>
>
>
> Thanks Martin for the detailed information in order to analyze the issue.
> It helped to isolate the issue.
>
> As i went through the logs, it seems that some thread issue. I could see
> below log for c4-1x1 and c3-1x1. In that case c3 and c4 got scheduled to be
> start a relevant clusterMonitor. After that only c3 got successfully
> started with ClusterMonitor not c4. So the scheduler of c4 didn't actually
> start a thread for the MonitorAdder to create the ClusterMonitor.
>
> TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO
> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
> Starting dependent monitor: [application] sub-G1-G2-G3-1-G4 [component]
> sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
> TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO
> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
> Monitor scheduled: [type] cluster [component] sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
>
> TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO
> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
> Starting monitor: [type] cluster [component]
> sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
> TID: [0] [STRATOS] [2015-04-30 16:48:57,713]  INFO
> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
> Starting dependent monitor: [application] sub-G1-G2-G3-1-G4 [component]
> sub-G1-G2-G3-1-G4.c3-1x1.c3.domain
> TID: [0] [STRATOS] [2015-04-30 16:48:57,713]  INFO
> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
> Monitor scheduled: [type] cluster [component] sub-G1-G2-G3-1-G4.c3-1x1.c3.domain
>
> TID: [0] [STRATOS] [2015-04-30 16:48:57,713]  INFO
> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
> Starting monitor: [type] cluster [component]
> sub-G1-G2-G3-1-G4.c3-1x1.c3.domain
>
> Found below log for c3 which indicates that c3 monitor got started
> successfully. But there is no such log for c4.
>
> TID: [0] [STRATOS] [2015-04-30 16:49:00,760]  INFO
> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
> Monitor started successfully: [type] cluster [component]
> sub-G1-G2-G3-1-G4.c3-1x1.c3.domain [dependents] none [startup-time] 3
> seconds
>
> @Gayan/Imesh, Do you have any input here? Will increasing the threadpool
> solve this issue? Or is it related to something else?
>
> Thanks,
>
> Reka
>
>
>
>
>
>
>
> On Thu, Apr 30, 2015 at 10:54 PM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> Hi Reka,
>
>
>
> Re-run the scenario, making sure the application alias and group alias are
> as suggested and debug logs are turned on (see config below)
>
>
>
> log4j.logger.org.apache.stratos.manager=DEBUG
>
> log4j.logger.org.apache.stratos.autoscaler=DEBUG
>
> log4j.logger.org.apache.stratos.messaging=INFO
>
> log4j.logger.org.apache.stratos.cloud.controller=DEBUG
>
> log4j.logger.org.wso2.andes.client=ERROR
>
>
>
> This is the scenario:
>
>
>
> 1.      deployed application – see screenshot A. , debug logs
> wso2carbon-debug.log
> only 3 instances spin up
>
> 2.      removed application
>
> 3.      re-deployed application – see screenshot B. , debug logs
> wso2carbon-debug-2.log
> (after line “TID: [0] [STRATOS] [2015-04-30 17:05:23,837] DEBUG
> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Read lock
> released”
> 2nd time the application gets deployed all instances spin up and go active
>
>
>
>
>
> Please see attached artifacts and logs.
>
>
>
> A.     Application Status after deploying the application first time
> after stratos start up:
>
>
>
>
>
>
>
>
>
> B.     Application Status after re-deploying the application
>
> (see log wso2carbon-debug-2.log after “TID: [0] [STRATOS] [2015-04-30
> 17:05:23,837] DEBUG
> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Read lock
> released”:
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> *From:* Reka Thirunavukkarasu [mailto:reka@wso2.com]
> *Sent:* Thursday, April 30, 2015 1:40 AM
>
>
> *To:* dev
> *Subject:* Re: Testing Stratos 4.1 : nested grouping scenario with
> startup and termination issues (?)
>
>
>
> If you get this issue continuously, can you please share the logs against
> master as we have improved some logs in the master yesterday?
>
> Thanks,
>
> Reka
>
>
>
> On Thu, Apr 30, 2015 at 2:08 PM, Reka Thirunavukkarasu <re...@wso2.com>
> wrote:
>
> Hi Martin,
>
> I have deployed the attached samples as earlier in openstack with latest
> master. All the clusters got created with the members. Please see the
> attached diagram. I'm unable to proceed further as my puppet configuration
> has to be corrected to make the member active. Thought of sharing this as
> all the clusters have members.
>
> Thanks,
>
> Reka
>
>
>
> On Thu, Apr 30, 2015 at 10:25 AM, Reka Thirunavukkarasu <re...@wso2.com>
> wrote:
>
> HI Martin,
>
> Can you please confirm whether you are using unique applicationId and
> group alias? I can see from the UI, the applicationID and next group alias
> are same value as sub-G1-G2-G3-1..
>
> Thanks,
>
> Reka
>
>
>
>
>
> On Thu, Apr 30, 2015 at 10:16 AM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> Hi Reka,
>
>
>
> I have upgraded from beta to the latest stratos code on master and
> retested the scenario from jira STRATOS-1345 but still see the same issue
> (on open stack)
>
>
>
> Thanks
>
>
>
> Martin
>
>
>
>
>
> *From:* Martin Eppel (meppel)
> *Sent:* Wednesday, April 29, 2015 2:54 PM
> *To:* dev@stratos.apache.org
> *Subject:* RE: Testing Stratos 4.1 : nested grouping scenario with
> startup and termination issues (?)
>
>
>
> Hi Reka,
>
>
>
> I will upgrade my system to the latest master and re-test,
>
>
>
> Regards
>
>
>
> Martin
>
>
>
> *From:* Reka Thirunavukkarasu [mailto:reka@wso2.com <re...@wso2.com>]
> *Sent:* Wednesday, April 29, 2015 11:55 AM
> *To:* dev
> *Subject:* Re: Testing Stratos 4.1 : nested grouping scenario with
> startup and termination issues (?)
>
>
>
> Hi Martin,
>
> While i was working on Application update, i fixed few issues with the
> termination behavior. Anyway there seems to be small issues in the logic
> which has to be fixed. I have started to verify this in my local setup. Can
> you create a jira? So that we can track it. I will update the progress in
> the jira..
>
> Thanks,
>
> Reka
>
>
>
> On Tue, Apr 28, 2015 at 10:11 PM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> Hi Reka,
>
>
>
> Thanks for following up - let me know if I should open a JIRA,
>
>
>
> Thanks
>
>
>
> Martin
>
>
>
> *From:* Reka Thirunavukkarasu [mailto:reka@wso2.com]
> *Sent:* Tuesday, April 28, 2015 5:37 AM
> *To:* dev
> *Subject:* Re: Testing Stratos 4.1 : nested grouping scenario with
> startup and termination issues (?)
>
>
>
> Hi Martin,
>
> Thanks for bringing this up. I have fixed some issue in the flow while
> testing application update support with instances count. I will go through
> your scenarios to reproduce it and update the thread with the progress..
>
> Thanks,
>
> Reka
>
>
>
> On Tue, Apr 28, 2015 at 7:08 AM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> I am testing a (nested grouping) scenario where a group defines a
> termination behavior “terminate-all”. When terminating the instance (of
> cartridge type c3), no new instance is restarted.
>
> My understanding is that a new instance should be started up.
>
>
>
> The scenario looks like this:
>
>
>
> Group ~G1 has a cartridge member c1 and group member ~G2
>
> Group ~G2 has a cartridge member c2 and group member ~G3
>
> Group ~G3 has a cartridge member c3
>
>
>
> Startup dependencies are: c1 depends on G2, c2 depends on G3
>
>
>
> ~G1 defines termination: none
>
> ~G2 defines termination: dependents
>
> ~G3 defines termination: all
>
>
>
> After startup, when all instances are active, instance c3 is terminated
> which correctly also terminates also instance c2 (since it depends on G3 /
> c3) .
>
> *Issue 1:*
>
> However, no new instances for c3 is started up (consequently no new
> instance for c2 should be started up as well) (see log see log
> wso2carbon.log)
>
>
>
> Only instance which remains running is c1.
>
> *Issue 2:*
>
> When subsequently c1 is manually being terminated, a new instance of c1 is
> started up (as opposed to Issue1) which I think is incorrect since it
> defines a startup dependency (c1 depends on G2) which is not fulfilled at
> the time (G2 should not be active since c2 is still terminated, see log
> wso2carbon-issue2.log, same log as wso2carbon.log but at a later time)
>
>
>
> WDYT ?
>
>
>
> Please find attached artifacts and logs
>
>
>
> Thanks
>
>
>
> Martin
>
>
>
>
> --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>
>
>
>
> --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>
>
>
>
> --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>
>
>
>
> --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>
>
>
>
> --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>
>
>
>
> --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>
>
>
>
>
> --
>
> Imesh Gunaratne
>
>
>
> Senior Technical Lead, WSO2
>
> Committer & PMC Member, Apache Stratos
>
>
>
>
>
> --
>
> Imesh Gunaratne
>
>
>
> Senior Technical Lead, WSO2
>
> Committer & PMC Member, Apache Stratos
>
>
>
>
> --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>
>
>
>
> --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>
>
>
>
> --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>
>
>
>
> --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>



-- 
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007

RE: Testing Stratos 4.1 : nested grouping scenario with startup and termination issues (?)

Posted by "Martin Eppel (meppel)" <me...@cisco.com>.
Hi Reka,

Good news – the patch seems to work fine; please find attached the usual artifacts / logs (debug / thread lock detection enabled) + [1.].
I’ll run some more tests; let’s keep our fingers crossed.


[1.] Screenshot:


From: Reka Thirunavukkarasu [mailto:reka@wso2.com]
Sent: Tuesday, May 05, 2015 6:17 AM
To: dev; Imesh Gunaratne; Lahiru Sandaruwan; Martin Eppel (meppel)
Subject: Re: Testing Stratos 4.1 : nested grouping scenario with startup and termination issues (?)

Hi Martin,
I have added a possible fix by considering the threading model in ClusterMonitor. I have verified it locally and it is working fine. Can you apply this patch as attached herewith and continue testing the same scenario? If this fixes the issue and causes no regression, then I can push it to master.
What I have fixed is this: I made AutoscalerRuleEvaluator a singleton that initializes the knowledge-base map only once, when the instance is created. Then I removed the Drools-related variables from ClusterMonitor and added them to ClusterInstanceContext, since threads are spawned per ClusterInstanceContext in order to increase performance; in that case the Drools-related variables shouldn't be shared across ClusterInstanceContexts, as sharing them can cause conflicts (as far as I could see in the code). So I added the Drools-related variables per ClusterInstanceContext, so that each thread takes its own local copy of the variables from the stack and executes with it. A rough sketch of the idea follows below.
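To make the idea concrete, here is a simplified sketch of the approach (this is not the actual patch: the class is trimmed down, parseDrlFile() and the plain Object values stand in for the real Drools parsing and KnowledgeBase objects, and only the two DRL files seen in the logs are listed):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public final class AutoscalerRuleEvaluatorSketch {

    private static volatile AutoscalerRuleEvaluatorSketch instance;

    // Populated once in the constructor and read-only afterwards, so a monitor
    // created later can never observe a partially initialized map.
    private final Map<String, Object> knowledgeBases = new ConcurrentHashMap<String, Object>();

    private AutoscalerRuleEvaluatorSketch() {
        for (String drl : new String[]{"obsoletecheck.drl", "dependent-scaling.drl"}) {
            knowledgeBases.put(drl, parseDrlFile(drl));
        }
    }

    public static AutoscalerRuleEvaluatorSketch getInstance() {
        if (instance == null) {
            synchronized (AutoscalerRuleEvaluatorSketch.class) {
                if (instance == null) {
                    instance = new AutoscalerRuleEvaluatorSketch();
                }
            }
        }
        return instance;
    }

    public Object getStatefulSession(String drlFileName) {
        Object knowledgeBase = knowledgeBases.get(drlFileName);
        if (knowledgeBase == null) {
            // With per-monitor initialization this lookup could race and return
            // null, which matches the NPE seen in the logs.
            throw new IllegalStateException("Knowledge base not found: " + drlFileName);
        }
        return knowledgeBase; // the real code creates a new stateful session here
    }

    private Object parseDrlFile(String drlFileName) {
        return new Object(); // placeholder for the real Drools parsing
    }
}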
@Imesh/Lahiru,
Please correct me, if I'm wrong.

FYI: The patch has been created on top of commit 5c87d5de2ad15788f47907d89641c52dd3d21d53.
Please let me know, if you face any issues with it.
Thanks,
Reka

On Tue, May 5, 2015 at 5:51 PM, Reka Thirunavukkarasu <re...@wso2.com>> wrote:
Hi Martin,
I will have to implement this solution in a thread-safe manner, as multiple cluster monitors are sharing the same resource. It will impact the cluster monitor's monitoring part as well. I'm still trying to figure out a solution for this issue. Will keep you updated with the progress..
Thanks,
Reka

On Tue, May 5, 2015 at 5:19 PM, Reka Thirunavukkarasu <re...@wso2.com>> wrote:
Hi
I suspect the issue is that we use a static knowledgeBases map in the AutoscalerRuleEvaluator, but this is getting initialized by every cluster monitor. We need to fix the cluster monitor creation flow to initialize the static knowledgeBases map only once, or to properly share this map across multiple threads, since each cluster monitor is a thread.
Since Drools file parsing only needs to happen once and the result can be used by all other monitors, I will work on a fix to parse each Drools file only once (a sketch of one way to do that follows below). Hope that fix will solve this issue.
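As a rough illustration only (not the real code), the sharing option could look like the following, assuming a Java 8 runtime for computeIfAbsent; parseDrl() is a placeholder for the real Drools parsing call:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class KnowledgeBaseCache {

    private static final ConcurrentMap<String, Object> KNOWLEDGE_BASES =
            new ConcurrentHashMap<String, Object>();

    // Parses lazily, exactly once per file name, no matter how many cluster
    // monitor threads ask for the same DRL file concurrently.
    static Object get(String drlFileName) {
        return KNOWLEDGE_BASES.computeIfAbsent(drlFileName, KnowledgeBaseCache::parseDrl);
    }

    private static Object parseDrl(String drlFileName) {
        return new Object(); // placeholder for the real Drools KnowledgeBase creation
    }
}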
Thanks,
Reka


On Tue, May 5, 2015 at 4:44 PM, Reka Thirunavukkarasu <re...@wso2.com>> wrote:
Hi Martin/Imesh,
Thanks Imesh for adding the exception handling in the monitor creation. That helped to narrow down the issue. It was a Drools file parsing issue. I found the below exception in both samples when creating the relevant monitors. We will have to identify why the Drools parsing gave an NPE. @Lahiru, do you have any idea on this? In both samples, the cluster monitors failed when parsing "obsoletecheck.drl".
Since I couldn't figure out the root cause, I have added descriptive debug logs (in 5c87d5de2ad15788f47907d89641c52dd3d21d53) in order to isolate the issue. @Martin, would you get a chance to test it and provide us the logs again for the same scenario, since I'm unable to reproduce this from my side?

scenario_c1_c2_c3_c4_cartridges:

TID: [0] [STRATOS] [2015-05-01 18:24:22,591] DEBUG {org.apache.stratos.autoscaler.rule.AutoscalerRuleEvaluator} -  Drools file is parsed successfully: obsoletecheck.drl
TID: [0] [STRATOS] [2015-05-01 18:24:22,594] ERROR {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -  An error occurred while starting monitor: [type] cluster [component] sub-G1-G2-G3-1-Id.c3-1x1.c3.domain
java.lang.NullPointerException
    at org.apache.stratos.autoscaler.rule.AutoscalerRuleEvaluator.getStatefulSession(AutoscalerRuleEvaluator.java:76)
    at org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor.<init>(ClusterMonitor.java:137)
    at org.apache.stratos.autoscaler.monitor.MonitorFactory.getClusterMonitor(MonitorFactory.java:302)
    at org.apache.stratos.autoscaler.monitor.MonitorFactory.getMonitor(MonitorFactory.java:83)
    at org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor$MonitorAdder.run(ParentComponentMonitor.java:844)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

scenario_c1_c2_cartridges:

TID: [0] [STRATOS] [2015-05-01 17:58:50,824] DEBUG {org.apache.stratos.autoscaler.rule.AutoscalerRuleEvaluator} -  Drools file is parsed successfully: dependent-scaling.drl
TID: [0] [STRATOS] [2015-05-01 17:58:50,825] ERROR {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -  An error occurred while starting monitor: [type] cluster [component] subscription-G1-G2-G3-Id.c1-1x1.c1.domain
java.lang.NullPointerException
    at org.apache.stratos.autoscaler.rule.AutoscalerRuleEvaluator.getStatefulSession(AutoscalerRuleEvaluator.java:76)
    at org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor.<init>(ClusterMonitor.java:135)
    at org.apache.stratos.autoscaler.monitor.MonitorFactory.getClusterMonitor(MonitorFactory.java:302)
    at org.apache.stratos.autoscaler.monitor.MonitorFactory.getMonitor(MonitorFactory.java:83)
    at org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor$MonitorAdder.run(ParentComponentMonitor.java:844)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
TID: [0] [STRATOS] [2015-05-01 17:58:50,825] DEBUG {org.apache.stratos.autoscaler.monitor.
@Martin, however, there seems to be a separate locking issue. That is not related to this one. For now, that locking issue seems to be harmless. Can we track it in a jira?

Thanks,
Reka



On Sat, May 2, 2015 at 12:28 AM, Martin Eppel (meppel) <me...@cisco.com>> wrote:
Hi Imesh, Reka

As requested, please see attached artifacts and logs (with debug enabled) to test for the deadlock – stratos is running the latest from master, latest commit:
commit ae89ba09491891512a9bc89e080577c565ebe8b7
Author: reka <rt...@gmail.com>>
Date:   Fri May 1 12:30:55 2015 +0530

I ran 2 similar but slightly different scenarios, see [1.], [2.]

Java startup with lock monitor enabled:

/opt/java/bin/java -Xbootclasspath/a: -Xms256m -Xmx2048m -XX:MaxPermSize=256m -server -XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/opt/wso2/apache-stratos/repository/logs/heap-dump.hprof -Dcom.sun.management.jmxremote
-classpath /opt/java/lib/tools.jar:/opt/wso2/apache-stratos/bin/org.wso2.carbon.bootstrap-4.2.0.jar:/opt/wso2/apache-stratos/bin/tcpmon-1.0.jar:/opt/wso2/apache-stratos/bin/tomcat-juli-7.0.34.jar:/opt/wso2/apache-stratos/lib/commons-lang-2.6.0.wso2v1.jar
-Djava.endorsed.dirs=/opt/wso2/apache-stratos/lib/endorsed:/opt/java/jre/lib/endorsed:/opt/java/lib/endorsed
-Djava.io.tmpdir=/opt/wso2/apache-stratos/tmp -Dcatalina.base=/opt/wso2/apache-stratos/lib/tomcat -Dwso2.server.standalone=true -Dcarbon.registry.root=/ -Djava.command=/opt/java/bin/java
-Dcarbon.home=/opt/wso2/apache-stratos -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Dcarbon.config.dir.path=/opt/wso2/apache-stratos/repository/conf
-Djava.util.logging.config.file=/opt/wso2/apache-stratos/repository/conf/etc/logging-bridge.properties -Dcomponents.repo=/opt/wso2/apache-stratos/repository/components/plugins -Dconf.location=/opt/wso2/apache-stratos/repository/conf
-Dcom.atomikos.icatch.file=/opt/wso2/apache-stratos/lib/transactions.properties -Dcom.atomikos.icatch.hide_init_file_path=true -Dorg.apache.jasper.runtime.BodyContentImpl.LIMIT_BUFFER=true
-Dcom.sun.jndi.ldap.connect.pool.authentication=simple -Dcom.sun.jndi.ldap.connect.pool.timeout=3000 -Dorg.terracotta.quartz.skipUpdateCheck=true -Djava.security.egd=file:/dev/./urandom -Dfile.encoding=UTF8 -Ddisable.cassandra.server.startup=true
-Djndi.properties.dir=/opt/wso2/apache-stratos/repository/conf -Dthrift.client.config.file.path=/opt/wso2/apache-stratos/repository/conf/thrift-client-config.xml -DMETADATA_CLIENT_CONFIG_FILE=/opt/wso2/apache-stratos/repository/conf/metadataservice.xml
-Dread.write.lock.monitor.enabled=true org.wso2.carbon.bootstrap.Bootstrap -Dprofile=default


[1.] scenario_c1_c2_cartridges



[1b.] exception

org.apache.stratos.common.exception.LockNotReleasedException
        at org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor.checkTimeout(ReadWriteLockMonitor.java:72)
        at org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor.run(ReadWriteLockMonitor.java:55)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)


[2.] scenario_c1_c2_c3_c4_cartridges





From: Imesh Gunaratne [mailto:imesh@apache.org<ma...@apache.org>]
Sent: Thursday, April 30, 2015 10:10 PM

To: dev
Subject: Re: Testing Stratos 4.1 : nested grouping scenario with startup and termination issues (?)

In addition, we have not added a try/catch block in the MonitorAdder.run() method to cover its full scope. Therefore, if an exception is raised in the middle, it can also cause the above problem.
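Roughly, the idea is the following (illustrative names only, JDK logging used for brevity; this is not the exact code in that commit):

import java.util.logging.Level;
import java.util.logging.Logger;

class MonitorAdderSketch implements Runnable {

    private static final Logger log = Logger.getLogger(MonitorAdderSketch.class.getName());

    private final String componentId;

    MonitorAdderSketch(String componentId) {
        this.componentId = componentId;
    }

    public void run() {
        try {
            // ... build the cluster/group monitor for componentId here ...
            log.info("Monitor started successfully: [component] " + componentId);
        } catch (Exception e) {
            // Without a catch covering the whole body, the exception disappears
            // inside the thread pool and the monitor silently never starts.
            log.log(Level.SEVERE, "An error occurred while starting monitor: [component] " + componentId, e);
        }
    }
}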

I have now fixed this in commit revision:
9ec061f44a3189ccd8b509ef4da980687dfbcf62

Martin: Appreciate if you could take this fix and retest.

Thanks

On Fri, May 1, 2015 at 10:32 AM, Imesh Gunaratne <im...@apache.org>> wrote:
Hi Reka,

It looks like the MonitorAdder.run() has executed properly, that's why we see the following log:

TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -  Starting monitor: [type] cluster [component] sub-G1-G2-G3-1-G4.c4-1x1.c4.domain

However the thread has not come to its last line:

log.info(String.format("Monitor started successfully: [type] %s [component] %s [dependents] %s " +
                "[startup-time] %d seconds", monitorTypeStr, context.getId(),

As we discussed offline, this may have been caused by a deadlock while trying to get the following topology lock:


public static ClusterMonitor getClusterMonitor(ParentComponentMonitor parentMonitor,
                                               ClusterChildContext context,
                                               List<String> parentInstanceIds)
    ...
    //acquire read lock for the service and cluster
    TopologyManager.acquireReadLockForCluster(serviceName, clusterId);
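
For reference, the usual discipline with such locks is to pair every acquire with a release in a finally block, so an exception mid-way cannot leave the lock held forever. A generic, JDK-only illustration of that pattern (not the Stratos TopologyManager code):

import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadLockSketch {

    private static final ReadWriteLock topologyLock = new ReentrantReadWriteLock();

    public static String readCluster(String clusterId) {
        topologyLock.readLock().lock();
        try {
            // ... read topology state for clusterId while holding the lock ...
            return "cluster:" + clusterId;
        } finally {
            // Always runs, even if the read above throws, so the lock monitor
            // never sees a lock held past its timeout.
            topologyLock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println(readCluster("sub-G1-G2-G3-1-G4.c4-1x1.c4.domain"));
    }
}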

Martin: Will you be able to do another test run with the deadlock detection logic enabled? You could set the following system property to true in the stratos.sh file to do this:

read.write.lock.monitor.enabled=true
Thanks


On Fri, May 1, 2015 at 7:40 AM, Reka Thirunavukkarasu <re...@wso2.com>> wrote:
Hi Martin,

Thanks Martin for the detailed information in order to analyze the issue. It helped to isolate the issue.
As i went through the logs, it seems that some thread issue. I could see below log for c4-1x1 and c3-1x1. In that case c3 and c4 got scheduled to be start a relevant clusterMonitor. After that only c3 got successfully started with ClusterMonitor not c4. So the scheduler of c4 didn't actually start a thread for the MonitorAdder to create the ClusterMonitor.

TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -  Starting dependent monitor: [application] sub-G1-G2-G3-1-G4 [component] sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -  Monitor scheduled: [type] cluster [component] sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -  Starting monitor: [type] cluster [component] sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
TID: [0] [STRATOS] [2015-04-30 16:48:57,713]  INFO {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -  Starting dependent monitor: [application] sub-G1-G2-G3-1-G4 [component] sub-G1-G2-G3-1-G4.c3-1x1.c3.domain
TID: [0] [STRATOS] [2015-04-30 16:48:57,713]  INFO {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -  Monitor scheduled: [type] cluster [component] sub-G1-G2-G3-1-G4.c3-1x1.c3.domain
TID: [0] [STRATOS] [2015-04-30 16:48:57,713]  INFO {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -  Starting monitor: [type] cluster [component] sub-G1-G2-G3-1-G4.c3-1x1.c3.domain

Found below log for c3 which indicates that c3 monitor got started successfully. But there is no such log for c4.
TID: [0] [STRATOS] [2015-04-30 16:49:00,760]  INFO {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -  Monitor started successfully: [type] cluster [component] sub-G1-G2-G3-1-G4.c3-1x1.c3.domain [dependents] none [startup-time] 3 seconds
@Gayan/Imesh, Do you have any input here? Will increasing the threadpool solve this issue? Or is it related to something else?
Thanks,
Reka



On Thu, Apr 30, 2015 at 10:54 PM, Martin Eppel (meppel) <me...@cisco.com>> wrote:
Hi Reka,

Re-run the scenario, making sure the application alias and group alias are as suggested and debug logs are turned on (see config below)

log4j.logger.org.apache.stratos.manager=DEBUG
log4j.logger.org.apache.stratos.autoscaler=DEBUG
log4j.logger.org.apache.stratos.messaging=INFO
log4j.logger.org.apache.stratos.cloud.controller=DEBUG
log4j.logger.org.wso2.andes.client=ERROR

This is the scenario:


1.      deployed application – see screenshot A. , debug logs wso2carbon-debug.log
only 3 instances spin up

2.      removed application

3.      re-deployed application – see screenshot B. , debug logs wso2carbon-debug-2.log
(after line “TID: [0] [STRATOS] [2015-04-30 17:05:23,837] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Read lock released”
2nd time the application gets deployed all instances spin up and go active


Please see attached artifacts and logs.


A.     Application Status after deploying the application first time after stratos start up:





B.     Application Status after re-deploying the application
(see log wso2carbon-debug-2.log after “TID: [0] [STRATOS] [2015-04-30 17:05:23,837] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Read lock released”:









From: Reka Thirunavukkarasu [mailto:reka@wso2.com<ma...@wso2.com>]
Sent: Thursday, April 30, 2015 1:40 AM

To: dev
Subject: Re: Testing Stratos 4.1 : nested grouping scenario with startup and termination issues (?)

If you get this issue continuously, can you please share the logs against master as we have improved some logs in the master yesterday?
Thanks,
Reka

On Thu, Apr 30, 2015 at 2:08 PM, Reka Thirunavukkarasu <re...@wso2.com>> wrote:
Hi Martin,
I have deployed the attached samples as earlier in openstack with latest master. All the clusters got created with the members. Please see the attached diagram. I'm unable to proceed further as my puppet configuration has to be corrected to make the member active. Thought of sharing this as all the clusters have members.
Thanks,
Reka

On Thu, Apr 30, 2015 at 10:25 AM, Reka Thirunavukkarasu <re...@wso2.com>> wrote:
HI Martin,
Can you please confirm whether you are using unique applicationId and group alias? I can see from the UI, the applicationID and next group alias are same value as sub-G1-G2-G3-1..
Thanks,
Reka


On Thu, Apr 30, 2015 at 10:16 AM, Martin Eppel (meppel) <me...@cisco.com>> wrote:
Hi Reka,

I have upgraded from beta to the latest stratos code on master and retested the scenario from jira STRATOS-1345 but still see the same issue (on open stack)

Thanks

Martin


From: Martin Eppel (meppel)
Sent: Wednesday, April 29, 2015 2:54 PM
To: dev@stratos.apache.org<ma...@stratos.apache.org>
Subject: RE: Testing Stratos 4.1 : nested grouping scenario with startup and termination issues (?)

Hi Reka,

I will upgrade my system to the latest master and re-test,

Regards

Martin

From: Reka Thirunavukkarasu [mailto:reka@wso2.com]
Sent: Wednesday, April 29, 2015 11:55 AM
To: dev
Subject: Re: Testing Stratos 4.1 : nested grouping scenario with startup and termination issues (?)

Hi Martin,
While i was working on Application update, i fixed few issues with the termination behavior. Anyway there seems to be small issues in the logic which has to be fixed. I have started to verify this in my local setup. Can you create a jira? So that we can track it. I will update the progress in the jira..
Thanks,
Reka

On Tue, Apr 28, 2015 at 10:11 PM, Martin Eppel (meppel) <me...@cisco.com>> wrote:
Hi Reka,

Thanks for following up - let me know if I should open a JIRA,

Thanks

Martin

From: Reka Thirunavukkarasu [mailto:reka@wso2.com<ma...@wso2.com>]
Sent: Tuesday, April 28, 2015 5:37 AM
To: dev
Subject: Re: Testing Stratos 4.1 : nested grouping scenario with startup and termination issues (?)

Hi Martin,
Thanks for bringing this up. I have fixed some issue in the flow while testing application update support with instances count. I will go through your scenarios to reproduce it and update the thread with the progress..
Thanks,
Reka

On Tue, Apr 28, 2015 at 7:08 AM, Martin Eppel (meppel) <me...@cisco.com>> wrote:
I am testing a (nested grouping) scenario where a group defines a termination behavior “terminate-all”. When terminating the instance (of cartridge type c3), no new instance is restarted.
My understanding is that a new instance should be started up.

The scenario looks like this:

Group ~G1 has a cartridge member c1 and group member ~G2
Group ~G2 has a cartridge member c2 and group member ~G3
Group ~G3 has a cartridge member c3

Startup dependencies are: c1 depends on G2, c2 depends on G3

~G1 defines termination: none
~G2 defines termination: dependents
~G3 defines termination: all

After startup, when all instances are active, instance c3 is terminated which correctly also terminates also instance c2 (since it depends on G3 / c3) .
Issue 1:
However, no new instances for c3 is started up (consequently no new instance for c2 should be started up as well) (see log see log wso2carbon.log)

Only instance which remains running is c1.
Issue 2:
When subsequently c1 is manually being terminated, a new instance of c1 is started up (as opposed to Issue1) which I think is incorrect since it defines a startup dependency (c1 depends on G2) which is not fulfilled at the time (G2 should not be active since c2 is still terminated, see log wso2carbon-issue2.log, same log as wso2carbon.log but at a later time)

WDYT ?

Please find attached artifacts and logs

Thanks

Martin



--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007<tel:%2B94776442007>




--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007<tel:%2B94776442007>




--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007<tel:%2B94776442007>




--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007<tel:%2B94776442007>




--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007<tel:%2B94776442007>




--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007<tel:%2B94776442007>




--
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos



--
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos



--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007<tel:%2B94776442007>




--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007<tel:%2B94776442007>




--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007<tel:%2B94776442007>




--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007


Re: Testing Stratos 4.1 : nested grouping scenario with startup and termination issues (?)

Posted by Reka Thirunavukkarasu <re...@wso2.com>.
Hi Martin,

Yah.. This patch will work with the latest from master as of now..

Thanks,
Reka

On Tue, May 5, 2015 at 10:10 PM, Martin Eppel (meppel) <me...@cisco.com>
wrote:

>  Hi Reka,
>
>
>
> I will get the latest from master and add the patch on top, WDYT – will
> this work ?
>
>
>
> Thanks
>
>
>
> Martin
>
>
>
> *From:* Reka Thirunavukkarasu [mailto:reka@wso2.com]
> *Sent:* Tuesday, May 05, 2015 6:17 AM
> *To:* dev; Imesh Gunaratne; Lahiru Sandaruwan; Martin Eppel (meppel)
>
> *Subject:* Re: Testing Stratos 4.1 : nested grouping scenario with
> startup and termination issues (?)
>
>
>
> Hi Martin,
>
> I have added a possible fix by considering the threading model in
> ClusterMonitor. I have verified it locally and it is working fine. Can you
> apply this patch as attached herewith and continue testing the same
> scenario? If this fixes the issue and no regression, then i can push it to
> master.
>
> What i have fixed it that, made AutoscalerRuleEvaluator singleton and
> initializes the map only when creating the instance. Then i have removed
> the drool related variables from ClusterMonitor and added it to
> ClusterInstanceContext as there are threads getting spawned per
> ClusterInstanceContext in order to increase the performance. In that case,
> the drool related variables shouldn't be shared across
> ClusterInstanceContext. Then it can cause any conflict situation as i
> checked the code. So, i have added the drools related variables per
> ClusterInstanceContext. So that each thread can take their own local copy
> of variables from the stack and execute.
>
> @Imesh/Lahiru,
>
> Please correct me, if I'm wrong.
>
>
>
> FYI: The patch has created on top of
> 5c87d5de2ad15788f47907d89641c52dd3d21d53 this commit.
>
> Please let me know, if you face any issues with it.
>
> Thanks,
>
> Reka
>
>
>
> On Tue, May 5, 2015 at 5:51 PM, Reka Thirunavukkarasu <re...@wso2.com>
> wrote:
>
> Hi Martin,
>
> I will have to implement this solution in a thread safe manner as multiple
> cluster monitors are sharing the same resource. It will get impacted the
> Cluster monitor monitoring part as well. I'm still trying to figure out a
> solution for this issue. Will keep you updated with the progress..
>
> Thanks,
>
> Reka
>
>
>
> On Tue, May 5, 2015 at 5:19 PM, Reka Thirunavukkarasu <re...@wso2.com>
> wrote:
>
> Hi
>
> I suspect the issue is that we use static knowledgeBases map in the
> AutoscalerRuleEvaluator. But this is getting initialized by every cluster
> monitor. We need to fix this cluster monitor creation flow to use static
> knowledgeBases map and initialize only once  or properly sharing this map
> across multiple threads, since each cluster monitors are threads.
>
> Since drools file parsing can be done only once and used by all other
> monitors, i will work on a fix to make drool file parsing only once. Hope
> that fix would solve this issue.
>
> Thanks,
>
> Reka
>
>
>
>
>
> On Tue, May 5, 2015 at 4:44 PM, Reka Thirunavukkarasu <re...@wso2.com>
> wrote:
>
> Hi Martin/Imesh,
>
> Thanks Imesh for adding the exception handling in the monitor creation.
> That helped to narrow down the issue. It was a drool file parsed issue. I
> found below exception in both samples when creating those relevant
> monitors. We will have to identify why the drool parsing gave NPE. @Lahiru,
> Do you have any idea on this? In both samples, the cluster Monitors failed
> when parsing "obsoletecheck.drl".
>
> Since i couldn't figure out the root cause, i have added descriptive debug
> logs (in 5c87d5de2ad15788f47907d89641c52dd3d21d53) in order to isolate the
> issue. @Martin, Would you get a chance to test it and provide us the logs
> again with the same scenario, since I'm unable to reproduce this from my
> side?
>
>
> scenario_c1_c2_c3_c4_cartridges:
>
> TID: [0] [STRATOS] [2015-05-01 18:24:22,591] DEBUG
> {org.apache.stratos.autoscaler.rule.AutoscalerRuleEvaluator} -  Drools file
> is parsed successfully: obsoletecheck.drl
> TID: [0] [STRATOS] [2015-05-01 18:24:22,594] ERROR
> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
> An error occurred while starting monitor: [type] cluster [component]
> sub-G1-G2-G3-1-Id.c3-1x1.c3.domain
> java.lang.NullPointerException
>     at
> org.apache.stratos.autoscaler.rule.AutoscalerRuleEvaluator.getStatefulSession(AutoscalerRuleEvaluator.java:76)
>     at
> org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor.<init>(ClusterMonitor.java:137)
>     at
> org.apache.stratos.autoscaler.monitor.MonitorFactory.getClusterMonitor(MonitorFactory.java:302)
>     at
> org.apache.stratos.autoscaler.monitor.MonitorFactory.getMonitor(MonitorFactory.java:83)
>     at
> org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor$MonitorAdder.run(ParentComponentMonitor.java:844)
>     at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
>
> scenario_c1_c2_cartridges:
>
> TID: [0] [STRATOS] [2015-05-01 17:58:50,824] DEBUG
> {org.apache.stratos.autoscaler.rule.AutoscalerRuleEvaluator} -  Drools file
> is parsed successfully: dependent-scaling.drl
> TID: [0] [STRATOS] [2015-05-01 17:58:50,825] ERROR
> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
> An error occurred while starting monitor: [type] cluster [component]
> subscription-G1-G2-G3-Id.c1-1x1.c1.domain
> java.lang.NullPointerException
>     at
> org.apache.stratos.autoscaler.rule.AutoscalerRuleEvaluator.getStatefulSession(AutoscalerRuleEvaluator.java:76)
>     at
> org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor.<init>(ClusterMonitor.java:135)
>     at
> org.apache.stratos.autoscaler.monitor.MonitorFactory.getClusterMonitor(MonitorFactory.java:302)
>     at
> org.apache.stratos.autoscaler.monitor.MonitorFactory.getMonitor(MonitorFactory.java:83)
>     at
> org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor$MonitorAdder.run(ParentComponentMonitor.java:844)
>     at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> TID: [0] [STRATOS] [2015-05-01 17:58:50,825] DEBUG
> {org.apache.stratos.autoscaler.monitor.
>
> @Martin, However, there seems to be separate locking issue. That is not
> related to this. For now, that locking issue seems to be harmless. Can we
> track it in a jira?
>
>
>
> Thanks,
>
> Reka
>
>
>
>
>
>
>
> On Sat, May 2, 2015 at 12:28 AM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> Hi Imesh, Reka
>
>
>
> As request, please see attached artifacts and logs (with debug enabled) to
> test for the deadlock – stratos is running the latest from master, latest
> commit :
>
> commit ae89ba09491891512a9bc89e080577c565ebe8b7
>
> Author: reka <rt...@gmail.com>
>
> Date:   Fri May 1 12:30:55 2015 +0530
>
>
>
> I run 2 similar but slightly scenarios, see [1.], [2.]
>
>
>
> Java startup with lock monitor enabled:
>
>
>
> /opt/java/bin/java -Xbootclasspath/a: -Xms256m -Xmx2048m
> -XX:MaxPermSize=256m -server -XX:+HeapDumpOnOutOfMemoryE
>
> rror
> -XX:HeapDumpPath=/opt/wso2/apache-stratos/repository/logs/heap-dump.hprof
> -Dcom.sun.management.jmxremote -classpath
> /opt/java/lib/tools.jar:/opt/wso2/apache-stratos/bin/org.wso2.carbon.bootstrap-4.2.0.jar:/opt/wso2/apache-stratos/bin/tcpmon-1.0.jar:/opt/wso2/apache-stratos/bin/tomcat-juli-7.0.34.jar:/opt/wso2/apache
>
> -stratos/lib/commons-lang-2.6.0.wso2v1.jar
> -Djava.endorsed.dirs=/opt/wso2/apache-stratos/lib/endorsed:/opt/java/jre/lib/endorsed:/opt/java/lib/endorsed
> -Djava.io
>
> .tmpdir=/opt/wso2/apache-stratos/tmp
> -Dcatalina.base=/opt/wso2/apache-stratos/lib/tomcat
> -Dwso2.server.standalone=true -Dcarbon.registry.root=/ -Djava.command=/o
>
> pt/java/bin/java -Dcarbon.home=/opt/wso2/apache-stratos
> -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
> -Dcarbon.config.dir.path=/opt/wso2/apac
>
> he-stratos/repository/conf
> -Djava.util.logging.config.file=/opt/wso2/apache-stratos/repository/conf/etc/logging-bridge.properties
> -Dcomponents.repo=/opt/wso2/apache-stratos/repository/components/plugins
> -Dconf.location=/opt/wso2/apache-stratos/repository/conf
> -Dcom.atomikos.icatch.file=/opt/wso2/apache-stratos/lib/transactions.properties
> -Dcom.atomikos.icatch.hide_init_file_path=true
> -Dorg.apache.jasper.runtime.BodyContentImpl.LIMIT_BUFFER=true
> -Dcom.sun.jndi.ldap.connect.pool.a
>
> uthentication=simple -Dcom.sun.jndi.ldap.connect.pool.timeout=3000
> -Dorg.terracotta.quartz.skipUpdateCheck=true
> -Djava.security.egd=file:/dev/./urandom -Dfile.encoding=UTF8
> -Ddisable.cassandra.server.startup=true
> -Djndi.properties.dir=/opt/wso2/apache-stratos/repository/conf
> -Dthrift.client.config.file.path=/opt/wso2/apache-stratos/repository/conf/thrift-client-config.xml
> -DMETADATA_CLIENT_CONFIG_FILE=/opt/wso2/apache-stratos/repository/conf/metadataservice.xml
> -Dread.write.lock.monitor.enabled=true
> org.wso2.carbon.bootstrap.Bootstrap -Dprofile=default
>
>
>
>
>
> [1.] scenario_c1_c2_cartridges
>
>
>
>
>
>
>
> [1b.] exception
>
>
>
> org.apache.stratos.common.exception.LockNotReleasedException
>
>         at org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor.checkTimeout(ReadWriteLockMonitor.java:72)
>         at org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor.run(ReadWriteLockMonitor.java:55)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
>
>
>
>
>
> [2.] scenario_c1_c2_c3_c4_cartridges
>
>
>
>
>
>
>
>
>
>
>
> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
> *Sent:* Thursday, April 30, 2015 10:10 PM
>
>
> *To:* dev
> *Subject:* Re: Testing Stratos 4.1 : nested grouping scenario with
> startup and termination issues (?)
>
>
>
> In addition we have not added a try catch block in MonitorAdder.run()
> method to cover its full scope. Therefore if an exception is raised in the
> middle the above problem also can cause.
>
>
>
> I have now fixed this in commit revision:
>
> 9ec061f44a3189ccd8b509ef4da980687dfbcf62
>
>
>
> Martin: Appreciate if you could take this fix and retest.
>
>
>
> Thanks
>
>
>
> On Fri, May 1, 2015 at 10:32 AM, Imesh Gunaratne <im...@apache.org> wrote:
>
> Hi Reka,
>
>
>
> It looks like the MonitorAdder.run() has executed properly, that's why we
> see the following log:
>
>
>
> TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO
> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
> Starting monitor: [type] cluster [component]
> sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
>
>
>
> However the thread has not come to its last line:
>
> log.info(String.format("Monitor started successfully: [type] %s [component] %s [dependents] %s " +
>                 "[startup-time] %d seconds", monitorTypeStr, context.getId(),
>
>
>
> As we discussed offline this may have caused by a deadlock while trying to
> get the following topology lock:
>
>
>
> public static ClusterMonitor getClusterMonitor(ParentComponentMonitor parentMonitor,
>                                                ClusterChildContext context,
>                                                List<String> parentInstanceIds)
>     ...
>     //acquire read lock for the service and cluster
>     TopologyManager.acquireReadLockForCluster(serviceName, clusterId);
>
>
>
> Martin: Will you be able to do another test run by enabling deadlock
> detection logic. You could set the following system property to true in the
> stratos.sh file to do this:
>
> read.write.lock.monitor.enabled=true
>
>  Thanks
>
>
>
>
>
> On Fri, May 1, 2015 at 7:40 AM, Reka Thirunavukkarasu <re...@wso2.com>
> wrote:
>
> Hi Martin,
>
>
>
> Thanks Martin for the detailed information in order to analyze the issue.
> It helped to isolate the issue.
>
> As i went through the logs, it seems that some thread issue. I could see
> below log for c4-1x1 and c3-1x1. In that case c3 and c4 got scheduled to be
> start a relevant clusterMonitor. After that only c3 got successfully
> started with ClusterMonitor not c4. So the scheduler of c4 didn't actually
> start a thread for the MonitorAdder to create the ClusterMonitor.
>
> TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO
> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
> Starting dependent monitor: [application] sub-G1-G2-G3-1-G4 [component]
> sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
> TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO
> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
> Monitor scheduled: [type] cluster [component] sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
>
> TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO
> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
> Starting monitor: [type] cluster [component]
> sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
> TID: [0] [STRATOS] [2015-04-30 16:48:57,713]  INFO
> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
> Starting dependent monitor: [application] sub-G1-G2-G3-1-G4 [component]
> sub-G1-G2-G3-1-G4.c3-1x1.c3.domain
> TID: [0] [STRATOS] [2015-04-30 16:48:57,713]  INFO
> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
> Monitor scheduled: [type] cluster [component] sub-G1-G2-G3-1-G4.c3-1x1.c3.domain
>
> TID: [0] [STRATOS] [2015-04-30 16:48:57,713]  INFO
> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
> Starting monitor: [type] cluster [component]
> sub-G1-G2-G3-1-G4.c3-1x1.c3.domain
>
> Found below log for c3 which indicates that c3 monitor got started
> successfully. But there is no such log for c4.
>
> TID: [0] [STRATOS] [2015-04-30 16:49:00,760]  INFO
> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
> Monitor started successfully: [type] cluster [component]
> sub-G1-G2-G3-1-G4.c3-1x1.c3.domain [dependents] none [startup-time] 3
> seconds
>
> @Gayan/Imesh, Do you have any input here? Will increasing the threadpool
> solve this issue? Or is it related to something else?
>
> Thanks,
>
> Reka
>
>
>
>
>
>
>
> On Thu, Apr 30, 2015 at 10:54 PM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> Hi Reka,
>
>
>
> Re-run the scenario, making sure the application alias and group alias are
> as suggested and debug logs are turned on (see config below)
>
>
>
> log4j.logger.org.apache.stratos.manager=DEBUG
>
> log4j.logger.org.apache.stratos.autoscaler=DEBUG
>
> log4j.logger.org.apache.stratos.messaging=INFO
>
> log4j.logger.org.apache.stratos.cloud.controller=DEBUG
>
> log4j.logger.org.wso2.andes.client=ERROR
>
>
>
> This is the scenario:
>
>
>
> 1.      deployed application – see screenshot A. , debug logs
> wso2carbon-debug.log
> only 3 instances spin up
>
> 2.      removed application
>
> 3.      re-deployed application – see screenshot B. , debug logs
> wso2carbon-debug-2.log
> (after line “TID: [0] [STRATOS] [2015-04-30 17:05:23,837] DEBUG
> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Read lock
> released”
> 2nd time the application gets deployed all instances spin up and go active
>
>
>
>
>
> Please see attached artifacts and logs.
>
>
>
> A.     Application Status after deploying the application first time
> after stratos start up:
>
>
>
>
>
>
>
>
>
> B.     Application Status after re-deploying the application
>
> (see log wso2carbon-debug-2.log after “TID: [0] [STRATOS] [2015-04-30
> 17:05:23,837] DEBUG
> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Read lock
> released”:
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> *From:* Reka Thirunavukkarasu [mailto:reka@wso2.com]
> *Sent:* Thursday, April 30, 2015 1:40 AM
>
>
> *To:* dev
> *Subject:* Re: Testing Stratos 4.1 : nested grouping scenario with
> startup and termination issues (?)
>
>
>
> If you get this issue continuously, can you please share the logs against
> master as we have improved some logs in the master yesterday?
>
> Thanks,
>
> Reka
>
>
>
> On Thu, Apr 30, 2015 at 2:08 PM, Reka Thirunavukkarasu <re...@wso2.com>
> wrote:
>
> Hi Martin,
>
> I have deployed the attached samples as earlier in openstack with latest
> master. All the clusters got created with the members. Please see the
> attached diagram. I'm unable to proceed further as my puppet configuration
> has to be corrected to make the member active. Thought of sharing this as
> all the clusters have members.
>
> Thanks,
>
> Reka
>
>
>
> On Thu, Apr 30, 2015 at 10:25 AM, Reka Thirunavukkarasu <re...@wso2.com>
> wrote:
>
> HI Martin,
>
> Can you please confirm whether you are using unique applicationId and
> group alias? I can see from the UI, the applicationID and next group alias
> are same value as sub-G1-G2-G3-1..
>
> Thanks,
>
> Reka
>
>
>
>
>
> On Thu, Apr 30, 2015 at 10:16 AM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> Hi Reka,
>
>
>
> I have upgraded from beta to the latest stratos code on master and
> retested the scenario from jira STRATOS-1345 but still see the same issue
> (on open stack)
>
>
>
> Thanks
>
>
>
> Martin
>
>
>
>
>
> *From:* Martin Eppel (meppel)
> *Sent:* Wednesday, April 29, 2015 2:54 PM
> *To:* dev@stratos.apache.org
> *Subject:* RE: Testing Stratos 4.1 : nested grouping scenario with
> startup and termination issues (?)
>
>
>
> Hi Reka,
>
>
>
> I will upgrade my system to the latest master and re-test,
>
>
>
> Regards
>
>
>
> Martin
>
>
>
> *From:* Reka Thirunavukkarasu [mailto:reka@wso2.com <re...@wso2.com>]
> *Sent:* Wednesday, April 29, 2015 11:55 AM
> *To:* dev
> *Subject:* Re: Testing Stratos 4.1 : nested grouping scenario with
> startup and termination issues (?)
>
>
>
> Hi Martin,
>
> While i was working on Application update, i fixed few issues with the
> termination behavior. Anyway there seems to be small issues in the logic
> which has to be fixed. I have started to verify this in my local setup. Can
> you create a jira? So that we can track it. I will update the progress in
> the jira..
>
> Thanks,
>
> Reka
>
>
>
> On Tue, Apr 28, 2015 at 10:11 PM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> Hi Reka,
>
>
>
> Thanks for following up - let me know if I should open a JIRA,
>
>
>
> Thanks
>
>
>
> Martin
>
>
>
> *From:* Reka Thirunavukkarasu [mailto:reka@wso2.com]
> *Sent:* Tuesday, April 28, 2015 5:37 AM
> *To:* dev
> *Subject:* Re: Testing Stratos 4.1 : nested grouping scenario with
> startup and termination issues (?)
>
>
>
> Hi Martin,
>
> Thanks for bringing this up. I have fixed some issue in the flow while
> testing application update support with instances count. I will go through
> your scenarios to reproduce it and update the thread with the progress..
>
> Thanks,
>
> Reka
>
>
>
> On Tue, Apr 28, 2015 at 7:08 AM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> I am testing a (nested grouping) scenario where a group defines a
> termination behavior “terminate-all”. When terminating the instance (of
> cartridge type c3), no new instance is restarted.
>
> My understanding is that a new instance should be started up.
>
>
>
> The scenario looks like this:
>
>
>
> Group ~G1 has a cartridge member c1 and group member ~G2
>
> Group ~G2 has a cartridge member c2 and group member ~G3
>
> Group ~G3 has a cartridge member c3
>
>
>
> Startup dependencies are: c1 depends on G2, c2 depends on G3
>
>
>
> ~G1 defines termination: none
>
> ~G2 defines termination: dependents
>
> ~G3 defines termination: all
>
>
>
> After startup, when all instances are active, instance c3 is terminated
> which correctly also terminates also instance c2 (since it depends on G3 /
> c3) .
>
> *Issue 1:*
>
> However, no new instances for c3 is started up (consequently no new
> instance for c2 should be started up as well) (see log see log
> wso2carbon.log)
>
>
>
> Only instance which remains running is c1.
>
> *Issue 2:*
>
> When subsequently c1 is manually being terminated, a new instance of c1 is
> started up (as opposed to Issue1) which I think is incorrect since it
> defines a startup dependency (c1 depends on G2) which is not fulfilled at
> the time (G2 should not be active since c2 is still terminated, see log
> wso2carbon-issue2.log, same log as wso2carbon.log but at a later time)
>
>
>
> WDYT ?
>
>
>
> Please find attached artifacts and logs
>
>
>
> Thanks
>
>
>
> Martin
>
>
>
>
> --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>
>
>
>
> --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>
>
>
>
> --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>
>
>
>
> --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>
>
>
>
> --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>
>
>
>
> --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>
>
>
>
>
> --
>
> Imesh Gunaratne
>
>
>
> Senior Technical Lead, WSO2
>
> Committer & PMC Member, Apache Stratos
>
>
>
>
>
> --
>
> Imesh Gunaratne
>
>
>
> Senior Technical Lead, WSO2
>
> Committer & PMC Member, Apache Stratos
>
>
>
>
> --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>
>
>
>
> --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>
>
>
>
> --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>
>
>
>
> --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>



-- 
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007

RE: Testing Stratos 4.1 : nested grouping scenario with startup and termination issues (?)

Posted by "Martin Eppel (meppel)" <me...@cisco.com>.
Hi Reka,

I will get the latest from master and add the patch on top, WDYT – will this work?

Thanks

Martin

From: Reka Thirunavukkarasu [mailto:reka@wso2.com]
Sent: Tuesday, May 05, 2015 6:17 AM
To: dev; Imesh Gunaratne; Lahiru Sandaruwan; Martin Eppel (meppel)
Subject: Re: Testing Stratos 4.1 : nested grouping scenario with startup and termination issues (?)

Hi Martin,
I have added a possible fix by considering the threading model in ClusterMonitor. I have verified it locally and it is working fine. Can you apply this patch as attached herewith and continue testing the same scenario? If this fixes the issue and no regression, then i can push it to master.
What i have fixed it that, made AutoscalerRuleEvaluator singleton and initializes the map only when creating the instance. Then i have removed the drool related variables from ClusterMonitor and added it to ClusterInstanceContext as there are threads getting spawned per ClusterInstanceContext in order to increase the performance. In that case, the drool related variables shouldn't be shared across ClusterInstanceContext. Then it can cause any conflict situation as i checked the code. So, i have added the drools related variables per ClusterInstanceContext. So that each thread can take their own local copy of variables from the stack and execute.
@Imesh/Lahiru,
Please correct me, if I'm wrong.

FYI: The patch has created on top of 5c87d5de2ad15788f47907d89641c52dd3d21d53 this commit.
Please let me know, if you face any issues with it.
Thanks,
Reka

On Tue, May 5, 2015 at 5:51 PM, Reka Thirunavukkarasu <re...@wso2.com>> wrote:
Hi Martin,
I will have to implement this solution in a thread safe manner as multiple cluster monitors are sharing the same resource. It will get impacted the Cluster monitor monitoring part as well. I'm still trying to figure out a solution for this issue. Will keep you updated with the progress..
Thanks,
Reka

On Tue, May 5, 2015 at 5:19 PM, Reka Thirunavukkarasu <re...@wso2.com>> wrote:
Hi
I suspect the issue is that we use static knowledgeBases map in the AutoscalerRuleEvaluator. But this is getting initialized by every cluster monitor. We need to fix this cluster monitor creation flow to use static knowledgeBases map and initialize only once  or properly sharing this map across multiple threads, since each cluster monitors are threads.
Since drools file parsing can be done only once and used by all other monitors, i will work on a fix to make drool file parsing only once. Hope that fix would solve this issue.
Thanks,
Reka


On Tue, May 5, 2015 at 4:44 PM, Reka Thirunavukkarasu <re...@wso2.com>> wrote:
Hi Martin/Imesh,
Thanks Imesh for adding the exception handling in the monitor creation. That helped to narrow down the issue. It was a drool file parsed issue. I found below exception in both samples when creating those relevant monitors. We will have to identify why the drool parsing gave NPE. @Lahiru, Do you have any idea on this? In both samples, the cluster Monitors failed when parsing "obsoletecheck.drl".
Since i couldn't figure out the root cause, i have added descriptive debug logs (in 5c87d5de2ad15788f47907d89641c52dd3d21d53) in order to isolate the issue. @Martin, Would you get a chance to test it and provide us the logs again with the same scenario, since I'm unable to reproduce this from my side?

scenario_c1_c2_c3_c4_cartridges:

TID: [0] [STRATOS] [2015-05-01 18:24:22,591] DEBUG {org.apache.stratos.autoscaler.rule.AutoscalerRuleEvaluator} -  Drools file is parsed successfully: obsoletecheck.drl
TID: [0] [STRATOS] [2015-05-01 18:24:22,594] ERROR {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -  An error occurred while starting monitor: [type] cluster [component] sub-G1-G2-G3-1-Id.c3-1x1.c3.domain
java.lang.NullPointerException
    at org.apache.stratos.autoscaler.rule.AutoscalerRuleEvaluator.getStatefulSession(AutoscalerRuleEvaluator.java:76)
    at org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor.<init>(ClusterMonitor.java:137)
    at org.apache.stratos.autoscaler.monitor.MonitorFactory.getClusterMonitor(MonitorFactory.java:302)
    at org.apache.stratos.autoscaler.monitor.MonitorFactory.getMonitor(MonitorFactory.java:83)
    at org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor$MonitorAdder.run(ParentComponentMonitor.java:844)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

scenario_c1_c2_cartridges:

TID: [0] [STRATOS] [2015-05-01 17:58:50,824] DEBUG {org.apache.stratos.autoscaler.rule.AutoscalerRuleEvaluator} -  Drools file is parsed successfully: dependent-scaling.drl
TID: [0] [STRATOS] [2015-05-01 17:58:50,825] ERROR {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -  An error occurred while starting monitor: [type] cluster [component] subscription-G1-G2-G3-Id.c1-1x1.c1.domain
java.lang.NullPointerException
    at org.apache.stratos.autoscaler.rule.AutoscalerRuleEvaluator.getStatefulSession(AutoscalerRuleEvaluator.java:76)
    at org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor.<init>(ClusterMonitor.java:135)
    at org.apache.stratos.autoscaler.monitor.MonitorFactory.getClusterMonitor(MonitorFactory.java:302)
    at org.apache.stratos.autoscaler.monitor.MonitorFactory.getMonitor(MonitorFactory.java:83)
    at org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor$MonitorAdder.run(ParentComponentMonitor.java:844)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
TID: [0] [STRATOS] [2015-05-01 17:58:50,825] DEBUG {org.apache.stratos.autoscaler.monitor.
@Martin, However, there seems to be separate locking issue. That is not related to this. For now, that locking issue seems to be harmless. Can we track it in a jira?

Thanks,
Reka



On Sat, May 2, 2015 at 12:28 AM, Martin Eppel (meppel) <me...@cisco.com>> wrote:
Hi Imesh, Reka

As request, please see attached artifacts and logs (with debug enabled) to test for the deadlock – stratos is running the latest from master, latest commit :
commit ae89ba09491891512a9bc89e080577c565ebe8b7
Author: reka <rt...@gmail.com>>
Date:   Fri May 1 12:30:55 2015 +0530

I run 2 similar but slightly scenarios, see [1.], [2.]

Java startup with lock monitor enabled:

/opt/java/bin/java -Xbootclasspath/a: -Xms256m -Xmx2048m -XX:MaxPermSize=256m -server -XX:+HeapDumpOnOutOfMemoryE
rror -XX:HeapDumpPath=/opt/wso2/apache-stratos/repository/logs/heap-dump.hprof -Dcom.sun.management.jmxremote -classpath /opt/java/lib/tools.jar:/opt/wso2/apache-stratos/bin/org.wso2.carbon.bootstrap-4.2.0.jar:/opt/wso2/apache-stratos/bin/tcpmon-1.0.jar:/opt/wso2/apache-stratos/bin/tomcat-juli-7.0.34.jar:/opt/wso2/apache
-stratos/lib/commons-lang-2.6.0.wso2v1.jar -Djava.endorsed.dirs=/opt/wso2/apache-stratos/lib/endorsed:/opt/java/jre/lib/endorsed:/opt/java/lib/endorsed -Djava.io
.tmpdir=/opt/wso2/apache-stratos/tmp -Dcatalina.base=/opt/wso2/apache-stratos/lib/tomcat -Dwso2.server.standalone=true -Dcarbon.registry.root=/ -Djava.command=/o
pt/java/bin/java -Dcarbon.home=/opt/wso2/apache-stratos -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Dcarbon.config.dir.path=/opt/wso2/apac
he-stratos/repository/conf -Djava.util.logging.config.file=/opt/wso2/apache-stratos/repository/conf/etc/logging-bridge.properties -Dcomponents.repo=/opt/wso2/apache-stratos/repository/components/plugins -Dconf.location=/opt/wso2/apache-stratos/repository/conf -Dcom.atomikos.icatch.file=/opt/wso2/apache-stratos/lib/transactions.properties -Dcom.atomikos.icatch.hide_init_file_path=true -Dorg.apache.jasper.runtime.BodyContentImpl.LIMIT_BUFFER=true -Dcom.sun.jndi.ldap.connect.pool.a
uthentication=simple -Dcom.sun.jndi.ldap.connect.pool.timeout=3000 -Dorg.terracotta.quartz.skipUpdateCheck=true -Djava.security.egd=file:/dev/./urandom -Dfile.encoding=UTF8 -Ddisable.cassandra.server.startup=true -Djndi.properties.dir=/opt/wso2/apache-stratos/repository/conf -Dthrift.client.config.file.path=/opt/wso2/apache-stratos/repository/conf/thrift-client-config.xml -DMETADATA_CLIENT_CONFIG_FILE=/opt/wso2/apache-stratos/repository/conf/metadataservice.xml -Dread.write.lock.monitor.enabled=true org.wso2.carbon.bootstrap.Bootstrap -Dprofile=default


[1.] scenario_c1_c2_cartridges



[1b.] exception

org.apache.stratos.common.exception.LockNotReleasedException
        at org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor.checkTimeout(ReadWriteLockMonitor.java:72)
        at org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor.run(ReadWriteLockMonitor.java:55)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)


[2.] scenario_c1_c2_c3_c4_cartridges





From: Imesh Gunaratne [mailto:imesh@apache.org<ma...@apache.org>]
Sent: Thursday, April 30, 2015 10:10 PM

To: dev
Subject: Re: Testing Stratos 4.1 : nested grouping scenario with startup and termination issues (?)

In addition, we have not added a try/catch block in the MonitorAdder.run() method to cover its full scope. Therefore, if an exception is raised in the middle, it can also cause the above problem.
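
(For illustration only – this is not Stratos code, just a self-contained example of why such a failure is invisible: a task submitted to a ScheduledExecutorService that throws from an unguarded run() has its exception captured in the task's Future and never logged, so we would see "Starting monitor" but never "Monitor started successfully".)

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SwallowedExceptionDemo {
    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);

        // Unguarded task: the NullPointerException is captured inside the
        // task's Future and never printed, so the task simply stops silently.
        scheduler.schedule(new Runnable() {
            public void run() {
                System.out.println("Starting monitor: c4");
                Object session = null;
                session.toString(); // throws NullPointerException
                System.out.println("Monitor started successfully: c4");
            }
        }, 0, TimeUnit.SECONDS);

        // Guarded task: the same failure is caught and at least logged.
        scheduler.schedule(new Runnable() {
            public void run() {
                try {
                    System.out.println("Starting monitor: c3");
                    Object session = null;
                    session.toString();
                    System.out.println("Monitor started successfully: c3");
                } catch (Exception e) {
                    System.err.println("An error occurred while starting monitor: " + e);
                }
            }
        }, 0, TimeUnit.SECONDS);

        scheduler.shutdown();
        scheduler.awaitTermination(5, TimeUnit.SECONDS);
    }
}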

I have now fixed this in commit revision:
9ec061f44a3189ccd8b509ef4da980687dfbcf62

Martin: Appreciate if you could take this fix and retest.

Thanks

On Fri, May 1, 2015 at 10:32 AM, Imesh Gunaratne <im...@apache.org>> wrote:
Hi Reka,

It looks like the MonitorAdder.run() has executed properly, that's why we see the following log:

TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -  Starting monitor: [type] cluster [component] sub-G1-G2-G3-1-G4.c4-1x1.c4.domain

However the thread has not come to its last line:

log.info(String.format("Monitor started successfully: [type] %s [component] %s [dependents] %s " +
                "[startup-time] %d seconds", monitorTypeStr, context.getId(),

As we discussed offline, this may have been caused by a deadlock while trying to acquire the following topology lock:


public static ClusterMonitor getClusterMonitor(ParentComponentMonitor parentMonitor,
                                               ClusterChildContext context,
                                               List<String> parentInstanceIds)
    ...
    //acquire read lock for the service and cluster
    TopologyManager.acquireReadLockForCluster(serviceName, clusterId);

Martin: Will you be able to do another test run with the deadlock detection logic enabled? You can set the following system property to true in the stratos.sh file to do this:

read.write.lock.monitor.enabled=true
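
For example, it can be passed as one more -D entry among the JVM arguments in the java command that stratos.sh assembles (a sketch – exactly where it is appended depends on the pack):

-Dread.write.lock.monitor.enabled=true \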
Thanks


On Fri, May 1, 2015 at 7:40 AM, Reka Thirunavukkarasu <re...@wso2.com>> wrote:
Hi Martin,

Thanks Martin for the detailed information in order to analyze the issue. It helped to isolate the issue.
As i went through the logs, it seems that some thread issue. I could see below log for c4-1x1 and c3-1x1. In that case c3 and c4 got scheduled to be start a relevant clusterMonitor. After that only c3 got successfully started with ClusterMonitor not c4. So the scheduler of c4 didn't actually start a thread for the MonitorAdder to create the ClusterMonitor.

TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -  Starting dependent monitor: [application] sub-G1-G2-G3-1-G4 [component] sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -  Monitor scheduled: [type] cluster [component] sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -  Starting monitor: [type] cluster [component] sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
TID: [0] [STRATOS] [2015-04-30 16:48:57,713]  INFO {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -  Starting dependent monitor: [application] sub-G1-G2-G3-1-G4 [component] sub-G1-G2-G3-1-G4.c3-1x1.c3.domain
TID: [0] [STRATOS] [2015-04-30 16:48:57,713]  INFO {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -  Monitor scheduled: [type] cluster [component] sub-G1-G2-G3-1-G4.c3-1x1.c3.domain
TID: [0] [STRATOS] [2015-04-30 16:48:57,713]  INFO {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -  Starting monitor: [type] cluster [component] sub-G1-G2-G3-1-G4.c3-1x1.c3.domain

Found below log for c3 which indicates that c3 monitor got started successfully. But there is no such log for c4.
TID: [0] [STRATOS] [2015-04-30 16:49:00,760]  INFO {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -  Monitor started successfully: [type] cluster [component] sub-G1-G2-G3-1-G4.c3-1x1.c3.domain [dependents] none [startup-time] 3 seconds
@Gayan/Imesh, Do you have any input here? Will increasing the threadpool solve this issue? Or is it related to something else?
Thanks,
Reka



On Thu, Apr 30, 2015 at 10:54 PM, Martin Eppel (meppel) <me...@cisco.com>> wrote:
Hi Reka,

Re-run the scenario, making sure the application alias and group alias are as suggested and debug logs are turned on (see config below)

log4j.logger.org.apache.stratos.manager=DEBUG
log4j.logger.org.apache.stratos.autoscaler=DEBUG
log4j.logger.org.apache.stratos.messaging=INFO
log4j.logger.org.apache.stratos.cloud.controller=DEBUG
log4j.logger.org.wso2.andes.client=ERROR

This is the scenario:


1.      deployed application – see screenshot A. , debug logs wso2carbon-debug.log
only 3 instances spin up

2.      removed application

3.      re-deployed application – see screenshot B. , debug logs wso2carbon-debug-2.log
(after line “TID: [0] [STRATOS] [2015-04-30 17:05:23,837] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Read lock released”
2nd time the application gets deployed all instances spin up and go active


Please see attached artifacts and logs.


A.     Application Status after deploying the application first time after stratos start up:

[inline screenshot A: application status after deploying the application the first time]




B.     Application Status after re-deploying the application
(see log wso2carbon-debug-2.log after “TID: [0] [STRATOS] [2015-04-30 17:05:23,837] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Read lock released”:

[inline screenshot B: application status after re-deploying the application]








From: Reka Thirunavukkarasu [mailto:reka@wso2.com<ma...@wso2.com>]
Sent: Thursday, April 30, 2015 1:40 AM

To: dev
Subject: Re: Testing Stratos 4.1 : nested grouping scenario with startup and termination issues (?)

If you get this issue continuously, can you please share the logs against master as we have improved some logs in the master yesterday?
Thanks,
Reka

On Thu, Apr 30, 2015 at 2:08 PM, Reka Thirunavukkarasu <re...@wso2.com>> wrote:
Hi Martin,
I have deployed the attached samples as earlier in openstack with latest master. All the clusters got created with the members. Please see the attached diagram. I'm unable to proceed further as my puppet configuration has to be corrected to make the member active. Thought of sharing this as all the clusters have members.
Thanks,
Reka

On Thu, Apr 30, 2015 at 10:25 AM, Reka Thirunavukkarasu <re...@wso2.com>> wrote:
HI Martin,
Can you please confirm whether you are using unique applicationId and group alias? I can see from the UI, the applicationID and next group alias are same value as sub-G1-G2-G3-1..
Thanks,
Reka


On Thu, Apr 30, 2015 at 10:16 AM, Martin Eppel (meppel) <me...@cisco.com>> wrote:
Hi Reka,

I have upgraded from beta to the latest stratos code on master and retested the scenario from jira STRATOS-1345 but still see the same issue (on open stack)

Thanks

Martin

[inline screenshot: application status after retesting the STRATOS-1345 scenario]

From: Martin Eppel (meppel)
Sent: Wednesday, April 29, 2015 2:54 PM
To: dev@stratos.apache.org<ma...@stratos.apache.org>
Subject: RE: Testing Stratos 4.1 : nested grouping scenario with startup and termination issues (?)

Hi Reka,

I will upgrade my system to the latest master and re-test,

Regards

Martin

From: Reka Thirunavukkarasu [mailto:reka@wso2.com]
Sent: Wednesday, April 29, 2015 11:55 AM
To: dev
Subject: Re: Testing Stratos 4.1 : nested grouping scenario with startup and termination issues (?)

Hi Martin,
While i was working on Application update, i fixed few issues with the termination behavior. Anyway there seems to be small issues in the logic which has to be fixed. I have started to verify this in my local setup. Can you create a jira? So that we can track it. I will update the progress in the jira..
Thanks,
Reka

On Tue, Apr 28, 2015 at 10:11 PM, Martin Eppel (meppel) <me...@cisco.com>> wrote:
Hi Reka,

Thanks for following up - let me know if I should open a JIRA,

Thanks

Martin

From: Reka Thirunavukkarasu [mailto:reka@wso2.com<ma...@wso2.com>]
Sent: Tuesday, April 28, 2015 5:37 AM
To: dev
Subject: Re: Testing Stratos 4.1 : nested grouping scenario with startup and termination issues (?)

Hi Martin,
Thanks for bringing this up. I have fixed some issue in the flow while testing application update support with instances count. I will go through your scenarios to reproduce it and update the thread with the progress..
Thanks,
Reka

On Tue, Apr 28, 2015 at 7:08 AM, Martin Eppel (meppel) <me...@cisco.com>> wrote:
I am testing a (nested grouping) scenario where a group defines a termination behavior “terminate-all”. When terminating the instance (of cartridge type c3), no new instance is restarted.
My understanding is that a new instance should be started up.

The scenario looks like this:

Group ~G1 has a cartridge member c1 and group member ~G2
Group ~G2 has a cartridge member c2 and group member ~G3
Group ~G3 has a cartridge member c3

Startup dependencies are: c1 depends on G2, c2 depends on G3

~G1 defines termination: none
~G2 defines termination: dependents
~G3 defines termination: all

After startup, when all instances are active, instance c3 is terminated, which correctly also terminates instance c2 (since it depends on G3 / c3).
Issue 1:
However, no new instance for c3 is started up (consequently no new instance for c2 should be started up as well) (see log wso2carbon.log)

Only instance which remains running is c1.
Issue 2:
When subsequently c1 is manually being terminated, a new instance of c1 is started up (as opposed to Issue1) which I think is incorrect since it defines a startup dependency (c1 depends on G2) which is not fulfilled at the time (G2 should not be active since c2 is still terminated, see log wso2carbon-issue2.log, same log as wso2carbon.log but at a later time)

WDYT ?

Please find attached artifacts and logs

Thanks

Martin



--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007<tel:%2B94776442007>




--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007<tel:%2B94776442007>




--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007<tel:%2B94776442007>




--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007<tel:%2B94776442007>




--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007<tel:%2B94776442007>




--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007<tel:%2B94776442007>




--
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos



--
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos



--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007<tel:%2B94776442007>




--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007<tel:%2B94776442007>




--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007<tel:%2B94776442007>




--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007


Re: Testing Stratos 4.1 : nested grouping scenario with startup and termination issues (?)

Posted by Reka Thirunavukkarasu <re...@wso2.com>.
Hi Martin,

I have added a possible fix after considering the threading model in
ClusterMonitor. I have verified it locally and it is working fine. Can you
apply the attached patch and continue testing the same scenario? If it
fixes the issue with no regressions, then I can push it to master.

What I have fixed is this: I made AutoscalerRuleEvaluator a singleton which
initializes the knowledge-base map only once, when the instance is created.
Then I removed the drools-related variables from ClusterMonitor and added
them to ClusterInstanceContext, since threads are spawned per
ClusterInstanceContext in order to increase performance. In that case the
drools-related variables shouldn't be shared across ClusterInstanceContexts,
because sharing them can cause conflict situations, as far as I checked the
code. So I have added the drools-related variables per ClusterInstanceContext,
so that each thread takes its own local copy of the variables and executes
with it.
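
For reference, a rough sketch of that approach (class, method and file names
are simplified here and will not match the patch exactly; the drl resources
are loaded from the classpath in this sketch, whereas the actual code may
resolve them from the configuration directory):

// Rough sketch only; names are simplified and may not match the actual patch.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.drools.KnowledgeBase;
import org.drools.KnowledgeBaseFactory;
import org.drools.builder.KnowledgeBuilder;
import org.drools.builder.KnowledgeBuilderFactory;
import org.drools.builder.ResourceType;
import org.drools.io.ResourceFactory;
import org.drools.runtime.StatefulKnowledgeSession;

public class AutoscalerRuleEvaluator {

    private static volatile AutoscalerRuleEvaluator instance;

    // drl file name -> parsed knowledge base; populated exactly once
    private final Map<String, KnowledgeBase> knowledgeBases =
            new ConcurrentHashMap<String, KnowledgeBase>();

    private AutoscalerRuleEvaluator() {
        // parse each drools file a single time when the singleton is created
        parseAndAdd("obsoletecheck.drl");
        parseAndAdd("dependent-scaling.drl");
        // ... remaining autoscaler drl files
    }

    public static AutoscalerRuleEvaluator getInstance() {
        if (instance == null) {
            synchronized (AutoscalerRuleEvaluator.class) {
                if (instance == null) {
                    instance = new AutoscalerRuleEvaluator();
                }
            }
        }
        return instance;
    }

    // each ClusterInstanceContext asks for its own session, so no drools
    // state is shared between the threads spawned per instance context
    public StatefulKnowledgeSession getStatefulSession(String drlFileName) {
        return knowledgeBases.get(drlFileName).newStatefulKnowledgeSession();
    }

    private void parseAndAdd(String drlFileName) {
        KnowledgeBuilder builder = KnowledgeBuilderFactory.newKnowledgeBuilder();
        builder.add(ResourceFactory.newClassPathResource(drlFileName), ResourceType.DRL);
        if (builder.hasErrors()) {
            throw new IllegalStateException("Unable to parse " + drlFileName + ": " + builder.getErrors());
        }
        KnowledgeBase knowledgeBase = KnowledgeBaseFactory.newKnowledgeBase();
        knowledgeBase.addKnowledgePackages(builder.getKnowledgePackages());
        knowledgeBases.put(drlFileName, knowledgeBase);
    }
}

With something like this, each ClusterInstanceContext would obtain its own
StatefulKnowledgeSession through AutoscalerRuleEvaluator.getInstance().getStatefulSession(...)
instead of sharing drools state through the ClusterMonitor.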

@Imesh/Lahiru,

Please correct me, if I'm wrong.

FYI: The patch was created on top of commit
5c87d5de2ad15788f47907d89641c52dd3d21d53.

Please let me know, if you face any issues with it.

Thanks,
Reka

On Tue, May 5, 2015 at 5:51 PM, Reka Thirunavukkarasu <re...@wso2.com> wrote:

> Hi Martin,
>
> I will have to implement this solution in a thread safe manner as multiple
> cluster monitors are sharing the same resource. It will get impacted the
> Cluster monitor monitoring part as well. I'm still trying to figure out a
> solution for this issue. Will keep you updated with the progress..
>
> Thanks,
> Reka
>
> On Tue, May 5, 2015 at 5:19 PM, Reka Thirunavukkarasu <re...@wso2.com>
> wrote:
>
>> Hi
>>
>> I suspect the issue is that we use static knowledgeBases map in the
>> AutoscalerRuleEvaluator. But this is getting initialized by every cluster
>> monitor. We need to fix this cluster monitor creation flow to use static
>> knowledgeBases map and initialize only once  or properly sharing this map
>> across multiple threads, since each cluster monitors are threads.
>>
>> Since drools file parsing can be done only once and used by all other
>> monitors, i will work on a fix to make drool file parsing only once. Hope
>> that fix would solve this issue.
>>
>> Thanks,
>> Reka
>>
>>
>> On Tue, May 5, 2015 at 4:44 PM, Reka Thirunavukkarasu <re...@wso2.com>
>> wrote:
>>
>>> Hi Martin/Imesh,
>>>
>>> Thanks Imesh for adding the exception handling in the monitor creation.
>>> That helped to narrow down the issue. It was a drool file parsed issue. I
>>> found below exception in both samples when creating those relevant
>>> monitors. We will have to identify why the drool parsing gave NPE. @Lahiru,
>>> Do you have any idea on this? In both samples, the cluster Monitors failed
>>> when parsing "obsoletecheck.drl".
>>>
>>> Since i couldn't figure out the root cause, i have added descriptive
>>> debug logs (in 5c87d5de2ad15788f47907d89641c52dd3d21d53) in order to
>>> isolate the issue. @Martin, Would you get a chance to test it and provide
>>> us the logs again with the same scenario, since I'm unable to reproduce
>>> this from my side?
>>>
>>> scenario_c1_c2_c3_c4_cartridges:
>>>
>>> TID: [0] [STRATOS] [2015-05-01 18:24:22,591] DEBUG
>>> {org.apache.stratos.autoscaler.rule.AutoscalerRuleEvaluator} -  Drools file
>>> is parsed successfully: obsoletecheck.drl
>>> TID: [0] [STRATOS] [2015-05-01 18:24:22,594] ERROR
>>> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
>>> An error occurred while starting monitor: [type] cluster [component]
>>> sub-G1-G2-G3-1-Id.c3-1x1.c3.domain
>>> java.lang.NullPointerException
>>>     at
>>> org.apache.stratos.autoscaler.rule.AutoscalerRuleEvaluator.getStatefulSession(AutoscalerRuleEvaluator.java:76)
>>>     at
>>> org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor.<init>(ClusterMonitor.java:137)
>>>     at
>>> org.apache.stratos.autoscaler.monitor.MonitorFactory.getClusterMonitor(MonitorFactory.java:302)
>>>     at
>>> org.apache.stratos.autoscaler.monitor.MonitorFactory.getMonitor(MonitorFactory.java:83)
>>>     at
>>> org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor$MonitorAdder.run(ParentComponentMonitor.java:844)
>>>     at
>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>     at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>     at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>     at java.lang.Thread.run(Thread.java:745)
>>>
>>> scenario_c1_c2_cartridges:
>>>
>>> TID: [0] [STRATOS] [2015-05-01 17:58:50,824] DEBUG
>>> {org.apache.stratos.autoscaler.rule.AutoscalerRuleEvaluator} -  Drools file
>>> is parsed successfully: dependent-scaling.drl
>>> TID: [0] [STRATOS] [2015-05-01 17:58:50,825] ERROR
>>> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
>>> An error occurred while starting monitor: [type] cluster [component]
>>> subscription-G1-G2-G3-Id.c1-1x1.c1.domain
>>> java.lang.NullPointerException
>>>     at
>>> org.apache.stratos.autoscaler.rule.AutoscalerRuleEvaluator.getStatefulSession(AutoscalerRuleEvaluator.java:76)
>>>     at
>>> org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor.<init>(ClusterMonitor.java:135)
>>>     at
>>> org.apache.stratos.autoscaler.monitor.MonitorFactory.getClusterMonitor(MonitorFactory.java:302)
>>>     at
>>> org.apache.stratos.autoscaler.monitor.MonitorFactory.getMonitor(MonitorFactory.java:83)
>>>     at
>>> org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor$MonitorAdder.run(ParentComponentMonitor.java:844)
>>>     at
>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>     at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>     at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>     at java.lang.Thread.run(Thread.java:745)
>>> TID: [0] [STRATOS] [2015-05-01 17:58:50,825] DEBUG
>>> {org.apache.stratos.autoscaler.monitor.
>>>
>>> @Martin, However, there seems to be separate locking issue. That is not
>>> related to this. For now, that locking issue seems to be harmless. Can we
>>> track it in a jira?
>>>
>>>
>>> Thanks,
>>> Reka
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Reka Thirunavukkarasu
>>> Senior Software Engineer,
>>> WSO2, Inc.:http://wso2.com,
>>> Mobile: +94776442007
>>>
>>>
>>>
>>
>>
>> --
>> Reka Thirunavukkarasu
>> Senior Software Engineer,
>> WSO2, Inc.:http://wso2.com,
>> Mobile: +94776442007
>>
>>
>>
>
>
> --
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
> Mobile: +94776442007
>
>
>


-- 
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007

Re: Testing Stratos 4.1 : nested grouping scenario with startup and termination issues (?)

Posted by Reka Thirunavukkarasu <re...@wso2.com>.
Hi Martin,

I will have to implement this solution in a thread-safe manner, as multiple
cluster monitors share the same resource. It will also impact the cluster
monitor's monitoring logic. I'm still trying to figure out a solution for
this issue. I will keep you updated on the progress.

Thanks,
Reka

On Tue, May 5, 2015 at 5:19 PM, Reka Thirunavukkarasu <re...@wso2.com> wrote:

> Hi
>
> I suspect the issue is that we use static knowledgeBases map in the
> AutoscalerRuleEvaluator. But this is getting initialized by every cluster
> monitor. We need to fix this cluster monitor creation flow to use static
> knowledgeBases map and initialize only once  or properly sharing this map
> across multiple threads, since each cluster monitors are threads.
>
> Since drools file parsing can be done only once and used by all other
> monitors, i will work on a fix to make drool file parsing only once. Hope
> that fix would solve this issue.
>
> Thanks,
> Reka
>
>
> --
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
> Mobile: +94776442007
>
>
>


-- 
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007

Re: Testing Stratos 4.1 : nested grouping scenario with startup and termination issues (?)

Posted by Reka Thirunavukkarasu <re...@wso2.com>.
Hi

I suspect the issue is that we use a static knowledgeBases map in the
AutoscalerRuleEvaluator, but it is re-initialized by every cluster monitor.
We need to fix the cluster monitor creation flow so that the static
knowledgeBases map is initialized only once, or share this map properly
across multiple threads, since each cluster monitor runs in its own thread.

Since a drools file only needs to be parsed once and the result can be used
by all other monitors, I will work on a fix that parses each drools file
only once. Hope that fix will solve this issue.
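
Roughly, what I have in mind is to parse each drools file once and cache the
resulting knowledge base so that all cluster monitor threads share it. A
minimal sketch of the idea only (the helper readKnowledgeBase and the field
name are illustrative, not the final code):

    // Inside AutoscalerRuleEvaluator: parse each drools file once and share
    // the result across all cluster monitor threads.
    // (uses java.util.concurrent.ConcurrentHashMap and the drools KnowledgeBase)
    private static final ConcurrentHashMap<String, KnowledgeBase> knowledgeBases =
            new ConcurrentHashMap<String, KnowledgeBase>();

    public static KnowledgeBase getKnowledgeBase(String drlFileName) {
        KnowledgeBase kbase = knowledgeBases.get(drlFileName);
        if (kbase == null) {
            synchronized (knowledgeBases) {
                kbase = knowledgeBases.get(drlFileName);
                if (kbase == null) {
                    // parse the drl file only on the first request
                    kbase = readKnowledgeBase(drlFileName);
                    knowledgeBases.put(drlFileName, kbase);
                }
            }
        }
        return kbase;
    }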

Thanks,
Reka

On Tue, May 5, 2015 at 4:44 PM, Reka Thirunavukkarasu <re...@wso2.com> wrote:

> Hi Martin/Imesh,
>
> Thanks Imesh for adding the exception handling in the monitor creation.
> That helped to narrow down the issue. It was a drools file parsing issue. I
> found the below exception in both samples when creating the relevant
> monitors. We will have to identify why the drools parsing gave an NPE.
> @Lahiru, Do you have any idea on this? In both samples, the cluster monitors
> failed when parsing "obsoletecheck.drl".
>
> Since I couldn't figure out the root cause, I have added descriptive debug
> logs (in 5c87d5de2ad15788f47907d89641c52dd3d21d53) in order to isolate the
> issue. @Martin, would you get a chance to test it and provide us the logs
> again with the same scenario, since I'm unable to reproduce this on my
> side?
>
> scenario_c1_c2_c3_c4_cartridges:
>
> TID: [0] [STRATOS] [2015-05-01 18:24:22,591] DEBUG
> {org.apache.stratos.autoscaler.rule.AutoscalerRuleEvaluator} -  Drools file
> is parsed successfully: obsoletecheck.drl
> TID: [0] [STRATOS] [2015-05-01 18:24:22,594] ERROR
> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
> An error occurred while starting monitor: [type] cluster [component]
> sub-G1-G2-G3-1-Id.c3-1x1.c3.domain
> java.lang.NullPointerException
>     at
> org.apache.stratos.autoscaler.rule.AutoscalerRuleEvaluator.getStatefulSession(AutoscalerRuleEvaluator.java:76)
>     at
> org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor.<init>(ClusterMonitor.java:137)
>     at
> org.apache.stratos.autoscaler.monitor.MonitorFactory.getClusterMonitor(MonitorFactory.java:302)
>     at
> org.apache.stratos.autoscaler.monitor.MonitorFactory.getMonitor(MonitorFactory.java:83)
>     at
> org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor$MonitorAdder.run(ParentComponentMonitor.java:844)
>     at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
>
> scenario_c1_c2_cartridges:
>
> TID: [0] [STRATOS] [2015-05-01 17:58:50,824] DEBUG
> {org.apache.stratos.autoscaler.rule.AutoscalerRuleEvaluator} -  Drools file
> is parsed successfully: dependent-scaling.drl
> TID: [0] [STRATOS] [2015-05-01 17:58:50,825] ERROR
> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
> An error occurred while starting monitor: [type] cluster [component]
> subscription-G1-G2-G3-Id.c1-1x1.c1.domain
> java.lang.NullPointerException
>     at
> org.apache.stratos.autoscaler.rule.AutoscalerRuleEvaluator.getStatefulSession(AutoscalerRuleEvaluator.java:76)
>     at
> org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor.<init>(ClusterMonitor.java:135)
>     at
> org.apache.stratos.autoscaler.monitor.MonitorFactory.getClusterMonitor(MonitorFactory.java:302)
>     at
> org.apache.stratos.autoscaler.monitor.MonitorFactory.getMonitor(MonitorFactory.java:83)
>     at
> org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor$MonitorAdder.run(ParentComponentMonitor.java:844)
>     at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> TID: [0] [STRATOS] [2015-05-01 17:58:50,825] DEBUG
> {org.apache.stratos.autoscaler.monitor.
>
> @Martin, however, there seems to be a separate locking issue. That is not
> related to this. For now, that locking issue seems to be harmless. Can we
> track it in a jira?
>
>
> Thanks,
> Reka
>
>
>
> On Sat, May 2, 2015 at 12:28 AM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
>>  Hi Imesh, Reka
>>
>>
>>
>> As requested, please see attached artifacts and logs (with debug enabled)
>> to test for the deadlock – stratos is running the latest from master,
>> latest commit :
>>
>> commit ae89ba09491891512a9bc89e080577c565ebe8b7
>>
>> Author: reka <rt...@gmail.com>
>>
>> Date:   Fri May 1 12:30:55 2015 +0530
>>
>>
>>
>> I ran 2 similar but slightly different scenarios, see [1.], [2.]
>>
>>
>>
>> Java startup with lock monitor enabled:
>>
>>
>>
>> /opt/java/bin/java -Xbootclasspath/a: -Xms256m -Xmx2048m
>> -XX:MaxPermSize=256m -server -XX:+HeapDumpOnOutOfMemoryE
>>
>> rror
>> -XX:HeapDumpPath=/opt/wso2/apache-stratos/repository/logs/heap-dump.hprof
>> -Dcom.sun.management.jmxremote -classpath
>> /opt/java/lib/tools.jar:/opt/wso2/apache-stratos/bin/org.wso2.carbon.bootstrap-4.2.0.jar:/opt/wso2/apache-stratos/bin/tcpmon-1.0.jar:/opt/wso2/apache-stratos/bin/tomcat-juli-7.0.34.jar:/opt/wso2/apache
>>
>> -stratos/lib/commons-lang-2.6.0.wso2v1.jar
>> -Djava.endorsed.dirs=/opt/wso2/apache-stratos/lib/endorsed:/opt/java/jre/lib/endorsed:/opt/java/lib/endorsed
>> -Djava.io
>>
>> .tmpdir=/opt/wso2/apache-stratos/tmp
>> -Dcatalina.base=/opt/wso2/apache-stratos/lib/tomcat
>> -Dwso2.server.standalone=true -Dcarbon.registry.root=/ -Djava.command=/o
>>
>> pt/java/bin/java -Dcarbon.home=/opt/wso2/apache-stratos
>> -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
>> -Dcarbon.config.dir.path=/opt/wso2/apac
>>
>> he-stratos/repository/conf
>> -Djava.util.logging.config.file=/opt/wso2/apache-stratos/repository/conf/etc/logging-bridge.properties
>> -Dcomponents.repo=/opt/wso2/apache-stratos/repository/components/plugins
>> -Dconf.location=/opt/wso2/apache-stratos/repository/conf
>> -Dcom.atomikos.icatch.file=/opt/wso2/apache-stratos/lib/transactions.properties
>> -Dcom.atomikos.icatch.hide_init_file_path=true
>> -Dorg.apache.jasper.runtime.BodyContentImpl.LIMIT_BUFFER=true
>> -Dcom.sun.jndi.ldap.connect.pool.a
>>
>> uthentication=simple -Dcom.sun.jndi.ldap.connect.pool.timeout=3000
>> -Dorg.terracotta.quartz.skipUpdateCheck=true
>> -Djava.security.egd=file:/dev/./urandom -Dfile.encoding=UTF8
>> -Ddisable.cassandra.server.startup=true
>> -Djndi.properties.dir=/opt/wso2/apache-stratos/repository/conf
>> -Dthrift.client.config.file.path=/opt/wso2/apache-stratos/repository/conf/thrift-client-config.xml
>> -DMETADATA_CLIENT_CONFIG_FILE=/opt/wso2/apache-stratos/repository/conf/metadataservice.xml
>> -Dread.write.lock.monitor.enabled=true
>> org.wso2.carbon.bootstrap.Bootstrap -Dprofile=default
>>
>>
>>
>>
>>
>> [1.] scenario_c1_c2_cartridges
>>
>>
>>
>>
>>
>>
>>
>> [1b.] exception
>>
>>
>>
>> *org.apache.stratos.common.exception.LockNotReleasedException*
>>
>> *        at
>> org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor.checkTimeout(ReadWriteLockMonitor.java:72)*
>>
>> *        at
>> org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor.run(ReadWriteLockMonitor.java:55)*
>>
>> *        at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)*
>>
>> *        at
>> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)*
>>
>> *        at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)*
>>
>> *        at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)*
>>
>> *        at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)*
>>
>> *        at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)*
>>
>> *        at java.lang.Thread.run(Thread.java:745)*
>>
>>
>>
>>
>>
>> [2.] scenario_c1_c2_c3_c4_cartridges
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
>> *Sent:* Thursday, April 30, 2015 10:10 PM
>>
>> *To:* dev
>> *Subject:* Re: Testing Stratos 4.1 : nested grouping scenario with
>> startup and termination issues (?)
>>
>>
>>
>> In addition we have not added a try catch block in MonitorAdder.run()
>> method to cover its full scope. Therefore if an exception is raised in the
>> middle the above problem also can cause.
>>
>>
>>
>> I have now fixed this in commit revision:
>>
>> 9ec061f44a3189ccd8b509ef4da980687dfbcf62
>>
>>
>>
>> Martin: Appreciate if you could take this fix and retest.
>>
>>
>>
>> Thanks
>>
>>
>>
>> On Fri, May 1, 2015 at 10:32 AM, Imesh Gunaratne <im...@apache.org>
>> wrote:
>>
>> Hi Reka,
>>
>>
>>
>> It looks like the MonitorAdder.run() has executed properly, that's why we
>> see the following log:
>>
>>
>>
>> TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO
>> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
>> Starting monitor: [type] cluster [component]
>> sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
>>
>>
>>
>> However the thread has not come to its last line:
>>
>> log.info(String.format("Monitor started successfully: [type] %s [component] %s [dependents] %s " +
>>                 "[startup-time] %d seconds", monitorTypeStr, context.getId(),
>>
>>
>>
>> As we discussed offline, this may have been caused by a deadlock while
>> trying to get the following topology lock:
>>
>>
>>
>> public static ClusterMonitor getClusterMonitor(ParentComponentMonitor parentMonitor,
>>                                                ClusterChildContext context,
>>                                                List<String> parentInstanceIds)
>>     ...
>>
>>     //acquire read lock for the service and cluster
>>     TopologyManager.acquireReadLockForCluster(serviceName, clusterId);
>>
>>
>>
>> Martin: Will you be able to do another test run with the deadlock
>> detection logic enabled? You could set the following system property to
>> true in the stratos.sh file to do this:
>>
>> read.write.lock.monitor.enabled=true
>>
>>  Thanks
>>
>>
>>
>>
>>
>> On Fri, May 1, 2015 at 7:40 AM, Reka Thirunavukkarasu <re...@wso2.com>
>> wrote:
>>
>> Hi Martin,
>>
>>
>>
>> Thanks Martin for the detailed information in order to analyze the issue.
>> It helped to isolate the issue.
>>
>> As I went through the logs, it seems to be some thread issue. I could see
>> the below log for c4-1x1 and c3-1x1. In that case c3 and c4 both got
>> scheduled to start their relevant ClusterMonitors. After that, only c3 got
>> its ClusterMonitor started successfully, not c4. So the scheduler of c4
>> didn't actually start a thread for the MonitorAdder to create the
>> ClusterMonitor.
>>
>> TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO
>> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
>> Starting dependent monitor: [application] sub-G1-G2-G3-1-G4 [component]
>> sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
>> TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO
>> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
>> Monitor scheduled: [type] cluster [component] sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
>>
>> TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO
>> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
>> Starting monitor: [type] cluster [component]
>> sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
>> TID: [0] [STRATOS] [2015-04-30 16:48:57,713]  INFO
>> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
>> Starting dependent monitor: [application] sub-G1-G2-G3-1-G4 [component]
>> sub-G1-G2-G3-1-G4.c3-1x1.c3.domain
>> TID: [0] [STRATOS] [2015-04-30 16:48:57,713]  INFO
>> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
>> Monitor scheduled: [type] cluster [component] sub-G1-G2-G3-1-G4.c3-1x1.c3.domain
>>
>> TID: [0] [STRATOS] [2015-04-30 16:48:57,713]  INFO
>> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
>> Starting monitor: [type] cluster [component]
>> sub-G1-G2-G3-1-G4.c3-1x1.c3.domain
>>
>> Found below log for c3 which indicates that c3 monitor got started
>> successfully. But there is no such log for c4.
>>
>> TID: [0] [STRATOS] [2015-04-30 16:49:00,760]  INFO
>> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
>> Monitor started successfully: [type] cluster [component]
>> sub-G1-G2-G3-1-G4.c3-1x1.c3.domain [dependents] none [startup-time] 3
>> seconds
>>
>> @Gayan/Imesh, Do you have any input here? Will increasing the threadpool
>> solve this issue? Or is it related to something else?
>>
>> Thanks,
>>
>> Reka
>>
>>
>>
>>
>>
>>
>>
>> On Thu, Apr 30, 2015 at 10:54 PM, Martin Eppel (meppel) <me...@cisco.com>
>> wrote:
>>
>> Hi Reka,
>>
>>
>>
>> Re-run the scenario, making sure the application alias and group alias
>> are as suggested and debug logs are turned on (see config below)
>>
>>
>>
>> log4j.logger.org.apache.stratos.manager=DEBUG
>>
>> log4j.logger.org.apache.stratos.autoscaler=DEBUG
>>
>> log4j.logger.org.apache.stratos.messaging=INFO
>>
>> log4j.logger.org.apache.stratos.cloud.controller=DEBUG
>>
>> log4j.logger.org.wso2.andes.client=ERROR
>>
>>
>>
>> This is the scenario:
>>
>>
>>
>> 1.      deployed application – see screenshot A. , debug logs
>> wso2carbon-debug.log
>> only 3 instances spin up
>>
>> 2.      removed application
>>
>> 3.      re-deployed application – see screenshot B. , debug logs
>> wso2carbon-debug-2.log
>> (after line “TID: [0] [STRATOS] [2015-04-30 17:05:23,837] DEBUG
>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Read lock
>> released”
>> 2nd time the application gets deployed all instances spin up and go
>> active
>>
>>
>>
>>
>>
>> Please see attached artifacts and logs.
>>
>>
>>
>> A.     Application Status after deploying the application first time
>> after stratos start up:
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> B.     Application Status after re-deploying the application
>>
>> (see log wso2carbon-debug-2.log after “TID: [0] [STRATOS] [2015-04-30
>> 17:05:23,837] DEBUG
>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Read lock
>> released”:
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> *From:* Reka Thirunavukkarasu [mailto:reka@wso2.com]
>> *Sent:* Thursday, April 30, 2015 1:40 AM
>>
>>
>> *To:* dev
>> *Subject:* Re: Testing Stratos 4.1 : nested grouping scenario with
>> startup and termination issues (?)
>>
>>
>>
>> If you get this issue continuously, can you please share the logs against
>> master as we have improved some logs in the master yesterday?
>>
>> Thanks,
>>
>> Reka
>>
>>
>>
>> On Thu, Apr 30, 2015 at 2:08 PM, Reka Thirunavukkarasu <re...@wso2.com>
>> wrote:
>>
>> Hi Martin,
>>
>> I have deployed the attached samples as earlier in openstack with latest
>> master. All the clusters got created with the members. Please see the
>> attached diagram. I'm unable to proceed further as my puppet configuration
>> has to be corrected to make the member active. Thought of sharing this as
>> all the clusters have members.
>>
>> Thanks,
>>
>> Reka
>>
>>
>>
>> On Thu, Apr 30, 2015 at 10:25 AM, Reka Thirunavukkarasu <re...@wso2.com>
>> wrote:
>>
>> Hi Martin,
>>
>> Can you please confirm whether you are using a unique applicationId and
>> group alias? I can see from the UI that the applicationId and the next
>> group alias have the same value, sub-G1-G2-G3-1.
>>
>> Thanks,
>>
>> Reka
>>
>>
>>
>>
>>
>> On Thu, Apr 30, 2015 at 10:16 AM, Martin Eppel (meppel) <me...@cisco.com>
>> wrote:
>>
>> Hi Reka,
>>
>>
>>
>> I have upgraded from beta to the latest stratos code on master and
>> retested the scenario from jira STRATOS-1345 but still see the same issue
>> (on open stack)
>>
>>
>>
>> Thanks
>>
>>
>>
>> Martin
>>
>>
>>
>>
>>
>> *From:* Martin Eppel (meppel)
>> *Sent:* Wednesday, April 29, 2015 2:54 PM
>> *To:* dev@stratos.apache.org
>> *Subject:* RE: Testing Stratos 4.1 : nested grouping scenario with
>> startup and termination issues (?)
>>
>>
>>
>> Hi Reka,
>>
>>
>>
>> I will upgrade my system to the latest master and re-test,
>>
>>
>>
>> Regards
>>
>>
>>
>> Martin
>>
>>
>>
>> *From:* Reka Thirunavukkarasu [mailto:reka@wso2.com <re...@wso2.com>]
>> *Sent:* Wednesday, April 29, 2015 11:55 AM
>> *To:* dev
>> *Subject:* Re: Testing Stratos 4.1 : nested grouping scenario with
>> startup and termination issues (?)
>>
>>
>>
>> Hi Martin,
>>
>> While I was working on application update, I fixed a few issues with the
>> termination behavior. However, there still seem to be small issues in the
>> logic which have to be fixed. I have started to verify this in my local
>> setup. Can you create a jira so that we can track it? I will update the
>> progress in the jira.
>>
>> Thanks,
>>
>> Reka
>>
>>
>>
>> On Tue, Apr 28, 2015 at 10:11 PM, Martin Eppel (meppel) <me...@cisco.com>
>> wrote:
>>
>> Hi Reka,
>>
>>
>>
>> Thanks for following up - let me know if I should open a JIRA,
>>
>>
>>
>> Thanks
>>
>>
>>
>> Martin
>>
>>
>>
>> *From:* Reka Thirunavukkarasu [mailto:reka@wso2.com]
>> *Sent:* Tuesday, April 28, 2015 5:37 AM
>> *To:* dev
>> *Subject:* Re: Testing Stratos 4.1 : nested grouping scenario with
>> startup and termination issues (?)
>>
>>
>>
>> Hi Martin,
>>
>> Thanks for bringing this up. I have fixed some issues in the flow while
>> testing application update support with instance counts. I will go through
>> your scenarios to reproduce it and update the thread with the progress.
>>
>> Thanks,
>>
>> Reka
>>
>>
>>
>> On Tue, Apr 28, 2015 at 7:08 AM, Martin Eppel (meppel) <me...@cisco.com>
>> wrote:
>>
>> I am testing a (nested grouping) scenario where a group defines a
>> termination behavior “terminate-all”. When terminating the instance (of
>> cartridge type c3), no new instance is restarted.
>>
>> My understanding is that a new instance should be started up.
>>
>>
>>
>> The scenario looks like this:
>>
>>
>>
>> Group ~G1 has a cartridge member c1 and group member ~G2
>>
>> Group ~G2 has a cartridge member c2 and group member ~G3
>>
>> Group ~G3 has a cartridge member c3
>>
>>
>>
>> Startup dependencies are: c1 depends on G2, c2 depends on G3
>>
>>
>>
>> ~G1 defines termination: none
>>
>> ~G2 defines termination: dependents
>>
>> ~G3 defines termination: all
>>
>>
>>
>> After startup, when all instances are active, instance c3 is terminated,
>> which correctly also terminates instance c2 (since it depends on G3 /
>> c3).
>>
>> *Issue 1:*
>>
>> However, no new instance for c3 is started up (and consequently no new
>> instance for c2 should be started up either) (see log
>> wso2carbon.log)
>>
>>
>>
>> The only instance which remains running is c1.
>>
>> *Issue 2:*
>>
>> When c1 is subsequently terminated manually, a new instance of c1
>> is started up (as opposed to Issue 1), which I think is incorrect since it
>> defines a startup dependency (c1 depends on G2) which is not fulfilled at
>> the time (G2 should not be active since c2 is still terminated; see log
>> wso2carbon-issue2.log, same log as wso2carbon.log but at a later time)
>>
>>
>>
>> WDYT ?
>>
>>
>>
>> Please find attached artifacts and logs
>>
>>
>>
>> Thanks
>>
>>
>>
>> Martin
>>
>>
>>
>>
>> --
>>
>> Reka Thirunavukkarasu
>> Senior Software Engineer,
>> WSO2, Inc.:http://wso2.com,
>>
>> Mobile: +94776442007
>>
>>
>>
>>
>>
>>
>> --
>>
>> Reka Thirunavukkarasu
>> Senior Software Engineer,
>> WSO2, Inc.:http://wso2.com,
>>
>> Mobile: +94776442007
>>
>>
>>
>>
>>
>>
>> --
>>
>> Reka Thirunavukkarasu
>> Senior Software Engineer,
>> WSO2, Inc.:http://wso2.com,
>>
>> Mobile: +94776442007
>>
>>
>>
>>
>>
>>
>> --
>>
>> Reka Thirunavukkarasu
>> Senior Software Engineer,
>> WSO2, Inc.:http://wso2.com,
>>
>> Mobile: +94776442007
>>
>>
>>
>>
>>
>>
>> --
>>
>> Reka Thirunavukkarasu
>> Senior Software Engineer,
>> WSO2, Inc.:http://wso2.com,
>>
>> Mobile: +94776442007
>>
>>
>>
>>
>>
>>
>> --
>>
>> Reka Thirunavukkarasu
>> Senior Software Engineer,
>> WSO2, Inc.:http://wso2.com,
>>
>> Mobile: +94776442007
>>
>>
>>
>>
>>
>>
>>
>> --
>>
>> Imesh Gunaratne
>>
>>
>>
>> Senior Technical Lead, WSO2
>>
>> Committer & PMC Member, Apache Stratos
>>
>>
>>
>>
>>
>> --
>>
>> Imesh Gunaratne
>>
>>
>>
>> Senior Technical Lead, WSO2
>>
>> Committer & PMC Member, Apache Stratos
>>
>
>
>
> --
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
> Mobile: +94776442007
>
>
>


-- 
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007

Re: Testing Stratos 4.1 : nested grouping scenario with startup and termination issues (?)

Posted by Reka Thirunavukkarasu <re...@wso2.com>.
Hi Martin/Imesh,

Thanks Imesh for adding the exception handling in the monitor creation.
That helped to narrow down the issue. It was a drools file parsing issue. I
found the below exception in both samples when creating the relevant
monitors. We will have to identify why the drools parsing gave an NPE.
@Lahiru, Do you have any idea on this? In both samples, the cluster monitors
failed when parsing "obsoletecheck.drl".

Since I couldn't figure out the root cause, I have added descriptive debug
logs (in 5c87d5de2ad15788f47907d89641c52dd3d21d53) in order to isolate the
issue. @Martin, would you get a chance to test it and provide us the logs
again with the same scenario, since I'm unable to reproduce this on my
side?

scenario_c1_c2_c3_c4_cartridges:

TID: [0] [STRATOS] [2015-05-01 18:24:22,591] DEBUG
{org.apache.stratos.autoscaler.rule.AutoscalerRuleEvaluator} -  Drools file
is parsed successfully: obsoletecheck.drl
TID: [0] [STRATOS] [2015-05-01 18:24:22,594] ERROR
{org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
An error occurred while starting monitor: [type] cluster [component]
sub-G1-G2-G3-1-Id.c3-1x1.c3.domain
java.lang.NullPointerException
    at
org.apache.stratos.autoscaler.rule.AutoscalerRuleEvaluator.getStatefulSession(AutoscalerRuleEvaluator.java:76)
    at
org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor.<init>(ClusterMonitor.java:137)
    at
org.apache.stratos.autoscaler.monitor.MonitorFactory.getClusterMonitor(MonitorFactory.java:302)
    at
org.apache.stratos.autoscaler.monitor.MonitorFactory.getMonitor(MonitorFactory.java:83)
    at
org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor$MonitorAdder.run(ParentComponentMonitor.java:844)
    at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

scenario_c1_c2_cartridges:

TID: [0] [STRATOS] [2015-05-01 17:58:50,824] DEBUG
{org.apache.stratos.autoscaler.rule.AutoscalerRuleEvaluator} -  Drools file
is parsed successfully: dependent-scaling.drl
TID: [0] [STRATOS] [2015-05-01 17:58:50,825] ERROR
{org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
An error occurred while starting monitor: [type] cluster [component]
subscription-G1-G2-G3-Id.c1-1x1.c1.domain
java.lang.NullPointerException
    at
org.apache.stratos.autoscaler.rule.AutoscalerRuleEvaluator.getStatefulSession(AutoscalerRuleEvaluator.java:76)
    at
org.apache.stratos.autoscaler.monitor.cluster.ClusterMonitor.<init>(ClusterMonitor.java:135)
    at
org.apache.stratos.autoscaler.monitor.MonitorFactory.getClusterMonitor(MonitorFactory.java:302)
    at
org.apache.stratos.autoscaler.monitor.MonitorFactory.getMonitor(MonitorFactory.java:83)
    at
org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor$MonitorAdder.run(ParentComponentMonitor.java:844)
    at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
TID: [0] [STRATOS] [2015-05-01 17:58:50,825] DEBUG
{org.apache.stratos.autoscaler.monitor.

@Martin, however, there seems to be a separate locking issue. That is not
related to this. For now, that locking issue seems to be harmless. Can we
track it in a jira?
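
For reference, the pattern we want wherever the topology lock is taken is the
usual try/finally, so that a failure inside monitor creation can never leave
the lock held. A rough sketch only (it assumes TopologyManager exposes the
matching release call; not the exact Stratos code):

    // Always release the topology read lock in a finally block so it cannot
    // leak, which is what later surfaces as LockNotReleasedException.
    TopologyManager.acquireReadLockForCluster(serviceName, clusterId);
    try {
        // read the cluster from the topology and build the ClusterMonitor
    } finally {
        TopologyManager.releaseReadLockForCluster(serviceName, clusterId);
    }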


Thanks,
Reka


On Sat, May 2, 2015 at 12:28 AM, Martin Eppel (meppel) <me...@cisco.com>
wrote:

>  Hi Imesh, Reka
>
>
>
> As requested, please see attached artifacts and logs (with debug enabled) to
> test for the deadlock – stratos is running the latest from master, latest
> commit :
>
> commit ae89ba09491891512a9bc89e080577c565ebe8b7
>
> Author: reka <rt...@gmail.com>
>
> Date:   Fri May 1 12:30:55 2015 +0530
>
>
>
> I ran 2 similar but slightly different scenarios, see [1.], [2.]
>
>
>
> Java startup with lock monitor enabled:
>
>
>
> /opt/java/bin/java -Xbootclasspath/a: -Xms256m -Xmx2048m
> -XX:MaxPermSize=256m -server -XX:+HeapDumpOnOutOfMemoryE
>
> rror
> -XX:HeapDumpPath=/opt/wso2/apache-stratos/repository/logs/heap-dump.hprof
> -Dcom.sun.management.jmxremote -classpath
> /opt/java/lib/tools.jar:/opt/wso2/apache-stratos/bin/org.wso2.carbon.bootstrap-4.2.0.jar:/opt/wso2/apache-stratos/bin/tcpmon-1.0.jar:/opt/wso2/apache-stratos/bin/tomcat-juli-7.0.34.jar:/opt/wso2/apache
>
> -stratos/lib/commons-lang-2.6.0.wso2v1.jar
> -Djava.endorsed.dirs=/opt/wso2/apache-stratos/lib/endorsed:/opt/java/jre/lib/endorsed:/opt/java/lib/endorsed
> -Djava.io
>
> .tmpdir=/opt/wso2/apache-stratos/tmp
> -Dcatalina.base=/opt/wso2/apache-stratos/lib/tomcat
> -Dwso2.server.standalone=true -Dcarbon.registry.root=/ -Djava.command=/o
>
> pt/java/bin/java -Dcarbon.home=/opt/wso2/apache-stratos
> -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
> -Dcarbon.config.dir.path=/opt/wso2/apac
>
> he-stratos/repository/conf
> -Djava.util.logging.config.file=/opt/wso2/apache-stratos/repository/conf/etc/logging-bridge.properties
> -Dcomponents.repo=/opt/wso2/apache-stratos/repository/components/plugins
> -Dconf.location=/opt/wso2/apache-stratos/repository/conf
> -Dcom.atomikos.icatch.file=/opt/wso2/apache-stratos/lib/transactions.properties
> -Dcom.atomikos.icatch.hide_init_file_path=true
> -Dorg.apache.jasper.runtime.BodyContentImpl.LIMIT_BUFFER=true
> -Dcom.sun.jndi.ldap.connect.pool.a
>
> uthentication=simple -Dcom.sun.jndi.ldap.connect.pool.timeout=3000
> -Dorg.terracotta.quartz.skipUpdateCheck=true
> -Djava.security.egd=file:/dev/./urandom -Dfile.encoding=UTF8
> -Ddisable.cassandra.server.startup=true
> -Djndi.properties.dir=/opt/wso2/apache-stratos/repository/conf
> -Dthrift.client.config.file.path=/opt/wso2/apache-stratos/repository/conf/thrift-client-config.xml
> -DMETADATA_CLIENT_CONFIG_FILE=/opt/wso2/apache-stratos/repository/conf/metadataservice.xml
> -Dread.write.lock.monitor.enabled=true
> org.wso2.carbon.bootstrap.Bootstrap -Dprofile=default
>
>
>
>
>
> [1.] scenario_c1_c2_cartridges
>
>
>
>
>
>
>
> [1b.] exception
>
>
>
> *org.apache.stratos.common.exception.LockNotReleasedException*
>
> *        at
> org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor.checkTimeout(ReadWriteLockMonitor.java:72)*
>
> *        at
> org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor.run(ReadWriteLockMonitor.java:55)*
>
> *        at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)*
>
> *        at
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)*
>
> *        at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)*
>
> *        at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)*
>
> *        at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)*
>
> *        at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)*
>
> *        at java.lang.Thread.run(Thread.java:745)*
>
>
>
>
>
> [2.] scenario_c1_c2_c3_c4_cartridges
>
>
>
>
>
>
>
>
>
>
>
> *From:* Imesh Gunaratne [mailto:imesh@apache.org]
> *Sent:* Thursday, April 30, 2015 10:10 PM
>
> *To:* dev
> *Subject:* Re: Testing Stratos 4.1 : nested grouping scenario with
> startup and termination issues (?)
>
>
>
> In addition we have not added a try catch block in MonitorAdder.run()
> method to cover its full scope. Therefore if an exception is raised in the
> middle the above problem also can cause.
>
>
>
> I have now fixed this in commit revision:
>
> 9ec061f44a3189ccd8b509ef4da980687dfbcf62
>
>
>
> Martin: Appreciate if you could take this fix and retest.
>
>
>
> Thanks
>
>
>
> On Fri, May 1, 2015 at 10:32 AM, Imesh Gunaratne <im...@apache.org> wrote:
>
> Hi Reka,
>
>
>
> It looks like the MonitorAdder.run() has executed properly, that's why we
> see the following log:
>
>
>
> TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO
> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
> Starting monitor: [type] cluster [component]
> sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
>
>
>
> However the thread has not come to its last line:
>
> log.info(String.format("Monitor started successfully: [type] %s [component] %s [dependents] %s " +
>                 "[startup-time] %d seconds", monitorTypeStr, context.getId(),
>
>
>
> As we discussed offline, this may have been caused by a deadlock while
> trying to get the following topology lock:
>
>
>
> public static ClusterMonitor getClusterMonitor(ParentComponentMonitor parentMonitor,
>                                                ClusterChildContext context,
>                                                List<String> parentInstanceIds)
>     ...
>
>     //acquire read lock for the service and cluster
>     TopologyManager.acquireReadLockForCluster(serviceName, clusterId);
>
>
>
> Martin: Will you be able to do another test run with the deadlock
> detection logic enabled? You could set the following system property to
> true in the stratos.sh file to do this:
>
> read.write.lock.monitor.enabled=true
>
>  Thanks
>
>
>
>
>
> On Fri, May 1, 2015 at 7:40 AM, Reka Thirunavukkarasu <re...@wso2.com>
> wrote:
>
> Hi Martin,
>
>
>
> Thanks Martin for the detailed information in order to analyze the issue.
> It helped to isolate the issue.
>
> As I went through the logs, it seems to be some thread issue. I could see
> the below log for c4-1x1 and c3-1x1. In that case c3 and c4 both got
> scheduled to start their relevant ClusterMonitors. After that, only c3 got
> its ClusterMonitor started successfully, not c4. So the scheduler of c4
> didn't actually start a thread for the MonitorAdder to create the
> ClusterMonitor.
>
> TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO
> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
> Starting dependent monitor: [application] sub-G1-G2-G3-1-G4 [component]
> sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
> TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO
> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
> Monitor scheduled: [type] cluster [component] sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
>
> TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO
> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
> Starting monitor: [type] cluster [component]
> sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
> TID: [0] [STRATOS] [2015-04-30 16:48:57,713]  INFO
> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
> Starting dependent monitor: [application] sub-G1-G2-G3-1-G4 [component]
> sub-G1-G2-G3-1-G4.c3-1x1.c3.domain
> TID: [0] [STRATOS] [2015-04-30 16:48:57,713]  INFO
> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
> Monitor scheduled: [type] cluster [component] sub-G1-G2-G3-1-G4.c3-1x1.c3.domain
>
> TID: [0] [STRATOS] [2015-04-30 16:48:57,713]  INFO
> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
> Starting monitor: [type] cluster [component]
> sub-G1-G2-G3-1-G4.c3-1x1.c3.domain
>
> Found below log for c3 which indicates that c3 monitor got started
> successfully. But there is no such log for c4.
>
> TID: [0] [STRATOS] [2015-04-30 16:49:00,760]  INFO
> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
> Monitor started successfully: [type] cluster [component]
> sub-G1-G2-G3-1-G4.c3-1x1.c3.domain [dependents] none [startup-time] 3
> seconds
>
> @Gayan/Imesh, Do you have any input here? Will increasing the threadpool
> solve this issue? Or is it related to something else?
>
> Thanks,
>
> Reka
>
>
>
>
>
>
>
> On Thu, Apr 30, 2015 at 10:54 PM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> Hi Reka,
>
>
>
> Re-run the scenario, making sure the application alias and group alias are
> as suggested and debug logs are turned on (see config below)
>
>
>
> log4j.logger.org.apache.stratos.manager=DEBUG
>
> log4j.logger.org.apache.stratos.autoscaler=DEBUG
>
> log4j.logger.org.apache.stratos.messaging=INFO
>
> log4j.logger.org.apache.stratos.cloud.controller=DEBUG
>
> log4j.logger.org.wso2.andes.client=ERROR
>
>
>
> This is the scenario:
>
>
>
> 1.      deployed application – see screenshot A. , debug logs
> wso2carbon-debug.log
> only 3 instances spin up
>
> 2.      removed application
>
> 3.      re-deployed application – see screenshot B. , debug logs
> wso2carbon-debug-2.log
> (after line “TID: [0] [STRATOS] [2015-04-30 17:05:23,837] DEBUG
> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Read lock
> released”
> 2nd time the application gets deployed all instances spin up and go active
>
>
>
>
>
> Please see attached artifacts and logs.
>
>
>
> A.     Application Status after deploying the application first time
> after stratos start up:
>
>
>
>
>
>
>
>
>
> B.     Application Status after re-deploying the application
>
> (see log wso2carbon-debug-2.log after “TID: [0] [STRATOS] [2015-04-30
> 17:05:23,837] DEBUG
> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Read lock
> released”:
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> *From:* Reka Thirunavukkarasu [mailto:reka@wso2.com]
> *Sent:* Thursday, April 30, 2015 1:40 AM
>
>
> *To:* dev
> *Subject:* Re: Testing Stratos 4.1 : nested grouping scenario with
> startup and termination issues (?)
>
>
>
> If you get this issue continuously, can you please share the logs against
> master as we have improved some logs in the master yesterday?
>
> Thanks,
>
> Reka
>
>
>
> On Thu, Apr 30, 2015 at 2:08 PM, Reka Thirunavukkarasu <re...@wso2.com>
> wrote:
>
> Hi Martin,
>
> I have deployed the attached samples as earlier in openstack with latest
> master. All the clusters got created with the members. Please see the
> attached diagram. I'm unable to proceed further as my puppet configuration
> has to be corrected to make the member active. Thought of sharing this as
> all the clusters have members.
>
> Thanks,
>
> Reka
>
>
>
> On Thu, Apr 30, 2015 at 10:25 AM, Reka Thirunavukkarasu <re...@wso2.com>
> wrote:
>
> Hi Martin,
>
> Can you please confirm whether you are using a unique applicationId and
> group alias? I can see from the UI that the applicationId and the next
> group alias have the same value, sub-G1-G2-G3-1.
>
> Thanks,
>
> Reka
>
>
>
>
>
> On Thu, Apr 30, 2015 at 10:16 AM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> Hi Reka,
>
>
>
> I have upgraded from beta to the latest stratos code on master and
> retested the scenario from jira STRATOS-1345 but still see the same issue
> (on open stack)
>
>
>
> Thanks
>
>
>
> Martin
>
>
>
>
>
> *From:* Martin Eppel (meppel)
> *Sent:* Wednesday, April 29, 2015 2:54 PM
> *To:* dev@stratos.apache.org
> *Subject:* RE: Testing Stratos 4.1 : nested grouping scenario with
> startup and termination issues (?)
>
>
>
> Hi Reka,
>
>
>
> I will upgrade my system to the latest master and re-test,
>
>
>
> Regards
>
>
>
> Martin
>
>
>
> *From:* Reka Thirunavukkarasu [mailto:reka@wso2.com <re...@wso2.com>]
> *Sent:* Wednesday, April 29, 2015 11:55 AM
> *To:* dev
> *Subject:* Re: Testing Stratos 4.1 : nested grouping scenario with
> startup and termination issues (?)
>
>
>
> Hi Martin,
>
> While I was working on application update, I fixed a few issues with the
> termination behavior. However, there still seem to be small issues in the
> logic which have to be fixed. I have started to verify this in my local
> setup. Can you create a jira so that we can track it? I will update the
> progress in the jira.
>
> Thanks,
>
> Reka
>
>
>
> On Tue, Apr 28, 2015 at 10:11 PM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> Hi Reka,
>
>
>
> Thanks for following up - let me know if I should open a JIRA,
>
>
>
> Thanks
>
>
>
> Martin
>
>
>
> *From:* Reka Thirunavukkarasu [mailto:reka@wso2.com]
> *Sent:* Tuesday, April 28, 2015 5:37 AM
> *To:* dev
> *Subject:* Re: Testing Stratos 4.1 : nested grouping scenario with
> startup and termination issues (?)
>
>
>
> Hi Martin,
>
> Thanks for bringing this up. I have fixed some issues in the flow while
> testing application update support with instance counts. I will go through
> your scenarios to reproduce it and update the thread with the progress.
>
> Thanks,
>
> Reka
>
>
>
> On Tue, Apr 28, 2015 at 7:08 AM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
> I am testing a (nested grouping) scenario where a group defines a
> termination behavior “terminate-all”. When terminating the instance (of
> cartridge type c3), no new instance is restarted.
>
> My understanding is that a new instance should be started up.
>
>
>
> The scenario looks like this:
>
>
>
> Group ~G1 has a cartridge member c1 and group member ~G2
>
> Group ~G2 has a cartridge member c2 and group member ~G3
>
> Group ~G3 has a cartridge member c3
>
>
>
> Startup dependencies are: c1 depends on G2, c2 depends on G3
>
>
>
> ~G1 defines termination: none
>
> ~G2 defines termination: dependents
>
> ~G3 defines termination: all
>
>
>
> After startup, when all instances are active, instance c3 is terminated,
> which correctly also terminates instance c2 (since it depends on G3 /
> c3).
>
> *Issue 1:*
>
> However, no new instance for c3 is started up (and consequently no new
> instance for c2 should be started up either) (see log
> wso2carbon.log)
>
>
>
> The only instance which remains running is c1.
>
> *Issue 2:*
>
> When c1 is subsequently terminated manually, a new instance of c1
> is started up (as opposed to Issue 1), which I think is incorrect since it
> defines a startup dependency (c1 depends on G2) which is not fulfilled at
> the time (G2 should not be active since c2 is still terminated; see log
> wso2carbon-issue2.log, same log as wso2carbon.log but at a later time)
>
>
>
> WDYT ?
>
>
>
> Please find attached artifacts and logs
>
>
>
> Thanks
>
>
>
> Martin
>
>
>
>
> --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>
>
>
>
> --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>
>
>
>
> --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>
>
>
>
> --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>
>
>
>
> --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>
>
>
>
> --
>
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
>
> Mobile: +94776442007
>
>
>
>
>
>
>
> --
>
> Imesh Gunaratne
>
>
>
> Senior Technical Lead, WSO2
>
> Committer & PMC Member, Apache Stratos
>
>
>
>
>
> --
>
> Imesh Gunaratne
>
>
>
> Senior Technical Lead, WSO2
>
> Committer & PMC Member, Apache Stratos
>



-- 
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007

RE: Testing Stratos 4.1 : nested grouping scenario with startup and termination issues (?)

Posted by "Martin Eppel (meppel)" <me...@cisco.com>.
Hi Imesh, Reka

As requested, please see attached artifacts and logs (with debug enabled) to test for the deadlock – stratos is running the latest from master, latest commit:
commit ae89ba09491891512a9bc89e080577c565ebe8b7
Author: reka <rt...@gmail.com>
Date:   Fri May 1 12:30:55 2015 +0530

I ran 2 similar but slightly different scenarios, see [1.], [2.]

Java startup with lock monitor enabled:

/opt/java/bin/java -Xbootclasspath/a: -Xms256m -Xmx2048m -XX:MaxPermSize=256m -server -XX:+HeapDumpOnOutOfMemoryE
rror -XX:HeapDumpPath=/opt/wso2/apache-stratos/repository/logs/heap-dump.hprof -Dcom.sun.management.jmxremote -classpath /opt/java/lib/tools.jar:/opt/wso2/apache-stratos/bin/org.wso2.carbon.bootstrap-4.2.0.jar:/opt/wso2/apache-stratos/bin/tcpmon-1.0.jar:/opt/wso2/apache-stratos/bin/tomcat-juli-7.0.34.jar:/opt/wso2/apache
-stratos/lib/commons-lang-2.6.0.wso2v1.jar -Djava.endorsed.dirs=/opt/wso2/apache-stratos/lib/endorsed:/opt/java/jre/lib/endorsed:/opt/java/lib/endorsed -Djava.io
.tmpdir=/opt/wso2/apache-stratos/tmp -Dcatalina.base=/opt/wso2/apache-stratos/lib/tomcat -Dwso2.server.standalone=true -Dcarbon.registry.root=/ -Djava.command=/o
pt/java/bin/java -Dcarbon.home=/opt/wso2/apache-stratos -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Dcarbon.config.dir.path=/opt/wso2/apac
he-stratos/repository/conf -Djava.util.logging.config.file=/opt/wso2/apache-stratos/repository/conf/etc/logging-bridge.properties -Dcomponents.repo=/opt/wso2/apache-stratos/repository/components/plugins -Dconf.location=/opt/wso2/apache-stratos/repository/conf -Dcom.atomikos.icatch.file=/opt/wso2/apache-stratos/lib/transactions.properties -Dcom.atomikos.icatch.hide_init_file_path=true -Dorg.apache.jasper.runtime.BodyContentImpl.LIMIT_BUFFER=true -Dcom.sun.jndi.ldap.connect.pool.a
uthentication=simple -Dcom.sun.jndi.ldap.connect.pool.timeout=3000 -Dorg.terracotta.quartz.skipUpdateCheck=true -Djava.security.egd=file:/dev/./urandom -Dfile.encoding=UTF8 -Ddisable.cassandra.server.startup=true -Djndi.properties.dir=/opt/wso2/apache-stratos/repository/conf -Dthrift.client.config.file.path=/opt/wso2/apache-stratos/repository/conf/thrift-client-config.xml -DMETADATA_CLIENT_CONFIG_FILE=/opt/wso2/apache-stratos/repository/conf/metadataservice.xml -Dread.write.lock.monitor.enabled=true org.wso2.carbon.bootstrap.Bootstrap -Dprofile=default


[1.] scenario_c1_c2_cartridges

[screenshot: application status for scenario_c1_c2_cartridges]


[1b.] exception

org.apache.stratos.common.exception.LockNotReleasedException
        at org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor.checkTimeout(ReadWriteLockMonitor.java:72)
        at org.apache.stratos.common.concurrent.locks.ReadWriteLockMonitor.run(ReadWriteLockMonitor.java:55)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)


[2.] scenario_c1_c2_c3_c4_cartridges

[screenshot: application status for scenario_c1_c2_c3_c4_cartridges]




From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: Thursday, April 30, 2015 10:10 PM
To: dev
Subject: Re: Testing Stratos 4.1 : nested grouping scenario with startup and termination issues (?)

In addition we have not added a try catch block in MonitorAdder.run() method to cover its full scope. Therefore if an exception is raised in the middle the above problem also can cause.

I have now fixed this in commit revision:
9ec061f44a3189ccd8b509ef4da980687dfbcf62

Martin: Appreciate if you could take this fix and retest.

Thanks

On Fri, May 1, 2015 at 10:32 AM, Imesh Gunaratne <im...@apache.org>> wrote:
Hi Reka,

It looks like the MonitorAdder.run() has executed properly, that's why we see the following log:

TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -  Starting monitor: [type] cluster [component] sub-G1-G2-G3-1-G4.c4-1x1.c4.domain

However the thread has not come to its last line:

log.info(String.format("Monitor started successfully: [type] %s [component] %s [dependents] %s " +
                "[startup-time] %d seconds", monitorTypeStr, context.getId(),

As we discussed offline, this may have been caused by a deadlock while trying to get the following topology lock:


public static ClusterMonitor getClusterMonitor(ParentComponentMonitor parentMonitor,
                                               ClusterChildContext context,
                                               List<String> parentInstanceIds)
    ...
    //acquire read lock for the service and cluster
    TopologyManager.acquireReadLockForCluster(serviceName, clusterId);

Martin: Will you be able to do another test run with the deadlock detection logic enabled? You could set the following system property to true in the stratos.sh file to do this:

read.write.lock.monitor.enabled=true
Thanks


On Fri, May 1, 2015 at 7:40 AM, Reka Thirunavukkarasu <re...@wso2.com>> wrote:
Hi Martin,

Thanks Martin for the detailed information in order to analyze the issue. It helped to isolate the issue.
As I went through the logs, it seems to be some thread issue. I could see the below log for c4-1x1 and c3-1x1. In that case c3 and c4 both got scheduled to start their relevant ClusterMonitors. After that, only c3 got its ClusterMonitor started successfully, not c4. So the scheduler of c4 didn't actually start a thread for the MonitorAdder to create the ClusterMonitor.

TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -  Starting dependent monitor: [application] sub-G1-G2-G3-1-G4 [component] sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -  Monitor scheduled: [type] cluster [component] sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -  Starting monitor: [type] cluster [component] sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
TID: [0] [STRATOS] [2015-04-30 16:48:57,713]  INFO {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -  Starting dependent monitor: [application] sub-G1-G2-G3-1-G4 [component] sub-G1-G2-G3-1-G4.c3-1x1.c3.domain
TID: [0] [STRATOS] [2015-04-30 16:48:57,713]  INFO {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -  Monitor scheduled: [type] cluster [component] sub-G1-G2-G3-1-G4.c3-1x1.c3.domain
TID: [0] [STRATOS] [2015-04-30 16:48:57,713]  INFO {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -  Starting monitor: [type] cluster [component] sub-G1-G2-G3-1-G4.c3-1x1.c3.domain

Found below log for c3 which indicates that c3 monitor got started successfully. But there is no such log for c4.
TID: [0] [STRATOS] [2015-04-30 16:49:00,760]  INFO {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -  Monitor started successfully: [type] cluster [component] sub-G1-G2-G3-1-G4.c3-1x1.c3.domain [dependents] none [startup-time] 3 seconds
@Gayan/Imesh, Do you have any input here? Will increasing the threadpool solve this issue? Or is it related to something else?
Thanks,
Reka



On Thu, Apr 30, 2015 at 10:54 PM, Martin Eppel (meppel) <me...@cisco.com>> wrote:
Hi Reka,

Re-run the scenario, making sure the application alias and group alias are as suggested and debug logs are turned on (see config below)

log4j.logger.org.apache.stratos.manager=DEBUG
log4j.logger.org.apache.stratos.autoscaler=DEBUG
log4j.logger.org.apache.stratos.messaging=INFO
log4j.logger.org.apache.stratos.cloud.controller=DEBUG
log4j.logger.org.wso2.andes.client=ERROR

This is the scenario:


1.      deployed application – see screenshot A. , debug logs wso2carbon-debug.log
only 3 instances spin up

2.      removed application

3.      re-deployed application – see screenshot B. , debug logs wso2carbon-debug-2.log
(after line “TID: [0] [STRATOS] [2015-04-30 17:05:23,837] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Read lock released”
2nd time the application gets deployed all instances spin up and go active


Please see attached artifacts and logs.


A.     Application Status after deploying the application first time after stratos start up:

[screenshot: application status after first deployment]




B.     Application Status after re-deploying the application
(see log wso2carbon-debug-2.log after “TID: [0] [STRATOS] [2015-04-30 17:05:23,837] DEBUG {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Read lock released”:

[screenshot: application status after re-deployment]








From: Reka Thirunavukkarasu [mailto:reka@wso2.com<ma...@wso2.com>]
Sent: Thursday, April 30, 2015 1:40 AM

To: dev
Subject: Re: Testing Stratos 4.1 : nested grouping scenario with startup and termination issues (?)

If you get this issue continuously, can you please share the logs against master as we have improved some logs in the master yesterday?
Thanks,
Reka

On Thu, Apr 30, 2015 at 2:08 PM, Reka Thirunavukkarasu <re...@wso2.com>> wrote:
Hi Martin,
I have deployed the attached samples as earlier in openstack with latest master. All the clusters got created with the members. Please see the attached diagram. I'm unable to proceed further as my puppet configuration has to be corrected to make the member active. Thought of sharing this as all the clusters have members.
Thanks,
Reka

On Thu, Apr 30, 2015 at 10:25 AM, Reka Thirunavukkarasu <re...@wso2.com>> wrote:
Hi Martin,
Can you please confirm whether you are using a unique applicationId and group alias? I can see from the UI that the applicationId and the next group alias have the same value, sub-G1-G2-G3-1.
Thanks,
Reka


On Thu, Apr 30, 2015 at 10:16 AM, Martin Eppel (meppel) <me...@cisco.com>> wrote:
Hi Reka,

I have upgraded from beta to the latest stratos code on master and retested the scenario from jira STRATOS-1345 but still see the same issue (on open stack)

Thanks

Martin

[screenshot: application status showing the same issue]

From: Martin Eppel (meppel)
Sent: Wednesday, April 29, 2015 2:54 PM
To: dev@stratos.apache.org<ma...@stratos.apache.org>
Subject: RE: Testing Stratos 4.1 : nested grouping scenario with startup and termination issues (?)

Hi Reka,

I will upgrade my system to the latest master and re-test,

Regards

Martin

From: Reka Thirunavukkarasu [mailto:reka@wso2.com]
Sent: Wednesday, April 29, 2015 11:55 AM
To: dev
Subject: Re: Testing Stratos 4.1 : nested grouping scenario with startup and termination issues (?)

Hi Martin,
While I was working on application update, I fixed a few issues with the termination behavior. However, there still seem to be small issues in the logic which have to be fixed. I have started to verify this in my local setup. Can you create a jira so that we can track it? I will update the progress in the jira.
Thanks,
Reka

On Tue, Apr 28, 2015 at 10:11 PM, Martin Eppel (meppel) <me...@cisco.com>> wrote:
Hi Reka,

Thanks for following up - let me know if I should open a JIRA,

Thanks

Martin

From: Reka Thirunavukkarasu [mailto:reka@wso2.com<ma...@wso2.com>]
Sent: Tuesday, April 28, 2015 5:37 AM
To: dev
Subject: Re: Testing Stratos 4.1 : nested grouping scenario with startup and termination issues (?)

Hi Martin,
Thanks for bringing this up. I have fixed some issues in the flow while testing application update support with instance counts. I will go through your scenarios to reproduce it and update the thread with the progress.
Thanks,
Reka

On Tue, Apr 28, 2015 at 7:08 AM, Martin Eppel (meppel) <me...@cisco.com>> wrote:
I am testing a (nested grouping) scenario where a group defines a termination behavior “terminate-all”. When terminating the instance (of cartridge type c3), no new instance is restarted.
My understanding is that a new instance should be started up.

The scenario looks like this:

Group ~G1 has a cartridge member c1 and group member ~G2
Group ~G2 has a cartridge member c2 and group member ~G3
Group ~G3 has a cartridge member c3

Startup dependencies are: c1 depends on G2, c2 depends on G3

~G1 defines termination: none
~G2 defines termination: dependents
~G3 defines termination: all

After startup, when all instances are active, instance c3 is terminated, which correctly also terminates instance c2 (since it depends on G3 / c3).
Issue 1:
However, no new instance for c3 is started up (and consequently no new instance for c2 should be started up either) (see log wso2carbon.log)

The only instance which remains running is c1.
Issue 2:
When c1 is subsequently terminated manually, a new instance of c1 is started up (as opposed to Issue 1), which I think is incorrect since it defines a startup dependency (c1 depends on G2) which is not fulfilled at the time (G2 should not be active since c2 is still terminated; see log wso2carbon-issue2.log, same log as wso2carbon.log but at a later time)

WDYT ?

Please find attached artifacts and logs

Thanks

Martin



--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007<tel:%2B94776442007>




--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007<tel:%2B94776442007>




--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007<tel:%2B94776442007>




--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007<tel:%2B94776442007>




--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007<tel:%2B94776442007>




--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007<tel:%2B94776442007>




--
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos



--
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos

Re: Testing Stratos 4.1 : nested grouping scenario with startup and termination issues (?)

Posted by Imesh Gunaratne <im...@apache.org>.
In addition, we had not added a try catch block in the MonitorAdder.run()
method to cover its full scope. Therefore, if an exception is raised in the
middle, it can also cause the above problem.
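
Roughly, the guard is to wrap the whole body of run() so that a failure can
never kill the worker thread silently. A sketch of the idea only (field names
below are illustrative, not the exact code in the commit):

    // Wrap the full MonitorAdder.run() body: any exception is logged, so a
    // cluster monitor that fails to start always leaves a trace in the log.
    public void run() {
        try {
            // existing logic: build the monitor via MonitorFactory.getMonitor(...),
            // register it and emit the "Monitor started successfully" log
        } catch (Exception e) {
            log.error(String.format("An error occurred while starting monitor: " +
                    "[type] %s [component] %s", monitorTypeStr, context.getId()), e);
        }
    }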

I have now fixed this in commit revision:
9ec061f44a3189ccd8b509ef4da980687dfbcf62

Martin: Appreciate if you could take this fix and retest.

Thanks

On Fri, May 1, 2015 at 10:32 AM, Imesh Gunaratne <im...@apache.org> wrote:

> Hi Reka,
>
> It looks like the MonitorAdder.run() has executed properly, that's why we
> see the following log:
>
> TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO {org.apache.stratos.
> autoscaler.monitor.component.ParentComponentMonitor} -  Starting monitor:
> [type] cluster [component] sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
>
> However the thread has not come to its last line:
>
> log.info(String.format("Monitor started successfully: [type] %s [component] %s [dependents] %s " +
>                 "[startup-time] %d seconds", monitorTypeStr, context.getId(),
>
>
> As we discussed offline this may have caused by a deadlock while trying to
> get the following topology lock:
>
> public static ClusterMonitor getClusterMonitor(ParentComponentMonitor parentMonitor,
>                                                ClusterChildContext context,
>                                                List<String> parentInstanceIds)
>     ...
>     //acquire read lock for the service and cluster
>     TopologyManager.acquireReadLockForCluster(serviceName, clusterId);
>
>
> Martin: Will you be able to do another test run by enabling deadlock
> detection logic. You could set the following system property to true in the
> stratos.sh file to do this:
>
> read.write.lock.monitor.enabled=true
>
> Thanks
>
>
> On Fri, May 1, 2015 at 7:40 AM, Reka Thirunavukkarasu <re...@wso2.com>
> wrote:
>
>> Hi Martin,
>>
>> Thanks Martin for the detailed information in order to analyze the issue.
>> It helped to isolate the issue.
>>
>> As i went through the logs, it seems that some thread issue. I could see
>> below log for c4-1x1 and c3-1x1. In that case c3 and c4 got scheduled to be
>> start a relevant clusterMonitor. After that only c3 got successfully
>> started with ClusterMonitor not c4. So the scheduler of c4 didn't actually
>> start a thread for the MonitorAdder to create the ClusterMonitor.
>>
>> TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO
>> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
>> Starting dependent monitor: [application] sub-G1-G2-G3-1-G4 [component]
>> sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
>> TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO
>> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
>> Monitor scheduled: [type] cluster [component] sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
>>
>> TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO
>> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
>> Starting monitor: [type] cluster [component]
>> sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
>> TID: [0] [STRATOS] [2015-04-30 16:48:57,713]  INFO
>> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
>> Starting dependent monitor: [application] sub-G1-G2-G3-1-G4 [component]
>> sub-G1-G2-G3-1-G4.c3-1x1.c3.domain
>> TID: [0] [STRATOS] [2015-04-30 16:48:57,713]  INFO
>> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
>> Monitor scheduled: [type] cluster [component] sub-G1-G2-G3-1-G4.c3-1x1.c3.domain
>>
>> TID: [0] [STRATOS] [2015-04-30 16:48:57,713]  INFO
>> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
>> Starting monitor: [type] cluster [component]
>> sub-G1-G2-G3-1-G4.c3-1x1.c3.domain
>>
>> Found below log for c3 which indicates that c3 monitor got started
>> successfully. But there is no such log for c4.
>>
>> TID: [0] [STRATOS] [2015-04-30 16:49:00,760]  INFO
>> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
>> Monitor started successfully: [type] cluster [component]
>> sub-G1-G2-G3-1-G4.c3-1x1.c3.domain [dependents] none [startup-time] 3
>> seconds
>>
>> @Gayan/Imesh, Do you have any input here? Will increasing the threadpool
>> solve this issue? Or is it related to something else?
>>
>> Thanks,
>> Reka
>>
>>
>>
>> On Thu, Apr 30, 2015 at 10:54 PM, Martin Eppel (meppel) <meppel@cisco.com
>> > wrote:
>>
>>>  Hi Reka,
>>>
>>>
>>>
>>> Re-run the scenario, making sure the application alias and group alias
>>> are as suggested and debug logs are turned on (see config below)
>>>
>>>
>>>
>>> log4j.logger.org.apache.stratos.manager=DEBUG
>>>
>>> log4j.logger.org.apache.stratos.autoscaler=DEBUG
>>>
>>> log4j.logger.org.apache.stratos.messaging=INFO
>>>
>>> log4j.logger.org.apache.stratos.cloud.controller=DEBUG
>>>
>>> log4j.logger.org.wso2.andes.client=ERROR
>>>
>>>
>>>
>>> This is the scenario:
>>>
>>>
>>>
>>> 1.      deployed application – see screenshot A. , debug logs
>>> wso2carbon-debug.log
>>> only 3 instances spin up
>>>
>>> 2.      removed application
>>>
>>> 3.      re-deployed application – see screenshot B. , debug logs
>>> wso2carbon-debug-2.log
>>> (after line “TID: [0] [STRATOS] [2015-04-30 17:05:23,837] DEBUG
>>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Read lock
>>> released”
>>> 2nd time the application gets deployed all instances spin up and go
>>> active
>>>
>>>
>>>
>>>
>>>
>>> Please see attached artifacts and logs.
>>>
>>>
>>>
>>> A.     Application Status after deploying the application first time
>>> after stratos start up:
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> B.     Application Status after re-deploying the application
>>>
>>> (see log wso2carbon-debug-2.log after “TID: [0] [STRATOS] [2015-04-30
>>> 17:05:23,837] DEBUG
>>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Read lock
>>> released”:
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> *From:* Reka Thirunavukkarasu [mailto:reka@wso2.com]
>>> *Sent:* Thursday, April 30, 2015 1:40 AM
>>>
>>> *To:* dev
>>> *Subject:* Re: Testing Stratos 4.1 : nested grouping scenario with
>>> startup and termination issues (?)
>>>
>>>
>>>
>>> If you get this issue continuously, can you please share the logs
>>> against master as we have improved some logs in the master yesterday?
>>>
>>> Thanks,
>>>
>>> Reka
>>>
>>>
>>>
>>> On Thu, Apr 30, 2015 at 2:08 PM, Reka Thirunavukkarasu <re...@wso2.com>
>>> wrote:
>>>
>>> Hi Martin,
>>>
>>> I have deployed the attached samples as earlier in openstack with latest
>>> master. All the clusters got created with the members. Please see the
>>> attached diagram. I'm unable to proceed further as my puppet configuration
>>> has to be corrected to make the member active. Thought of sharing this as
>>> all the clusters have members.
>>>
>>> Thanks,
>>>
>>> Reka
>>>
>>>
>>>
>>> On Thu, Apr 30, 2015 at 10:25 AM, Reka Thirunavukkarasu <re...@wso2.com>
>>> wrote:
>>>
>>> HI Martin,
>>>
>>> Can you please confirm whether you are using unique applicationId and
>>> group alias? I can see from the UI, the applicationID and next group alias
>>> are same value as sub-G1-G2-G3-1..
>>>
>>> Thanks,
>>>
>>> Reka
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Apr 30, 2015 at 10:16 AM, Martin Eppel (meppel) <
>>> meppel@cisco.com> wrote:
>>>
>>> Hi Reka,
>>>
>>>
>>>
>>> I have upgraded from beta to the latest stratos code on master and
>>> retested the scenario from jira STRATOS-1345 but still see the same issue
>>> (on open stack)
>>>
>>>
>>>
>>> Thanks
>>>
>>>
>>>
>>> Martin
>>>
>>>
>>>
>>>
>>>
>>> *From:* Martin Eppel (meppel)
>>> *Sent:* Wednesday, April 29, 2015 2:54 PM
>>> *To:* dev@stratos.apache.org
>>> *Subject:* RE: Testing Stratos 4.1 : nested grouping scenario with
>>> startup and termination issues (?)
>>>
>>>
>>>
>>> Hi Reka,
>>>
>>>
>>>
>>> I will upgrade my system to the latest master and re-test,
>>>
>>>
>>>
>>> Regards
>>>
>>>
>>>
>>> Martin
>>>
>>>
>>>
>>> *From:* Reka Thirunavukkarasu [mailto:reka@wso2.com <re...@wso2.com>]
>>> *Sent:* Wednesday, April 29, 2015 11:55 AM
>>> *To:* dev
>>> *Subject:* Re: Testing Stratos 4.1 : nested grouping scenario with
>>> startup and termination issues (?)
>>>
>>>
>>>
>>> Hi Martin,
>>>
>>> While i was working on Application update, i fixed few issues with the
>>> termination behavior. Anyway there seems to be small issues in the logic
>>> which has to be fixed. I have started to verify this in my local setup. Can
>>> you create a jira? So that we can track it. I will update the progress in
>>> the jira..
>>>
>>> Thanks,
>>>
>>> Reka
>>>
>>>
>>>
>>> On Tue, Apr 28, 2015 at 10:11 PM, Martin Eppel (meppel) <
>>> meppel@cisco.com> wrote:
>>>
>>> Hi Reka,
>>>
>>>
>>>
>>> Thanks for following up - let me know if I should open a JIRA,
>>>
>>>
>>>
>>> Thanks
>>>
>>>
>>>
>>> Martin
>>>
>>>
>>>
>>> *From:* Reka Thirunavukkarasu [mailto:reka@wso2.com]
>>> *Sent:* Tuesday, April 28, 2015 5:37 AM
>>> *To:* dev
>>> *Subject:* Re: Testing Stratos 4.1 : nested grouping scenario with
>>> startup and termination issues (?)
>>>
>>>
>>>
>>> Hi Martin,
>>>
>>> Thanks for bringing this up. I have fixed some issue in the flow while
>>> testing application update support with instances count. I will go through
>>> your scenarios to reproduce it and update the thread with the progress..
>>>
>>> Thanks,
>>>
>>> Reka
>>>
>>>
>>>
>>> On Tue, Apr 28, 2015 at 7:08 AM, Martin Eppel (meppel) <me...@cisco.com>
>>> wrote:
>>>
>>> I am testing a (nested grouping) scenario where a group defines a
>>> termination behavior “terminate-all”. When terminating the instance (of
>>> cartridge type c3), no new instance is restarted.
>>>
>>> My understanding is that a new instance should be started up.
>>>
>>>
>>>
>>> The scenario looks like this:
>>>
>>>
>>>
>>> Group ~G1 has a cartridge member c1 and group member ~G2
>>>
>>> Group ~G2 has a cartridge member c2 and group member ~G3
>>>
>>> Group ~G3 has a cartridge member c3
>>>
>>>
>>>
>>> Startup dependencies are: c1 depends on G2, c2 depends on G3
>>>
>>>
>>>
>>> ~G1 defines termination: none
>>>
>>> ~G2 defines termination: dependents
>>>
>>> ~G3 defines termination: all
>>>
>>>
>>>
>>> After startup, when all instances are active, instance c3 is terminated
>>> which correctly also terminates also instance c2 (since it depends on G3 /
>>> c3) .
>>>
>>> *Issue 1:*
>>>
>>> However, no new instances for c3 is started up (consequently no new
>>> instance for c2 should be started up as well) (see log see log
>>> wso2carbon.log)
>>>
>>>
>>>
>>> Only instance which remains running is c1.
>>>
>>> *Issue 2:*
>>>
>>> When subsequently c1 is manually being terminated, a new instance of c1
>>> is started up (as opposed to Issue1) which I think is incorrect since it
>>> defines a startup dependency (c1 depends on G2) which is not fulfilled at
>>> the time (G2 should not be active since c2 is still terminated, see log
>>> wso2carbon-issue2.log, same log as wso2carbon.log but at a later time)
>>>
>>>
>>>
>>> WDYT ?
>>>
>>>
>>>
>>> Please find attached artifacts and logs
>>>
>>>
>>>
>>> Thanks
>>>
>>>
>>>
>>> Martin
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Reka Thirunavukkarasu
>>> Senior Software Engineer,
>>> WSO2, Inc.:http://wso2.com,
>>>
>>> Mobile: +94776442007
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Reka Thirunavukkarasu
>>> Senior Software Engineer,
>>> WSO2, Inc.:http://wso2.com,
>>>
>>> Mobile: +94776442007
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Reka Thirunavukkarasu
>>> Senior Software Engineer,
>>> WSO2, Inc.:http://wso2.com,
>>>
>>> Mobile: +94776442007
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Reka Thirunavukkarasu
>>> Senior Software Engineer,
>>> WSO2, Inc.:http://wso2.com,
>>>
>>> Mobile: +94776442007
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Reka Thirunavukkarasu
>>> Senior Software Engineer,
>>> WSO2, Inc.:http://wso2.com,
>>>
>>> Mobile: +94776442007
>>>
>>>
>>>
>>
>>
>>
>> --
>> Reka Thirunavukkarasu
>> Senior Software Engineer,
>> WSO2, Inc.:http://wso2.com,
>> Mobile: +94776442007
>>
>>
>>
>
>
> --
> Imesh Gunaratne
>
> Senior Technical Lead, WSO2
> Committer & PMC Member, Apache Stratos
>



-- 
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos

Re: Testing Stratos 4.1 : nested grouping scenario with startup and termination issues (?)

Posted by Imesh Gunaratne <im...@apache.org>.
Hi Reka,

It looks like the MonitorAdder.run() method has been invoked; that's why we
see the following log:

TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO {org.apache.stratos.
autoscaler.monitor.component.ParentComponentMonitor} -  Starting monitor:
[type] cluster [component] sub-G1-G2-G3-1-G4.c4-1x1.c4.domain

However, the thread has not reached its last line:

log.info(String.format("Monitor started successfully: [type] %s [component] %s [dependents] %s " +
                "[startup-time] %d seconds", monitorTypeStr, context.getId(),


As we discussed offline, this may have been caused by a deadlock while trying to
acquire the following topology lock:

public static ClusterMonitor getClusterMonitor(ParentComponentMonitor parentMonitor,
                                               ClusterChildContext context,
                                               List<String> parentInstanceIds)
    ...
    //acquire read lock for the service and cluster
    TopologyManager.acquireReadLockForCluster(serviceName, clusterId);
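
If the thread is indeed blocked around that lock, the usual defensive pattern is to pair the acquire with a release in a finally block, so that a failure in between cannot leave the lock held and stall other monitor threads. A sketch only (assuming the matching releaseReadLockForCluster() call), not the actual Stratos code:

TopologyManager.acquireReadLockForCluster(serviceName, clusterId);
try {
    // read the cluster from the topology and build the ClusterMonitor here
} finally {
    // always release, even if monitor creation throws
    TopologyManager.releaseReadLockForCluster(serviceName, clusterId);
}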


Martin: Will you be able to do another test run with the deadlock
detection logic enabled? You can set the following system property to true in the
stratos.sh file to do this:

read.write.lock.monitor.enabled=true
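
For example, assuming stratos.sh builds the java command line from a list of -D system properties (as other Carbon start scripts do), the flag could be added alongside the existing properties:

    -Dread.write.lock.monitor.enabled=true \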

Thanks


On Fri, May 1, 2015 at 7:40 AM, Reka Thirunavukkarasu <re...@wso2.com> wrote:

> Hi Martin,
>
> Thanks Martin for the detailed information in order to analyze the issue.
> It helped to isolate the issue.
>
> As i went through the logs, it seems that some thread issue. I could see
> below log for c4-1x1 and c3-1x1. In that case c3 and c4 got scheduled to be
> start a relevant clusterMonitor. After that only c3 got successfully
> started with ClusterMonitor not c4. So the scheduler of c4 didn't actually
> start a thread for the MonitorAdder to create the ClusterMonitor.
>
> TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO
> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
> Starting dependent monitor: [application] sub-G1-G2-G3-1-G4 [component]
> sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
> TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO
> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
> Monitor scheduled: [type] cluster [component] sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
>
> TID: [0] [STRATOS] [2015-04-30 16:48:57,712]  INFO
> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
> Starting monitor: [type] cluster [component]
> sub-G1-G2-G3-1-G4.c4-1x1.c4.domain
> TID: [0] [STRATOS] [2015-04-30 16:48:57,713]  INFO
> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
> Starting dependent monitor: [application] sub-G1-G2-G3-1-G4 [component]
> sub-G1-G2-G3-1-G4.c3-1x1.c3.domain
> TID: [0] [STRATOS] [2015-04-30 16:48:57,713]  INFO
> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
> Monitor scheduled: [type] cluster [component] sub-G1-G2-G3-1-G4.c3-1x1.c3.domain
>
> TID: [0] [STRATOS] [2015-04-30 16:48:57,713]  INFO
> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
> Starting monitor: [type] cluster [component]
> sub-G1-G2-G3-1-G4.c3-1x1.c3.domain
>
> Found below log for c3 which indicates that c3 monitor got started
> successfully. But there is no such log for c4.
>
> TID: [0] [STRATOS] [2015-04-30 16:49:00,760]  INFO
> {org.apache.stratos.autoscaler.monitor.component.ParentComponentMonitor} -
> Monitor started successfully: [type] cluster [component]
> sub-G1-G2-G3-1-G4.c3-1x1.c3.domain [dependents] none [startup-time] 3
> seconds
>
> @Gayan/Imesh, Do you have any input here? Will increasing the threadpool
> solve this issue? Or is it related to something else?
>
> Thanks,
> Reka
>
>
>
> On Thu, Apr 30, 2015 at 10:54 PM, Martin Eppel (meppel) <me...@cisco.com>
> wrote:
>
>>  Hi Reka,
>>
>>
>>
>> Re-run the scenario, making sure the application alias and group alias
>> are as suggested and debug logs are turned on (see config below)
>>
>>
>>
>> log4j.logger.org.apache.stratos.manager=DEBUG
>>
>> log4j.logger.org.apache.stratos.autoscaler=DEBUG
>>
>> log4j.logger.org.apache.stratos.messaging=INFO
>>
>> log4j.logger.org.apache.stratos.cloud.controller=DEBUG
>>
>> log4j.logger.org.wso2.andes.client=ERROR
>>
>>
>>
>> This is the scenario:
>>
>>
>>
>> 1.      deployed application – see screenshot A. , debug logs
>> wso2carbon-debug.log
>> only 3 instances spin up
>>
>> 2.      removed application
>>
>> 3.      re-deployed application – see screenshot B. , debug logs
>> wso2carbon-debug-2.log
>> (after line “TID: [0] [STRATOS] [2015-04-30 17:05:23,837] DEBUG
>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Read lock
>> released”
>> 2nd time the application gets deployed all instances spin up and go
>> active
>>
>>
>>
>>
>>
>> Please see attached artifacts and logs.
>>
>>
>>
>> A.     Application Status after deploying the application first time
>> after stratos start up:
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> B.     Application Status after re-deploying the application
>>
>> (see log wso2carbon-debug-2.log after “TID: [0] [STRATOS] [2015-04-30
>> 17:05:23,837] DEBUG
>> {org.apache.stratos.autoscaler.applications.ApplicationHolder} -  Read lock
>> released”:
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> *From:* Reka Thirunavukkarasu [mailto:reka@wso2.com]
>> *Sent:* Thursday, April 30, 2015 1:40 AM
>>
>> *To:* dev
>> *Subject:* Re: Testing Stratos 4.1 : nested grouping scenario with
>> startup and termination issues (?)
>>
>>
>>
>> If you get this issue continuously, can you please share the logs against
>> master as we have improved some logs in the master yesterday?
>>
>> Thanks,
>>
>> Reka
>>
>>
>>
>> On Thu, Apr 30, 2015 at 2:08 PM, Reka Thirunavukkarasu <re...@wso2.com>
>> wrote:
>>
>> Hi Martin,
>>
>> I have deployed the attached samples as earlier in openstack with latest
>> master. All the clusters got created with the members. Please see the
>> attached diagram. I'm unable to proceed further as my puppet configuration
>> has to be corrected to make the member active. Thought of sharing this as
>> all the clusters have members.
>>
>> Thanks,
>>
>> Reka
>>
>>
>>
>> On Thu, Apr 30, 2015 at 10:25 AM, Reka Thirunavukkarasu <re...@wso2.com>
>> wrote:
>>
>> HI Martin,
>>
>> Can you please confirm whether you are using unique applicationId and
>> group alias? I can see from the UI, the applicationID and next group alias
>> are same value as sub-G1-G2-G3-1..
>>
>> Thanks,
>>
>> Reka
>>
>>
>>
>>
>>
>> On Thu, Apr 30, 2015 at 10:16 AM, Martin Eppel (meppel) <me...@cisco.com>
>> wrote:
>>
>> Hi Reka,
>>
>>
>>
>> I have upgraded from beta to the latest stratos code on master and
>> retested the scenario from jira STRATOS-1345 but still see the same issue
>> (on open stack)
>>
>>
>>
>> Thanks
>>
>>
>>
>> Martin
>>
>>
>>
>>
>>
>> *From:* Martin Eppel (meppel)
>> *Sent:* Wednesday, April 29, 2015 2:54 PM
>> *To:* dev@stratos.apache.org
>> *Subject:* RE: Testing Stratos 4.1 : nested grouping scenario with
>> startup and termination issues (?)
>>
>>
>>
>> Hi Reka,
>>
>>
>>
>> I will upgrade my system to the latest master and re-test,
>>
>>
>>
>> Regards
>>
>>
>>
>> Martin
>>
>>
>>
>> *From:* Reka Thirunavukkarasu [mailto:reka@wso2.com <re...@wso2.com>]
>> *Sent:* Wednesday, April 29, 2015 11:55 AM
>> *To:* dev
>> *Subject:* Re: Testing Stratos 4.1 : nested grouping scenario with
>> startup and termination issues (?)
>>
>>
>>
>> Hi Martin,
>>
>> While i was working on Application update, i fixed few issues with the
>> termination behavior. Anyway there seems to be small issues in the logic
>> which has to be fixed. I have started to verify this in my local setup. Can
>> you create a jira? So that we can track it. I will update the progress in
>> the jira..
>>
>> Thanks,
>>
>> Reka
>>
>>
>>
>> On Tue, Apr 28, 2015 at 10:11 PM, Martin Eppel (meppel) <me...@cisco.com>
>> wrote:
>>
>> Hi Reka,
>>
>>
>>
>> Thanks for following up - let me know if I should open a JIRA,
>>
>>
>>
>> Thanks
>>
>>
>>
>> Martin
>>
>>
>>
>> *From:* Reka Thirunavukkarasu [mailto:reka@wso2.com]
>> *Sent:* Tuesday, April 28, 2015 5:37 AM
>> *To:* dev
>> *Subject:* Re: Testing Stratos 4.1 : nested grouping scenario with
>> startup and termination issues (?)
>>
>>
>>
>> Hi Martin,
>>
>> Thanks for bringing this up. I have fixed some issue in the flow while
>> testing application update support with instances count. I will go through
>> your scenarios to reproduce it and update the thread with the progress..
>>
>> Thanks,
>>
>> Reka
>>
>>
>>
>> On Tue, Apr 28, 2015 at 7:08 AM, Martin Eppel (meppel) <me...@cisco.com>
>> wrote:
>>
>> I am testing a (nested grouping) scenario where a group defines a
>> termination behavior “terminate-all”. When terminating the instance (of
>> cartridge type c3), no new instance is restarted.
>>
>> My understanding is that a new instance should be started up.
>>
>>
>>
>> The scenario looks like this:
>>
>>
>>
>> Group ~G1 has a cartridge member c1 and group member ~G2
>>
>> Group ~G2 has a cartridge member c2 and group member ~G3
>>
>> Group ~G3 has a cartridge member c3
>>
>>
>>
>> Startup dependencies are: c1 depends on G2, c2 depends on G3
>>
>>
>>
>> ~G1 defines termination: none
>>
>> ~G2 defines termination: dependents
>>
>> ~G3 defines termination: all
>>
>>
>>
>> After startup, when all instances are active, instance c3 is terminated
>> which correctly also terminates also instance c2 (since it depends on G3 /
>> c3) .
>>
>> *Issue 1:*
>>
>> However, no new instances for c3 is started up (consequently no new
>> instance for c2 should be started up as well) (see log see log
>> wso2carbon.log)
>>
>>
>>
>> Only instance which remains running is c1.
>>
>> *Issue 2:*
>>
>> When subsequently c1 is manually being terminated, a new instance of c1
>> is started up (as opposed to Issue1) which I think is incorrect since it
>> defines a startup dependency (c1 depends on G2) which is not fulfilled at
>> the time (G2 should not be active since c2 is still terminated, see log
>> wso2carbon-issue2.log, same log as wso2carbon.log but at a later time)
>>
>>
>>
>> WDYT ?
>>
>>
>>
>> Please find attached artifacts and logs
>>
>>
>>
>> Thanks
>>
>>
>>
>> Martin
>>
>>
>>
>>
>> --
>>
>> Reka Thirunavukkarasu
>> Senior Software Engineer,
>> WSO2, Inc.:http://wso2.com,
>>
>> Mobile: +94776442007
>>
>>
>>
>>
>>
>>
>> --
>>
>> Reka Thirunavukkarasu
>> Senior Software Engineer,
>> WSO2, Inc.:http://wso2.com,
>>
>> Mobile: +94776442007
>>
>>
>>
>>
>>
>>
>> --
>>
>> Reka Thirunavukkarasu
>> Senior Software Engineer,
>> WSO2, Inc.:http://wso2.com,
>>
>> Mobile: +94776442007
>>
>>
>>
>>
>>
>>
>> --
>>
>> Reka Thirunavukkarasu
>> Senior Software Engineer,
>> WSO2, Inc.:http://wso2.com,
>>
>> Mobile: +94776442007
>>
>>
>>
>>
>>
>>
>> --
>>
>> Reka Thirunavukkarasu
>> Senior Software Engineer,
>> WSO2, Inc.:http://wso2.com,
>>
>> Mobile: +94776442007
>>
>>
>>
>
>
>
> --
> Reka Thirunavukkarasu
> Senior Software Engineer,
> WSO2, Inc.:http://wso2.com,
> Mobile: +94776442007
>
>
>


-- 
Imesh Gunaratne

Senior Technical Lead, WSO2
Committer & PMC Member, Apache Stratos