Posted to dev@stratos.apache.org by "Martin Eppel (meppel)" <me...@cisco.com> on 2015/02/14 00:48:20 UTC

RE: Issue with java cartridge agent ... was RE: [Discuss] Issue in Cartridge Agent Config Initialization Logic

I have a question on this. I added handling of the “Member Initialized event” but noticed that, at least in all the tests I have run so far, the event is sent long before the cartridge agent is up (and before its event listeners are initialized). Is this tolerable, or is the expectation that the cartridge agent's event handler should handle the event every time a new instance is started?

See the logs below: the Stratos log shows that the event is published at 2015-02-13 21:24:11,947, while the cartridge agent (Java process) is not started until 21:25:43,786842294.

Btw, the clocks on the VM running the Stratos process and the VM running the agent are aligned.

Thanks

Martin


Stratos log:

TID: [0] [STRATOS] [2015-02-13 21:24:11,936]  INFO {org.apache.stratos.cloud.controller.messaging.topology.TopologyBuilder} -  Member status updated to initialized
TID: [0] [STRATOS] [2015-02-13 21:24:11,947]  INFO {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher} -  Publishing member initialized event: [service-name] cisco-sample-vm [cluster-id] csco_sample_cartridge.cisco-sample-vm.domain [cluster-instance-id] cisco_sample-1 [member-id] csco_sample_cartridge.cisco-sample-vm.domain35ded366-6d7e-474c-998c-1df8feb780b5 [instance-id] RegionOne/a4a0a5c2-638b-4c01-9797-e5fa674ac446 [network-partition-id] N1 [partition-id] RegionOne-Core [lb-cluster-id] null


Cartridge agent log:
root@cisco-sample-vm-172-16-2-17:/var/log/apache-stratos# vi cartridge-agent.log
[2015-02-13 21:25:43,786842294] %%stratos-wrapper.sh-Info: System variables: knock-address localhost, instance-uuid a4a0a5c2-638b-4c01-9797-e5fa674ac446, launch-params /var/lib/qtcm/cartridge-agent/launch-params, app_path /var/lib/qtcm/cartridge-agent/cartridge-app-data
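
As a sanity check on the timing, the gap between the two timestamps in the logs above can be computed with a few lines of Java. This is only an illustrative sketch: the class name `EventGap` and the two format patterns are assumptions made for this example and are not part of Stratos. It relies on the clocks being aligned, as noted above.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class EventGap {
    // The Stratos log prints milliseconds after a comma.
    private static final DateTimeFormatter STRATOS_FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");
    // The agent wrapper log prints nine fractional digits (nanoseconds).
    private static final DateTimeFormatter AGENT_FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSSSSSSSS");

    // Time elapsed between event publication and agent start.
    public static Duration gap(String published, String agentStarted) {
        LocalDateTime p = LocalDateTime.parse(published, STRATOS_FMT);
        LocalDateTime a = LocalDateTime.parse(agentStarted, AGENT_FMT);
        return Duration.between(p, a);
    }

    public static void main(String[] args) {
        Duration d = gap("2015-02-13 21:24:11,947",
                         "2015-02-13 21:25:43,786842294");
        System.out.println(d.toMillis() + " ms"); // prints "91839 ms"
    }
}
```

So the event was published roughly 92 seconds before the agent wrapper logged its first line.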

From: Imesh Gunaratne [mailto:imesh@apache.org]
Sent: Thursday, February 12, 2015 10:25 AM
To: dev
Subject: Re: Issue with java cartridge agent ... was RE: [Discuss] Issue in Cartridge Agent Config Initialization Logic

Hi Martin,

It looks like the Member Initialized event is not handled in the Java Cartridge Agent (JCA). The cartridge agent needs to wait until the Member Initialized event is received before taking any action.

Thanks

On Thu, Feb 12, 2015 at 11:22 PM, Martin Eppel (meppel) <me...@cisco.com> wrote:
I haven’t heard back; does anyone have an idea?

Thanks

Martin

From: Martin Eppel (meppel)
Sent: Wednesday, February 11, 2015 8:49 PM
To: dev@stratos.apache.org
Subject: RE: [Discuss] Issue in Cartridge Agent Config Initialization Logic

I am seeing some inconsistent behavior with the Java cartridge agent: it mostly goes active, but sometimes it does not. I was wondering whether I hit the race condition discussed here or some other issue.
I attached the log for an error scenario (cartridge_agent.log.error) and, for comparison, the log for a successful run (cartridge-agent.log).
What would be the suggested fix?

Thanks

Martin

From: Gayan Gunarathne [mailto:gayang@wso2.com]
Sent: Tuesday, January 27, 2015 10:45 PM
To: dev@stratos.apache.org
Subject: Re: [Discuss] Issue in Cartridge Agent Config Initialization Logic

I think we need to check topology consistency only once, at startup. Once initialized is set to true, IMO there is no point in checking topology consistency again unless the cartridge agent restarts.

Thanks,
Gayan


On Wed, Jan 28, 2015 at 9:47 AM, Imesh Gunaratne <im...@apache.org> wrote:
Hi Chamila,

Thanks for the feedback. Yes, there is a possibility that the cartridge agent could miss the Member Initialized event if the agent starts after the event is published. If so, we may need to listen to the Complete Topology event, check whether the member is in the Initialized state, and if so set this flag to True.

From what I saw in the current codebase, we set this flag to True each time a Complete Topology event is received. As a result, Cloud Controller can receive the Instance Started event before it publishes the Member Initialized event.
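
The difference between the two checks discussed here can be sketched as follows. This is not the Stratos code: `InitializedCheck`, the `MemberStatus` enum, and the map-based topology are hypothetical stand-ins used only to contrast "member id is present in the topology" with "member is in the Initialized state".

```java
import java.util.HashMap;
import java.util.Map;

final class InitializedCheck {
    // Hypothetical stand-in for the member states mentioned in this thread.
    enum MemberStatus { CREATED, INITIALIZED, STARTING, ACTIVE }

    // Check as described for the current code: true as soon as the
    // member id appears in the Complete Topology event.
    public static boolean presentInTopology(Map<String, MemberStatus> topology,
                                            String memberId) {
        return topology.containsKey(memberId);
    }

    // Proposed check: true only if the member is in the Initialized state.
    public static boolean initializedInTopology(Map<String, MemberStatus> topology,
                                                String memberId) {
        return topology.get(memberId) == MemberStatus.INITIALIZED;
    }

    public static void main(String[] args) {
        Map<String, MemberStatus> topology = new HashMap<>();
        topology.put("member-1", MemberStatus.CREATED);
        // The member is listed but not yet initialized: the checks disagree.
        System.out.println(presentInTopology(topology, "member-1"));     // true
        System.out.println(initializedInTopology(topology, "member-1")); // false
        topology.put("member-1", MemberStatus.INITIALIZED);
        System.out.println(initializedInTopology(topology, "member-1")); // true
    }
}
```

With the first check the flag can flip while the member is still in an earlier state, which is the ordering problem described above; the second check avoids that.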

Thanks

On Wed, Jan 28, 2015 at 1:29 AM, Chamila De Alwis <ch...@wso2.com> wrote:
Hi Imesh,

This was done to reflect the changes committed by Raj [1]. The CompleteTopologyEvent is checked for consistency (member id being present in the topology) and the initialized flag is set to true. Almost all the time, the initialization goes through this path, because the InstanceSpawnedEvent is missed by the agent. Is there a particular way this breaks the member life cycle?

[1] - https://github.com/apache/stratos/commit/5e41897eb730b941f2d2521f15dd6378eaddddda


Regards,
Chamila de Alwis
Software Engineer | WSO2 | +94772207163
Blog: code.chamiladealwis.com



On Wed, Jan 28, 2015 at 12:00 AM, Imesh Gunaratne <im...@apache.org> wrote:
Hi Devs,

I think there is an issue in the cartridge agent config initialization logic. Lakmal reported this while testing the Kubernetes workflow. The cartridge agent waits for the Member Initialized event before sending the Instance Started event; this is the intended behaviour:

[inline image image001.png not preserved in the plain-text archive]

However, the Complete Topology event sets the above property to True:

[inline image image002.png not preserved in the plain-text archive]

As a result, the member lifecycle breaks and the member does not become active. I have now fixed this by removing the highlighted line (in grey) shown above. I will build the PHP Docker image and test this again.

Thanks


--
Imesh Gunaratne

Technical Lead, WSO2
Committer & PMC Member, Apache Stratos







--

Gayan Gunarathne
Technical Lead
WSO2 Inc. (http://wso2.com)
email  : gayang@wso2.com  | mobile : +94 766819985





Re: Issue with java cartridge agent ... was RE: [Discuss] Issue in Cartridge Agent Config Initialization Logic

Posted by Imesh Gunaratne <im...@apache.org>.
Hi Martin,

Yes, what you have observed is correct. The Member Initialized event can be
published even before the cartridge agent starts.

To handle this, we need to pause the cartridge agent at startup until it
receives the Complete Topology event. If the member is not in the Initialized
state in the Complete Topology event, the agent needs to wait until the Member
Initialized event arrives.

This is how we have implemented the Python Cartridge Agent (PCA).
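
A minimal Java sketch of the startup sequence described above, assuming a simple latch-based gate. The class `StartupGate` and its method names are hypothetical and are not taken from the PCA or JCA code; the real agents receive these events through their own message listeners.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class StartupGate {
    // Released once the member is known to be in the Initialized state.
    private final CountDownLatch initialized = new CountDownLatch(1);

    // Called for a Complete Topology event; memberInitialized is true if
    // this member already appears in the Initialized state.
    public void onCompleteTopology(boolean memberInitialized) {
        if (memberInitialized) {
            initialized.countDown();
        }
    }

    // Called when a Member Initialized event for this member arrives.
    public void onMemberInitialized() {
        initialized.countDown(); // has no further effect once released
    }

    // Blocks the startup thread; returns false on timeout or interruption.
    public boolean awaitInitialized(long timeout, TimeUnit unit) {
        try {
            return initialized.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        StartupGate gate = new StartupGate();
        gate.onCompleteTopology(false); // member not initialized yet
        System.out.println(gate.awaitInitialized(100, TimeUnit.MILLISECONDS)); // false
        gate.onMemberInitialized();     // the event arrives later
        System.out.println(gate.awaitInitialized(100, TimeUnit.MILLISECONDS)); // true
    }
}
```

Because the latch is released at most once, later Complete Topology events cannot change the outcome, which also matches the suggestion earlier in the thread to run the check only once at startup.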

Thanks




-- 
Imesh Gunaratne

Technical Lead, WSO2
Committer & PMC Member, Apache Stratos