Posted to dev@stratos.apache.org by "Michiel Blokzijl (mblokzij)" <mb...@cisco.com> on 2015/04/20 16:14:27 UTC

Topology inconsistent

Hi,
I’m debugging a problem with the cartridge agent in the Stratos 4.0.0 code. It complains about the topology being inconsistent, triggered by this code [1].

As a result, the extension handler does not fire when cartridges go down.

[2015-04-19 07:19:22,486]  INFO - [MemberTerminatedMessageProcessor] Member terminated: [service] XXX [cluster] XXX [member] XXX-0.dom2a4618d5-edd9-4a99-9d9c-918715c761bd
[2015-04-19 07:19:22,486]  INFO - [DefaultExtensionHandler] Member terminated event received: [service] XXX [cluster] XX [member] XXX-0.dom2a4618d5-edd9-4a99-9d9c-918715c761bd
[2015-04-19 07:19:22,486] ERROR - [ExtensionUtils] Member id not found in topology [member] XXXX.dom2a4618d5-edd9-4a99-9d9c-918715c761bd
[2015-04-19 07:19:22,486] ERROR - [DefaultExtensionHandler] Topology is inconsistent...failed to execute member terminated event

Any idea what’s going wrong here?

I assume the topology isn’t being maintained correctly for some reason, but I haven’t figured out how, or even whether, it is maintained at all. The complete topology event handler [2], for example, doesn’t actually update the internally stored topology, and nothing in the cartridge agent calls the topology manager’s acquireWriteLock function.
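
To illustrate the concern, here is a minimal sketch of a locally stored topology guarded by a read/write lock. It is not the Stratos TopologyManager: the class GuardedTopology and its methods are hypothetical names chosen for illustration. It only shows that a lookup under the read lock can succeed only if some event handler has previously written the member under the write lock.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical holder for the locally stored topology (not Stratos code).
final class GuardedTopology {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    // Maps member id -> cluster id; the real topology is richer than this.
    private final Map<String, String> memberToCluster = new HashMap<>();

    // Would be called from a "member created/activated" handler.
    void addMember(String memberId, String clusterId) {
        lock.writeLock().lock();
        try {
            memberToCluster.put(memberId, clusterId);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Would be called from a "member terminated" handler.
    // Returns null when the member was never written, which corresponds
    // to the "Member id not found in topology" situation.
    String findCluster(String memberId) {
        lock.readLock().lock();
        try {
            return memberToCluster.get(memberId);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

If no handler ever takes the write lock, findCluster returns null for every member that joined after start-up, whatever the message broker delivered.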

Best regards,

Michiel

[1] https://github.com/apache/stratos/blob/4.0.0/components/org.apache.stratos.cartridge.agent/src/main/java/org/apache/stratos/cartridge/agent/extensions/DefaultExtensionHandler.java#L374

[2] https://github.com/apache/stratos/blob/4.0.0/components/org.apache.stratos.cartridge.agent/src/main/java/org/apache/stratos/cartridge/agent/extensions/DefaultExtensionHandler.java#L328

[diff attached] Passing TopologyEvent data to cartridge agent extensions (was: Re: Topology inconsistent)

Posted by "Michiel Blokzijl (mblokzij)" <mb...@cisco.com>.
Hi,

It looks like this isn’t an issue in the latest 4.1 code, so that’s good.

However, in the 4.1 code we’ve lost the ability to pass data about topology events to the extensions run by the cartridge agent. I’ve attached a diff which shows how I would add this functionality back. Basically I’d extend the (Topology)Event interface to include a toEnv() method, which can be overridden by the subclasses to populate a HashMap with event data. This HashMap will then be passed into the subprocess as extra environment info.
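
A minimal sketch of the idea, not the attached diff: the interface, the event class, the ExtensionRunner helper, and the environment variable names below are all hypothetical stand-ins, and the real Stratos event types may look different. The default method assumes Java 8 or later.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the event interface (not the Stratos type).
interface TopologyEvent {
    // Events with nothing to expose inherit an empty map.
    default Map<String, String> toEnv() {
        return new HashMap<>();
    }
}

// Hypothetical event; field names and env keys are illustrative only.
final class MemberTerminatedEvent implements TopologyEvent {
    private final String serviceName;
    private final String clusterId;
    private final String memberId;

    MemberTerminatedEvent(String serviceName, String clusterId, String memberId) {
        this.serviceName = serviceName;
        this.clusterId = clusterId;
        this.memberId = memberId;
    }

    @Override
    public Map<String, String> toEnv() {
        Map<String, String> env = new HashMap<>();
        env.put("STRATOS_SERVICE_NAME", serviceName);
        env.put("STRATOS_CLUSTER_ID", clusterId);
        env.put("STRATOS_MEMBER_ID", memberId);
        return env;
    }
}

// Hypothetical helper that prepares the extension subprocess.
final class ExtensionRunner {
    // Copies the event data into the child's environment on top of the
    // inherited one; the caller decides when to start the process.
    static ProcessBuilder prepare(TopologyEvent event, String... command) {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.environment().putAll(event.toEnv());
        return builder;
    }
}
```

With this shape the calling code no longer needs to know which fields each event carries; it only calls toEnv() and starts the process.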

How do people feel about this approach? I think it’s cleaner than what existed in the 4.0 code, where the calling code put the event info into environment variables.

If people think this is a good idea I’m happy to expand it to cover the other TopologyEvents, and go through the process of getting it into the codebase. Feedback is welcome!

Best regards,

Michiel




> On 22 Apr 2015, at 19:05, Imesh Gunaratne <im...@apache.org> wrote:
> 
> Hi Michiel,
> 
> In JCA topology is handled by the messaging module, you could see how topology is updated on Complete Topology event here:
> https://github.com/apache/stratos/blob/master/components/org.apache.stratos.messaging/src/main/java/org/apache/stratos/messaging/message/processor/topology/CompleteTopologyMessageProcessor.java
> 
> In PCA, still topology is not properly updated after initializing it with the Complete Topology event.


Re: Topology inconsistent

Posted by Imesh Gunaratne <im...@apache.org>.
Hi Michiel,

In the Java cartridge agent (JCA), the topology is handled by the messaging
module. You can see how the topology is updated on the Complete Topology
event here:
https://github.com/apache/stratos/blob/master/components/org.apache.stratos.messaging/src/main/java/org/apache/stratos/messaging/message/processor/topology/CompleteTopologyMessageProcessor.java

In the Python cartridge agent (PCA), the topology is still not properly
updated after it is initialized with the Complete Topology event.


On Wed, Apr 22, 2015 at 7:28 PM, Michiel Blokzijl (mblokzij) <
mblokzij@cisco.com> wrote:

> Hi Imesh,
>
> Ideally cartridge agent should only wait for Complete Topology event once
> in its lifecycle. If it is waiting more than once then there is an issue.
>
>
> That’s not the problem, it only waits once for the complete topology.
>
> To me it looks like the topology is never updated, or if it is, then it’s
> not clear to me how that’s happening? It looks like the Python cartridge
> agent for example does call an ‘update’ method:
>
> https://github.com/apache/stratos/blob/22fdf78be8a62312a65b23e017f0de20cfad82b2/components/org.apache.stratos.python.cartridge.agent/src/main/python/cartridge.agent/cartridge.agent/agent.py#L250
>
> - I don’t see anything similar in the Java cartridge agent.
>
> Please could someone confirm whether this is the case, and perhaps explain
> how updating the topology is supposed to work in the Java cartridge agent?
>
> Best regards,
>
> Michiel


-- 
Imesh Gunaratne

Technical Lead, WSO2
Committer & PMC Member, Apache Stratos

Re: Topology inconsistent

Posted by "Michiel Blokzijl (mblokzij)" <mb...@cisco.com>.
Hi Imesh,

> Ideally cartridge agent should only wait for Complete Topology event once in its lifecycle. If it is waiting more than once then there is an issue.


That’s not the problem, it only waits once for the complete topology.

To me it looks like the topology is never updated, or if it is, it’s not clear to me how that happens. The Python cartridge agent, for example, does call an ‘update’ method:
https://github.com/apache/stratos/blob/22fdf78be8a62312a65b23e017f0de20cfad82b2/components/org.apache.stratos.python.cartridge.agent/src/main/python/cartridge.agent/cartridge.agent/agent.py#L250

I don’t see anything similar in the Java cartridge agent.

Please could someone confirm whether this is the case, and perhaps explain how updating the topology is supposed to work in the Java cartridge agent?

Best regards,

Michiel

On 20 Apr 2015, at 19:05, Imesh Gunaratne <im...@apache.org> wrote:

> Hi Michiel,
> 
> It's a pleasure! My guess is that either cartridge agent has been restarted or there is a bug in its logic.
> 
> Ideally cartridge agent should only wait for Complete Topology event once in its lifecycle. If it is waiting more than once then there is an issue.
> 
> Thanks


Re: Topology inconsistent

Posted by Imesh Gunaratne <im...@apache.org>.
Hi Michiel,

It's a pleasure! My guess is that either the cartridge agent has been
restarted or there is a bug in its logic.

Ideally the cartridge agent should wait for the Complete Topology event only
once in its lifecycle. If it waits more than once, there is an issue.

Thanks

On Mon, Apr 20, 2015 at 11:06 PM, Michiel Blokzijl (mblokzij) <
mblokzij@cisco.com> wrote:

> HI Imesh,
>
> Thanks for replying,
>
> This issue might occur if the cartridge agent start processing member
> events before consuming Complete Topology event.
>
>
> The issue happened way after that, I had Stratos running for a day or so,
> and in the logs I saw some “waiting for complete topology event ..” but
> they went away pretty quickly (way before this happened).
>
> Is this the code that’s supposed to do the updates?
> https://github.com/apache/stratos/blob/4.0.0/components/org.apache.stratos.cartridge.agent/src/main/java/org/apache/stratos/cartridge/agent/extensions/DefaultExtensionHandler.java#L328
>
> Because I don’t see anything that actually updates anything (beyond
> function-local variables like ‘env')..
>
> Michiel


-- 
Imesh Gunaratne

Technical Lead, WSO2
Committer & PMC Member, Apache Stratos

Re: Topology inconsistent

Posted by "Michiel Blokzijl (mblokzij)" <mb...@cisco.com>.
Hi Imesh,

Thanks for replying.

> This issue might occur if the cartridge agent start processing member events before consuming Complete Topology event.


The issue happened long after that. I had Stratos running for a day or so, and the logs showed a few “waiting for complete topology event ..” messages, but they stopped quickly (well before this happened).

Is this the code that’s supposed to do the updates? https://github.com/apache/stratos/blob/4.0.0/components/org.apache.stratos.cartridge.agent/src/main/java/org/apache/stratos/cartridge/agent/extensions/DefaultExtensionHandler.java#L328

Because I don’t see anything that actually updates anything (beyond function-local variables like ‘env’).

Michiel

On 20 Apr 2015, at 18:13, Imesh Gunaratne <im...@apache.org> wrote:

> Hi Michiel,
> 
> This issue might occur if the cartridge agent start processing member events before consuming Complete Topology event.
> 
> This is how the topology get initialized in any component that listen to topology topic in message broker; First of all when the component starts up it waits for the Complete Topology event to receive. This event is periodically published by Cloud Controller with the entire topology of a given moment of time. 
> 
> Once it is received the component would initialize the local topology and start listening to other events. Since Complete Topology event has given the latest state of the topology now the component can consume any other event published afterwards.
> 
> Thanks


Re: Topology inconsistent

Posted by Imesh Gunaratne <im...@apache.org>.
Hi Michiel,

This issue might occur if the cartridge agent starts processing member
events before consuming the Complete Topology event.

This is how the topology gets initialized in any component that listens to
the topology topic in the message broker. When the component starts up, it
first waits to receive the Complete Topology event. This event is
periodically published by the Cloud Controller and contains the entire
topology at a given moment in time.

Once it is received, the component initializes the local topology and
starts listening to other events. Since the Complete Topology event has
given it the latest state of the topology, the component can now consume
any other event published afterwards.
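
The start-up sequence described above can be sketched roughly as follows.
This is an illustration only, not the messaging module's code: LocalTopology
and its method names are hypothetical, the topology is reduced to a map of
member id to status, and the sketch assumes that snapshots arriving after
the first one are ignored.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical local topology (not the Stratos Topology class).
final class LocalTopology {
    private final Map<String, String> members = new HashMap<>();
    private boolean initialized = false;

    // Complete Topology event: take the first snapshot only (assumption).
    synchronized void onCompleteTopology(Map<String, String> snapshot) {
        if (initialized) {
            return;
        }
        members.putAll(snapshot);
        initialized = true;
    }

    // Member events are dropped until the first snapshot has arrived.
    // Returns true if the event was applied to the local topology.
    synchronized boolean onMemberActivated(String memberId) {
        if (!initialized) {
            return false;
        }
        members.put(memberId, "Active");
        return true;
    }

    // Returns true only if the member was known and has been removed.
    synchronized boolean onMemberTerminated(String memberId) {
        if (!initialized) {
            return false;
        }
        return members.remove(memberId) != null;
    }

    synchronized boolean contains(String memberId) {
        return members.containsKey(memberId);
    }
}
```

In this sketch a member terminated event for an unknown member returns
false, which is the point where an agent would report an inconsistent
topology.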

Thanks



On Mon, Apr 20, 2015 at 7:44 PM, Michiel Blokzijl (mblokzij) <
mblokzij@cisco.com> wrote:

> Hi,
> I’m looking at an issue with Stratos 4.0.0 code, and I’m having an issue
> with the cartridge agent. It complains about the topology being
> inconsistent, triggered by this code [1].
>
> This causes the extension handler not to fire for cartridges going down.
>
> [2015-04-19 07:19:22,486]  INFO - [MemberTerminatedMessageProcessor]
> Member terminated: [service] XXX [cluster] XXX [member]
> XXX-0.dom2a4618d5-edd9-4a99-9d9c-918715c761bd
> [2015-04-19 07:19:22,486]  INFO - [DefaultExtensionHandler] Member
> terminated event received: [service] XXX [cluster] XX [member]
> XXX-0.dom2a4618d5-edd9-4a99-9d9c-918715c761bd
> [2015-04-19 07:19:22,486] ERROR - [ExtensionUtils] Member id not found in
> topology [member] XXXX.dom2a4618d5-edd9-4a99-9d9c-918715c761bd
> [2015-04-19 07:19:22,486] ERROR - [DefaultExtensionHandler] Topology is
> inconsistent...failed to execute member terminated event
>
> Any idea what’s going wrong here?
>
> I assume the topology isn’t being maintained correctly for some reason,
> but I haven’t quite figured out how/if the topology is being maintained at
> all. Looking at the complete topology event handler [2] for example, it
> doesn’t actually update the internally stored topology.. There’s nothing in
> the cartridge agent that calls the topology manager’s acquireWriteLock
> function..
>
> Best regards,
>
> Michiel
>
> [1]
> https://github.com/apache/stratos/blob/4.0.0/components/org.apache.stratos.cartridge.agent/src/main/java/org/apache/stratos/cartridge/agent/extensions/DefaultExtensionHandler.java#L374
>
> [2]
> https://github.com/apache/stratos/blob/4.0.0/components/org.apache.stratos.cartridge.agent/src/main/java/org/apache/stratos/cartridge/agent/extensions/DefaultExtensionHandler.java#L328
>



-- 
Imesh Gunaratne

Technical Lead, WSO2
Committer & PMC Member, Apache Stratos