Posted to dev@stratos.apache.org by Imesh Gunaratne <im...@apache.org> on 2014/11/26 08:46:14 UTC

[Discuss] Cloud Controller Clustering Model

Hi Devs,

This is to discuss the clustering model of the cloud controller:

[Diagram: cloud controller clustering model; image not included in this
plain text version]

As shown in the above diagram the idea is to have a coordinator node to
handle data persistence logic and message publishing (topology, instance
status, etc). The coordinator will be selected randomly and at a given time
there will be only one coordinator. If the existing coordinator node goes
down, another member will become the coordinator automatically (similar to
carbon clustering agent).
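
A minimal sketch of the single-coordinator and failover rule described
above. It is an in-process model only, not Stratos or Carbon code, and the
class and method names are hypothetical. The mail says the coordinator is
selected randomly; the sketch instead assumes the longest-lived member is
chosen, purely so that the behaviour is deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical in-process model of a CC cluster; not Stratos or Hazelcast code. */
class CcClusterSketch {
    // Members in join order; the first entry is treated as the coordinator (assumption).
    private final List<String> members = new ArrayList<>();

    synchronized void join(String memberId) {
        if (!members.contains(memberId)) {
            members.add(memberId);
        }
    }

    /** Removing the current coordinator implicitly promotes the next member. */
    synchronized void leave(String memberId) {
        members.remove(memberId);
    }

    /** At most one coordinator exists at any time: the longest-lived member. */
    synchronized Optional<String> coordinator() {
        return members.isEmpty() ? Optional.empty() : Optional.of(members.get(0));
    }

    synchronized boolean isCoordinator(String memberId) {
        return coordinator().map(memberId::equals).orElse(false);
    }
}
```

With members cc-1 and cc-2 joined in that order, cc-1 is the coordinator;
once cc-1 leaves, cc-2 takes over without any explicit election step.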

According to this design Autoscaler (AS)/Stratos Manager (SM) will talk to
Cloud Controller (CC) via the Cloud Controller Service endpoint exposed via
the load balancer.

*Data Replication*
When a request comes into one of the CC instances it will execute the
necessary actions and update the data holder and/or topology which is in
memory. At this point the data holder changes will be replicated to other
instances using a distributed map. Once the coordinator receives the above
updates it will persist the changes to the registry database.
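
The replication flow above could look roughly like the following sketch.
It assumes a map that notifies every instance of each changed entry, and a
plain in-memory map standing in for the registry database. All names here
(ReplicatedMapSketch, CcInstanceSketch) are hypothetical; a real
implementation would use the distributed map of the clustering framework.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

/** Hypothetical stand-in for a distributed map: every put is pushed to all listeners. */
class ReplicatedMapSketch {
    private final Map<String, String> entries = new HashMap<>();
    private final List<BiConsumer<String, String>> listeners = new ArrayList<>();

    void addEntryListener(BiConsumer<String, String> listener) {
        listeners.add(listener);
    }

    void put(String key, String value) {
        entries.put(key, value);
        // Only the changed entry is propagated, not the whole map.
        for (BiConsumer<String, String> listener : listeners) {
            listener.accept(key, value);
        }
    }

    String get(String key) {
        return entries.get(key);
    }
}

/** A CC instance that persists replicated changes only while it is the coordinator. */
class CcInstanceSketch {
    boolean coordinator;

    CcInstanceSketch(ReplicatedMapSketch dataHolder, Map<String, String> registry,
                     boolean coordinator) {
        this.coordinator = coordinator;
        dataHolder.addEntryListener((key, value) -> {
            if (this.coordinator) {
                registry.put(key, value);   // stand-in for a registry database write
            }
        });
    }
}
```

Any instance may call put() on the shared data holder, but only the
instance whose coordinator flag is set writes the change to the registry.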

In this design we might not need to replicate the topology since it is
already available in the message broker. The idea is to let the coordinator
publish the topology changes and have the other members listen for them.

Please add your thoughts.

Thanks


-- 
Imesh Gunaratne

Technical Lead, WSO2
Committer & PMC Member, Apache Stratos

Re: [Discuss] Cloud Controller Clustering Model

Posted by Imesh Gunaratne <im...@apache.org>.

Hi Devs,

I have now implemented the cloud controller coordinator management logic
and improved the distributed locking functionality. The changes are now in
the master branch.

*Cloud Controller (CC) Coordinator:*
- There will be only one coordinator for the cloud controller cluster at a
given time.
- Coordinator will be the only CC instance that listens to the cluster
status, application status and instance status topics.
- Coordinator will be the only CC instance that publishes topology.
- All the instances will respond to service calls.
- Any change to the CC state will be replicated.
- If the coordinator node goes down another member of the cluster will
become the coordinator and start listening to above topics and publishing
topology.
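
The role switch described in the list above can be sketched as follows.
This is an illustrative model, not the code that was committed: the class
name, the callback name and the topic names are all assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical CC node that consumes status topics and publishes topology only as coordinator. */
class CcNodeSketch {
    // Topic names are illustrative, not the real Stratos topic names.
    static final List<String> STATUS_TOPICS =
            Arrays.asList("cluster-status", "application-status", "instance-status");

    private final List<String> subscriptions = new ArrayList<>();
    private boolean coordinator;

    /** Called whenever membership changes and the coordinator role may have moved. */
    void onCoordinatorChanged(boolean nowCoordinator) {
        if (nowCoordinator && !coordinator) {
            subscriptions.addAll(STATUS_TOPICS);   // start listening
        } else if (!nowCoordinator && coordinator) {
            subscriptions.clear();                 // stop listening
        }
        coordinator = nowCoordinator;
    }

    /** Returns true if the event was published, false if this node is not the coordinator. */
    boolean publishTopology(List<String> outbox, String topologyEvent) {
        if (!coordinator) {
            return false;
        }
        outbox.add(topologyEvent);
        return true;
    }

    /** Returns a copy of the topics this node currently listens to. */
    List<String> subscriptions() {
        return new ArrayList<>(subscriptions);
    }
}
```

A node that has not been promoted ignores publish requests and holds no
subscriptions; after onCoordinatorChanged(true) it listens to all three
status topics and publishes topology events.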

*Distributed Locking in CC:*
- All the CC service methods are now managed by distributed locks.
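
As a rough illustration of guarding service methods with a lock, the sketch
below wraps each state change in an acquire/release pair. It assumes a
java.util.concurrent.locks.Lock; the local ReentrantLock used here only
stands in for a cluster-wide lock, and the class and method names are
hypothetical rather than the actual CC service API.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

/** Hypothetical guard for CC service methods; a local lock stands in for a distributed one. */
class LockedServiceSketch {
    private final Lock lock;
    private int memberCount;

    LockedServiceSketch(Lock lock) {
        this.lock = lock;
    }

    /** Convenience factory for a single JVM; a cluster would pass a distributed lock instead. */
    static LockedServiceSketch local() {
        return new LockedServiceSketch(new ReentrantLock());
    }

    private <T> T withLock(Supplier<T> action) {
        lock.lock();
        try {
            return action.get();
        } finally {
            lock.unlock();   // always release, even if the action throws
        }
    }

    /** Example service method: the state change runs entirely under the lock. */
    int startInstance() {
        return withLock(() -> ++memberCount);
    }
}
```

Passing the lock in through the constructor keeps the service code the
same whether clustering is enabled or not; only the Lock implementation
changes.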

Thanks


On Mon, Dec 1, 2014 at 8:09 AM, Imesh Gunaratne <im...@apache.org> wrote:

> Hi Devs,
>
> I have now completed the initial implementation of $subject and pushed
> those changes to master branch.
>
> Thanks
>
> On Fri, Nov 28, 2014 at 1:49 PM, Gayan Gunarathne <ga...@wso2.com> wrote:
>
>> Hi,
>>
>> On Fri, Nov 28, 2014 at 1:00 PM, Akila Ravihansa Perera <
>> ravihansa@wso2.com> wrote:
>>
>>> Hi,
>>>
>>> According to this design Autoscaler (AS)/Stratos Manager (SM) will talk
>>>>> to Cloud Controller (CC) via the Cloud Controller Service endpoint exposed
>>>>> via the load balancer.
>>>>>
>>>>> *Data Replication*
>>>>> When a request comes into one of the CC instances it will execute the
>>>>> necessary actions and update the data holder and/or topology which is in
>>>>> memory. At this point the data holder changes will be replicated to other
>>>>> instances using a distributed map. Once the coordinator receives the above
>>>>> updates it will persist the changes to the registry database.
>>>>>
>>>>
>>>> Are we sending a notification (cluster message) when the distributed
>>>> map updated?
>>>>
>>>
>>> This is handled by Hazelcast OOTB right?
>>>
>>>
>>>>> In this design we might not need to replicate the topology since it is
>>>>> already there in the message broker. The idea is to let coordinator publish
>>>>> the topology changes and the other members to listen to it.
>>>>>
>>>>
>>> So that means worker nodes listen to the topology as well as cluster
>>> messages? I think we need to clarify this model a bit more.
>>>
>>>
>>>>
>>>> This would add a latency for the events. What are the issues we would
>>>> face, when each node sends out the event? Of course, the complete topology
>>>> should only be sent out by the Coordinator.
>>>>
>>>
>>> Sending out multiple topology events (for eg - MemberActivated,
>>> MemberTerminated) will trigger many listeners multiple times, and that's
>>> probably not a good idea. Or did you mean something else here, sorry I'm
>>> bit confused.
>>>
>>
>> IMO coordinator is the one who needs to make persistence and message
>> publishing.Other instance responsibility to handle the request and update
>> the in-memory data grid.
>>
>>>
>>>
>>>> Also, we need to make CC data publishers activated only when a node is
>>>> the Coordinator.
>>>>
>>>> Further, only the Coordinator should react to the Instance status
>>>> events etc. IMO.
>>>>
>>>
>>> I think this might result in an inconsistent state if the coordinator
>>> fails while processing an instance status event (or any other event for
>>> that matter). Perhaps we can implement a notifier cluster message to
>>> indicate whether incoming events are processed successfully. If the
>>> coordinator fails, the next elected coordinator should be able to pick up
>>> from the last successful event handled.
>>>
>>
>>  +1 we may need to synchronous the new coordinator with the last
>> coordinator status. I guess we may need maintain the coordinator status.
>>
>>>
>>>
>>>> There's a cache to hold the validated partitions of a Cartridge, we
>>>> need to use a distributed hash map for that too.
>>>>
>>>
>>> +1
>>>
>>>
>>>>> Please add your thoughts.
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>> --
>>>>> Imesh Gunaratne
>>>>>
>>>>> Technical Lead, WSO2
>>>>> Committer & PMC Member, Apache Stratos
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards,
>>>> Nirmal
>>>>
>>>> Nirmal Fernando.
>>>> PPMC Member & Committer of Apache Stratos,
>>>> Senior Software Engineer, WSO2 Inc.
>>>>
>>>> Blog: http://nirmalfdo.blogspot.com/
>>>>
>>>
>>>
>>>
>>> --
>>> Akila Ravihansa Perera
>>> Software Engineer, WSO2
>>>
>>> Blog: http://ravihansa3000.blogspot.com
>>>
>>
>>
>>
>> --
>>
>> Gayan Gunarathne
>> Technical Lead
>> WSO2 Inc. (http://wso2.com)
>> email  : gayang@wso2.com  | mobile : +94 766819985
>>
>>
>
>
>
> --
> Imesh Gunaratne
>
> Technical Lead, WSO2
> Committer & PMC Member, Apache Stratos
>



-- 
Imesh Gunaratne

Technical Lead, WSO2
Committer & PMC Member, Apache Stratos

Re: [Discuss] Cloud Controller Clustering Model

Posted by Imesh Gunaratne <im...@apache.org>.

Hi Devs,

I have now completed the initial implementation of $subject and pushed
those changes to master branch.

Thanks

On Fri, Nov 28, 2014 at 1:49 PM, Gayan Gunarathne <ga...@wso2.com> wrote:

> Hi,
>
> On Fri, Nov 28, 2014 at 1:00 PM, Akila Ravihansa Perera <
> ravihansa@wso2.com> wrote:
>
>> Hi,
>>
>> According to this design Autoscaler (AS)/Stratos Manager (SM) will talk
>>>> to Cloud Controller (CC) via the Cloud Controller Service endpoint exposed
>>>> via the load balancer.
>>>>
>>>> *Data Replication*
>>>> When a request comes into one of the CC instances it will execute the
>>>> necessary actions and update the data holder and/or topology which is in
>>>> memory. At this point the data holder changes will be replicated to other
>>>> instances using a distributed map. Once the coordinator receives the above
>>>> updates it will persist the changes to the registry database.
>>>>
>>>
>>> Are we sending a notification (cluster message) when the distributed map
>>> updated?
>>>
>>
>> This is handled by Hazelcast OOTB right?
>>
>>
>>>> In this design we might not need to replicate the topology since it is
>>>> already there in the message broker. The idea is to let coordinator publish
>>>> the topology changes and the other members to listen to it.
>>>>
>>>
>> So that means worker nodes listen to the topology as well as cluster
>> messages? I think we need to clarify this model a bit more.
>>
>>
>>>
>>> This would add a latency for the events. What are the issues we would
>>> face, when each node sends out the event? Of course, the complete topology
>>> should only be sent out by the Coordinator.
>>>
>>
>> Sending out multiple topology events (for eg - MemberActivated,
>> MemberTerminated) will trigger many listeners multiple times, and that's
>> probably not a good idea. Or did you mean something else here, sorry I'm
>> bit confused.
>>
>
> IMO coordinator is the one who needs to make persistence and message
> publishing.Other instance responsibility to handle the request and update
> the in-memory data grid.
>
>>
>>
>>> Also, we need to make CC data publishers activated only when a node is
>>> the Coordinator.
>>>
>>> Further, only the Coordinator should react to the Instance status events
>>> etc. IMO.
>>>
>>
>> I think this might result in an inconsistent state if the coordinator
>> fails while processing an instance status event (or any other event for
>> that matter). Perhaps we can implement a notifier cluster message to
>> indicate whether incoming events are processed successfully. If the
>> coordinator fails, the next elected coordinator should be able to pick up
>> from the last successful event handled.
>>
>
>  +1 we may need to synchronous the new coordinator with the last
> coordinator status. I guess we may need maintain the coordinator status.
>
>>
>>
>>> There's a cache to hold the validated partitions of a Cartridge, we need
>>> to use a distributed hash map for that too.
>>>
>>
>> +1
>>
>>
>>>> Please add your thoughts.
>>>>
>>>> Thanks
>>>>
>>>>
>>>> --
>>>> Imesh Gunaratne
>>>>
>>>> Technical Lead, WSO2
>>>> Committer & PMC Member, Apache Stratos
>>>>
>>>
>>>
>>>
>>> --
>>> Best Regards,
>>> Nirmal
>>>
>>> Nirmal Fernando.
>>> PPMC Member & Committer of Apache Stratos,
>>> Senior Software Engineer, WSO2 Inc.
>>>
>>> Blog: http://nirmalfdo.blogspot.com/
>>>
>>
>>
>>
>> --
>> Akila Ravihansa Perera
>> Software Engineer, WSO2
>>
>> Blog: http://ravihansa3000.blogspot.com
>>
>
>
>
> --
>
> Gayan Gunarathne
> Technical Lead
> WSO2 Inc. (http://wso2.com)
> email  : gayang@wso2.com  | mobile : +94 766819985
>
>



-- 
Imesh Gunaratne

Technical Lead, WSO2
Committer & PMC Member, Apache Stratos

Re: [Discuss] Cloud Controller Clustering Model

Posted by Gayan Gunarathne <ga...@wso2.com>.

Hi,

On Fri, Nov 28, 2014 at 1:00 PM, Akila Ravihansa Perera <ra...@wso2.com>
wrote:

> Hi,
>
> According to this design Autoscaler (AS)/Stratos Manager (SM) will talk to
>>> Cloud Controller (CC) via the Cloud Controller Service endpoint exposed via
>>> the load balancer.
>>>
>>> *Data Replication*
>>> When a request comes into one of the CC instances it will execute the
>>> necessary actions and update the data holder and/or topology which is in
>>> memory. At this point the data holder changes will be replicated to other
>>> instances using a distributed map. Once the coordinator receives the above
>>> updates it will persist the changes to the registry database.
>>>
>>
>> Are we sending a notification (cluster message) when the distributed map
>> updated?
>>
>
> This is handled by Hazelcast OOTB right?
>
>
>>> In this design we might not need to replicate the topology since it is
>>> already there in the message broker. The idea is to let coordinator publish
>>> the topology changes and the other members to listen to it.
>>>
>>
> So that means worker nodes listen to the topology as well as cluster
> messages? I think we need to clarify this model a bit more.
>
>
>>
>> This would add a latency for the events. What are the issues we would
>> face, when each node sends out the event? Of course, the complete topology
>> should only be sent out by the Coordinator.
>>
>
> Sending out multiple topology events (for eg - MemberActivated,
> MemberTerminated) will trigger many listeners multiple times, and that's
> probably not a good idea. Or did you mean something else here, sorry I'm
> bit confused.
>

IMO the coordinator is the one that needs to handle persistence and message
publishing. The other instances are responsible for handling requests and
updating the in-memory data grid.

>
>
>> Also, we need to make CC data publishers activated only when a node is
>> the Coordinator.
>>
>> Further, only the Coordinator should react to the Instance status events
>> etc. IMO.
>>
>
> I think this might result in an inconsistent state if the coordinator
> fails while processing an instance status event (or any other event for
> that matter). Perhaps we can implement a notifier cluster message to
> indicate whether incoming events are processed successfully. If the
> coordinator fails, the next elected coordinator should be able to pick up
> from the last successful event handled.
>

+1, we may need to synchronize the new coordinator with the last
coordinator's status. I guess we may need to maintain the coordinator status.

>
>
>> There's a cache to hold the validated partitions of a Cartridge, we need
>> to use a distributed hash map for that too.
>>
>
> +1
>
>
>>> Please add your thoughts.
>>>
>>> Thanks
>>>
>>>
>>> --
>>> Imesh Gunaratne
>>>
>>> Technical Lead, WSO2
>>> Committer & PMC Member, Apache Stratos
>>>
>>
>>
>>
>> --
>> Best Regards,
>> Nirmal
>>
>> Nirmal Fernando.
>> PPMC Member & Committer of Apache Stratos,
>> Senior Software Engineer, WSO2 Inc.
>>
>> Blog: http://nirmalfdo.blogspot.com/
>>
>
>
>
> --
> Akila Ravihansa Perera
> Software Engineer, WSO2
>
> Blog: http://ravihansa3000.blogspot.com
>



-- 

Gayan Gunarathne
Technical Lead
WSO2 Inc. (http://wso2.com)
email  : gayang@wso2.com  | mobile : +94 766819985

Re: [Discuss] Cloud Controller Clustering Model

Posted by Akila Ravihansa Perera <ra...@wso2.com>.

Hi,

>> According to this design Autoscaler (AS)/Stratos Manager (SM) will talk to
>> Cloud Controller (CC) via the Cloud Controller Service endpoint exposed via
>> the load balancer.
>>
>> *Data Replication*
>> When a request comes into one of the CC instances it will execute the
>> necessary actions and update the data holder and/or topology which is in
>> memory. At this point the data holder changes will be replicated to other
>> instances using a distributed map. Once the coordinator receives the above
>> updates it will persist the changes to the registry database.
>>
>
> Are we sending a notification (cluster message) when the distributed map
> updated?
>

This is handled by Hazelcast OOTB right?


>> In this design we might not need to replicate the topology since it is
>> already there in the message broker. The idea is to let coordinator publish
>> the topology changes and the other members to listen to it.
>>
>
So that means worker nodes listen to the topology as well as cluster
messages? I think we need to clarify this model a bit more.


>
> This would add a latency for the events. What are the issues we would
> face, when each node sends out the event? Of course, the complete topology
> should only be sent out by the Coordinator.
>

Sending out multiple topology events (e.g. MemberActivated,
MemberTerminated) will trigger many listeners multiple times, and that's
probably not a good idea. Or did you mean something else here? Sorry, I'm a
bit confused.


> Also, we need to make CC data publishers activated only when a node is the
> Coordinator.
>
> Further, only the Coordinator should react to the Instance status events
> etc. IMO.
>

I think this might result in an inconsistent state if the coordinator fails
while processing an instance status event (or any other event for that
matter). Perhaps we can implement a notifier cluster message to indicate
whether incoming events are processed successfully. If the coordinator
fails, the next elected coordinator should be able to pick up from the last
successful event handled.
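
One way to picture the "pick up from the last successful event" idea is a
shared checkpoint that is advanced only after an event has been handled.
The sketch below is an assumption-laden model: the class name is
hypothetical, events are identified by their position in a log, and the
checkpoint is a plain AtomicLong rather than a value replicated across the
cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical event processor that records the last event it handled successfully. */
class CheckpointedProcessorSketch {
    // In a real cluster this value would be shared between nodes; here it is a plain object.
    private final AtomicLong lastProcessed;
    final List<String> handled = new ArrayList<>();

    CheckpointedProcessorSketch(AtomicLong sharedCheckpoint) {
        this.lastProcessed = sharedCheckpoint;
    }

    /** Processes events whose sequence number is above the checkpoint, up to maxEvents. */
    void process(List<String> eventLog, int maxEvents) {
        int done = 0;
        for (long seq = lastProcessed.get() + 1;
             seq <= eventLog.size() && done < maxEvents; seq++) {
            handled.add(eventLog.get((int) (seq - 1)));  // sequence numbers start at 1
            lastProcessed.set(seq);                      // checkpoint only after success
            done++;
        }
    }
}
```

If the old coordinator stops after the first event, a new processor built
on the same checkpoint resumes at the second event. An event that was
handled but not yet checkpointed when the coordinator failed would be
handled again, so handlers would need to tolerate duplicates.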


> There's a cache to hold the validated partitions of a Cartridge, we need
> to use a distributed hash map for that too.
>

+1


>> Please add your thoughts.
>>
>> Thanks
>>
>>
>> --
>> Imesh Gunaratne
>>
>> Technical Lead, WSO2
>> Committer & PMC Member, Apache Stratos
>>
>
>
>
> --
> Best Regards,
> Nirmal
>
> Nirmal Fernando.
> PPMC Member & Committer of Apache Stratos,
> Senior Software Engineer, WSO2 Inc.
>
> Blog: http://nirmalfdo.blogspot.com/
>



-- 
Akila Ravihansa Perera
Software Engineer, WSO2

Blog: http://ravihansa3000.blogspot.com

Re: [Discuss] Cloud Controller Clustering Model

Posted by Nirmal Fernando <ni...@gmail.com>.

Hi,

On Wed, Nov 26, 2014 at 1:16 PM, Imesh Gunaratne <im...@apache.org> wrote:

> Hi Devs,
>
> This is to discuss the clustering model of the cloud controller:
>
>
> 
>
> As shown in the above diagram the idea is to have a coordinator node to
> handle data persistence logic and message publishing (topology, instance
> status, etc). The coordinator will be selected randomly and at a given time
> there will be only one coordinator. If the existing coordinator node goes
> down, another member will become the coordinator automatically (similar to
> carbon clustering agent).
>

We are going to enable Hazelcast clustering, right? If so, the Hazelcast
Carbon clustering agent provides an API to check whether a node is the
coordinator.

> According to this design Autoscaler (AS)/Stratos Manager (SM) will talk to
> Cloud Controller (CC) via the Cloud Controller Service endpoint exposed via
> the load balancer.
>
> *Data Replication*
> When a request comes into one of the CC instances it will execute the
> necessary actions and update the data holder and/or topology which is in
> memory. At this point the data holder changes will be replicated to other
> instances using a distributed map. Once the coordinator receives the above
> updates it will persist the changes to the registry database.
>

Are we sending a notification (cluster message) when the distributed map is
updated?

>
> In this design we might not need to replicate the topology since it is
> already there in the message broker. The idea is to let coordinator publish
> the topology changes and the other members to listen to it.
>

This would add latency to the events. What issues would we face if each
node sent out its own events? Of course, the complete topology should only
be sent out by the Coordinator.

Also, we need to activate the CC data publishers only when a node is the
Coordinator.

Further, only the Coordinator should react to the Instance status events
etc. IMO.

There's a cache to hold the validated partitions of a Cartridge, we need to
use a distributed hash map for that too.

>
> Please add your thoughts.
>
> Thanks
>
>
> --
> Imesh Gunaratne
>
> Technical Lead, WSO2
> Committer & PMC Member, Apache Stratos
>

-- 
Best Regards,
Nirmal

Nirmal Fernando.
PPMC Member & Committer of Apache Stratos,
Senior Software Engineer, WSO2 Inc.

Blog: http://nirmalfdo.blogspot.com/

Re: [Discuss] Cloud Controller Clustering Model

Posted by Udara Liyanage <ud...@wso2.com>.

Hi Imesh,

Reading from the registry frequently might be more expensive than
distributed locking, since a number of database queries are executed.
+1 for distributed collections.

Are we going to use Hazelcast or some other framework?

On Wed, Nov 26, 2014 at 4:53 PM, Lakmal Warusawithana <la...@wso2.com>
wrote:

>
>
> On Wed, Nov 26, 2014 at 4:49 PM, Imesh Gunaratne <im...@apache.org> wrote:
>
>> Hi Akila,
>>
>> The goal we are trying to achieve here is to replicate the state of the
>> cloud controller. IMO registry based approach might not be appropriate for
>> state replication due to the following reason:
>>
>> - The time it takes to propagate a modification from one instance to
>> another would be higher compared to using a distributed map (instance 1
>>  persist changes, send a cluster message to invalidate the cache, other
>> instances read changes from registry database, refresh in memory data
>> structure, etc)
>> - If above happens, when serving multiple requests (with a high
>> frequency) on different instances of CC, the system might come to an
>> inconsistent state because the in memory data structures have not updated
>> properly.
>>
>>
> This is what in my understanding also.
>
>
>> Thanks
>>
>> On Wed, Nov 26, 2014 at 3:10 PM, Akila Ravihansa Perera <
>> ravihansa@wso2.com> wrote:
>>
>>> Hi Imesh,
>>>
>>> According to the model you've described, we will have to use a
>>> distributed lock to synchronize read/write operations into the topology
>>> data structure. This could be expensive.
>>>
>>> I'd like to propose that we actually store and read the topology from
>>> the registry. And to optimize the performance, use a distributed cache. We
>>> have to store topology as a collection of objects and if a write operation
>>> occurs against an object, that cache object has to be invalidated. This way
>>> we can get rid of having to replicate topology using cluster messages,
>>> which could cause many inconsistent states, IMHO.
>>>
>>> Thanks.
>>>
>>> On Wed, Nov 26, 2014 at 2:50 PM, Imesh Gunaratne <im...@apache.org>
>>> wrote:
>>>
>>>> +1 Yes a good point Lakmal, to reduce the latency for a modification to
>>>> propagate to all the instances we might need to replicate topology as well.
>>>>
>>>> Thanks
>>>>
>>>> On Wed, Nov 26, 2014 at 2:44 PM, Lakmal Warusawithana <la...@wso2.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Wed, Nov 26, 2014 at 1:16 PM, Imesh Gunaratne <im...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Hi Devs,
>>>>>>
>>>>>> This is to discuss the clustering model of the cloud controller:
>>>>>>
>>>>>>
>>>>>> 
>>>>>>
>>>>>> As shown in the above diagram the idea is to have a coordinator node
>>>>>> to handle data persistence logic and message publishing (topology, instance
>>>>>> status, etc). The coordinator will be selected randomly and at a given time
>>>>>> there will be only one coordinator. If the existing coordinator node goes
>>>>>> down, another member will become the coordinator automatically (similar to
>>>>>> carbon clustering agent).
>>>>>>
>>>>>> According to this design Autoscaler (AS)/Stratos Manager (SM) will
>>>>>> talk to Cloud Controller (CC) via the Cloud Controller Service endpoint
>>>>>> exposed via the load balancer.
>>>>>>
>>>>>> *Data Replication*
>>>>>> When a request comes into one of the CC instances it will execute the
>>>>>> necessary actions and update the data holder and/or topology which is in
>>>>>> memory. At this point the data holder changes will be replicated to other
>>>>>> instances using a distributed map. Once the coordinator receives the above
>>>>>> updates it will persist the changes to the registry database.
>>>>>>
>>>>>> In this design we might not need to replicate the topology since it
>>>>>> is already there in the message broker. The idea is to let coordinator
>>>>>> publish the topology changes and the other members to listen to it.
>>>>>>
>>>>>>
>>>>> IMO, we may need to replicate topology also, otherwise it may occur
>>>>> some inconsistency.
>>>>>
>>>>>
>>>>>> Please add your thoughts.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Imesh Gunaratne
>>>>>>
>>>>>> Technical Lead, WSO2
>>>>>> Committer & PMC Member, Apache Stratos
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Lakmal Warusawithana
>>>>> Vice President, Apache Stratos
>>>>> Director - Cloud Architecture; WSO2 Inc.
>>>>> Mobile : +94714289692
>>>>> Blog : http://lakmalsview.blogspot.com/
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Imesh Gunaratne
>>>>
>>>> Technical Lead, WSO2
>>>> Committer & PMC Member, Apache Stratos
>>>>
>>>
>>>
>>>
>>> --
>>> Akila Ravihansa Perera
>>> Software Engineer, WSO2
>>>
>>> Blog: http://ravihansa3000.blogspot.com
>>>
>>
>>
>>
>> --
>> Imesh Gunaratne
>>
>> Technical Lead, WSO2
>> Committer & PMC Member, Apache Stratos
>>
>
>
>
> --
> Lakmal Warusawithana
> Vice President, Apache Stratos
> Director - Cloud Architecture; WSO2 Inc.
> Mobile : +94714289692
> Blog : http://lakmalsview.blogspot.com/
>
>


-- 

Udara Liyanage
Software Engineer
WSO2, Inc.: http://wso2.com
lean. enterprise. middleware

web: http://udaraliyanage.wordpress.com
phone: +94 71 443 6897

Re: [Discuss] Cloud Controller Clustering Model

Posted by Lakmal Warusawithana <la...@wso2.com>.

On Wed, Nov 26, 2014 at 4:49 PM, Imesh Gunaratne <im...@apache.org> wrote:

> Hi Akila,
>
> The goal we are trying to achieve here is to replicate the state of the
> cloud controller. IMO registry based approach might not be appropriate for
> state replication due to the following reason:
>
> - The time it takes to propagate a modification from one instance to
> another would be higher compared to using a distributed map (instance 1
>  persist changes, send a cluster message to invalidate the cache, other
> instances read changes from registry database, refresh in memory data
> structure, etc)
> - If above happens, when serving multiple requests (with a high frequency)
> on different instances of CC, the system might come to an inconsistent
> state because the in memory data structures have not updated properly.
>
>
This is my understanding as well.


> Thanks
>
> On Wed, Nov 26, 2014 at 3:10 PM, Akila Ravihansa Perera <
> ravihansa@wso2.com> wrote:
>
>> Hi Imesh,
>>
>> According to the model you've described, we will have to use a
>> distributed lock to synchronize read/write operations into the topology
>> data structure. This could be expensive.
>>
>> I'd like to propose that we actually store and read the topology from the
>> registry. And to optimize the performance, use a distributed cache. We have
>> to store topology as a collection of objects and if a write operation
>> occurs against an object, that cache object has to be invalidated. This way
>> we can get rid of having to replicate topology using cluster messages,
>> which could cause many inconsistent states, IMHO.
>>
>> Thanks.
>>
>> On Wed, Nov 26, 2014 at 2:50 PM, Imesh Gunaratne <im...@apache.org>
>> wrote:
>>
>>> +1 Yes a good point Lakmal, to reduce the latency for a modification to
>>> propagate to all the instances we might need to replicate topology as well.
>>>
>>> Thanks
>>>
>>> On Wed, Nov 26, 2014 at 2:44 PM, Lakmal Warusawithana <la...@wso2.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Wed, Nov 26, 2014 at 1:16 PM, Imesh Gunaratne <im...@apache.org>
>>>> wrote:
>>>>
>>>>> Hi Devs,
>>>>>
>>>>> This is to discuss the clustering model of the cloud controller:
>>>>>
>>>>>
>>>>> 
>>>>>
>>>>> As shown in the above diagram the idea is to have a coordinator node
>>>>> to handle data persistence logic and message publishing (topology, instance
>>>>> status, etc). The coordinator will be selected randomly and at a given time
>>>>> there will be only one coordinator. If the existing coordinator node goes
>>>>> down, another member will become the coordinator automatically (similar to
>>>>> carbon clustering agent).
>>>>>
>>>>> According to this design Autoscaler (AS)/Stratos Manager (SM) will
>>>>> talk to Cloud Controller (CC) via the Cloud Controller Service endpoint
>>>>> exposed via the load balancer.
>>>>>
>>>>> *Data Replication*
>>>>> When a request comes into one of the CC instances it will execute the
>>>>> necessary actions and update the data holder and/or topology which is in
>>>>> memory. At this point the data holder changes will be replicated to other
>>>>> instances using a distributed map. Once the coordinator receives the above
>>>>> updates it will persist the changes to the registry database.
>>>>>
>>>>> In this design we might not need to replicate the topology since it is
>>>>> already there in the message broker. The idea is to let coordinator publish
>>>>> the topology changes and the other members to listen to it.
>>>>>
>>>>>
>>>> IMO, we may need to replicate topology also, otherwise it may occur
>>>> some inconsistency.
>>>>
>>>>
>>>>> Please add your thoughts.
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>> --
>>>>> Imesh Gunaratne
>>>>>
>>>>> Technical Lead, WSO2
>>>>> Committer & PMC Member, Apache Stratos
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Lakmal Warusawithana
>>>> Vice President, Apache Stratos
>>>> Director - Cloud Architecture; WSO2 Inc.
>>>> Mobile : +94714289692
>>>> Blog : http://lakmalsview.blogspot.com/
>>>>
>>>>
>>>
>>>
>>> --
>>> Imesh Gunaratne
>>>
>>> Technical Lead, WSO2
>>> Committer & PMC Member, Apache Stratos
>>>
>>
>>
>>
>> --
>> Akila Ravihansa Perera
>> Software Engineer, WSO2
>>
>> Blog: http://ravihansa3000.blogspot.com
>>
>
>
>
> --
> Imesh Gunaratne
>
> Technical Lead, WSO2
> Committer & PMC Member, Apache Stratos
>



-- 
Lakmal Warusawithana
Vice President, Apache Stratos
Director - Cloud Architecture; WSO2 Inc.
Mobile : +94714289692
Blog : http://lakmalsview.blogspot.com/

Re: [Discuss] Cloud Controller Clustering Model

Posted by Akila Ravihansa Perera <ra...@wso2.com>.

Hi Imesh,

Thank you for the explanations.

I'm +1 for the overall model.

Thanks.
On 27 Nov 2014 07:19, "Imesh Gunaratne" <im...@apache.org> wrote:

> Hi,
>
> On Wed, Nov 26, 2014 at 5:27 PM, Udara Liyanage <ud...@wso2.com> wrote:
>
>> Reading from registry frequently might be more expensive than distributed
>> locking since a number of database queries are executed.
>> +1 for distributed collections.
>>
>> Are we going to use Hazlecast or any other framework ?
>>
>
> We are planning to use Carbon clustering (abstraction) which internally
> use Hazelcast.
>
> On Wed, Nov 26, 2014 at 5:30 PM, Akila Ravihansa Perera <
> ravihansa@wso2.com> wrote:
>
>>
>>> We don't have to refresh the whole data structure, but only the
>> invalidated object(s). We have to have a good registry structure for the
>> topology in order for this to be effective.
>>
>
> A distributed map does this OOB.
>
>>
>>
>>> - If above happens, when serving multiple requests (with a high
>>> frequency) on different instances of CC, the system might come to an
>>> inconsistent state because the in memory data structures have not updated
>>> properly.
>>>
>>
>> Good point, thanks for pointing out. But the same problem will occur when
>> using cluster messages to replicate the state, right? I guess we will have
>> to resort to a distributed lock in either case to avoid consistencies. I
>> remember Isuru was working on a hierarchical locking approach for the
>> topology. Perhaps we can incorporate that for this scenario as well.
>>
>
> We actually don't use cluster messages with a distributed map. However yes
> in either approach we will need a distributed lock. Indeed, for topology we
> should be able to change the existing locks to distributed ones if
> clustering is enabled.
>
>>
>> If you're going to use a distributed map, how do you plan to utilize that
>> to distribute the topology data structure? I think replicating the complete
>> topology object on change would be very expensive, wdyt?
>>
>> No, by design it will only send the modifications.
>
> Thanks
>
>
>
> --
> Imesh Gunaratne
>
> Technical Lead, WSO2
> Committer & PMC Member, Apache Stratos
>

Re: [Discuss] Cloud Controller Clustering Model

Posted by Imesh Gunaratne <im...@apache.org>.

Hi,

On Wed, Nov 26, 2014 at 5:27 PM, Udara Liyanage <ud...@wso2.com> wrote:

> Reading from registry frequently might be more expensive than distributed
> locking since a number of database queries are executed.
> +1 for distributed collections.
>
> Are we going to use Hazlecast or any other framework ?
>

We are planning to use Carbon clustering (an abstraction), which internally
uses Hazelcast.

On Wed, Nov 26, 2014 at 5:30 PM, Akila Ravihansa Perera <ra...@wso2.com>
wrote:

>
>> We don't have to refresh the whole data structure, but only the
> invalidated object(s). We have to have a good registry structure for the
> topology in order for this to be effective.
>

A distributed map does this OOB.

>
>
>> - If above happens, when serving multiple requests (with a high
>> frequency) on different instances of CC, the system might come to an
>> inconsistent state because the in memory data structures have not updated
>> properly.
>>
>
> Good point, thanks for pointing out. But the same problem will occur when
> using cluster messages to replicate the state, right? I guess we will have
> to resort to a distributed lock in either case to avoid inconsistencies. I
> remember Isuru was working on a hierarchical locking approach for the
> topology. Perhaps we can incorporate that for this scenario as well.
>

We actually don't use cluster messages with a distributed map. However yes
in either approach we will need a distributed lock. Indeed, for topology we
should be able to change the existing locks to distributed ones if
clustering is enabled.

>
> If you're going to use a distributed map, how do you plan to utilize that
> to distribute the topology data structure? I think replicating the complete
> topology object on change would be very expensive, wdyt?
>

No, by design it will only send the modifications.
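
To illustrate what "only send the modifications" means, here is a minimal sketch in plain Java. It is not the Carbon clustering or Hazelcast API: `ReplicatedMap`, `applyRemote` and `entriesReceived` are hypothetical names, and the "cluster" is simulated inside a single JVM. The assumption is simply that a put on one member forwards the changed key/value pair to the other members rather than copying the whole map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-JVM stand-in for a distributed map; not the Hazelcast API.
class ReplicatedMap {
    private final Map<String, String> local = new HashMap<>();
    private final List<ReplicatedMap> peers = new ArrayList<>();
    // Counts entries received from peers, to show that only deltas travel.
    int entriesReceived = 0;

    void addPeer(ReplicatedMap peer) {
        peers.add(peer);
    }

    // Update the local copy and forward only the modified entry to the peers.
    void put(String key, String value) {
        local.put(key, value);
        for (ReplicatedMap peer : peers) {
            peer.applyRemote(key, value);
        }
    }

    // Runs on a peer when another member has changed a single entry.
    private void applyRemote(String key, String value) {
        local.put(key, value);
        entriesReceived++;
    }

    String get(String key) {
        return local.get(key);
    }

    int size() {
        return local.size();
    }
}

class ReplicatedMapDemo {
    public static void main(String[] args) {
        ReplicatedMap cc1 = new ReplicatedMap();
        ReplicatedMap cc2 = new ReplicatedMap();
        cc1.addPeer(cc2);
        cc2.addPeer(cc1);

        cc1.put("member-1", "Created");
        cc1.put("member-2", "Created");
        cc2.put("member-1", "Activated"); // one entry changes, one entry is sent

        System.out.println(cc1.get("member-1"));
        System.out.println(cc1.entriesReceived);
    }
}
```

Note that this sketch does nothing about concurrent writes to the same key from two members, which is where the distributed lock mentioned above would come in.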

Thanks



-- 
Imesh Gunaratne

Technical Lead, WSO2
Committer & PMC Member, Apache Stratos

Re: [Discuss] Cloud Controller Clustering Model

Posted by Akila Ravihansa Perera <ra...@wso2.com>.

Hi Imesh,


> - The time it takes to propagate a modification from one instance to
> another would be higher compared to using a distributed map (instance 1
> persist changes, send a cluster message to invalidate the cache, other
> instances read changes from registry database, refresh in memory data
> structure, etc)
>

We don't have to refresh the whole data structure, but only the invalidated
object(s). We have to have a good registry structure for the topology in
order for this to be effective.


> - If above happens, when serving multiple requests (with a high frequency)
> on different instances of CC, the system might come to an inconsistent
> state because the in memory data structures have not updated properly.
>

Good point, thanks for pointing out. But the same problem will occur when
using cluster messages to replicate the state, right? I guess we will have
to resort to a distributed lock in either case to avoid inconsistencies. I
remember Isuru was working on a hierarchical locking approach for the
topology. Perhaps we can incorporate that for this scenario as well.
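
To make the hierarchical locking idea a bit more concrete, here is a rough sketch in plain Java. This is only my reading of the idea, not the actual implementation being worked on: `LockNode`, `acquireWrite` and `release` are hypothetical names, the hierarchy topology -> service -> cluster is assumed for illustration, and the locks are ordinary in-JVM `ReentrantReadWriteLock`s rather than distributed ones. A writer takes read locks on the ancestors and a write lock only on the node it modifies, so writers on sibling clusters do not block each other.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical node in a lock hierarchy, e.g. topology -> service -> cluster.
class LockNode {
    final String name;
    final LockNode parent; // null for the root
    final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    LockNode(String name, LockNode parent) {
        this.name = name;
        this.parent = parent;
    }

    // Read-lock every ancestor from the root down, then write-lock this node.
    // Returns the acquired locks in acquisition order.
    List<Lock> acquireWrite() {
        List<LockNode> path = new ArrayList<>();
        for (LockNode n = parent; n != null; n = n.parent) {
            path.add(0, n); // the root ends up first
        }
        List<Lock> held = new ArrayList<>();
        for (LockNode ancestor : path) {
            Lock read = ancestor.lock.readLock();
            read.lock();
            held.add(read);
        }
        Lock write = lock.writeLock();
        write.lock();
        held.add(write);
        return held;
    }

    // Release in reverse acquisition order.
    static void release(List<Lock> held) {
        for (int i = held.size() - 1; i >= 0; i--) {
            held.get(i).unlock();
        }
    }
}
```

Always acquiring from the root downwards keeps the lock order consistent across callers. Whether the same scheme stays cheap enough once the locks are distributed is exactly the open question here.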

If you're going to use a distributed map, how do you plan to utilize that
to distribute the topology data structure? I think replicating the complete
topology object on change would be very expensive, wdyt?

Thanks.


> On Wed, Nov 26, 2014 at 3:10 PM, Akila Ravihansa Perera <
> ravihansa@wso2.com> wrote:
>
>> Hi Imesh,
>>
>> According to the model you've described, we will have to use a
>> distributed lock to synchronize read/write operations into the topology
>> data structure. This could be expensive.
>>
>> I'd like to propose that we actually store and read the topology from the
>> registry. And to optimize the performance, use a distributed cache. We have
>> to store topology as a collection of objects and if a write operation
>> occurs against an object, that cache object has to be invalidated. This way
>> we can get rid of having to replicate topology using cluster messages,
>> which could cause many inconsistent states, IMHO.
>>
>> Thanks.
>>
>> On Wed, Nov 26, 2014 at 2:50 PM, Imesh Gunaratne <im...@apache.org>
>> wrote:
>>
>>> +1 Yes a good point Lakmal, to reduce the latency for a modification to
>>> propagate to all the instances we might need to replicate topology as well.
>>>
>>> Thanks
>>>
>>> On Wed, Nov 26, 2014 at 2:44 PM, Lakmal Warusawithana <la...@wso2.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Wed, Nov 26, 2014 at 1:16 PM, Imesh Gunaratne <im...@apache.org>
>>>> wrote:
>>>>
>>>>> Hi Devs,
>>>>>
>>>>> This is to discuss the clustering model of the cloud controller:
>>>>>
>>>>>
>>>>> 
>>>>>
>>>>> As shown in the above diagram the idea is to have a coordinator node
>>>>> to handle data persistence logic and message publishing (topology, instance
>>>>> status, etc). The coordinator will be selected randomly and at a given time
>>>>> there will be only one coordinator. If the existing coordinator node goes
>>>>> down, another member will become the coordinator automatically (similar to
>>>>> carbon clustering agent).
>>>>>
>>>>> According to this design Autoscaler (AS)/Stratos Manager (SM) will
>>>>> talk to Cloud Controller (CC) via the Cloud Controller Service endpoint
>>>>> exposed via the load balancer.
>>>>>
>>>>> *Data Replication*
>>>>> When a request comes into one of the CC instances it will execute the
>>>>> necessary actions and update the data holder and/or topology which is in
>>>>> memory. At this point the data holder changes will be replicated to other
>>>>> instances using a distributed map. Once the coordinator receives the above
>>>>> updates it will persist the changes to the registry database.
>>>>>
>>>>> In this design we might not need to replicate the topology since it is
>>>>> already there in the message broker. The idea is to let coordinator publish
>>>>> the topology changes and the other members to listen to it.
>>>>>
>>>>>
>>>> IMO, we may need to replicate topology also, otherwise it may occur
>>>> some inconsistency.
>>>>
>>>>
>>>>> Please add your thoughts.
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>> --
>>>>> Imesh Gunaratne
>>>>>
>>>>> Technical Lead, WSO2
>>>>> Committer & PMC Member, Apache Stratos
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Lakmal Warusawithana
>>>> Vice President, Apache Stratos
>>>> Director - Cloud Architecture; WSO2 Inc.
>>>> Mobile : +94714289692
>>>> Blog : http://lakmalsview.blogspot.com/
>>>>
>>>>
>>>
>>>
>>> --
>>> Imesh Gunaratne
>>>
>>> Technical Lead, WSO2
>>> Committer & PMC Member, Apache Stratos
>>>
>>
>>
>>
>> --
>> Akila Ravihansa Perera
>> Software Engineer, WSO2
>>
>> Blog: http://ravihansa3000.blogspot.com
>>
>
>
>
> --
> Imesh Gunaratne
>
> Technical Lead, WSO2
> Committer & PMC Member, Apache Stratos
>



-- 
Akila Ravihansa Perera
Software Engineer, WSO2

Blog: http://ravihansa3000.blogspot.com

Re: [Discuss] Cloud Controller Clustering Model

Posted by Imesh Gunaratne <im...@apache.org>.

Hi Akila,

The goal we are trying to achieve here is to replicate the state of the
cloud controller. IMO a registry-based approach might not be appropriate for
state replication, for the following reasons:

- The time it takes to propagate a modification from one instance to
another would be higher compared to using a distributed map (instance 1
persists changes, sends a cluster message to invalidate the cache, other
instances read the changes from the registry database, refresh the in-memory
data structure, etc.)
- If the above happens when serving multiple requests (at a high frequency)
on different instances of CC, the system might come to an inconsistent
state because the in-memory data structures have not been updated properly.

Thanks

On Wed, Nov 26, 2014 at 3:10 PM, Akila Ravihansa Perera <ra...@wso2.com>
wrote:

> Hi Imesh,
>
> According to the model you've described, we will have to use a distributed
> lock to synchronize read/write operations into the topology data structure.
> This could be expensive.
>
> I'd like to propose that we actually store and read the topology from the
> registry. And to optimize the performance, use a distributed cache. We have
> to store topology as a collection of objects and if a write operation
> occurs against an object, that cache object has to be invalidated. This way
> we can get rid of having to replicate topology using cluster messages,
> which could cause many inconsistent states, IMHO.
>
> Thanks.
>
> On Wed, Nov 26, 2014 at 2:50 PM, Imesh Gunaratne <im...@apache.org> wrote:
>
>> +1 Yes a good point Lakmal, to reduce the latency for a modification to
>> propagate to all the instances we might need to replicate topology as well.
>>
>> Thanks
>>
>> On Wed, Nov 26, 2014 at 2:44 PM, Lakmal Warusawithana <la...@wso2.com>
>> wrote:
>>
>>>
>>>
>>> On Wed, Nov 26, 2014 at 1:16 PM, Imesh Gunaratne <im...@apache.org>
>>> wrote:
>>>
>>>> Hi Devs,
>>>>
>>>> This is to discuss the clustering model of the cloud controller:
>>>>
>>>>
>>>> 
>>>>
>>>> As shown in the above diagram the idea is to have a coordinator node to
>>>> handle data persistence logic and message publishing (topology, instance
>>>> status, etc). The coordinator will be selected randomly and at a given time
>>>> there will be only one coordinator. If the existing coordinator node goes
>>>> down, another member will become the coordinator automatically (similar to
>>>> carbon clustering agent).
>>>>
>>>> According to this design Autoscaler (AS)/Stratos Manager (SM) will talk
>>>> to Cloud Controller (CC) via the Cloud Controller Service endpoint exposed
>>>> via the load balancer.
>>>>
>>>> *Data Replication*
>>>> When a request comes into one of the CC instances it will execute the
>>>> necessary actions and update the data holder and/or topology which is in
>>>> memory. At this point the data holder changes will be replicated to other
>>>> instances using a distributed map. Once the coordinator receives the above
>>>> updates it will persist the changes to the registry database.
>>>>
>>>> In this design we might not need to replicate the topology since it is
>>>> already there in the message broker. The idea is to let coordinator publish
>>>> the topology changes and the other members to listen to it.
>>>>
>>>>
>>> IMO, we may need to replicate topology also, otherwise it may occur some
>>> inconsistency.
>>>
>>>
>>>> Please add your thoughts.
>>>>
>>>> Thanks
>>>>
>>>>
>>>> --
>>>> Imesh Gunaratne
>>>>
>>>> Technical Lead, WSO2
>>>> Committer & PMC Member, Apache Stratos
>>>>
>>>
>>>
>>>
>>> --
>>> Lakmal Warusawithana
>>> Vice President, Apache Stratos
>>> Director - Cloud Architecture; WSO2 Inc.
>>> Mobile : +94714289692
>>> Blog : http://lakmalsview.blogspot.com/
>>>
>>>
>>
>>
>> --
>> Imesh Gunaratne
>>
>> Technical Lead, WSO2
>> Committer & PMC Member, Apache Stratos
>>
>
>
>
> --
> Akila Ravihansa Perera
> Software Engineer, WSO2
>
> Blog: http://ravihansa3000.blogspot.com
>



-- 
Imesh Gunaratne

Technical Lead, WSO2
Committer & PMC Member, Apache Stratos

Re: [Discuss] Cloud Controller Clustering Model

Posted by Akila Ravihansa Perera <ra...@wso2.com>.

Hi Imesh,

According to the model you've described, we will have to use a distributed
lock to synchronize read/write operations on the topology data structure.
This could be expensive.

I'd like to propose that we actually store and read the topology from the
registry. And to optimize the performance, use a distributed cache. We have
to store topology as a collection of objects and if a write operation
occurs against an object, that cache object has to be invalidated. This way
we can get rid of having to replicate topology using cluster messages,
which could cause many inconsistent states, IMHO.
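
A rough sketch of what I have in mind, in plain Java. Everything here is hypothetical and for illustration only: `TopologyCache` and the `/topology/...` paths are made-up names, the registry is just a map, and the cache is local to one JVM, whereas in a real cluster the invalidation would have to reach the cache on every member. The point is that a write invalidates only the affected object, and only that object is re-read from the registry.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical read-through cache in front of a registry (a plain map here).
class TopologyCache {
    private final Map<String, String> registry = new HashMap<>();
    private final Map<String, String> cache = new HashMap<>();
    // Counts registry lookups, to show how often the expensive path is taken.
    int registryReads = 0;

    // Write to the registry first, then invalidate only the affected object.
    void write(String path, String value) {
        registry.put(path, value);
        cache.remove(path);
    }

    // Serve from the cache when possible; fall back to the registry on a miss.
    String read(String path) {
        String cached = cache.get(path);
        if (cached != null) {
            return cached;
        }
        registryReads++;
        String value = registry.get(path);
        if (value != null) {
            cache.put(path, value);
        }
        return value;
    }
}

class TopologyCacheDemo {
    public static void main(String[] args) {
        TopologyCache topology = new TopologyCache();
        topology.write("/topology/php/cluster1", "Created");
        topology.write("/topology/php/cluster2", "Created");
        topology.read("/topology/php/cluster1");
        topology.read("/topology/php/cluster2");

        // Only cluster1 is invalidated; cluster2 keeps being served from the cache.
        topology.write("/topology/php/cluster1", "Active");
        System.out.println(topology.read("/topology/php/cluster1"));
        System.out.println(topology.registryReads);
    }
}
```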

Thanks.

On Wed, Nov 26, 2014 at 2:50 PM, Imesh Gunaratne <im...@apache.org> wrote:

> +1 Yes a good point Lakmal, to reduce the latency for a modification to
> propagate to all the instances we might need to replicate topology as well.
>
> Thanks
>
> On Wed, Nov 26, 2014 at 2:44 PM, Lakmal Warusawithana <la...@wso2.com>
> wrote:
>
>>
>>
>> On Wed, Nov 26, 2014 at 1:16 PM, Imesh Gunaratne <im...@apache.org>
>> wrote:
>>
>>> Hi Devs,
>>>
>>> This is to discuss the clustering model of the cloud controller:
>>>
>>>
>>> 
>>>
>>> As shown in the above diagram the idea is to have a coordinator node to
>>> handle data persistence logic and message publishing (topology, instance
>>> status, etc). The coordinator will be selected randomly and at a given time
>>> there will be only one coordinator. If the existing coordinator node goes
>>> down, another member will become the coordinator automatically (similar to
>>> carbon clustering agent).
>>>
>>> According to this design Autoscaler (AS)/Stratos Manager (SM) will talk
>>> to Cloud Controller (CC) via the Cloud Controller Service endpoint exposed
>>> via the load balancer.
>>>
>>> *Data Replication*
>>> When a request comes into one of the CC instances it will execute the
>>> necessary actions and update the data holder and/or topology which is in
>>> memory. At this point the data holder changes will be replicated to other
>>> instances using a distributed map. Once the coordinator receives the above
>>> updates it will persist the changes to the registry database.
>>>
>>> In this design we might not need to replicate the topology since it is
>>> already there in the message broker. The idea is to let coordinator publish
>>> the topology changes and the other members to listen to it.
>>>
>>>
>> IMO, we may need to replicate topology also, otherwise it may occur some
>> inconsistency.
>>
>>
>>> Please add your thoughts.
>>>
>>> Thanks
>>>
>>>
>>> --
>>> Imesh Gunaratne
>>>
>>> Technical Lead, WSO2
>>> Committer & PMC Member, Apache Stratos
>>>
>>
>>
>>
>> --
>> Lakmal Warusawithana
>> Vice President, Apache Stratos
>> Director - Cloud Architecture; WSO2 Inc.
>> Mobile : +94714289692
>> Blog : http://lakmalsview.blogspot.com/
>>
>>
>
>
> --
> Imesh Gunaratne
>
> Technical Lead, WSO2
> Committer & PMC Member, Apache Stratos
>



-- 
Akila Ravihansa Perera
Software Engineer, WSO2

Blog: http://ravihansa3000.blogspot.com

Re: [Discuss] Cloud Controller Clustering Model

Posted by Imesh Gunaratne <im...@apache.org>.

+1. Yes, good point Lakmal. To reduce the latency for a modification to
propagate to all the instances, we might need to replicate the topology as
well.

Thanks

On Wed, Nov 26, 2014 at 2:44 PM, Lakmal Warusawithana <la...@wso2.com>
wrote:

>
>
> On Wed, Nov 26, 2014 at 1:16 PM, Imesh Gunaratne <im...@apache.org> wrote:
>
>> Hi Devs,
>>
>> This is to discuss the clustering model of the cloud controller:
>>
>>
>> 
>>
>> As shown in the above diagram the idea is to have a coordinator node to
>> handle data persistence logic and message publishing (topology, instance
>> status, etc). The coordinator will be selected randomly and at a given time
>> there will be only one coordinator. If the existing coordinator node goes
>> down, another member will become the coordinator automatically (similar to
>> carbon clustering agent).
>>
>> According to this design Autoscaler (AS)/Stratos Manager (SM) will talk
>> to Cloud Controller (CC) via the Cloud Controller Service endpoint exposed
>> via the load balancer.
>>
>> *Data Replication*
>> When a request comes into one of the CC instances it will execute the
>> necessary actions and update the data holder and/or topology which is in
>> memory. At this point the data holder changes will be replicated to other
>> instances using a distributed map. Once the coordinator receives the above
>> updates it will persist the changes to the registry database.
>>
>> In this design we might not need to replicate the topology since it is
>> already there in the message broker. The idea is to let coordinator publish
>> the topology changes and the other members to listen to it.
>>
>>
> IMO, we may need to replicate topology also, otherwise it may occur some
> inconsistency.
>
>
>> Please add your thoughts.
>>
>> Thanks
>>
>>
>> --
>> Imesh Gunaratne
>>
>> Technical Lead, WSO2
>> Committer & PMC Member, Apache Stratos
>>
>
>
>
> --
> Lakmal Warusawithana
> Vice President, Apache Stratos
> Director - Cloud Architecture; WSO2 Inc.
> Mobile : +94714289692
> Blog : http://lakmalsview.blogspot.com/
>
>


-- 
Imesh Gunaratne

Technical Lead, WSO2
Committer & PMC Member, Apache Stratos

Re: [Discuss] Cloud Controller Clustering Model

Posted by Lakmal Warusawithana <la...@wso2.com>.

On Wed, Nov 26, 2014 at 1:16 PM, Imesh Gunaratne <im...@apache.org> wrote:

> Hi Devs,
>
> This is to discuss the clustering model of the cloud controller:
>
>
> 
>
> As shown in the above diagram the idea is to have a coordinator node to
> handle data persistence logic and message publishing (topology, instance
> status, etc). The coordinator will be selected randomly and at a given time
> there will be only one coordinator. If the existing coordinator node goes
> down, another member will become the coordinator automatically (similar to
> carbon clustering agent).
>
> According to this design Autoscaler (AS)/Stratos Manager (SM) will talk to
> Cloud Controller (CC) via the Cloud Controller Service endpoint exposed via
> the load balancer.
>
> *Data Replication*
> When a request comes into one of the CC instances it will execute the
> necessary actions and update the data holder and/or topology which is in
> memory. At this point the data holder changes will be replicated to other
> instances using a distributed map. Once the coordinator receives the above
> updates it will persist the changes to the registry database.
>
> In this design we might not need to replicate the topology since it is
> already there in the message broker. The idea is to let coordinator publish
> the topology changes and the other members to listen to it.
>
>
IMO, we may need to replicate the topology as well; otherwise some
inconsistency may occur.


> Please add your thoughts.
>
> Thanks
>
>
> --
> Imesh Gunaratne
>
> Technical Lead, WSO2
> Committer & PMC Member, Apache Stratos
>



-- 
Lakmal Warusawithana
Vice President, Apache Stratos
Director - Cloud Architecture; WSO2 Inc.
Mobile : +94714289692
Blog : http://lakmalsview.blogspot.com/