Posted to dev@stratos.apache.org by "Martin Eppel (meppel)" <me...@cisco.com> on 2014/09/29 03:21:01 UTC

subscribing a large number of cartridge takes a very long time

We are observing an interesting phenomenon (based on Stratos 4.0.0): when we subscribe to a large number of cartridges (40+), it takes up to 40+ minutes for the VMs to spin up. From the logs we can see that publishing the subscriptions goes fairly quickly, but VM spin-up takes quite a bit of time. For example, if we publish a few new subscriptions on top of already running VMs, publishing the subscriptions takes 10+ seconds, while spinning up the corresponding VMs takes minutes.

Based on this observation, it appears that Stratos spins up VMs sequentially rather than in parallel, even across different subscriptions (one VM at a time).

However, my understanding from analyzing the code is that this should happen asynchronously rather than sequentially. Is there something in Stratos which serializes VM spin-up (across different subscriptions)?

Below is my analysis of how subscriptions and VM spin-up work in Stratos. Is it correct, or did I miss something, and what could potentially cause the apparently sequential spin-up of the VMs?


Thanks

Martin


Stratos subscription and VM spin up:

In general, cartridge subscription and VM spin-up are not serialized (at least across different subscriptions): it's all event driven, and a separate monitoring thread is created for each cartridge. However, spinning up VMs is rule driven (per cluster) in the autoscaler and is determined by the minimum number of VMs configured for a cluster as well as by the health statistics used for scaling (which are reported back by the cartridge agent and averaged by the autoscaler). The rule is checked periodically, so it does take time for a VM to spin up.
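To make the per-cluster model above concrete, here is a minimal Java sketch, assuming a simplified monitor that evaluates only the "minimum instances" part of the rule (the real rule also considers health statistics). The names `ClusterMonitorSketch` and `ClusterMonitor` are hypothetical and are not the Stratos autoscaler API. Each cluster gets its own scheduled task, and each tick requests at most one VM.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ClusterMonitorSketch {

    /** Hypothetical per-cluster monitor: each run() is one periodic rule check. */
    public static final class ClusterMonitor implements Runnable {
        private final String clusterId;
        private final int minInstances;
        private final AtomicInteger activeInstances = new AtomicInteger();

        public ClusterMonitor(String clusterId, int minInstances) {
            this.clusterId = clusterId;
            this.minInstances = minInstances;
        }

        public int active() {
            return activeInstances.get();
        }

        @Override
        public void run() {
            // Only the "minimum instances" part of the rule: request at most one VM per check.
            if (activeInstances.get() < minInstances) {
                int now = activeInstances.incrementAndGet(); // stands in for a "start instance" request
                System.out.println(clusterId + ": requested 1 VM, now " + now);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<ClusterMonitor> monitors = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            monitors.add(new ClusterMonitor("cluster-" + i, 2));
        }
        // Several scheduler threads, so the clusters' checks do not queue behind each other.
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(3);
        for (ClusterMonitor m : monitors) {
            scheduler.scheduleAtFixedRate(m, 0, 100, TimeUnit.MILLISECONDS);
        }
        Thread.sleep(350);
        scheduler.shutdown();
        scheduler.awaitTermination(1, TimeUnit.SECONDS);
        for (ClusterMonitor m : monitors) {
            System.out.println(m.clusterId + " active=" + m.active());
        }
    }
}
```

Even with independent threads per cluster, a cluster with a minimum of n VMs needs n checks in this model.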



For the rule to kick in, the cluster has to be in a certain state, and setting the cluster state in turn depends on receiving an event. So I think there is a good chance that when the rule runs for the first time (as part of the periodic checks in the ClusterMonitor), it has to wait for a subsequent run until the cluster is in the right state.
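The state dependency described above can be sketched as a monitor whose rule is skipped until an event marks the cluster ready. This is a hypothetical model: `StateGatedMonitor`, `ClusterState` and `onClusterEvent` are illustrative names, not Stratos classes. It shows how the first periodic check can be spent waiting for the event.

```java
import java.util.concurrent.atomic.AtomicReference;

public class StateGatedMonitor {
    /** Hypothetical cluster states; the real Stratos states may differ. */
    public enum ClusterState { PENDING, READY }

    private final AtomicReference<ClusterState> state = new AtomicReference<>(ClusterState.PENDING);
    private final int minInstances;
    private int activeInstances; // only touched by the monitor thread in this sketch
    private int ticks;

    public StateGatedMonitor(int minInstances) {
        this.minInstances = minInstances;
    }

    /** Called by a (hypothetical) event listener, possibly on another thread. */
    public void onClusterEvent() {
        state.set(ClusterState.READY);
    }

    /** One periodic check; returns true if a VM was requested on this tick. */
    public boolean tick() {
        ticks++;
        if (state.get() != ClusterState.READY) {
            return false; // rule not evaluated: this tick is spent waiting for the event
        }
        if (activeInstances < minInstances) {
            activeInstances++;
            return true;
        }
        return false;
    }

    public int active() { return activeInstances; }
    public int ticks() { return ticks; }

    public static void main(String[] args) {
        StateGatedMonitor m = new StateGatedMonitor(2);
        System.out.println(m.tick()); // false: event not yet received
        m.onClusterEvent();
        System.out.println(m.tick()); // true
        System.out.println(m.tick()); // true
        System.out.println(m.tick()); // false: minimum reached
        System.out.println("ticks=" + m.ticks() + ", active=" + m.active()); // ticks=4, active=2
    }
}
```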



In one respect it does seem to be serialized: for each cluster (== subscription) to reach the minimum number of VMs, VMs are spawned one per periodic check (it does not seem to spawn n VMs at the same time).
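The cost of spawning one VM per check is simple arithmetic: with a deficit of d VMs and at most k requests per check, the cluster needs ceil(d / k) checks. The sketch below computes this; `SpinUpTiming` is a hypothetical helper, the 60-second interval in `main` is an assumed value rather than the Stratos default, and VM boot time comes on top of the result.

```java
public class SpinUpTiming {

    /** Checks needed to request `deficit` VMs when each check requests at most `perCheck`. */
    public static int checksNeeded(int deficit, int perCheck) {
        if (perCheck <= 0) throw new IllegalArgumentException("perCheck must be positive");
        if (deficit <= 0) return 0;
        return (deficit + perCheck - 1) / perCheck; // ceiling division
    }

    /** Seconds between the first and the last request, assuming the first check fires immediately. */
    public static long secondsUntilLastRequest(int deficit, int perCheck, long checkIntervalSeconds) {
        int checks = checksNeeded(deficit, perCheck);
        return checks == 0 ? 0 : (checks - 1) * checkIntervalSeconds;
    }

    public static void main(String[] args) {
        // Assumed numbers: min = 4, no running VMs, 60 s between checks.
        System.out.println(secondsUntilLastRequest(4, 1, 60)); // 180: one VM per check
        System.out.println(secondsUntilLastRequest(4, 4, 60)); // 0: whole deficit in one check
    }
}
```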


RE: subscribing a large number of cartridge takes a very long time

Posted by "Martin Eppel (meppel)" <me...@cisco.com>.
Hi Isuru, Reka,

Thanks for the reply. I think the main issue, or question, for us at the moment is why the VM spin-up of different subscriptions is (seemingly) handled sequentially, although Stratos handles them asynchronously.
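One possible explanation for cross-subscription serialization, not confirmed against the Stratos code, is that each monitor runs in parallel but funnels through a shared lock (or a single-threaded executor) around a blocking cloud call. The probe below is a hypothetical Java model of that situation (`SerializationProbe` is an illustrative name): it counts how many simulated provisioning calls are in flight at once, with and without a shared `ReentrantLock`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class SerializationProbe {

    /** Runs one simulated blocking "provision" call per cluster; returns the peak number in flight. */
    public static int maxConcurrentProvisions(int clusters, boolean sharedLock) throws Exception {
        ReentrantLock lock = new ReentrantLock(); // hypothetical lock shared by all clusters
        CountDownLatch allArrived = new CountDownLatch(clusters);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger maxInFlight = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(clusters);
        try {
            List<Future<?>> results = new ArrayList<>();
            for (int i = 0; i < clusters; i++) {
                results.add(pool.submit(() -> {
                    if (sharedLock) lock.lock();
                    try {
                        int now = inFlight.incrementAndGet();
                        maxInFlight.accumulateAndGet(now, Math::max);
                        allArrived.countDown();
                        // Simulated blocking call: returns once every cluster is in flight,
                        // or after 500 ms if that never happens (e.g. because of the lock).
                        allArrived.await(500, TimeUnit.MILLISECONDS);
                        inFlight.decrementAndGet();
                    } finally {
                        if (sharedLock) lock.unlock();
                    }
                    return null;
                }));
            }
            for (Future<?> f : results) f.get();
        } finally {
            pool.shutdown();
        }
        return maxInFlight.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("without shared lock: " + maxConcurrentProvisions(3, false));
        System.out.println("with shared lock:    " + maxConcurrentProvisions(3, true));
    }
}
```

If the peak stays at 1 while several clusters are waiting, total spin-up time grows linearly with the number of clusters, which would be consistent with 40+ cartridges taking 40+ minutes if each spin-up takes about a minute.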

Btw, supporting the spin-up of multiple VMs at once for a single subscription would definitely be a good enhancement.
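A sketch of that enhancement, assuming a hypothetical `startInstance` callback: instead of one VM per check, the check requests the whole deficit (minimum minus active) concurrently on an executor. `BatchSpawnSketch` is an illustrative name; this is not how Stratos 4.0.0 behaves.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

public class BatchSpawnSketch {

    /** One periodic check that requests the whole deficit at once; returns the new instance ids. */
    public static List<String> spawnDeficit(int minInstances, int activeInstances,
                                            Function<Integer, String> startInstance,
                                            ExecutorService pool) {
        int deficit = Math.max(0, minInstances - activeInstances);
        List<CompletableFuture<String>> pending = new ArrayList<>();
        for (int i = 0; i < deficit; i++) {
            final int slot = i;
            // Each (possibly blocking) start request runs on the pool, so they overlap.
            pending.add(CompletableFuture.supplyAsync(() -> startInstance.apply(slot), pool));
        }
        List<String> ids = new ArrayList<>();
        for (CompletableFuture<String> f : pending) {
            ids.add(f.join()); // wait for every request, keeping submission order
        }
        return ids;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            System.out.println(spawnDeficit(4, 1, slot -> "vm-" + slot, pool)); // [vm-0, vm-1, vm-2]
        } finally {
            pool.shutdown();
        }
    }
}
```

Joining on all futures keeps the check simple, but the check then lasts as long as the slowest request; a monitor could instead keep the futures and inspect them on the next tick.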

Thanks

Martin

From: isuruh@wso2.com [mailto:isuruh@wso2.com] On Behalf Of Isuru Haththotuwa
Sent: Monday, September 29, 2014 2:26 AM
To: dev
Cc: Sajith Kariyawasam; Nirmal Fernando
Subject: Re: subscribing a large number of cartridge takes a very long time



Re: subscribing a large number of cartridge takes a very long time

Posted by Isuru Haththotuwa <is...@apache.org>.
Hi

On Mon, Sep 29, 2014 at 2:48 PM, Reka Thirunavukkarasu <re...@wso2.com>
wrote:

> Hi Martin,
>
> Stratos does not spin up more than one VM at a time per cluster using the
> ClusterMonitor. This is an improvement we will have to make in the
> autoscaler and cloud controller, so that they can spin up, and terminate,
> more than one VM at once per cluster. I can see in the cloud controller that
> we could create more than one node through jclouds, but we always use the
> jclouds API to spin up a single VM. AFAIK, we didn't use the jclouds API to
> create more than one VM at once because it is a blocking call. Also, I think
> the validate partition call we make through jclouds while the cluster
> monitor is running is also blocking. So I am not sure whether the 40 minutes
> taken for your 40+ cartridges is caused by these blocking jclouds calls
> taking a considerable amount of time, even though Stratos executes the
> ClusterMonitors in parallel. We will have to reproduce this issue and
> analyse it further to find the root cause.
>
> I hope that Nirmal/Sajith can give more input on this. If there is no
> limitation from jclouds, then we will be able to implement this improvement.
>
Thanks a lot Reka for the clarification. Yes, there is definitely room for
improvement here.


Re: subscribing a large number of cartridge takes a very long time

Posted by Reka Thirunavukkarasu <re...@wso2.com>.
Hi Martin,

Stratos does not spin up more than one VM at a time per cluster using the
ClusterMonitor. This is an improvement we will have to make in the
autoscaler and cloud controller, so that they can spin up, and terminate,
more than one VM at once per cluster. I can see in the cloud controller that
we could create more than one node through jclouds, but we always use the
jclouds API to spin up a single VM. AFAIK, we didn't use the jclouds API to
create more than one VM at once because it is a blocking call. Also, I think
the validate partition call we make through jclouds while the cluster
monitor is running is also blocking. So I am not sure whether the 40 minutes
taken for your 40+ cartridges is caused by these blocking jclouds calls
taking a considerable amount of time, even though Stratos executes the
ClusterMonitors in parallel. We will have to reproduce this issue and
analyse it further to find the root cause.
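If blocking calls turn out to be the bottleneck, one common pattern is to hand them to a dedicated, bounded thread pool so that the monitor thread returns immediately. The sketch below assumes Java 9+ (for `CompletableFuture.orTimeout`) and uses a plain `Supplier` in place of the real call. In jclouds, `ComputeService.createNodesInGroup(group, count, template)` can create several nodes in one call but is itself blocking, so it would be the kind of call submitted here. `NonBlockingMonitorSketch` and its pool size are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class NonBlockingMonitorSketch {
    // Bounded pool reserved for slow cloud calls (size chosen arbitrarily for the sketch).
    private final ExecutorService cloudCalls = Executors.newFixedThreadPool(8);

    /** Submits a blocking call (e.g. partition validation or node creation) and returns at once. */
    public <T> CompletableFuture<T> submit(Supplier<T> blockingCall, long timeoutSeconds) {
        return CompletableFuture.supplyAsync(blockingCall, cloudCalls)
                .orTimeout(timeoutSeconds, TimeUnit.SECONDS); // fail the future if the call hangs
    }

    public void shutdown() {
        cloudCalls.shutdown();
    }

    public static void main(String[] args) {
        NonBlockingMonitorSketch monitor = new NonBlockingMonitorSketch();
        long start = System.nanoTime();
        CompletableFuture<Boolean> validated = monitor.submit(() -> {
            sleep(300); // stands in for a slow, blocking partition validation
            return true;
        }, 5);
        long returnedAfterMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("submit returned after ~" + returnedAfterMs + " ms");
        System.out.println("validated = " + validated.join());
        monitor.shutdown();
    }

    private static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```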

I hope that Nirmal/Sajith can give more input on this. If there is no
limitation from jclouds, then we will be able to implement this improvement.

Thanks,
Reka

On Mon, Sep 29, 2014 at 11:17 AM, Isuru Haththotuwa <is...@apache.org>
wrote:

> Hi Martin,
>
> Can you please explain more about how you subscribed to a large number of
> cartridges? AFAIU, you have subscribed to them one after the other. Please
> confirm.
>
> What you have observed is correct. Stratos spins up the cartridges for each
> subscription sequentially. We did not have a use case for spinning up
> instances in parallel, since in Stratos 4.0.0 there was no concept of
> subscribing to more than one cartridge at once, which would require deciding
> whether they should spin up one after the other (dependencies) or in
> parallel (independent). AFAIU we address this in Service Grouping, which is
> currently in progress.


-- 
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.:http://wso2.com,
Mobile: +94776442007

Re: subscribing a large number of cartridge takes a very long time

Posted by Isuru Haththotuwa <is...@apache.org>.
Hi Martin,

Can you please explain more about how you subscribed to a large number of
cartridges? AFAIU, you have subscribed to them one after the other. Please
confirm.

What you have observed is correct. Stratos spins up the cartridges for each
subscription sequentially. We did not have a use case for spinning up
instances in parallel, since in Stratos 4.0.0 there was no concept of
subscribing to more than one cartridge at once, which would require deciding
whether they should spin up one after the other (dependencies) or in
parallel (independent). AFAIU we address this in Service Grouping, which is
currently in progress.
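The distinction between dependent and independent cartridges can be illustrated by grouping cartridges into "waves" with Kahn's topological sort: everything in one wave depends only on earlier waves, so a wave could be spun up in parallel. This is a generic sketch with hypothetical cartridge names (`SpinUpWaves` is an illustrative class), not the Service Grouping implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class SpinUpWaves {

    /** Kahn's algorithm by levels; `dependsOn` maps a cartridge to what must run before it. */
    public static List<List<String>> waves(Map<String, Set<String>> dependsOn) {
        Map<String, Integer> unmet = new TreeMap<>();           // cartridge -> unmet dependency count
        Map<String, List<String>> dependents = new HashMap<>(); // dependency -> cartridges waiting on it
        for (Map.Entry<String, Set<String>> e : dependsOn.entrySet()) {
            unmet.putIfAbsent(e.getKey(), 0);
            for (String dep : e.getValue()) {
                unmet.putIfAbsent(dep, 0);
                unmet.merge(e.getKey(), 1, Integer::sum);
                dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        List<String> ready = new ArrayList<>();
        for (Map.Entry<String, Integer> e : unmet.entrySet()) {
            if (e.getValue() == 0) ready.add(e.getKey());
        }
        List<List<String>> result = new ArrayList<>();
        int placed = 0;
        while (!ready.isEmpty()) {
            Collections.sort(ready); // deterministic order inside a wave
            result.add(ready);
            placed += ready.size();
            List<String> next = new ArrayList<>();
            for (String done : ready) {
                for (String d : dependents.getOrDefault(done, Collections.emptyList())) {
                    if (unmet.merge(d, -1, Integer::sum) == 0) next.add(d);
                }
            }
            ready = next;
        }
        if (placed != unmet.size()) {
            throw new IllegalArgumentException("dependency cycle among cartridges");
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical subscription: "app" needs "mysql" and "cache"; "lb" is independent.
        Map<String, Set<String>> deps = new HashMap<>();
        deps.put("app", Set.of("mysql", "cache"));
        deps.put("mysql", Set.of());
        deps.put("cache", Set.of());
        deps.put("lb", Set.of());
        System.out.println(waves(deps)); // [[cache, lb, mysql], [app]]
    }
}
```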

--
Thanks and Regards,

Isuru H.
+94 716 358 048