
Posted to dev@stratos.apache.org by Vanson Lim <vl...@cisco.com> on 2015/04/01 00:04:35 UTC

Stratos not properly terminating VMs to fail to startup

Devs,

I've simulated the case where OpenStack fails to bring up a VM (we've seen this before in cases where required resources are not available 
or there is some IaaS problem/timeout which causes the VM to fail to launch). In this case we cause the failure by specifying that the 
cartridge should have a fixed IP address that is not part of the network the VM attaches to. The network is defined with a 10.0.0.0/24 
subnet, but I've specified a fixed ip=10.0.8.1 to cause the VM startup to fail.
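
For reference, the address mismatch can be confirmed with a quick check. This is only a sketch using Python's standard ipaddress module; the helper name fixed_ip_in_subnet is made up for illustration and is not part of Stratos, jclouds or OpenStack.

```python
import ipaddress


def fixed_ip_in_subnet(fixed_ip: str, subnet_cidr: str) -> bool:
    """Return True if fixed_ip is an address inside subnet_cidr."""
    return ipaddress.ip_address(fixed_ip) in ipaddress.ip_network(subnet_cidr)


if __name__ == "__main__":
    # Values taken from the scenario described above.
    print(fixed_ip_in_subnet("10.0.8.1", "10.0.0.0/24"))
```

A /24 starting at 10.0.0.0 only covers 10.0.0.0 through 10.0.0.255, so 10.0.8.1 falls outside it.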

The VM start fails and the VM remains in an error state for the duration of the "pendingMemberExpiryTimeout" period set in the autoscaler.xml file.

Stratos fails to delete the VM in error state and attempts to start a new VM, which also fails to launch.

This presumably repeats itself, creating an additional VM in error state during each iteration until we've exhausted all the resources in the 
system.
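
To make the suspected loop concrete, here is a toy model of it. It is a sketch under stated assumptions, not Stratos code: ToyIaas, its quota and count_rounds_until_exhausted are hypothetical names, and the model simply assumes that every launch leaves one VM in error state that is never deleted.

```python
class ToyIaas:
    """Toy IaaS with a fixed quota; every launch ends up in ERROR state."""

    def __init__(self, quota: int) -> None:
        self.quota = quota
        self.error_vms = 0  # VMs left behind in ERROR state

    def launch(self) -> None:
        if self.error_vms >= self.quota:
            raise RuntimeError("quota exhausted")
        # The VM now exists in ERROR state and consumes quota, but the
        # caller gets nothing back that it could use to terminate it.
        self.error_vms += 1


def count_rounds_until_exhausted(iaas: ToyIaas, max_rounds: int) -> int:
    """Retry launching without any cleanup and count the completed rounds."""
    rounds = 0
    for _ in range(max_rounds):
        try:
            iaas.launch()
        except RuntimeError:
            break
        rounds += 1
    return rounds
```

With a quota of 3, the loop stops after three rounds with three VMs stuck in error state, which is the exhaustion described above.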

wso2carbon.log and cartridge definition attached.

-Vanson

Re: Stratos not properly terminating VMs to fail to startup

Posted by Lakmal Warusawithana <la...@wso2.com>.

IMO we should move this to the next release since this is not a blocker. WDYT?

On Friday, April 3, 2015, Imesh Gunaratne <im...@apache.org> wrote:

> By any chance, do we know why the above instances are going into the Error state?

Re: Stratos not properly terminating VMs to fail to startup

Posted by Imesh Gunaratne <im...@apache.org>.

By any chance, do we know why the above instances are going into the Error state?

Thanks

On Friday, April 3, 2015, Vanson Lim <vl...@cisco.com> wrote:

>   Jeffrey,
>
> If jclouds is not returning a nodeID, it should handle the cleanup. It's
> also reasonable for jclouds to return an object for the failed instance, but
> that's not much use to Stratos except for leaving the VM around so that we
> can see that it failed to come up. I don't know if we want an option
> to have jclouds try to respawn an instance, as this would most likely fail.
> It's better to have Stratos manage the retries.
>
> -Vanson.

-- 
Imesh Gunaratne

Technical Lead, WSO2
Committer & PMC Member, Apache Stratos

Re: Stratos not properly terminating VMs to fail to startup

Posted by Vanson Lim <vl...@cisco.com>.

On 4/2/15, 1:09 PM, Jeffrey Nguyen (jeffrngu) wrote:
> Hi Anuruddha,
>
> The instances that are in Error state on OpenStack Horizon were never removed, even after Stratos successfully spawned an instance. It 
> sounds like you might need to enhance the jclouds API to return an object with nodeID info for this case. Or perhaps a better solution would 
> be to modify the jclouds API to delete the failed instance if it wasn’t spawned successfully (or make that an option of the API that 
> handles spawning a new instance).
>
Jeffrey,

If jclouds is not returning a nodeID, it should handle the cleanup. It's also reasonable for jclouds to return an object for the failed 
instance, but that's not much use to Stratos except for leaving the VM around so that we can see that it failed to come up. I don't 
know if we want an option to have jclouds try to respawn an instance, as this would most likely fail. It's better to have Stratos manage 
the retries.
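
As a rough illustration of retries managed by the caller, something like the following would do. This is a sketch only: launch_with_retries and the launch/terminate callables are hypothetical stand-ins, not the Stratos or jclouds API, and it assumes the IaaS layer can report an ID for a failed instance.

```python
from typing import Callable, Optional, Tuple

# (instance_id, succeeded); instance_id may be None if no ID was reported.
LaunchResult = Tuple[Optional[str], bool]


def launch_with_retries(
    launch: Callable[[], LaunchResult],
    terminate: Callable[[str], None],
    max_attempts: int = 3,
) -> Optional[str]:
    """Try to start one VM, cleaning up each failed attempt before retrying."""
    for _ in range(max_attempts):
        instance_id, ok = launch()
        if ok:
            return instance_id
        if instance_id is not None:
            # Only possible when the failed instance's ID was returned.
            terminate(instance_id)
    return None  # give up after max_attempts instead of looping forever
```

The bound on attempts keeps a persistent failure from consuming resources indefinitely, and the cleanup step only works if the ID of the failed instance is available.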

-Vanson.


Re: Stratos not properly terminating VMs to fail to startup

Posted by "Jeffrey Nguyen (jeffrngu)" <je...@cisco.com>.

Hi Anuruddha,

The instances that are in Error state on OpenStack Horizon were never removed, even after Stratos successfully spawned an instance. It sounds like you might need to enhance the jclouds API to return an object with nodeID info for this case. Or perhaps a better solution would be to modify the jclouds API to delete the failed instance if it wasn’t spawned successfully (or make that an option of the API that handles spawning a new instance).

Regards,
-Jeffrey

From: Anuruddha Liyanarachchi <an...@wso2.com>
Reply-To: "dev@stratos.apache.org" <de...@stratos.apache.org>
Date: Thursday, April 2, 2015 at 4:43 AM
To: "dev@stratos.apache.org" <de...@stratos.apache.org>
Subject: Re: Stratos not properly terminating VMs to fail to startup

Hi Vanson / Jeffery,

As seen in the logs, the instance ID is not returned to Stratos (instanceId=null) for the members which went into the error state. Therefore Stratos doesn't have control over the instances in the error state. Hence instances that were spawned with errors are not being deleted.


Re: Stratos not properly terminating VMs to fail to startup

Posted by Anuruddha Liyanarachchi <an...@wso2.com>.

Hi Vanson / Jeffery,

As seen in the logs, the instance ID is not returned to Stratos
(instanceId=null) for the members which went into the error state. Therefore
Stratos doesn't have control over the instances in the error state. Hence
instances that were spawned with errors are not being deleted.
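
To spell out why a null instance ID blocks cleanup, here is a small sketch. It assumes termination is keyed on the instance ID; Member and split_by_terminable are hypothetical names for illustration and do not mirror the actual Stratos classes.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Member:
    member_id: str
    instance_id: Optional[str]  # None when the IaaS layer returned no ID


def split_by_terminable(
    members: List[Member],
) -> Tuple[List[Member], List[Member]]:
    """Separate members that can be terminated by ID from those that cannot."""
    terminable = [m for m in members if m.instance_id is not None]
    orphaned = [m for m in members if m.instance_id is None]
    return terminable, orphaned
```

Members that end up in the second list correspond to the VMs left behind in the error state: there is no ID to pass to a terminate call.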



On Wed, Apr 1, 2015 at 4:08 AM, Jeffrey Nguyen (jeffrngu) <
jeffrngu@cisco.com> wrote:

> Hi Vanson,
>
> I opened a JIRA to track this issue last week:
> https://issues.apache.org/jira/browse/STRATOS-1293
>
> -Jeffrey


-- 
*Thanks and Regards,*
Anuruddha Lanka Liyanarachchi
Software Engineer - WSO2
Mobile : +94 (0) 712762611
Tel      : +94 112 145 345
anuruddhal@wso2.com

Re: Stratos not properly terminating VMs to fail to startup

Posted by "Jeffrey Nguyen (jeffrngu)" <je...@cisco.com>.

Hi Vanson,

I opened a JIRA to track this issue last week:
https://issues.apache.org/jira/browse/STRATOS-1293

-Jeffrey
