Posted to users@cloudstack.apache.org by Mārtiņš Jakubovičs <ma...@hostnet.lv> on 2016/03/30 10:14:11 UTC

CloudStack HA

Hello,

This morning I faced an unexpected problem: one of the XenServer hosts 
rebooted. I checked the logs and it looks like it was due to a network issue, 
but the question is why the host rebooted itself. CloudStack's XS pool is not 
HA-enabled, and as far as I know, in ACS 4.3.2 CloudStack does not manage host 
HA. Or am I wrong?

Mar 30 07:00:33 cloudstack-1 heartbeat: Potential problem with /var/run/sr-mount/858d490c-38e8-0f44-2840-c6acb98c3ae9/hb-b81b5d17-dea8-4257-a9b5-30b52229cc68: not reachable since 65 seconds
Mar 30 07:00:33 cloudstack-1 heartbeat: Problem with /var/run/sr-mount/858d490c-38e8-0f44-2840-c6acb98c3ae9/hb-b81b5d17-dea8-4257-a9b5-30b52229cc68: not reachable for 65 seconds, rebooting system!

[root@cloudstack-1 ~]# xe pool-list params=all | grep ha-
                       ha-enabled ( RO): false
                 ha-configuration ( RO):
                    ha-statefiles ( RO):
     ha-host-failures-to-tolerate ( RW): 0
               ha-plan-exists-for ( RO): 0
              ha-allow-overcommit ( RW): false
                 ha-overcommitted ( RO): false
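
The pool-level HA flag can also be read from a script. Below is a small illustrative helper; the function name `pool_ha_enabled` is hypothetical. It reads `xe pool-list params=all` output on standard input and prints the value of the `ha-enabled` field. For the pool above it would print `false`, which is why XenServer's own pool HA can be ruled out as the component that rebooted the host.

```shell
#!/usr/bin/env bash
# Illustrative helper (hypothetical name): read `xe pool-list params=all`
# output on stdin and print the value of the ha-enabled field.
pool_ha_enabled() {
  # Split each line at ": "; the field name sits before it, the value after.
  awk -F': ' '$1 ~ /^ *ha-enabled [(]/ {
    v = $2
    sub(/[ \t]+$/, "", v)   # drop trailing whitespace
    print v
    exit
  }'
}
```

Usage on a pool master: `xe pool-list params=all | pool_ha_enabled`.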

So does ACS manage some kind of host HA?

XenServer 6.2
ACS 4.3.2

Best regards,
Martins


Re: CloudStack HA

Posted by Dag Sonstebo <Da...@shapeblue.com>.
Hi Martins,

Yes, this is a typical example of host self-fencing, where a host reboots when it loses connectivity to a primary storage pool. As you have already found, this is controlled by the xenheartbeat.sh script.
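
To make the self-fencing mechanism concrete, here is a minimal sketch of storage-heartbeat fencing in shell. It is NOT the contents of /opt/cloud/bin/xenheartbeat.sh; the function names `beat`, `fence_decision` and `monitor` and the example 60-second timeout are hypothetical. The assumed idea: the host periodically writes a timestamp to a heartbeat file on the primary storage SR, and if no write has succeeded within the timeout it fences itself by rebooting. The usual rationale is that a host cut off from shared storage must not keep running VMs that may be restarted elsewhere.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of storage-heartbeat self-fencing.
# NOT the real /opt/cloud/bin/xenheartbeat.sh; names and timeout are illustrative.

# beat FILE
# Try to write the current epoch time into the heartbeat file on the SR.
# Returns non-zero if the write fails (e.g. the SR mount has disappeared).
# Note: a hung NFS mount may block instead of failing; a real script would
# need a guard such as coreutils `timeout`.
beat() {
  { date +%s > "$1"; } 2>/dev/null
}

# fence_decision LAST_OK NOW TIMEOUT
# Print "reboot" if the last successful heartbeat is more than TIMEOUT
# seconds old, otherwise print "ok".
fence_decision() {
  local last_ok=$1 now=$2 timeout=$3
  if (( now - last_ok > timeout )); then
    echo "reboot"
  else
    echo "ok"
  fi
}

# monitor FILE TIMEOUT INTERVAL
# Loop forever; on a stale heartbeat this sketch only logs what a real
# fencing script would do (reboot the host).
monitor() {
  local file=$1 timeout=$2 interval=$3
  local last_ok
  last_ok=$(date +%s)
  while sleep "$interval"; do
    beat "$file" && last_ok=$(date +%s)
    if [ "$(fence_decision "$last_ok" "$(date +%s)" "$timeout")" = "reboot" ]; then
      echo "heartbeat: $file not reachable for $(( $(date +%s) - last_ok )) seconds, would reboot"
    fi
  done
}
```

For example, with the hypothetical 60-second timeout, `fence_decision 1000 1065 60` prints `reboot`, which corresponds to the 65-second outage in Martins's log.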

Dag Sonstebo
Cloud Architect
ShapeBlue

Regards,

Dag Sonstebo

Dag.Sonstebo@shapeblue.com 
www.shapeblue.com
53 Chandos Place, Covent Garden, London WC2N 4HS, UK
@shapeblue

Re: CloudStack HA

Posted by Mārtiņš Jakubovičs <ma...@hostnet.lv>.
Looks like I found the issue: it is the /opt/cloud/bin/xenheartbeat.sh script, 
which runs on all hosts.


Re: CloudStack HA

Posted by "S. Brüseke - proIO GmbH" <s....@proio.com>.
Hi Martins,

You need to check the XenServer logs. CS will not reboot any hypervisor.
XenServer will also reboot in some situations where Dom0 has run out of resources (CPU, RAM). Which version of XS are you using?

With kind regards,

Swen


- proIO GmbH -
Managing Director: Swen Brüseke
Registered office: Frankfurt am Main

VAT ID: DE 267 075 918
Register court: Frankfurt am Main - HRB 86239
