Posted to users@cloudstack.apache.org by Yiping Zhang <yz...@marketo.com> on 2017/01/05 00:36:16 UTC

Xen hypervisor question

Hi,

This is a Xen Server question, but since it is part of my ACS setup, I hope Xen Server experts on this list can provide some help.

Every month or so, one of my Xen Server pools (6.5 SP1 with most patches installed, ten hypervisor nodes) goes crazy: in CS, only the pool master stays in the Up state and all slaves are in either the Alert or Connecting state, and CS can’t perform any VM operations if the VM is running on one of the slaves. On the hypervisor CLI, xe commands are extremely slow on the slaves and often just fail, but on the pool master, xe commands behave normally. It seems that the pool slaves just can’t communicate with the master properly.

I have managed to recover the pool each time by switching the pool master to another hypervisor (this step often proceeds with great difficulty due to the poor communication between the master and the slaves) and then running the xe-toolstack-restart command on all pool members.
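
For reference, the recovery sequence described above could be sketched roughly as follows. This is only an illustration under assumptions: the host names and UUID in the usage line are placeholders, root SSH access to each pool member is assumed, and the helper functions run_on and recover_pool are hypothetical names, not part of XenServer. The sketch defaults to a dry run that only prints the commands.

```shell
# Print a command for a remote host, or run it over SSH when DRY_RUN=0.
# Root SSH access to the hosts is an assumption of this sketch.
run_on() {
    host="$1"; shift
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "ssh root@$host $*"
    else
        ssh "root@$host" "$@"
    fi
}

# recover_pool CURRENT_MASTER NEW_MASTER_UUID HOST...
recover_pool() {
    master_host="$1"; new_master_uuid="$2"; shift 2
    # Step 1: on the current master, hand the master role to another host.
    run_on "$master_host" xe pool-designate-new-master host-uuid="$new_master_uuid"
    # Step 2: restart the toolstack on every pool member.
    for h in "$@"; do
        run_on "$h" xe-toolstack-restart
    done
}
```

Calling recover_pool xs01 NEW-MASTER-UUID xs01 xs02 with the default dry run prints one ssh line for the master switch and one per host for the toolstack restart, so the sequence can be reviewed before anything is executed.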

What is the root cause of this condition? How could I avoid getting into such a situation in the first place?

Thanks

Yiping

Re: Xen hypervisor question

Posted by Skale Franz <fr...@citycom-austria.com>.

Hi Yiping,
Check your DNS setup (/etc/resolv.conf).
You may have configured a list of name servers that cannot be reached.
Check the log file /var/log/xensource.log for errors.
Check diagnostic-net-stats, dmesg, etc.
Check xentop for system load.
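
The first two checks above could be scripted roughly like this. It is a minimal sketch, not a definitive diagnostic: the function names are made up for illustration, ping is only a rough reachability probe (a name server may drop ICMP and still answer DNS queries), and matching on the word "error" is an assumed heuristic for the log.

```shell
# List name server addresses from a resolv.conf-style file
# (defaults to /etc/resolv.conf).
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Report which configured name servers answer a single ping within 2 seconds.
check_nameservers() {
    for ns in $(list_nameservers "$1"); do
        if ping -c 1 -W 2 "$ns" >/dev/null 2>&1; then
            echo "ok          $ns"
        else
            echo "unreachable $ns"
        fi
    done
}

# Count lines mentioning "error" (case-insensitive) in the xapi log
# (defaults to /var/log/xensource.log).
count_log_errors() {
    grep -ci 'error' "${1:-/var/log/xensource.log}"
}
```

Running check_nameservers and count_log_errors on each slave and on the master, then comparing the results, may show whether only the slaves have a resolver or network problem.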

Rgds.
Franz

________________________________________
From: Yiping Zhang <yz...@marketo.com>
Sent: Thursday, 05 January 2017 01:36
To: users@cloudstack.apache.org
Subject: Xen hypervisor question