Posted to dev@cloudstack.apache.org by Andrija Panic <an...@gmail.com> on 2019/11/21 22:24:37 UTC

[DISCUSS] VMs crashing/stopped during live migration?

Hi guys.

I wanted to see if any of you have seen the same or similar behaviour in
master, as described below.

I've been testing some work/PRs (against the current master) and I've seen
that VMs will occasionally crash/be stopped while a live migration is
happening. I experienced this on a NEW/EMPTY env, with 2 KVM hosts and
only SSVM and CPVM - so not a capacity issue or similar.

This is happening with CentOS 7 (CentOS 7.3 I believe, but we also updated
packages to the latest stock ones and the same issue happened again).

This is still under investigation, but I was wondering if anyone else has
seen a similar thing happening?

Best,

-- 

Andrija Panić

Re: [DISCUSS] VMs crashing/stopped during live migration?

Posted by Sven Vogel <S....@ewerk.com>.
Hi Wei, Hi Andrija,

Sounds really interesting. I can confirm everything regarding point three.
There is no retry, no stop, and no check whether enough resources are available. VMs will be shut off and that's it.

@Wei do you have a fix for point 3? That would be great.

@All we use CentOS 7.7 with qemu 2.12 and libvirt 5.0.0.

Cheers

Sven


Re: [DISCUSS] VMs crashing/stopped during live migration?

Posted by Andrija Panic <an...@gmail.com>.
thx Wei - that should (so I was told) kill the DESTINATION VM on failed
migrations - i.e. perform cleanup - so that is OK?


-- 

Andrija Panić

Re: [DISCUSS] VMs crashing/stopped during live migration?

Posted by Wei ZHOU <us...@gmail.com>.
Hi Andrija,

As I remember, it happened in our production environment a few years ago.
https://github.com/apache/cloudstack/blob/master/engine/orchestration/src/main/java/com/cloud/vm/VirtualMachineManagerImpl.java#L2962-L2983


 -Wei

Re: [DISCUSS] VMs crashing/stopped during live migration?

Posted by Andrija Panic <an...@gmail.com>.
Thx both, thx Wei - that all sounds interesting.

As for "vm migration fails and no retry in cloudstack" - this should NOT
trigger stopping the VM - at least from what I saw so far - the host will
simply be in ErrorMaintenance - can you confirm VMs are not stopped in this case?


-- 

Andrija Panić

Re: [DISCUSS] VMs crashing/stopped during live migration?

Posted by Wei ZHOU <us...@gmail.com>.
Hi Andrija,

We have faced some VM migration issues. There are actually three categories:
1. VM migration fails due to different hardware or software on the source
and destination hosts, for example CPU models. The VM will still be running
on the source host.
You may find some errors in agent.log.
2. VM migration fails due to some libvirt/qemu bugs. You may find some
errors in the /var/log/libvirt/qemu/ folder (on Ubuntu) on the source or
destination host.
Mostly the VM will still be running on the source host. In rare cases the
VM is stopped.
3. The VM is stopped due to some CloudStack bugs. For example, when we put a
host into maintenance, the VM will be stopped if (1) no other host is Up in
the same cluster, or (2) VM migration fails and there is no retry in
CloudStack, or (3) multiple VMs are migrated to the same destination at the
same time but there is not enough memory on the destination.

We need to fix the issues mentioned in part 3 above in CloudStack.
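
To illustrate case (3) of point 3, here is a minimal sketch in Python of
reserving memory on the destination before a migration starts, so that
concurrent migrations cannot oversubscribe it. This is not CloudStack code:
DestinationHost, try_reserve, release and plan_migrations are hypothetical
names, and the memory figures are made up. The assumption is only that a VM
which does not fit should stay running on the source rather than be stopped.

```python
import threading


class DestinationHost:
    """Hypothetical model of a destination host's free memory (in MiB)."""

    def __init__(self, free_mib):
        self._free_mib = free_mib
        self._lock = threading.Lock()

    def try_reserve(self, vm_mib):
        """Atomically reserve memory for an incoming VM; False if it does not fit."""
        with self._lock:
            if vm_mib > self._free_mib:
                return False
            self._free_mib -= vm_mib
            return True

    def release(self, vm_mib):
        """Give a reservation back, e.g. after a failed migration."""
        with self._lock:
            self._free_mib += vm_mib

    @property
    def free_mib(self):
        with self._lock:
            return self._free_mib


def plan_migrations(vms, host):
    """Split VMs into those that can migrate now and those that are deferred.

    vms is a list of (name, memory_mib) tuples. A deferred VM is left
    running on the source host; it is never stopped.
    """
    migrate, deferred = [], []
    for name, mib in vms:
        (migrate if host.try_reserve(mib) else deferred).append(name)
    return migrate, deferred


host = DestinationHost(free_mib=4096)
migrate, deferred = plan_migrations(
    [("vm-a", 2048), ("vm-b", 2048), ("vm-c", 1024)], host
)
print(migrate, deferred)
```

The reservation is taken under a lock so that two migrations planned at the
same moment cannot both claim the last free memory. A failed migration would
call release() so the capacity becomes available for a retry.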

At Leaseweb, to improve VM migration:
(1) we use a custom CPU model, see
http://docs.cloudstack.apache.org/projects/cloudstack-installation/en/master/hypervisor/kvm.html#configure-cpu-model-for-kvm-guest-optional
(2) we have built our own qemu packages with some bug fixes for installation
(3) we have some fixes in our fork from 4.7.1. We have not tested with
4.13/4.14.
We still see failed VM migrations sometimes. However, the VMs will not be
stopped if migration fails.
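
As a rough illustration of why a common CPU model helps with category 1:
a guest started with CPU features that the destination host lacks cannot be
moved there. The sketch below, in Python, compares the "flags" line of
/proc/cpuinfo from two hosts. The function names and the sample data are
hypothetical, and this is only a quick host-side check, not what CloudStack
or libvirt do internally.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_on_destination(source_text, destination_text):
    """Flags the source exposes that the destination lacks, sorted for stable output."""
    return sorted(cpu_flags(source_text) - cpu_flags(destination_text))


# Made-up sample data, not taken from real hosts.
source = "model name : Example CPU A\nflags : fpu sse sse2 avx avx2\n"
destination = "model name : Example CPU B\nflags : fpu sse sse2 avx\n"
print(missing_on_destination(source, destination))  # prints ['avx2']
```

A non-empty result suggests picking a guest CPU model that both hosts
support, which is what the custom CPU model setting linked above is for.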

-Wei

> > On Thu, 21 Nov 2019 at 23:39, Jean-Francois Nadeau <the.jfnadeau@gmail.com>
> > wrote:
> >
> >> Hi Andrija,
> >>
> >> We experienced that problem with stock packages on CentOS 7.4. Live
> >> migration would frequently fail and leave the VM dead. We since moved
> >> to RHEV packages for qemu. Libvirt is still stock per CentOS 7.6 (4.5).
> >> I want to say the situation improved but I can't tell yet if we have a
> >> 100% success rate on live migrations (as it should be !)
> >>
> >> Redhat also have been messing up severely with stock libvirt versions
> >> between 7.4/7.5/7.6 in such way it broke live migration compatibility
> >> (cpu definitions). I'm at the crossroads right now to entirely ditch
> >> centos/redhat in favor of Ubuntu to have well tested stock packages.
> >>
> >> best,
> >>
> >> -Jfn

Re: [DISCUSS] VMs crashing/stopped during live migration?

Posted by Andrija Panic <an...@gmail.com>.
(@Sven, regarding not being able to migrate a VM with an ISO attached: I don't
recall testing that recently, but it is technically perfectly possible, unless
we don't support it via CloudStack. Feel free to open a GitHub issue with the
exact steps to reproduce, etc.)

On Fri, 22 Nov 2019 at 01:47, Andrija Panic <an...@gmail.com> wrote:

> That sucks...thx both.
>
> @both - which ACS version do you use (and encounter such issues?)
>
> Ubuntu comes with a whole another set of issues (I was losing my nerves
> around very idiotic things, last time a week ago...) - though most can be
> managed with some workarounds.
> But yes, Qemu/libvirt should be better with Ubuntu - free of RedHat
> s$^%tty business politics - i.e. in CentOS 6.x you were able to live
> migrate VM WITH all the volumes to another host/storage. On CentOS 7 you
> can't do that any more, unless you are using qemu-kvm-ev (but not the
> regular one from the SIG CentOS repo, you need the one from the oVirt
> project)
>
> I'm just trying to understand if this is happening also on i.e. ACS 4.11 -
> so to stop digging around the problem (and assume it's purely CentOS which
> is broken - why all great things need to come to an end...damn it)
>
> (well I could also test same ACS code on Ubuntu and see if no issues there
> with live migrations..)
>
> Thanks
> Andrija
>
> --
>
> Andrija Panić
>


-- 

Andrija Panić

Re: [DISCUSS] VMs crashing/stopped during live migration?

Posted by Jean-Francois Nadeau <th...@gmail.com>.
We saw the issue on both 4.9.3 and 4.11.2. This seems to be a race in
libvirt itself, and we hit it mostly when putting a host into maintenance,
when 5 live migrations are processed in parallel. I don't recall triggering
the bug when migrating a single VM at a time.
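
If the failures really are tied to parallel migrations, one workaround to
test is draining a host with a capped number of migrations in flight. Below
is a minimal sketch of that idea only, not CloudStack or libvirt code.
`drain_host` and the `migrate_vm` callable are hypothetical names invented
for this illustration; `migrate_vm` is assumed to block until one live
migration finishes and to return True on success.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def drain_host(vm_names: Iterable[str],
               migrate_vm: Callable[[str], bool],
               max_parallel: int = 1) -> Dict[str, bool]:
    """Migrate VMs with at most `max_parallel` migrations in flight.

    `migrate_vm` is a hypothetical, caller-supplied function (an assumption
    of this sketch): it performs one blocking live migration and returns
    True on success. Exceptions are recorded as failures so that one bad
    migration does not abort the rest of the drain.
    """
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")

    def safe_migrate(name: str) -> bool:
        try:
            return bool(migrate_vm(name))
        except Exception:
            return False

    names = list(vm_names)
    # The pool size is the concurrency cap; 1 means strictly one at a time.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        outcomes = list(pool.map(safe_migrate, names))
    return dict(zip(names, outcomes))
```

With `max_parallel=1` the migrations run strictly one after another, which
matches the case where the bug was not observed. Raising the cap step by
step would show at what level of parallelism the failures start.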

On Thu, Nov 21, 2019 at 7:48 PM Andrija Panic <an...@gmail.com>
wrote:

> That sucks...thx both.
>
> @both - which ACS version do you use (and encounter such issues?)
>
> Ubuntu comes with a whole another set of issues (I was losing my nerves
> around very idiotic things, last time a week ago...) - though most can be
> managed with some workarounds.
> But yes, Qemu/libvirt should be better with Ubuntu - free of RedHat s$^%tty
> business politics - i.e. in CentOS 6.x you were able to live migrate VM
> WITH all the volumes to another host/storage. On CentOS 7 you can't do that
> any more, unless you are using qemu-kvm-ev (but not the regular one from
> the SIG CentOS repo, you need the one from the oVirt project)
>
> I'm just trying to understand if this is happening also on i.e. ACS 4.11 -
> so to stop digging around the problem (and assume it's purely CentOS which
> is broken - why all great things need to come to an end...damn it)
>
> (well I could also test same ACS code on Ubuntu and see if no issues there
> with live migrations..)
>
> Thanks
> Andrija
>
> --
>
> Andrija Panić
>

Re: [DISCUSS] VMs crashing/stopped during live migration?

Posted by Andrija Panic <an...@gmail.com>.
That sucks... thanks, both.

@both - which ACS version do you use (and see these issues on)?

Ubuntu comes with a whole other set of issues (I was losing my nerves
over some very idiotic things, most recently a week ago...) - though most
can be managed with some workarounds.
But yes, Qemu/libvirt should be better with Ubuntu - free of RedHat s$^%tty
business politics - e.g. in CentOS 6.x you were able to live migrate a VM
WITH all its volumes to another host/storage. On CentOS 7 you can't do that
any more, unless you are using qemu-kvm-ev (and not the regular one from
the CentOS SIG repo; you need the one from the oVirt project).

I'm just trying to understand whether this also happens on e.g. ACS 4.11,
so that I can stop digging around the problem (and assume it's purely CentOS
that is broken - why do all great things need to come to an end... damn it).

(Well, I could also test the same ACS code on Ubuntu and see whether live
migrations work without issues there.)

Thanks
Andrija

On Thu, 21 Nov 2019 at 23:39, Jean-Francois Nadeau <th...@gmail.com>
wrote:

> Hi Andrija,
>
> We experienced that problem with stock packages on CentOS 7.4.    Live
> migration would frequently fail and leave the VM dead.    We since moved to
> RHEV packages for qemu.  Libvirt is still stock per CentoS 7.6 (4.5).   I
> want to say the situation improved but I can't tell yet if we have a 100%
> success rate on live migrations (as it should be !)
>
> Redhat also have been messing up severely with stock  libvirt versions
> between 7.4/7.5/7.6 in such way it broke live migration compatibility (cpu
> definitions).   Im at the crossroads right now to entirely ditch
> centos/redhat in favor of Ubuntu to have well tested stock packages.
>
> best,
>
> -Jfn


-- 

Andrija Panić

Re: [DISCUSS] VMs crashing/stopped during live migration?

Posted by Jean-Francois Nadeau <th...@gmail.com>.
Hi Andrija,

We experienced that problem with stock packages on CentOS 7.4. Live
migration would frequently fail and leave the VM dead. We have since moved
to RHEV packages for qemu. Libvirt is still stock as per CentOS 7.6 (4.5).
I want to say the situation improved, but I can't tell yet whether we have a
100% success rate on live migrations (as it should be!).

Red Hat has also been messing severely with the stock libvirt versions
between 7.4/7.5/7.6, in such a way that it broke live migration
compatibility (CPU definitions). I'm at a crossroads right now, considering
ditching CentOS/Red Hat entirely in favor of Ubuntu to have well-tested
stock packages.

best,

-Jfn



On Thu, Nov 21, 2019 at 5:25 PM Andrija Panic <an...@gmail.com>
wrote:

> Hi guys.
>
> I wanted to see if any of you have seen similar/same in master, as below.
>
> I've been testing some work/PRs (against the current master) and I've seen
> that VMs will crash/be stopped occasionally when live migration is
> happening. I experienced this on an NEW/EMPTY env, with 2 KVM hosts, and
> only SSVM and CPVM - not a capacity issues or similar.
>
> This is happening with CentOS 7 (CentOS 7.3 I believe, but we also updated
> packages to the latest stock ones and same issue was happening again).
>
> This is still under investigation, but I was wondering if anyone else has
> seen similar thing happening?
>
> Best,
>
> --
>
> Andrija Panić
>

Re: [DISCUSS] VMs crashing/stopped during live migration?

Posted by Sven Vogel <S....@ewerk.com>.
Hi Andrija,

We use KVM heavily in production. We don’t encounter such a problem.

What we saw was the following.

1. When we use maintenance mode, it is very often not possible to mount an ISO on the destination host. We don’t know why this happens; normally it should not be a problem. The main problem is that in maintenance mode virtual machines are powered off if they can’t be migrated. This is very ugly.

2. A normal migration should not power off virtual machines. I don’t understand why the ISO cannot be mounted on the destination host at the moment. This is not an NFS problem; we think it is a bug in CS, but we don’t know where it comes from.

We don’t encounter other failures at the moment.

Which work/PRs are you talking about?

Cheers

Sven


> On 21.11.2019 at 23:25, Andrija Panic <an...@gmail.com> wrote:
>
> Hi guys.
>
> I wanted to see if any of you have seen similar/same in master, as below.
>
> I've been testing some work/PRs (against the current master) and I've seen
> that VMs will crash/be stopped occasionally when live migration is
> happening. I experienced this on an NEW/EMPTY env, with 2 KVM hosts, and
> only SSVM and CPVM - not a capacity issues or similar.
>
> This is happening with CentOS 7 (CentOS 7.3 I believe, but we also updated
> packages to the latest stock ones and same issue was happening again).
>
> This is still under investigation, but I was wondering if anyone else has
> seen similar thing happening?
>
> Best,
>
> --
>
> Andrija Panić