Posted to dev@cloudstack.apache.org by Rakesh Venkatesh <ww...@gmail.com> on 2019/10/28 13:26:41 UTC

Virtual machines volume lock manager

Hello Users


Recently we have seen cases where, when a VM migration fails, CloudStack
ends up running two instances of the same VM on different hypervisors. The
state will be "Running" and not any other transition state. This will of
course lead to disk corruption. Does CloudStack have any volume-locking
option so that two instances of the same VM won't be running?
Has anyone else faced this issue and found a solution?

We are thinking of using libvirt's "virtlockd" or implementing a custom
lock mechanism. There are pros and cons to both solutions, and I want your
feedback before proceeding further.
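
For reference, enabling libvirt's lock manager on each KVM host would look
roughly like this (a sketch based on the libvirt locking docs; service names
assume systemd and may differ per distro):

  # /etc/libvirt/qemu.conf - switch the QEMU driver to the lockd plugin
  lock_manager = "lockd"

  # pick up the change; virtlockd holds the leases, libvirtd reads qemu.conf
  systemctl restart virtlockd libvirtd

With auto disk leases (the default), virtlockd takes a lease on every disk
image a domain opens, so a second QEMU process opening the same volume is
refused - provided the storage (e.g. NFS) makes the locks visible to all
hosts.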

-- 
Thanks and regards
Rakesh venkatesh

Re: Virtual machines volume lock manager

Posted by Wei ZHOU <us...@gmail.com>.
We actually had a similar discussion before; see PR
https://github.com/apache/cloudstack/pull/2722 and PR
https://github.com/apache/cloudstack/pull/2984.
We made changes similar to those described in PR 2722, and it caused
duplicated VMs.

The change in PR 2984 (the same behavior as in old CloudStack versions) is
not ideal for us.

-Wei




On Wed, 30 Oct 2019 at 13:46, Andrija Panic <an...@gmail.com> wrote:

> true, true... Forgot these cases while I was running KVM.
>
> Check if that VM is using a compute offering which is marked as "HA
> enabled" - and if YES< then Wei is 100% right (you can confirm this from
> logs - checking for info on starting that VM on specific hypervisor etc)
> THough, IF doing live migration, I assume it should play fair/nice with HA
> and HA should not kick in.
>
> Wei can you confirm if these 2 play together nice ^^^ ?
>
> Cheers
>
> On Wed, 30 Oct 2019 at 13:11, Wei ZHOU <us...@gmail.com> wrote:
>
> > Hi Rakesh,
> >
> > The duplicated VM is not caused by migration, but by HA.
> >
> > -Wei
> >
> > On Wed, 30 Oct 2019 at 11:31, Rakesh Venkatesh <
> www.rakeshv.com@gmail.com>
> > wrote:
> >
> > > Hi Andrija
> > >
> > >
> > > Sorry for the late reply.
> > >
> > > Im using 4.7 version of ACS. Qemu version 1:2.5+dfsg-5ubuntu10.40
> > >
> > > Im not sure if ACS job failed or libvirt job as I didnt see into logs.
> > > Yes the vm will be in paused state during migration but after the
> failed
> > > migration, the same vm was in "running" state on two different
> > hypervisors.
> > > We wrote a script to find out how duplicated vm's are running and found
> > out
> > > that more than 5 vm's had this issue.
> > >
> > >
> > > On Mon, Oct 28, 2019 at 2:42 PM Andrija Panic <andrija.panic@gmail.com
> >
> > > wrote:
> > >
> > > > I've been running KVM public cloud up to recently and have never seen
> > > such
> > > > behaviour.
> > > >
> > > > What versions (ACS, qemu, libvrit) are you running?
> > > >
> > > > How does the migration fail - ACS job - or libvirt job?
> > > > destination VM is by default always in PAUSED state, until the
> > migration
> > > is
> > > > finished - only then the destination VM (on the new host) will get
> > > RUNNING,
> > > > while previously pausing the original VM (on the old host).
> > > >
> > > > i,e.
> > > > phase1      source vm RUNNING, destination vm PAUSED (RAM content
> being
> > > > copied over... takes time...)
> > > > phase2      source vm PAUSED, destination vm PAUSED (last bits of RAM
> > > > content are migrated)
> > > > phase3      source vm destroyed, destination VM RUNNING.
> > > >
> > > > Andrija
> > > >
> > > > On Mon, 28 Oct 2019 at 14:26, Rakesh Venkatesh <
> > > www.rakeshv.com@gmail.com>
> > > > wrote:
> > > >
> > > > > Hello Users
> > > > >
> > > > >
> > > > > Recently we have seen cases where when the Vm migration fails,
> > > cloudstack
> > > > > ends up running two instances of the same VM on different
> > hypervisors.
> > > > The
> > > > > state will be "running" and not any other transition state. This
> will
> > > of
> > > > > course lead to corruption of disk. Does CloudStack has any option
> of
> > > > volume
> > > > > locking so that two instances of the same VM wont be running?
> > > > > Anyone else has faced this issue and found some solution to fix it?
> > > > >
> > > > > We are thinking of using "virtlockd" of libvirt or implementing
> > custom
> > > > lock
> > > > > mechanisms. There are some pros and cons of the both the solutions
> > and
> > > i
> > > > > want your feedback before proceeding further.
> > > > >
> > > > > --
> > > > > Thanks and regards
> > > > > Rakesh venkatesh
> > > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Andrija Panić
> > > >
> > >
> > >
> > > --
> > > Thanks and regards
> > > Rakesh venkatesh
> > >
> >
>
>
> --
>
> Andrija Panić
>

Re: Virtual machines volume lock manager

Posted by Andrija Panic <an...@gmail.com>.
True, true... I forgot about these cases from when I was running KVM.

Check if that VM is using a compute offering which is marked as "HA
enabled" - and if YES, then Wei is 100% right (you can confirm this from the
logs by checking for info on starting that VM on a specific hypervisor, etc.).
Though, if doing a live migration, I assume it should play nicely with HA,
and HA should not kick in.
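
To check that quickly (a sketch, assuming cloudmonkey; the
listServiceOfferings response carries an "offerha" flag, and the UUID is a
placeholder):

  # "offerha: true" means HA is enabled for VMs using this offering
  cloudmonkey list serviceofferings id=<offering-uuid> filter=name,offerha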

Wei, can you confirm whether these two play together nicely?

Cheers

On Wed, 30 Oct 2019 at 13:11, Wei ZHOU <us...@gmail.com> wrote:

> Hi Rakesh,
>
> The duplicated VM is not caused by migration, but by HA.
>
> -Wei
>
> On Wed, 30 Oct 2019 at 11:31, Rakesh Venkatesh <ww...@gmail.com>
> wrote:
>
> > Hi Andrija
> >
> >
> > Sorry for the late reply.
> >
> > Im using 4.7 version of ACS. Qemu version 1:2.5+dfsg-5ubuntu10.40
> >
> > Im not sure if ACS job failed or libvirt job as I didnt see into logs.
> > Yes the vm will be in paused state during migration but after the failed
> > migration, the same vm was in "running" state on two different
> hypervisors.
> > We wrote a script to find out how duplicated vm's are running and found
> out
> > that more than 5 vm's had this issue.
> >
> >
> > On Mon, Oct 28, 2019 at 2:42 PM Andrija Panic <an...@gmail.com>
> > wrote:
> >
> > > I've been running KVM public cloud up to recently and have never seen
> > such
> > > behaviour.
> > >
> > > What versions (ACS, qemu, libvrit) are you running?
> > >
> > > How does the migration fail - ACS job - or libvirt job?
> > > destination VM is by default always in PAUSED state, until the
> migration
> > is
> > > finished - only then the destination VM (on the new host) will get
> > RUNNING,
> > > while previously pausing the original VM (on the old host).
> > >
> > > i,e.
> > > phase1      source vm RUNNING, destination vm PAUSED (RAM content being
> > > copied over... takes time...)
> > > phase2      source vm PAUSED, destination vm PAUSED (last bits of RAM
> > > content are migrated)
> > > phase3      source vm destroyed, destination VM RUNNING.
> > >
> > > Andrija
> > >
> > > On Mon, 28 Oct 2019 at 14:26, Rakesh Venkatesh <
> > www.rakeshv.com@gmail.com>
> > > wrote:
> > >
> > > > Hello Users
> > > >
> > > >
> > > > Recently we have seen cases where when the Vm migration fails,
> > cloudstack
> > > > ends up running two instances of the same VM on different
> hypervisors.
> > > The
> > > > state will be "running" and not any other transition state. This will
> > of
> > > > course lead to corruption of disk. Does CloudStack has any option of
> > > volume
> > > > locking so that two instances of the same VM wont be running?
> > > > Anyone else has faced this issue and found some solution to fix it?
> > > >
> > > > We are thinking of using "virtlockd" of libvirt or implementing
> custom
> > > lock
> > > > mechanisms. There are some pros and cons of the both the solutions
> and
> > i
> > > > want your feedback before proceeding further.
> > > >
> > > > --
> > > > Thanks and regards
> > > > Rakesh venkatesh
> > > >
> > >
> > >
> > > --
> > >
> > > Andrija Panić
> > >
> >
> >
> > --
> > Thanks and regards
> > Rakesh venkatesh
> >
>


-- 

Andrija Panić

Re: Virtual machines volume lock manager

Posted by Wei ZHOU <us...@gmail.com>.
Hi Rakesh,

The duplicated VM is not caused by migration, but by HA.

-Wei

On Wed, 30 Oct 2019 at 11:31, Rakesh Venkatesh <ww...@gmail.com>
wrote:

> Hi Andrija
>
>
> Sorry for the late reply.
>
> Im using 4.7 version of ACS. Qemu version 1:2.5+dfsg-5ubuntu10.40
>
> Im not sure if ACS job failed or libvirt job as I didnt see into logs.
> Yes the vm will be in paused state during migration but after the failed
> migration, the same vm was in "running" state on two different hypervisors.
> We wrote a script to find out how duplicated vm's are running and found out
> that more than 5 vm's had this issue.
>
>
> On Mon, Oct 28, 2019 at 2:42 PM Andrija Panic <an...@gmail.com>
> wrote:
>
> > I've been running KVM public cloud up to recently and have never seen
> such
> > behaviour.
> >
> > What versions (ACS, qemu, libvrit) are you running?
> >
> > How does the migration fail - ACS job - or libvirt job?
> > destination VM is by default always in PAUSED state, until the migration
> is
> > finished - only then the destination VM (on the new host) will get
> RUNNING,
> > while previously pausing the original VM (on the old host).
> >
> > i,e.
> > phase1      source vm RUNNING, destination vm PAUSED (RAM content being
> > copied over... takes time...)
> > phase2      source vm PAUSED, destination vm PAUSED (last bits of RAM
> > content are migrated)
> > phase3      source vm destroyed, destination VM RUNNING.
> >
> > Andrija
> >
> > On Mon, 28 Oct 2019 at 14:26, Rakesh Venkatesh <
> www.rakeshv.com@gmail.com>
> > wrote:
> >
> > > Hello Users
> > >
> > >
> > > Recently we have seen cases where when the Vm migration fails,
> cloudstack
> > > ends up running two instances of the same VM on different hypervisors.
> > The
> > > state will be "running" and not any other transition state. This will
> of
> > > course lead to corruption of disk. Does CloudStack has any option of
> > volume
> > > locking so that two instances of the same VM wont be running?
> > > Anyone else has faced this issue and found some solution to fix it?
> > >
> > > We are thinking of using "virtlockd" of libvirt or implementing custom
> > lock
> > > mechanisms. There are some pros and cons of the both the solutions and
> i
> > > want your feedback before proceeding further.
> > >
> > > --
> > > Thanks and regards
> > > Rakesh venkatesh
> > >
> >
> >
> > --
> >
> > Andrija Panić
> >
>
>
> --
> Thanks and regards
> Rakesh venkatesh
>

RE: Virtual machines volume lock manager

Posted by Sean Lair <sl...@ippathways.com>.
Are you using NFS?

Yeah, we implemented locking because of that problem:

https://libvirt.org/locking-lockd.html

echo lock_manager = \"lockd\" >> /etc/libvirt/qemu.conf
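
A slightly fuller sketch of the same setup (option names from the libvirt
locking docs; the lockspace directory is only needed for indirect leases and
must itself live on storage shared by all hosts):

  # /etc/libvirt/qemu.conf - switch the QEMU driver to the lockd plugin
  lock_manager = "lockd"

  # /etc/libvirt/qemu-lockd.conf - optional tuning
  auto_disk_leases = 1           # lease every disk a domain opens (default)
  require_lease_for_disks = 1    # refuse to start disks that cannot be locked
  #file_lockspace_dir = "/var/lib/libvirt/lockd/files"   # indirect leases only

  # pick up the changes (service names assume systemd)
  systemctl restart virtlockd libvirtd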

-----Original Message-----
From: Andrija Panic <an...@gmail.com> 
Sent: Wednesday, October 30, 2019 6:55 AM
To: dev <de...@cloudstack.apache.org>
Cc: users <us...@cloudstack.apache.org>
Subject: Re: Virtual machines volume lock manager

I would advise trying to reproduce.

Start a migration, then either:
- configure the timeout so that it's way too low, so that the migration
fails due to timeouts.
- restart the mgmt server in the middle of a migration.
This should cause the migration to fail, and you can observe whether you
have reproduced the problem.
Keep in mind that there might be some garbage left, due to not properly
handling the failed migration. But from the QEMU point of view, if the
migration fails, the new VM should by all means be destroyed...



On Wed, 30 Oct 2019 at 11:31, Rakesh Venkatesh <ht...@gmail.com>
wrote:

> Hi Andrija
>
>
> Sorry for the late reply.
>
> Im using 4.7 version of ACS. Qemu version 1:2.5+dfsg-5ubuntu10.40
>
> Im not sure if ACS job failed or libvirt job as I didnt see into logs.
> Yes the vm will be in paused state during migration but after the 
> failed migration, the same vm was in "running" state on two different hypervisors.
> We wrote a script to find out how duplicated vm's are running and 
> found out that more than 5 vm's had this issue.
>
>
> On Mon, Oct 28, 2019 at 2:42 PM Andrija Panic 
> <an...@gmail.com>
> wrote:
>
> > I've been running KVM public cloud up to recently and have never 
> > seen
> such
> > behaviour.
> >
> > What versions (ACS, qemu, libvrit) are you running?
> >
> > How does the migration fail - ACS job - or libvirt job?
> > destination VM is by default always in PAUSED state, until the 
> > migration
> is
> > finished - only then the destination VM (on the new host) will get
> RUNNING,
> > while previously pausing the original VM (on the old host).
> >
> > i,e.
> > phase1      source vm RUNNING, destination vm PAUSED (RAM content being
> > copied over... takes time...)
> > phase2      source vm PAUSED, destination vm PAUSED (last bits of RAM
> > content are migrated)
> > phase3      source vm destroyed, destination VM RUNNING.
> >
> > Andrija
> >
> > On Mon, 28 Oct 2019 at 14:26, Rakesh Venkatesh <
> www.rakeshv.com@gmail.com>
> > wrote:
> >
> > > Hello Users
> > >
> > >
> > > Recently we have seen cases where when the Vm migration fails,
> cloudstack
> > > ends up running two instances of the same VM on different hypervisors.
> > The
> > > state will be "running" and not any other transition state. This 
> > > will
> of
> > > course lead to corruption of disk. Does CloudStack has any option 
> > > of
> > volume
> > > locking so that two instances of the same VM wont be running?
> > > Anyone else has faced this issue and found some solution to fix it?
> > >
> > > We are thinking of using "virtlockd" of libvirt or implementing 
> > > custom
> > lock
> > > mechanisms. There are some pros and cons of the both the solutions 
> > > and
> i
> > > want your feedback before proceeding further.
> > >
> > > --
> > > Thanks and regards
> > > Rakesh venkatesh
> > >
> >
> >
> > --
> >
> > Andrija Panić
> >
>
>
> --
> Thanks and regards
> Rakesh venkatesh
>


-- 

Andrija Panić

Re: Virtual machines volume lock manager

Posted by Andrija Panic <an...@gmail.com>.
I would advise trying to reproduce.

Start a migration, then either:
- configure the timeout so that it's way too low, so that the migration
fails due to timeouts.
- restart the mgmt server in the middle of a migration.
This should cause the migration to fail, and you can observe whether you
have reproduced the problem.
Keep in mind that there might be some garbage left, due to not properly
handling the failed migration. But from the QEMU point of view, if the
migration fails, the new VM should by all means be destroyed...
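
For the timeout route, something like this should do it (a sketch via
cloudmonkey; "migratewait" is the global migration wait timeout in seconds,
assuming the setting still goes by that name in your version, and the UUIDs
are placeholders):

  # shrink the timeout so any non-trivial live migration times out
  cloudmonkey update configuration name=migratewait value=30
  # then kick off a live migration of a test VM and watch what happens
  cloudmonkey migrate virtualmachine virtualmachineid=<vm-uuid> hostid=<dest-host-uuid>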



On Wed, 30 Oct 2019 at 11:31, Rakesh Venkatesh <ww...@gmail.com>
wrote:

> Hi Andrija
>
>
> Sorry for the late reply.
>
> Im using 4.7 version of ACS. Qemu version 1:2.5+dfsg-5ubuntu10.40
>
> Im not sure if ACS job failed or libvirt job as I didnt see into logs.
> Yes the vm will be in paused state during migration but after the failed
> migration, the same vm was in "running" state on two different hypervisors.
> We wrote a script to find out how duplicated vm's are running and found out
> that more than 5 vm's had this issue.
>
>
> On Mon, Oct 28, 2019 at 2:42 PM Andrija Panic <an...@gmail.com>
> wrote:
>
> > I've been running KVM public cloud up to recently and have never seen
> such
> > behaviour.
> >
> > What versions (ACS, qemu, libvrit) are you running?
> >
> > How does the migration fail - ACS job - or libvirt job?
> > destination VM is by default always in PAUSED state, until the migration
> is
> > finished - only then the destination VM (on the new host) will get
> RUNNING,
> > while previously pausing the original VM (on the old host).
> >
> > i,e.
> > phase1      source vm RUNNING, destination vm PAUSED (RAM content being
> > copied over... takes time...)
> > phase2      source vm PAUSED, destination vm PAUSED (last bits of RAM
> > content are migrated)
> > phase3      source vm destroyed, destination VM RUNNING.
> >
> > Andrija
> >
> > On Mon, 28 Oct 2019 at 14:26, Rakesh Venkatesh <
> www.rakeshv.com@gmail.com>
> > wrote:
> >
> > > Hello Users
> > >
> > >
> > > Recently we have seen cases where when the Vm migration fails,
> cloudstack
> > > ends up running two instances of the same VM on different hypervisors.
> > The
> > > state will be "running" and not any other transition state. This will
> of
> > > course lead to corruption of disk. Does CloudStack has any option of
> > volume
> > > locking so that two instances of the same VM wont be running?
> > > Anyone else has faced this issue and found some solution to fix it?
> > >
> > > We are thinking of using "virtlockd" of libvirt or implementing custom
> > lock
> > > mechanisms. There are some pros and cons of the both the solutions and
> i
> > > want your feedback before proceeding further.
> > >
> > > --
> > > Thanks and regards
> > > Rakesh venkatesh
> > >
> >
> >
> > --
> >
> > Andrija Panić
> >
>
>
> --
> Thanks and regards
> Rakesh venkatesh
>


-- 

Andrija Panić

Re: Virtual machines volume lock manager

Posted by Rakesh Venkatesh <ww...@gmail.com>.
Hi Andrija


Sorry for the late reply.

I'm using ACS version 4.7, and QEMU version 1:2.5+dfsg-5ubuntu10.40.

I'm not sure whether the ACS job or the libvirt job failed, as I didn't
look into the logs.
Yes, the VM will be in the paused state during migration, but after the
failed migration the same VM was in the "running" state on two different
hypervisors.
We wrote a script to find out whether duplicated VMs are running, and found
that more than 5 VMs had this issue.
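
The check boils down to something like this (a rough sketch; host names and
SSH access are placeholders):

  # list running domains on every hypervisor; any name printed by uniq -d
  # is running on more than one host at once
  for h in kvm01 kvm02 kvm03; do
    virsh -c qemu+ssh://root@$h/system list --name
  done | sed '/^$/d' | sort | uniq -d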


On Mon, Oct 28, 2019 at 2:42 PM Andrija Panic <an...@gmail.com>
wrote:

> I've been running KVM public cloud up to recently and have never seen such
> behaviour.
>
> What versions (ACS, qemu, libvrit) are you running?
>
> How does the migration fail - ACS job - or libvirt job?
> destination VM is by default always in PAUSED state, until the migration is
> finished - only then the destination VM (on the new host) will get RUNNING,
> while previously pausing the original VM (on the old host).
>
> i,e.
> phase1      source vm RUNNING, destination vm PAUSED (RAM content being
> copied over... takes time...)
> phase2      source vm PAUSED, destination vm PAUSED (last bits of RAM
> content are migrated)
> phase3      source vm destroyed, destination VM RUNNING.
>
> Andrija
>
> On Mon, 28 Oct 2019 at 14:26, Rakesh Venkatesh <ww...@gmail.com>
> wrote:
>
> > Hello Users
> >
> >
> > Recently we have seen cases where when the Vm migration fails, cloudstack
> > ends up running two instances of the same VM on different hypervisors.
> The
> > state will be "running" and not any other transition state. This will of
> > course lead to corruption of disk. Does CloudStack has any option of
> volume
> > locking so that two instances of the same VM wont be running?
> > Anyone else has faced this issue and found some solution to fix it?
> >
> > We are thinking of using "virtlockd" of libvirt or implementing custom
> lock
> > mechanisms. There are some pros and cons of the both the solutions and i
> > want your feedback before proceeding further.
> >
> > --
> > Thanks and regards
> > Rakesh venkatesh
> >
>
>
> --
>
> Andrija Panić
>


-- 
Thanks and regards
Rakesh venkatesh

Re: Virtual machines volume lock manager

Posted by Andrija Panic <an...@gmail.com>.
I was running a KVM public cloud until recently and have never seen such
behaviour.

What versions (ACS, QEMU, libvirt) are you running?

How does the migration fail - the ACS job or the libvirt job?
The destination VM is by default always in the PAUSED state until the
migration is finished - only then does the destination VM (on the new host)
go to RUNNING, after the original VM (on the old host) has been paused.

i.e.
phase1      source vm RUNNING, destination vm PAUSED (RAM content being
copied over... takes time...)
phase2      source vm PAUSED, destination vm PAUSED (last bits of RAM
content are migrated)
phase3      source vm destroyed, destination VM RUNNING.
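
You can watch this from the hosts while a migration is in flight (plain
virsh; the domain name is a placeholder):

  # on the source host: "running" during phase 1, "paused" in phase 2
  virsh domstate <vm-name>
  # on the destination host: "paused" until the cutover completes
  virsh domstate <vm-name>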

Andrija

On Mon, 28 Oct 2019 at 14:26, Rakesh Venkatesh <ww...@gmail.com>
wrote:

> Hello Users
>
>
> Recently we have seen cases where when the Vm migration fails, cloudstack
> ends up running two instances of the same VM on different hypervisors. The
> state will be "running" and not any other transition state. This will of
> course lead to corruption of disk. Does CloudStack has any option of volume
> locking so that two instances of the same VM wont be running?
> Anyone else has faced this issue and found some solution to fix it?
>
> We are thinking of using "virtlockd" of libvirt or implementing custom lock
> mechanisms. There are some pros and cons of the both the solutions and i
> want your feedback before proceeding further.
>
> --
> Thanks and regards
> Rakesh venkatesh
>


-- 

Andrija Panić
