You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@cloudstack.apache.org by Edison Su <Ed...@citrix.com> on 2013/10/05 01:57:25 UTC

[DISCUSS] Pluggable VM snapshot related operations?

In 4.2, we added VM snapshot for Vmware/Xenserver. The current workflow will be like the following:
createVMSnapshot api -> VMSnapshotManagerImpl: creatVMSnapshot -> send CreateVMSnapshotCommand to hypervisor to create vm snapshot.

If anybody wants to change the workflow, then need to either change VMSnapshotManagerImpl directly or subclass VMSnapshotManagerImpl. Both are not the ideal choice, as VMSnapshotManagerImpl should be able to handle different ways to take vm snapshot, instead of hard code.

The requirements for the pluggable VM snapshot coming from:
Storage vendor may have their optimization, such as NetApp.
VM snapshot can be implemented in a totally different way(For example, I could just send a command to guest VM, to tell my application to flush disk and hold disk write, then come to hypervisor to take a volume snapshot). 

If we agree on enable pluggable VM snapshot, then we can move on discuss how to implement it.

The possible options:
1. coarse grained interface. Add a VMSnapshotStrategy interface, which has the following interfaces:
    VMSnapshot takeVMSnapshot(VMSnapshot vmSnapshot);
    Boolean revertVMSnapshot(VMSnapshot vmSnapshot);
    Boolean DeleteVMSnapshot(VMSnapshot vmSnapshot);

   The work flow will be: createVMSnapshot api -> VMSnapshotManagerImpl: creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot
   VMSnapshotManagerImpl will manage VM state, do the sanity check, then will handle over to VMSnapshotStrategy. 
   In VMSnapshotStrategy implementation, it may just send a Create/revert/delete VMSnapshotCommand to hypervisor host, or do anything special operations.

2. fine-grained interface. Not only add a VMSnapshotStrategy interface, but also add certain methods on the storage driver.
    The VMSnapshotStrategy interface will be the same as option 1.
    Will add the following methods on storage driver:
   /* volumesBelongToVM  is the list of volumes of the VM that created on this storage, storage vendor can either take one snapshot for this volumes in one shot, or take snapshot for each volume separately
       The pre-condition: vm is unquiesced. 
       It will return a Boolean to indicate, do need unquiesce vm or not.
       In the default storage driver, it will return false.
    */
    boolean takeVMSnapshot(List<VolumeInfo> volumesBelongToVM, VMSnapshot vmSnapshot);  
    Boolean revertVMSnapshot(List<VolumeInfo> volumesBelongToVM, VMSnapshot vmSnapshot);
   Boolean deleteVMSnapshot(List<VolumeInfo> volumesBelongToVM, VMSnapshot vmSNapshot);

The work flow will be: createVMSnapshot api -> VMSnapshotManagerImpl: creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot -> storage driver:takeVMSnapshot
 In the implementation of VMSnapshotStrategy's takeVMSnapshot, the pseudo code looks like:
       HypervisorHelper.quiesceVM(vm);
       val volumes = vm.getVolumes();
       val maps = new Map[driver, list[VolumeInfo]]();
       Volumes.foreach(volume => maps.put(volume.getDriver, volume :: maps.get(volume.getdriver())))
       val needUnquiesce = true;
        maps.foreach((driver, volumes) => needUnquiesce  = needUnquiesce  && driver.takeVMSnapshot(volumes))
      if (needUnquiesce ) {
       HypervisorHelper.unquiesce(vm);
    } 

By default, the quiesceVM in HypervisorHelper will actually take vm snapshot through hypervisor. 
Does above logic makes senesce?

The pros of option 1 is that: it's simple, no need to change storage driver interfaces. The cons is that each storage vendor need to implement a strategy, maybe they will do the same thing.
The pros of option 2 is that, storage driver won't need to worry about how to quiesce/unquiesce vm. The cons is that, it will add these methods on each storage drivers, so it assumes that this work flow will work for everybody.

So which option we should take? Or if you have other options, please let's know. 



   


Re: [DISCUSS] Pluggable VM snapshot related operations?

Posted by Mike Tutkowski <mi...@solidfire.com>.
I agree...I think #2 is a good option, as well.


On Mon, Oct 7, 2013 at 11:01 AM, SuichII, Christopher <
Chris.Suich@netapp.com> wrote:

> I'm a fan of option 2 - this gives us the most flexibility (as you
> stated). The option is given to completely override the way VM snapshots
> work AND storage providers are given to opportunity to work within the
> default VM snapshot workflow.
>
> I believe this option should satisfy your concern, Mike. The snapshot and
> quiesce strategy would be in charge of communicating with the hypervisor.
> Storage providers should be able to leverage the default strategies and
> simply perform the storage operations.
>
> I don't think it should be much of an issue that new method to the storage
> driver interface may not apply to everyone. In fact, that is already the
> case. Some methods such as un/maintain(), attachToXXX() and takeSnapshot()
> are already not implemented by every driver - they just return false when
> asked if they can handle the operation.
>
> --
> Chris Suich
> chris.suich@netapp.com
> NetApp Software Engineer
> Data Center Platforms – Cloud Solutions
> Citrix, Cisco & Red Hat
>
> On Oct 5, 2013, at 12:11 AM, Mike Tutkowski <mi...@solidfire.com>
> wrote:
>
> > Well, my first thought on this is that the storage driver should not be
> > telling the hypervisor to do anything. It should be responsible for
> > creating/deleting volumes, snapshots, etc. on its storage system only.
> >
> >
> > On Fri, Oct 4, 2013 at 5:57 PM, Edison Su <Ed...@citrix.com> wrote:
> >
> >> In 4.2, we added VM snapshot for Vmware/Xenserver. The current workflow
> >> will be like the following:
> >> createVMSnapshot api -> VMSnapshotManagerImpl: creatVMSnapshot -> send
> >> CreateVMSnapshotCommand to hypervisor to create vm snapshot.
> >>
> >> If anybody wants to change the workflow, then need to either change
> >> VMSnapshotManagerImpl directly or subclass VMSnapshotManagerImpl. Both
> are
> >> not the ideal choice, as VMSnapshotManagerImpl should be able to handle
> >> different ways to take vm snapshot, instead of hard code.
> >>
> >> The requirements for the pluggable VM snapshot coming from:
> >> Storage vendor may have their optimization, such as NetApp.
> >> VM snapshot can be implemented in a totally different way(For example, I
> >> could just send a command to guest VM, to tell my application to flush
> disk
> >> and hold disk write, then come to hypervisor to take a volume snapshot).
> >>
> >> If we agree on enable pluggable VM snapshot, then we can move on discuss
> >> how to implement it.
> >>
> >> The possible options:
> >> 1. coarse grained interface. Add a VMSnapshotStrategy interface, which
> has
> >> the following interfaces:
> >>    VMSnapshot takeVMSnapshot(VMSnapshot vmSnapshot);
> >>    Boolean revertVMSnapshot(VMSnapshot vmSnapshot);
> >>    Boolean DeleteVMSnapshot(VMSnapshot vmSnapshot);
> >>
> >>   The work flow will be: createVMSnapshot api -> VMSnapshotManagerImpl:
> >> creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot
> >>   VMSnapshotManagerImpl will manage VM state, do the sanity check, then
> >> will handle over to VMSnapshotStrategy.
> >>   In VMSnapshotStrategy implementation, it may just send a
> >> Create/revert/delete VMSnapshotCommand to hypervisor host, or do
> anything
> >> special operations.
> >>
> >> 2. fine-grained interface. Not only add a VMSnapshotStrategy interface,
> >> but also add certain methods on the storage driver.
> >>    The VMSnapshotStrategy interface will be the same as option 1.
> >>    Will add the following methods on storage driver:
> >>   /* volumesBelongToVM  is the list of volumes of the VM that created on
> >> this storage, storage vendor can either take one snapshot for this
> volumes
> >> in one shot, or take snapshot for each volume separately
> >>       The pre-condition: vm is unquiesced.
> >>       It will return a Boolean to indicate, do need unquiesce vm or not.
> >>       In the default storage driver, it will return false.
> >>    */
> >>    boolean takeVMSnapshot(List<VolumeInfo> volumesBelongToVM, VMSnapshot
> >> vmSnapshot);
> >>    Boolean revertVMSnapshot(List<VolumeInfo> volumesBelongToVM,
> >> VMSnapshot vmSnapshot);
> >>   Boolean deleteVMSnapshot(List<VolumeInfo> volumesBelongToVM,
> VMSnapshot
> >> vmSNapshot);
> >>
> >> The work flow will be: createVMSnapshot api -> VMSnapshotManagerImpl:
> >> creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot -> storage
> >> driver:takeVMSnapshot
> >> In the implementation of VMSnapshotStrategy's takeVMSnapshot, the pseudo
> >> code looks like:
> >>       HypervisorHelper.quiesceVM(vm);
> >>       val volumes = vm.getVolumes();
> >>       val maps = new Map[driver, list[VolumeInfo]]();
> >>       Volumes.foreach(volume => maps.put(volume.getDriver, volume ::
> >> maps.get(volume.getdriver())))
> >>       val needUnquiesce = true;
> >>        maps.foreach((driver, volumes) => needUnquiesce  = needUnquiesce
> >> && driver.takeVMSnapshot(volumes))
> >>      if (needUnquiesce ) {
> >>       HypervisorHelper.unquiesce(vm);
> >>    }
> >>
> >> By default, the quiesceVM in HypervisorHelper will actually take vm
> >> snapshot through hypervisor.
> >> Does above logic makes senesce?
> >>
> >> The pros of option 1 is that: it's simple, no need to change storage
> >> driver interfaces. The cons is that each storage vendor need to
> implement a
> >> strategy, maybe they will do the same thing.
> >> The pros of option 2 is that, storage driver won't need to worry about
> how
> >> to quiesce/unquiesce vm. The cons is that, it will add these methods on
> >> each storage drivers, so it assumes that this work flow will work for
> >> everybody.
> >>
> >> So which option we should take? Or if you have other options, please
> let's
> >> know.
> >>
> >>
> >>
> >>
> >>
> >>
> >
> >
> > --
> > *Mike Tutkowski*
> > *Senior CloudStack Developer, SolidFire Inc.*
> > e: mike.tutkowski@solidfire.com
> > o: 303.746.7302
> > Advancing the way the world uses the
> > cloud<http://solidfire.com/solution/overview/?video=play>
> > *™*
>
>


-- 
*Mike Tutkowski*
*Senior CloudStack Developer, SolidFire Inc.*
e: mike.tutkowski@solidfire.com
o: 303.746.7302
Advancing the way the world uses the
cloud<http://solidfire.com/solution/overview/?video=play>
*™*

Re: [DISCUSS] Pluggable VM snapshot related operations?

Posted by Mike Tutkowski <mi...@solidfire.com>.
I'm still curious how multiple VM snapshot strategies are resolved? What
happens if two vendors write a VM strategy and the VM you're taking a
snapshot of has disks provided by multiple vendors.

Chris had some ideas on this.

I'll be interested to see where this goes.


On Thu, Oct 10, 2013 at 12:38 PM, Edison Su <Ed...@citrix.com> wrote:

> Personally, I am +1 on the coarse grain interface, and totally agree with
> your points.
> As long as we separate vmsnasphotmanager and vmsnapshotstrategy, and
> provide enough helper functions(such as quiesce / un-quiesce vm) for
> vendors, then write a new vmsnapshotStrategy should be easy.
>
> > -----Original Message-----
> > From: Darren Shepherd [mailto:darren.s.shepherd@gmail.com]
> > Sent: Wednesday, October 09, 2013 9:13 PM
> > To: dev@cloudstack.apache.org
> > Subject: Re: [DISCUSS] Pluggable VM snapshot related operations?
> >
> > Edison,
> >
> > I would lean toward doing the coarse grain interface only.  I'm having a
> hard
> > time seeing how the whole flow is generic and makes sense for everyone.
> > With starting with the coarse grain you have the advantage in that you
> avoid
> > possible upfront over engineering/over design that could wreak havoc down
> > the line.  If you implement the VMSnapshotStrategy and find that it
> really is
> > useful to other implementations, you can then implement the fine grain
> > interface later to allow others to benefit from it.
> >
> > Darren
> >
> > On Wed, Oct 9, 2013 at 8:54 PM, Mike Tutkowski
> > <mi...@solidfire.com> wrote:
> > > Hey guys,
> > >
> > > I haven't been giving this thread much attention, but am reviewing it
> > > somewhat now.
> > >
> > > I'm not really clear how this would work if, say, a VM has two data
> > > disks and they are not being provided by the same vendor.
> > >
> > > Can someone clarify that for me?
> > >
> > > My understanding for how this works today is that it doesn't matter.
> > > For XenServer, a VDI is on an SR, which could be supported by storage
> > vendor X.
> > > Another VDI could be on another SR, supported by storage vendor Y.
> > >
> > > In this case, a new VDI appears on each SR after a hypervisor snapshot.
> > >
> > > Same idea for VMware.
> > >
> > > I don't really know how (or if) this works for KVM.
> > >
> > > I'm not clear how this multi-vendor situation would play out in this
> > > pluggable approach.
> > >
> > > Thanks!
> > >
> > >
> > > On Tue, Oct 8, 2013 at 4:43 PM, Edison Su <Ed...@citrix.com>
> wrote:
> > >
> > >>
> > >>
> > >> > -----Original Message-----
> > >> > From: Darren Shepherd [mailto:darren.s.shepherd@gmail.com]
> > >> > Sent: Tuesday, October 08, 2013 2:54 PM
> > >> > To: dev@cloudstack.apache.org
> > >> > Subject: Re: [DISCUSS] Pluggable VM snapshot related operations?
> > >> >
> > >> > A hypervisor snapshot will snapshot memory also.  So determining
> > >> > whether
> > >> The memory is optional for hypervisor vm snapshot, a.k.a, the
> > >> "Disk-only
> > >> snapshots":
> > >> http://support.citrix.com/proddocs/topic/xencenter-61/xs-xc-vms-snaps
> > >> hots-about.html It's supported by both xenserver/kvm/vmware.
> > >>
> > >> > do to the hypervisor snapshot from the quiesce option does not seem
> > >> > proper.
> > >> >
> > >> > Sorry, for all the questions, I'm trying to get to the point of
> > >> understand if this
> > >> > functionality makes sense at this point of code or if maybe their
> > >> > is a
> > >> different
> > >> > approach.  This is what I'm seeing, what if we state it this way
> > >> >
> > >> > 1) VM snapshot, AFAIK, are not backed up today and exist solely on
> > >> primary.
> > >> > What if we added a backup phase to VM snapshots that can be
> > >> > optionally supported by the storage providers to possibly backup
> > >> > the VM snapshot volumes.
> > >> It's not about backup vm snapshot, it's about how to take vm snapshot.
> > >> Usually, take/revert vm snapshot is handled by hypervisor itself, but
> > >> in NetApp(or other storage vendor) case, They want to change the
> > >> default behavior of hypervisor-base vm snapshot.
> > >>
> > >> Some examples:
> > >> 1. take hypervisor based vm snapshots, on primary storage, hypervisor
> > >> will maintain the snapshot chain.
> > >> 2. take vm snapshot through NetApp:
> > >>      a. first, quiesce VM if user specified. There is no separate API
> > >> to quiesce VM on the hypervisor, so here we will take a VM snapshot
> > >> through hypervisor API call, hypervisor will take volume snapshot  on
> > >> each volume of the VM. Let's say, on the primary storage, the disk
> > >> chain looks like:
> > >>            base-image
> > >>                     |
> > >>                     V
> > >>                 Parent disk
> > >>             /                         \
> > >>           V                            V
> > >>         Current disk        snapshot-a
> > >>      b. from snapshot-a, find out its parent disk, then take snapshot
> > >> through NetApp
> > >>      c. un- quiesce VM, here, go to hypervisor, delete snapshot
> > >> "snapshot-a", hypervisor should be able to consolidate current disk
> > >> and "parent disk" into one disk, thus from hypervisor point of view ,
> > >> there is always, at most, only one snapshot for the VM.
> > >>     For revert VM snapshot, as long as the VM is stopped, NetApp can
> > >> revert the snapshot created on NetApp storage easily, and efficiently.
> > >>    The benefit of this whole process, as Chris pointed out, if the
> > >> snapshot chain is quite long, hypervisor based VM snapshot will get
> > >> performance hit.
> > >>
> > >> >
> > >> > 2) Additionally you want to be able to backup multiple disks at
> > >> > once, regardless of VM snapshot.  Why don't we add the ability to
> > >> > put
> > >> volumeIds in
> > >> > snapshot cmd that if the storage provider supports it will get a
> > >> > batch of volumeIds.
> > >> >
> > >> > Now I know we talked about 2 and there was some concerns about it
> > >> > (mostly from me), but I think we could work through those concerns
> > >> > (forgot what they were...).  Right now I just get the feeling we
> > >> > are shoehorning some functionality into VM snapshot that isn't
> > >> > quite the right fit.  The "no
> > >> quiesce"
> > >> > flow just doesn't seem to make sense to me.
> > >>
> > >>
> > >> Not sure above NetApp proposed work flow makes sense to you or to
> > >> other body or not. If this work flow is only specific to NetApp, then
> > >> we don't need to enforce the whole process for everybody.
> > >>
> > >> >
> > >> > Darren
> > >> >
> > >> > On Tue, Oct 8, 2013 at 2:05 PM, SuichII, Christopher
> > >> > <Ch...@netapp.com> wrote:
> > >> > > Whether the hypervisor snapshot happens depends on whether the
> > >> > 'quiesce' option is specified with the snapshot request. If a user
> > >> doesn't care
> > >> > about the consistency of their backup, then the hypervisor
> > >> snapshot/quiesce
> > >> > step can be skipped altogether. This of course is not the case if
> > >> > the
> > >> default
> > >> > provider is being used, in which case a hypervisor snapshot is the
> > >> > only
> > >> way of
> > >> > creating a backup since it can't be offloaded to the storage driver.
> > >> > >
> > >> > > --
> > >> > > Chris Suich
> > >> > > chris.suich@netapp.com
> > >> > > NetApp Software Engineer
> > >> > > Data Center Platforms - Cloud Solutions Citrix, Cisco & Red Hat
> > >> > >
> > >> > > On Oct 8, 2013, at 4:57 PM, Darren Shepherd
> > >> > > <da...@gmail.com>
> > >> > >  wrote:
> > >> > >
> > >> > >> Who is going to decide whether the hypervisor snapshot should
> > >> > >> actually happen or not? Or how?
> > >> > >>
> > >> > >> Darren
> > >> > >>
> > >> > >> On Tue, Oct 8, 2013 at 12:38 PM, SuichII, Christopher
> > >> > >> <Ch...@netapp.com> wrote:
> > >> > >>>
> > >> > >>> --
> > >> > >>> Chris Suich
> > >> > >>> chris.suich@netapp.com
> > >> > >>> NetApp Software Engineer
> > >> > >>> Data Center Platforms - Cloud Solutions Citrix, Cisco & Red Hat
> > >> > >>>
> > >> > >>> On Oct 8, 2013, at 2:24 PM, Darren Shepherd
> > >> > <da...@gmail.com> wrote:
> > >> > >>>
> > >> > >>>> So in the implementation, when we say "quiesce" is that
> > >> > >>>> actually being implemented as a VM snapshot (memory and disk).
> > >> > >>>> And then when you say "unquiesce" you are talking about
> > >> > >>>> deleting the VM
> > >> > snapshot?
> > >> > >>>
> > >> > >>> If the VM snapshot is not going to the hypervisor, then yes, it
> > >> > >>> will
> > >> > actually be a hypervisor snapshot. Just to be clear, the unquiesce
> > >> > is
> > >> not quite
> > >> > a delete - it is a collapse of the VM snapshot and the active VM
> > >> > back
> > >> into one
> > >> > file.
> > >> > >>>
> > >> > >>>>
> > >> > >>>> In NetApp, what are you snapshotting?  The whole netapp volume
> > >> > >>>> (I don't know the correct term), a file on NFS, an iscsi
> > >> > >>>> volume?  I don't know a whole heck of a lot about the netapp
> > >> > >>>> snapshot
> > >> > capabilities.
> > >> > >>>
> > >> > >>> Essentially we are using internal APIs to create file level
> > >> > >>> backups
> > >> - don't
> > >> > worry too much about the terminology.
> > >> > >>>
> > >> > >>>>
> > >> > >>>> I know storage solutions can snapshot better and faster than
> > >> > >>>> hypervisors can with COW files.  I've personally just been
> > >> > >>>> always perplexed on whats the best way to implement it.  For
> > >> > >>>> storage solutions that are block based, its really easy to
> > >> > >>>> have the storage doing the snapshot.  For shared file systems,
> > >> > >>>> like NFS, its seems way more complicated as you don't want to
> > >> > >>>> snapshot the entire filesystem in order to snapshot one file.
> > >> > >>>
> > >> > >>> With filesystems like NFS, things are certainly more
> > >> > >>> complicated,
> > >> but that
> > >> > is taken care of by our controller's operating system, Data ONTAP,
> > >> > and we simply use APIs to communicate with it.
> > >> > >>>
> > >> > >>>>
> > >> > >>>> Darren
> > >> > >>>>
> > >> > >>>> On Tue, Oct 8, 2013 at 11:10 AM, SuichII, Christopher
> > >> > >>>> <Ch...@netapp.com> wrote:
> > >> > >>>>> I can comment on the second half.
> > >> > >>>>>
> > >> > >>>>> Through storage operations, storage providers can create
> > >> > >>>>> backups
> > >> > much faster than hypervisors and over time, their snapshots are
> > >> > more efficient than the snapshot chains that hypervisors create. It
> > >> > is true
> > >> that a VM
> > >> > snapshot taken at the storage level is slightly different as it
> > >> > would be
> > >> psuedo-
> > >> > quiesced, not have it's memory snapshotted. This is accomplished
> > >> > through hypervisor snapshots:
> > >> > >>>>>
> > >> > >>>>> 1) VM snapshot request (lets say VM 'A'
> > >> > >>>>> 2) Create hypervisor snapshot (optional) -VM 'A' is
> > >> > >>>>> snapshotted, creating active VM 'A*'
> > >> > >>>>> -All disk traffic now goes to VM 'A*' and A is a snapshot of
> 'A*'
> > >> > >>>>> 3) Storage driver(s) take snapshots of each volume
> > >> > >>>>> 4) Undo hypervisor snapshot (optional) -VM snapshot 'A' is
> > >> > >>>>> rolled back into VM 'A*' so the hypervisor snapshot no longer
> > >> > >>>>> exists
> > >> > >>>>>
> > >> > >>>>> Now, a couple notes:
> > >> > >>>>> -The reason this is optional is that not all users
> > >> > >>>>> necessarily
> > >> care about
> > >> > the memory or disk consistency of their VMs and would prefer faster
> > >> > snapshots to consistency.
> > >> > >>>>> -Preemptively, yes, we are actually taking hypervisor
> > >> > >>>>> snapshots
> > >> which
> > >> > means there isn't actually a performance of taking storage
> > >> > snapshots when quiescing the VM. However, the performance gain will
> > >> > come both during restoring the VM and during normal operations as
> > described above.
> > >> > >>>>>
> > >> > >>>>> Although you can think of it as a poor man's VM snapshot, I
> > >> > >>>>> would
> > >> > think of it more as a consistent multi-volume snapshot. Again, the
> > >> difference
> > >> > being that this snapshot was not truly quiesced like a hypervisor
> > >> snapshot
> > >> > would be.
> > >> > >>>>>
> > >> > >>>>> --
> > >> > >>>>> Chris Suich
> > >> > >>>>> chris.suich@netapp.com
> > >> > >>>>> NetApp Software Engineer
> > >> > >>>>> Data Center Platforms - Cloud Solutions Citrix, Cisco & Red
> > >> > >>>>> Hat
> > >> > >>>>>
> > >> > >>>>> On Oct 8, 2013, at 1:47 PM, Darren Shepherd
> > >> > <da...@gmail.com> wrote:
> > >> > >>>>>
> > >> > >>>>>> My only comment is that having the return type as boolean
> > >> > >>>>>> and using to that indicate quiesce behaviour seems obscure
> > >> > >>>>>> and will probably lead to a problem later.  Your basically
> > >> > >>>>>> saying the result of the takeVMSnapshot will only ever need
> > >> > >>>>>> to communicate back whether unquiesce needs to happen.
> > >> > >>>>>> Maybe some result
> > >> > object
> > >> > >>>>>> would be more extensible.
> > >> > >>>>>>
> > >> > >>>>>> Actually, I think I have more comments.  This seems a bit
> > >> > >>>>>> odd to
> > >> me.
> > >> > >>>>>> Why would a storage driver in ACS implement a VM snapshot
> > >> > >>>>>> functionality?  VM snapshot is a really a hypervisor
> > >> > >>>>>> orchestrated operation.  So it seems like were trying to
> > >> > >>>>>> implement a poor mans VM snapshot.  Maybe if I understood
> > >> > >>>>>> what NetApp was trying to do it would make more sense, but
> > >> > >>>>>> its all odd.  To do a proper VM snapshot you need to
> > >> > >>>>>> snapshot memory and disk at the exact same time.  How are we
> > >> > >>>>>> going to do that if ACS is orchestrating the VM snapshot and
> > >> > >>>>>> delegating to storage providers.  Its not like you are going
> to
> > pause the VM.... or are you?
> > >> > >>>>>>
> > >> > >>>>>> Darren
> > >> > >>>>>>
> > >> > >>>>>> On Mon, Oct 7, 2013 at 11:59 AM, Edison Su
> > >> > >>>>>> <Ed...@citrix.com>
> > >> > wrote:
> > >> > >>>>>>> I created a design document page at
> > >> >
> > https://cwiki.apache.org/confluence/display/CLOUDSTACK/Pluggable+VM
> > >> > +s
> > >> > napshot+related+operations, feel free to add items on it.
> > >> > >>>>>>> And a new branch "pluggable_vm_snapshot" is created.
> > >> > >>>>>>>
> > >> > >>>>>>>> -----Original Message-----
> > >> > >>>>>>>> From: SuichII, Christopher [mailto:Chris.Suich@netapp.com]
> > >> > >>>>>>>> Sent: Monday, October 07, 2013 10:02 AM
> > >> > >>>>>>>> To: <de...@cloudstack.apache.org>
> > >> > >>>>>>>> Subject: Re: [DISCUSS] Pluggable VM snapshot related
> > operations?
> > >> > >>>>>>>>
> > >> > >>>>>>>> I'm a fan of option 2 - this gives us the most flexibility
> > >> > >>>>>>>> (as you stated). The option is given to completely
> > >> > >>>>>>>> override the way VM snapshots work AND storage providers
> > >> > >>>>>>>> are given to opportunity to work within the default VM
> > snapshot workflow.
> > >> > >>>>>>>>
> > >> > >>>>>>>> I believe this option should satisfy your concern, Mike.
> > >> > >>>>>>>> The snapshot and quiesce strategy would be in charge of
> > >> > communicating with the hypervisor.
> > >> > >>>>>>>> Storage providers should be able to leverage the default
> > >> > >>>>>>>> strategies and simply perform the storage operations.
> > >> > >>>>>>>>
> > >> > >>>>>>>> I don't think it should be much of an issue that new
> > >> > >>>>>>>> method to the storage driver interface may not apply to
> > >> > >>>>>>>> everyone. In fact,
> > >> > that is already the case.
> > >> > >>>>>>>> Some methods such as un/maintain(), attachToXXX() and
> > >> > >>>>>>>> takeSnapshot() are already not implemented by every driver
> > >> > >>>>>>>> - they just return false when asked if they can handle the
> > >> operation.
> > >> > >>>>>>>>
> > >> > >>>>>>>> --
> > >> > >>>>>>>> Chris Suich
> > >> > >>>>>>>> chris.suich@netapp.com
> > >> > >>>>>>>> NetApp Software Engineer
> > >> > >>>>>>>> Data Center Platforms - Cloud Solutions Citrix, Cisco &
> > >> > >>>>>>>> Red Hat
> > >> > >>>>>>>>
> > >> > >>>>>>>> On Oct 5, 2013, at 12:11 AM, Mike Tutkowski
> > >> > >>>>>>>> <mi...@solidfire.com>
> > >> > >>>>>>>> wrote:
> > >> > >>>>>>>>
> > >> > >>>>>>>>> Well, my first thought on this is that the storage driver
> > >> > >>>>>>>>> should not be telling the hypervisor to do anything. It
> > >> > >>>>>>>>> should be responsible for creating/deleting volumes,
> > snapshots, etc.
> > >> on
> > >> > its storage system only.
> > >> > >>>>>>>>>
> > >> > >>>>>>>>>
> > >> > >>>>>>>>> On Fri, Oct 4, 2013 at 5:57 PM, Edison Su <
> > >> Edison.su@citrix.com>
> > >> > wrote:
> > >> > >>>>>>>>>
> > >> > >>>>>>>>>> In 4.2, we added VM snapshot for Vmware/Xenserver.
> > The
> > >> > >>>>>>>>>> current workflow will be like the following:
> > >> > >>>>>>>>>> createVMSnapshot api -> VMSnapshotManagerImpl:
> > >> > >>>>>>>>>> creatVMSnapshot -> send CreateVMSnapshotCommand
> > to
> > >> > hypervisor to create vm snapshot.
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>> If anybody wants to change the workflow, then need to
> > >> > >>>>>>>>>> either change VMSnapshotManagerImpl directly or
> > subclass
> > >> > VMSnapshotManagerImpl.
> > >> > >>>>>>>>>> Both are not the ideal choice, as
> > VMSnapshotManagerImpl
> > >> > >>>>>>>>>> should be able to handle different ways to take vm
> > >> > >>>>>>>>>> snapshot,
> > >> > instead of hard code.
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>> The requirements for the pluggable VM snapshot coming
> > from:
> > >> > >>>>>>>>>> Storage vendor may have their optimization, such as
> > NetApp.
> > >> > >>>>>>>>>> VM snapshot can be implemented in a totally different
> > >> > >>>>>>>>>> way(For example, I could just send a command to guest
> > >> > >>>>>>>>>> VM, to tell my application to flush disk and hold disk
> > >> > >>>>>>>>>> write, then come to hypervisor to
> > >> > >>>>>>>> take a volume snapshot).
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>> If we agree on enable pluggable VM snapshot, then we
> > can
> > >> > move
> > >> > >>>>>>>>>> on discuss how to implement it.
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>> The possible options:
> > >> > >>>>>>>>>> 1. coarse grained interface. Add a VMSnapshotStrategy
> > >> > >>>>>>>>>> interface, which has the following interfaces:
> > >> > >>>>>>>>>> VMSnapshot takeVMSnapshot(VMSnapshot vmSnapshot);
> > >> > Boolean
> > >> > >>>>>>>>>> revertVMSnapshot(VMSnapshot vmSnapshot); Boolean
> > >> > >>>>>>>>>> DeleteVMSnapshot(VMSnapshot vmSnapshot);
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>> The work flow will be: createVMSnapshot api ->
> > >> > >>>>>>>> VMSnapshotManagerImpl:
> > >> > >>>>>>>>>> creatVMSnapshot -> VMSnapshotStrategy:
> > takeVMSnapshot
> > >> > >>>>>>>>>> VMSnapshotManagerImpl will manage VM state, do the
> > >> > >>>>>>>>>> sanity check, then will handle over to
> > VMSnapshotStrategy.
> > >> > >>>>>>>>>> In VMSnapshotStrategy implementation, it may just send
> > a
> > >> > >>>>>>>>>> Create/revert/delete VMSnapshotCommand to
> > hypervisor
> > >> > host, or
> > >> > >>>>>>>>>> do anything special operations.
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>> 2. fine-grained interface. Not only add a
> > >> > >>>>>>>>>> VMSnapshotStrategy interface, but also add certain
> > methods on the storage driver.
> > >> > >>>>>>>>>> The VMSnapshotStrategy interface will be the same as
> > option 1.
> > >> > >>>>>>>>>> Will add the following methods on storage driver:
> > >> > >>>>>>>>>> /* volumesBelongToVM  is the list of volumes of the VM
> > >> > >>>>>>>>>> that created on this storage, storage vendor can either
> > >> > >>>>>>>>>> take one snapshot for this volumes in one shot, or take
> > >> > >>>>>>>>>> snapshot for
> > >> > each volume separately
> > >> > >>>>>>>>>>    The pre-condition: vm is unquiesced.
> > >> > >>>>>>>>>>    It will return a Boolean to indicate, do need
> > >> > >>>>>>>>>> unquiesce vm
> > >> or
> > >> > not.
> > >> > >>>>>>>>>>    In the default storage driver, it will return false.
> > >> > >>>>>>>>>> */
> > >> > >>>>>>>>>> boolean takeVMSnapshot(List<VolumeInfo>
> > >> > volumesBelongToVM,
> > >> > >>>>>>>>>> VMSnapshot vmSnapshot); Boolean
> > >> > >>>>>>>>>> revertVMSnapshot(List<VolumeInfo>
> > volumesBelongToVM,
> > >> > >>>>>>>>>> VMSnapshot vmSnapshot); Boolean
> > >> > >>>>>>>>>> deleteVMSnapshot(List<VolumeInfo>
> > volumesBelongToVM,
> > >> > >>>>>>>>>> VMSnapshot vmSNapshot);
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>> The work flow will be: createVMSnapshot api ->
> > >> > >>>>>>>> VMSnapshotManagerImpl:
> > >> > >>>>>>>>>> creatVMSnapshot -> VMSnapshotStrategy:
> > takeVMSnapshot ->
> > >> > >>>>>>>>>> storage driver:takeVMSnapshot In the implementation of
> > >> > >>>>>>>>>> VMSnapshotStrategy's takeVMSnapshot, the pseudo
> > code
> > >> > looks like:
> > >> > >>>>>>>>>>    HypervisorHelper.quiesceVM(vm);
> > >> > >>>>>>>>>>    val volumes = vm.getVolumes();
> > >> > >>>>>>>>>>    val maps = new Map[driver, list[VolumeInfo]]();
> > >> > >>>>>>>>>>    Volumes.foreach(volume => maps.put(volume.getDriver,
> > >> > volume ::
> > >> > >>>>>>>>>> maps.get(volume.getdriver())))
> > >> > >>>>>>>>>>    val needUnquiesce = true;
> > >> > >>>>>>>>>>     maps.foreach((driver, volumes) => needUnquiesce  =
> > >> > >>>>>>>>>> needUnquiesce && driver.takeVMSnapshot(volumes))
> > >> > >>>>>>>>>>   if (needUnquiesce ) {
> > >> > >>>>>>>>>>    HypervisorHelper.unquiesce(vm); }
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>> By default, the quiesceVM in HypervisorHelper will
> > >> > >>>>>>>>>> actually take vm snapshot through hypervisor.
> > >> > >>>>>>>>>> Does above logic makes senesce?
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>> The pros of option 1 is that: it's simple, no need to
> > >> > >>>>>>>>>> change storage driver interfaces. The cons is that each
> > >> > >>>>>>>>>> storage vendor need to implement a strategy, maybe they
> > >> > >>>>>>>>>> will do the
> > >> > same thing.
> > >> > >>>>>>>>>> The pros of option 2 is that, storage driver won't need
> > >> > >>>>>>>>>> to worry about how to quiesce/unquiesce vm. The cons is
> > >> > >>>>>>>>>> that, it will add these methods on each storage drivers,
> > >> > >>>>>>>>>> so it assumes that this work flow will work for
> everybody.
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>> So which option we should take? Or if you have other
> > >> > >>>>>>>>>> options, please let's know.
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>>
> > >> > >>>>>>>>>
> > >> > >>>>>>>>>
> > >> > >>>>>>>>> --
> > >> > >>>>>>>>> *Mike Tutkowski*
> > >> > >>>>>>>>> *Senior CloudStack Developer, SolidFire Inc.*
> > >> > >>>>>>>>> e: mike.tutkowski@solidfire.com
> > >> > >>>>>>>>> o: 303.746.7302
> > >> > >>>>>>>>> Advancing the way the world uses the
> > >> > >>>>>>>>> cloud<http://solidfire.com/solution/overview/?video=play>
> > >> > >>>>>>>>> *(tm)*
> > >> > >>>>>>>
> > >> > >>>>>
> > >> > >>>
> > >> > >
> > >>
> > >
> > >
> > >
> > > --
> > > *Mike Tutkowski*
> > > *Senior CloudStack Developer, SolidFire Inc.*
> > > e: mike.tutkowski@solidfire.com
> > > o: 303.746.7302
> > > Advancing the way the world uses the
> > > cloud<http://solidfire.com/solution/overview/?video=play>
> > > *(tm)*
>



-- 
*Mike Tutkowski*
*Senior CloudStack Developer, SolidFire Inc.*
e: mike.tutkowski@solidfire.com
o: 303.746.7302
Advancing the way the world uses the
cloud<http://solidfire.com/solution/overview/?video=play>
*™*

RE: [DISCUSS] Pluggable VM snapshot related operations?

Posted by Edison Su <Ed...@citrix.com>.
Personally, I am +1 on the coarse grain interface, and totally agree with your points.
As long as we separate vmsnasphotmanager and vmsnapshotstrategy, and provide enough helper functions(such as quiesce / un-quiesce vm) for vendors, then write a new vmsnapshotStrategy should be easy.

> -----Original Message-----
> From: Darren Shepherd [mailto:darren.s.shepherd@gmail.com]
> Sent: Wednesday, October 09, 2013 9:13 PM
> To: dev@cloudstack.apache.org
> Subject: Re: [DISCUSS] Pluggable VM snapshot related operations?
> 
> Edison,
> 
> I would lean toward doing the coarse grain interface only.  I'm having a hard
> time seeing how the whole flow is generic and makes sense for everyone.
> With starting with the coarse grain you have the advantage in that you avoid
> possible upfront over engineering/over design that could wreak havoc down
> the line.  If you implement the VMSnapshotStrategy and find that it really is
> useful to other implementations, you can then implement the fine grain
> interface later to allow others to benefit from it.
> 
> Darren
> 
> On Wed, Oct 9, 2013 at 8:54 PM, Mike Tutkowski
> <mi...@solidfire.com> wrote:
> > Hey guys,
> >
> > I haven't been giving this thread much attention, but am reviewing it
> > somewhat now.
> >
> > I'm not really clear how this would work if, say, a VM has two data
> > disks and they are not being provided by the same vendor.
> >
> > Can someone clarify that for me?
> >
> > My understanding for how this works today is that it doesn't matter.
> > For XenServer, a VDI is on an SR, which could be supported by storage
> vendor X.
> > Another VDI could be on another SR, supported by storage vendor Y.
> >
> > In this case, a new VDI appears on each SR after a hypervisor snapshot.
> >
> > Same idea for VMware.
> >
> > I don't really know how (or if) this works for KVM.
> >
> > I'm not clear how this multi-vendor situation would play out in this
> > pluggable approach.
> >
> > Thanks!
> >
> >
> > On Tue, Oct 8, 2013 at 4:43 PM, Edison Su <Ed...@citrix.com> wrote:
> >
> >>
> >>
> >> > -----Original Message-----
> >> > From: Darren Shepherd [mailto:darren.s.shepherd@gmail.com]
> >> > Sent: Tuesday, October 08, 2013 2:54 PM
> >> > To: dev@cloudstack.apache.org
> >> > Subject: Re: [DISCUSS] Pluggable VM snapshot related operations?
> >> >
> >> > A hypervisor snapshot will snapshot memory also.  So determining
> >> > whether
> >> The memory is optional for hypervisor vm snapshot, a.k.a, the
> >> "Disk-only
> >> snapshots":
> >> http://support.citrix.com/proddocs/topic/xencenter-61/xs-xc-vms-snaps
> >> hots-about.html It's supported by both xenserver/kvm/vmware.
> >>
> >> > do to the hypervisor snapshot from the quiesce option does not seem
> >> > proper.
> >> >
> >> > Sorry, for all the questions, I'm trying to get to the point of
> >> understand if this
> >> > functionality makes sense at this point of code or if maybe their
> >> > is a
> >> different
> >> > approach.  This is what I'm seeing, what if we state it this way
> >> >
> >> > 1) VM snapshot, AFAIK, are not backed up today and exist solely on
> >> primary.
> >> > What if we added a backup phase to VM snapshots that can be
> >> > optionally supported by the storage providers to possibly backup
> >> > the VM snapshot volumes.
> >> It's not about backup vm snapshot, it's about how to take vm snapshot.
> >> Usually, take/revert vm snapshot is handled by hypervisor itself, but
> >> in NetApp(or other storage vendor) case, They want to change the
> >> default behavior of hypervisor-base vm snapshot.
> >>
> >> Some examples:
> >> 1. take hypervisor based vm snapshots, on primary storage, hypervisor
> >> will maintain the snapshot chain.
> >> 2. take vm snapshot through NetApp:
> >>      a. first, quiesce VM if user specified. There is no separate API
> >> to quiesce VM on the hypervisor, so here we will take a VM snapshot
> >> through hypervisor API call, hypervisor will take volume snapshot  on
> >> each volume of the VM. Let's say, on the primary storage, the disk
> >> chain looks like:
> >>            base-image
> >>                     |
> >>                     V
> >>                 Parent disk
> >>             /                         \
> >>           V                            V
> >>         Current disk        snapshot-a
> >>      b. from snapshot-a, find out its parent disk, then take snapshot
> >> through NetApp
> >>      c. un- quiesce VM, here, go to hypervisor, delete snapshot
> >> "snapshot-a", hypervisor should be able to consolidate current disk
> >> and "parent disk" into one disk, thus from hypervisor point of view ,
> >> there is always, at most, only one snapshot for the VM.
> >>     For revert VM snapshot, as long as the VM is stopped, NetApp can
> >> revert the snapshot created on NetApp storage easily, and efficiently.
> >>    The benefit of this whole process, as Chris pointed out, if the
> >> snapshot chain is quite long, hypervisor based VM snapshot will get
> >> performance hit.
> >>
> >> >
> >> > 2) Additionally you want to be able to backup multiple disks at
> >> > once, regardless of VM snapshot.  Why don't we add the ability to
> >> > put
> >> volumeIds in
> >> > snapshot cmd that if the storage provider supports it will get a
> >> > batch of volumeIds.
> >> >
> >> > Now I know we talked about 2 and there was some concerns about it
> >> > (mostly from me), but I think we could work through those concerns
> >> > (forgot what they were...).  Right now I just get the feeling we
> >> > are shoehorning some functionality into VM snapshot that isn't
> >> > quite the right fit.  The "no
> >> quiesce"
> >> > flow just doesn't seem to make sense to me.
> >>
> >>
> >> Not sure above NetApp proposed work flow makes sense to you or to
> >> other body or not. If this work flow is only specific to NetApp, then
> >> we don't need to enforce the whole process for everybody.
> >>
> >> >
> >> > Darren
> >> >
> >> > On Tue, Oct 8, 2013 at 2:05 PM, SuichII, Christopher
> >> > <Ch...@netapp.com> wrote:
> >> > > Whether the hypervisor snapshot happens depends on whether the
> >> > 'quiesce' option is specified with the snapshot request. If a user
> >> doesn't care
> >> > about the consistency of their backup, then the hypervisor
> >> snapshot/quiesce
> >> > step can be skipped altogether. This of course is not the case if
> >> > the
> >> default
> >> > provider is being used, in which case a hypervisor snapshot is the
> >> > only
> >> way of
> >> > creating a backup since it can't be offloaded to the storage driver.
> >> > >
> >> > > --
> >> > > Chris Suich
> >> > > chris.suich@netapp.com
> >> > > NetApp Software Engineer
> >> > > Data Center Platforms - Cloud Solutions Citrix, Cisco & Red Hat
> >> > >
> >> > > On Oct 8, 2013, at 4:57 PM, Darren Shepherd
> >> > > <da...@gmail.com>
> >> > >  wrote:
> >> > >
> >> > >> Who is going to decide whether the hypervisor snapshot should
> >> > >> actually happen or not? Or how?
> >> > >>
> >> > >> Darren
> >> > >>
> >> > >> On Tue, Oct 8, 2013 at 12:38 PM, SuichII, Christopher
> >> > >> <Ch...@netapp.com> wrote:
> >> > >>>
> >> > >>> --
> >> > >>> Chris Suich
> >> > >>> chris.suich@netapp.com
> >> > >>> NetApp Software Engineer
> >> > >>> Data Center Platforms - Cloud Solutions Citrix, Cisco & Red Hat
> >> > >>>
> >> > >>> On Oct 8, 2013, at 2:24 PM, Darren Shepherd
> >> > <da...@gmail.com> wrote:
> >> > >>>
> >> > >>>> So in the implementation, when we say "quiesce" is that
> >> > >>>> actually being implemented as a VM snapshot (memory and disk).
> >> > >>>> And then when you say "unquiesce" you are talking about
> >> > >>>> deleting the VM
> >> > snapshot?
> >> > >>>
> >> > >>> If the VM snapshot is not going to the hypervisor, then yes, it
> >> > >>> will
> >> > actually be a hypervisor snapshot. Just to be clear, the unquiesce
> >> > is
> >> not quite
> >> > a delete - it is a collapse of the VM snapshot and the active VM
> >> > back
> >> into one
> >> > file.
> >> > >>>
> >> > >>>>
> >> > >>>> In NetApp, what are you snapshotting?  The whole netapp volume
> >> > >>>> (I don't know the correct term), a file on NFS, an iscsi
> >> > >>>> volume?  I don't know a whole heck of a lot about the netapp
> >> > >>>> snapshot
> >> > capabilities.
> >> > >>>
> >> > >>> Essentially we are using internal APIs to create file level
> >> > >>> backups
> >> - don't
> >> > worry too much about the terminology.
> >> > >>>
> >> > >>>>
> >> > >>>> I know storage solutions can snapshot better and faster than
> >> > >>>> hypervisors can with COW files.  I've personally just been
> >> > >>>> always perplexed on whats the best way to implement it.  For
> >> > >>>> storage solutions that are block based, its really easy to
> >> > >>>> have the storage doing the snapshot.  For shared file systems,
> >> > >>>> like NFS, its seems way more complicated as you don't want to
> >> > >>>> snapshot the entire filesystem in order to snapshot one file.
> >> > >>>
> >> > >>> With filesystems like NFS, things are certainly more
> >> > >>> complicated,
> >> but that
> >> > is taken care of by our controller's operating system, Data ONTAP,
> >> > and we simply use APIs to communicate with it.
> >> > >>>
> >> > >>>>
> >> > >>>> Darren
> >> > >>>>
> >> > >>>> On Tue, Oct 8, 2013 at 11:10 AM, SuichII, Christopher
> >> > >>>> <Ch...@netapp.com> wrote:
> >> > >>>>> I can comment on the second half.
> >> > >>>>>
> >> > >>>>> Through storage operations, storage providers can create
> >> > >>>>> backups
> >> > much faster than hypervisors and over time, their snapshots are
> >> > more efficient than the snapshot chains that hypervisors create. It
> >> > is true
> >> that a VM
> >> > snapshot taken at the storage level is slightly different as it
> >> > would be
> >> psuedo-
> >> > quiesced, not have it's memory snapshotted. This is accomplished
> >> > through hypervisor snapshots:
> >> > >>>>>
> >> > >>>>> 1) VM snapshot request (lets say VM 'A'
> >> > >>>>> 2) Create hypervisor snapshot (optional) -VM 'A' is
> >> > >>>>> snapshotted, creating active VM 'A*'
> >> > >>>>> -All disk traffic now goes to VM 'A*' and A is a snapshot of 'A*'
> >> > >>>>> 3) Storage driver(s) take snapshots of each volume
> >> > >>>>> 4) Undo hypervisor snapshot (optional) -VM snapshot 'A' is
> >> > >>>>> rolled back into VM 'A*' so the hypervisor snapshot no longer
> >> > >>>>> exists
> >> > >>>>>
> >> > >>>>> Now, a couple notes:
> >> > >>>>> -The reason this is optional is that not all users
> >> > >>>>> necessarily
> >> care about
> >> > the memory or disk consistency of their VMs and would prefer faster
> >> > snapshots to consistency.
> >> > >>>>> -Preemptively, yes, we are actually taking hypervisor
> >> > >>>>> snapshots
> >> which
> >> > means there isn't actually a performance of taking storage
> >> > snapshots when quiescing the VM. However, the performance gain will
> >> > come both during restoring the VM and during normal operations as
> described above.
> >> > >>>>>
> >> > >>>>> Although you can think of it as a poor man's VM snapshot, I
> >> > >>>>> would
> >> > think of it more as a consistent multi-volume snapshot. Again, the
> >> difference
> >> > being that this snapshot was not truly quiesced like a hypervisor
> >> snapshot
> >> > would be.
> >> > >>>>>
> >> > >>>>> --
> >> > >>>>> Chris Suich
> >> > >>>>> chris.suich@netapp.com
> >> > >>>>> NetApp Software Engineer
> >> > >>>>> Data Center Platforms - Cloud Solutions Citrix, Cisco & Red
> >> > >>>>> Hat
> >> > >>>>>
> >> > >>>>> On Oct 8, 2013, at 1:47 PM, Darren Shepherd
> >> > <da...@gmail.com> wrote:
> >> > >>>>>
> >> > >>>>>> My only comment is that having the return type as boolean
> >> > >>>>>> and using to that indicate quiesce behaviour seems obscure
> >> > >>>>>> and will probably lead to a problem later.  Your basically
> >> > >>>>>> saying the result of the takeVMSnapshot will only ever need
> >> > >>>>>> to communicate back whether unquiesce needs to happen.
> >> > >>>>>> Maybe some result
> >> > object
> >> > >>>>>> would be more extensible.
> >> > >>>>>>
> >> > >>>>>> Actually, I think I have more comments.  This seems a bit
> >> > >>>>>> odd to
> >> me.
> >> > >>>>>> Why would a storage driver in ACS implement a VM snapshot
> >> > >>>>>> functionality?  VM snapshot is a really a hypervisor
> >> > >>>>>> orchestrated operation.  So it seems like were trying to
> >> > >>>>>> implement a poor mans VM snapshot.  Maybe if I understood
> >> > >>>>>> what NetApp was trying to do it would make more sense, but
> >> > >>>>>> its all odd.  To do a proper VM snapshot you need to
> >> > >>>>>> snapshot memory and disk at the exact same time.  How are we
> >> > >>>>>> going to do that if ACS is orchestrating the VM snapshot and
> >> > >>>>>> delegating to storage providers.  Its not like you are going to
> pause the VM.... or are you?
> >> > >>>>>>
> >> > >>>>>> Darren
> >> > >>>>>>
> >> > >>>>>> On Mon, Oct 7, 2013 at 11:59 AM, Edison Su
> >> > >>>>>> <Ed...@citrix.com>
> >> > wrote:
> >> > >>>>>>> I created a design document page at
> >> >
> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Pluggable+VM
> >> > +s
> >> > napshot+related+operations, feel free to add items on it.
> >> > >>>>>>> And a new branch "pluggable_vm_snapshot" is created.
> >> > >>>>>>>
> >> > >>>>>>>> -----Original Message-----
> >> > >>>>>>>> From: SuichII, Christopher [mailto:Chris.Suich@netapp.com]
> >> > >>>>>>>> Sent: Monday, October 07, 2013 10:02 AM
> >> > >>>>>>>> To: <de...@cloudstack.apache.org>
> >> > >>>>>>>> Subject: Re: [DISCUSS] Pluggable VM snapshot related
> operations?
> >> > >>>>>>>>
> >> > >>>>>>>> I'm a fan of option 2 - this gives us the most flexibility
> >> > >>>>>>>> (as you stated). The option is given to completely
> >> > >>>>>>>> override the way VM snapshots work AND storage providers
> >> > >>>>>>>> are given to opportunity to work within the default VM
> snapshot workflow.
> >> > >>>>>>>>
> >> > >>>>>>>> I believe this option should satisfy your concern, Mike.
> >> > >>>>>>>> The snapshot and quiesce strategy would be in charge of
> >> > communicating with the hypervisor.
> >> > >>>>>>>> Storage providers should be able to leverage the default
> >> > >>>>>>>> strategies and simply perform the storage operations.
> >> > >>>>>>>>
> >> > >>>>>>>> I don't think it should be much of an issue that new
> >> > >>>>>>>> method to the storage driver interface may not apply to
> >> > >>>>>>>> everyone. In fact,
> >> > that is already the case.
> >> > >>>>>>>> Some methods such as un/maintain(), attachToXXX() and
> >> > >>>>>>>> takeSnapshot() are already not implemented by every driver
> >> > >>>>>>>> - they just return false when asked if they can handle the
> >> operation.
> >> > >>>>>>>>
> >> > >>>>>>>> --
> >> > >>>>>>>> Chris Suich
> >> > >>>>>>>> chris.suich@netapp.com
> >> > >>>>>>>> NetApp Software Engineer
> >> > >>>>>>>> Data Center Platforms - Cloud Solutions Citrix, Cisco &
> >> > >>>>>>>> Red Hat
> >> > >>>>>>>>
> >> > >>>>>>>> On Oct 5, 2013, at 12:11 AM, Mike Tutkowski
> >> > >>>>>>>> <mi...@solidfire.com>
> >> > >>>>>>>> wrote:
> >> > >>>>>>>>
> >> > >>>>>>>>> Well, my first thought on this is that the storage driver
> >> > >>>>>>>>> should not be telling the hypervisor to do anything. It
> >> > >>>>>>>>> should be responsible for creating/deleting volumes,
> snapshots, etc.
> >> on
> >> > its storage system only.
> >> > >>>>>>>>>
> >> > >>>>>>>>>
> >> > >>>>>>>>> On Fri, Oct 4, 2013 at 5:57 PM, Edison Su <
> >> Edison.su@citrix.com>
> >> > wrote:
> >> > >>>>>>>>>
> >> > >>>>>>>>>> In 4.2, we added VM snapshot for Vmware/Xenserver.
> The
> >> > >>>>>>>>>> current workflow will be like the following:
> >> > >>>>>>>>>> createVMSnapshot api -> VMSnapshotManagerImpl:
> >> > >>>>>>>>>> creatVMSnapshot -> send CreateVMSnapshotCommand
> to
> >> > hypervisor to create vm snapshot.
> >> > >>>>>>>>>>
> >> > >>>>>>>>>> If anybody wants to change the workflow, then need to
> >> > >>>>>>>>>> either change VMSnapshotManagerImpl directly or
> subclass
> >> > VMSnapshotManagerImpl.
> >> > >>>>>>>>>> Both are not the ideal choice, as
> VMSnapshotManagerImpl
> >> > >>>>>>>>>> should be able to handle different ways to take vm
> >> > >>>>>>>>>> snapshot,
> >> > instead of hard code.
> >> > >>>>>>>>>>
> >> > >>>>>>>>>> The requirements for the pluggable VM snapshot coming
> from:
> >> > >>>>>>>>>> Storage vendor may have their optimization, such as
> NetApp.
> >> > >>>>>>>>>> VM snapshot can be implemented in a totally different
> >> > >>>>>>>>>> way(For example, I could just send a command to guest
> >> > >>>>>>>>>> VM, to tell my application to flush disk and hold disk
> >> > >>>>>>>>>> write, then come to hypervisor to
> >> > >>>>>>>> take a volume snapshot).
> >> > >>>>>>>>>>
> >> > >>>>>>>>>> If we agree on enable pluggable VM snapshot, then we
> can
> >> > move
> >> > >>>>>>>>>> on discuss how to implement it.
> >> > >>>>>>>>>>
> >> > >>>>>>>>>> The possible options:
> >> > >>>>>>>>>> 1. coarse grained interface. Add a VMSnapshotStrategy
> >> > >>>>>>>>>> interface, which has the following interfaces:
> >> > >>>>>>>>>> VMSnapshot takeVMSnapshot(VMSnapshot vmSnapshot);
> >> > Boolean
> >> > >>>>>>>>>> revertVMSnapshot(VMSnapshot vmSnapshot); Boolean
> >> > >>>>>>>>>> DeleteVMSnapshot(VMSnapshot vmSnapshot);
> >> > >>>>>>>>>>
> >> > >>>>>>>>>> The work flow will be: createVMSnapshot api ->
> >> > >>>>>>>> VMSnapshotManagerImpl:
> >> > >>>>>>>>>> creatVMSnapshot -> VMSnapshotStrategy:
> takeVMSnapshot
> >> > >>>>>>>>>> VMSnapshotManagerImpl will manage VM state, do the
> >> > >>>>>>>>>> sanity check, then will handle over to
> VMSnapshotStrategy.
> >> > >>>>>>>>>> In VMSnapshotStrategy implementation, it may just send
> a
> >> > >>>>>>>>>> Create/revert/delete VMSnapshotCommand to
> hypervisor
> >> > host, or
> >> > >>>>>>>>>> do anything special operations.
> >> > >>>>>>>>>>
> >> > >>>>>>>>>> 2. fine-grained interface. Not only add a
> >> > >>>>>>>>>> VMSnapshotStrategy interface, but also add certain
> methods on the storage driver.
> >> > >>>>>>>>>> The VMSnapshotStrategy interface will be the same as
> option 1.
> >> > >>>>>>>>>> Will add the following methods on storage driver:
> >> > >>>>>>>>>> /* volumesBelongToVM  is the list of volumes of the VM
> >> > >>>>>>>>>> that created on this storage, storage vendor can either
> >> > >>>>>>>>>> take one snapshot for this volumes in one shot, or take
> >> > >>>>>>>>>> snapshot for
> >> > each volume separately
> >> > >>>>>>>>>>    The pre-condition: vm is unquiesced.
> >> > >>>>>>>>>>    It will return a Boolean to indicate, do need
> >> > >>>>>>>>>> unquiesce vm
> >> or
> >> > not.
> >> > >>>>>>>>>>    In the default storage driver, it will return false.
> >> > >>>>>>>>>> */
> >> > >>>>>>>>>> boolean takeVMSnapshot(List<VolumeInfo>
> >> > volumesBelongToVM,
> >> > >>>>>>>>>> VMSnapshot vmSnapshot); Boolean
> >> > >>>>>>>>>> revertVMSnapshot(List<VolumeInfo>
> volumesBelongToVM,
> >> > >>>>>>>>>> VMSnapshot vmSnapshot); Boolean
> >> > >>>>>>>>>> deleteVMSnapshot(List<VolumeInfo>
> volumesBelongToVM,
> >> > >>>>>>>>>> VMSnapshot vmSNapshot);
> >> > >>>>>>>>>>
> >> > >>>>>>>>>> The work flow will be: createVMSnapshot api ->
> >> > >>>>>>>> VMSnapshotManagerImpl:
> >> > >>>>>>>>>> creatVMSnapshot -> VMSnapshotStrategy:
> takeVMSnapshot ->
> >> > >>>>>>>>>> storage driver:takeVMSnapshot In the implementation of
> >> > >>>>>>>>>> VMSnapshotStrategy's takeVMSnapshot, the pseudo
> code
> >> > looks like:
> >> > >>>>>>>>>>    HypervisorHelper.quiesceVM(vm);
> >> > >>>>>>>>>>    val volumes = vm.getVolumes();
> >> > >>>>>>>>>>    val maps = new Map[driver, list[VolumeInfo]]();
> >> > >>>>>>>>>>    Volumes.foreach(volume => maps.put(volume.getDriver,
> >> > volume ::
> >> > >>>>>>>>>> maps.get(volume.getdriver())))
> >> > >>>>>>>>>>    val needUnquiesce = true;
> >> > >>>>>>>>>>     maps.foreach((driver, volumes) => needUnquiesce  =
> >> > >>>>>>>>>> needUnquiesce && driver.takeVMSnapshot(volumes))
> >> > >>>>>>>>>>   if (needUnquiesce ) {
> >> > >>>>>>>>>>    HypervisorHelper.unquiesce(vm); }
> >> > >>>>>>>>>>
> >> > >>>>>>>>>> By default, the quiesceVM in HypervisorHelper will
> >> > >>>>>>>>>> actually take vm snapshot through hypervisor.
> >> > >>>>>>>>>> Does above logic makes senesce?
> >> > >>>>>>>>>>
> >> > >>>>>>>>>> The pros of option 1 is that: it's simple, no need to
> >> > >>>>>>>>>> change storage driver interfaces. The cons is that each
> >> > >>>>>>>>>> storage vendor need to implement a strategy, maybe they
> >> > >>>>>>>>>> will do the
> >> > same thing.
> >> > >>>>>>>>>> The pros of option 2 is that, storage driver won't need
> >> > >>>>>>>>>> to worry about how to quiesce/unquiesce vm. The cons is
> >> > >>>>>>>>>> that, it will add these methods on each storage drivers,
> >> > >>>>>>>>>> so it assumes that this work flow will work for everybody.
> >> > >>>>>>>>>>
> >> > >>>>>>>>>> So which option we should take? Or if you have other
> >> > >>>>>>>>>> options, please let's know.
> >> > >>>>>>>>>>
> >> > >>>>>>>>>>
> >> > >>>>>>>>>>
> >> > >>>>>>>>>>
> >> > >>>>>>>>>>
> >> > >>>>>>>>>>
> >> > >>>>>>>>>
> >> > >>>>>>>>>
> >> > >>>>>>>>> --
> >> > >>>>>>>>> *Mike Tutkowski*
> >> > >>>>>>>>> *Senior CloudStack Developer, SolidFire Inc.*
> >> > >>>>>>>>> e: mike.tutkowski@solidfire.com
> >> > >>>>>>>>> o: 303.746.7302
> >> > >>>>>>>>> Advancing the way the world uses the
> >> > >>>>>>>>> cloud<http://solidfire.com/solution/overview/?video=play>
> >> > >>>>>>>>> *(tm)*
> >> > >>>>>>>
> >> > >>>>>
> >> > >>>
> >> > >
> >>
> >
> >
> >
> > --
> > *Mike Tutkowski*
> > *Senior CloudStack Developer, SolidFire Inc.*
> > e: mike.tutkowski@solidfire.com
> > o: 303.746.7302
> > Advancing the way the world uses the
> > cloud<http://solidfire.com/solution/overview/?video=play>
> > *(tm)*

Re: [DISCUSS] Pluggable VM snapshot related operations?

Posted by Mike Tutkowski <mi...@solidfire.com>.
My initial tendency here is to stick with standard hypervisor snapshots for
4.3 for block storage.

I am curious how this work pans out, though, so I plan to keep up to date
with it.

Thanks!


On Thu, Oct 10, 2013 at 9:06 AM, SuichII, Christopher <
Chris.Suich@netapp.com> wrote:

> Hm, that is tricky. I haven't looked into block stuff too much, but maybe
> we can…
>
> -Create another temporary lun &, register it as a SR/DS
> -Move the active VM to that SR
> -Ask the storage driver to snapshot.
> -Move the active VM back to the original SR
> -Delete the temporary lun and SR
> -Delete the snapshot, causing the snapshot and active VM to be combined
> into one VDI again
>
> This way, the only file on the original lun when the driver snapshot
> occurs is the hypervisor snapshot. Maybe this is way too much work and to
> hackish, though.
>
> Alternatively...
> Maybe initially, quiesce on block storage wouldn't be supported? If
> quiesce was requested on block storage, then the driver could say it isn't
> supported and have the default implementation take an actual hypervisor
> snapshot and keep it. If quiesce wasn't supported, then you can just take a
> snapshot of the lun and not worry about consistency.
>
> I know that this is not a new issue, though. Snapshotting with block
> storage is always more difficult than with NFS.
>
> --
> Chris Suich
> chris.suich@netapp.com
> NetApp Software Engineer
> Data Center Platforms – Cloud Solutions
> Citrix, Cisco & Red Hat
>
> On Oct 10, 2013, at 10:37 AM, Mike Tutkowski <mi...@solidfire.com>
> wrote:
>
> > I wonder if this technique is only going to work for NFS?
> >
> > In the block world, the VDI we take a snapshot of on the SR will lead to
> > the creation of another VDI and a block system cannot just snapshot the
> > hypervisor snapshot - it needs to snapshot the entire volume (which is
> > analogous to the SR).
> >
> >
> > On Thu, Oct 10, 2013 at 6:29 AM, SuichII, Christopher <
> > Chris.Suich@netapp.com> wrote:
> >
> >> Multivendor snapshotting:
> >> The case with two storage providers is a bit trickier and is one that we
> >> are still working on. I believe there are a couple options on the table:
> >>
> >> -Give both storage providers the option to take the snapshot and fail if
> >> either one fails or cannot take the snapshot
> >> -Give both storage providers the option to take the snapshot and use the
> >> hypervisor/default if either one fails or cannot take the snapshot
> >> -Fall back to using the hypervisor/default if the VM has volumes on
> >> storage managed by different providers
> >>
> >> The only purpose of the hypervisor snapshot is to give storage
> providers a
> >> consistent volume to take their snapshot against. Once that snapshot is
> >> taken, the hypervisor snapshot is pushed back into the parent or active
> VM
> >> (essentially removing the fact the hypervisor snapshot ever existed).
> >>
> >>
> >> Quiescing:
> >> This is something that has been debated a lot. Ultimately, one reason
> for
> >> having drivers perform the quiescing is because we don't know how every
> >> storage provider will want to work. As far as I've ever known, any
> storage
> >> provider that wants to create the snapshots themselves will want the VM
> to
> >> be quiesced through the hypervisor. However, there may be some storage
> >> provider that has some way of taking snapshots (that we don't know
> about)
> >> that doesn't require the VM to be quiesced. In that case, we wouldn't
> want
> >> them to be forced into having the VM quiesced before they're asked to
> take
> >> the snapshot.
> >>
> >>
> >> Two snapshot methods:
> >> I believe the main reason for this is that storage drivers may want to
> >> take the snapshot differently depending on whether it is a single volume
> >> snapshot or an entire VM snapshot. Again, erring on the side of
> flexibility
> >> so that things don't have to change when a new storage provider comes
> along
> >> with different requirements.
> >>
> >>
> >> --
> >> Chris Suich
> >> chris.suich@netapp.com
> >> NetApp Software Engineer
> >> Data Center Platforms – Cloud Solutions
> >> Citrix, Cisco & Red Hat
> >>
> >> On Oct 10, 2013, at 1:40 AM, Mike Tutkowski <
> mike.tutkowski@solidfire.com>
> >> wrote:
> >>
> >>> "The work flow will be: createVMSnapshot api -> VMSnapshotManagerImpl:
> >>> creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot -> storage
> >>> driver:takeVMSnapshot"
> >>>
> >>> I also think it's a bit weird for the storage driver to have any
> >> knowledge
> >>> of VM snapshots.
> >>>
> >>> I would think another part of the system would quiesce (or not) the VM
> in
> >>> question and then the takeSnapshot method would be called on the
> driver.
> >>>
> >>> I might have missed something...why does the driver "care" if the
> >> snapshot
> >>> to be taken is going to be in a consistent state or not (I understand
> why
> >>> the user care, but not the storage driver)? Why is that not a problem
> for
> >>> some other part of the system that is aware of hypervisor snapshots?
> >>> Shouldn't the driver just take a snapshot (or snapshots) as it is
> >>> instructed to do (regardless of whether or not a VM is quiesced)?
> >>>
> >>> Basically I'm wondering why we need two "take snapshot" methods on the
> >>> driver.
> >>>
> >>>
> >>> On Wed, Oct 9, 2013 at 11:24 PM, Mike Tutkowski <
> >>> mike.tutkowski@solidfire.com> wrote:
> >>>
> >>>> Yeah, I'm not really clear how the snapshot strategy works if you have
> >>>> multiple vendors that implement that interface either.
> >>>>
> >>>>
> >>>> On Wed, Oct 9, 2013 at 10:12 PM, Darren Shepherd <
> >>>> darren.s.shepherd@gmail.com> wrote:
> >>>>
> >>>>> Edison,
> >>>>>
> >>>>> I would lean toward doing the coarse grain interface only.  I'm
> having
> >>>>> a hard time seeing how the whole flow is generic and makes sense for
> >>>>> everyone.  With starting with the coarse grain you have the advantage
> >>>>> in that you avoid possible upfront over engineering/over design that
> >>>>> could wreak havoc down the line.  If you implement the
> >>>>> VMSnapshotStrategy and find that it really is useful to other
> >>>>> implementations, you can then implement the fine grain interface
> later
> >>>>> to allow others to benefit from it.
> >>>>>
> >>>>> Darren
> >>>>>
> >>>>> On Wed, Oct 9, 2013 at 8:54 PM, Mike Tutkowski
> >>>>> <mi...@solidfire.com> wrote:
> >>>>>> Hey guys,
> >>>>>>
> >>>>>> I haven't been giving this thread much attention, but am reviewing
> it
> >>>>>> somewhat now.
> >>>>>>
> >>>>>> I'm not really clear how this would work if, say, a VM has two data
> >>>>> disks
> >>>>>> and they are not being provided by the same vendor.
> >>>>>>
> >>>>>> Can someone clarify that for me?
> >>>>>>
> >>>>>> My understanding for how this works today is that it doesn't matter.
> >> For
> >>>>>> XenServer, a VDI is on an SR, which could be supported by storage
> >>>>> vendor X.
> >>>>>> Another VDI could be on another SR, supported by storage vendor Y.
> >>>>>>
> >>>>>> In this case, a new VDI appears on each SR after a hypervisor
> >> snapshot.
> >>>>>>
> >>>>>> Same idea for VMware.
> >>>>>>
> >>>>>> I don't really know how (or if) this works for KVM.
> >>>>>>
> >>>>>> I'm not clear how this multi-vendor situation would play out in this
> >>>>>> pluggable approach.
> >>>>>>
> >>>>>> Thanks!
> >>>>>>
> >>>>>>
> >>>>>> On Tue, Oct 8, 2013 at 4:43 PM, Edison Su <Ed...@citrix.com>
> >> wrote:
> >>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: Darren Shepherd [mailto:darren.s.shepherd@gmail.com]
> >>>>>>>> Sent: Tuesday, October 08, 2013 2:54 PM
> >>>>>>>> To: dev@cloudstack.apache.org
> >>>>>>>> Subject: Re: [DISCUSS] Pluggable VM snapshot related operations?
> >>>>>>>>
> >>>>>>>> A hypervisor snapshot will snapshot memory also.  So determining
> >>>>> whether
> >>>>>>> The memory is optional for hypervisor vm snapshot, a.k.a, the
> >>>>> "Disk-only
> >>>>>>> snapshots":
> >>>>>>>
> >>>>>
> >>
> http://support.citrix.com/proddocs/topic/xencenter-61/xs-xc-vms-snapshots-about.html
> >>>>>>> It's supported by both xenserver/kvm/vmware.
> >>>>>>>
> >>>>>>>> do to the hypervisor snapshot from the quiesce option does not
> seem
> >>>>>>>> proper.
> >>>>>>>>
> >>>>>>>> Sorry, for all the questions, I'm trying to get to the point of
> >>>>>>> understand if this
> >>>>>>>> functionality makes sense at this point of code or if maybe their
> is
> >>>>> a
> >>>>>>> different
> >>>>>>>> approach.  This is what I'm seeing, what if we state it this way
> >>>>>>>>
> >>>>>>>> 1) VM snapshot, AFAIK, are not backed up today and exist solely on
> >>>>>>> primary.
> >>>>>>>> What if we added a backup phase to VM snapshots that can be
> >>>>> optionally
> >>>>>>>> supported by the storage providers to possibly backup the VM
> >> snapshot
> >>>>>>>> volumes.
> >>>>>>> It's not about backup vm snapshot, it's about how to take vm
> >> snapshot.
> >>>>>>> Usually, take/revert vm snapshot is handled by hypervisor itself,
> but
> >>>>> in
> >>>>>>> NetApp(or other storage vendor) case,
> >>>>>>> They want to change the default behavior of hypervisor-base vm
> >>>>> snapshot.
> >>>>>>>
> >>>>>>> Some examples:
> >>>>>>> 1. take hypervisor based vm snapshots, on primary storage,
> hypervisor
> >>>>> will
> >>>>>>> maintain the snapshot chain.
> >>>>>>> 2. take vm snapshot through NetApp:
> >>>>>>>    a. first, quiesce VM if user specified. There is no separate API
> >>>>> to
> >>>>>>> quiesce VM on the hypervisor, so here we will
> >>>>>>> take a VM snapshot through hypervisor API call, hypervisor will
> take
> >>>>>>> volume snapshot  on each volume of the VM. Let's say, on the
> primary
> >>>>>>> storage, the disk chain looks like:
> >>>>>>>          base-image
> >>>>>>>                   |
> >>>>>>>                   V
> >>>>>>>               Parent disk
> >>>>>>>           /                         \
> >>>>>>>         V                            V
> >>>>>>>       Current disk        snapshot-a
> >>>>>>>    b. from snapshot-a, find out its parent disk, then take snapshot
> >>>>>>> through NetApp
> >>>>>>>    c. un- quiesce VM, here, go to hypervisor, delete snapshot
> >>>>>>> "snapshot-a", hypervisor should be able to consolidate current disk
> >> and
> >>>>>>> "parent disk" into one disk, thus from hypervisor point of view
> >>>>>>> , there is always, at most, only one snapshot for the VM.
> >>>>>>>   For revert VM snapshot, as long as the VM is stopped, NetApp can
> >>>>>>> revert the snapshot created on NetApp storage easily, and
> >> efficiently.
> >>>>>>>  The benefit of this whole process, as Chris pointed out, if the
> >>>>>>> snapshot chain is quite long, hypervisor based VM snapshot will get
> >>>>>>> performance hit.
> >>>>>>>
> >>>>>>>>
> >>>>>>>> 2) Additionally you want to be able to backup multiple disks at
> >> once,
> >>>>>>>> regardless of VM snapshot.  Why don't we add the ability to put
> >>>>>>> volumeIds in
> >>>>>>>> snapshot cmd that if the storage provider supports it will get a
> >>>>> batch of
> >>>>>>>> volumeIds.
> >>>>>>>>
> >>>>>>>> Now I know we talked about 2 and there was some concerns about it
> >>>>> (mostly
> >>>>>>>> from me), but I think we could work through those concerns (forgot
> >>>>> what
> >>>>>>>> they were...).  Right now I just get the feeling we are
> shoehorning
> >>>>> some
> >>>>>>>> functionality into VM snapshot that isn't quite the right fit.
>  The
> >>>>> "no
> >>>>>>> quiesce"
> >>>>>>>> flow just doesn't seem to make sense to me.
> >>>>>>>
> >>>>>>>
> >>>>>>> Not sure above NetApp proposed work flow makes sense to you or to
> >> other
> >>>>>>> body or not. If this work flow is only specific to NetApp, then we
> >>>>> don't
> >>>>>>> need to enforce the whole process for everybody.
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Darren
> >>>>>>>>
> >>>>>>>> On Tue, Oct 8, 2013 at 2:05 PM, SuichII, Christopher
> >>>>>>>> <Ch...@netapp.com> wrote:
> >>>>>>>>> Whether the hypervisor snapshot happens depends on whether the
> >>>>>>>> 'quiesce' option is specified with the snapshot request. If a user
> >>>>>>> doesn't care
> >>>>>>>> about the consistency of their backup, then the hypervisor
> >>>>>>> snapshot/quiesce
> >>>>>>>> step can be skipped altogether. This of course is not the case if
> >> the
> >>>>>>> default
> >>>>>>>> provider is being used, in which case a hypervisor snapshot is the
> >>>>> only
> >>>>>>> way of
> >>>>>>>> creating a backup since it can't be offloaded to the storage
> driver.
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> Chris Suich
> >>>>>>>>> chris.suich@netapp.com
> >>>>>>>>> NetApp Software Engineer
> >>>>>>>>> Data Center Platforms - Cloud Solutions Citrix, Cisco & Red Hat
> >>>>>>>>>
> >>>>>>>>> On Oct 8, 2013, at 4:57 PM, Darren Shepherd
> >>>>>>>>> <da...@gmail.com>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Who is going to decide whether the hypervisor snapshot should
> >>>>>>>>>> actually happen or not? Or how?
> >>>>>>>>>>
> >>>>>>>>>> Darren
> >>>>>>>>>>
> >>>>>>>>>> On Tue, Oct 8, 2013 at 12:38 PM, SuichII, Christopher
> >>>>>>>>>> <Ch...@netapp.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>>> Chris Suich
> >>>>>>>>>>> chris.suich@netapp.com
> >>>>>>>>>>> NetApp Software Engineer
> >>>>>>>>>>> Data Center Platforms - Cloud Solutions Citrix, Cisco & Red Hat
> >>>>>>>>>>>
> >>>>>>>>>>> On Oct 8, 2013, at 2:24 PM, Darren Shepherd
> >>>>>>>> <da...@gmail.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> So in the implementation, when we say "quiesce" is that
> actually
> >>>>>>>>>>>> being implemented as a VM snapshot (memory and disk).  And
> then
> >>>>>>>>>>>> when you say "unquiesce" you are talking about deleting the VM
> >>>>>>>> snapshot?
> >>>>>>>>>>>
> >>>>>>>>>>> If the VM snapshot is not going to the hypervisor, then yes, it
> >>>>> will
> >>>>>>>> actually be a hypervisor snapshot. Just to be clear, the unquiesce
> >> is
> >>>>>>> not quite
> >>>>>>>> a delete - it is a collapse of the VM snapshot and the active VM
> >> back
> >>>>>>> into one
> >>>>>>>> file.
> >>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> In NetApp, what are you snapshotting?  The whole netapp volume
> >>>>> (I
> >>>>>>>>>>>> don't know the correct term), a file on NFS, an iscsi volume?
>  I
> >>>>>>>>>>>> don't know a whole heck of a lot about the netapp snapshot
> >>>>>>>> capabilities.
> >>>>>>>>>>>
> >>>>>>>>>>> Essentially we are using internal APIs to create file level
> >>>>> backups
> >>>>>>> - don't
> >>>>>>>> worry too much about the terminology.
> >>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> I know storage solutions can snapshot better and faster than
> >>>>>>>>>>>> hypervisors can with COW files.  I've personally just been
> >>>>> always
> >>>>>>>>>>>> perplexed on whats the best way to implement it.  For storage
> >>>>>>>>>>>> solutions that are block based, its really easy to have the
> >>>>> storage
> >>>>>>>>>>>> doing the snapshot.  For shared file systems, like NFS, its
> >>>>> seems
> >>>>>>>>>>>> way more complicated as you don't want to snapshot the entire
> >>>>>>>>>>>> filesystem in order to snapshot one file.
> >>>>>>>>>>>
> >>>>>>>>>>> With filesystems like NFS, things are certainly more
> complicated,
> >>>>>>> but that
> >>>>>>>> is taken care of by our controller's operating system, Data ONTAP,
> >>>>> and we
> >>>>>>>> simply use APIs to communicate with it.
> >>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Darren
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Tue, Oct 8, 2013 at 11:10 AM, SuichII, Christopher
> >>>>>>>>>>>> <Ch...@netapp.com> wrote:
> >>>>>>>>>>>>> I can comment on the second half.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Through storage operations, storage providers can create
> >>>>> backups
> >>>>>>>> much faster than hypervisors and over time, their snapshots are
> more
> >>>>>>>> efficient than the snapshot chains that hypervisors create. It is
> >>>>> true
> >>>>>>> that a VM
> >>>>>>>> snapshot taken at the storage level is slightly different as it
> >>>>> would be
> >>>>>>> psuedo-
> >>>>>>>> quiesced, not have it's memory snapshotted. This is accomplished
> >>>>> through
> >>>>>>>> hypervisor snapshots:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 1) VM snapshot request (lets say VM 'A'
> >>>>>>>>>>>>> 2) Create hypervisor snapshot (optional) -VM 'A' is
> >>>>> snapshotted,
> >>>>>>>>>>>>> creating active VM 'A*'
> >>>>>>>>>>>>> -All disk traffic now goes to VM 'A*' and A is a snapshot of
> >>>>> 'A*'
> >>>>>>>>>>>>> 3) Storage driver(s) take snapshots of each volume
> >>>>>>>>>>>>> 4) Undo hypervisor snapshot (optional) -VM snapshot 'A' is
> >>>>> rolled
> >>>>>>>>>>>>> back into VM 'A*' so the hypervisor snapshot no longer exists
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Now, a couple notes:
> >>>>>>>>>>>>> -The reason this is optional is that not all users
> necessarily
> >>>>>>> care about
> >>>>>>>> the memory or disk consistency of their VMs and would prefer
> faster
> >>>>>>>> snapshots to consistency.
> >>>>>>>>>>>>> -Preemptively, yes, we are actually taking hypervisor
> snapshots
> >>>>>>> which
> >>>>>>>> means there isn't actually a performance of taking storage
> snapshots
> >>>>> when
> >>>>>>>> quiescing the VM. However, the performance gain will come both
> >> during
> >>>>>>>> restoring the VM and during normal operations as described above.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Although you can think of it as a poor man's VM snapshot, I
> >>>>> would
> >>>>>>>> think of it more as a consistent multi-volume snapshot. Again, the
> >>>>>>> difference
> >>>>>>>> being that this snapshot was not truly quiesced like a hypervisor
> >>>>>>> snapshot
> >>>>>>>> would be.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> --
> >>>>>>>>>>>>> Chris Suich
> >>>>>>>>>>>>> chris.suich@netapp.com
> >>>>>>>>>>>>> NetApp Software Engineer
> >>>>>>>>>>>>> Data Center Platforms - Cloud Solutions Citrix, Cisco & Red
> Hat
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Oct 8, 2013, at 1:47 PM, Darren Shepherd
> >>>>>>>> <da...@gmail.com> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> My only comment is that having the return type as boolean
> and
> >>>>>>>>>>>>>> using to that indicate quiesce behaviour seems obscure and
> >>>>> will
> >>>>>>>>>>>>>> probably lead to a problem later.  Your basically saying the
> >>>>>>>>>>>>>> result of the takeVMSnapshot will only ever need to
> >>>>> communicate
> >>>>>>>>>>>>>> back whether unquiesce needs to happen.  Maybe some result
> >>>>>>>> object
> >>>>>>>>>>>>>> would be more extensible.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Actually, I think I have more comments.  This seems a bit
> odd
> >>>>> to
> >>>>>>> me.
> >>>>>>>>>>>>>> Why would a storage driver in ACS implement a VM snapshot
> >>>>>>>>>>>>>> functionality?  VM snapshot is a really a hypervisor
> >>>>> orchestrated
> >>>>>>>>>>>>>> operation.  So it seems like were trying to implement a poor
> >>>>> mans
> >>>>>>>>>>>>>> VM snapshot.  Maybe if I understood what NetApp was trying
> to
> >>>>> do
> >>>>>>>>>>>>>> it would make more sense, but its all odd.  To do a proper
> VM
> >>>>>>>>>>>>>> snapshot you need to snapshot memory and disk at the exact
> >>>>> same
> >>>>>>>>>>>>>> time.  How are we going to do that if ACS is orchestrating
> >>>>> the VM
> >>>>>>>>>>>>>> snapshot and delegating to storage providers.  Its not like
> >>>>> you
> >>>>>>>>>>>>>> are going to pause the VM.... or are you?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Darren
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Mon, Oct 7, 2013 at 11:59 AM, Edison Su <
> >>>>> Edison.su@citrix.com>
> >>>>>>>> wrote:
> >>>>>>>>>>>>>>> I created a design document page at
> >>>>>>>>
> >>>>>
> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Pluggable+VM+s
> >>>>>>>> napshot+related+operations, feel free to add items on it.
> >>>>>>>>>>>>>>> And a new branch "pluggable_vm_snapshot" is created.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> -----Original Message-----
> >>>>>>>>>>>>>>>> From: SuichII, Christopher [mailto:Chris.Suich@netapp.com
> ]
> >>>>>>>>>>>>>>>> Sent: Monday, October 07, 2013 10:02 AM
> >>>>>>>>>>>>>>>> To: <de...@cloudstack.apache.org>
> >>>>>>>>>>>>>>>> Subject: Re: [DISCUSS] Pluggable VM snapshot related
> >>>>> operations?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I'm a fan of option 2 - this gives us the most flexibility
> >>>>> (as
> >>>>>>>>>>>>>>>> you stated). The option is given to completely override
> the
> >>>>> way
> >>>>>>>>>>>>>>>> VM snapshots work AND storage providers are given to
> >>>>>>>>>>>>>>>> opportunity to work within the default VM snapshot
> workflow.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I believe this option should satisfy your concern, Mike.
> The
> >>>>>>>>>>>>>>>> snapshot and quiesce strategy would be in charge of
> >>>>>>>> communicating with the hypervisor.
> >>>>>>>>>>>>>>>> Storage providers should be able to leverage the default
> >>>>>>>>>>>>>>>> strategies and simply perform the storage operations.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I don't think it should be much of an issue that new
> method
> >>>>> to
> >>>>>>>>>>>>>>>> the storage driver interface may not apply to everyone. In
> >>>>> fact,
> >>>>>>>> that is already the case.
> >>>>>>>>>>>>>>>> Some methods such as un/maintain(), attachToXXX() and
> >>>>>>>>>>>>>>>> takeSnapshot() are already not implemented by every
> driver -
> >>>>>>>>>>>>>>>> they just return false when asked if they can handle the
> >>>>>>> operation.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>> Chris Suich
> >>>>>>>>>>>>>>>> chris.suich@netapp.com
> >>>>>>>>>>>>>>>> NetApp Software Engineer
> >>>>>>>>>>>>>>>> Data Center Platforms - Cloud Solutions Citrix, Cisco &
> Red
> >>>>> Hat
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Oct 5, 2013, at 12:11 AM, Mike Tutkowski
> >>>>>>>>>>>>>>>> <mi...@solidfire.com>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Well, my first thought on this is that the storage driver
> >>>>>>>>>>>>>>>>> should not be telling the hypervisor to do anything. It
> >>>>> should
> >>>>>>>>>>>>>>>>> be responsible for creating/deleting volumes, snapshots,
> >>>>> etc.
> >>>>>>> on
> >>>>>>>> its storage system only.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Fri, Oct 4, 2013 at 5:57 PM, Edison Su <
> >>>>>>> Edison.su@citrix.com>
> >>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> In 4.2, we added VM snapshot for Vmware/Xenserver. The
> >>>>>>>>>>>>>>>>>> current workflow will be like the following:
> >>>>>>>>>>>>>>>>>> createVMSnapshot api -> VMSnapshotManagerImpl:
> >>>>>>>>>>>>>>>>>> creatVMSnapshot -> send CreateVMSnapshotCommand to
> >>>>>>>> hypervisor to create vm snapshot.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> If anybody wants to change the workflow, then need to
> >>>>> either
> >>>>>>>>>>>>>>>>>> change VMSnapshotManagerImpl directly or subclass
> >>>>>>>> VMSnapshotManagerImpl.
> >>>>>>>>>>>>>>>>>> Both are not the ideal choice, as VMSnapshotManagerImpl
> >>>>>>>>>>>>>>>>>> should be able to handle different ways to take vm
> >>>>> snapshot,
> >>>>>>>> instead of hard code.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> The requirements for the pluggable VM snapshot coming
> >>>>> from:
> >>>>>>>>>>>>>>>>>> Storage vendor may have their optimization, such as
> >>>>> NetApp.
> >>>>>>>>>>>>>>>>>> VM snapshot can be implemented in a totally different
> >>>>> way(For
> >>>>>>>>>>>>>>>>>> example, I could just send a command to guest VM, to
> tell
> >>>>> my
> >>>>>>>>>>>>>>>>>> application to flush disk and hold disk write, then come
> >>>>> to
> >>>>>>>>>>>>>>>>>> hypervisor to
> >>>>>>>>>>>>>>>> take a volume snapshot).
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> If we agree on enable pluggable VM snapshot, then we can
> >>>>>>>> move
> >>>>>>>>>>>>>>>>>> on discuss how to implement it.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> The possible options:
> >>>>>>>>>>>>>>>>>> 1. coarse grained interface. Add a VMSnapshotStrategy
> >>>>>>>>>>>>>>>>>> interface, which has the following interfaces:
> >>>>>>>>>>>>>>>>>> VMSnapshot takeVMSnapshot(VMSnapshot vmSnapshot);
> >>>>>>>> Boolean
> >>>>>>>>>>>>>>>>>> revertVMSnapshot(VMSnapshot vmSnapshot); Boolean
> >>>>>>>>>>>>>>>>>> DeleteVMSnapshot(VMSnapshot vmSnapshot);
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> The work flow will be: createVMSnapshot api ->
> >>>>>>>>>>>>>>>> VMSnapshotManagerImpl:
> >>>>>>>>>>>>>>>>>> creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot
> >>>>>>>>>>>>>>>>>> VMSnapshotManagerImpl will manage VM state, do the
> sanity
> >>>>>>>>>>>>>>>>>> check, then will handle over to VMSnapshotStrategy.
> >>>>>>>>>>>>>>>>>> In VMSnapshotStrategy implementation, it may just send a
> >>>>>>>>>>>>>>>>>> Create/revert/delete VMSnapshotCommand to hypervisor
> >>>>>>>> host, or
> >>>>>>>>>>>>>>>>>> do anything special operations.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> 2. fine-grained interface. Not only add a
> >>>>> VMSnapshotStrategy
> >>>>>>>>>>>>>>>>>> interface, but also add certain methods on the storage
> >>>>> driver.
> >>>>>>>>>>>>>>>>>> The VMSnapshotStrategy interface will be the same as
> >>>>> option 1.
> >>>>>>>>>>>>>>>>>> Will add the following methods on storage driver:
> >>>>>>>>>>>>>>>>>> /* volumesBelongToVM  is the list of volumes of the VM
> >>>>> that
> >>>>>>>>>>>>>>>>>> created on this storage, storage vendor can either take
> >>>>> one
> >>>>>>>>>>>>>>>>>> snapshot for this volumes in one shot, or take snapshot
> >>>>> for
> >>>>>>>> each volume separately
> >>>>>>>>>>>>>>>>>>  The pre-condition: vm is unquiesced.
> >>>>>>>>>>>>>>>>>>  It will return a Boolean to indicate, do need
> >>>>> unquiesce vm
> >>>>>>> or
> >>>>>>>> not.
> >>>>>>>>>>>>>>>>>>  In the default storage driver, it will return false.
> >>>>>>>>>>>>>>>>>> */
> >>>>>>>>>>>>>>>>>> boolean takeVMSnapshot(List<VolumeInfo>
> >>>>>>>> volumesBelongToVM,
> >>>>>>>>>>>>>>>>>> VMSnapshot vmSnapshot); Boolean
> >>>>>>>>>>>>>>>>>> revertVMSnapshot(List<VolumeInfo> volumesBelongToVM,
> >>>>>>>>>>>>>>>>>> VMSnapshot vmSnapshot); Boolean
> >>>>>>>>>>>>>>>>>> deleteVMSnapshot(List<VolumeInfo> volumesBelongToVM,
> >>>>>>>>>>>>>>>>>> VMSnapshot vmSNapshot);
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> The work flow will be: createVMSnapshot api ->
> >>>>>>>>>>>>>>>> VMSnapshotManagerImpl:
> >>>>>>>>>>>>>>>>>> creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot ->
> >>>>>>>>>>>>>>>>>> storage driver:takeVMSnapshot In the implementation of
> >>>>>>>>>>>>>>>>>> VMSnapshotStrategy's takeVMSnapshot, the pseudo code
> >>>>>>>> looks like:
> >>>>>>>>>>>>>>>>>>  HypervisorHelper.quiesceVM(vm);
> >>>>>>>>>>>>>>>>>>  val volumes = vm.getVolumes();
> >>>>>>>>>>>>>>>>>>  val maps = new Map[driver, list[VolumeInfo]]();
> >>>>>>>>>>>>>>>>>>  Volumes.foreach(volume => maps.put(volume.getDriver,
> >>>>>>>> volume ::
> >>>>>>>>>>>>>>>>>> maps.get(volume.getdriver())))
> >>>>>>>>>>>>>>>>>>  val needUnquiesce = true;
> >>>>>>>>>>>>>>>>>>   maps.foreach((driver, volumes) => needUnquiesce  =
> >>>>>>>>>>>>>>>>>> needUnquiesce && driver.takeVMSnapshot(volumes))
> >>>>>>>>>>>>>>>>>> if (needUnquiesce ) {
> >>>>>>>>>>>>>>>>>>  HypervisorHelper.unquiesce(vm); }
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> By default, the quiesceVM in HypervisorHelper will
> >>>>> actually
> >>>>>>>>>>>>>>>>>> take vm snapshot through hypervisor.
> >>>>>>>>>>>>>>>>>> Does above logic makes senesce?
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> The pros of option 1 is that: it's simple, no need to
> >>>>> change
> >>>>>>>>>>>>>>>>>> storage driver interfaces. The cons is that each storage
> >>>>>>>>>>>>>>>>>> vendor need to implement a strategy, maybe they will do
> >>>>> the
> >>>>>>>> same thing.
> >>>>>>>>>>>>>>>>>> The pros of option 2 is that, storage driver won't need
> to
> >>>>>>>>>>>>>>>>>> worry about how to quiesce/unquiesce vm. The cons is
> >>>>> that, it
> >>>>>>>>>>>>>>>>>> will add these methods on each storage drivers, so it
> >>>>> assumes
> >>>>>>>>>>>>>>>>>> that this work flow will work for everybody.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> So which option we should take? Or if you have other
> >>>>> options,
> >>>>>>>>>>>>>>>>>> please let's know.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>> *Mike Tutkowski*
> >>>>>>>>>>>>>>>>> *Senior CloudStack Developer, SolidFire Inc.*
> >>>>>>>>>>>>>>>>> e: mike.tutkowski@solidfire.com
> >>>>>>>>>>>>>>>>> o: 303.746.7302
> >>>>>>>>>>>>>>>>> Advancing the way the world uses the
> >>>>>>>>>>>>>>>>> cloud<http://solidfire.com/solution/overview/?video=play
> >
> >>>>>>>>>>>>>>>>> *(tm)*
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> *Mike Tutkowski*
> >>>>>> *Senior CloudStack Developer, SolidFire Inc.*
> >>>>>> e: mike.tutkowski@solidfire.com
> >>>>>> o: 303.746.7302
> >>>>>> Advancing the way the world uses the
> >>>>>> cloud<http://solidfire.com/solution/overview/?video=play>
> >>>>>> *™*
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> *Mike Tutkowski*
> >>>> *Senior CloudStack Developer, SolidFire Inc.*
> >>>> e: mike.tutkowski@solidfire.com
> >>>> o: 303.746.7302
> >>>> Advancing the way the world uses the cloud<
> >> http://solidfire.com/solution/overview/?video=play>
> >>>> *™*
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> *Mike Tutkowski*
> >>> *Senior CloudStack Developer, SolidFire Inc.*
> >>> e: mike.tutkowski@solidfire.com
> >>> o: 303.746.7302
> >>> Advancing the way the world uses the
> >>> cloud<http://solidfire.com/solution/overview/?video=play>
> >>> *™*
> >>
> >>
> >
> >
> > --
> > *Mike Tutkowski*
> > *Senior CloudStack Developer, SolidFire Inc.*
> > e: mike.tutkowski@solidfire.com
> > o: 303.746.7302
> > Advancing the way the world uses the
> > cloud<http://solidfire.com/solution/overview/?video=play>
> > *™*
>
>


-- 
*Mike Tutkowski*
*Senior CloudStack Developer, SolidFire Inc.*
e: mike.tutkowski@solidfire.com
o: 303.746.7302
Advancing the way the world uses the
cloud<http://solidfire.com/solution/overview/?video=play>
*™*

Re: [DISCUSS] Pluggable VM snapshot related operations?

Posted by "SuichII, Christopher" <Ch...@netapp.com>.
Hm, that is tricky. I haven't looked into block stuff too much, but maybe we can…

-Create another temporary lun &, register it as a SR/DS
-Move the active VM to that SR 
-Ask the storage driver to snapshot.
-Move the active VM back to the original SR
-Delete the temporary lun and SR
-Delete the snapshot, causing the snapshot and active VM to be combined into one VDI again

This way, the only file on the original lun when the driver snapshot occurs is the hypervisor snapshot. Maybe this is way too much work and to hackish, though.

Alternatively...
Maybe initially, quiesce on block storage wouldn't be supported? If quiesce was requested on block storage, then the driver could say it isn't supported and have the default implementation take an actual hypervisor snapshot and keep it. If quiesce wasn't supported, then you can just take a snapshot of the lun and not worry about consistency.

I know that this is not a new issue, though. Snapshotting with block storage is always more difficult than with NFS.

-- 
Chris Suich
chris.suich@netapp.com
NetApp Software Engineer
Data Center Platforms – Cloud Solutions
Citrix, Cisco & Red Hat

On Oct 10, 2013, at 10:37 AM, Mike Tutkowski <mi...@solidfire.com> wrote:

> I wonder if this technique is only going to work for NFS?
> 
> In the block world, the VDI we take a snapshot of on the SR will lead to
> the creation of another VDI and a block system cannot just snapshot the
> hypervisor snapshot - it needs to snapshot the entire volume (which is
> analogous to the SR).
> 
> 
> On Thu, Oct 10, 2013 at 6:29 AM, SuichII, Christopher <
> Chris.Suich@netapp.com> wrote:
> 
>> Multivendor snapshotting:
>> The case with two storage providers is a bit trickier and is one that we
>> are still working on. I believe there are a couple options on the table:
>> 
>> -Give both storage providers the option to take the snapshot and fail if
>> either one fails or cannot take the snapshot
>> -Give both storage providers the option to take the snapshot and use the
>> hypervisor/default if either one fails or cannot take the snapshot
>> -Fall back to using the hypervisor/default if the VM has volumes on
>> storage managed by different providers
>> 
>> The only purpose of the hypervisor snapshot is to give storage providers a
>> consistent volume to take their snapshot against. Once that snapshot is
>> taken, the hypervisor snapshot is pushed back into the parent or active VM
>> (essentially removing the fact the hypervisor snapshot ever existed).
>> 
>> 
>> Quiescing:
>> This is something that has been debated a lot. Ultimately, one reason for
>> having drivers perform the quiescing is because we don't know how every
>> storage provider will want to work. As far as I've ever known, any storage
>> provider that wants to create the snapshots themselves will want the VM to
>> be quiesced through the hypervisor. However, there may be some storage
>> provider that has some way of taking snapshots (that we don't know about)
>> that doesn't require the VM to be quiesced. In that case, we wouldn't want
>> them to be forced into having the VM quiesced before they're asked to take
>> the snapshot.
>> 
>> 
>> Two snapshot methods:
>> I believe the main reason for this is that storage drivers may want to
>> take the snapshot differently depending on whether it is a single volume
>> snapshot or an entire VM snapshot. Again, erring on the side of flexibility
>> so that things don't have to change when a new storage provider comes along
>> with different requirements.
>> 
>> 
>> --
>> Chris Suich
>> chris.suich@netapp.com
>> NetApp Software Engineer
>> Data Center Platforms – Cloud Solutions
>> Citrix, Cisco & Red Hat
>> 
>> On Oct 10, 2013, at 1:40 AM, Mike Tutkowski <mi...@solidfire.com>
>> wrote:
>> 
>>> "The work flow will be: createVMSnapshot api -> VMSnapshotManagerImpl:
>>> creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot -> storage
>>> driver:takeVMSnapshot"
>>> 
>>> I also think it's a bit weird for the storage driver to have any
>> knowledge
>>> of VM snapshots.
>>> 
>>> I would think another part of the system would quiesce (or not) the VM in
>>> question and then the takeSnapshot method would be called on the driver.
>>> 
>>> I might have missed something...why does the driver "care" if the
>> snapshot
>>> to be taken is going to be in a consistent state or not (I understand why
>>> the user care, but not the storage driver)? Why is that not a problem for
>>> some other part of the system that is aware of hypervisor snapshots?
>>> Shouldn't the driver just take a snapshot (or snapshots) as it is
>>> instructed to do (regardless of whether or not a VM is quiesced)?
>>> 
>>> Basically I'm wondering why we need two "take snapshot" methods on the
>>> driver.
>>> 
>>> 
>>> On Wed, Oct 9, 2013 at 11:24 PM, Mike Tutkowski <
>>> mike.tutkowski@solidfire.com> wrote:
>>> 
>>>> Yeah, I'm not really clear how the snapshot strategy works if you have
>>>> multiple vendors that implement that interface either.
>>>> 
>>>> 
>>>> On Wed, Oct 9, 2013 at 10:12 PM, Darren Shepherd <
>>>> darren.s.shepherd@gmail.com> wrote:
>>>> 
>>>>> Edison,
>>>>> 
>>>>> I would lean toward doing the coarse grain interface only.  I'm having
>>>>> a hard time seeing how the whole flow is generic and makes sense for
>>>>> everyone.  With starting with the coarse grain you have the advantage
>>>>> in that you avoid possible upfront over engineering/over design that
>>>>> could wreak havoc down the line.  If you implement the
>>>>> VMSnapshotStrategy and find that it really is useful to other
>>>>> implementations, you can then implement the fine grain interface later
>>>>> to allow others to benefit from it.
>>>>> 
>>>>> Darren
>>>>> 
>>>>> On Wed, Oct 9, 2013 at 8:54 PM, Mike Tutkowski
>>>>> <mi...@solidfire.com> wrote:
>>>>>> Hey guys,
>>>>>> 
>>>>>> I haven't been giving this thread much attention, but am reviewing it
>>>>>> somewhat now.
>>>>>> 
>>>>>> I'm not really clear how this would work if, say, a VM has two data
>>>>> disks
>>>>>> and they are not being provided by the same vendor.
>>>>>> 
>>>>>> Can someone clarify that for me?
>>>>>> 
>>>>>> My understanding for how this works today is that it doesn't matter.
>> For
>>>>>> XenServer, a VDI is on an SR, which could be supported by storage
>>>>> vendor X.
>>>>>> Another VDI could be on another SR, supported by storage vendor Y.
>>>>>> 
>>>>>> In this case, a new VDI appears on each SR after a hypervisor
>> snapshot.
>>>>>> 
>>>>>> Same idea for VMware.
>>>>>> 
>>>>>> I don't really know how (or if) this works for KVM.
>>>>>> 
>>>>>> I'm not clear how this multi-vendor situation would play out in this
>>>>>> pluggable approach.
>>>>>> 
>>>>>> Thanks!
>>>>>> 
>>>>>> 
>>>>>> On Tue, Oct 8, 2013 at 4:43 PM, Edison Su <Ed...@citrix.com>
>> wrote:
>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> -----Original Message-----
>>>>>>>> From: Darren Shepherd [mailto:darren.s.shepherd@gmail.com]
>>>>>>>> Sent: Tuesday, October 08, 2013 2:54 PM
>>>>>>>> To: dev@cloudstack.apache.org
>>>>>>>> Subject: Re: [DISCUSS] Pluggable VM snapshot related operations?
>>>>>>>> 
>>>>>>>> A hypervisor snapshot will snapshot memory also.  So determining
>>>>> whether
>>>>>>> The memory is optional for hypervisor vm snapshot, a.k.a, the
>>>>> "Disk-only
>>>>>>> snapshots":
>>>>>>> 
>>>>> 
>> http://support.citrix.com/proddocs/topic/xencenter-61/xs-xc-vms-snapshots-about.html
>>>>>>> It's supported by both xenserver/kvm/vmware.
>>>>>>> 
>>>>>>>> do to the hypervisor snapshot from the quiesce option does not seem
>>>>>>>> proper.
>>>>>>>> 
>>>>>>>> Sorry, for all the questions, I'm trying to get to the point of
>>>>>>> understand if this
>>>>>>>> functionality makes sense at this point of code or if maybe their is
>>>>> a
>>>>>>> different
>>>>>>>> approach.  This is what I'm seeing, what if we state it this way
>>>>>>>> 
>>>>>>>> 1) VM snapshot, AFAIK, are not backed up today and exist solely on
>>>>>>> primary.
>>>>>>>> What if we added a backup phase to VM snapshots that can be
>>>>> optionally
>>>>>>>> supported by the storage providers to possibly backup the VM
>> snapshot
>>>>>>>> volumes.
>>>>>>> It's not about backup vm snapshot, it's about how to take vm
>> snapshot.
>>>>>>> Usually, take/revert vm snapshot is handled by hypervisor itself, but
>>>>> in
>>>>>>> NetApp(or other storage vendor) case,
>>>>>>> They want to change the default behavior of hypervisor-base vm
>>>>> snapshot.
>>>>>>> 
>>>>>>> Some examples:
>>>>>>> 1. take hypervisor based vm snapshots, on primary storage, hypervisor
>>>>> will
>>>>>>> maintain the snapshot chain.
>>>>>>> 2. take vm snapshot through NetApp:
>>>>>>>    a. first, quiesce VM if user specified. There is no separate API
>>>>> to
>>>>>>> quiesce VM on the hypervisor, so here we will
>>>>>>> take a VM snapshot through hypervisor API call, hypervisor will take
>>>>>>> volume snapshot  on each volume of the VM. Let's say, on the primary
>>>>>>> storage, the disk chain looks like:
>>>>>>>          base-image
>>>>>>>                   |
>>>>>>>                   V
>>>>>>>               Parent disk
>>>>>>>           /                         \
>>>>>>>         V                            V
>>>>>>>       Current disk        snapshot-a
>>>>>>>    b. from snapshot-a, find out its parent disk, then take snapshot
>>>>>>> through NetApp
>>>>>>>    c. un- quiesce VM, here, go to hypervisor, delete snapshot
>>>>>>> "snapshot-a", hypervisor should be able to consolidate current disk
>> and
>>>>>>> "parent disk" into one disk, thus from hypervisor point of view
>>>>>>> , there is always, at most, only one snapshot for the VM.
>>>>>>>   For revert VM snapshot, as long as the VM is stopped, NetApp can
>>>>>>> revert the snapshot created on NetApp storage easily, and
>> efficiently.
>>>>>>>  The benefit of this whole process, as Chris pointed out, if the
>>>>>>> snapshot chain is quite long, hypervisor based VM snapshot will get
>>>>>>> performance hit.
>>>>>>> 
>>>>>>>> 
>>>>>>>> 2) Additionally you want to be able to backup multiple disks at
>> once,
>>>>>>>> regardless of VM snapshot.  Why don't we add the ability to put
>>>>>>> volumeIds in
>>>>>>>> snapshot cmd that if the storage provider supports it will get a
>>>>> batch of
>>>>>>>> volumeIds.
>>>>>>>> 
>>>>>>>> Now I know we talked about 2 and there was some concerns about it
>>>>> (mostly
>>>>>>>> from me), but I think we could work through those concerns (forgot
>>>>> what
>>>>>>>> they were...).  Right now I just get the feeling we are shoehorning
>>>>> some
>>>>>>>> functionality into VM snapshot that isn't quite the right fit.  The
>>>>> "no
>>>>>>> quiesce"
>>>>>>>> flow just doesn't seem to make sense to me.
>>>>>>> 
>>>>>>> 
>>>>>>> Not sure above NetApp proposed work flow makes sense to you or to
>> other
>>>>>>> body or not. If this work flow is only specific to NetApp, then we
>>>>> don't
>>>>>>> need to enforce the whole process for everybody.
>>>>>>> 
>>>>>>>> 
>>>>>>>> Darren
>>>>>>>> 
>>>>>>>> On Tue, Oct 8, 2013 at 2:05 PM, SuichII, Christopher
>>>>>>>> <Ch...@netapp.com> wrote:
>>>>>>>>> Whether the hypervisor snapshot happens depends on whether the
>>>>>>>> 'quiesce' option is specified with the snapshot request. If a user
>>>>>>> doesn't care
>>>>>>>> about the consistency of their backup, then the hypervisor
>>>>>>> snapshot/quiesce
>>>>>>>> step can be skipped altogether. This of course is not the case if
>> the
>>>>>>> default
>>>>>>>> provider is being used, in which case a hypervisor snapshot is the
>>>>> only
>>>>>>> way of
>>>>>>>> creating a backup since it can't be offloaded to the storage driver.
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> Chris Suich
>>>>>>>>> chris.suich@netapp.com
>>>>>>>>> NetApp Software Engineer
>>>>>>>>> Data Center Platforms - Cloud Solutions Citrix, Cisco & Red Hat
>>>>>>>>> 
>>>>>>>>> On Oct 8, 2013, at 4:57 PM, Darren Shepherd
>>>>>>>>> <da...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Who is going to decide whether the hypervisor snapshot should
>>>>>>>>>> actually happen or not? Or how?
>>>>>>>>>> 
>>>>>>>>>> Darren
>>>>>>>>>> 
>>>>>>>>>> On Tue, Oct 8, 2013 at 12:38 PM, SuichII, Christopher
>>>>>>>>>> <Ch...@netapp.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> --
>>>>>>>>>>> Chris Suich
>>>>>>>>>>> chris.suich@netapp.com
>>>>>>>>>>> NetApp Software Engineer
>>>>>>>>>>> Data Center Platforms - Cloud Solutions Citrix, Cisco & Red Hat
>>>>>>>>>>> 
>>>>>>>>>>> On Oct 8, 2013, at 2:24 PM, Darren Shepherd
>>>>>>>> <da...@gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> So in the implementation, when we say "quiesce" is that actually
>>>>>>>>>>>> being implemented as a VM snapshot (memory and disk).  And then
>>>>>>>>>>>> when you say "unquiesce" you are talking about deleting the VM
>>>>>>>> snapshot?
>>>>>>>>>>> 
>>>>>>>>>>> If the VM snapshot is not going to the hypervisor, then yes, it
>>>>> will
>>>>>>>> actually be a hypervisor snapshot. Just to be clear, the unquiesce
>> is
>>>>>>> not quite
>>>>>>>> a delete - it is a collapse of the VM snapshot and the active VM
>> back
>>>>>>> into one
>>>>>>>> file.
>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> In NetApp, what are you snapshotting?  The whole netapp volume
>>>>> (I
>>>>>>>>>>>> don't know the correct term), a file on NFS, an iscsi volume?  I
>>>>>>>>>>>> don't know a whole heck of a lot about the netapp snapshot
>>>>>>>> capabilities.
>>>>>>>>>>> 
>>>>>>>>>>> Essentially we are using internal APIs to create file level
>>>>> backups
>>>>>>> - don't
>>>>>>>> worry too much about the terminology.
>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> I know storage solutions can snapshot better and faster than
>>>>>>>>>>>> hypervisors can with COW files.  I've personally just been
>>>>> always
>>>>>>>>>>>> perplexed on whats the best way to implement it.  For storage
>>>>>>>>>>>> solutions that are block based, its really easy to have the
>>>>> storage
>>>>>>>>>>>> doing the snapshot.  For shared file systems, like NFS, its
>>>>> seems
>>>>>>>>>>>> way more complicated as you don't want to snapshot the entire
>>>>>>>>>>>> filesystem in order to snapshot one file.
>>>>>>>>>>> 
>>>>>>>>>>> With filesystems like NFS, things are certainly more complicated,
>>>>>>> but that
>>>>>>>> is taken care of by our controller's operating system, Data ONTAP,
>>>>> and we
>>>>>>>> simply use APIs to communicate with it.
>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Darren
>>>>>>>>>>>> 
>>>>>>>>>>>> On Tue, Oct 8, 2013 at 11:10 AM, SuichII, Christopher
>>>>>>>>>>>> <Ch...@netapp.com> wrote:
>>>>>>>>>>>>> I can comment on the second half.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Through storage operations, storage providers can create
>>>>> backups
>>>>>>>> much faster than hypervisors and over time, their snapshots are more
>>>>>>>> efficient than the snapshot chains that hypervisors create. It is
>>>>> true
>>>>>>> that a VM
>>>>>>>> snapshot taken at the storage level is slightly different as it
>>>>> would be
>>>>>>> psuedo-
>>>>>>>> quiesced, not have it's memory snapshotted. This is accomplished
>>>>> through
>>>>>>>> hypervisor snapshots:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 1) VM snapshot request (lets say VM 'A'
>>>>>>>>>>>>> 2) Create hypervisor snapshot (optional) -VM 'A' is
>>>>> snapshotted,
>>>>>>>>>>>>> creating active VM 'A*'
>>>>>>>>>>>>> -All disk traffic now goes to VM 'A*' and A is a snapshot of
>>>>> 'A*'
>>>>>>>>>>>>> 3) Storage driver(s) take snapshots of each volume
>>>>>>>>>>>>> 4) Undo hypervisor snapshot (optional) -VM snapshot 'A' is
>>>>> rolled
>>>>>>>>>>>>> back into VM 'A*' so the hypervisor snapshot no longer exists
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Now, a couple notes:
>>>>>>>>>>>>> -The reason this is optional is that not all users necessarily
>>>>>>> care about
>>>>>>>> the memory or disk consistency of their VMs and would prefer faster
>>>>>>>> snapshots to consistency.
>>>>>>>>>>>>> -Preemptively, yes, we are actually taking hypervisor snapshots
>>>>>>> which
>>>>>>>> means there isn't actually a performance of taking storage snapshots
>>>>> when
>>>>>>>> quiescing the VM. However, the performance gain will come both
>> during
>>>>>>>> restoring the VM and during normal operations as described above.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Although you can think of it as a poor man's VM snapshot, I
>>>>> would
>>>>>>>> think of it more as a consistent multi-volume snapshot. Again, the
>>>>>>> difference
>>>>>>>> being that this snapshot was not truly quiesced like a hypervisor
>>>>>>> snapshot
>>>>>>>> would be.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Chris Suich
>>>>>>>>>>>>> chris.suich@netapp.com
>>>>>>>>>>>>> NetApp Software Engineer
>>>>>>>>>>>>> Data Center Platforms - Cloud Solutions Citrix, Cisco & Red Hat
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Oct 8, 2013, at 1:47 PM, Darren Shepherd
>>>>>>>> <da...@gmail.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> My only comment is that having the return type as boolean and
>>>>>>>>>>>>>> using to that indicate quiesce behaviour seems obscure and
>>>>> will
>>>>>>>>>>>>>> probably lead to a problem later.  Your basically saying the
>>>>>>>>>>>>>> result of the takeVMSnapshot will only ever need to
>>>>> communicate
>>>>>>>>>>>>>> back whether unquiesce needs to happen.  Maybe some result
>>>>>>>> object
>>>>>>>>>>>>>> would be more extensible.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Actually, I think I have more comments.  This seems a bit odd
>>>>> to
>>>>>>> me.
>>>>>>>>>>>>>> Why would a storage driver in ACS implement a VM snapshot
>>>>>>>>>>>>>> functionality?  VM snapshot is a really a hypervisor
>>>>> orchestrated
>>>>>>>>>>>>>> operation.  So it seems like were trying to implement a poor
>>>>> mans
>>>>>>>>>>>>>> VM snapshot.  Maybe if I understood what NetApp was trying to
>>>>> do
>>>>>>>>>>>>>> it would make more sense, but its all odd.  To do a proper VM
>>>>>>>>>>>>>> snapshot you need to snapshot memory and disk at the exact
>>>>> same
>>>>>>>>>>>>>> time.  How are we going to do that if ACS is orchestrating
>>>>> the VM
>>>>>>>>>>>>>> snapshot and delegating to storage providers.  Its not like
>>>>> you
>>>>>>>>>>>>>> are going to pause the VM.... or are you?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Darren
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Mon, Oct 7, 2013 at 11:59 AM, Edison Su <
>>>>> Edison.su@citrix.com>
>>>>>>>> wrote:
>>>>>>>>>>>>>>> I created a design document page at
>>>>>>>> 
>>>>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Pluggable+VM+s
>>>>>>>> napshot+related+operations, feel free to add items on it.
>>>>>>>>>>>>>>> And a new branch "pluggable_vm_snapshot" is created.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>>> From: SuichII, Christopher [mailto:Chris.Suich@netapp.com]
>>>>>>>>>>>>>>>> Sent: Monday, October 07, 2013 10:02 AM
>>>>>>>>>>>>>>>> To: <de...@cloudstack.apache.org>
>>>>>>>>>>>>>>>> Subject: Re: [DISCUSS] Pluggable VM snapshot related
>>>>> operations?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I'm a fan of option 2 - this gives us the most flexibility
>>>>> (as
>>>>>>>>>>>>>>>> you stated). The option is given to completely override the
>>>>> way
>>>>>>>>>>>>>>>> VM snapshots work AND storage providers are given to
>>>>>>>>>>>>>>>> opportunity to work within the default VM snapshot workflow.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I believe this option should satisfy your concern, Mike. The
>>>>>>>>>>>>>>>> snapshot and quiesce strategy would be in charge of
>>>>>>>> communicating with the hypervisor.
>>>>>>>>>>>>>>>> Storage providers should be able to leverage the default
>>>>>>>>>>>>>>>> strategies and simply perform the storage operations.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I don't think it should be much of an issue that new method
>>>>> to
>>>>>>>>>>>>>>>> the storage driver interface may not apply to everyone. In
>>>>> fact,
>>>>>>>> that is already the case.
>>>>>>>>>>>>>>>> Some methods such as un/maintain(), attachToXXX() and
>>>>>>>>>>>>>>>> takeSnapshot() are already not implemented by every driver -
>>>>>>>>>>>>>>>> they just return false when asked if they can handle the
>>>>>>> operation.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Chris Suich
>>>>>>>>>>>>>>>> chris.suich@netapp.com
>>>>>>>>>>>>>>>> NetApp Software Engineer
>>>>>>>>>>>>>>>> Data Center Platforms - Cloud Solutions Citrix, Cisco & Red
>>>>> Hat
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Oct 5, 2013, at 12:11 AM, Mike Tutkowski
>>>>>>>>>>>>>>>> <mi...@solidfire.com>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Well, my first thought on this is that the storage driver
>>>>>>>>>>>>>>>>> should not be telling the hypervisor to do anything. It
>>>>> should
>>>>>>>>>>>>>>>>> be responsible for creating/deleting volumes, snapshots,
>>>>> etc.
>>>>>>> on
>>>>>>>> its storage system only.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Fri, Oct 4, 2013 at 5:57 PM, Edison Su <
>>>>>>> Edison.su@citrix.com>
>>>>>>>> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> In 4.2, we added VM snapshot for Vmware/Xenserver. The
>>>>>>>>>>>>>>>>>> current workflow will be like the following:
>>>>>>>>>>>>>>>>>> createVMSnapshot api -> VMSnapshotManagerImpl:
>>>>>>>>>>>>>>>>>> creatVMSnapshot -> send CreateVMSnapshotCommand to
>>>>>>>> hypervisor to create vm snapshot.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> If anybody wants to change the workflow, then need to
>>>>> either
>>>>>>>>>>>>>>>>>> change VMSnapshotManagerImpl directly or subclass
>>>>>>>> VMSnapshotManagerImpl.
>>>>>>>>>>>>>>>>>> Both are not the ideal choice, as VMSnapshotManagerImpl
>>>>>>>>>>>>>>>>>> should be able to handle different ways to take vm
>>>>> snapshot,
>>>>>>>> instead of hard code.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> The requirements for the pluggable VM snapshot coming
>>>>> from:
>>>>>>>>>>>>>>>>>> Storage vendor may have their optimization, such as
>>>>> NetApp.
>>>>>>>>>>>>>>>>>> VM snapshot can be implemented in a totally different
>>>>> way(For
>>>>>>>>>>>>>>>>>> example, I could just send a command to guest VM, to tell
>>>>> my
>>>>>>>>>>>>>>>>>> application to flush disk and hold disk write, then come
>>>>> to
>>>>>>>>>>>>>>>>>> hypervisor to
>>>>>>>>>>>>>>>> take a volume snapshot).
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> If we agree on enable pluggable VM snapshot, then we can
>>>>>>>> move
>>>>>>>>>>>>>>>>>> on discuss how to implement it.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> The possible options:
>>>>>>>>>>>>>>>>>> 1. coarse grained interface. Add a VMSnapshotStrategy
>>>>>>>>>>>>>>>>>> interface, which has the following interfaces:
>>>>>>>>>>>>>>>>>> VMSnapshot takeVMSnapshot(VMSnapshot vmSnapshot);
>>>>>>>> Boolean
>>>>>>>>>>>>>>>>>> revertVMSnapshot(VMSnapshot vmSnapshot); Boolean
>>>>>>>>>>>>>>>>>> DeleteVMSnapshot(VMSnapshot vmSnapshot);
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> The work flow will be: createVMSnapshot api ->
>>>>>>>>>>>>>>>> VMSnapshotManagerImpl:
>>>>>>>>>>>>>>>>>> creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot
>>>>>>>>>>>>>>>>>> VMSnapshotManagerImpl will manage VM state, do the sanity
>>>>>>>>>>>>>>>>>> check, then will handle over to VMSnapshotStrategy.
>>>>>>>>>>>>>>>>>> In VMSnapshotStrategy implementation, it may just send a
>>>>>>>>>>>>>>>>>> Create/revert/delete VMSnapshotCommand to hypervisor
>>>>>>>> host, or
>>>>>>>>>>>>>>>>>> do anything special operations.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 2. fine-grained interface. Not only add a
>>>>> VMSnapshotStrategy
>>>>>>>>>>>>>>>>>> interface, but also add certain methods on the storage
>>>>> driver.
>>>>>>>>>>>>>>>>>> The VMSnapshotStrategy interface will be the same as
>>>>> option 1.
>>>>>>>>>>>>>>>>>> Will add the following methods on storage driver:
>>>>>>>>>>>>>>>>>> /* volumesBelongToVM  is the list of volumes of the VM
>>>>> that
>>>>>>>>>>>>>>>>>> created on this storage, storage vendor can either take
>>>>> one
>>>>>>>>>>>>>>>>>> snapshot for this volumes in one shot, or take snapshot
>>>>> for
>>>>>>>> each volume separately
>>>>>>>>>>>>>>>>>>  The pre-condition: vm is unquiesced.
>>>>>>>>>>>>>>>>>>  It will return a Boolean to indicate, do need
>>>>> unquiesce vm
>>>>>>> or
>>>>>>>> not.
>>>>>>>>>>>>>>>>>>  In the default storage driver, it will return false.
>>>>>>>>>>>>>>>>>> */
>>>>>>>>>>>>>>>>>> boolean takeVMSnapshot(List<VolumeInfo>
>>>>>>>> volumesBelongToVM,
>>>>>>>>>>>>>>>>>> VMSnapshot vmSnapshot); Boolean
>>>>>>>>>>>>>>>>>> revertVMSnapshot(List<VolumeInfo> volumesBelongToVM,
>>>>>>>>>>>>>>>>>> VMSnapshot vmSnapshot); Boolean
>>>>>>>>>>>>>>>>>> deleteVMSnapshot(List<VolumeInfo> volumesBelongToVM,
>>>>>>>>>>>>>>>>>> VMSnapshot vmSNapshot);
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> The work flow will be: createVMSnapshot api ->
>>>>>>>>>>>>>>>> VMSnapshotManagerImpl:
>>>>>>>>>>>>>>>>>> creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot ->
>>>>>>>>>>>>>>>>>> storage driver:takeVMSnapshot In the implementation of
>>>>>>>>>>>>>>>>>> VMSnapshotStrategy's takeVMSnapshot, the pseudo code
>>>>>>>> looks like:
>>>>>>>>>>>>>>>>>>  HypervisorHelper.quiesceVM(vm);
>>>>>>>>>>>>>>>>>>  val volumes = vm.getVolumes();
>>>>>>>>>>>>>>>>>>  val maps = new Map[driver, list[VolumeInfo]]();
>>>>>>>>>>>>>>>>>>  Volumes.foreach(volume => maps.put(volume.getDriver,
>>>>>>>> volume ::
>>>>>>>>>>>>>>>>>> maps.get(volume.getdriver())))
>>>>>>>>>>>>>>>>>>  val needUnquiesce = true;
>>>>>>>>>>>>>>>>>>   maps.foreach((driver, volumes) => needUnquiesce  =
>>>>>>>>>>>>>>>>>> needUnquiesce && driver.takeVMSnapshot(volumes))
>>>>>>>>>>>>>>>>>> if (needUnquiesce ) {
>>>>>>>>>>>>>>>>>>  HypervisorHelper.unquiesce(vm); }
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> By default, the quiesceVM in HypervisorHelper will
>>>>> actually
>>>>>>>>>>>>>>>>>> take vm snapshot through hypervisor.
>>>>>>>>>>>>>>>>>> Does above logic makes senesce?
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> The pros of option 1 is that: it's simple, no need to
>>>>> change
>>>>>>>>>>>>>>>>>> storage driver interfaces. The cons is that each storage
>>>>>>>>>>>>>>>>>> vendor need to implement a strategy, maybe they will do
>>>>> the
>>>>>>>> same thing.
>>>>>>>>>>>>>>>>>> The pros of option 2 is that, storage driver won't need to
>>>>>>>>>>>>>>>>>> worry about how to quiesce/unquiesce vm. The cons is
>>>>> that, it
>>>>>>>>>>>>>>>>>> will add these methods on each storage drivers, so it
>>>>> assumes
>>>>>>>>>>>>>>>>>> that this work flow will work for everybody.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> So which option we should take? Or if you have other
>>>>> options,
>>>>>>>>>>>>>>>>>> please let's know.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> *Mike Tutkowski*
>>>>>>>>>>>>>>>>> *Senior CloudStack Developer, SolidFire Inc.*
>>>>>>>>>>>>>>>>> e: mike.tutkowski@solidfire.com
>>>>>>>>>>>>>>>>> o: 303.746.7302
>>>>>>>>>>>>>>>>> Advancing the way the world uses the
>>>>>>>>>>>>>>>>> cloud<http://solidfire.com/solution/overview/?video=play>
>>>>>>>>>>>>>>>>> *(tm)*
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> *Mike Tutkowski*
>>>>>> *Senior CloudStack Developer, SolidFire Inc.*
>>>>>> e: mike.tutkowski@solidfire.com
>>>>>> o: 303.746.7302
>>>>>> Advancing the way the world uses the
>>>>>> cloud<http://solidfire.com/solution/overview/?video=play>
>>>>>> *™*
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> *Mike Tutkowski*
>>>> *Senior CloudStack Developer, SolidFire Inc.*
>>>> e: mike.tutkowski@solidfire.com
>>>> o: 303.746.7302
>>>> Advancing the way the world uses the cloud<
>> http://solidfire.com/solution/overview/?video=play>
>>>> *™*
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> *Mike Tutkowski*
>>> *Senior CloudStack Developer, SolidFire Inc.*
>>> e: mike.tutkowski@solidfire.com
>>> o: 303.746.7302
>>> Advancing the way the world uses the
>>> cloud<http://solidfire.com/solution/overview/?video=play>
>>> *™*
>> 
>> 
> 
> 
> -- 
> *Mike Tutkowski*
> *Senior CloudStack Developer, SolidFire Inc.*
> e: mike.tutkowski@solidfire.com
> o: 303.746.7302
> Advancing the way the world uses the
> cloud<http://solidfire.com/solution/overview/?video=play>
> *™*


Re: [DISCUSS] Pluggable VM snapshot related operations?

Posted by Mike Tutkowski <mi...@solidfire.com>.
I wonder if this technique is only going to work for NFS?

In the block world, the VDI we take a snapshot of on the SR will lead to
the creation of another VDI and a block system cannot just snapshot the
hypervisor snapshot - it needs to snapshot the entire volume (which is
analogous to the SR).


On Thu, Oct 10, 2013 at 6:29 AM, SuichII, Christopher <
Chris.Suich@netapp.com> wrote:

> Multivendor snapshotting:
> The case with two storage providers is a bit trickier and is one that we
> are still working on. I believe there are a couple options on the table:
>
> -Give both storage providers the option to take the snapshot and fail if
> either one fails or cannot take the snapshot
> -Give both storage providers the option to take the snapshot and use the
> hypervisor/default if either one fails or cannot take the snapshot
> -Fall back to using the hypervisor/default if the VM has volumes on
> storage managed by different providers
>
> The only purpose of the hypervisor snapshot is to give storage providers a
> consistent volume to take their snapshot against. Once that snapshot is
> taken, the hypervisor snapshot is pushed back into the parent or active VM
> (essentially removing the fact the hypervisor snapshot ever existed).
>
>
> Quiescing:
> This is something that has been debated a lot. Ultimately, one reason for
> having drivers perform the quiescing is because we don't know how every
> storage provider will want to work. As far as I've ever known, any storage
> provider that wants to create the snapshots themselves will want the VM to
> be quiesced through the hypervisor. However, there may be some storage
> provider that has some way of taking snapshots (that we don't know about)
> that doesn't require the VM to be quiesced. In that case, we wouldn't want
> them to be forced into having the VM quiesced before they're asked to take
> the snapshot.
>
>
> Two snapshot methods:
> I believe the main reason for this is that storage drivers may want to
> take the snapshot differently depending on whether it is a single volume
> snapshot or an entire VM snapshot. Again, erring on the side of flexibility
> so that things don't have to change when a new storage provider comes along
> with different requirements.
>
>
> --
> Chris Suich
> chris.suich@netapp.com
> NetApp Software Engineer
> Data Center Platforms – Cloud Solutions
> Citrix, Cisco & Red Hat
>
> On Oct 10, 2013, at 1:40 AM, Mike Tutkowski <mi...@solidfire.com>
> wrote:
>
> > "The work flow will be: createVMSnapshot api -> VMSnapshotManagerImpl:
> > creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot -> storage
> > driver:takeVMSnapshot"
> >
> > I also think it's a bit weird for the storage driver to have any
> knowledge
> > of VM snapshots.
> >
> > I would think another part of the system would quiesce (or not) the VM in
> > question and then the takeSnapshot method would be called on the driver.
> >
> > I might have missed something...why does the driver "care" if the
> snapshot
> > to be taken is going to be in a consistent state or not (I understand why
> > the user care, but not the storage driver)? Why is that not a problem for
> > some other part of the system that is aware of hypervisor snapshots?
> > Shouldn't the driver just take a snapshot (or snapshots) as it is
> > instructed to do (regardless of whether or not a VM is quiesced)?
> >
> > Basically I'm wondering why we need two "take snapshot" methods on the
> > driver.
> >
> >
> > On Wed, Oct 9, 2013 at 11:24 PM, Mike Tutkowski <
> > mike.tutkowski@solidfire.com> wrote:
> >
> >> Yeah, I'm not really clear how the snapshot strategy works if you have
> >> multiple vendors that implement that interface either.
> >>
> >>
> >> On Wed, Oct 9, 2013 at 10:12 PM, Darren Shepherd <
> >> darren.s.shepherd@gmail.com> wrote:
> >>
> >>> Edison,
> >>>
> >>> I would lean toward doing the coarse grain interface only.  I'm having
> >>> a hard time seeing how the whole flow is generic and makes sense for
> >>> everyone.  With starting with the coarse grain you have the advantage
> >>> in that you avoid possible upfront over engineering/over design that
> >>> could wreak havoc down the line.  If you implement the
> >>> VMSnapshotStrategy and find that it really is useful to other
> >>> implementations, you can then implement the fine grain interface later
> >>> to allow others to benefit from it.
> >>>
> >>> Darren
> >>>
> >>> On Wed, Oct 9, 2013 at 8:54 PM, Mike Tutkowski
> >>> <mi...@solidfire.com> wrote:
> >>>> Hey guys,
> >>>>
> >>>> I haven't been giving this thread much attention, but am reviewing it
> >>>> somewhat now.
> >>>>
> >>>> I'm not really clear how this would work if, say, a VM has two data
> >>> disks
> >>>> and they are not being provided by the same vendor.
> >>>>
> >>>> Can someone clarify that for me?
> >>>>
> >>>> My understanding for how this works today is that it doesn't matter.
> For
> >>>> XenServer, a VDI is on an SR, which could be supported by storage
> >>> vendor X.
> >>>> Another VDI could be on another SR, supported by storage vendor Y.
> >>>>
> >>>> In this case, a new VDI appears on each SR after a hypervisor
> snapshot.
> >>>>
> >>>> Same idea for VMware.
> >>>>
> >>>> I don't really know how (or if) this works for KVM.
> >>>>
> >>>> I'm not clear how this multi-vendor situation would play out in this
> >>>> pluggable approach.
> >>>>
> >>>> Thanks!
> >>>>
> >>>>
> >>>> On Tue, Oct 8, 2013 at 4:43 PM, Edison Su <Ed...@citrix.com>
> wrote:
> >>>>
> >>>>>
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Darren Shepherd [mailto:darren.s.shepherd@gmail.com]
> >>>>>> Sent: Tuesday, October 08, 2013 2:54 PM
> >>>>>> To: dev@cloudstack.apache.org
> >>>>>> Subject: Re: [DISCUSS] Pluggable VM snapshot related operations?
> >>>>>>
> >>>>>> A hypervisor snapshot will snapshot memory also.  So determining
> >>> whether
> >>>>> The memory is optional for hypervisor vm snapshot, a.k.a, the
> >>> "Disk-only
> >>>>> snapshots":
> >>>>>
> >>>
> http://support.citrix.com/proddocs/topic/xencenter-61/xs-xc-vms-snapshots-about.html
> >>>>> It's supported by both xenserver/kvm/vmware.
> >>>>>
> >>>>>> do to the hypervisor snapshot from the quiesce option does not seem
> >>>>>> proper.
> >>>>>>
> >>>>>> Sorry, for all the questions, I'm trying to get to the point of
> >>>>> understand if this
> >>>>>> functionality makes sense at this point of code or if maybe their is
> >>> a
> >>>>> different
> >>>>>> approach.  This is what I'm seeing, what if we state it this way
> >>>>>>
> >>>>>> 1) VM snapshot, AFAIK, are not backed up today and exist solely on
> >>>>> primary.
> >>>>>> What if we added a backup phase to VM snapshots that can be
> >>> optionally
> >>>>>> supported by the storage providers to possibly backup the VM
> snapshot
> >>>>>> volumes.
> >>>>> It's not about backup vm snapshot, it's about how to take vm
> snapshot.
> >>>>> Usually, take/revert vm snapshot is handled by hypervisor itself, but
> >>> in
> >>>>> NetApp(or other storage vendor) case,
> >>>>> They want to change the default behavior of hypervisor-base vm
> >>> snapshot.
> >>>>>
> >>>>> Some examples:
> >>>>> 1. take hypervisor based vm snapshots, on primary storage, hypervisor
> >>> will
> >>>>> maintain the snapshot chain.
> >>>>> 2. take vm snapshot through NetApp:
> >>>>>     a. first, quiesce VM if user specified. There is no separate API
> >>> to
> >>>>> quiesce VM on the hypervisor, so here we will
> >>>>> take a VM snapshot through hypervisor API call, hypervisor will take
> >>>>> volume snapshot  on each volume of the VM. Let's say, on the primary
> >>>>> storage, the disk chain looks like:
> >>>>>           base-image
> >>>>>                    |
> >>>>>                    V
> >>>>>                Parent disk
> >>>>>            /                         \
> >>>>>          V                            V
> >>>>>        Current disk        snapshot-a
> >>>>>     b. from snapshot-a, find out its parent disk, then take snapshot
> >>>>> through NetApp
> >>>>>     c. un- quiesce VM, here, go to hypervisor, delete snapshot
> >>>>> "snapshot-a", hypervisor should be able to consolidate current disk
> and
> >>>>> "parent disk" into one disk, thus from hypervisor point of view
> >>>>> , there is always, at most, only one snapshot for the VM.
> >>>>>    For revert VM snapshot, as long as the VM is stopped, NetApp can
> >>>>> revert the snapshot created on NetApp storage easily, and
> efficiently.
> >>>>>   The benefit of this whole process, as Chris pointed out, if the
> >>>>> snapshot chain is quite long, hypervisor based VM snapshot will get
> >>>>> performance hit.
> >>>>>
> >>>>>>
> >>>>>> 2) Additionally you want to be able to backup multiple disks at
> once,
> >>>>>> regardless of VM snapshot.  Why don't we add the ability to put
> >>>>> volumeIds in
> >>>>>> snapshot cmd that if the storage provider supports it will get a
> >>> batch of
> >>>>>> volumeIds.
> >>>>>>
> >>>>>> Now I know we talked about 2 and there was some concerns about it
> >>> (mostly
> >>>>>> from me), but I think we could work through those concerns (forgot
> >>> what
> >>>>>> they were...).  Right now I just get the feeling we are shoehorning
> >>> some
> >>>>>> functionality into VM snapshot that isn't quite the right fit.  The
> >>> "no
> >>>>> quiesce"
> >>>>>> flow just doesn't seem to make sense to me.
> >>>>>
> >>>>>
> >>>>> Not sure above NetApp proposed work flow makes sense to you or to
> other
> >>>>> body or not. If this work flow is only specific to NetApp, then we
> >>> don't
> >>>>> need to enforce the whole process for everybody.
> >>>>>
> >>>>>>
> >>>>>> Darren
> >>>>>>
> >>>>>> On Tue, Oct 8, 2013 at 2:05 PM, SuichII, Christopher
> >>>>>> <Ch...@netapp.com> wrote:
> >>>>>>> Whether the hypervisor snapshot happens depends on whether the
> >>>>>> 'quiesce' option is specified with the snapshot request. If a user
> >>>>> doesn't care
> >>>>>> about the consistency of their backup, then the hypervisor
> >>>>> snapshot/quiesce
> >>>>>> step can be skipped altogether. This of course is not the case if
> the
> >>>>> default
> >>>>>> provider is being used, in which case a hypervisor snapshot is the
> >>> only
> >>>>> way of
> >>>>>> creating a backup since it can't be offloaded to the storage driver.
> >>>>>>>
> >>>>>>> --
> >>>>>>> Chris Suich
> >>>>>>> chris.suich@netapp.com
> >>>>>>> NetApp Software Engineer
> >>>>>>> Data Center Platforms - Cloud Solutions Citrix, Cisco & Red Hat
> >>>>>>>
> >>>>>>> On Oct 8, 2013, at 4:57 PM, Darren Shepherd
> >>>>>>> <da...@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Who is going to decide whether the hypervisor snapshot should
> >>>>>>>> actually happen or not? Or how?
> >>>>>>>>
> >>>>>>>> Darren
> >>>>>>>>
> >>>>>>>> On Tue, Oct 8, 2013 at 12:38 PM, SuichII, Christopher
> >>>>>>>> <Ch...@netapp.com> wrote:
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> Chris Suich
> >>>>>>>>> chris.suich@netapp.com
> >>>>>>>>> NetApp Software Engineer
> >>>>>>>>> Data Center Platforms - Cloud Solutions Citrix, Cisco & Red Hat
> >>>>>>>>>
> >>>>>>>>> On Oct 8, 2013, at 2:24 PM, Darren Shepherd
> >>>>>> <da...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> So in the implementation, when we say "quiesce" is that actually
> >>>>>>>>>> being implemented as a VM snapshot (memory and disk).  And then
> >>>>>>>>>> when you say "unquiesce" you are talking about deleting the VM
> >>>>>> snapshot?
> >>>>>>>>>
> >>>>>>>>> If the VM snapshot is not going to the hypervisor, then yes, it
> >>> will
> >>>>>> actually be a hypervisor snapshot. Just to be clear, the unquiesce
> is
> >>>>> not quite
> >>>>>> a delete - it is a collapse of the VM snapshot and the active VM
> back
> >>>>> into one
> >>>>>> file.
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> In NetApp, what are you snapshotting?  The whole netapp volume
> >>> (I
> >>>>>>>>>> don't know the correct term), a file on NFS, an iscsi volume?  I
> >>>>>>>>>> don't know a whole heck of a lot about the netapp snapshot
> >>>>>> capabilities.
> >>>>>>>>>
> >>>>>>>>> Essentially we are using internal APIs to create file level
> >>> backups
> >>>>> - don't
> >>>>>> worry too much about the terminology.
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> I know storage solutions can snapshot better and faster than
> >>>>>>>>>> hypervisors can with COW files.  I've personally just been
> >>> always
> >>>>>>>>>> perplexed on whats the best way to implement it.  For storage
> >>>>>>>>>> solutions that are block based, its really easy to have the
> >>> storage
> >>>>>>>>>> doing the snapshot.  For shared file systems, like NFS, its
> >>> seems
> >>>>>>>>>> way more complicated as you don't want to snapshot the entire
> >>>>>>>>>> filesystem in order to snapshot one file.
> >>>>>>>>>
> >>>>>>>>> With filesystems like NFS, things are certainly more complicated,
> >>>>> but that
> >>>>>> is taken care of by our controller's operating system, Data ONTAP,
> >>> and we
> >>>>>> simply use APIs to communicate with it.
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Darren
> >>>>>>>>>>
> >>>>>>>>>> On Tue, Oct 8, 2013 at 11:10 AM, SuichII, Christopher
> >>>>>>>>>> <Ch...@netapp.com> wrote:
> >>>>>>>>>>> I can comment on the second half.
> >>>>>>>>>>>
> >>>>>>>>>>> Through storage operations, storage providers can create
> >>> backups
> >>>>>> much faster than hypervisors and over time, their snapshots are more
> >>>>>> efficient than the snapshot chains that hypervisors create. It is
> >>> true
> >>>>> that a VM
> >>>>>> snapshot taken at the storage level is slightly different as it
> >>> would be
> >>>>> psuedo-
> >>>>>> quiesced, not have it's memory snapshotted. This is accomplished
> >>> through
> >>>>>> hypervisor snapshots:
> >>>>>>>>>>>
> >>>>>>>>>>> 1) VM snapshot request (lets say VM 'A'
> >>>>>>>>>>> 2) Create hypervisor snapshot (optional) -VM 'A' is
> >>> snapshotted,
> >>>>>>>>>>> creating active VM 'A*'
> >>>>>>>>>>> -All disk traffic now goes to VM 'A*' and A is a snapshot of
> >>> 'A*'
> >>>>>>>>>>> 3) Storage driver(s) take snapshots of each volume
> >>>>>>>>>>> 4) Undo hypervisor snapshot (optional) -VM snapshot 'A' is
> >>> rolled
> >>>>>>>>>>> back into VM 'A*' so the hypervisor snapshot no longer exists
> >>>>>>>>>>>
> >>>>>>>>>>> Now, a couple notes:
> >>>>>>>>>>> -The reason this is optional is that not all users necessarily
> >>>>> care about
> >>>>>> the memory or disk consistency of their VMs and would prefer faster
> >>>>>> snapshots to consistency.
> >>>>>>>>>>> -Preemptively, yes, we are actually taking hypervisor snapshots
> >>>>> which
> >>>>>> means there isn't actually a performance of taking storage snapshots
> >>> when
> >>>>>> quiescing the VM. However, the performance gain will come both
> during
> >>>>>> restoring the VM and during normal operations as described above.
> >>>>>>>>>>>
> >>>>>>>>>>> Although you can think of it as a poor man's VM snapshot, I
> >>> would
> >>>>>> think of it more as a consistent multi-volume snapshot. Again, the
> >>>>> difference
> >>>>>> being that this snapshot was not truly quiesced like a hypervisor
> >>>>> snapshot
> >>>>>> would be.
> >>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>>> Chris Suich
> >>>>>>>>>>> chris.suich@netapp.com
> >>>>>>>>>>> NetApp Software Engineer
> >>>>>>>>>>> Data Center Platforms - Cloud Solutions Citrix, Cisco & Red Hat
> >>>>>>>>>>>
> >>>>>>>>>>> On Oct 8, 2013, at 1:47 PM, Darren Shepherd
> >>>>>> <da...@gmail.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> My only comment is that having the return type as boolean and
> >>>>>>>>>>>> using to that indicate quiesce behaviour seems obscure and
> >>> will
> >>>>>>>>>>>> probably lead to a problem later.  Your basically saying the
> >>>>>>>>>>>> result of the takeVMSnapshot will only ever need to
> >>> communicate
> >>>>>>>>>>>> back whether unquiesce needs to happen.  Maybe some result
> >>>>>> object
> >>>>>>>>>>>> would be more extensible.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Actually, I think I have more comments.  This seems a bit odd
> >>> to
> >>>>> me.
> >>>>>>>>>>>> Why would a storage driver in ACS implement a VM snapshot
> >>>>>>>>>>>> functionality?  VM snapshot is a really a hypervisor
> >>> orchestrated
> >>>>>>>>>>>> operation.  So it seems like were trying to implement a poor
> >>> mans
> >>>>>>>>>>>> VM snapshot.  Maybe if I understood what NetApp was trying to
> >>> do
> >>>>>>>>>>>> it would make more sense, but its all odd.  To do a proper VM
> >>>>>>>>>>>> snapshot you need to snapshot memory and disk at the exact
> >>> same
> >>>>>>>>>>>> time.  How are we going to do that if ACS is orchestrating
> >>> the VM
> >>>>>>>>>>>> snapshot and delegating to storage providers.  Its not like
> >>> you
> >>>>>>>>>>>> are going to pause the VM.... or are you?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Darren
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Mon, Oct 7, 2013 at 11:59 AM, Edison Su <
> >>> Edison.su@citrix.com>
> >>>>>> wrote:
> >>>>>>>>>>>>> I created a design document page at
> >>>>>>
> >>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Pluggable+VM+s
> >>>>>> napshot+related+operations, feel free to add items on it.
> >>>>>>>>>>>>> And a new branch "pluggable_vm_snapshot" is created.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> -----Original Message-----
> >>>>>>>>>>>>>> From: SuichII, Christopher [mailto:Chris.Suich@netapp.com]
> >>>>>>>>>>>>>> Sent: Monday, October 07, 2013 10:02 AM
> >>>>>>>>>>>>>> To: <de...@cloudstack.apache.org>
> >>>>>>>>>>>>>> Subject: Re: [DISCUSS] Pluggable VM snapshot related
> >>> operations?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I'm a fan of option 2 - this gives us the most flexibility
> >>> (as
> >>>>>>>>>>>>>> you stated). The option is given to completely override the
> >>> way
> >>>>>>>>>>>>>> VM snapshots work AND storage providers are given to
> >>>>>>>>>>>>>> opportunity to work within the default VM snapshot workflow.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I believe this option should satisfy your concern, Mike. The
> >>>>>>>>>>>>>> snapshot and quiesce strategy would be in charge of
> >>>>>> communicating with the hypervisor.
> >>>>>>>>>>>>>> Storage providers should be able to leverage the default
> >>>>>>>>>>>>>> strategies and simply perform the storage operations.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I don't think it should be much of an issue that new method
> >>> to
> >>>>>>>>>>>>>> the storage driver interface may not apply to everyone. In
> >>> fact,
> >>>>>> that is already the case.
> >>>>>>>>>>>>>> Some methods such as un/maintain(), attachToXXX() and
> >>>>>>>>>>>>>> takeSnapshot() are already not implemented by every driver -
> >>>>>>>>>>>>>> they just return false when asked if they can handle the
> >>>>> operation.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> --
> >>>>>>>>>>>>>> Chris Suich
> >>>>>>>>>>>>>> chris.suich@netapp.com
> >>>>>>>>>>>>>> NetApp Software Engineer
> >>>>>>>>>>>>>> Data Center Platforms - Cloud Solutions Citrix, Cisco & Red
> >>> Hat
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Oct 5, 2013, at 12:11 AM, Mike Tutkowski
> >>>>>>>>>>>>>> <mi...@solidfire.com>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Well, my first thought on this is that the storage driver
> >>>>>>>>>>>>>>> should not be telling the hypervisor to do anything. It
> >>> should
> >>>>>>>>>>>>>>> be responsible for creating/deleting volumes, snapshots,
> >>> etc.
> >>>>> on
> >>>>>> its storage system only.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Fri, Oct 4, 2013 at 5:57 PM, Edison Su <
> >>>>> Edison.su@citrix.com>
> >>>>>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> In 4.2, we added VM snapshot for Vmware/Xenserver. The
> >>>>>>>>>>>>>>>> current workflow will be like the following:
> >>>>>>>>>>>>>>>> createVMSnapshot api -> VMSnapshotManagerImpl:
> >>>>>>>>>>>>>>>> creatVMSnapshot -> send CreateVMSnapshotCommand to
> >>>>>> hypervisor to create vm snapshot.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> If anybody wants to change the workflow, then need to
> >>> either
> >>>>>>>>>>>>>>>> change VMSnapshotManagerImpl directly or subclass
> >>>>>> VMSnapshotManagerImpl.
> >>>>>>>>>>>>>>>> Both are not the ideal choice, as VMSnapshotManagerImpl
> >>>>>>>>>>>>>>>> should be able to handle different ways to take vm
> >>> snapshot,
> >>>>>> instead of hard code.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> The requirements for the pluggable VM snapshot coming
> >>> from:
> >>>>>>>>>>>>>>>> Storage vendor may have their optimization, such as
> >>> NetApp.
> >>>>>>>>>>>>>>>> VM snapshot can be implemented in a totally different
> >>> way(For
> >>>>>>>>>>>>>>>> example, I could just send a command to guest VM, to tell
> >>> my
> >>>>>>>>>>>>>>>> application to flush disk and hold disk write, then come
> >>> to
> >>>>>>>>>>>>>>>> hypervisor to
> >>>>>>>>>>>>>> take a volume snapshot).
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> If we agree on enable pluggable VM snapshot, then we can
> >>>>>> move
> >>>>>>>>>>>>>>>> on discuss how to implement it.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> The possible options:
> >>>>>>>>>>>>>>>> 1. coarse grained interface. Add a VMSnapshotStrategy
> >>>>>>>>>>>>>>>> interface, which has the following interfaces:
> >>>>>>>>>>>>>>>> VMSnapshot takeVMSnapshot(VMSnapshot vmSnapshot);
> >>>>>> Boolean
> >>>>>>>>>>>>>>>> revertVMSnapshot(VMSnapshot vmSnapshot); Boolean
> >>>>>>>>>>>>>>>> DeleteVMSnapshot(VMSnapshot vmSnapshot);
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> The work flow will be: createVMSnapshot api ->
> >>>>>>>>>>>>>> VMSnapshotManagerImpl:
> >>>>>>>>>>>>>>>> creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot
> >>>>>>>>>>>>>>>> VMSnapshotManagerImpl will manage VM state, do the sanity
> >>>>>>>>>>>>>>>> check, then will handle over to VMSnapshotStrategy.
> >>>>>>>>>>>>>>>> In VMSnapshotStrategy implementation, it may just send a
> >>>>>>>>>>>>>>>> Create/revert/delete VMSnapshotCommand to hypervisor
> >>>>>> host, or
> >>>>>>>>>>>>>>>> do anything special operations.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> 2. fine-grained interface. Not only add a
> >>> VMSnapshotStrategy
> >>>>>>>>>>>>>>>> interface, but also add certain methods on the storage
> >>> driver.
> >>>>>>>>>>>>>>>> The VMSnapshotStrategy interface will be the same as
> >>> option 1.
> >>>>>>>>>>>>>>>> Will add the following methods on storage driver:
> >>>>>>>>>>>>>>>> /* volumesBelongToVM  is the list of volumes of the VM
> >>> that
> >>>>>>>>>>>>>>>> created on this storage, storage vendor can either take
> >>> one
> >>>>>>>>>>>>>>>> snapshot for this volumes in one shot, or take snapshot
> >>> for
> >>>>>> each volume separately
> >>>>>>>>>>>>>>>>   The pre-condition: vm is unquiesced.
> >>>>>>>>>>>>>>>>   It will return a Boolean to indicate, do need
> >>> unquiesce vm
> >>>>> or
> >>>>>> not.
> >>>>>>>>>>>>>>>>   In the default storage driver, it will return false.
> >>>>>>>>>>>>>>>> */
> >>>>>>>>>>>>>>>> boolean takeVMSnapshot(List<VolumeInfo>
> >>>>>> volumesBelongToVM,
> >>>>>>>>>>>>>>>> VMSnapshot vmSnapshot); Boolean
> >>>>>>>>>>>>>>>> revertVMSnapshot(List<VolumeInfo> volumesBelongToVM,
> >>>>>>>>>>>>>>>> VMSnapshot vmSnapshot); Boolean
> >>>>>>>>>>>>>>>> deleteVMSnapshot(List<VolumeInfo> volumesBelongToVM,
> >>>>>>>>>>>>>>>> VMSnapshot vmSNapshot);
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> The work flow will be: createVMSnapshot api ->
> >>>>>>>>>>>>>> VMSnapshotManagerImpl:
> >>>>>>>>>>>>>>>> creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot ->
> >>>>>>>>>>>>>>>> storage driver:takeVMSnapshot In the implementation of
> >>>>>>>>>>>>>>>> VMSnapshotStrategy's takeVMSnapshot, the pseudo code
> >>>>>> looks like:
> >>>>>>>>>>>>>>>>   HypervisorHelper.quiesceVM(vm);
> >>>>>>>>>>>>>>>>   val volumes = vm.getVolumes();
> >>>>>>>>>>>>>>>>   val maps = new Map[driver, list[VolumeInfo]]();
> >>>>>>>>>>>>>>>>   Volumes.foreach(volume => maps.put(volume.getDriver,
> >>>>>> volume ::
> >>>>>>>>>>>>>>>> maps.get(volume.getdriver())))
> >>>>>>>>>>>>>>>>   val needUnquiesce = true;
> >>>>>>>>>>>>>>>>    maps.foreach((driver, volumes) => needUnquiesce  =
> >>>>>>>>>>>>>>>> needUnquiesce && driver.takeVMSnapshot(volumes))
> >>>>>>>>>>>>>>>>  if (needUnquiesce ) {
> >>>>>>>>>>>>>>>>   HypervisorHelper.unquiesce(vm); }
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> By default, the quiesceVM in HypervisorHelper will
> >>> actually
> >>>>>>>>>>>>>>>> take vm snapshot through hypervisor.
> >>>>>>>>>>>>>>>> Does above logic makes senesce?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> The pros of option 1 is that: it's simple, no need to
> >>> change
> >>>>>>>>>>>>>>>> storage driver interfaces. The cons is that each storage
> >>>>>>>>>>>>>>>> vendor need to implement a strategy, maybe they will do
> >>> the
> >>>>>> same thing.
> >>>>>>>>>>>>>>>> The pros of option 2 is that, storage driver won't need to
> >>>>>>>>>>>>>>>> worry about how to quiesce/unquiesce vm. The cons is
> >>> that, it
> >>>>>>>>>>>>>>>> will add these methods on each storage drivers, so it
> >>> assumes
> >>>>>>>>>>>>>>>> that this work flow will work for everybody.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> So which option we should take? Or if you have other
> >>> options,
> >>>>>>>>>>>>>>>> please let's know.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>> *Mike Tutkowski*
> >>>>>>>>>>>>>>> *Senior CloudStack Developer, SolidFire Inc.*
> >>>>>>>>>>>>>>> e: mike.tutkowski@solidfire.com
> >>>>>>>>>>>>>>> o: 303.746.7302
> >>>>>>>>>>>>>>> Advancing the way the world uses the
> >>>>>>>>>>>>>>> cloud<http://solidfire.com/solution/overview/?video=play>
> >>>>>>>>>>>>>>> *(tm)*
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> *Mike Tutkowski*
> >>>> *Senior CloudStack Developer, SolidFire Inc.*
> >>>> e: mike.tutkowski@solidfire.com
> >>>> o: 303.746.7302
> >>>> Advancing the way the world uses the
> >>>> cloud<http://solidfire.com/solution/overview/?video=play>
> >>>> *™*
> >>>
> >>
> >>
> >>
> >> --
> >> *Mike Tutkowski*
> >> *Senior CloudStack Developer, SolidFire Inc.*
> >> e: mike.tutkowski@solidfire.com
> >> o: 303.746.7302
> >> Advancing the way the world uses the cloud<
> http://solidfire.com/solution/overview/?video=play>
> >> *™*
> >>
> >
> >
> >
> > --
> > *Mike Tutkowski*
> > *Senior CloudStack Developer, SolidFire Inc.*
> > e: mike.tutkowski@solidfire.com
> > o: 303.746.7302
> > Advancing the way the world uses the
> > cloud<http://solidfire.com/solution/overview/?video=play>
> > *™*
>
>


-- 
*Mike Tutkowski*
*Senior CloudStack Developer, SolidFire Inc.*
e: mike.tutkowski@solidfire.com
o: 303.746.7302
Advancing the way the world uses the
cloud<http://solidfire.com/solution/overview/?video=play>
*™*

Re: [DISCUSS] Pluggable VM snapshot related operations?

Posted by "SuichII, Christopher" <Ch...@netapp.com>.
Multivendor snapshotting:
The case with two storage providers is a bit trickier and is one that we are still working on. I believe there are a couple options on the table:

-Give both storage providers the option to take the snapshot and fail if either one fails or cannot take the snapshot
-Give both storage providers the option to take the snapshot and use the hypervisor/default if either one fails or cannot take the snapshot
-Fall back to using the hypervisor/default if the VM has volumes on storage managed by different providers

The only purpose of the hypervisor snapshot is to give storage providers a consistent volume to take their snapshot against. Once that snapshot is taken, the hypervisor snapshot is pushed back into the parent or active VM (essentially removing the fact the hypervisor snapshot ever existed).


Quiescing:
This is something that has been debated a lot. Ultimately, one reason for having drivers perform the quiescing is because we don't know how every storage provider will want to work. As far as I've ever known, any storage provider that wants to create the snapshots themselves will want the VM to be quiesced through the hypervisor. However, there may be some storage provider that has some way of taking snapshots (that we don't know about) that doesn't require the VM to be quiesced. In that case, we wouldn't want them to be forced into having the VM quiesced before they're asked to take the snapshot.


Two snapshot methods:
I believe the main reason for this is that storage drivers may want to take the snapshot differently depending on whether it is a single volume snapshot or an entire VM snapshot. Again, erring on the side of flexibility so that things don't have to change when a new storage provider comes along with different requirements.


-- 
Chris Suich
chris.suich@netapp.com
NetApp Software Engineer
Data Center Platforms – Cloud Solutions
Citrix, Cisco & Red Hat

On Oct 10, 2013, at 1:40 AM, Mike Tutkowski <mi...@solidfire.com> wrote:

> "The work flow will be: createVMSnapshot api -> VMSnapshotManagerImpl:
> creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot -> storage
> driver:takeVMSnapshot"
> 
> I also think it's a bit weird for the storage driver to have any knowledge
> of VM snapshots.
> 
> I would think another part of the system would quiesce (or not) the VM in
> question and then the takeSnapshot method would be called on the driver.
> 
> I might have missed something...why does the driver "care" if the snapshot
> to be taken is going to be in a consistent state or not (I understand why
> the user care, but not the storage driver)? Why is that not a problem for
> some other part of the system that is aware of hypervisor snapshots?
> Shouldn't the driver just take a snapshot (or snapshots) as it is
> instructed to do (regardless of whether or not a VM is quiesced)?
> 
> Basically I'm wondering why we need two "take snapshot" methods on the
> driver.
> 
> 
> On Wed, Oct 9, 2013 at 11:24 PM, Mike Tutkowski <
> mike.tutkowski@solidfire.com> wrote:
> 
>> Yeah, I'm not really clear how the snapshot strategy works if you have
>> multiple vendors that implement that interface either.
>> 
>> 
>> On Wed, Oct 9, 2013 at 10:12 PM, Darren Shepherd <
>> darren.s.shepherd@gmail.com> wrote:
>> 
>>> Edison,
>>> 
>>> I would lean toward doing the coarse grain interface only.  I'm having
>>> a hard time seeing how the whole flow is generic and makes sense for
>>> everyone.  With starting with the coarse grain you have the advantage
>>> in that you avoid possible upfront over engineering/over design that
>>> could wreak havoc down the line.  If you implement the
>>> VMSnapshotStrategy and find that it really is useful to other
>>> implementations, you can then implement the fine grain interface later
>>> to allow others to benefit from it.
>>> 
>>> Darren
>>> 
>>> On Wed, Oct 9, 2013 at 8:54 PM, Mike Tutkowski
>>> <mi...@solidfire.com> wrote:
>>>> Hey guys,
>>>> 
>>>> I haven't been giving this thread much attention, but am reviewing it
>>>> somewhat now.
>>>> 
>>>> I'm not really clear how this would work if, say, a VM has two data
>>> disks
>>>> and they are not being provided by the same vendor.
>>>> 
>>>> Can someone clarify that for me?
>>>> 
>>>> My understanding for how this works today is that it doesn't matter. For
>>>> XenServer, a VDI is on an SR, which could be supported by storage
>>> vendor X.
>>>> Another VDI could be on another SR, supported by storage vendor Y.
>>>> 
>>>> In this case, a new VDI appears on each SR after a hypervisor snapshot.
>>>> 
>>>> Same idea for VMware.
>>>> 
>>>> I don't really know how (or if) this works for KVM.
>>>> 
>>>> I'm not clear how this multi-vendor situation would play out in this
>>>> pluggable approach.
>>>> 
>>>> Thanks!
>>>> 
>>>> 
>>>> On Tue, Oct 8, 2013 at 4:43 PM, Edison Su <Ed...@citrix.com> wrote:
>>>> 
>>>>> 
>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: Darren Shepherd [mailto:darren.s.shepherd@gmail.com]
>>>>>> Sent: Tuesday, October 08, 2013 2:54 PM
>>>>>> To: dev@cloudstack.apache.org
>>>>>> Subject: Re: [DISCUSS] Pluggable VM snapshot related operations?
>>>>>> 
>>>>>> A hypervisor snapshot will snapshot memory also.  So determining
>>> whether
>>>>> The memory is optional for hypervisor vm snapshot, a.k.a, the
>>> "Disk-only
>>>>> snapshots":
>>>>> 
>>> http://support.citrix.com/proddocs/topic/xencenter-61/xs-xc-vms-snapshots-about.html
>>>>> It's supported by both xenserver/kvm/vmware.
>>>>> 
>>>>>> do to the hypervisor snapshot from the quiesce option does not seem
>>>>>> proper.
>>>>>> 
>>>>>> Sorry, for all the questions, I'm trying to get to the point of
>>>>> understand if this
>>>>>> functionality makes sense at this point of code or if maybe their is
>>> a
>>>>> different
>>>>>> approach.  This is what I'm seeing, what if we state it this way
>>>>>> 
>>>>>> 1) VM snapshot, AFAIK, are not backed up today and exist solely on
>>>>> primary.
>>>>>> What if we added a backup phase to VM snapshots that can be
>>> optionally
>>>>>> supported by the storage providers to possibly backup the VM snapshot
>>>>>> volumes.
>>>>> It's not about backup vm snapshot, it's about how to take vm snapshot.
>>>>> Usually, take/revert vm snapshot is handled by hypervisor itself, but
>>> in
>>>>> NetApp(or other storage vendor) case,
>>>>> They want to change the default behavior of hypervisor-base vm
>>> snapshot.
>>>>> 
>>>>> Some examples:
>>>>> 1. take hypervisor based vm snapshots, on primary storage, hypervisor
>>> will
>>>>> maintain the snapshot chain.
>>>>> 2. take vm snapshot through NetApp:
>>>>>     a. first, quiesce VM if user specified. There is no separate API
>>> to
>>>>> quiesce VM on the hypervisor, so here we will
>>>>> take a VM snapshot through hypervisor API call, hypervisor will take
>>>>> volume snapshot  on each volume of the VM. Let's say, on the primary
>>>>> storage, the disk chain looks like:
>>>>>           base-image
>>>>>                    |
>>>>>                    V
>>>>>                Parent disk
>>>>>            /                         \
>>>>>          V                            V
>>>>>        Current disk        snapshot-a
>>>>>     b. from snapshot-a, find out its parent disk, then take snapshot
>>>>> through NetApp
>>>>>     c. un- quiesce VM, here, go to hypervisor, delete snapshot
>>>>> "snapshot-a", hypervisor should be able to consolidate current disk and
>>>>> "parent disk" into one disk, thus from hypervisor point of view
>>>>> , there is always, at most, only one snapshot for the VM.
>>>>>    For revert VM snapshot, as long as the VM is stopped, NetApp can
>>>>> revert the snapshot created on NetApp storage easily, and efficiently.
>>>>>   The benefit of this whole process, as Chris pointed out, if the
>>>>> snapshot chain is quite long, hypervisor based VM snapshot will get
>>>>> performance hit.
>>>>> 
>>>>>> 
>>>>>> 2) Additionally you want to be able to backup multiple disks at once,
>>>>>> regardless of VM snapshot.  Why don't we add the ability to put
>>>>> volumeIds in
>>>>>> snapshot cmd that if the storage provider supports it will get a
>>> batch of
>>>>>> volumeIds.
>>>>>> 
>>>>>> Now I know we talked about 2 and there was some concerns about it
>>> (mostly
>>>>>> from me), but I think we could work through those concerns (forgot
>>> what
>>>>>> they were...).  Right now I just get the feeling we are shoehorning
>>> some
>>>>>> functionality into VM snapshot that isn't quite the right fit.  The
>>> "no
>>>>> quiesce"
>>>>>> flow just doesn't seem to make sense to me.
>>>>> 
>>>>> 
>>>>> Not sure above NetApp proposed work flow makes sense to you or to other
>>>>> body or not. If this work flow is only specific to NetApp, then we
>>> don't
>>>>> need to enforce the whole process for everybody.
>>>>> 
>>>>>> 
>>>>>> Darren
>>>>>> 
>>>>>> On Tue, Oct 8, 2013 at 2:05 PM, SuichII, Christopher
>>>>>> <Ch...@netapp.com> wrote:
>>>>>>> Whether the hypervisor snapshot happens depends on whether the
>>>>>> 'quiesce' option is specified with the snapshot request. If a user
>>>>> doesn't care
>>>>>> about the consistency of their backup, then the hypervisor
>>>>> snapshot/quiesce
>>>>>> step can be skipped altogether. This of course is not the case if the
>>>>> default
>>>>>> provider is being used, in which case a hypervisor snapshot is the
>>> only
>>>>> way of
>>>>>> creating a backup since it can't be offloaded to the storage driver.
>>>>>>> 
>>>>>>> --
>>>>>>> Chris Suich
>>>>>>> chris.suich@netapp.com
>>>>>>> NetApp Software Engineer
>>>>>>> Data Center Platforms - Cloud Solutions Citrix, Cisco & Red Hat
>>>>>>> 
>>>>>>> On Oct 8, 2013, at 4:57 PM, Darren Shepherd
>>>>>>> <da...@gmail.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Who is going to decide whether the hypervisor snapshot should
>>>>>>>> actually happen or not? Or how?
>>>>>>>> 
>>>>>>>> Darren
>>>>>>>> 
>>>>>>>> On Tue, Oct 8, 2013 at 12:38 PM, SuichII, Christopher
>>>>>>>> <Ch...@netapp.com> wrote:
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> Chris Suich
>>>>>>>>> chris.suich@netapp.com
>>>>>>>>> NetApp Software Engineer
>>>>>>>>> Data Center Platforms - Cloud Solutions Citrix, Cisco & Red Hat
>>>>>>>>> 
>>>>>>>>> On Oct 8, 2013, at 2:24 PM, Darren Shepherd
>>>>>> <da...@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>>> So in the implementation, when we say "quiesce" is that actually
>>>>>>>>>> being implemented as a VM snapshot (memory and disk).  And then
>>>>>>>>>> when you say "unquiesce" you are talking about deleting the VM
>>>>>> snapshot?
>>>>>>>>> 
>>>>>>>>> If the VM snapshot is not going to the hypervisor, then yes, it
>>> will
>>>>>> actually be a hypervisor snapshot. Just to be clear, the unquiesce is
>>>>> not quite
>>>>>> a delete - it is a collapse of the VM snapshot and the active VM back
>>>>> into one
>>>>>> file.
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> In NetApp, what are you snapshotting?  The whole netapp volume
>>> (I
>>>>>>>>>> don't know the correct term), a file on NFS, an iscsi volume?  I
>>>>>>>>>> don't know a whole heck of a lot about the netapp snapshot
>>>>>> capabilities.
>>>>>>>>> 
>>>>>>>>> Essentially we are using internal APIs to create file level
>>> backups
>>>>> - don't
>>>>>> worry too much about the terminology.
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> I know storage solutions can snapshot better and faster than
>>>>>>>>>> hypervisors can with COW files.  I've personally just been
>>> always
>>>>>>>>>> perplexed on whats the best way to implement it.  For storage
>>>>>>>>>> solutions that are block based, its really easy to have the
>>> storage
>>>>>>>>>> doing the snapshot.  For shared file systems, like NFS, its
>>> seems
>>>>>>>>>> way more complicated as you don't want to snapshot the entire
>>>>>>>>>> filesystem in order to snapshot one file.
>>>>>>>>> 
>>>>>>>>> With filesystems like NFS, things are certainly more complicated,
>>>>> but that
>>>>>> is taken care of by our controller's operating system, Data ONTAP,
>>> and we
>>>>>> simply use APIs to communicate with it.
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Darren
>>>>>>>>>> 
>>>>>>>>>> On Tue, Oct 8, 2013 at 11:10 AM, SuichII, Christopher
>>>>>>>>>> <Ch...@netapp.com> wrote:
>>>>>>>>>>> I can comment on the second half.
>>>>>>>>>>> 
>>>>>>>>>>> Through storage operations, storage providers can create
>>> backups
>>>>>> much faster than hypervisors and over time, their snapshots are more
>>>>>> efficient than the snapshot chains that hypervisors create. It is
>>> true
>>>>> that a VM
>>>>>> snapshot taken at the storage level is slightly different as it
>>> would be
>>>>> psuedo-
>>>>>> quiesced, not have it's memory snapshotted. This is accomplished
>>> through
>>>>>> hypervisor snapshots:
>>>>>>>>>>> 
>>>>>>>>>>> 1) VM snapshot request (lets say VM 'A'
>>>>>>>>>>> 2) Create hypervisor snapshot (optional) -VM 'A' is
>>> snapshotted,
>>>>>>>>>>> creating active VM 'A*'
>>>>>>>>>>> -All disk traffic now goes to VM 'A*' and A is a snapshot of
>>> 'A*'
>>>>>>>>>>> 3) Storage driver(s) take snapshots of each volume
>>>>>>>>>>> 4) Undo hypervisor snapshot (optional) -VM snapshot 'A' is
>>> rolled
>>>>>>>>>>> back into VM 'A*' so the hypervisor snapshot no longer exists
>>>>>>>>>>> 
>>>>>>>>>>> Now, a couple notes:
>>>>>>>>>>> -The reason this is optional is that not all users necessarily
>>>>> care about
>>>>>> the memory or disk consistency of their VMs and would prefer faster
>>>>>> snapshots to consistency.
>>>>>>>>>>> -Preemptively, yes, we are actually taking hypervisor snapshots
>>>>> which
>>>>>> means there isn't actually a performance of taking storage snapshots
>>> when
>>>>>> quiescing the VM. However, the performance gain will come both during
>>>>>> restoring the VM and during normal operations as described above.
>>>>>>>>>>> 
>>>>>>>>>>> Although you can think of it as a poor man's VM snapshot, I
>>> would
>>>>>> think of it more as a consistent multi-volume snapshot. Again, the
>>>>> difference
>>>>>> being that this snapshot was not truly quiesced like a hypervisor
>>>>> snapshot
>>>>>> would be.
>>>>>>>>>>> 
>>>>>>>>>>> --
>>>>>>>>>>> Chris Suich
>>>>>>>>>>> chris.suich@netapp.com
>>>>>>>>>>> NetApp Software Engineer
>>>>>>>>>>> Data Center Platforms - Cloud Solutions Citrix, Cisco & Red Hat
>>>>>>>>>>> 
>>>>>>>>>>> On Oct 8, 2013, at 1:47 PM, Darren Shepherd
>>>>>> <da...@gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> My only comment is that having the return type as boolean and
>>>>>>>>>>>> using to that indicate quiesce behaviour seems obscure and
>>> will
>>>>>>>>>>>> probably lead to a problem later.  Your basically saying the
>>>>>>>>>>>> result of the takeVMSnapshot will only ever need to
>>> communicate
>>>>>>>>>>>> back whether unquiesce needs to happen.  Maybe some result
>>>>>> object
>>>>>>>>>>>> would be more extensible.
>>>>>>>>>>>> 
>>>>>>>>>>>> Actually, I think I have more comments.  This seems a bit odd
>>> to
>>>>> me.
>>>>>>>>>>>> Why would a storage driver in ACS implement a VM snapshot
>>>>>>>>>>>> functionality?  VM snapshot is a really a hypervisor
>>> orchestrated
>>>>>>>>>>>> operation.  So it seems like were trying to implement a poor
>>> mans
>>>>>>>>>>>> VM snapshot.  Maybe if I understood what NetApp was trying to
>>> do
>>>>>>>>>>>> it would make more sense, but its all odd.  To do a proper VM
>>>>>>>>>>>> snapshot you need to snapshot memory and disk at the exact
>>> same
>>>>>>>>>>>> time.  How are we going to do that if ACS is orchestrating
>>> the VM
>>>>>>>>>>>> snapshot and delegating to storage providers.  Its not like
>>> you
>>>>>>>>>>>> are going to pause the VM.... or are you?
>>>>>>>>>>>> 
>>>>>>>>>>>> Darren
>>>>>>>>>>>> 
>>>>>>>>>>>> On Mon, Oct 7, 2013 at 11:59 AM, Edison Su <
>>> Edison.su@citrix.com>
>>>>>> wrote:
>>>>>>>>>>>>> I created a design document page at
>>>>>> 
>>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Pluggable+VM+s
>>>>>> napshot+related+operations, feel free to add items on it.
>>>>>>>>>>>>> And a new branch "pluggable_vm_snapshot" is created.
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>> From: SuichII, Christopher [mailto:Chris.Suich@netapp.com]
>>>>>>>>>>>>>> Sent: Monday, October 07, 2013 10:02 AM
>>>>>>>>>>>>>> To: <de...@cloudstack.apache.org>
>>>>>>>>>>>>>> Subject: Re: [DISCUSS] Pluggable VM snapshot related
>>> operations?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I'm a fan of option 2 - this gives us the most flexibility
>>> (as
>>>>>>>>>>>>>> you stated). The option is given to completely override the
>>> way
>>>>>>>>>>>>>> VM snapshots work AND storage providers are given to
>>>>>>>>>>>>>> opportunity to work within the default VM snapshot workflow.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I believe this option should satisfy your concern, Mike. The
>>>>>>>>>>>>>> snapshot and quiesce strategy would be in charge of
>>>>>> communicating with the hypervisor.
>>>>>>>>>>>>>> Storage providers should be able to leverage the default
>>>>>>>>>>>>>> strategies and simply perform the storage operations.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I don't think it should be much of an issue that new method
>>> to
>>>>>>>>>>>>>> the storage driver interface may not apply to everyone. In
>>> fact,
>>>>>> that is already the case.
>>>>>>>>>>>>>> Some methods such as un/maintain(), attachToXXX() and
>>>>>>>>>>>>>> takeSnapshot() are already not implemented by every driver -
>>>>>>>>>>>>>> they just return false when asked if they can handle the
>>>>> operation.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Chris Suich
>>>>>>>>>>>>>> chris.suich@netapp.com
>>>>>>>>>>>>>> NetApp Software Engineer
>>>>>>>>>>>>>> Data Center Platforms - Cloud Solutions Citrix, Cisco & Red
>>> Hat
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Oct 5, 2013, at 12:11 AM, Mike Tutkowski
>>>>>>>>>>>>>> <mi...@solidfire.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Well, my first thought on this is that the storage driver
>>>>>>>>>>>>>>> should not be telling the hypervisor to do anything. It
>>> should
>>>>>>>>>>>>>>> be responsible for creating/deleting volumes, snapshots,
>>> etc.
>>>>> on
>>>>>> its storage system only.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Fri, Oct 4, 2013 at 5:57 PM, Edison Su <
>>>>> Edison.su@citrix.com>
>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> In 4.2, we added VM snapshot for Vmware/Xenserver. The
>>>>>>>>>>>>>>>> current workflow will be like the following:
>>>>>>>>>>>>>>>> createVMSnapshot api -> VMSnapshotManagerImpl:
>>>>>>>>>>>>>>>> creatVMSnapshot -> send CreateVMSnapshotCommand to
>>>>>> hypervisor to create vm snapshot.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> If anybody wants to change the workflow, then need to
>>> either
>>>>>>>>>>>>>>>> change VMSnapshotManagerImpl directly or subclass
>>>>>> VMSnapshotManagerImpl.
>>>>>>>>>>>>>>>> Both are not the ideal choice, as VMSnapshotManagerImpl
>>>>>>>>>>>>>>>> should be able to handle different ways to take vm
>>> snapshot,
>>>>>> instead of hard code.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> The requirements for the pluggable VM snapshot coming
>>> from:
>>>>>>>>>>>>>>>> Storage vendor may have their optimization, such as
>>> NetApp.
>>>>>>>>>>>>>>>> VM snapshot can be implemented in a totally different
>>> way(For
>>>>>>>>>>>>>>>> example, I could just send a command to guest VM, to tell
>>> my
>>>>>>>>>>>>>>>> application to flush disk and hold disk write, then come
>>> to
>>>>>>>>>>>>>>>> hypervisor to
>>>>>>>>>>>>>> take a volume snapshot).
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> If we agree on enable pluggable VM snapshot, then we can
>>>>>> move
>>>>>>>>>>>>>>>> on discuss how to implement it.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> The possible options:
>>>>>>>>>>>>>>>> 1. coarse grained interface. Add a VMSnapshotStrategy
>>>>>>>>>>>>>>>> interface, which has the following interfaces:
>>>>>>>>>>>>>>>> VMSnapshot takeVMSnapshot(VMSnapshot vmSnapshot);
>>>>>> Boolean
>>>>>>>>>>>>>>>> revertVMSnapshot(VMSnapshot vmSnapshot); Boolean
>>>>>>>>>>>>>>>> DeleteVMSnapshot(VMSnapshot vmSnapshot);
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> The work flow will be: createVMSnapshot api ->
>>>>>>>>>>>>>> VMSnapshotManagerImpl:
>>>>>>>>>>>>>>>> creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot
>>>>>>>>>>>>>>>> VMSnapshotManagerImpl will manage VM state, do the sanity
>>>>>>>>>>>>>>>> check, then will handle over to VMSnapshotStrategy.
>>>>>>>>>>>>>>>> In VMSnapshotStrategy implementation, it may just send a
>>>>>>>>>>>>>>>> Create/revert/delete VMSnapshotCommand to hypervisor
>>>>>> host, or
>>>>>>>>>>>>>>>> do anything special operations.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 2. fine-grained interface. Not only add a
>>> VMSnapshotStrategy
>>>>>>>>>>>>>>>> interface, but also add certain methods on the storage
>>> driver.
>>>>>>>>>>>>>>>> The VMSnapshotStrategy interface will be the same as
>>> option 1.
>>>>>>>>>>>>>>>> Will add the following methods on storage driver:
>>>>>>>>>>>>>>>> /* volumesBelongToVM  is the list of volumes of the VM
>>> that
>>>>>>>>>>>>>>>> created on this storage, storage vendor can either take
>>> one
>>>>>>>>>>>>>>>> snapshot for this volumes in one shot, or take snapshot
>>> for
>>>>>> each volume separately
>>>>>>>>>>>>>>>>   The pre-condition: vm is unquiesced.
>>>>>>>>>>>>>>>>   It will return a Boolean to indicate, do need
>>> unquiesce vm
>>>>> or
>>>>>> not.
>>>>>>>>>>>>>>>>   In the default storage driver, it will return false.
>>>>>>>>>>>>>>>> */
>>>>>>>>>>>>>>>> boolean takeVMSnapshot(List<VolumeInfo>
>>>>>> volumesBelongToVM,
>>>>>>>>>>>>>>>> VMSnapshot vmSnapshot); Boolean
>>>>>>>>>>>>>>>> revertVMSnapshot(List<VolumeInfo> volumesBelongToVM,
>>>>>>>>>>>>>>>> VMSnapshot vmSnapshot); Boolean
>>>>>>>>>>>>>>>> deleteVMSnapshot(List<VolumeInfo> volumesBelongToVM,
>>>>>>>>>>>>>>>> VMSnapshot vmSNapshot);
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> The work flow will be: createVMSnapshot api ->
>>>>>>>>>>>>>> VMSnapshotManagerImpl:
>>>>>>>>>>>>>>>> creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot ->
>>>>>>>>>>>>>>>> storage driver:takeVMSnapshot In the implementation of
>>>>>>>>>>>>>>>> VMSnapshotStrategy's takeVMSnapshot, the pseudo code
>>>>>> looks like:
>>>>>>>>>>>>>>>>   HypervisorHelper.quiesceVM(vm);
>>>>>>>>>>>>>>>>   val volumes = vm.getVolumes();
>>>>>>>>>>>>>>>>   val maps = new Map[driver, list[VolumeInfo]]();
>>>>>>>>>>>>>>>>   Volumes.foreach(volume => maps.put(volume.getDriver,
>>>>>> volume ::
>>>>>>>>>>>>>>>> maps.get(volume.getdriver())))
>>>>>>>>>>>>>>>>   val needUnquiesce = true;
>>>>>>>>>>>>>>>>    maps.foreach((driver, volumes) => needUnquiesce  =
>>>>>>>>>>>>>>>> needUnquiesce && driver.takeVMSnapshot(volumes))
>>>>>>>>>>>>>>>>  if (needUnquiesce ) {
>>>>>>>>>>>>>>>>   HypervisorHelper.unquiesce(vm); }
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> By default, the quiesceVM in HypervisorHelper will
>>> actually
>>>>>>>>>>>>>>>> take vm snapshot through hypervisor.
>>>>>>>>>>>>>>>> Does above logic makes senesce?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> The pros of option 1 is that: it's simple, no need to
>>> change
>>>>>>>>>>>>>>>> storage driver interfaces. The cons is that each storage
>>>>>>>>>>>>>>>> vendor need to implement a strategy, maybe they will do
>>> the
>>>>>> same thing.
>>>>>>>>>>>>>>>> The pros of option 2 is that, storage driver won't need to
>>>>>>>>>>>>>>>> worry about how to quiesce/unquiesce vm. The cons is
>>> that, it
>>>>>>>>>>>>>>>> will add these methods on each storage drivers, so it
>>> assumes
>>>>>>>>>>>>>>>> that this work flow will work for everybody.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> So which option we should take? Or if you have other
>>> options,
>>>>>>>>>>>>>>>> please let's know.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> *Mike Tutkowski*
>>>>>>>>>>>>>>> *Senior CloudStack Developer, SolidFire Inc.*
>>>>>>>>>>>>>>> e: mike.tutkowski@solidfire.com
>>>>>>>>>>>>>>> o: 303.746.7302
>>>>>>>>>>>>>>> Advancing the way the world uses the
>>>>>>>>>>>>>>> cloud<http://solidfire.com/solution/overview/?video=play>
>>>>>>>>>>>>>>> *(tm)*
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> *Mike Tutkowski*
>>>> *Senior CloudStack Developer, SolidFire Inc.*
>>>> e: mike.tutkowski@solidfire.com
>>>> o: 303.746.7302
>>>> Advancing the way the world uses the
>>>> cloud<http://solidfire.com/solution/overview/?video=play>
>>>> *™*
>>> 
>> 
>> 
>> 
>> --
>> *Mike Tutkowski*
>> *Senior CloudStack Developer, SolidFire Inc.*
>> e: mike.tutkowski@solidfire.com
>> o: 303.746.7302
>> Advancing the way the world uses the cloud<http://solidfire.com/solution/overview/?video=play>
>> *™*
>> 
> 
> 
> 
> -- 
> *Mike Tutkowski*
> *Senior CloudStack Developer, SolidFire Inc.*
> e: mike.tutkowski@solidfire.com
> o: 303.746.7302
> Advancing the way the world uses the
> cloud<http://solidfire.com/solution/overview/?video=play>
> *™*


Re: [DISCUSS] Pluggable VM snapshot related operations?

Posted by Mike Tutkowski <mi...@solidfire.com>.
"The work flow will be: createVMSnapshot api -> VMSnapshotManagerImpl:
creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot -> storage
driver:takeVMSnapshot"

I also think it's a bit weird for the storage driver to have any knowledge
of VM snapshots.

I would think another part of the system would quiesce (or not) the VM in
question and then the takeSnapshot method would be called on the driver.

I might have missed something...why does the driver "care" if the snapshot
to be taken is going to be in a consistent state or not (I understand why
the user care, but not the storage driver)? Why is that not a problem for
some other part of the system that is aware of hypervisor snapshots?
Shouldn't the driver just take a snapshot (or snapshots) as it is
instructed to do (regardless of whether or not a VM is quiesced)?

Basically I'm wondering why we need two "take snapshot" methods on the
driver.


On Wed, Oct 9, 2013 at 11:24 PM, Mike Tutkowski <
mike.tutkowski@solidfire.com> wrote:

> Yeah, I'm not really clear how the snapshot strategy works if you have
> multiple vendors that implement that interface either.
>
>
> On Wed, Oct 9, 2013 at 10:12 PM, Darren Shepherd <
> darren.s.shepherd@gmail.com> wrote:
>
>> Edison,
>>
>> I would lean toward doing the coarse grain interface only.  I'm having
>> a hard time seeing how the whole flow is generic and makes sense for
>> everyone.  With starting with the coarse grain you have the advantage
>> in that you avoid possible upfront over engineering/over design that
>> could wreak havoc down the line.  If you implement the
>> VMSnapshotStrategy and find that it really is useful to other
>> implementations, you can then implement the fine grain interface later
>> to allow others to benefit from it.
>>
>> Darren
>>
>> On Wed, Oct 9, 2013 at 8:54 PM, Mike Tutkowski
>> <mi...@solidfire.com> wrote:
>> > Hey guys,
>> >
>> > I haven't been giving this thread much attention, but am reviewing it
>> > somewhat now.
>> >
>> > I'm not really clear how this would work if, say, a VM has two data
>> disks
>> > and they are not being provided by the same vendor.
>> >
>> > Can someone clarify that for me?
>> >
>> > My understanding for how this works today is that it doesn't matter. For
>> > XenServer, a VDI is on an SR, which could be supported by storage
>> vendor X.
>> > Another VDI could be on another SR, supported by storage vendor Y.
>> >
>> > In this case, a new VDI appears on each SR after a hypervisor snapshot.
>> >
>> > Same idea for VMware.
>> >
>> > I don't really know how (or if) this works for KVM.
>> >
>> > I'm not clear how this multi-vendor situation would play out in this
>> > pluggable approach.
>> >
>> > Thanks!
>> >
>> >
>> > On Tue, Oct 8, 2013 at 4:43 PM, Edison Su <Ed...@citrix.com> wrote:
>> >
>> >>
>> >>
>> >> > -----Original Message-----
>> >> > From: Darren Shepherd [mailto:darren.s.shepherd@gmail.com]
>> >> > Sent: Tuesday, October 08, 2013 2:54 PM
>> >> > To: dev@cloudstack.apache.org
>> >> > Subject: Re: [DISCUSS] Pluggable VM snapshot related operations?
>> >> >
>> >> > A hypervisor snapshot will snapshot memory also.  So determining
>> whether
>> >> The memory is optional for hypervisor vm snapshot, a.k.a, the
>> "Disk-only
>> >> snapshots":
>> >>
>> http://support.citrix.com/proddocs/topic/xencenter-61/xs-xc-vms-snapshots-about.html
>> >> It's supported by both xenserver/kvm/vmware.
>> >>
>> >> > do to the hypervisor snapshot from the quiesce option does not seem
>> >> > proper.
>> >> >
>> >> > Sorry, for all the questions, I'm trying to get to the point of
>> >> understand if this
>> >> > functionality makes sense at this point of code or if maybe their is
>> a
>> >> different
>> >> > approach.  This is what I'm seeing, what if we state it this way
>> >> >
>> >> > 1) VM snapshot, AFAIK, are not backed up today and exist solely on
>> >> primary.
>> >> > What if we added a backup phase to VM snapshots that can be
>> optionally
>> >> > supported by the storage providers to possibly backup the VM snapshot
>> >> > volumes.
>> >> It's not about backup vm snapshot, it's about how to take vm snapshot.
>> >> Usually, take/revert vm snapshot is handled by hypervisor itself, but
>> in
>> >> NetApp(or other storage vendor) case,
>> >> They want to change the default behavior of hypervisor-base vm
>> snapshot.
>> >>
>> >> Some examples:
>> >> 1. take hypervisor based vm snapshots, on primary storage, hypervisor
>> will
>> >> maintain the snapshot chain.
>> >> 2. take vm snapshot through NetApp:
>> >>      a. first, quiesce VM if user specified. There is no separate API
>> to
>> >> quiesce VM on the hypervisor, so here we will
>> >> take a VM snapshot through hypervisor API call, hypervisor will take
>> >> volume snapshot  on each volume of the VM. Let's say, on the primary
>> >> storage, the disk chain looks like:
>> >>            base-image
>> >>                     |
>> >>                     V
>> >>                 Parent disk
>> >>             /                         \
>> >>           V                            V
>> >>         Current disk        snapshot-a
>> >>      b. from snapshot-a, find out its parent disk, then take snapshot
>> >> through NetApp
>> >>      c. un- quiesce VM, here, go to hypervisor, delete snapshot
>> >> "snapshot-a", hypervisor should be able to consolidate current disk and
>> >> "parent disk" into one disk, thus from hypervisor point of view
>> >> , there is always, at most, only one snapshot for the VM.
>> >>     For revert VM snapshot, as long as the VM is stopped, NetApp can
>> >> revert the snapshot created on NetApp storage easily, and efficiently.
>> >>    The benefit of this whole process, as Chris pointed out, if the
>> >> snapshot chain is quite long, hypervisor based VM snapshot will get
>> >> performance hit.
>> >>
>> >> >
>> >> > 2) Additionally you want to be able to backup multiple disks at once,
>> >> > regardless of VM snapshot.  Why don't we add the ability to put
>> >> volumeIds in
>> >> > snapshot cmd that if the storage provider supports it will get a
>> batch of
>> >> > volumeIds.
>> >> >
>> >> > Now I know we talked about 2 and there was some concerns about it
>> (mostly
>> >> > from me), but I think we could work through those concerns (forgot
>> what
>> >> > they were...).  Right now I just get the feeling we are shoehorning
>> some
>> >> > functionality into VM snapshot that isn't quite the right fit.  The
>> "no
>> >> quiesce"
>> >> > flow just doesn't seem to make sense to me.
>> >>
>> >>
>> >> Not sure above NetApp proposed work flow makes sense to you or to other
>> >> body or not. If this work flow is only specific to NetApp, then we
>> don't
>> >> need to enforce the whole process for everybody.
>> >>
>> >> >
>> >> > Darren
>> >> >
>> >> > On Tue, Oct 8, 2013 at 2:05 PM, SuichII, Christopher
>> >> > <Ch...@netapp.com> wrote:
>> >> > > Whether the hypervisor snapshot happens depends on whether the
>> >> > 'quiesce' option is specified with the snapshot request. If a user
>> >> doesn't care
>> >> > about the consistency of their backup, then the hypervisor
>> >> snapshot/quiesce
>> >> > step can be skipped altogether. This of course is not the case if the
>> >> default
>> >> > provider is being used, in which case a hypervisor snapshot is the
>> only
>> >> way of
>> >> > creating a backup since it can't be offloaded to the storage driver.
>> >> > >
>> >> > > --
>> >> > > Chris Suich
>> >> > > chris.suich@netapp.com
>> >> > > NetApp Software Engineer
>> >> > > Data Center Platforms - Cloud Solutions Citrix, Cisco & Red Hat
>> >> > >
>> >> > > On Oct 8, 2013, at 4:57 PM, Darren Shepherd
>> >> > > <da...@gmail.com>
>> >> > >  wrote:
>> >> > >
>> >> > >> Who is going to decide whether the hypervisor snapshot should
>> >> > >> actually happen or not? Or how?
>> >> > >>
>> >> > >> Darren
>> >> > >>
>> >> > >> On Tue, Oct 8, 2013 at 12:38 PM, SuichII, Christopher
>> >> > >> <Ch...@netapp.com> wrote:
>> >> > >>>
>> >> > >>> --
>> >> > >>> Chris Suich
>> >> > >>> chris.suich@netapp.com
>> >> > >>> NetApp Software Engineer
>> >> > >>> Data Center Platforms - Cloud Solutions Citrix, Cisco & Red Hat
>> >> > >>>
>> >> > >>> On Oct 8, 2013, at 2:24 PM, Darren Shepherd
>> >> > <da...@gmail.com> wrote:
>> >> > >>>
>> >> > >>>> So in the implementation, when we say "quiesce" is that actually
>> >> > >>>> being implemented as a VM snapshot (memory and disk).  And then
>> >> > >>>> when you say "unquiesce" you are talking about deleting the VM
>> >> > snapshot?
>> >> > >>>
>> >> > >>> If the VM snapshot is not going to the hypervisor, then yes, it
>> will
>> >> > actually be a hypervisor snapshot. Just to be clear, the unquiesce is
>> >> not quite
>> >> > a delete - it is a collapse of the VM snapshot and the active VM back
>> >> into one
>> >> > file.
>> >> > >>>
>> >> > >>>>
>> >> > >>>> In NetApp, what are you snapshotting?  The whole netapp volume
>> (I
>> >> > >>>> don't know the correct term), a file on NFS, an iscsi volume?  I
>> >> > >>>> don't know a whole heck of a lot about the netapp snapshot
>> >> > capabilities.
>> >> > >>>
>> >> > >>> Essentially we are using internal APIs to create file level
>> backups
>> >> - don't
>> >> > worry too much about the terminology.
>> >> > >>>
>> >> > >>>>
>> >> > >>>> I know storage solutions can snapshot better and faster than
>> >> > >>>> hypervisors can with COW files.  I've personally just been
>> always
>> >> > >>>> perplexed on whats the best way to implement it.  For storage
>> >> > >>>> solutions that are block based, its really easy to have the
>> storage
>> >> > >>>> doing the snapshot.  For shared file systems, like NFS, its
>> seems
>> >> > >>>> way more complicated as you don't want to snapshot the entire
>> >> > >>>> filesystem in order to snapshot one file.
>> >> > >>>
>> >> > >>> With filesystems like NFS, things are certainly more complicated,
>> >> but that
>> >> > is taken care of by our controller's operating system, Data ONTAP,
>> and we
>> >> > simply use APIs to communicate with it.
>> >> > >>>
>> >> > >>>>
>> >> > >>>> Darren
>> >> > >>>>
>> >> > >>>> On Tue, Oct 8, 2013 at 11:10 AM, SuichII, Christopher
>> >> > >>>> <Ch...@netapp.com> wrote:
>> >> > >>>>> I can comment on the second half.
>> >> > >>>>>
>> >> > >>>>> Through storage operations, storage providers can create
>> backups
>> >> > much faster than hypervisors and over time, their snapshots are more
>> >> > efficient than the snapshot chains that hypervisors create. It is
>> true
>> >> that a VM
>> >> > snapshot taken at the storage level is slightly different as it
>> would be
>> >> psuedo-
>> >> > quiesced, not have it's memory snapshotted. This is accomplished
>> through
>> >> > hypervisor snapshots:
>> >> > >>>>>
>> >> > >>>>> 1) VM snapshot request (lets say VM 'A'
>> >> > >>>>> 2) Create hypervisor snapshot (optional) -VM 'A' is
>> snapshotted,
>> >> > >>>>> creating active VM 'A*'
>> >> > >>>>> -All disk traffic now goes to VM 'A*' and A is a snapshot of
>> 'A*'
>> >> > >>>>> 3) Storage driver(s) take snapshots of each volume
>> >> > >>>>> 4) Undo hypervisor snapshot (optional) -VM snapshot 'A' is
>> rolled
>> >> > >>>>> back into VM 'A*' so the hypervisor snapshot no longer exists
>> >> > >>>>>
>> >> > >>>>> Now, a couple notes:
>> >> > >>>>> -The reason this is optional is that not all users necessarily
>> >> care about
>> >> > the memory or disk consistency of their VMs and would prefer faster
>> >> > snapshots to consistency.
>> >> > >>>>> -Preemptively, yes, we are actually taking hypervisor snapshots
>> >> which
>> >> > means there isn't actually a performance of taking storage snapshots
>> when
>> >> > quiescing the VM. However, the performance gain will come both during
>> >> > restoring the VM and during normal operations as described above.
>> >> > >>>>>
>> >> > >>>>> Although you can think of it as a poor man's VM snapshot, I
>> would
>> >> > think of it more as a consistent multi-volume snapshot. Again, the
>> >> difference
>> >> > being that this snapshot was not truly quiesced like a hypervisor
>> >> snapshot
>> >> > would be.
>> >> > >>>>>
>> >> > >>>>> --
>> >> > >>>>> Chris Suich
>> >> > >>>>> chris.suich@netapp.com
>> >> > >>>>> NetApp Software Engineer
>> >> > >>>>> Data Center Platforms - Cloud Solutions Citrix, Cisco & Red Hat
>> >> > >>>>>
>> >> > >>>>> On Oct 8, 2013, at 1:47 PM, Darren Shepherd
>> >> > <da...@gmail.com> wrote:
>> >> > >>>>>
>> >> > >>>>>> My only comment is that having the return type as boolean and
>> >> > >>>>>> using to that indicate quiesce behaviour seems obscure and
>> will
>> >> > >>>>>> probably lead to a problem later.  Your basically saying the
>> >> > >>>>>> result of the takeVMSnapshot will only ever need to
>> communicate
>> >> > >>>>>> back whether unquiesce needs to happen.  Maybe some result
>> >> > object
>> >> > >>>>>> would be more extensible.
>> >> > >>>>>>
>> >> > >>>>>> Actually, I think I have more comments.  This seems a bit odd
>> to
>> >> me.
>> >> > >>>>>> Why would a storage driver in ACS implement a VM snapshot
>> >> > >>>>>> functionality?  VM snapshot is a really a hypervisor
>> orchestrated
>> >> > >>>>>> operation.  So it seems like were trying to implement a poor
>> mans
>> >> > >>>>>> VM snapshot.  Maybe if I understood what NetApp was trying to
>> do
>> >> > >>>>>> it would make more sense, but its all odd.  To do a proper VM
>> >> > >>>>>> snapshot you need to snapshot memory and disk at the exact
>> same
>> >> > >>>>>> time.  How are we going to do that if ACS is orchestrating
>> the VM
>> >> > >>>>>> snapshot and delegating to storage providers.  Its not like
>> you
>> >> > >>>>>> are going to pause the VM.... or are you?
>> >> > >>>>>>
>> >> > >>>>>> Darren
>> >> > >>>>>>
>> >> > >>>>>> On Mon, Oct 7, 2013 at 11:59 AM, Edison Su <
>> Edison.su@citrix.com>
>> >> > wrote:
>> >> > >>>>>>> I created a design document page at
>> >> >
>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Pluggable+VM+s
>> >> > napshot+related+operations, feel free to add items on it.
>> >> > >>>>>>> And a new branch "pluggable_vm_snapshot" is created.
>> >> > >>>>>>>
>> >> > >>>>>>>> -----Original Message-----
>> >> > >>>>>>>> From: SuichII, Christopher [mailto:Chris.Suich@netapp.com]
>> >> > >>>>>>>> Sent: Monday, October 07, 2013 10:02 AM
>> >> > >>>>>>>> To: <de...@cloudstack.apache.org>
>> >> > >>>>>>>> Subject: Re: [DISCUSS] Pluggable VM snapshot related
>> operations?
>> >> > >>>>>>>>
>> >> > >>>>>>>> I'm a fan of option 2 - this gives us the most flexibility
>> (as
>> >> > >>>>>>>> you stated). The option is given to completely override the
>> way
>> >> > >>>>>>>> VM snapshots work AND storage providers are given to
>> >> > >>>>>>>> opportunity to work within the default VM snapshot workflow.
>> >> > >>>>>>>>
>> >> > >>>>>>>> I believe this option should satisfy your concern, Mike. The
>> >> > >>>>>>>> snapshot and quiesce strategy would be in charge of
>> >> > communicating with the hypervisor.
>> >> > >>>>>>>> Storage providers should be able to leverage the default
>> >> > >>>>>>>> strategies and simply perform the storage operations.
>> >> > >>>>>>>>
>> >> > >>>>>>>> I don't think it should be much of an issue that new method
>> to
>> >> > >>>>>>>> the storage driver interface may not apply to everyone. In
>> fact,
>> >> > that is already the case.
>> >> > >>>>>>>> Some methods such as un/maintain(), attachToXXX() and
>> >> > >>>>>>>> takeSnapshot() are already not implemented by every driver -
>> >> > >>>>>>>> they just return false when asked if they can handle the
>> >> operation.
>> >> > >>>>>>>>
>> >> > >>>>>>>> --
>> >> > >>>>>>>> Chris Suich
>> >> > >>>>>>>> chris.suich@netapp.com
>> >> > >>>>>>>> NetApp Software Engineer
>> >> > >>>>>>>> Data Center Platforms - Cloud Solutions Citrix, Cisco & Red
>> Hat
>> >> > >>>>>>>>
>> >> > >>>>>>>> On Oct 5, 2013, at 12:11 AM, Mike Tutkowski
>> >> > >>>>>>>> <mi...@solidfire.com>
>> >> > >>>>>>>> wrote:
>> >> > >>>>>>>>
>> >> > >>>>>>>>> Well, my first thought on this is that the storage driver
>> >> > >>>>>>>>> should not be telling the hypervisor to do anything. It
>> should
>> >> > >>>>>>>>> be responsible for creating/deleting volumes, snapshots,
>> etc.
>> >> on
>> >> > its storage system only.
>> >> > >>>>>>>>>
>> >> > >>>>>>>>>
>> >> > >>>>>>>>> On Fri, Oct 4, 2013 at 5:57 PM, Edison Su <
>> >> Edison.su@citrix.com>
>> >> > wrote:
>> >> > >>>>>>>>>
>> >> > >>>>>>>>>> In 4.2, we added VM snapshot for Vmware/Xenserver. The
>> >> > >>>>>>>>>> current workflow will be like the following:
>> >> > >>>>>>>>>> createVMSnapshot api -> VMSnapshotManagerImpl:
>> >> > >>>>>>>>>> creatVMSnapshot -> send CreateVMSnapshotCommand to
>> >> > hypervisor to create vm snapshot.
>> >> > >>>>>>>>>>
>> >> > >>>>>>>>>> If anybody wants to change the workflow, then need to
>> either
>> >> > >>>>>>>>>> change VMSnapshotManagerImpl directly or subclass
>> >> > VMSnapshotManagerImpl.
>> >> > >>>>>>>>>> Both are not the ideal choice, as VMSnapshotManagerImpl
>> >> > >>>>>>>>>> should be able to handle different ways to take vm
>> snapshot,
>> >> > instead of hard code.
>> >> > >>>>>>>>>>
>> >> > >>>>>>>>>> The requirements for the pluggable VM snapshot coming
>> from:
>> >> > >>>>>>>>>> Storage vendor may have their optimization, such as
>> NetApp.
>> >> > >>>>>>>>>> VM snapshot can be implemented in a totally different
>> way(For
>> >> > >>>>>>>>>> example, I could just send a command to guest VM, to tell
>> my
>> >> > >>>>>>>>>> application to flush disk and hold disk write, then come
>> to
>> >> > >>>>>>>>>> hypervisor to
>> >> > >>>>>>>> take a volume snapshot).
>> >> > >>>>>>>>>>
>> >> > >>>>>>>>>> If we agree on enable pluggable VM snapshot, then we can
>> >> > move
>> >> > >>>>>>>>>> on discuss how to implement it.
>> >> > >>>>>>>>>>
>> >> > >>>>>>>>>> The possible options:
>> >> > >>>>>>>>>> 1. coarse grained interface. Add a VMSnapshotStrategy
>> >> > >>>>>>>>>> interface, which has the following interfaces:
>> >> > >>>>>>>>>> VMSnapshot takeVMSnapshot(VMSnapshot vmSnapshot);
>> >> > Boolean
>> >> > >>>>>>>>>> revertVMSnapshot(VMSnapshot vmSnapshot); Boolean
>> >> > >>>>>>>>>> DeleteVMSnapshot(VMSnapshot vmSnapshot);
>> >> > >>>>>>>>>>
>> >> > >>>>>>>>>> The work flow will be: createVMSnapshot api ->
>> >> > >>>>>>>> VMSnapshotManagerImpl:
>> >> > >>>>>>>>>> creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot
>> >> > >>>>>>>>>> VMSnapshotManagerImpl will manage VM state, do the sanity
>> >> > >>>>>>>>>> check, then will handle over to VMSnapshotStrategy.
>> >> > >>>>>>>>>> In VMSnapshotStrategy implementation, it may just send a
>> >> > >>>>>>>>>> Create/revert/delete VMSnapshotCommand to hypervisor
>> >> > host, or
>> >> > >>>>>>>>>> do anything special operations.
>> >> > >>>>>>>>>>
>> >> > >>>>>>>>>> 2. fine-grained interface. Not only add a
>> VMSnapshotStrategy
>> >> > >>>>>>>>>> interface, but also add certain methods on the storage
>> driver.
>> >> > >>>>>>>>>> The VMSnapshotStrategy interface will be the same as
>> option 1.
>> >> > >>>>>>>>>> Will add the following methods on storage driver:
>> >> > >>>>>>>>>> /* volumesBelongToVM  is the list of volumes of the VM
>> that
>> >> > >>>>>>>>>> created on this storage, storage vendor can either take
>> one
>> >> > >>>>>>>>>> snapshot for this volumes in one shot, or take snapshot
>> for
>> >> > each volume separately
>> >> > >>>>>>>>>>    The pre-condition: vm is unquiesced.
>> >> > >>>>>>>>>>    It will return a Boolean to indicate, do need
>> unquiesce vm
>> >> or
>> >> > not.
>> >> > >>>>>>>>>>    In the default storage driver, it will return false.
>> >> > >>>>>>>>>> */
>> >> > >>>>>>>>>> boolean takeVMSnapshot(List<VolumeInfo>
>> >> > volumesBelongToVM,
>> >> > >>>>>>>>>> VMSnapshot vmSnapshot); Boolean
>> >> > >>>>>>>>>> revertVMSnapshot(List<VolumeInfo> volumesBelongToVM,
>> >> > >>>>>>>>>> VMSnapshot vmSnapshot); Boolean
>> >> > >>>>>>>>>> deleteVMSnapshot(List<VolumeInfo> volumesBelongToVM,
>> >> > >>>>>>>>>> VMSnapshot vmSNapshot);
>> >> > >>>>>>>>>>
>> >> > >>>>>>>>>> The work flow will be: createVMSnapshot api ->
>> >> > >>>>>>>> VMSnapshotManagerImpl:
>> >> > >>>>>>>>>> creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot ->
>> >> > >>>>>>>>>> storage driver:takeVMSnapshot In the implementation of
>> >> > >>>>>>>>>> VMSnapshotStrategy's takeVMSnapshot, the pseudo code
>> >> > looks like:
>> >> > >>>>>>>>>>    HypervisorHelper.quiesceVM(vm);
>> >> > >>>>>>>>>>    val volumes = vm.getVolumes();
>> >> > >>>>>>>>>>    val maps = new Map[driver, list[VolumeInfo]]();
>> >> > >>>>>>>>>>    Volumes.foreach(volume => maps.put(volume.getDriver,
>> >> > volume ::
>> >> > >>>>>>>>>> maps.get(volume.getdriver())))
>> >> > >>>>>>>>>>    val needUnquiesce = true;
>> >> > >>>>>>>>>>     maps.foreach((driver, volumes) => needUnquiesce  =
>> >> > >>>>>>>>>> needUnquiesce && driver.takeVMSnapshot(volumes))
>> >> > >>>>>>>>>>   if (needUnquiesce ) {
>> >> > >>>>>>>>>>    HypervisorHelper.unquiesce(vm); }
>> >> > >>>>>>>>>>
>> >> > >>>>>>>>>> By default, the quiesceVM in HypervisorHelper will
>> actually
>> >> > >>>>>>>>>> take vm snapshot through hypervisor.
>> >> > >>>>>>>>>> Does above logic makes senesce?
>> >> > >>>>>>>>>>
>> >> > >>>>>>>>>> The pros of option 1 is that: it's simple, no need to
>> change
>> >> > >>>>>>>>>> storage driver interfaces. The cons is that each storage
>> >> > >>>>>>>>>> vendor need to implement a strategy, maybe they will do
>> the
>> >> > same thing.
>> >> > >>>>>>>>>> The pros of option 2 is that, storage driver won't need to
>> >> > >>>>>>>>>> worry about how to quiesce/unquiesce vm. The cons is
>> that, it
>> >> > >>>>>>>>>> will add these methods on each storage drivers, so it
>> assumes
>> >> > >>>>>>>>>> that this work flow will work for everybody.
>> >> > >>>>>>>>>>
>> >> > >>>>>>>>>> So which option we should take? Or if you have other
>> options,
>> >> > >>>>>>>>>> please let's know.
>> >> > >>>>>>>>>>
>> >> > >>>>>>>>>>
>> >> > >>>>>>>>>>
>> >> > >>>>>>>>>>
>> >> > >>>>>>>>>>
>> >> > >>>>>>>>>>
>> >> > >>>>>>>>>
>> >> > >>>>>>>>>
>> >> > >>>>>>>>> --
>> >> > >>>>>>>>> *Mike Tutkowski*
>> >> > >>>>>>>>> *Senior CloudStack Developer, SolidFire Inc.*
>> >> > >>>>>>>>> e: mike.tutkowski@solidfire.com
>> >> > >>>>>>>>> o: 303.746.7302
>> >> > >>>>>>>>> Advancing the way the world uses the
>> >> > >>>>>>>>> cloud<http://solidfire.com/solution/overview/?video=play>
>> >> > >>>>>>>>> *(tm)*
>> >> > >>>>>>>
>> >> > >>>>>
>> >> > >>>
>> >> > >
>> >>
>> >
>> >
>> >
>> > --
>> > *Mike Tutkowski*
>> > *Senior CloudStack Developer, SolidFire Inc.*
>> > e: mike.tutkowski@solidfire.com
>> > o: 303.746.7302
>> > Advancing the way the world uses the
>> > cloud<http://solidfire.com/solution/overview/?video=play>
>> > *™*
>>
>
>
>
> --
> *Mike Tutkowski*
> *Senior CloudStack Developer, SolidFire Inc.*
> e: mike.tutkowski@solidfire.com
> o: 303.746.7302
> Advancing the way the world uses the cloud<http://solidfire.com/solution/overview/?video=play>
> *™*
>



-- 
*Mike Tutkowski*
*Senior CloudStack Developer, SolidFire Inc.*
e: mike.tutkowski@solidfire.com
o: 303.746.7302
Advancing the way the world uses the
cloud<http://solidfire.com/solution/overview/?video=play>
*™*

Re: [DISCUSS] Pluggable VM snapshot related operations?

Posted by Mike Tutkowski <mi...@solidfire.com>.
Yeah, I'm not really clear how the snapshot strategy works if you have
multiple vendors that implement that interface either.


On Wed, Oct 9, 2013 at 10:12 PM, Darren Shepherd <
darren.s.shepherd@gmail.com> wrote:

> Edison,
>
> I would lean toward doing the coarse grain interface only.  I'm having
> a hard time seeing how the whole flow is generic and makes sense for
> everyone.  With starting with the coarse grain you have the advantage
> in that you avoid possible upfront over engineering/over design that
> could wreak havoc down the line.  If you implement the
> VMSnapshotStrategy and find that it really is useful to other
> implementations, you can then implement the fine grain interface later
> to allow others to benefit from it.
>
> Darren
>
> On Wed, Oct 9, 2013 at 8:54 PM, Mike Tutkowski
> <mi...@solidfire.com> wrote:
> > Hey guys,
> >
> > I haven't been giving this thread much attention, but am reviewing it
> > somewhat now.
> >
> > I'm not really clear how this would work if, say, a VM has two data disks
> > and they are not being provided by the same vendor.
> >
> > Can someone clarify that for me?
> >
> > My understanding for how this works today is that it doesn't matter. For
> > XenServer, a VDI is on an SR, which could be supported by storage vendor
> X.
> > Another VDI could be on another SR, supported by storage vendor Y.
> >
> > In this case, a new VDI appears on each SR after a hypervisor snapshot.
> >
> > Same idea for VMware.
> >
> > I don't really know how (or if) this works for KVM.
> >
> > I'm not clear how this multi-vendor situation would play out in this
> > pluggable approach.
> >
> > Thanks!
> >
> >
> > On Tue, Oct 8, 2013 at 4:43 PM, Edison Su <Ed...@citrix.com> wrote:
> >
> >>
> >>
> >> > -----Original Message-----
> >> > From: Darren Shepherd [mailto:darren.s.shepherd@gmail.com]
> >> > Sent: Tuesday, October 08, 2013 2:54 PM
> >> > To: dev@cloudstack.apache.org
> >> > Subject: Re: [DISCUSS] Pluggable VM snapshot related operations?
> >> >
> >> > A hypervisor snapshot will snapshot memory also.  So determining
> whether
> >> The memory is optional for hypervisor vm snapshot, a.k.a, the "Disk-only
> >> snapshots":
> >>
> http://support.citrix.com/proddocs/topic/xencenter-61/xs-xc-vms-snapshots-about.html
> >> It's supported by both xenserver/kvm/vmware.
> >>
> >> > do to the hypervisor snapshot from the quiesce option does not seem
> >> > proper.
> >> >
> >> > Sorry, for all the questions, I'm trying to get to the point of
> >> understand if this
> >> > functionality makes sense at this point of code or if maybe their is a
> >> different
> >> > approach.  This is what I'm seeing, what if we state it this way
> >> >
> >> > 1) VM snapshot, AFAIK, are not backed up today and exist solely on
> >> primary.
> >> > What if we added a backup phase to VM snapshots that can be optionally
> >> > supported by the storage providers to possibly backup the VM snapshot
> >> > volumes.
> >> It's not about backup vm snapshot, it's about how to take vm snapshot.
> >> Usually, take/revert vm snapshot is handled by hypervisor itself, but in
> >> NetApp(or other storage vendor) case,
> >> They want to change the default behavior of hypervisor-base vm snapshot.
> >>
> >> Some examples:
> >> 1. take hypervisor based vm snapshots, on primary storage, hypervisor
> will
> >> maintain the snapshot chain.
> >> 2. take vm snapshot through NetApp:
> >>      a. first, quiesce VM if user specified. There is no separate API to
> >> quiesce VM on the hypervisor, so here we will
> >> take a VM snapshot through hypervisor API call, hypervisor will take
> >> volume snapshot  on each volume of the VM. Let's say, on the primary
> >> storage, the disk chain looks like:
> >>            base-image
> >>                     |
> >>                     V
> >>                 Parent disk
> >>             /                         \
> >>           V                            V
> >>         Current disk        snapshot-a
> >>      b. from snapshot-a, find out its parent disk, then take snapshot
> >> through NetApp
> >>      c. un- quiesce VM, here, go to hypervisor, delete snapshot
> >> "snapshot-a", hypervisor should be able to consolidate current disk and
> >> "parent disk" into one disk, thus from hypervisor point of view
> >> , there is always, at most, only one snapshot for the VM.
> >>     For revert VM snapshot, as long as the VM is stopped, NetApp can
> >> revert the snapshot created on NetApp storage easily, and efficiently.
> >>    The benefit of this whole process, as Chris pointed out, if the
> >> snapshot chain is quite long, hypervisor based VM snapshot will get
> >> performance hit.
> >>
> >> >
> >> > 2) Additionally you want to be able to backup multiple disks at once,
> >> > regardless of VM snapshot.  Why don't we add the ability to put
> >> volumeIds in
> >> > snapshot cmd that if the storage provider supports it will get a
> batch of
> >> > volumeIds.
> >> >
> >> > Now I know we talked about 2 and there was some concerns about it
> (mostly
> >> > from me), but I think we could work through those concerns (forgot
> what
> >> > they were...).  Right now I just get the feeling we are shoehorning
> some
> >> > functionality into VM snapshot that isn't quite the right fit.  The
> "no
> >> quiesce"
> >> > flow just doesn't seem to make sense to me.
> >>
> >>
> >> Not sure above NetApp proposed work flow makes sense to you or to other
> >> body or not. If this work flow is only specific to NetApp, then we don't
> >> need to enforce the whole process for everybody.
> >>
> >> >
> >> > Darren
> >> >
> >> > On Tue, Oct 8, 2013 at 2:05 PM, SuichII, Christopher
> >> > <Ch...@netapp.com> wrote:
> >> > > Whether the hypervisor snapshot happens depends on whether the
> >> > 'quiesce' option is specified with the snapshot request. If a user
> >> doesn't care
> >> > about the consistency of their backup, then the hypervisor
> >> snapshot/quiesce
> >> > step can be skipped altogether. This of course is not the case if the
> >> default
> >> > provider is being used, in which case a hypervisor snapshot is the
> only
> >> way of
> >> > creating a backup since it can't be offloaded to the storage driver.
> >> > >
> >> > > --
> >> > > Chris Suich
> >> > > chris.suich@netapp.com
> >> > > NetApp Software Engineer
> >> > > Data Center Platforms - Cloud Solutions Citrix, Cisco & Red Hat
> >> > >
> >> > > On Oct 8, 2013, at 4:57 PM, Darren Shepherd
> >> > > <da...@gmail.com>
> >> > >  wrote:
> >> > >
> >> > >> Who is going to decide whether the hypervisor snapshot should
> >> > >> actually happen or not? Or how?
> >> > >>
> >> > >> Darren
> >> > >>
> >> > >> On Tue, Oct 8, 2013 at 12:38 PM, SuichII, Christopher
> >> > >> <Ch...@netapp.com> wrote:
> >> > >>>
> >> > >>> --
> >> > >>> Chris Suich
> >> > >>> chris.suich@netapp.com
> >> > >>> NetApp Software Engineer
> >> > >>> Data Center Platforms - Cloud Solutions Citrix, Cisco & Red Hat
> >> > >>>
> >> > >>> On Oct 8, 2013, at 2:24 PM, Darren Shepherd
> >> > <da...@gmail.com> wrote:
> >> > >>>
> >> > >>>> So in the implementation, when we say "quiesce" is that actually
> >> > >>>> being implemented as a VM snapshot (memory and disk).  And then
> >> > >>>> when you say "unquiesce" you are talking about deleting the VM
> >> > snapshot?
> >> > >>>
> >> > >>> If the VM snapshot is not going to the hypervisor, then yes, it
> will
> >> > actually be a hypervisor snapshot. Just to be clear, the unquiesce is
> >> not quite
> >> > a delete - it is a collapse of the VM snapshot and the active VM back
> >> into one
> >> > file.
> >> > >>>
> >> > >>>>
> >> > >>>> In NetApp, what are you snapshotting?  The whole netapp volume (I
> >> > >>>> don't know the correct term), a file on NFS, an iscsi volume?  I
> >> > >>>> don't know a whole heck of a lot about the netapp snapshot
> >> > capabilities.
> >> > >>>
> >> > >>> Essentially we are using internal APIs to create file level
> backups
> >> - don't
> >> > worry too much about the terminology.
> >> > >>>
> >> > >>>>
> >> > >>>> I know storage solutions can snapshot better and faster than
> >> > >>>> hypervisors can with COW files.  I've personally just been always
> >> > >>>> perplexed on whats the best way to implement it.  For storage
> >> > >>>> solutions that are block based, its really easy to have the
> storage
> >> > >>>> doing the snapshot.  For shared file systems, like NFS, its seems
> >> > >>>> way more complicated as you don't want to snapshot the entire
> >> > >>>> filesystem in order to snapshot one file.
> >> > >>>
> >> > >>> With filesystems like NFS, things are certainly more complicated,
> >> but that
> >> > is taken care of by our controller's operating system, Data ONTAP,
> and we
> >> > simply use APIs to communicate with it.
> >> > >>>
> >> > >>>>
> >> > >>>> Darren
> >> > >>>>
> >> > >>>> On Tue, Oct 8, 2013 at 11:10 AM, SuichII, Christopher
> >> > >>>> <Ch...@netapp.com> wrote:
> >> > >>>>> I can comment on the second half.
> >> > >>>>>
> >> > >>>>> Through storage operations, storage providers can create backups
> >> > much faster than hypervisors and over time, their snapshots are more
> >> > efficient than the snapshot chains that hypervisors create. It is true
> >> that a VM
> >> > snapshot taken at the storage level is slightly different as it would
> be
> >> psuedo-
> >> > quiesced, not have it's memory snapshotted. This is accomplished
> through
> >> > hypervisor snapshots:
> >> > >>>>>
> >> > >>>>> 1) VM snapshot request (lets say VM 'A'
> >> > >>>>> 2) Create hypervisor snapshot (optional) -VM 'A' is snapshotted,
> >> > >>>>> creating active VM 'A*'
> >> > >>>>> -All disk traffic now goes to VM 'A*' and A is a snapshot of
> 'A*'
> >> > >>>>> 3) Storage driver(s) take snapshots of each volume
> >> > >>>>> 4) Undo hypervisor snapshot (optional) -VM snapshot 'A' is
> rolled
> >> > >>>>> back into VM 'A*' so the hypervisor snapshot no longer exists
> >> > >>>>>
> >> > >>>>> Now, a couple notes:
> >> > >>>>> -The reason this is optional is that not all users necessarily
> >> care about
> >> > the memory or disk consistency of their VMs and would prefer faster
> >> > snapshots to consistency.
> >> > >>>>> -Preemptively, yes, we are actually taking hypervisor snapshots
> >> which
> >> > means there isn't actually a performance of taking storage snapshots
> when
> >> > quiescing the VM. However, the performance gain will come both during
> >> > restoring the VM and during normal operations as described above.
> >> > >>>>>
> >> > >>>>> Although you can think of it as a poor man's VM snapshot, I
> would
> >> > think of it more as a consistent multi-volume snapshot. Again, the
> >> difference
> >> > being that this snapshot was not truly quiesced like a hypervisor
> >> snapshot
> >> > would be.
> >> > >>>>>
> >> > >>>>> --
> >> > >>>>> Chris Suich
> >> > >>>>> chris.suich@netapp.com
> >> > >>>>> NetApp Software Engineer
> >> > >>>>> Data Center Platforms - Cloud Solutions Citrix, Cisco & Red Hat
> >> > >>>>>
> >> > >>>>> On Oct 8, 2013, at 1:47 PM, Darren Shepherd
> >> > <da...@gmail.com> wrote:
> >> > >>>>>
> >> > >>>>>> My only comment is that having the return type as boolean and
> >> > >>>>>> using to that indicate quiesce behaviour seems obscure and will
> >> > >>>>>> probably lead to a problem later.  Your basically saying the
> >> > >>>>>> result of the takeVMSnapshot will only ever need to communicate
> >> > >>>>>> back whether unquiesce needs to happen.  Maybe some result
> >> > object
> >> > >>>>>> would be more extensible.
> >> > >>>>>>
> >> > >>>>>> Actually, I think I have more comments.  This seems a bit odd
> to
> >> me.
> >> > >>>>>> Why would a storage driver in ACS implement a VM snapshot
> >> > >>>>>> functionality?  VM snapshot is a really a hypervisor
> orchestrated
> >> > >>>>>> operation.  So it seems like were trying to implement a poor
> mans
> >> > >>>>>> VM snapshot.  Maybe if I understood what NetApp was trying to
> do
> >> > >>>>>> it would make more sense, but its all odd.  To do a proper VM
> >> > >>>>>> snapshot you need to snapshot memory and disk at the exact same
> >> > >>>>>> time.  How are we going to do that if ACS is orchestrating the
> VM
> >> > >>>>>> snapshot and delegating to storage providers.  Its not like you
> >> > >>>>>> are going to pause the VM.... or are you?
> >> > >>>>>>
> >> > >>>>>> Darren
> >> > >>>>>>
> >> > >>>>>> On Mon, Oct 7, 2013 at 11:59 AM, Edison Su <
> Edison.su@citrix.com>
> >> > wrote:
> >> > >>>>>>> I created a design document page at
> >> > https://cwiki.apache.org/confluence/display/CLOUDSTACK/Pluggable+VM+s
> >> > napshot+related+operations, feel free to add items on it.
> >> > >>>>>>> And a new branch "pluggable_vm_snapshot" is created.
> >> > >>>>>>>
> >> > >>>>>>>> -----Original Message-----
> >> > >>>>>>>> From: SuichII, Christopher [mailto:Chris.Suich@netapp.com]
> >> > >>>>>>>> Sent: Monday, October 07, 2013 10:02 AM
> >> > >>>>>>>> To: <de...@cloudstack.apache.org>
> >> > >>>>>>>> Subject: Re: [DISCUSS] Pluggable VM snapshot related
> operations?
> >> > >>>>>>>>
> >> > >>>>>>>> I'm a fan of option 2 - this gives us the most flexibility
> (as
> >> > >>>>>>>> you stated). The option is given to completely override the
> way
> >> > >>>>>>>> VM snapshots work AND storage providers are given to
> >> > >>>>>>>> opportunity to work within the default VM snapshot workflow.
> >> > >>>>>>>>
> >> > >>>>>>>> I believe this option should satisfy your concern, Mike. The
> >> > >>>>>>>> snapshot and quiesce strategy would be in charge of
> >> > communicating with the hypervisor.
> >> > >>>>>>>> Storage providers should be able to leverage the default
> >> > >>>>>>>> strategies and simply perform the storage operations.
> >> > >>>>>>>>
> >> > >>>>>>>> I don't think it should be much of an issue that new method
> to
> >> > >>>>>>>> the storage driver interface may not apply to everyone. In
> fact,
> >> > that is already the case.
> >> > >>>>>>>> Some methods such as un/maintain(), attachToXXX() and
> >> > >>>>>>>> takeSnapshot() are already not implemented by every driver -
> >> > >>>>>>>> they just return false when asked if they can handle the
> >> operation.
> >> > >>>>>>>>
> >> > >>>>>>>> --
> >> > >>>>>>>> Chris Suich
> >> > >>>>>>>> chris.suich@netapp.com
> >> > >>>>>>>> NetApp Software Engineer
> >> > >>>>>>>> Data Center Platforms - Cloud Solutions Citrix, Cisco & Red
> Hat
> >> > >>>>>>>>
> >> > >>>>>>>> On Oct 5, 2013, at 12:11 AM, Mike Tutkowski
> >> > >>>>>>>> <mi...@solidfire.com>
> >> > >>>>>>>> wrote:
> >> > >>>>>>>>
> >> > >>>>>>>>> Well, my first thought on this is that the storage driver
> >> > >>>>>>>>> should not be telling the hypervisor to do anything. It
> should
> >> > >>>>>>>>> be responsible for creating/deleting volumes, snapshots,
> etc.
> >> on
> >> > its storage system only.
> >> > >>>>>>>>>
> >> > >>>>>>>>>
> >> > >>>>>>>>> On Fri, Oct 4, 2013 at 5:57 PM, Edison Su <
> >> Edison.su@citrix.com>
> >> > wrote:
> >> > >>>>>>>>>
> >> > >>>>>>>>>> In 4.2, we added VM snapshot for Vmware/Xenserver. The
> >> > >>>>>>>>>> current workflow will be like the following:
> >> > >>>>>>>>>> createVMSnapshot api -> VMSnapshotManagerImpl:
> >> > >>>>>>>>>> creatVMSnapshot -> send CreateVMSnapshotCommand to
> >> > hypervisor to create vm snapshot.
> >> > >>>>>>>>>>
> >> > >>>>>>>>>> If anybody wants to change the workflow, then need to
> either
> >> > >>>>>>>>>> change VMSnapshotManagerImpl directly or subclass
> >> > VMSnapshotManagerImpl.
> >> > >>>>>>>>>> Both are not the ideal choice, as VMSnapshotManagerImpl
> >> > >>>>>>>>>> should be able to handle different ways to take vm
> snapshot,
> >> > instead of hard code.
> >> > >>>>>>>>>>
> >> > >>>>>>>>>> The requirements for the pluggable VM snapshot coming from:
> >> > >>>>>>>>>> Storage vendor may have their optimization, such as NetApp.
> >> > >>>>>>>>>> VM snapshot can be implemented in a totally different
> way(For
> >> > >>>>>>>>>> example, I could just send a command to guest VM, to tell
> my
> >> > >>>>>>>>>> application to flush disk and hold disk write, then come to
> >> > >>>>>>>>>> hypervisor to
> >> > >>>>>>>> take a volume snapshot).
> >> > >>>>>>>>>>
> >> > >>>>>>>>>> If we agree on enable pluggable VM snapshot, then we can
> >> > move
> >> > >>>>>>>>>> on discuss how to implement it.
> >> > >>>>>>>>>>
> >> > >>>>>>>>>> The possible options:
> >> > >>>>>>>>>> 1. coarse grained interface. Add a VMSnapshotStrategy
> >> > >>>>>>>>>> interface, which has the following interfaces:
> >> > >>>>>>>>>> VMSnapshot takeVMSnapshot(VMSnapshot vmSnapshot);
> >> > Boolean
> >> > >>>>>>>>>> revertVMSnapshot(VMSnapshot vmSnapshot); Boolean
> >> > >>>>>>>>>> DeleteVMSnapshot(VMSnapshot vmSnapshot);
> >> > >>>>>>>>>>
> >> > >>>>>>>>>> The work flow will be: createVMSnapshot api ->
> >> > >>>>>>>> VMSnapshotManagerImpl:
> >> > >>>>>>>>>> creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot
> >> > >>>>>>>>>> VMSnapshotManagerImpl will manage VM state, do the sanity
> >> > >>>>>>>>>> check, then will handle over to VMSnapshotStrategy.
> >> > >>>>>>>>>> In VMSnapshotStrategy implementation, it may just send a
> >> > >>>>>>>>>> Create/revert/delete VMSnapshotCommand to hypervisor
> >> > host, or
> >> > >>>>>>>>>> do anything special operations.
> >> > >>>>>>>>>>
> >> > >>>>>>>>>> 2. fine-grained interface. Not only add a
> VMSnapshotStrategy
> >> > >>>>>>>>>> interface, but also add certain methods on the storage
> driver.
> >> > >>>>>>>>>> The VMSnapshotStrategy interface will be the same as
> option 1.
> >> > >>>>>>>>>> Will add the following methods on storage driver:
> >> > >>>>>>>>>> /* volumesBelongToVM  is the list of volumes of the VM that
> >> > >>>>>>>>>> created on this storage, storage vendor can either take one
> >> > >>>>>>>>>> snapshot for this volumes in one shot, or take snapshot for
> >> > each volume separately
> >> > >>>>>>>>>>    The pre-condition: vm is unquiesced.
> >> > >>>>>>>>>>    It will return a Boolean to indicate, do need unquiesce
> vm
> >> or
> >> > not.
> >> > >>>>>>>>>>    In the default storage driver, it will return false.
> >> > >>>>>>>>>> */
> >> > >>>>>>>>>> boolean takeVMSnapshot(List<VolumeInfo>
> >> > volumesBelongToVM,
> >> > >>>>>>>>>> VMSnapshot vmSnapshot); Boolean
> >> > >>>>>>>>>> revertVMSnapshot(List<VolumeInfo> volumesBelongToVM,
> >> > >>>>>>>>>> VMSnapshot vmSnapshot); Boolean
> >> > >>>>>>>>>> deleteVMSnapshot(List<VolumeInfo> volumesBelongToVM,
> >> > >>>>>>>>>> VMSnapshot vmSNapshot);
> >> > >>>>>>>>>>
> >> > >>>>>>>>>> The work flow will be: createVMSnapshot api ->
> >> > >>>>>>>> VMSnapshotManagerImpl:
> >> > >>>>>>>>>> creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot ->
> >> > >>>>>>>>>> storage driver:takeVMSnapshot In the implementation of
> >> > >>>>>>>>>> VMSnapshotStrategy's takeVMSnapshot, the pseudo code
> >> > looks like:
> >> > >>>>>>>>>>    HypervisorHelper.quiesceVM(vm);
> >> > >>>>>>>>>>    val volumes = vm.getVolumes();
> >> > >>>>>>>>>>    val maps = new Map[driver, list[VolumeInfo]]();
> >> > >>>>>>>>>>    Volumes.foreach(volume => maps.put(volume.getDriver,
> >> > volume ::
> >> > >>>>>>>>>> maps.get(volume.getdriver())))
> >> > >>>>>>>>>>    val needUnquiesce = true;
> >> > >>>>>>>>>>     maps.foreach((driver, volumes) => needUnquiesce  =
> >> > >>>>>>>>>> needUnquiesce && driver.takeVMSnapshot(volumes))
> >> > >>>>>>>>>>   if (needUnquiesce ) {
> >> > >>>>>>>>>>    HypervisorHelper.unquiesce(vm); }
> >> > >>>>>>>>>>
> >> > >>>>>>>>>> By default, the quiesceVM in HypervisorHelper will actually
> >> > >>>>>>>>>> take vm snapshot through hypervisor.
> >> > >>>>>>>>>> Does above logic makes senesce?
> >> > >>>>>>>>>>
> >> > >>>>>>>>>> The pros of option 1 is that: it's simple, no need to
> change
> >> > >>>>>>>>>> storage driver interfaces. The cons is that each storage
> >> > >>>>>>>>>> vendor need to implement a strategy, maybe they will do the
> >> > same thing.
> >> > >>>>>>>>>> The pros of option 2 is that, storage driver won't need to
> >> > >>>>>>>>>> worry about how to quiesce/unquiesce vm. The cons is that,
> it
> >> > >>>>>>>>>> will add these methods on each storage drivers, so it
> assumes
> >> > >>>>>>>>>> that this work flow will work for everybody.
> >> > >>>>>>>>>>
> >> > >>>>>>>>>> So which option we should take? Or if you have other
> options,
> >> > >>>>>>>>>> please let's know.
> >> > >>>>>>>>>>
> >> > >>>>>>>>>>
> >> > >>>>>>>>>>
> >> > >>>>>>>>>>
> >> > >>>>>>>>>>
> >> > >>>>>>>>>>
> >> > >>>>>>>>>
> >> > >>>>>>>>>
> >> > >>>>>>>>> --
> >> > >>>>>>>>> *Mike Tutkowski*
> >> > >>>>>>>>> *Senior CloudStack Developer, SolidFire Inc.*
> >> > >>>>>>>>> e: mike.tutkowski@solidfire.com
> >> > >>>>>>>>> o: 303.746.7302
> >> > >>>>>>>>> Advancing the way the world uses the
> >> > >>>>>>>>> cloud<http://solidfire.com/solution/overview/?video=play>
> >> > >>>>>>>>> *(tm)*
> >> > >>>>>>>
> >> > >>>>>
> >> > >>>
> >> > >
> >>
> >
> >
> >
> > --
> > *Mike Tutkowski*
> > *Senior CloudStack Developer, SolidFire Inc.*
> > e: mike.tutkowski@solidfire.com
> > o: 303.746.7302
> > Advancing the way the world uses the
> > cloud<http://solidfire.com/solution/overview/?video=play>
> > *™*
>



-- 
*Mike Tutkowski*
*Senior CloudStack Developer, SolidFire Inc.*
e: mike.tutkowski@solidfire.com
o: 303.746.7302
Advancing the way the world uses the
cloud<http://solidfire.com/solution/overview/?video=play>
*™*

Re: [DISCUSS] Pluggable VM snapshot related operations?

Posted by Darren Shepherd <da...@gmail.com>.
Edison,

I would lean toward doing the coarse grain interface only.  I'm having
a hard time seeing how the whole flow is generic and makes sense for
everyone.  With starting with the coarse grain you have the advantage
in that you avoid possible upfront over engineering/over design that
could wreak havoc down the line.  If you implement the
VMSnapshotStrategy and find that it really is useful to other
implementations, you can then implement the fine grain interface later
to allow others to benefit from it.

Darren

On Wed, Oct 9, 2013 at 8:54 PM, Mike Tutkowski
<mi...@solidfire.com> wrote:
> Hey guys,
>
> I haven't been giving this thread much attention, but am reviewing it
> somewhat now.
>
> I'm not really clear how this would work if, say, a VM has two data disks
> and they are not being provided by the same vendor.
>
> Can someone clarify that for me?
>
> My understanding for how this works today is that it doesn't matter. For
> XenServer, a VDI is on an SR, which could be supported by storage vendor X.
> Another VDI could be on another SR, supported by storage vendor Y.
>
> In this case, a new VDI appears on each SR after a hypervisor snapshot.
>
> Same idea for VMware.
>
> I don't really know how (or if) this works for KVM.
>
> I'm not clear how this multi-vendor situation would play out in this
> pluggable approach.
>
> Thanks!
>
>
> On Tue, Oct 8, 2013 at 4:43 PM, Edison Su <Ed...@citrix.com> wrote:
>
>>
>>
>> > -----Original Message-----
>> > From: Darren Shepherd [mailto:darren.s.shepherd@gmail.com]
>> > Sent: Tuesday, October 08, 2013 2:54 PM
>> > To: dev@cloudstack.apache.org
>> > Subject: Re: [DISCUSS] Pluggable VM snapshot related operations?
>> >
>> > A hypervisor snapshot will snapshot memory also.  So determining whether
>> The memory is optional for hypervisor vm snapshot, a.k.a, the "Disk-only
>> snapshots":
>> http://support.citrix.com/proddocs/topic/xencenter-61/xs-xc-vms-snapshots-about.html
>> It's supported by both xenserver/kvm/vmware.
>>
>> > do to the hypervisor snapshot from the quiesce option does not seem
>> > proper.
>> >
>> > Sorry, for all the questions, I'm trying to get to the point of
>> understand if this
>> > functionality makes sense at this point of code or if maybe their is a
>> different
>> > approach.  This is what I'm seeing, what if we state it this way
>> >
>> > 1) VM snapshot, AFAIK, are not backed up today and exist solely on
>> primary.
>> > What if we added a backup phase to VM snapshots that can be optionally
>> > supported by the storage providers to possibly backup the VM snapshot
>> > volumes.
>> It's not about backup vm snapshot, it's about how to take vm snapshot.
>> Usually, take/revert vm snapshot is handled by hypervisor itself, but in
>> NetApp(or other storage vendor) case,
>> They want to change the default behavior of hypervisor-base vm snapshot.
>>
>> Some examples:
>> 1. take hypervisor based vm snapshots, on primary storage, hypervisor will
>> maintain the snapshot chain.
>> 2. take vm snapshot through NetApp:
>>      a. first, quiesce VM if user specified. There is no separate API to
>> quiesce VM on the hypervisor, so here we will
>> take a VM snapshot through hypervisor API call, hypervisor will take
>> volume snapshot  on each volume of the VM. Let's say, on the primary
>> storage, the disk chain looks like:
>>            base-image
>>                     |
>>                     V
>>                 Parent disk
>>             /                         \
>>           V                            V
>>         Current disk        snapshot-a
>>      b. from snapshot-a, find out its parent disk, then take snapshot
>> through NetApp
>>      c. un- quiesce VM, here, go to hypervisor, delete snapshot
>> "snapshot-a", hypervisor should be able to consolidate current disk and
>> "parent disk" into one disk, thus from hypervisor point of view
>> , there is always, at most, only one snapshot for the VM.
>>     For revert VM snapshot, as long as the VM is stopped, NetApp can
>> revert the snapshot created on NetApp storage easily, and efficiently.
>>    The benefit of this whole process, as Chris pointed out, if the
>> snapshot chain is quite long, hypervisor based VM snapshot will get
>> performance hit.
>>
>> >
>> > 2) Additionally you want to be able to backup multiple disks at once,
>> > regardless of VM snapshot.  Why don't we add the ability to put
>> volumeIds in
>> > snapshot cmd that if the storage provider supports it will get a batch of
>> > volumeIds.
>> >
>> > Now I know we talked about 2 and there was some concerns about it (mostly
>> > from me), but I think we could work through those concerns (forgot what
>> > they were...).  Right now I just get the feeling we are shoehorning some
>> > functionality into VM snapshot that isn't quite the right fit.  The "no
>> quiesce"
>> > flow just doesn't seem to make sense to me.
>>
>>
>> Not sure above NetApp proposed work flow makes sense to you or to other
>> body or not. If this work flow is only specific to NetApp, then we don't
>> need to enforce the whole process for everybody.
>>
>> >
>> > Darren
>> >
>> > On Tue, Oct 8, 2013 at 2:05 PM, SuichII, Christopher
>> > <Ch...@netapp.com> wrote:
>> > > Whether the hypervisor snapshot happens depends on whether the
>> > 'quiesce' option is specified with the snapshot request. If a user
>> doesn't care
>> > about the consistency of their backup, then the hypervisor
>> snapshot/quiesce
>> > step can be skipped altogether. This of course is not the case if the
>> default
>> > provider is being used, in which case a hypervisor snapshot is the only
>> way of
>> > creating a backup since it can't be offloaded to the storage driver.
>> > >
>> > > --
>> > > Chris Suich
>> > > chris.suich@netapp.com
>> > > NetApp Software Engineer
>> > > Data Center Platforms - Cloud Solutions Citrix, Cisco & Red Hat
>> > >
>> > > On Oct 8, 2013, at 4:57 PM, Darren Shepherd
>> > > <da...@gmail.com>
>> > >  wrote:
>> > >
>> > >> Who is going to decide whether the hypervisor snapshot should
>> > >> actually happen or not? Or how?
>> > >>
>> > >> Darren
>> > >>
>> > >> On Tue, Oct 8, 2013 at 12:38 PM, SuichII, Christopher
>> > >> <Ch...@netapp.com> wrote:
>> > >>>
>> > >>> --
>> > >>> Chris Suich
>> > >>> chris.suich@netapp.com
>> > >>> NetApp Software Engineer
>> > >>> Data Center Platforms - Cloud Solutions Citrix, Cisco & Red Hat
>> > >>>
>> > >>> On Oct 8, 2013, at 2:24 PM, Darren Shepherd
>> > <da...@gmail.com> wrote:
>> > >>>
>> > >>>> So in the implementation, when we say "quiesce" is that actually
>> > >>>> being implemented as a VM snapshot (memory and disk).  And then
>> > >>>> when you say "unquiesce" you are talking about deleting the VM
>> > snapshot?
>> > >>>
>> > >>> If the VM snapshot is not going to the hypervisor, then yes, it will
>> > actually be a hypervisor snapshot. Just to be clear, the unquiesce is
>> not quite
>> > a delete - it is a collapse of the VM snapshot and the active VM back
>> into one
>> > file.
>> > >>>
>> > >>>>
>> > >>>> In NetApp, what are you snapshotting?  The whole netapp volume (I
>> > >>>> don't know the correct term), a file on NFS, an iscsi volume?  I
>> > >>>> don't know a whole heck of a lot about the netapp snapshot
>> > capabilities.
>> > >>>
>> > >>> Essentially we are using internal APIs to create file level backups
>> - don't
>> > worry too much about the terminology.
>> > >>>
>> > >>>>
>> > >>>> I know storage solutions can snapshot better and faster than
>> > >>>> hypervisors can with COW files.  I've personally just been always
>> > >>>> perplexed on whats the best way to implement it.  For storage
>> > >>>> solutions that are block based, its really easy to have the storage
>> > >>>> doing the snapshot.  For shared file systems, like NFS, its seems
>> > >>>> way more complicated as you don't want to snapshot the entire
>> > >>>> filesystem in order to snapshot one file.
>> > >>>
>> > >>> With filesystems like NFS, things are certainly more complicated,
>> but that
>> > is taken care of by our controller's operating system, Data ONTAP, and we
>> > simply use APIs to communicate with it.
>> > >>>
>> > >>>>
>> > >>>> Darren
>> > >>>>
>> > >>>> On Tue, Oct 8, 2013 at 11:10 AM, SuichII, Christopher
>> > >>>> <Ch...@netapp.com> wrote:
>> > >>>>> I can comment on the second half.
>> > >>>>>
>> > >>>>> Through storage operations, storage providers can create backups
>> > much faster than hypervisors and over time, their snapshots are more
>> > efficient than the snapshot chains that hypervisors create. It is true
>> that a VM
>> > snapshot taken at the storage level is slightly different as it would be
>> psuedo-
>> > quiesced, not have it's memory snapshotted. This is accomplished through
>> > hypervisor snapshots:
>> > >>>>>
>> > >>>>> 1) VM snapshot request (lets say VM 'A'
>> > >>>>> 2) Create hypervisor snapshot (optional) -VM 'A' is snapshotted,
>> > >>>>> creating active VM 'A*'
>> > >>>>> -All disk traffic now goes to VM 'A*' and A is a snapshot of 'A*'
>> > >>>>> 3) Storage driver(s) take snapshots of each volume
>> > >>>>> 4) Undo hypervisor snapshot (optional) -VM snapshot 'A' is rolled
>> > >>>>> back into VM 'A*' so the hypervisor snapshot no longer exists
>> > >>>>>
>> > >>>>> Now, a couple notes:
>> > >>>>> -The reason this is optional is that not all users necessarily
>> care about
>> > the memory or disk consistency of their VMs and would prefer faster
>> > snapshots to consistency.
>> > >>>>> -Preemptively, yes, we are actually taking hypervisor snapshots
>> which
>> > means there isn't actually a performance of taking storage snapshots when
>> > quiescing the VM. However, the performance gain will come both during
>> > restoring the VM and during normal operations as described above.
>> > >>>>>
>> > >>>>> Although you can think of it as a poor man's VM snapshot, I would
>> > think of it more as a consistent multi-volume snapshot. Again, the
>> difference
>> > being that this snapshot was not truly quiesced like a hypervisor
>> snapshot
>> > would be.
>> > >>>>>
>> > >>>>> --
>> > >>>>> Chris Suich
>> > >>>>> chris.suich@netapp.com
>> > >>>>> NetApp Software Engineer
>> > >>>>> Data Center Platforms - Cloud Solutions Citrix, Cisco & Red Hat
>> > >>>>>
>> > >>>>> On Oct 8, 2013, at 1:47 PM, Darren Shepherd
>> > <da...@gmail.com> wrote:
>> > >>>>>
>> > >>>>>> My only comment is that having the return type as boolean and
>> > >>>>>> using to that indicate quiesce behaviour seems obscure and will
>> > >>>>>> probably lead to a problem later.  Your basically saying the
>> > >>>>>> result of the takeVMSnapshot will only ever need to communicate
>> > >>>>>> back whether unquiesce needs to happen.  Maybe some result
>> > object
>> > >>>>>> would be more extensible.
>> > >>>>>>
>> > >>>>>> Actually, I think I have more comments.  This seems a bit odd to
>> me.
>> > >>>>>> Why would a storage driver in ACS implement a VM snapshot
>> > >>>>>> functionality?  VM snapshot is a really a hypervisor orchestrated
>> > >>>>>> operation.  So it seems like were trying to implement a poor mans
>> > >>>>>> VM snapshot.  Maybe if I understood what NetApp was trying to do
>> > >>>>>> it would make more sense, but its all odd.  To do a proper VM
>> > >>>>>> snapshot you need to snapshot memory and disk at the exact same
>> > >>>>>> time.  How are we going to do that if ACS is orchestrating the VM
>> > >>>>>> snapshot and delegating to storage providers.  Its not like you
>> > >>>>>> are going to pause the VM.... or are you?
>> > >>>>>>
>> > >>>>>> Darren
>> > >>>>>>
>> > >>>>>> On Mon, Oct 7, 2013 at 11:59 AM, Edison Su <Ed...@citrix.com>
>> > wrote:
>> > >>>>>>> I created a design document page at
>> > https://cwiki.apache.org/confluence/display/CLOUDSTACK/Pluggable+VM+s
>> > napshot+related+operations, feel free to add items on it.
>> > >>>>>>> And a new branch "pluggable_vm_snapshot" is created.
>> > >>>>>>>
>> > >>>>>>>> -----Original Message-----
>> > >>>>>>>> From: SuichII, Christopher [mailto:Chris.Suich@netapp.com]
>> > >>>>>>>> Sent: Monday, October 07, 2013 10:02 AM
>> > >>>>>>>> To: <de...@cloudstack.apache.org>
>> > >>>>>>>> Subject: Re: [DISCUSS] Pluggable VM snapshot related operations?
>> > >>>>>>>>
>> > >>>>>>>> I'm a fan of option 2 - this gives us the most flexibility (as
>> > >>>>>>>> you stated). The option is given to completely override the way
>> > >>>>>>>> VM snapshots work AND storage providers are given to
>> > >>>>>>>> opportunity to work within the default VM snapshot workflow.
>> > >>>>>>>>
>> > >>>>>>>> I believe this option should satisfy your concern, Mike. The
>> > >>>>>>>> snapshot and quiesce strategy would be in charge of
>> > communicating with the hypervisor.
>> > >>>>>>>> Storage providers should be able to leverage the default
>> > >>>>>>>> strategies and simply perform the storage operations.
>> > >>>>>>>>
>> > >>>>>>>> I don't think it should be much of an issue that new method to
>> > >>>>>>>> the storage driver interface may not apply to everyone. In fact,
>> > that is already the case.
>> > >>>>>>>> Some methods such as un/maintain(), attachToXXX() and
>> > >>>>>>>> takeSnapshot() are already not implemented by every driver -
>> > >>>>>>>> they just return false when asked if they can handle the
>> operation.
>> > >>>>>>>>
>> > >>>>>>>> --
>> > >>>>>>>> Chris Suich
>> > >>>>>>>> chris.suich@netapp.com
>> > >>>>>>>> NetApp Software Engineer
>> > >>>>>>>> Data Center Platforms - Cloud Solutions Citrix, Cisco & Red Hat
>> > >>>>>>>>
>> > >>>>>>>> On Oct 5, 2013, at 12:11 AM, Mike Tutkowski
>> > >>>>>>>> <mi...@solidfire.com>
>> > >>>>>>>> wrote:
>> > >>>>>>>>
>> > >>>>>>>>> Well, my first thought on this is that the storage driver
>> > >>>>>>>>> should not be telling the hypervisor to do anything. It should
>> > >>>>>>>>> be responsible for creating/deleting volumes, snapshots, etc.
>> on
>> > its storage system only.
>> > >>>>>>>>>
>> > >>>>>>>>>
>> > >>>>>>>>> On Fri, Oct 4, 2013 at 5:57 PM, Edison Su <
>> Edison.su@citrix.com>
>> > wrote:
>> > >>>>>>>>>
>> > >>>>>>>>>> In 4.2, we added VM snapshot for Vmware/Xenserver. The
>> > >>>>>>>>>> current workflow will be like the following:
>> > >>>>>>>>>> createVMSnapshot api -> VMSnapshotManagerImpl:
>> > >>>>>>>>>> creatVMSnapshot -> send CreateVMSnapshotCommand to
>> > hypervisor to create vm snapshot.
>> > >>>>>>>>>>
>> > >>>>>>>>>> If anybody wants to change the workflow, then need to either
>> > >>>>>>>>>> change VMSnapshotManagerImpl directly or subclass
>> > VMSnapshotManagerImpl.
>> > >>>>>>>>>> Both are not the ideal choice, as VMSnapshotManagerImpl
>> > >>>>>>>>>> should be able to handle different ways to take vm snapshot,
>> > instead of hard code.
>> > >>>>>>>>>>
>> > >>>>>>>>>> The requirements for the pluggable VM snapshot coming from:
>> > >>>>>>>>>> Storage vendor may have their optimization, such as NetApp.
>> > >>>>>>>>>> VM snapshot can be implemented in a totally different way(For
>> > >>>>>>>>>> example, I could just send a command to guest VM, to tell my
>> > >>>>>>>>>> application to flush disk and hold disk write, then come to
>> > >>>>>>>>>> hypervisor to
>> > >>>>>>>> take a volume snapshot).
>> > >>>>>>>>>>
>> > >>>>>>>>>> If we agree on enable pluggable VM snapshot, then we can
>> > move
>> > >>>>>>>>>> on discuss how to implement it.
>> > >>>>>>>>>>
>> > >>>>>>>>>> The possible options:
>> > >>>>>>>>>> 1. coarse grained interface. Add a VMSnapshotStrategy
>> > >>>>>>>>>> interface, which has the following interfaces:
>> > >>>>>>>>>> VMSnapshot takeVMSnapshot(VMSnapshot vmSnapshot);
>> > Boolean
>> > >>>>>>>>>> revertVMSnapshot(VMSnapshot vmSnapshot); Boolean
>> > >>>>>>>>>> DeleteVMSnapshot(VMSnapshot vmSnapshot);
>> > >>>>>>>>>>
>> > >>>>>>>>>> The work flow will be: createVMSnapshot api ->
>> > >>>>>>>> VMSnapshotManagerImpl:
>> > >>>>>>>>>> creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot
>> > >>>>>>>>>> VMSnapshotManagerImpl will manage VM state, do the sanity
>> > >>>>>>>>>> check, then will handle over to VMSnapshotStrategy.
>> > >>>>>>>>>> In VMSnapshotStrategy implementation, it may just send a
>> > >>>>>>>>>> Create/revert/delete VMSnapshotCommand to hypervisor
>> > host, or
>> > >>>>>>>>>> do anything special operations.
>> > >>>>>>>>>>
>> > >>>>>>>>>> 2. fine-grained interface. Not only add a VMSnapshotStrategy
>> > >>>>>>>>>> interface, but also add certain methods on the storage driver.
>> > >>>>>>>>>> The VMSnapshotStrategy interface will be the same as option 1.
>> > >>>>>>>>>> Will add the following methods on storage driver:
>> > >>>>>>>>>> /* volumesBelongToVM  is the list of volumes of the VM that
>> > >>>>>>>>>> created on this storage, storage vendor can either take one
>> > >>>>>>>>>> snapshot for this volumes in one shot, or take snapshot for
>> > each volume separately
>> > >>>>>>>>>>    The pre-condition: vm is unquiesced.
>> > >>>>>>>>>>    It will return a Boolean to indicate, do need unquiesce vm
>> or
>> > not.
>> > >>>>>>>>>>    In the default storage driver, it will return false.
>> > >>>>>>>>>> */
>> > >>>>>>>>>> boolean takeVMSnapshot(List<VolumeInfo>
>> > volumesBelongToVM,
>> > >>>>>>>>>> VMSnapshot vmSnapshot); Boolean
>> > >>>>>>>>>> revertVMSnapshot(List<VolumeInfo> volumesBelongToVM,
>> > >>>>>>>>>> VMSnapshot vmSnapshot); Boolean
>> > >>>>>>>>>> deleteVMSnapshot(List<VolumeInfo> volumesBelongToVM,
>> > >>>>>>>>>> VMSnapshot vmSNapshot);
>> > >>>>>>>>>>
>> > >>>>>>>>>> The work flow will be: createVMSnapshot api ->
>> > >>>>>>>> VMSnapshotManagerImpl:
>> > >>>>>>>>>> creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot ->
>> > >>>>>>>>>> storage driver:takeVMSnapshot In the implementation of
>> > >>>>>>>>>> VMSnapshotStrategy's takeVMSnapshot, the pseudo code
>> > looks like:
>> > >>>>>>>>>>    HypervisorHelper.quiesceVM(vm);
>> > >>>>>>>>>>    val volumes = vm.getVolumes();
>> > >>>>>>>>>>    val maps = new Map[driver, list[VolumeInfo]]();
>> > >>>>>>>>>>    Volumes.foreach(volume => maps.put(volume.getDriver,
>> > volume ::
>> > >>>>>>>>>> maps.get(volume.getdriver())))
>> > >>>>>>>>>>    val needUnquiesce = true;
>> > >>>>>>>>>>     maps.foreach((driver, volumes) => needUnquiesce  =
>> > >>>>>>>>>> needUnquiesce && driver.takeVMSnapshot(volumes))
>> > >>>>>>>>>>   if (needUnquiesce ) {
>> > >>>>>>>>>>    HypervisorHelper.unquiesce(vm); }
>> > >>>>>>>>>>
>> > >>>>>>>>>> By default, the quiesceVM in HypervisorHelper will actually
>> > >>>>>>>>>> take vm snapshot through hypervisor.
>> > >>>>>>>>>> Does above logic makes senesce?
>> > >>>>>>>>>>
>> > >>>>>>>>>> The pros of option 1 is that: it's simple, no need to change
>> > >>>>>>>>>> storage driver interfaces. The cons is that each storage
>> > >>>>>>>>>> vendor need to implement a strategy, maybe they will do the
>> > same thing.
>> > >>>>>>>>>> The pros of option 2 is that, storage driver won't need to
>> > >>>>>>>>>> worry about how to quiesce/unquiesce vm. The cons is that, it
>> > >>>>>>>>>> will add these methods on each storage drivers, so it assumes
>> > >>>>>>>>>> that this work flow will work for everybody.
>> > >>>>>>>>>>
>> > >>>>>>>>>> So which option we should take? Or if you have other options,
>> > >>>>>>>>>> please let's know.
>> > >>>>>>>>>>
>> > >>>>>>>>>>
>> > >>>>>>>>>>
>> > >>>>>>>>>>
>> > >>>>>>>>>>
>> > >>>>>>>>>>
>> > >>>>>>>>>
>> > >>>>>>>>>
>> > >>>>>>>>> --
>> > >>>>>>>>> *Mike Tutkowski*
>> > >>>>>>>>> *Senior CloudStack Developer, SolidFire Inc.*
>> > >>>>>>>>> e: mike.tutkowski@solidfire.com
>> > >>>>>>>>> o: 303.746.7302
>> > >>>>>>>>> Advancing the way the world uses the
>> > >>>>>>>>> cloud<http://solidfire.com/solution/overview/?video=play>
>> > >>>>>>>>> *(tm)*
>> > >>>>>>>
>> > >>>>>
>> > >>>
>> > >
>>
>
>
>
> --
> *Mike Tutkowski*
> *Senior CloudStack Developer, SolidFire Inc.*
> e: mike.tutkowski@solidfire.com
> o: 303.746.7302
> Advancing the way the world uses the
> cloud<http://solidfire.com/solution/overview/?video=play>
> *™*

Re: [DISCUSS] Pluggable VM snapshot related operations?

Posted by Mike Tutkowski <mi...@solidfire.com>.
Hey guys,

I haven't been giving this thread much attention, but am reviewing it
somewhat now.

I'm not really clear how this would work if, say, a VM has two data disks
and they are not being provided by the same vendor.

Can someone clarify that for me?

My understanding for how this works today is that it doesn't matter. For
XenServer, a VDI is on an SR, which could be supported by storage vendor X.
Another VDI could be on another SR, supported by storage vendor Y.

In this case, a new VDI appears on each SR after a hypervisor snapshot.

Same idea for VMware.

I don't really know how (or if) this works for KVM.

I'm not clear how this multi-vendor situation would play out in this
pluggable approach.

Thanks!


On Tue, Oct 8, 2013 at 4:43 PM, Edison Su <Ed...@citrix.com> wrote:

>
>
> > -----Original Message-----
> > From: Darren Shepherd [mailto:darren.s.shepherd@gmail.com]
> > Sent: Tuesday, October 08, 2013 2:54 PM
> > To: dev@cloudstack.apache.org
> > Subject: Re: [DISCUSS] Pluggable VM snapshot related operations?
> >
> > A hypervisor snapshot will snapshot memory also.  So determining whether
> The memory is optional for hypervisor vm snapshot, a.k.a, the "Disk-only
> snapshots":
> http://support.citrix.com/proddocs/topic/xencenter-61/xs-xc-vms-snapshots-about.html
> It's supported by both xenserver/kvm/vmware.
>
> > do to the hypervisor snapshot from the quiesce option does not seem
> > proper.
> >
> > Sorry, for all the questions, I'm trying to get to the point of
> understand if this
> > functionality makes sense at this point of code or if maybe their is a
> different
> > approach.  This is what I'm seeing, what if we state it this way
> >
> > 1) VM snapshot, AFAIK, are not backed up today and exist solely on
> primary.
> > What if we added a backup phase to VM snapshots that can be optionally
> > supported by the storage providers to possibly backup the VM snapshot
> > volumes.
> It's not about backup vm snapshot, it's about how to take vm snapshot.
> Usually, take/revert vm snapshot is handled by hypervisor itself, but in
> NetApp(or other storage vendor) case,
> They want to change the default behavior of hypervisor-base vm snapshot.
>
> Some examples:
> 1. take hypervisor based vm snapshots, on primary storage, hypervisor will
> maintain the snapshot chain.
> 2. take vm snapshot through NetApp:
>      a. first, quiesce VM if user specified. There is no separate API to
> quiesce VM on the hypervisor, so here we will
> take a VM snapshot through hypervisor API call, hypervisor will take
> volume snapshot  on each volume of the VM. Let's say, on the primary
> storage, the disk chain looks like:
>            base-image
>                     |
>                     V
>                 Parent disk
>             /                         \
>           V                            V
>         Current disk        snapshot-a
>      b. from snapshot-a, find out its parent disk, then take snapshot
> through NetApp
>      c. un- quiesce VM, here, go to hypervisor, delete snapshot
> "snapshot-a", hypervisor should be able to consolidate current disk and
> "parent disk" into one disk, thus from hypervisor point of view
> , there is always, at most, only one snapshot for the VM.
>     For revert VM snapshot, as long as the VM is stopped, NetApp can
> revert the snapshot created on NetApp storage easily, and efficiently.
>    The benefit of this whole process, as Chris pointed out, if the
> snapshot chain is quite long, hypervisor based VM snapshot will get
> performance hit.
>
> >
> > 2) Additionally you want to be able to backup multiple disks at once,
> > regardless of VM snapshot.  Why don't we add the ability to put
> volumeIds in
> > snapshot cmd that if the storage provider supports it will get a batch of
> > volumeIds.
> >
> > Now I know we talked about 2 and there was some concerns about it (mostly
> > from me), but I think we could work through those concerns (forgot what
> > they were...).  Right now I just get the feeling we are shoehorning some
> > functionality into VM snapshot that isn't quite the right fit.  The "no
> quiesce"
> > flow just doesn't seem to make sense to me.
>
>
> Not sure above NetApp proposed work flow makes sense to you or to other
> body or not. If this work flow is only specific to NetApp, then we don't
> need to enforce the whole process for everybody.
>
> >
> > Darren
> >
> > On Tue, Oct 8, 2013 at 2:05 PM, SuichII, Christopher
> > <Ch...@netapp.com> wrote:
> > > Whether the hypervisor snapshot happens depends on whether the
> > 'quiesce' option is specified with the snapshot request. If a user
> doesn't care
> > about the consistency of their backup, then the hypervisor
> snapshot/quiesce
> > step can be skipped altogether. This of course is not the case if the
> default
> > provider is being used, in which case a hypervisor snapshot is the only
> way of
> > creating a backup since it can't be offloaded to the storage driver.
> > >
> > > --
> > > Chris Suich
> > > chris.suich@netapp.com
> > > NetApp Software Engineer
> > > Data Center Platforms - Cloud Solutions Citrix, Cisco & Red Hat
> > >
> > > On Oct 8, 2013, at 4:57 PM, Darren Shepherd
> > > <da...@gmail.com>
> > >  wrote:
> > >
> > >> Who is going to decide whether the hypervisor snapshot should
> > >> actually happen or not? Or how?
> > >>
> > >> Darren
> > >>
> > >> On Tue, Oct 8, 2013 at 12:38 PM, SuichII, Christopher
> > >> <Ch...@netapp.com> wrote:
> > >>>
> > >>> --
> > >>> Chris Suich
> > >>> chris.suich@netapp.com
> > >>> NetApp Software Engineer
> > >>> Data Center Platforms - Cloud Solutions Citrix, Cisco & Red Hat
> > >>>
> > >>> On Oct 8, 2013, at 2:24 PM, Darren Shepherd
> > <da...@gmail.com> wrote:
> > >>>
> > >>>> So in the implementation, when we say "quiesce" is that actually
> > >>>> being implemented as a VM snapshot (memory and disk).  And then
> > >>>> when you say "unquiesce" you are talking about deleting the VM
> > snapshot?
> > >>>
> > >>> If the VM snapshot is not going to the hypervisor, then yes, it will
> > actually be a hypervisor snapshot. Just to be clear, the unquiesce is
> not quite
> > a delete - it is a collapse of the VM snapshot and the active VM back
> into one
> > file.
> > >>>
> > >>>>
> > >>>> In NetApp, what are you snapshotting?  The whole netapp volume (I
> > >>>> don't know the correct term), a file on NFS, an iscsi volume?  I
> > >>>> don't know a whole heck of a lot about the netapp snapshot
> > capabilities.
> > >>>
> > >>> Essentially we are using internal APIs to create file level backups
> - don't
> > worry too much about the terminology.
> > >>>
> > >>>>
> > >>>> I know storage solutions can snapshot better and faster than
> > >>>> hypervisors can with COW files.  I've personally just been always
> > >>>> perplexed on whats the best way to implement it.  For storage
> > >>>> solutions that are block based, its really easy to have the storage
> > >>>> doing the snapshot.  For shared file systems, like NFS, its seems
> > >>>> way more complicated as you don't want to snapshot the entire
> > >>>> filesystem in order to snapshot one file.
> > >>>
> > >>> With filesystems like NFS, things are certainly more complicated,
> but that
> > is taken care of by our controller's operating system, Data ONTAP, and we
> > simply use APIs to communicate with it.
> > >>>
> > >>>>
> > >>>> Darren
> > >>>>
> > >>>> On Tue, Oct 8, 2013 at 11:10 AM, SuichII, Christopher
> > >>>> <Ch...@netapp.com> wrote:
> > >>>>> I can comment on the second half.
> > >>>>>
> > >>>>> Through storage operations, storage providers can create backups
> > much faster than hypervisors and over time, their snapshots are more
> > efficient than the snapshot chains that hypervisors create. It is true
> that a VM
> > snapshot taken at the storage level is slightly different as it would be
> psuedo-
> > quiesced, not have it's memory snapshotted. This is accomplished through
> > hypervisor snapshots:
> > >>>>>
> > >>>>> 1) VM snapshot request (lets say VM 'A'
> > >>>>> 2) Create hypervisor snapshot (optional) -VM 'A' is snapshotted,
> > >>>>> creating active VM 'A*'
> > >>>>> -All disk traffic now goes to VM 'A*' and A is a snapshot of 'A*'
> > >>>>> 3) Storage driver(s) take snapshots of each volume
> > >>>>> 4) Undo hypervisor snapshot (optional) -VM snapshot 'A' is rolled
> > >>>>> back into VM 'A*' so the hypervisor snapshot no longer exists
> > >>>>>
> > >>>>> Now, a couple notes:
> > >>>>> -The reason this is optional is that not all users necessarily
> care about
> > the memory or disk consistency of their VMs and would prefer faster
> > snapshots to consistency.
> > >>>>> -Preemptively, yes, we are actually taking hypervisor snapshots
> which
> > means there isn't actually a performance of taking storage snapshots when
> > quiescing the VM. However, the performance gain will come both during
> > restoring the VM and during normal operations as described above.
> > >>>>>
> > >>>>> Although you can think of it as a poor man's VM snapshot, I would
> > think of it more as a consistent multi-volume snapshot. Again, the
> difference
> > being that this snapshot was not truly quiesced like a hypervisor
> snapshot
> > would be.
> > >>>>>
> > >>>>> --
> > >>>>> Chris Suich
> > >>>>> chris.suich@netapp.com
> > >>>>> NetApp Software Engineer
> > >>>>> Data Center Platforms - Cloud Solutions Citrix, Cisco & Red Hat
> > >>>>>
> > >>>>> On Oct 8, 2013, at 1:47 PM, Darren Shepherd
> > <da...@gmail.com> wrote:
> > >>>>>
> > >>>>>> My only comment is that having the return type as boolean and
> > >>>>>> using to that indicate quiesce behaviour seems obscure and will
> > >>>>>> probably lead to a problem later.  Your basically saying the
> > >>>>>> result of the takeVMSnapshot will only ever need to communicate
> > >>>>>> back whether unquiesce needs to happen.  Maybe some result
> > object
> > >>>>>> would be more extensible.
> > >>>>>>
> > >>>>>> Actually, I think I have more comments.  This seems a bit odd to
> me.
> > >>>>>> Why would a storage driver in ACS implement a VM snapshot
> > >>>>>> functionality?  VM snapshot is a really a hypervisor orchestrated
> > >>>>>> operation.  So it seems like were trying to implement a poor mans
> > >>>>>> VM snapshot.  Maybe if I understood what NetApp was trying to do
> > >>>>>> it would make more sense, but its all odd.  To do a proper VM
> > >>>>>> snapshot you need to snapshot memory and disk at the exact same
> > >>>>>> time.  How are we going to do that if ACS is orchestrating the VM
> > >>>>>> snapshot and delegating to storage providers.  Its not like you
> > >>>>>> are going to pause the VM.... or are you?
> > >>>>>>
> > >>>>>> Darren
> > >>>>>>
> > >>>>>> On Mon, Oct 7, 2013 at 11:59 AM, Edison Su <Ed...@citrix.com>
> > wrote:
> > >>>>>>> I created a design document page at
> > https://cwiki.apache.org/confluence/display/CLOUDSTACK/Pluggable+VM+s
> > napshot+related+operations, feel free to add items on it.
> > >>>>>>> And a new branch "pluggable_vm_snapshot" is created.
> > >>>>>>>
> > >>>>>>>> -----Original Message-----
> > >>>>>>>> From: SuichII, Christopher [mailto:Chris.Suich@netapp.com]
> > >>>>>>>> Sent: Monday, October 07, 2013 10:02 AM
> > >>>>>>>> To: <de...@cloudstack.apache.org>
> > >>>>>>>> Subject: Re: [DISCUSS] Pluggable VM snapshot related operations?
> > >>>>>>>>
> > >>>>>>>> I'm a fan of option 2 - this gives us the most flexibility (as
> > >>>>>>>> you stated). The option is given to completely override the way
> > >>>>>>>> VM snapshots work AND storage providers are given to
> > >>>>>>>> opportunity to work within the default VM snapshot workflow.
> > >>>>>>>>
> > >>>>>>>> I believe this option should satisfy your concern, Mike. The
> > >>>>>>>> snapshot and quiesce strategy would be in charge of
> > communicating with the hypervisor.
> > >>>>>>>> Storage providers should be able to leverage the default
> > >>>>>>>> strategies and simply perform the storage operations.
> > >>>>>>>>
> > >>>>>>>> I don't think it should be much of an issue that new method to
> > >>>>>>>> the storage driver interface may not apply to everyone. In fact,
> > that is already the case.
> > >>>>>>>> Some methods such as un/maintain(), attachToXXX() and
> > >>>>>>>> takeSnapshot() are already not implemented by every driver -
> > >>>>>>>> they just return false when asked if they can handle the
> operation.
> > >>>>>>>>
> > >>>>>>>> --
> > >>>>>>>> Chris Suich
> > >>>>>>>> chris.suich@netapp.com
> > >>>>>>>> NetApp Software Engineer
> > >>>>>>>> Data Center Platforms - Cloud Solutions Citrix, Cisco & Red Hat
> > >>>>>>>>
> > >>>>>>>> On Oct 5, 2013, at 12:11 AM, Mike Tutkowski
> > >>>>>>>> <mi...@solidfire.com>
> > >>>>>>>> wrote:
> > >>>>>>>>
> > >>>>>>>>> Well, my first thought on this is that the storage driver
> > >>>>>>>>> should not be telling the hypervisor to do anything. It should
> > >>>>>>>>> be responsible for creating/deleting volumes, snapshots, etc.
> on
> > its storage system only.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> On Fri, Oct 4, 2013 at 5:57 PM, Edison Su <
> Edison.su@citrix.com>
> > wrote:
> > >>>>>>>>>
> > >>>>>>>>>> In 4.2, we added VM snapshot for Vmware/Xenserver. The
> > >>>>>>>>>> current workflow will be like the following:
> > >>>>>>>>>> createVMSnapshot api -> VMSnapshotManagerImpl:
> > >>>>>>>>>> creatVMSnapshot -> send CreateVMSnapshotCommand to
> > hypervisor to create vm snapshot.
> > >>>>>>>>>>
> > >>>>>>>>>> If anybody wants to change the workflow, then need to either
> > >>>>>>>>>> change VMSnapshotManagerImpl directly or subclass
> > VMSnapshotManagerImpl.
> > >>>>>>>>>> Both are not the ideal choice, as VMSnapshotManagerImpl
> > >>>>>>>>>> should be able to handle different ways to take vm snapshot,
> > instead of hard code.
> > >>>>>>>>>>
> > >>>>>>>>>> The requirements for the pluggable VM snapshot coming from:
> > >>>>>>>>>> Storage vendor may have their optimization, such as NetApp.
> > >>>>>>>>>> VM snapshot can be implemented in a totally different way(For
> > >>>>>>>>>> example, I could just send a command to guest VM, to tell my
> > >>>>>>>>>> application to flush disk and hold disk write, then come to
> > >>>>>>>>>> hypervisor to
> > >>>>>>>> take a volume snapshot).
> > >>>>>>>>>>
> > >>>>>>>>>> If we agree on enable pluggable VM snapshot, then we can
> > move
> > >>>>>>>>>> on discuss how to implement it.
> > >>>>>>>>>>
> > >>>>>>>>>> The possible options:
> > >>>>>>>>>> 1. coarse grained interface. Add a VMSnapshotStrategy
> > >>>>>>>>>> interface, which has the following interfaces:
> > >>>>>>>>>> VMSnapshot takeVMSnapshot(VMSnapshot vmSnapshot);
> > Boolean
> > >>>>>>>>>> revertVMSnapshot(VMSnapshot vmSnapshot); Boolean
> > >>>>>>>>>> DeleteVMSnapshot(VMSnapshot vmSnapshot);
> > >>>>>>>>>>
> > >>>>>>>>>> The work flow will be: createVMSnapshot api ->
> > >>>>>>>> VMSnapshotManagerImpl:
> > >>>>>>>>>> creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot
> > >>>>>>>>>> VMSnapshotManagerImpl will manage VM state, do the sanity
> > >>>>>>>>>> check, then will handle over to VMSnapshotStrategy.
> > >>>>>>>>>> In VMSnapshotStrategy implementation, it may just send a
> > >>>>>>>>>> Create/revert/delete VMSnapshotCommand to hypervisor
> > host, or
> > >>>>>>>>>> do anything special operations.
> > >>>>>>>>>>
> > >>>>>>>>>> 2. fine-grained interface. Not only add a VMSnapshotStrategy
> > >>>>>>>>>> interface, but also add certain methods on the storage driver.
> > >>>>>>>>>> The VMSnapshotStrategy interface will be the same as option 1.
> > >>>>>>>>>> Will add the following methods on storage driver:
> > >>>>>>>>>> /* volumesBelongToVM  is the list of volumes of the VM that
> > >>>>>>>>>> created on this storage, storage vendor can either take one
> > >>>>>>>>>> snapshot for this volumes in one shot, or take snapshot for
> > each volume separately
> > >>>>>>>>>>    The pre-condition: vm is unquiesced.
> > >>>>>>>>>>    It will return a Boolean to indicate, do need unquiesce vm
> or
> > not.
> > >>>>>>>>>>    In the default storage driver, it will return false.
> > >>>>>>>>>> */
> > >>>>>>>>>> boolean takeVMSnapshot(List<VolumeInfo>
> > volumesBelongToVM,
> > >>>>>>>>>> VMSnapshot vmSnapshot); Boolean
> > >>>>>>>>>> revertVMSnapshot(List<VolumeInfo> volumesBelongToVM,
> > >>>>>>>>>> VMSnapshot vmSnapshot); Boolean
> > >>>>>>>>>> deleteVMSnapshot(List<VolumeInfo> volumesBelongToVM,
> > >>>>>>>>>> VMSnapshot vmSNapshot);
> > >>>>>>>>>>
> > >>>>>>>>>> The work flow will be: createVMSnapshot api ->
> > >>>>>>>> VMSnapshotManagerImpl:
> > >>>>>>>>>> creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot ->
> > >>>>>>>>>> storage driver:takeVMSnapshot In the implementation of
> > >>>>>>>>>> VMSnapshotStrategy's takeVMSnapshot, the pseudo code
> > looks like:
> > >>>>>>>>>>    HypervisorHelper.quiesceVM(vm);
> > >>>>>>>>>>    val volumes = vm.getVolumes();
> > >>>>>>>>>>    val maps = new Map[driver, list[VolumeInfo]]();
> > >>>>>>>>>>    Volumes.foreach(volume => maps.put(volume.getDriver,
> > volume ::
> > >>>>>>>>>> maps.get(volume.getdriver())))
> > >>>>>>>>>>    val needUnquiesce = true;
> > >>>>>>>>>>     maps.foreach((driver, volumes) => needUnquiesce  =
> > >>>>>>>>>> needUnquiesce && driver.takeVMSnapshot(volumes))
> > >>>>>>>>>>   if (needUnquiesce ) {
> > >>>>>>>>>>    HypervisorHelper.unquiesce(vm); }
> > >>>>>>>>>>
> > >>>>>>>>>> By default, the quiesceVM in HypervisorHelper will actually
> > >>>>>>>>>> take vm snapshot through hypervisor.
> > >>>>>>>>>> Does above logic makes senesce?
> > >>>>>>>>>>
> > >>>>>>>>>> The pros of option 1 is that: it's simple, no need to change
> > >>>>>>>>>> storage driver interfaces. The cons is that each storage
> > >>>>>>>>>> vendor need to implement a strategy, maybe they will do the
> > same thing.
> > >>>>>>>>>> The pros of option 2 is that, storage driver won't need to
> > >>>>>>>>>> worry about how to quiesce/unquiesce vm. The cons is that, it
> > >>>>>>>>>> will add these methods on each storage drivers, so it assumes
> > >>>>>>>>>> that this work flow will work for everybody.
> > >>>>>>>>>>
> > >>>>>>>>>> So which option we should take? Or if you have other options,
> > >>>>>>>>>> please let's know.
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> --
> > >>>>>>>>> *Mike Tutkowski*
> > >>>>>>>>> *Senior CloudStack Developer, SolidFire Inc.*
> > >>>>>>>>> e: mike.tutkowski@solidfire.com
> > >>>>>>>>> o: 303.746.7302
> > >>>>>>>>> Advancing the way the world uses the
> > >>>>>>>>> cloud<http://solidfire.com/solution/overview/?video=play>
> > >>>>>>>>> *(tm)*
> > >>>>>>>
> > >>>>>
> > >>>
> > >
>



-- 
*Mike Tutkowski*
*Senior CloudStack Developer, SolidFire Inc.*
e: mike.tutkowski@solidfire.com
o: 303.746.7302
Advancing the way the world uses the
cloud<http://solidfire.com/solution/overview/?video=play>
*™*

RE: [DISCUSS] Pluggable VM snapshot related operations?

Posted by Edison Su <Ed...@citrix.com>.

> -----Original Message-----
> From: Darren Shepherd [mailto:darren.s.shepherd@gmail.com]
> Sent: Tuesday, October 08, 2013 2:54 PM
> To: dev@cloudstack.apache.org
> Subject: Re: [DISCUSS] Pluggable VM snapshot related operations?
> 
> A hypervisor snapshot will snapshot memory also.  So determining whether
The memory is optional for hypervisor vm snapshot, a.k.a, the "Disk-only snapshots": http://support.citrix.com/proddocs/topic/xencenter-61/xs-xc-vms-snapshots-about.html
It's supported by both xenserver/kvm/vmware.

> do to the hypervisor snapshot from the quiesce option does not seem
> proper.
> 
> Sorry, for all the questions, I'm trying to get to the point of understand if this
> functionality makes sense at this point of code or if maybe their is a different
> approach.  This is what I'm seeing, what if we state it this way
> 
> 1) VM snapshot, AFAIK, are not backed up today and exist solely on primary.
> What if we added a backup phase to VM snapshots that can be optionally
> supported by the storage providers to possibly backup the VM snapshot
> volumes.
It's not about backup vm snapshot, it's about how to take vm snapshot.
Usually, take/revert vm snapshot is handled by hypervisor itself, but in NetApp(or other storage vendor) case,
They want to change the default behavior of hypervisor-base vm snapshot.

Some examples:
1. take hypervisor based vm snapshots, on primary storage, hypervisor will maintain the snapshot chain.
2. take vm snapshot through NetApp:
     a. first, quiesce VM if user specified. There is no separate API to quiesce VM on the hypervisor, so here we will
take a VM snapshot through hypervisor API call, hypervisor will take volume snapshot  on each volume of the VM. Let's say, on the primary storage, the disk chain looks like:
           base-image
                    |
                    V
                Parent disk
            /                         \
          V                            V
        Current disk        snapshot-a
     b. from snapshot-a, find out its parent disk, then take snapshot through NetApp 
     c. un- quiesce VM, here, go to hypervisor, delete snapshot "snapshot-a", hypervisor should be able to consolidate current disk and "parent disk" into one disk, thus from hypervisor point of view
, there is always, at most, only one snapshot for the VM.
    For revert VM snapshot, as long as the VM is stopped, NetApp can revert the snapshot created on NetApp storage easily, and efficiently.
   The benefit of this whole process, as Chris pointed out, if the snapshot chain is quite long, hypervisor based VM snapshot will get performance hit.

> 
> 2) Additionally you want to be able to backup multiple disks at once,
> regardless of VM snapshot.  Why don't we add the ability to put volumeIds in
> snapshot cmd that if the storage provider supports it will get a batch of
> volumeIds.
> 
> Now I know we talked about 2 and there was some concerns about it (mostly
> from me), but I think we could work through those concerns (forgot what
> they were...).  Right now I just get the feeling we are shoehorning some
> functionality into VM snapshot that isn't quite the right fit.  The "no quiesce"
> flow just doesn't seem to make sense to me.


Not sure above NetApp proposed work flow makes sense to you or to other body or not. If this work flow is only specific to NetApp, then we don't need to enforce the whole process for everybody.

> 
> Darren
> 
> On Tue, Oct 8, 2013 at 2:05 PM, SuichII, Christopher
> <Ch...@netapp.com> wrote:
> > Whether the hypervisor snapshot happens depends on whether the
> 'quiesce' option is specified with the snapshot request. If a user doesn't care
> about the consistency of their backup, then the hypervisor snapshot/quiesce
> step can be skipped altogether. This of course is not the case if the default
> provider is being used, in which case a hypervisor snapshot is the only way of
> creating a backup since it can't be offloaded to the storage driver.
> >
> > --
> > Chris Suich
> > chris.suich@netapp.com
> > NetApp Software Engineer
> > Data Center Platforms - Cloud Solutions Citrix, Cisco & Red Hat
> >
> > On Oct 8, 2013, at 4:57 PM, Darren Shepherd
> > <da...@gmail.com>
> >  wrote:
> >
> >> Who is going to decide whether the hypervisor snapshot should
> >> actually happen or not? Or how?
> >>
> >> Darren
> >>
> >> On Tue, Oct 8, 2013 at 12:38 PM, SuichII, Christopher
> >> <Ch...@netapp.com> wrote:
> >>>
> >>> --
> >>> Chris Suich
> >>> chris.suich@netapp.com
> >>> NetApp Software Engineer
> >>> Data Center Platforms - Cloud Solutions Citrix, Cisco & Red Hat
> >>>
> >>> On Oct 8, 2013, at 2:24 PM, Darren Shepherd
> <da...@gmail.com> wrote:
> >>>
> >>>> So in the implementation, when we say "quiesce" is that actually
> >>>> being implemented as a VM snapshot (memory and disk).  And then
> >>>> when you say "unquiesce" you are talking about deleting the VM
> snapshot?
> >>>
> >>> If the VM snapshot is not going to the hypervisor, then yes, it will
> actually be a hypervisor snapshot. Just to be clear, the unquiesce is not quite
> a delete - it is a collapse of the VM snapshot and the active VM back into one
> file.
> >>>
> >>>>
> >>>> In NetApp, what are you snapshotting?  The whole netapp volume (I
> >>>> don't know the correct term), a file on NFS, an iscsi volume?  I
> >>>> don't know a whole heck of a lot about the netapp snapshot
> capabilities.
> >>>
> >>> Essentially we are using internal APIs to create file level backups - don't
> worry too much about the terminology.
> >>>
> >>>>
> >>>> I know storage solutions can snapshot better and faster than
> >>>> hypervisors can with COW files.  I've personally just been always
> >>>> perplexed on whats the best way to implement it.  For storage
> >>>> solutions that are block based, its really easy to have the storage
> >>>> doing the snapshot.  For shared file systems, like NFS, its seems
> >>>> way more complicated as you don't want to snapshot the entire
> >>>> filesystem in order to snapshot one file.
> >>>
> >>> With filesystems like NFS, things are certainly more complicated, but that
> is taken care of by our controller's operating system, Data ONTAP, and we
> simply use APIs to communicate with it.
> >>>
> >>>>
> >>>> Darren
> >>>>
> >>>> On Tue, Oct 8, 2013 at 11:10 AM, SuichII, Christopher
> >>>> <Ch...@netapp.com> wrote:
> >>>>> I can comment on the second half.
> >>>>>
> >>>>> Through storage operations, storage providers can create backups
> much faster than hypervisors and over time, their snapshots are more
> efficient than the snapshot chains that hypervisors create. It is true that a VM
> snapshot taken at the storage level is slightly different as it would be psuedo-
> quiesced, not have it's memory snapshotted. This is accomplished through
> hypervisor snapshots:
> >>>>>
> >>>>> 1) VM snapshot request (lets say VM 'A'
> >>>>> 2) Create hypervisor snapshot (optional) -VM 'A' is snapshotted,
> >>>>> creating active VM 'A*'
> >>>>> -All disk traffic now goes to VM 'A*' and A is a snapshot of 'A*'
> >>>>> 3) Storage driver(s) take snapshots of each volume
> >>>>> 4) Undo hypervisor snapshot (optional) -VM snapshot 'A' is rolled
> >>>>> back into VM 'A*' so the hypervisor snapshot no longer exists
> >>>>>
> >>>>> Now, a couple notes:
> >>>>> -The reason this is optional is that not all users necessarily care about
> the memory or disk consistency of their VMs and would prefer faster
> snapshots to consistency.
> >>>>> -Preemptively, yes, we are actually taking hypervisor snapshots which
> means there isn't actually a performance of taking storage snapshots when
> quiescing the VM. However, the performance gain will come both during
> restoring the VM and during normal operations as described above.
> >>>>>
> >>>>> Although you can think of it as a poor man's VM snapshot, I would
> think of it more as a consistent multi-volume snapshot. Again, the difference
> being that this snapshot was not truly quiesced like a hypervisor snapshot
> would be.
> >>>>>
> >>>>> --
> >>>>> Chris Suich
> >>>>> chris.suich@netapp.com
> >>>>> NetApp Software Engineer
> >>>>> Data Center Platforms - Cloud Solutions Citrix, Cisco & Red Hat
> >>>>>
> >>>>> On Oct 8, 2013, at 1:47 PM, Darren Shepherd
> <da...@gmail.com> wrote:
> >>>>>
> >>>>>> My only comment is that having the return type as boolean and
> >>>>>> using to that indicate quiesce behaviour seems obscure and will
> >>>>>> probably lead to a problem later.  Your basically saying the
> >>>>>> result of the takeVMSnapshot will only ever need to communicate
> >>>>>> back whether unquiesce needs to happen.  Maybe some result
> object
> >>>>>> would be more extensible.
> >>>>>>
> >>>>>> Actually, I think I have more comments.  This seems a bit odd to me.
> >>>>>> Why would a storage driver in ACS implement a VM snapshot
> >>>>>> functionality?  VM snapshot is a really a hypervisor orchestrated
> >>>>>> operation.  So it seems like were trying to implement a poor mans
> >>>>>> VM snapshot.  Maybe if I understood what NetApp was trying to do
> >>>>>> it would make more sense, but its all odd.  To do a proper VM
> >>>>>> snapshot you need to snapshot memory and disk at the exact same
> >>>>>> time.  How are we going to do that if ACS is orchestrating the VM
> >>>>>> snapshot and delegating to storage providers.  Its not like you
> >>>>>> are going to pause the VM.... or are you?
> >>>>>>
> >>>>>> Darren
> >>>>>>
> >>>>>> On Mon, Oct 7, 2013 at 11:59 AM, Edison Su <Ed...@citrix.com>
> wrote:
> >>>>>>> I created a design document page at
> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Pluggable+VM+s
> napshot+related+operations, feel free to add items on it.
> >>>>>>> And a new branch "pluggable_vm_snapshot" is created.
> >>>>>>>
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: SuichII, Christopher [mailto:Chris.Suich@netapp.com]
> >>>>>>>> Sent: Monday, October 07, 2013 10:02 AM
> >>>>>>>> To: <de...@cloudstack.apache.org>
> >>>>>>>> Subject: Re: [DISCUSS] Pluggable VM snapshot related operations?
> >>>>>>>>
> >>>>>>>> I'm a fan of option 2 - this gives us the most flexibility (as
> >>>>>>>> you stated). The option is given to completely override the way
> >>>>>>>> VM snapshots work AND storage providers are given to
> >>>>>>>> opportunity to work within the default VM snapshot workflow.
> >>>>>>>>
> >>>>>>>> I believe this option should satisfy your concern, Mike. The
> >>>>>>>> snapshot and quiesce strategy would be in charge of
> communicating with the hypervisor.
> >>>>>>>> Storage providers should be able to leverage the default
> >>>>>>>> strategies and simply perform the storage operations.
> >>>>>>>>
> >>>>>>>> I don't think it should be much of an issue that new method to
> >>>>>>>> the storage driver interface may not apply to everyone. In fact,
> that is already the case.
> >>>>>>>> Some methods such as un/maintain(), attachToXXX() and
> >>>>>>>> takeSnapshot() are already not implemented by every driver -
> >>>>>>>> they just return false when asked if they can handle the operation.
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Chris Suich
> >>>>>>>> chris.suich@netapp.com
> >>>>>>>> NetApp Software Engineer
> >>>>>>>> Data Center Platforms - Cloud Solutions Citrix, Cisco & Red Hat
> >>>>>>>>
> >>>>>>>> On Oct 5, 2013, at 12:11 AM, Mike Tutkowski
> >>>>>>>> <mi...@solidfire.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Well, my first thought on this is that the storage driver
> >>>>>>>>> should not be telling the hypervisor to do anything. It should
> >>>>>>>>> be responsible for creating/deleting volumes, snapshots, etc. on
> its storage system only.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Fri, Oct 4, 2013 at 5:57 PM, Edison Su <Ed...@citrix.com>
> wrote:
> >>>>>>>>>
> >>>>>>>>>> In 4.2, we added VM snapshot for Vmware/Xenserver. The
> >>>>>>>>>> current workflow will be like the following:
> >>>>>>>>>> createVMSnapshot api -> VMSnapshotManagerImpl:
> >>>>>>>>>> creatVMSnapshot -> send CreateVMSnapshotCommand to
> hypervisor to create vm snapshot.
> >>>>>>>>>>
> >>>>>>>>>> If anybody wants to change the workflow, then need to either
> >>>>>>>>>> change VMSnapshotManagerImpl directly or subclass
> VMSnapshotManagerImpl.
> >>>>>>>>>> Both are not the ideal choice, as VMSnapshotManagerImpl
> >>>>>>>>>> should be able to handle different ways to take vm snapshot,
> instead of hard code.
> >>>>>>>>>>
> >>>>>>>>>> The requirements for the pluggable VM snapshot coming from:
> >>>>>>>>>> Storage vendor may have their optimization, such as NetApp.
> >>>>>>>>>> VM snapshot can be implemented in a totally different way(For
> >>>>>>>>>> example, I could just send a command to guest VM, to tell my
> >>>>>>>>>> application to flush disk and hold disk write, then come to
> >>>>>>>>>> hypervisor to
> >>>>>>>> take a volume snapshot).
> >>>>>>>>>>
> >>>>>>>>>> If we agree on enable pluggable VM snapshot, then we can
> move
> >>>>>>>>>> on discuss how to implement it.
> >>>>>>>>>>
> >>>>>>>>>> The possible options:
> >>>>>>>>>> 1. coarse grained interface. Add a VMSnapshotStrategy
> >>>>>>>>>> interface, which has the following interfaces:
> >>>>>>>>>> VMSnapshot takeVMSnapshot(VMSnapshot vmSnapshot);
> Boolean
> >>>>>>>>>> revertVMSnapshot(VMSnapshot vmSnapshot); Boolean
> >>>>>>>>>> DeleteVMSnapshot(VMSnapshot vmSnapshot);
> >>>>>>>>>>
> >>>>>>>>>> The work flow will be: createVMSnapshot api ->
> >>>>>>>> VMSnapshotManagerImpl:
> >>>>>>>>>> creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot
> >>>>>>>>>> VMSnapshotManagerImpl will manage VM state, do the sanity
> >>>>>>>>>> check, then will handle over to VMSnapshotStrategy.
> >>>>>>>>>> In VMSnapshotStrategy implementation, it may just send a
> >>>>>>>>>> Create/revert/delete VMSnapshotCommand to hypervisor
> host, or
> >>>>>>>>>> do anything special operations.
> >>>>>>>>>>
> >>>>>>>>>> 2. fine-grained interface. Not only add a VMSnapshotStrategy
> >>>>>>>>>> interface, but also add certain methods on the storage driver.
> >>>>>>>>>> The VMSnapshotStrategy interface will be the same as option 1.
> >>>>>>>>>> Will add the following methods on storage driver:
> >>>>>>>>>> /* volumesBelongToVM  is the list of volumes of the VM that
> >>>>>>>>>> created on this storage, storage vendor can either take one
> >>>>>>>>>> snapshot for this volumes in one shot, or take snapshot for
> each volume separately
> >>>>>>>>>>    The pre-condition: vm is unquiesced.
> >>>>>>>>>>    It will return a Boolean to indicate, do need unquiesce vm or
> not.
> >>>>>>>>>>    In the default storage driver, it will return false.
> >>>>>>>>>> */
> >>>>>>>>>> boolean takeVMSnapshot(List<VolumeInfo>
> volumesBelongToVM,
> >>>>>>>>>> VMSnapshot vmSnapshot); Boolean
> >>>>>>>>>> revertVMSnapshot(List<VolumeInfo> volumesBelongToVM,
> >>>>>>>>>> VMSnapshot vmSnapshot); Boolean
> >>>>>>>>>> deleteVMSnapshot(List<VolumeInfo> volumesBelongToVM,
> >>>>>>>>>> VMSnapshot vmSNapshot);
> >>>>>>>>>>
> >>>>>>>>>> The work flow will be: createVMSnapshot api ->
> >>>>>>>> VMSnapshotManagerImpl:
> >>>>>>>>>> creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot ->
> >>>>>>>>>> storage driver:takeVMSnapshot In the implementation of
> >>>>>>>>>> VMSnapshotStrategy's takeVMSnapshot, the pseudo code
> looks like:
> >>>>>>>>>>    HypervisorHelper.quiesceVM(vm);
> >>>>>>>>>>    val volumes = vm.getVolumes();
> >>>>>>>>>>    val maps = new Map[driver, list[VolumeInfo]]();
> >>>>>>>>>>    Volumes.foreach(volume => maps.put(volume.getDriver,
> volume ::
> >>>>>>>>>> maps.get(volume.getdriver())))
> >>>>>>>>>>    val needUnquiesce = true;
> >>>>>>>>>>     maps.foreach((driver, volumes) => needUnquiesce  =
> >>>>>>>>>> needUnquiesce && driver.takeVMSnapshot(volumes))
> >>>>>>>>>>   if (needUnquiesce ) {
> >>>>>>>>>>    HypervisorHelper.unquiesce(vm); }
> >>>>>>>>>>
> >>>>>>>>>> By default, the quiesceVM in HypervisorHelper will actually
> >>>>>>>>>> take vm snapshot through hypervisor.
> >>>>>>>>>> Does above logic makes senesce?
> >>>>>>>>>>
> >>>>>>>>>> The pros of option 1 is that: it's simple, no need to change
> >>>>>>>>>> storage driver interfaces. The cons is that each storage
> >>>>>>>>>> vendor need to implement a strategy, maybe they will do the
> same thing.
> >>>>>>>>>> The pros of option 2 is that, storage driver won't need to
> >>>>>>>>>> worry about how to quiesce/unquiesce vm. The cons is that, it
> >>>>>>>>>> will add these methods on each storage drivers, so it assumes
> >>>>>>>>>> that this work flow will work for everybody.
> >>>>>>>>>>
> >>>>>>>>>> So which option we should take? Or if you have other options,
> >>>>>>>>>> please let's know.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> *Mike Tutkowski*
> >>>>>>>>> *Senior CloudStack Developer, SolidFire Inc.*
> >>>>>>>>> e: mike.tutkowski@solidfire.com
> >>>>>>>>> o: 303.746.7302
> >>>>>>>>> Advancing the way the world uses the
> >>>>>>>>> cloud<http://solidfire.com/solution/overview/?video=play>
> >>>>>>>>> *(tm)*
> >>>>>>>
> >>>>>
> >>>
> >

Re: [DISCUSS] Pluggable VM snapshot related operations?

Posted by Darren Shepherd <da...@gmail.com>.
A hypervisor snapshot will snapshot memory also.  So determining
whether do to the hypervisor snapshot from the quiesce option does not
seem proper.

Sorry, for all the questions, I'm trying to get to the point of
understand if this functionality makes sense at this point of code or
if maybe their is a different approach.  This is what I'm seeing, what
if we state it this way

1) VM snapshot, AFAIK, are not backed up today and exist solely on
primary.  What if we added a backup phase to VM snapshots that can be
optionally supported by the storage providers to possibly backup the
VM snapshot volumes.

2) Additionally you want to be able to backup multiple disks at once,
regardless of VM snapshot.  Why don't we add the ability to put
volumeIds in snapshot cmd that if the storage provider supports it
will get a batch of volumeIds.

Now I know we talked about 2 and there was some concerns about it
(mostly from me), but I think we could work through those concerns
(forgot what they were...).  Right now I just get the feeling we are
shoehorning some functionality into VM snapshot that isn't quite the
right fit.  The "no quiesce" flow just doesn't seem to make sense to
me.

Darren

On Tue, Oct 8, 2013 at 2:05 PM, SuichII, Christopher
<Ch...@netapp.com> wrote:
> Whether the hypervisor snapshot happens depends on whether the 'quiesce' option is specified with the snapshot request. If a user doesn't care about the consistency of their backup, then the hypervisor snapshot/quiesce step can be skipped altogether. This of course is not the case if the default provider is being used, in which case a hypervisor snapshot is the only way of creating a backup since it can't be offloaded to the storage driver.
>
> --
> Chris Suich
> chris.suich@netapp.com
> NetApp Software Engineer
> Data Center Platforms – Cloud Solutions
> Citrix, Cisco & Red Hat
>
> On Oct 8, 2013, at 4:57 PM, Darren Shepherd <da...@gmail.com>
>  wrote:
>
>> Who is going to decide whether the hypervisor snapshot should actually
>> happen or not? Or how?
>>
>> Darren
>>
>> On Tue, Oct 8, 2013 at 12:38 PM, SuichII, Christopher
>> <Ch...@netapp.com> wrote:
>>>
>>> --
>>> Chris Suich
>>> chris.suich@netapp.com
>>> NetApp Software Engineer
>>> Data Center Platforms – Cloud Solutions
>>> Citrix, Cisco & Red Hat
>>>
>>> On Oct 8, 2013, at 2:24 PM, Darren Shepherd <da...@gmail.com> wrote:
>>>
>>>> So in the implementation, when we say "quiesce" is that actually being
>>>> implemented as a VM snapshot (memory and disk).  And then when you say
>>>> "unquiesce" you are talking about deleting the VM snapshot?
>>>
>>> If the VM snapshot is not going to the hypervisor, then yes, it will actually be a hypervisor snapshot. Just to be clear, the unquiesce is not quite a delete - it is a collapse of the VM snapshot and the active VM back into one file.
>>>
>>>>
>>>> In NetApp, what are you snapshotting?  The whole netapp volume (I
>>>> don't know the correct term), a file on NFS, an iscsi volume?  I don't
>>>> know a whole heck of a lot about the netapp snapshot capabilities.
>>>
>>> Essentially we are using internal APIs to create file level backups - don't worry too much about the terminology.
>>>
>>>>
>>>> I know storage solutions can snapshot better and faster than
>>>> hypervisors can with COW files.  I've personally just been always
>>>> perplexed on whats the best way to implement it.  For storage
>>>> solutions that are block based, its really easy to have the storage
>>>> doing the snapshot.  For shared file systems, like NFS, its seems way
>>>> more complicated as you don't want to snapshot the entire filesystem
>>>> in order to snapshot one file.
>>>
>>> With filesystems like NFS, things are certainly more complicated, but that is taken care of by our controller's operating system, Data ONTAP, and we simply use APIs to communicate with it.
>>>
>>>>
>>>> Darren
>>>>
>>>> On Tue, Oct 8, 2013 at 11:10 AM, SuichII, Christopher
>>>> <Ch...@netapp.com> wrote:
>>>>> I can comment on the second half.
>>>>>
>>>>> Through storage operations, storage providers can create backups much faster than hypervisors and over time, their snapshots are more efficient than the snapshot chains that hypervisors create. It is true that a VM snapshot taken at the storage level is slightly different as it would be psuedo-quiesced, not have it's memory snapshotted. This is accomplished through hypervisor snapshots:
>>>>>
>>>>> 1) VM snapshot request (lets say VM 'A'
>>>>> 2) Create hypervisor snapshot (optional)
>>>>> -VM 'A' is snapshotted, creating active VM 'A*'
>>>>> -All disk traffic now goes to VM 'A*' and A is a snapshot of 'A*'
>>>>> 3) Storage driver(s) take snapshots of each volume
>>>>> 4) Undo hypervisor snapshot (optional)
>>>>> -VM snapshot 'A' is rolled back into VM 'A*' so the hypervisor snapshot no longer exists
>>>>>
>>>>> Now, a couple notes:
>>>>> -The reason this is optional is that not all users necessarily care about the memory or disk consistency of their VMs and would prefer faster snapshots to consistency.
>>>>> -Preemptively, yes, we are actually taking hypervisor snapshots which means there isn't actually a performance of taking storage snapshots when quiescing the VM. However, the performance gain will come both during restoring the VM and during normal operations as described above.
>>>>>
>>>>> Although you can think of it as a poor man's VM snapshot, I would think of it more as a consistent multi-volume snapshot. Again, the difference being that this snapshot was not truly quiesced like a hypervisor snapshot would be.
>>>>>
>>>>> --
>>>>> Chris Suich
>>>>> chris.suich@netapp.com
>>>>> NetApp Software Engineer
>>>>> Data Center Platforms – Cloud Solutions
>>>>> Citrix, Cisco & Red Hat
>>>>>
>>>>> On Oct 8, 2013, at 1:47 PM, Darren Shepherd <da...@gmail.com> wrote:
>>>>>
>>>>>> My only comment is that having the return type as boolean and using to
>>>>>> that indicate quiesce behaviour seems obscure and will probably lead
>>>>>> to a problem later.  Your basically saying the result of the
>>>>>> takeVMSnapshot will only ever need to communicate back whether
>>>>>> unquiesce needs to happen.  Maybe some result object would be more
>>>>>> extensible.
>>>>>>
>>>>>> Actually, I think I have more comments.  This seems a bit odd to me.
>>>>>> Why would a storage driver in ACS implement a VM snapshot
>>>>>> functionality?  VM snapshot is a really a hypervisor orchestrated
>>>>>> operation.  So it seems like were trying to implement a poor mans VM
>>>>>> snapshot.  Maybe if I understood what NetApp was trying to do it would
>>>>>> make more sense, but its all odd.  To do a proper VM snapshot you need
>>>>>> to snapshot memory and disk at the exact same time.  How are we going
>>>>>> to do that if ACS is orchestrating the VM snapshot and delegating to
>>>>>> storage providers.  Its not like you are going to pause the VM.... or
>>>>>> are you?
>>>>>>
>>>>>> Darren
>>>>>>
>>>>>> On Mon, Oct 7, 2013 at 11:59 AM, Edison Su <Ed...@citrix.com> wrote:
>>>>>>> I created a design document page at https://cwiki.apache.org/confluence/display/CLOUDSTACK/Pluggable+VM+snapshot+related+operations, feel free to add items on it.
>>>>>>> And a new branch "pluggable_vm_snapshot" is created.
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: SuichII, Christopher [mailto:Chris.Suich@netapp.com]
>>>>>>>> Sent: Monday, October 07, 2013 10:02 AM
>>>>>>>> To: <de...@cloudstack.apache.org>
>>>>>>>> Subject: Re: [DISCUSS] Pluggable VM snapshot related operations?
>>>>>>>>
>>>>>>>> I'm a fan of option 2 - this gives us the most flexibility (as you stated). The
>>>>>>>> option is given to completely override the way VM snapshots work AND
>>>>>>>> storage providers are given to opportunity to work within the default VM
>>>>>>>> snapshot workflow.
>>>>>>>>
>>>>>>>> I believe this option should satisfy your concern, Mike. The snapshot and
>>>>>>>> quiesce strategy would be in charge of communicating with the hypervisor.
>>>>>>>> Storage providers should be able to leverage the default strategies and
>>>>>>>> simply perform the storage operations.
>>>>>>>>
>>>>>>>> I don't think it should be much of an issue that new method to the storage
>>>>>>>> driver interface may not apply to everyone. In fact, that is already the case.
>>>>>>>> Some methods such as un/maintain(), attachToXXX() and takeSnapshot() are
>>>>>>>> already not implemented by every driver - they just return false when asked
>>>>>>>> if they can handle the operation.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Chris Suich
>>>>>>>> chris.suich@netapp.com
>>>>>>>> NetApp Software Engineer
>>>>>>>> Data Center Platforms - Cloud Solutions
>>>>>>>> Citrix, Cisco & Red Hat
>>>>>>>>
>>>>>>>> On Oct 5, 2013, at 12:11 AM, Mike Tutkowski <mi...@solidfire.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Well, my first thought on this is that the storage driver should not
>>>>>>>>> be telling the hypervisor to do anything. It should be responsible for
>>>>>>>>> creating/deleting volumes, snapshots, etc. on its storage system only.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Oct 4, 2013 at 5:57 PM, Edison Su <Ed...@citrix.com> wrote:
>>>>>>>>>
>>>>>>>>>> In 4.2, we added VM snapshot for Vmware/Xenserver. The current
>>>>>>>>>> workflow will be like the following:
>>>>>>>>>> createVMSnapshot api -> VMSnapshotManagerImpl: creatVMSnapshot ->
>>>>>>>>>> send CreateVMSnapshotCommand to hypervisor to create vm snapshot.
>>>>>>>>>>
>>>>>>>>>> If anybody wants to change the workflow, then need to either change
>>>>>>>>>> VMSnapshotManagerImpl directly or subclass VMSnapshotManagerImpl.
>>>>>>>>>> Both are not the ideal choice, as VMSnapshotManagerImpl should be
>>>>>>>>>> able to handle different ways to take vm snapshot, instead of hard code.
>>>>>>>>>>
>>>>>>>>>> The requirements for the pluggable VM snapshot coming from:
>>>>>>>>>> Storage vendor may have their optimization, such as NetApp.
>>>>>>>>>> VM snapshot can be implemented in a totally different way(For
>>>>>>>>>> example, I could just send a command to guest VM, to tell my
>>>>>>>>>> application to flush disk and hold disk write, then come to hypervisor to
>>>>>>>> take a volume snapshot).
>>>>>>>>>>
>>>>>>>>>> If we agree on enable pluggable VM snapshot, then we can move on
>>>>>>>>>> discuss how to implement it.
>>>>>>>>>>
>>>>>>>>>> The possible options:
>>>>>>>>>> 1. coarse grained interface. Add a VMSnapshotStrategy interface,
>>>>>>>>>> which has the following interfaces:
>>>>>>>>>> VMSnapshot takeVMSnapshot(VMSnapshot vmSnapshot);
>>>>>>>>>> Boolean revertVMSnapshot(VMSnapshot vmSnapshot);
>>>>>>>>>> Boolean DeleteVMSnapshot(VMSnapshot vmSnapshot);
>>>>>>>>>>
>>>>>>>>>> The work flow will be: createVMSnapshot api ->
>>>>>>>> VMSnapshotManagerImpl:
>>>>>>>>>> creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot
>>>>>>>>>> VMSnapshotManagerImpl will manage VM state, do the sanity check,
>>>>>>>>>> then will handle over to VMSnapshotStrategy.
>>>>>>>>>> In VMSnapshotStrategy implementation, it may just send a
>>>>>>>>>> Create/revert/delete VMSnapshotCommand to hypervisor host, or do
>>>>>>>>>> anything special operations.
>>>>>>>>>>
>>>>>>>>>> 2. fine-grained interface. Not only add a VMSnapshotStrategy
>>>>>>>>>> interface, but also add certain methods on the storage driver.
>>>>>>>>>> The VMSnapshotStrategy interface will be the same as option 1.
>>>>>>>>>> Will add the following methods on storage driver:
>>>>>>>>>> /* volumesBelongToVM  is the list of volumes of the VM that created
>>>>>>>>>> on this storage, storage vendor can either take one snapshot for this
>>>>>>>>>> volumes in one shot, or take snapshot for each volume separately
>>>>>>>>>>    The pre-condition: vm is unquiesced.
>>>>>>>>>>    It will return a Boolean to indicate, do need unquiesce vm or not.
>>>>>>>>>>    In the default storage driver, it will return false.
>>>>>>>>>> */
>>>>>>>>>> boolean takeVMSnapshot(List<VolumeInfo> volumesBelongToVM,
>>>>>>>>>> VMSnapshot vmSnapshot);
>>>>>>>>>> Boolean revertVMSnapshot(List<VolumeInfo> volumesBelongToVM,
>>>>>>>>>> VMSnapshot vmSnapshot);
>>>>>>>>>> Boolean deleteVMSnapshot(List<VolumeInfo> volumesBelongToVM,
>>>>>>>>>> VMSnapshot vmSNapshot);
>>>>>>>>>>
>>>>>>>>>> The work flow will be: createVMSnapshot api ->
>>>>>>>> VMSnapshotManagerImpl:
>>>>>>>>>> creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot -> storage
>>>>>>>>>> driver:takeVMSnapshot In the implementation of VMSnapshotStrategy's
>>>>>>>>>> takeVMSnapshot, the pseudo code looks like:
>>>>>>>>>>    HypervisorHelper.quiesceVM(vm);
>>>>>>>>>>    val volumes = vm.getVolumes();
>>>>>>>>>>    val maps = new Map[driver, list[VolumeInfo]]();
>>>>>>>>>>    Volumes.foreach(volume => maps.put(volume.getDriver, volume ::
>>>>>>>>>> maps.get(volume.getdriver())))
>>>>>>>>>>    val needUnquiesce = true;
>>>>>>>>>>     maps.foreach((driver, volumes) => needUnquiesce  =
>>>>>>>>>> needUnquiesce && driver.takeVMSnapshot(volumes))
>>>>>>>>>>   if (needUnquiesce ) {
>>>>>>>>>>    HypervisorHelper.unquiesce(vm);
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> By default, the quiesceVM in HypervisorHelper will actually take vm
>>>>>>>>>> snapshot through hypervisor.
>>>>>>>>>> Does above logic makes senesce?
>>>>>>>>>>
>>>>>>>>>> The pros of option 1 is that: it's simple, no need to change storage
>>>>>>>>>> driver interfaces. The cons is that each storage vendor need to
>>>>>>>>>> implement a strategy, maybe they will do the same thing.
>>>>>>>>>> The pros of option 2 is that, storage driver won't need to worry
>>>>>>>>>> about how to quiesce/unquiesce vm. The cons is that, it will add
>>>>>>>>>> these methods on each storage drivers, so it assumes that this work
>>>>>>>>>> flow will work for everybody.
>>>>>>>>>>
>>>>>>>>>> So which option we should take? Or if you have other options, please
>>>>>>>>>> let's know.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> *Mike Tutkowski*
>>>>>>>>> *Senior CloudStack Developer, SolidFire Inc.*
>>>>>>>>> e: mike.tutkowski@solidfire.com
>>>>>>>>> o: 303.746.7302
>>>>>>>>> Advancing the way the world uses the
>>>>>>>>> cloud<http://solidfire.com/solution/overview/?video=play>
>>>>>>>>> *(tm)*
>>>>>>>
>>>>>
>>>
>

Re: [DISCUSS] Pluggable VM snapshot related operations?

Posted by "SuichII, Christopher" <Ch...@netapp.com>.
Whether the hypervisor snapshot happens depends on whether the 'quiesce' option is specified with the snapshot request. If a user doesn't care about the consistency of their backup, then the hypervisor snapshot/quiesce step can be skipped altogether. This of course is not the case if the default provider is being used, in which case a hypervisor snapshot is the only way of creating a backup since it can't be offloaded to the storage driver.

-- 
Chris Suich
chris.suich@netapp.com
NetApp Software Engineer
Data Center Platforms – Cloud Solutions
Citrix, Cisco & Red Hat

On Oct 8, 2013, at 4:57 PM, Darren Shepherd <da...@gmail.com>
 wrote:

> Who is going to decide whether the hypervisor snapshot should actually
> happen or not? Or how?
> 
> Darren
> 
> On Tue, Oct 8, 2013 at 12:38 PM, SuichII, Christopher
> <Ch...@netapp.com> wrote:
>> 
>> --
>> Chris Suich
>> chris.suich@netapp.com
>> NetApp Software Engineer
>> Data Center Platforms – Cloud Solutions
>> Citrix, Cisco & Red Hat
>> 
>> On Oct 8, 2013, at 2:24 PM, Darren Shepherd <da...@gmail.com> wrote:
>> 
>>> So in the implementation, when we say "quiesce" is that actually being
>>> implemented as a VM snapshot (memory and disk).  And then when you say
>>> "unquiesce" you are talking about deleting the VM snapshot?
>> 
>> If the VM snapshot is not going to the hypervisor, then yes, it will actually be a hypervisor snapshot. Just to be clear, the unquiesce is not quite a delete - it is a collapse of the VM snapshot and the active VM back into one file.
>> 
>>> 
>>> In NetApp, what are you snapshotting?  The whole netapp volume (I
>>> don't know the correct term), a file on NFS, an iscsi volume?  I don't
>>> know a whole heck of a lot about the netapp snapshot capabilities.
>> 
>> Essentially we are using internal APIs to create file level backups - don't worry too much about the terminology.
>> 
>>> 
>>> I know storage solutions can snapshot better and faster than
>>> hypervisors can with COW files.  I've personally just been always
>>> perplexed on whats the best way to implement it.  For storage
>>> solutions that are block based, its really easy to have the storage
>>> doing the snapshot.  For shared file systems, like NFS, its seems way
>>> more complicated as you don't want to snapshot the entire filesystem
>>> in order to snapshot one file.
>> 
>> With filesystems like NFS, things are certainly more complicated, but that is taken care of by our controller's operating system, Data ONTAP, and we simply use APIs to communicate with it.
>> 
>>> 
>>> Darren
>>> 
>>> On Tue, Oct 8, 2013 at 11:10 AM, SuichII, Christopher
>>> <Ch...@netapp.com> wrote:
>>>> I can comment on the second half.
>>>> 
>>>> Through storage operations, storage providers can create backups much faster than hypervisors and over time, their snapshots are more efficient than the snapshot chains that hypervisors create. It is true that a VM snapshot taken at the storage level is slightly different as it would be psuedo-quiesced, not have it's memory snapshotted. This is accomplished through hypervisor snapshots:
>>>> 
>>>> 1) VM snapshot request (lets say VM 'A'
>>>> 2) Create hypervisor snapshot (optional)
>>>> -VM 'A' is snapshotted, creating active VM 'A*'
>>>> -All disk traffic now goes to VM 'A*' and A is a snapshot of 'A*'
>>>> 3) Storage driver(s) take snapshots of each volume
>>>> 4) Undo hypervisor snapshot (optional)
>>>> -VM snapshot 'A' is rolled back into VM 'A*' so the hypervisor snapshot no longer exists
>>>> 
>>>> Now, a couple notes:
>>>> -The reason this is optional is that not all users necessarily care about the memory or disk consistency of their VMs and would prefer faster snapshots to consistency.
>>>> -Preemptively, yes, we are actually taking hypervisor snapshots which means there isn't actually a performance of taking storage snapshots when quiescing the VM. However, the performance gain will come both during restoring the VM and during normal operations as described above.
>>>> 
>>>> Although you can think of it as a poor man's VM snapshot, I would think of it more as a consistent multi-volume snapshot. Again, the difference being that this snapshot was not truly quiesced like a hypervisor snapshot would be.
>>>> 
>>>> --
>>>> Chris Suich
>>>> chris.suich@netapp.com
>>>> NetApp Software Engineer
>>>> Data Center Platforms – Cloud Solutions
>>>> Citrix, Cisco & Red Hat
>>>> 
>>>> On Oct 8, 2013, at 1:47 PM, Darren Shepherd <da...@gmail.com> wrote:
>>>> 
>>>>> My only comment is that having the return type as boolean and using to
>>>>> that indicate quiesce behaviour seems obscure and will probably lead
>>>>> to a problem later.  Your basically saying the result of the
>>>>> takeVMSnapshot will only ever need to communicate back whether
>>>>> unquiesce needs to happen.  Maybe some result object would be more
>>>>> extensible.
>>>>> 
>>>>> Actually, I think I have more comments.  This seems a bit odd to me.
>>>>> Why would a storage driver in ACS implement a VM snapshot
>>>>> functionality?  VM snapshot is a really a hypervisor orchestrated
>>>>> operation.  So it seems like were trying to implement a poor mans VM
>>>>> snapshot.  Maybe if I understood what NetApp was trying to do it would
>>>>> make more sense, but its all odd.  To do a proper VM snapshot you need
>>>>> to snapshot memory and disk at the exact same time.  How are we going
>>>>> to do that if ACS is orchestrating the VM snapshot and delegating to
>>>>> storage providers.  Its not like you are going to pause the VM.... or
>>>>> are you?
>>>>> 
>>>>> Darren
>>>>> 
>>>>> On Mon, Oct 7, 2013 at 11:59 AM, Edison Su <Ed...@citrix.com> wrote:
>>>>>> I created a design document page at https://cwiki.apache.org/confluence/display/CLOUDSTACK/Pluggable+VM+snapshot+related+operations, feel free to add items on it.
>>>>>> And a new branch "pluggable_vm_snapshot" is created.
>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: SuichII, Christopher [mailto:Chris.Suich@netapp.com]
>>>>>>> Sent: Monday, October 07, 2013 10:02 AM
>>>>>>> To: <de...@cloudstack.apache.org>
>>>>>>> Subject: Re: [DISCUSS] Pluggable VM snapshot related operations?
>>>>>>> 
>>>>>>> I'm a fan of option 2 - this gives us the most flexibility (as you stated). The
>>>>>>> option is given to completely override the way VM snapshots work AND
>>>>>>> storage providers are given to opportunity to work within the default VM
>>>>>>> snapshot workflow.
>>>>>>> 
>>>>>>> I believe this option should satisfy your concern, Mike. The snapshot and
>>>>>>> quiesce strategy would be in charge of communicating with the hypervisor.
>>>>>>> Storage providers should be able to leverage the default strategies and
>>>>>>> simply perform the storage operations.
>>>>>>> 
>>>>>>> I don't think it should be much of an issue that new method to the storage
>>>>>>> driver interface may not apply to everyone. In fact, that is already the case.
>>>>>>> Some methods such as un/maintain(), attachToXXX() and takeSnapshot() are
>>>>>>> already not implemented by every driver - they just return false when asked
>>>>>>> if they can handle the operation.
>>>>>>> 
>>>>>>> --
>>>>>>> Chris Suich
>>>>>>> chris.suich@netapp.com
>>>>>>> NetApp Software Engineer
>>>>>>> Data Center Platforms - Cloud Solutions
>>>>>>> Citrix, Cisco & Red Hat
>>>>>>> 
>>>>>>> On Oct 5, 2013, at 12:11 AM, Mike Tutkowski <mi...@solidfire.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Well, my first thought on this is that the storage driver should not
>>>>>>>> be telling the hypervisor to do anything. It should be responsible for
>>>>>>>> creating/deleting volumes, snapshots, etc. on its storage system only.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Fri, Oct 4, 2013 at 5:57 PM, Edison Su <Ed...@citrix.com> wrote:
>>>>>>>> 
>>>>>>>>> In 4.2, we added VM snapshot for Vmware/Xenserver. The current
>>>>>>>>> workflow will be like the following:
>>>>>>>>> createVMSnapshot api -> VMSnapshotManagerImpl: creatVMSnapshot ->
>>>>>>>>> send CreateVMSnapshotCommand to hypervisor to create vm snapshot.
>>>>>>>>> 
>>>>>>>>> If anybody wants to change the workflow, then need to either change
>>>>>>>>> VMSnapshotManagerImpl directly or subclass VMSnapshotManagerImpl.
>>>>>>>>> Both are not the ideal choice, as VMSnapshotManagerImpl should be
>>>>>>>>> able to handle different ways to take vm snapshot, instead of hard code.
>>>>>>>>> 
>>>>>>>>> The requirements for the pluggable VM snapshot coming from:
>>>>>>>>> Storage vendor may have their optimization, such as NetApp.
>>>>>>>>> VM snapshot can be implemented in a totally different way(For
>>>>>>>>> example, I could just send a command to guest VM, to tell my
>>>>>>>>> application to flush disk and hold disk write, then come to hypervisor to
>>>>>>> take a volume snapshot).
>>>>>>>>> 
>>>>>>>>> If we agree on enable pluggable VM snapshot, then we can move on
>>>>>>>>> discuss how to implement it.
>>>>>>>>> 
>>>>>>>>> The possible options:
>>>>>>>>> 1. coarse grained interface. Add a VMSnapshotStrategy interface,
>>>>>>>>> which has the following interfaces:
>>>>>>>>> VMSnapshot takeVMSnapshot(VMSnapshot vmSnapshot);
>>>>>>>>> Boolean revertVMSnapshot(VMSnapshot vmSnapshot);
>>>>>>>>> Boolean DeleteVMSnapshot(VMSnapshot vmSnapshot);
>>>>>>>>> 
>>>>>>>>> The work flow will be: createVMSnapshot api ->
>>>>>>> VMSnapshotManagerImpl:
>>>>>>>>> creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot
>>>>>>>>> VMSnapshotManagerImpl will manage VM state, do the sanity check,
>>>>>>>>> then will handle over to VMSnapshotStrategy.
>>>>>>>>> In VMSnapshotStrategy implementation, it may just send a
>>>>>>>>> Create/revert/delete VMSnapshotCommand to hypervisor host, or do
>>>>>>>>> anything special operations.
>>>>>>>>> 
>>>>>>>>> 2. fine-grained interface. Not only add a VMSnapshotStrategy
>>>>>>>>> interface, but also add certain methods on the storage driver.
>>>>>>>>> The VMSnapshotStrategy interface will be the same as option 1.
>>>>>>>>> Will add the following methods on storage driver:
>>>>>>>>> /* volumesBelongToVM  is the list of volumes of the VM that created
>>>>>>>>> on this storage, storage vendor can either take one snapshot for this
>>>>>>>>> volumes in one shot, or take snapshot for each volume separately
>>>>>>>>>    The pre-condition: vm is unquiesced.
>>>>>>>>>    It will return a Boolean to indicate, do need unquiesce vm or not.
>>>>>>>>>    In the default storage driver, it will return false.
>>>>>>>>> */
>>>>>>>>> boolean takeVMSnapshot(List<VolumeInfo> volumesBelongToVM,
>>>>>>>>> VMSnapshot vmSnapshot);
>>>>>>>>> Boolean revertVMSnapshot(List<VolumeInfo> volumesBelongToVM,
>>>>>>>>> VMSnapshot vmSnapshot);
>>>>>>>>> Boolean deleteVMSnapshot(List<VolumeInfo> volumesBelongToVM,
>>>>>>>>> VMSnapshot vmSNapshot);
>>>>>>>>> 
>>>>>>>>> The work flow will be: createVMSnapshot api ->
>>>>>>> VMSnapshotManagerImpl:
>>>>>>>>> creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot -> storage
>>>>>>>>> driver:takeVMSnapshot In the implementation of VMSnapshotStrategy's
>>>>>>>>> takeVMSnapshot, the pseudo code looks like:
>>>>>>>>>    HypervisorHelper.quiesceVM(vm);
>>>>>>>>>    val volumes = vm.getVolumes();
>>>>>>>>>    val maps = new Map[driver, list[VolumeInfo]]();
>>>>>>>>>    Volumes.foreach(volume => maps.put(volume.getDriver, volume ::
>>>>>>>>> maps.get(volume.getdriver())))
>>>>>>>>>    val needUnquiesce = true;
>>>>>>>>>     maps.foreach((driver, volumes) => needUnquiesce  =
>>>>>>>>> needUnquiesce && driver.takeVMSnapshot(volumes))
>>>>>>>>>   if (needUnquiesce ) {
>>>>>>>>>    HypervisorHelper.unquiesce(vm);
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> By default, the quiesceVM in HypervisorHelper will actually take vm
>>>>>>>>> snapshot through hypervisor.
>>>>>>>>> Does above logic makes senesce?
>>>>>>>>> 
>>>>>>>>> The pros of option 1 is that: it's simple, no need to change storage
>>>>>>>>> driver interfaces. The cons is that each storage vendor need to
>>>>>>>>> implement a strategy, maybe they will do the same thing.
>>>>>>>>> The pros of option 2 is that, storage driver won't need to worry
>>>>>>>>> about how to quiesce/unquiesce vm. The cons is that, it will add
>>>>>>>>> these methods on each storage drivers, so it assumes that this work
>>>>>>>>> flow will work for everybody.
>>>>>>>>> 
>>>>>>>>> So which option we should take? Or if you have other options, please
>>>>>>>>> let's know.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> *Mike Tutkowski*
>>>>>>>> *Senior CloudStack Developer, SolidFire Inc.*
>>>>>>>> e: mike.tutkowski@solidfire.com
>>>>>>>> o: 303.746.7302
>>>>>>>> Advancing the way the world uses the
>>>>>>>> cloud<http://solidfire.com/solution/overview/?video=play>
>>>>>>>> *(tm)*
>>>>>> 
>>>> 
>> 


Re: [DISCUSS] Pluggable VM snapshot related operations?

Posted by Darren Shepherd <da...@gmail.com>.
Who is going to decide whether the hypervisor snapshot should actually
happen or not? Or how?

Darren

On Tue, Oct 8, 2013 at 12:38 PM, SuichII, Christopher
<Ch...@netapp.com> wrote:
>
> --
> Chris Suich
> chris.suich@netapp.com
> NetApp Software Engineer
> Data Center Platforms – Cloud Solutions
> Citrix, Cisco & Red Hat
>
> On Oct 8, 2013, at 2:24 PM, Darren Shepherd <da...@gmail.com> wrote:
>
>> So in the implementation, when we say "quiesce" is that actually being
>> implemented as a VM snapshot (memory and disk).  And then when you say
>> "unquiesce" you are talking about deleting the VM snapshot?
>
> If the VM snapshot is not going to the hypervisor, then yes, it will actually be a hypervisor snapshot. Just to be clear, the unquiesce is not quite a delete - it is a collapse of the VM snapshot and the active VM back into one file.
>
>>
>> In NetApp, what are you snapshotting?  The whole netapp volume (I
>> don't know the correct term), a file on NFS, an iscsi volume?  I don't
>> know a whole heck of a lot about the netapp snapshot capabilities.
>
> Essentially we are using internal APIs to create file level backups - don't worry too much about the terminology.
>
>>
>> I know storage solutions can snapshot better and faster than
>> hypervisors can with COW files.  I've personally just been always
>> perplexed on whats the best way to implement it.  For storage
>> solutions that are block based, its really easy to have the storage
>> doing the snapshot.  For shared file systems, like NFS, its seems way
>> more complicated as you don't want to snapshot the entire filesystem
>> in order to snapshot one file.
>
> With filesystems like NFS, things are certainly more complicated, but that is taken care of by our controller's operating system, Data ONTAP, and we simply use APIs to communicate with it.
>
>>
>> Darren
>>
>> On Tue, Oct 8, 2013 at 11:10 AM, SuichII, Christopher
>> <Ch...@netapp.com> wrote:
>>> I can comment on the second half.
>>>
>>> Through storage operations, storage providers can create backups much faster than hypervisors and over time, their snapshots are more efficient than the snapshot chains that hypervisors create. It is true that a VM snapshot taken at the storage level is slightly different as it would be psuedo-quiesced, not have it's memory snapshotted. This is accomplished through hypervisor snapshots:
>>>
>>> 1) VM snapshot request (lets say VM 'A'
>>> 2) Create hypervisor snapshot (optional)
>>>  -VM 'A' is snapshotted, creating active VM 'A*'
>>>  -All disk traffic now goes to VM 'A*' and A is a snapshot of 'A*'
>>> 3) Storage driver(s) take snapshots of each volume
>>> 4) Undo hypervisor snapshot (optional)
>>>  -VM snapshot 'A' is rolled back into VM 'A*' so the hypervisor snapshot no longer exists
>>>
>>> Now, a couple notes:
>>> -The reason this is optional is that not all users necessarily care about the memory or disk consistency of their VMs and would prefer faster snapshots to consistency.
>>> -Preemptively, yes, we are actually taking hypervisor snapshots which means there isn't actually a performance of taking storage snapshots when quiescing the VM. However, the performance gain will come both during restoring the VM and during normal operations as described above.
>>>
>>> Although you can think of it as a poor man's VM snapshot, I would think of it more as a consistent multi-volume snapshot. Again, the difference being that this snapshot was not truly quiesced like a hypervisor snapshot would be.
>>>
>>> --
>>> Chris Suich
>>> chris.suich@netapp.com
>>> NetApp Software Engineer
>>> Data Center Platforms – Cloud Solutions
>>> Citrix, Cisco & Red Hat
>>>
>>> On Oct 8, 2013, at 1:47 PM, Darren Shepherd <da...@gmail.com> wrote:
>>>
>>>> My only comment is that having the return type as boolean and using to
>>>> that indicate quiesce behaviour seems obscure and will probably lead
>>>> to a problem later.  Your basically saying the result of the
>>>> takeVMSnapshot will only ever need to communicate back whether
>>>> unquiesce needs to happen.  Maybe some result object would be more
>>>> extensible.
>>>>
>>>> Actually, I think I have more comments.  This seems a bit odd to me.
>>>> Why would a storage driver in ACS implement a VM snapshot
>>>> functionality?  VM snapshot is a really a hypervisor orchestrated
>>>> operation.  So it seems like were trying to implement a poor mans VM
>>>> snapshot.  Maybe if I understood what NetApp was trying to do it would
>>>> make more sense, but its all odd.  To do a proper VM snapshot you need
>>>> to snapshot memory and disk at the exact same time.  How are we going
>>>> to do that if ACS is orchestrating the VM snapshot and delegating to
>>>> storage providers.  Its not like you are going to pause the VM.... or
>>>> are you?
>>>>
>>>> Darren
>>>>
>>>> On Mon, Oct 7, 2013 at 11:59 AM, Edison Su <Ed...@citrix.com> wrote:
>>>>> I created a design document page at https://cwiki.apache.org/confluence/display/CLOUDSTACK/Pluggable+VM+snapshot+related+operations, feel free to add items on it.
>>>>> And a new branch "pluggable_vm_snapshot" is created.
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: SuichII, Christopher [mailto:Chris.Suich@netapp.com]
>>>>>> Sent: Monday, October 07, 2013 10:02 AM
>>>>>> To: <de...@cloudstack.apache.org>
>>>>>> Subject: Re: [DISCUSS] Pluggable VM snapshot related operations?
>>>>>>
>>>>>> I'm a fan of option 2 - this gives us the most flexibility (as you stated). The
>>>>>> option is given to completely override the way VM snapshots work AND
>>>>>> storage providers are given to opportunity to work within the default VM
>>>>>> snapshot workflow.
>>>>>>
>>>>>> I believe this option should satisfy your concern, Mike. The snapshot and
>>>>>> quiesce strategy would be in charge of communicating with the hypervisor.
>>>>>> Storage providers should be able to leverage the default strategies and
>>>>>> simply perform the storage operations.
>>>>>>
>>>>>> I don't think it should be much of an issue that new method to the storage
>>>>>> driver interface may not apply to everyone. In fact, that is already the case.
>>>>>> Some methods such as un/maintain(), attachToXXX() and takeSnapshot() are
>>>>>> already not implemented by every driver - they just return false when asked
>>>>>> if they can handle the operation.
>>>>>>
>>>>>> --
>>>>>> Chris Suich
>>>>>> chris.suich@netapp.com
>>>>>> NetApp Software Engineer
>>>>>> Data Center Platforms - Cloud Solutions
>>>>>> Citrix, Cisco & Red Hat
>>>>>>
>>>>>> On Oct 5, 2013, at 12:11 AM, Mike Tutkowski <mi...@solidfire.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Well, my first thought on this is that the storage driver should not
>>>>>>> be telling the hypervisor to do anything. It should be responsible for
>>>>>>> creating/deleting volumes, snapshots, etc. on its storage system only.
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Oct 4, 2013 at 5:57 PM, Edison Su <Ed...@citrix.com> wrote:
>>>>>>>
>>>>>>>> In 4.2, we added VM snapshot for Vmware/Xenserver. The current
>>>>>>>> workflow will be like the following:
>>>>>>>> createVMSnapshot api -> VMSnapshotManagerImpl: creatVMSnapshot ->
>>>>>>>> send CreateVMSnapshotCommand to hypervisor to create vm snapshot.
>>>>>>>>
>>>>>>>> If anybody wants to change the workflow, then need to either change
>>>>>>>> VMSnapshotManagerImpl directly or subclass VMSnapshotManagerImpl.
>>>>>>>> Both are not the ideal choice, as VMSnapshotManagerImpl should be
>>>>>>>> able to handle different ways to take vm snapshot, instead of hard code.
>>>>>>>>
>>>>>>>> The requirements for the pluggable VM snapshot coming from:
>>>>>>>> Storage vendor may have their optimization, such as NetApp.
>>>>>>>> VM snapshot can be implemented in a totally different way(For
>>>>>>>> example, I could just send a command to guest VM, to tell my
>>>>>>>> application to flush disk and hold disk write, then come to hypervisor to
>>>>>> take a volume snapshot).
>>>>>>>>
>>>>>>>> If we agree on enable pluggable VM snapshot, then we can move on
>>>>>>>> discuss how to implement it.
>>>>>>>>
>>>>>>>> The possible options:
>>>>>>>> 1. coarse grained interface. Add a VMSnapshotStrategy interface,
>>>>>>>> which has the following interfaces:
>>>>>>>>  VMSnapshot takeVMSnapshot(VMSnapshot vmSnapshot);
>>>>>>>>  Boolean revertVMSnapshot(VMSnapshot vmSnapshot);
>>>>>>>>  Boolean DeleteVMSnapshot(VMSnapshot vmSnapshot);
>>>>>>>>
>>>>>>>> The work flow will be: createVMSnapshot api ->
>>>>>> VMSnapshotManagerImpl:
>>>>>>>> creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot
>>>>>>>> VMSnapshotManagerImpl will manage VM state, do the sanity check,
>>>>>>>> then will handle over to VMSnapshotStrategy.
>>>>>>>> In VMSnapshotStrategy implementation, it may just send a
>>>>>>>> Create/revert/delete VMSnapshotCommand to hypervisor host, or do
>>>>>>>> anything special operations.
>>>>>>>>
>>>>>>>> 2. fine-grained interface. Not only add a VMSnapshotStrategy
>>>>>>>> interface, but also add certain methods on the storage driver.
>>>>>>>>  The VMSnapshotStrategy interface will be the same as option 1.
>>>>>>>>  Will add the following methods on storage driver:
>>>>>>>> /* volumesBelongToVM  is the list of volumes of the VM that created
>>>>>>>> on this storage, storage vendor can either take one snapshot for this
>>>>>>>> volumes in one shot, or take snapshot for each volume separately
>>>>>>>>     The pre-condition: vm is unquiesced.
>>>>>>>>     It will return a Boolean to indicate, do need unquiesce vm or not.
>>>>>>>>     In the default storage driver, it will return false.
>>>>>>>>  */
>>>>>>>>  boolean takeVMSnapshot(List<VolumeInfo> volumesBelongToVM,
>>>>>>>> VMSnapshot vmSnapshot);
>>>>>>>>  Boolean revertVMSnapshot(List<VolumeInfo> volumesBelongToVM,
>>>>>>>> VMSnapshot vmSnapshot);
>>>>>>>> Boolean deleteVMSnapshot(List<VolumeInfo> volumesBelongToVM,
>>>>>>>> VMSnapshot vmSNapshot);
>>>>>>>>
>>>>>>>> The work flow will be: createVMSnapshot api ->
>>>>>> VMSnapshotManagerImpl:
>>>>>>>> creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot -> storage
>>>>>>>> driver:takeVMSnapshot In the implementation of VMSnapshotStrategy's
>>>>>>>> takeVMSnapshot, the pseudo code looks like:
>>>>>>>>     HypervisorHelper.quiesceVM(vm);
>>>>>>>>     val volumes = vm.getVolumes();
>>>>>>>>     val maps = new Map[driver, list[VolumeInfo]]();
>>>>>>>>     Volumes.foreach(volume => maps.put(volume.getDriver, volume ::
>>>>>>>> maps.get(volume.getdriver())))
>>>>>>>>     val needUnquiesce = true;
>>>>>>>>      maps.foreach((driver, volumes) => needUnquiesce  =
>>>>>>>> needUnquiesce && driver.takeVMSnapshot(volumes))
>>>>>>>>    if (needUnquiesce ) {
>>>>>>>>     HypervisorHelper.unquiesce(vm);
>>>>>>>>  }
>>>>>>>>
>>>>>>>> By default, the quiesceVM in HypervisorHelper will actually take vm
>>>>>>>> snapshot through hypervisor.
>>>>>>>> Does above logic makes senesce?
>>>>>>>>
>>>>>>>> The pros of option 1 is that: it's simple, no need to change storage
>>>>>>>> driver interfaces. The cons is that each storage vendor need to
>>>>>>>> implement a strategy, maybe they will do the same thing.
>>>>>>>> The pros of option 2 is that, storage driver won't need to worry
>>>>>>>> about how to quiesce/unquiesce vm. The cons is that, it will add
>>>>>>>> these methods on each storage drivers, so it assumes that this work
>>>>>>>> flow will work for everybody.
>>>>>>>>
>>>>>>>> So which option we should take? Or if you have other options, please
>>>>>>>> let's know.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> *Mike Tutkowski*
>>>>>>> *Senior CloudStack Developer, SolidFire Inc.*
>>>>>>> e: mike.tutkowski@solidfire.com
>>>>>>> o: 303.746.7302
>>>>>>> Advancing the way the world uses the
>>>>>>> cloud<http://solidfire.com/solution/overview/?video=play>
>>>>>>> *(tm)*
>>>>>
>>>
>

Re: [DISCUSS] Pluggable VM snapshot related operations?

Posted by "SuichII, Christopher" <Ch...@netapp.com>.
-- 
Chris Suich
chris.suich@netapp.com
NetApp Software Engineer
Data Center Platforms – Cloud Solutions
Citrix, Cisco & Red Hat

On Oct 8, 2013, at 2:24 PM, Darren Shepherd <da...@gmail.com> wrote:

> So in the implementation, when we say "quiesce" is that actually being
> implemented as a VM snapshot (memory and disk).  And then when you say
> "unquiesce" you are talking about deleting the VM snapshot?

If the VM snapshot is not going to the hypervisor, then yes, it will actually be a hypervisor snapshot. Just to be clear, the unquiesce is not quite a delete - it is a collapse of the VM snapshot and the active VM back into one file.

> 
> In NetApp, what are you snapshotting?  The whole netapp volume (I
> don't know the correct term), a file on NFS, an iscsi volume?  I don't
> know a whole heck of a lot about the netapp snapshot capabilities.

Essentially we are using internal APIs to create file level backups - don't worry too much about the terminology.

> 
> I know storage solutions can snapshot better and faster than
> hypervisors can with COW files.  I've personally just been always
> perplexed on whats the best way to implement it.  For storage
> solutions that are block based, its really easy to have the storage
> doing the snapshot.  For shared file systems, like NFS, its seems way
> more complicated as you don't want to snapshot the entire filesystem
> in order to snapshot one file.

With filesystems like NFS, things are certainly more complicated, but that is taken care of by our controller's operating system, Data ONTAP, and we simply use APIs to communicate with it.

> 
> Darren
> 
> On Tue, Oct 8, 2013 at 11:10 AM, SuichII, Christopher
> <Ch...@netapp.com> wrote:
>> I can comment on the second half.
>> 
>> Through storage operations, storage providers can create backups much faster than hypervisors and over time, their snapshots are more efficient than the snapshot chains that hypervisors create. It is true that a VM snapshot taken at the storage level is slightly different as it would be psuedo-quiesced, not have it's memory snapshotted. This is accomplished through hypervisor snapshots:
>> 
>> 1) VM snapshot request (lets say VM 'A'
>> 2) Create hypervisor snapshot (optional)
>>  -VM 'A' is snapshotted, creating active VM 'A*'
>>  -All disk traffic now goes to VM 'A*' and A is a snapshot of 'A*'
>> 3) Storage driver(s) take snapshots of each volume
>> 4) Undo hypervisor snapshot (optional)
>>  -VM snapshot 'A' is rolled back into VM 'A*' so the hypervisor snapshot no longer exists
>> 
>> Now, a couple notes:
>> -The reason this is optional is that not all users necessarily care about the memory or disk consistency of their VMs and would prefer faster snapshots to consistency.
>> -Preemptively, yes, we are actually taking hypervisor snapshots which means there isn't actually a performance of taking storage snapshots when quiescing the VM. However, the performance gain will come both during restoring the VM and during normal operations as described above.
>> 
>> Although you can think of it as a poor man's VM snapshot, I would think of it more as a consistent multi-volume snapshot. Again, the difference being that this snapshot was not truly quiesced like a hypervisor snapshot would be.
>> 
>> --
>> Chris Suich
>> chris.suich@netapp.com
>> NetApp Software Engineer
>> Data Center Platforms – Cloud Solutions
>> Citrix, Cisco & Red Hat
>> 
>> On Oct 8, 2013, at 1:47 PM, Darren Shepherd <da...@gmail.com> wrote:
>> 
>>> My only comment is that having the return type as boolean and using to
>>> that indicate quiesce behaviour seems obscure and will probably lead
>>> to a problem later.  Your basically saying the result of the
>>> takeVMSnapshot will only ever need to communicate back whether
>>> unquiesce needs to happen.  Maybe some result object would be more
>>> extensible.
>>> 
>>> Actually, I think I have more comments.  This seems a bit odd to me.
>>> Why would a storage driver in ACS implement a VM snapshot
>>> functionality?  VM snapshot is a really a hypervisor orchestrated
>>> operation.  So it seems like were trying to implement a poor mans VM
>>> snapshot.  Maybe if I understood what NetApp was trying to do it would
>>> make more sense, but its all odd.  To do a proper VM snapshot you need
>>> to snapshot memory and disk at the exact same time.  How are we going
>>> to do that if ACS is orchestrating the VM snapshot and delegating to
>>> storage providers.  Its not like you are going to pause the VM.... or
>>> are you?
>>> 
>>> Darren
>>> 
>>> On Mon, Oct 7, 2013 at 11:59 AM, Edison Su <Ed...@citrix.com> wrote:
>>>> I created a design document page at https://cwiki.apache.org/confluence/display/CLOUDSTACK/Pluggable+VM+snapshot+related+operations, feel free to add items on it.
>>>> And a new branch "pluggable_vm_snapshot" is created.
>>>> 
>>>>> -----Original Message-----
>>>>> From: SuichII, Christopher [mailto:Chris.Suich@netapp.com]
>>>>> Sent: Monday, October 07, 2013 10:02 AM
>>>>> To: <de...@cloudstack.apache.org>
>>>>> Subject: Re: [DISCUSS] Pluggable VM snapshot related operations?
>>>>> 
>>>>> I'm a fan of option 2 - this gives us the most flexibility (as you stated). The
>>>>> option is given to completely override the way VM snapshots work AND
>>>>> storage providers are given to opportunity to work within the default VM
>>>>> snapshot workflow.
>>>>> 
>>>>> I believe this option should satisfy your concern, Mike. The snapshot and
>>>>> quiesce strategy would be in charge of communicating with the hypervisor.
>>>>> Storage providers should be able to leverage the default strategies and
>>>>> simply perform the storage operations.
>>>>> 
>>>>> I don't think it should be much of an issue that new method to the storage
>>>>> driver interface may not apply to everyone. In fact, that is already the case.
>>>>> Some methods such as un/maintain(), attachToXXX() and takeSnapshot() are
>>>>> already not implemented by every driver - they just return false when asked
>>>>> if they can handle the operation.
>>>>> 
>>>>> --
>>>>> Chris Suich
>>>>> chris.suich@netapp.com
>>>>> NetApp Software Engineer
>>>>> Data Center Platforms - Cloud Solutions
>>>>> Citrix, Cisco & Red Hat
>>>>> 
>>>>> On Oct 5, 2013, at 12:11 AM, Mike Tutkowski <mi...@solidfire.com>
>>>>> wrote:
>>>>> 
>>>>>> Well, my first thought on this is that the storage driver should not
>>>>>> be telling the hypervisor to do anything. It should be responsible for
>>>>>> creating/deleting volumes, snapshots, etc. on its storage system only.
>>>>>> 
>>>>>> 
>>>>>> On Fri, Oct 4, 2013 at 5:57 PM, Edison Su <Ed...@citrix.com> wrote:
>>>>>> 
>>>>>>> In 4.2, we added VM snapshot for Vmware/Xenserver. The current
>>>>>>> workflow will be like the following:
>>>>>>> createVMSnapshot api -> VMSnapshotManagerImpl: creatVMSnapshot ->
>>>>>>> send CreateVMSnapshotCommand to hypervisor to create vm snapshot.
>>>>>>> 
>>>>>>> If anybody wants to change the workflow, then need to either change
>>>>>>> VMSnapshotManagerImpl directly or subclass VMSnapshotManagerImpl.
>>>>>>> Both are not the ideal choice, as VMSnapshotManagerImpl should be
>>>>>>> able to handle different ways to take vm snapshot, instead of hard code.
>>>>>>> 
>>>>>>> The requirements for the pluggable VM snapshot coming from:
>>>>>>> Storage vendor may have their optimization, such as NetApp.
>>>>>>> VM snapshot can be implemented in a totally different way(For
>>>>>>> example, I could just send a command to guest VM, to tell my
>>>>>>> application to flush disk and hold disk write, then come to hypervisor to
>>>>> take a volume snapshot).
>>>>>>> 
>>>>>>> If we agree on enable pluggable VM snapshot, then we can move on
>>>>>>> discuss how to implement it.
>>>>>>> 
>>>>>>> The possible options:
>>>>>>> 1. coarse grained interface. Add a VMSnapshotStrategy interface,
>>>>>>> which has the following interfaces:
>>>>>>>  VMSnapshot takeVMSnapshot(VMSnapshot vmSnapshot);
>>>>>>>  Boolean revertVMSnapshot(VMSnapshot vmSnapshot);
>>>>>>>  Boolean DeleteVMSnapshot(VMSnapshot vmSnapshot);
>>>>>>> 
>>>>>>> The work flow will be: createVMSnapshot api ->
>>>>> VMSnapshotManagerImpl:
>>>>>>> creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot
>>>>>>> VMSnapshotManagerImpl will manage VM state, do the sanity check,
>>>>>>> then will handle over to VMSnapshotStrategy.
>>>>>>> In VMSnapshotStrategy implementation, it may just send a
>>>>>>> Create/revert/delete VMSnapshotCommand to hypervisor host, or do
>>>>>>> anything special operations.
>>>>>>> 
>>>>>>> 2. fine-grained interface. Not only add a VMSnapshotStrategy
>>>>>>> interface, but also add certain methods on the storage driver.
>>>>>>>  The VMSnapshotStrategy interface will be the same as option 1.
>>>>>>>  Will add the following methods on storage driver:
>>>>>>> /* volumesBelongToVM  is the list of volumes of the VM that created
>>>>>>> on this storage, storage vendor can either take one snapshot for this
>>>>>>> volumes in one shot, or take snapshot for each volume separately
>>>>>>>     The pre-condition: vm is unquiesced.
>>>>>>>     It will return a Boolean to indicate, do need unquiesce vm or not.
>>>>>>>     In the default storage driver, it will return false.
>>>>>>>  */
>>>>>>>  boolean takeVMSnapshot(List<VolumeInfo> volumesBelongToVM,
>>>>>>> VMSnapshot vmSnapshot);
>>>>>>>  Boolean revertVMSnapshot(List<VolumeInfo> volumesBelongToVM,
>>>>>>> VMSnapshot vmSnapshot);
>>>>>>> Boolean deleteVMSnapshot(List<VolumeInfo> volumesBelongToVM,
>>>>>>> VMSnapshot vmSNapshot);
>>>>>>> 
>>>>>>> The work flow will be: createVMSnapshot api ->
>>>>> VMSnapshotManagerImpl:
>>>>>>> creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot -> storage
>>>>>>> driver:takeVMSnapshot In the implementation of VMSnapshotStrategy's
>>>>>>> takeVMSnapshot, the pseudo code looks like:
>>>>>>>     HypervisorHelper.quiesceVM(vm);
>>>>>>>     val volumes = vm.getVolumes();
>>>>>>>     val maps = new Map[driver, list[VolumeInfo]]();
>>>>>>>     Volumes.foreach(volume => maps.put(volume.getDriver, volume ::
>>>>>>> maps.get(volume.getdriver())))
>>>>>>>     val needUnquiesce = true;
>>>>>>>      maps.foreach((driver, volumes) => needUnquiesce  =
>>>>>>> needUnquiesce && driver.takeVMSnapshot(volumes))
>>>>>>>    if (needUnquiesce ) {
>>>>>>>     HypervisorHelper.unquiesce(vm);
>>>>>>>  }
>>>>>>> 
>>>>>>> By default, the quiesceVM in HypervisorHelper will actually take vm
>>>>>>> snapshot through hypervisor.
>>>>>>> Does above logic makes senesce?
>>>>>>> 
>>>>>>> The pros of option 1 is that: it's simple, no need to change storage
>>>>>>> driver interfaces. The cons is that each storage vendor need to
>>>>>>> implement a strategy, maybe they will do the same thing.
>>>>>>> The pros of option 2 is that, storage driver won't need to worry
>>>>>>> about how to quiesce/unquiesce vm. The cons is that, it will add
>>>>>>> these methods on each storage drivers, so it assumes that this work
>>>>>>> flow will work for everybody.
>>>>>>> 
>>>>>>> So which option we should take? Or if you have other options, please
>>>>>>> let's know.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> *Mike Tutkowski*
>>>>>> *Senior CloudStack Developer, SolidFire Inc.*
>>>>>> e: mike.tutkowski@solidfire.com
>>>>>> o: 303.746.7302
>>>>>> Advancing the way the world uses the
>>>>>> cloud<http://solidfire.com/solution/overview/?video=play>
>>>>>> *(tm)*
>>>> 
>> 


Re: [DISCUSS] Pluggable VM snapshot related operations?

Posted by Darren Shepherd <da...@gmail.com>.
So in the implementation, when we say "quiesce" is that actually being
implemented as a VM snapshot (memory and disk).  And then when you say
"unquiesce" you are talking about deleting the VM snapshot?

In NetApp, what are you snapshotting?  The whole netapp volume (I
don't know the correct term), a file on NFS, an iscsi volume?  I don't
know a whole heck of a lot about the netapp snapshot capabilities.

I know storage solutions can snapshot better and faster than
hypervisors can with COW files.  I've personally just been always
perplexed on whats the best way to implement it.  For storage
solutions that are block based, its really easy to have the storage
doing the snapshot.  For shared file systems, like NFS, its seems way
more complicated as you don't want to snapshot the entire filesystem
in order to snapshot one file.

Darren

On Tue, Oct 8, 2013 at 11:10 AM, SuichII, Christopher
<Ch...@netapp.com> wrote:
> I can comment on the second half.
>
> Through storage operations, storage providers can create backups much faster than hypervisors and over time, their snapshots are more efficient than the snapshot chains that hypervisors create. It is true that a VM snapshot taken at the storage level is slightly different as it would be psuedo-quiesced, not have it's memory snapshotted. This is accomplished through hypervisor snapshots:
>
> 1) VM snapshot request (lets say VM 'A'
> 2) Create hypervisor snapshot (optional)
>   -VM 'A' is snapshotted, creating active VM 'A*'
>   -All disk traffic now goes to VM 'A*' and A is a snapshot of 'A*'
> 3) Storage driver(s) take snapshots of each volume
> 4) Undo hypervisor snapshot (optional)
>   -VM snapshot 'A' is rolled back into VM 'A*' so the hypervisor snapshot no longer exists
>
> Now, a couple notes:
> -The reason this is optional is that not all users necessarily care about the memory or disk consistency of their VMs and would prefer faster snapshots to consistency.
> -Preemptively, yes, we are actually taking hypervisor snapshots which means there isn't actually a performance of taking storage snapshots when quiescing the VM. However, the performance gain will come both during restoring the VM and during normal operations as described above.
>
> Although you can think of it as a poor man's VM snapshot, I would think of it more as a consistent multi-volume snapshot. Again, the difference being that this snapshot was not truly quiesced like a hypervisor snapshot would be.
>
> --
> Chris Suich
> chris.suich@netapp.com
> NetApp Software Engineer
> Data Center Platforms – Cloud Solutions
> Citrix, Cisco & Red Hat
>
> On Oct 8, 2013, at 1:47 PM, Darren Shepherd <da...@gmail.com> wrote:
>
>> My only comment is that having the return type as boolean and using to
>> that indicate quiesce behaviour seems obscure and will probably lead
>> to a problem later.  Your basically saying the result of the
>> takeVMSnapshot will only ever need to communicate back whether
>> unquiesce needs to happen.  Maybe some result object would be more
>> extensible.
>>
>> Actually, I think I have more comments.  This seems a bit odd to me.
>> Why would a storage driver in ACS implement a VM snapshot
>> functionality?  VM snapshot is a really a hypervisor orchestrated
>> operation.  So it seems like were trying to implement a poor mans VM
>> snapshot.  Maybe if I understood what NetApp was trying to do it would
>> make more sense, but its all odd.  To do a proper VM snapshot you need
>> to snapshot memory and disk at the exact same time.  How are we going
>> to do that if ACS is orchestrating the VM snapshot and delegating to
>> storage providers.  Its not like you are going to pause the VM.... or
>> are you?
>>
>> Darren
>>
>> On Mon, Oct 7, 2013 at 11:59 AM, Edison Su <Ed...@citrix.com> wrote:
>>> I created a design document page at https://cwiki.apache.org/confluence/display/CLOUDSTACK/Pluggable+VM+snapshot+related+operations, feel free to add items on it.
>>> And a new branch "pluggable_vm_snapshot" is created.
>>>
>>>> -----Original Message-----
>>>> From: SuichII, Christopher [mailto:Chris.Suich@netapp.com]
>>>> Sent: Monday, October 07, 2013 10:02 AM
>>>> To: <de...@cloudstack.apache.org>
>>>> Subject: Re: [DISCUSS] Pluggable VM snapshot related operations?
>>>>
>>>> I'm a fan of option 2 - this gives us the most flexibility (as you stated). The
>>>> option is given to completely override the way VM snapshots work AND
>>>> storage providers are given to opportunity to work within the default VM
>>>> snapshot workflow.
>>>>
>>>> I believe this option should satisfy your concern, Mike. The snapshot and
>>>> quiesce strategy would be in charge of communicating with the hypervisor.
>>>> Storage providers should be able to leverage the default strategies and
>>>> simply perform the storage operations.
>>>>
>>>> I don't think it should be much of an issue that new method to the storage
>>>> driver interface may not apply to everyone. In fact, that is already the case.
>>>> Some methods such as un/maintain(), attachToXXX() and takeSnapshot() are
>>>> already not implemented by every driver - they just return false when asked
>>>> if they can handle the operation.
>>>>
>>>> --
>>>> Chris Suich
>>>> chris.suich@netapp.com
>>>> NetApp Software Engineer
>>>> Data Center Platforms - Cloud Solutions
>>>> Citrix, Cisco & Red Hat
>>>>
>>>> On Oct 5, 2013, at 12:11 AM, Mike Tutkowski <mi...@solidfire.com>
>>>> wrote:
>>>>
>>>>> Well, my first thought on this is that the storage driver should not
>>>>> be telling the hypervisor to do anything. It should be responsible for
>>>>> creating/deleting volumes, snapshots, etc. on its storage system only.
>>>>>
>>>>>
>>>>> On Fri, Oct 4, 2013 at 5:57 PM, Edison Su <Ed...@citrix.com> wrote:
>>>>>
>>>>>> In 4.2, we added VM snapshot for Vmware/Xenserver. The current
>>>>>> workflow will be like the following:
>>>>>> createVMSnapshot api -> VMSnapshotManagerImpl: creatVMSnapshot ->
>>>>>> send CreateVMSnapshotCommand to hypervisor to create vm snapshot.
>>>>>>
>>>>>> If anybody wants to change the workflow, then need to either change
>>>>>> VMSnapshotManagerImpl directly or subclass VMSnapshotManagerImpl.
>>>>>> Both are not the ideal choice, as VMSnapshotManagerImpl should be
>>>>>> able to handle different ways to take vm snapshot, instead of hard code.
>>>>>>
>>>>>> The requirements for the pluggable VM snapshot coming from:
>>>>>> Storage vendor may have their optimization, such as NetApp.
>>>>>> VM snapshot can be implemented in a totally different way(For
>>>>>> example, I could just send a command to guest VM, to tell my
>>>>>> application to flush disk and hold disk write, then come to hypervisor to
>>>> take a volume snapshot).
>>>>>>
>>>>>> If we agree on enable pluggable VM snapshot, then we can move on
>>>>>> discuss how to implement it.
>>>>>>
>>>>>> The possible options:
>>>>>> 1. coarse grained interface. Add a VMSnapshotStrategy interface,
>>>>>> which has the following interfaces:
>>>>>>   VMSnapshot takeVMSnapshot(VMSnapshot vmSnapshot);
>>>>>>   Boolean revertVMSnapshot(VMSnapshot vmSnapshot);
>>>>>>   Boolean DeleteVMSnapshot(VMSnapshot vmSnapshot);
>>>>>>
>>>>>>  The work flow will be: createVMSnapshot api ->
>>>> VMSnapshotManagerImpl:
>>>>>> creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot
>>>>>>  VMSnapshotManagerImpl will manage VM state, do the sanity check,
>>>>>> then will handle over to VMSnapshotStrategy.
>>>>>>  In VMSnapshotStrategy implementation, it may just send a
>>>>>> Create/revert/delete VMSnapshotCommand to hypervisor host, or do
>>>>>> anything special operations.
>>>>>>
>>>>>> 2. fine-grained interface. Not only add a VMSnapshotStrategy
>>>>>> interface, but also add certain methods on the storage driver.
>>>>>>   The VMSnapshotStrategy interface will be the same as option 1.
>>>>>>   Will add the following methods on storage driver:
>>>>>>  /* volumesBelongToVM  is the list of volumes of the VM that created
>>>>>> on this storage, storage vendor can either take one snapshot for this
>>>>>> volumes in one shot, or take snapshot for each volume separately
>>>>>>      The pre-condition: vm is unquiesced.
>>>>>>      It will return a Boolean to indicate, do need unquiesce vm or not.
>>>>>>      In the default storage driver, it will return false.
>>>>>>   */
>>>>>>   boolean takeVMSnapshot(List<VolumeInfo> volumesBelongToVM,
>>>>>> VMSnapshot vmSnapshot);
>>>>>>   Boolean revertVMSnapshot(List<VolumeInfo> volumesBelongToVM,
>>>>>> VMSnapshot vmSnapshot);
>>>>>>  Boolean deleteVMSnapshot(List<VolumeInfo> volumesBelongToVM,
>>>>>> VMSnapshot vmSNapshot);
>>>>>>
>>>>>> The work flow will be: createVMSnapshot api ->
>>>> VMSnapshotManagerImpl:
>>>>>> creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot -> storage
>>>>>> driver:takeVMSnapshot In the implementation of VMSnapshotStrategy's
>>>>>> takeVMSnapshot, the pseudo code looks like:
>>>>>>      HypervisorHelper.quiesceVM(vm);
>>>>>>      val volumes = vm.getVolumes();
>>>>>>      val maps = new Map[driver, list[VolumeInfo]]();
>>>>>>      Volumes.foreach(volume => maps.put(volume.getDriver, volume ::
>>>>>> maps.get(volume.getdriver())))
>>>>>>      val needUnquiesce = true;
>>>>>>       maps.foreach((driver, volumes) => needUnquiesce  =
>>>>>> needUnquiesce && driver.takeVMSnapshot(volumes))
>>>>>>     if (needUnquiesce ) {
>>>>>>      HypervisorHelper.unquiesce(vm);
>>>>>>   }
>>>>>>
>>>>>> By default, the quiesceVM in HypervisorHelper will actually take vm
>>>>>> snapshot through hypervisor.
>>>>>> Does above logic makes senesce?
>>>>>>
>>>>>> The pros of option 1 is that: it's simple, no need to change storage
>>>>>> driver interfaces. The cons is that each storage vendor need to
>>>>>> implement a strategy, maybe they will do the same thing.
>>>>>> The pros of option 2 is that, storage driver won't need to worry
>>>>>> about how to quiesce/unquiesce vm. The cons is that, it will add
>>>>>> these methods on each storage drivers, so it assumes that this work
>>>>>> flow will work for everybody.
>>>>>>
>>>>>> So which option we should take? Or if you have other options, please
>>>>>> let's know.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> *Mike Tutkowski*
>>>>> *Senior CloudStack Developer, SolidFire Inc.*
>>>>> e: mike.tutkowski@solidfire.com
>>>>> o: 303.746.7302
>>>>> Advancing the way the world uses the
>>>>> cloud<http://solidfire.com/solution/overview/?video=play>
>>>>> *(tm)*
>>>
>

Re: [DISCUSS] Pluggable VM snapshot related operations?

Posted by "SuichII, Christopher" <Ch...@netapp.com>.
I can comment on the second half.

Through storage operations, storage providers can create backups much faster than hypervisors and over time, their snapshots are more efficient than the snapshot chains that hypervisors create. It is true that a VM snapshot taken at the storage level is slightly different as it would be psuedo-quiesced, not have it's memory snapshotted. This is accomplished through hypervisor snapshots:

1) VM snapshot request (lets say VM 'A'
2) Create hypervisor snapshot (optional)
  -VM 'A' is snapshotted, creating active VM 'A*'
  -All disk traffic now goes to VM 'A*' and A is a snapshot of 'A*'
3) Storage driver(s) take snapshots of each volume
4) Undo hypervisor snapshot (optional)
  -VM snapshot 'A' is rolled back into VM 'A*' so the hypervisor snapshot no longer exists

Now, a couple notes:
-The reason this is optional is that not all users necessarily care about the memory or disk consistency of their VMs and would prefer faster snapshots to consistency.
-Preemptively, yes, we are actually taking hypervisor snapshots which means there isn't actually a performance of taking storage snapshots when quiescing the VM. However, the performance gain will come both during restoring the VM and during normal operations as described above.

Although you can think of it as a poor man's VM snapshot, I would think of it more as a consistent multi-volume snapshot. Again, the difference being that this snapshot was not truly quiesced like a hypervisor snapshot would be.

-- 
Chris Suich
chris.suich@netapp.com
NetApp Software Engineer
Data Center Platforms – Cloud Solutions
Citrix, Cisco & Red Hat

On Oct 8, 2013, at 1:47 PM, Darren Shepherd <da...@gmail.com> wrote:

> My only comment is that having the return type as boolean and using to
> that indicate quiesce behaviour seems obscure and will probably lead
> to a problem later.  Your basically saying the result of the
> takeVMSnapshot will only ever need to communicate back whether
> unquiesce needs to happen.  Maybe some result object would be more
> extensible.
> 
> Actually, I think I have more comments.  This seems a bit odd to me.
> Why would a storage driver in ACS implement a VM snapshot
> functionality?  VM snapshot is a really a hypervisor orchestrated
> operation.  So it seems like were trying to implement a poor mans VM
> snapshot.  Maybe if I understood what NetApp was trying to do it would
> make more sense, but its all odd.  To do a proper VM snapshot you need
> to snapshot memory and disk at the exact same time.  How are we going
> to do that if ACS is orchestrating the VM snapshot and delegating to
> storage providers.  Its not like you are going to pause the VM.... or
> are you?
> 
> Darren
> 
> On Mon, Oct 7, 2013 at 11:59 AM, Edison Su <Ed...@citrix.com> wrote:
>> I created a design document page at https://cwiki.apache.org/confluence/display/CLOUDSTACK/Pluggable+VM+snapshot+related+operations, feel free to add items on it.
>> And a new branch "pluggable_vm_snapshot" is created.
>> 
>>> -----Original Message-----
>>> From: SuichII, Christopher [mailto:Chris.Suich@netapp.com]
>>> Sent: Monday, October 07, 2013 10:02 AM
>>> To: <de...@cloudstack.apache.org>
>>> Subject: Re: [DISCUSS] Pluggable VM snapshot related operations?
>>> 
>>> I'm a fan of option 2 - this gives us the most flexibility (as you stated). The
>>> option is given to completely override the way VM snapshots work AND
>>> storage providers are given to opportunity to work within the default VM
>>> snapshot workflow.
>>> 
>>> I believe this option should satisfy your concern, Mike. The snapshot and
>>> quiesce strategy would be in charge of communicating with the hypervisor.
>>> Storage providers should be able to leverage the default strategies and
>>> simply perform the storage operations.
>>> 
>>> I don't think it should be much of an issue that new method to the storage
>>> driver interface may not apply to everyone. In fact, that is already the case.
>>> Some methods such as un/maintain(), attachToXXX() and takeSnapshot() are
>>> already not implemented by every driver - they just return false when asked
>>> if they can handle the operation.
>>> 
>>> --
>>> Chris Suich
>>> chris.suich@netapp.com
>>> NetApp Software Engineer
>>> Data Center Platforms - Cloud Solutions
>>> Citrix, Cisco & Red Hat
>>> 
>>> On Oct 5, 2013, at 12:11 AM, Mike Tutkowski <mi...@solidfire.com>
>>> wrote:
>>> 
>>>> Well, my first thought on this is that the storage driver should not
>>>> be telling the hypervisor to do anything. It should be responsible for
>>>> creating/deleting volumes, snapshots, etc. on its storage system only.
>>>> 
>>>> 
>>>> On Fri, Oct 4, 2013 at 5:57 PM, Edison Su <Ed...@citrix.com> wrote:
>>>> 
>>>>> In 4.2, we added VM snapshot for Vmware/Xenserver. The current
>>>>> workflow will be like the following:
>>>>> createVMSnapshot api -> VMSnapshotManagerImpl: creatVMSnapshot ->
>>>>> send CreateVMSnapshotCommand to hypervisor to create vm snapshot.
>>>>> 
>>>>> If anybody wants to change the workflow, then need to either change
>>>>> VMSnapshotManagerImpl directly or subclass VMSnapshotManagerImpl.
>>>>> Both are not the ideal choice, as VMSnapshotManagerImpl should be
>>>>> able to handle different ways to take vm snapshot, instead of hard code.
>>>>> 
>>>>> The requirements for the pluggable VM snapshot coming from:
>>>>> Storage vendor may have their optimization, such as NetApp.
>>>>> VM snapshot can be implemented in a totally different way(For
>>>>> example, I could just send a command to guest VM, to tell my
>>>>> application to flush disk and hold disk write, then come to hypervisor to
>>> take a volume snapshot).
>>>>> 
>>>>> If we agree on enable pluggable VM snapshot, then we can move on
>>>>> discuss how to implement it.
>>>>> 
>>>>> The possible options:
>>>>> 1. coarse grained interface. Add a VMSnapshotStrategy interface,
>>>>> which has the following interfaces:
>>>>>   VMSnapshot takeVMSnapshot(VMSnapshot vmSnapshot);
>>>>>   Boolean revertVMSnapshot(VMSnapshot vmSnapshot);
>>>>>   Boolean DeleteVMSnapshot(VMSnapshot vmSnapshot);
>>>>> 
>>>>>  The work flow will be: createVMSnapshot api ->
>>> VMSnapshotManagerImpl:
>>>>> creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot
>>>>>  VMSnapshotManagerImpl will manage VM state, do the sanity check,
>>>>> then will handle over to VMSnapshotStrategy.
>>>>>  In VMSnapshotStrategy implementation, it may just send a
>>>>> Create/revert/delete VMSnapshotCommand to hypervisor host, or do
>>>>> anything special operations.
>>>>> 
>>>>> 2. fine-grained interface. Not only add a VMSnapshotStrategy
>>>>> interface, but also add certain methods on the storage driver.
>>>>>   The VMSnapshotStrategy interface will be the same as option 1.
>>>>>   Will add the following methods on storage driver:
>>>>>  /* volumesBelongToVM  is the list of volumes of the VM that created
>>>>> on this storage, storage vendor can either take one snapshot for this
>>>>> volumes in one shot, or take snapshot for each volume separately
>>>>>      The pre-condition: vm is unquiesced.
>>>>>      It will return a Boolean to indicate, do need unquiesce vm or not.
>>>>>      In the default storage driver, it will return false.
>>>>>   */
>>>>>   boolean takeVMSnapshot(List<VolumeInfo> volumesBelongToVM,
>>>>> VMSnapshot vmSnapshot);
>>>>>   Boolean revertVMSnapshot(List<VolumeInfo> volumesBelongToVM,
>>>>> VMSnapshot vmSnapshot);
>>>>>  Boolean deleteVMSnapshot(List<VolumeInfo> volumesBelongToVM,
>>>>> VMSnapshot vmSNapshot);
>>>>> 
>>>>> The work flow will be: createVMSnapshot api ->
>>> VMSnapshotManagerImpl:
>>>>> creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot -> storage
>>>>> driver:takeVMSnapshot In the implementation of VMSnapshotStrategy's
>>>>> takeVMSnapshot, the pseudo code looks like:
>>>>>      HypervisorHelper.quiesceVM(vm);
>>>>>      val volumes = vm.getVolumes();
>>>>>      val maps = new Map[driver, list[VolumeInfo]]();
>>>>>      Volumes.foreach(volume => maps.put(volume.getDriver, volume ::
>>>>> maps.get(volume.getdriver())))
>>>>>      val needUnquiesce = true;
>>>>>       maps.foreach((driver, volumes) => needUnquiesce  =
>>>>> needUnquiesce && driver.takeVMSnapshot(volumes))
>>>>>     if (needUnquiesce ) {
>>>>>      HypervisorHelper.unquiesce(vm);
>>>>>   }
>>>>> 
>>>>> By default, the quiesceVM in HypervisorHelper will actually take vm
>>>>> snapshot through hypervisor.
>>>>> Does above logic makes senesce?
>>>>> 
>>>>> The pros of option 1 is that: it's simple, no need to change storage
>>>>> driver interfaces. The cons is that each storage vendor need to
>>>>> implement a strategy, maybe they will do the same thing.
>>>>> The pros of option 2 is that, storage driver won't need to worry
>>>>> about how to quiesce/unquiesce vm. The cons is that, it will add
>>>>> these methods on each storage drivers, so it assumes that this work
>>>>> flow will work for everybody.
>>>>> 
>>>>> So which option we should take? Or if you have other options, please
>>>>> let's know.
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> --
>>>> *Mike Tutkowski*
>>>> *Senior CloudStack Developer, SolidFire Inc.*
>>>> e: mike.tutkowski@solidfire.com
>>>> o: 303.746.7302
>>>> Advancing the way the world uses the
>>>> cloud<http://solidfire.com/solution/overview/?video=play>
>>>> *(tm)*
>> 


Re: [DISCUSS] Pluggable VM snapshot related operations?

Posted by Darren Shepherd <da...@gmail.com>.
My only comment is that having the return type as boolean and using to
that indicate quiesce behaviour seems obscure and will probably lead
to a problem later.  Your basically saying the result of the
takeVMSnapshot will only ever need to communicate back whether
unquiesce needs to happen.  Maybe some result object would be more
extensible.

Actually, I think I have more comments.  This seems a bit odd to me.
Why would a storage driver in ACS implement a VM snapshot
functionality?  VM snapshot is a really a hypervisor orchestrated
operation.  So it seems like were trying to implement a poor mans VM
snapshot.  Maybe if I understood what NetApp was trying to do it would
make more sense, but its all odd.  To do a proper VM snapshot you need
to snapshot memory and disk at the exact same time.  How are we going
to do that if ACS is orchestrating the VM snapshot and delegating to
storage providers.  Its not like you are going to pause the VM.... or
are you?

Darren

On Mon, Oct 7, 2013 at 11:59 AM, Edison Su <Ed...@citrix.com> wrote:
> I created a design document page at https://cwiki.apache.org/confluence/display/CLOUDSTACK/Pluggable+VM+snapshot+related+operations, feel free to add items on it.
> And a new branch "pluggable_vm_snapshot" is created.
>
>> -----Original Message-----
>> From: SuichII, Christopher [mailto:Chris.Suich@netapp.com]
>> Sent: Monday, October 07, 2013 10:02 AM
>> To: <de...@cloudstack.apache.org>
>> Subject: Re: [DISCUSS] Pluggable VM snapshot related operations?
>>
>> I'm a fan of option 2 - this gives us the most flexibility (as you stated). The
>> option is given to completely override the way VM snapshots work AND
>> storage providers are given to opportunity to work within the default VM
>> snapshot workflow.
>>
>> I believe this option should satisfy your concern, Mike. The snapshot and
>> quiesce strategy would be in charge of communicating with the hypervisor.
>> Storage providers should be able to leverage the default strategies and
>> simply perform the storage operations.
>>
>> I don't think it should be much of an issue that new method to the storage
>> driver interface may not apply to everyone. In fact, that is already the case.
>> Some methods such as un/maintain(), attachToXXX() and takeSnapshot() are
>> already not implemented by every driver - they just return false when asked
>> if they can handle the operation.
>>
>> --
>> Chris Suich
>> chris.suich@netapp.com
>> NetApp Software Engineer
>> Data Center Platforms - Cloud Solutions
>> Citrix, Cisco & Red Hat
>>
>> On Oct 5, 2013, at 12:11 AM, Mike Tutkowski <mi...@solidfire.com>
>> wrote:
>>
>> > Well, my first thought on this is that the storage driver should not
>> > be telling the hypervisor to do anything. It should be responsible for
>> > creating/deleting volumes, snapshots, etc. on its storage system only.
>> >
>> >
>> > On Fri, Oct 4, 2013 at 5:57 PM, Edison Su <Ed...@citrix.com> wrote:
>> >
>> >> In 4.2, we added VM snapshot for Vmware/Xenserver. The current
>> >> workflow will be like the following:
>> >> createVMSnapshot api -> VMSnapshotManagerImpl: creatVMSnapshot ->
>> >> send CreateVMSnapshotCommand to hypervisor to create vm snapshot.
>> >>
>> >> If anybody wants to change the workflow, then need to either change
>> >> VMSnapshotManagerImpl directly or subclass VMSnapshotManagerImpl.
>> >> Both are not the ideal choice, as VMSnapshotManagerImpl should be
>> >> able to handle different ways to take vm snapshot, instead of hard code.
>> >>
>> >> The requirements for the pluggable VM snapshot coming from:
>> >> Storage vendor may have their optimization, such as NetApp.
>> >> VM snapshot can be implemented in a totally different way(For
>> >> example, I could just send a command to guest VM, to tell my
>> >> application to flush disk and hold disk write, then come to hypervisor to
>> take a volume snapshot).
>> >>
>> >> If we agree on enable pluggable VM snapshot, then we can move on
>> >> discuss how to implement it.
>> >>
>> >> The possible options:
>> >> 1. coarse grained interface. Add a VMSnapshotStrategy interface,
>> >> which has the following interfaces:
>> >>    VMSnapshot takeVMSnapshot(VMSnapshot vmSnapshot);
>> >>    Boolean revertVMSnapshot(VMSnapshot vmSnapshot);
>> >>    Boolean DeleteVMSnapshot(VMSnapshot vmSnapshot);
>> >>
>> >>   The work flow will be: createVMSnapshot api ->
>> VMSnapshotManagerImpl:
>> >> creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot
>> >>   VMSnapshotManagerImpl will manage VM state, do the sanity check,
>> >> then will handle over to VMSnapshotStrategy.
>> >>   In VMSnapshotStrategy implementation, it may just send a
>> >> Create/revert/delete VMSnapshotCommand to hypervisor host, or do
>> >> anything special operations.
>> >>
>> >> 2. fine-grained interface. Not only add a VMSnapshotStrategy
>> >> interface, but also add certain methods on the storage driver.
>> >>    The VMSnapshotStrategy interface will be the same as option 1.
>> >>    Will add the following methods on storage driver:
>> >>   /* volumesBelongToVM  is the list of volumes of the VM that created
>> >> on this storage, storage vendor can either take one snapshot for this
>> >> volumes in one shot, or take snapshot for each volume separately
>> >>       The pre-condition: vm is unquiesced.
>> >>       It will return a Boolean to indicate, do need unquiesce vm or not.
>> >>       In the default storage driver, it will return false.
>> >>    */
>> >>    boolean takeVMSnapshot(List<VolumeInfo> volumesBelongToVM,
>> >> VMSnapshot vmSnapshot);
>> >>    Boolean revertVMSnapshot(List<VolumeInfo> volumesBelongToVM,
>> >> VMSnapshot vmSnapshot);
>> >>   Boolean deleteVMSnapshot(List<VolumeInfo> volumesBelongToVM,
>> >> VMSnapshot vmSNapshot);
>> >>
>> >> The work flow will be: createVMSnapshot api ->
>> VMSnapshotManagerImpl:
>> >> creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot -> storage
>> >> driver:takeVMSnapshot In the implementation of VMSnapshotStrategy's
>> >> takeVMSnapshot, the pseudo code looks like:
>> >>       HypervisorHelper.quiesceVM(vm);
>> >>       val volumes = vm.getVolumes();
>> >>       val maps = new Map[driver, list[VolumeInfo]]();
>> >>       Volumes.foreach(volume => maps.put(volume.getDriver, volume ::
>> >> maps.get(volume.getdriver())))
>> >>       val needUnquiesce = true;
>> >>        maps.foreach((driver, volumes) => needUnquiesce  =
>> >> needUnquiesce && driver.takeVMSnapshot(volumes))
>> >>      if (needUnquiesce ) {
>> >>       HypervisorHelper.unquiesce(vm);
>> >>    }
>> >>
>> >> By default, the quiesceVM in HypervisorHelper will actually take vm
>> >> snapshot through hypervisor.
>> >> Does above logic makes senesce?
>> >>
>> >> The pros of option 1 is that: it's simple, no need to change storage
>> >> driver interfaces. The cons is that each storage vendor need to
>> >> implement a strategy, maybe they will do the same thing.
>> >> The pros of option 2 is that, storage driver won't need to worry
>> >> about how to quiesce/unquiesce vm. The cons is that, it will add
>> >> these methods on each storage drivers, so it assumes that this work
>> >> flow will work for everybody.
>> >>
>> >> So which option we should take? Or if you have other options, please
>> >> let's know.
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >
>> >
>> > --
>> > *Mike Tutkowski*
>> > *Senior CloudStack Developer, SolidFire Inc.*
>> > e: mike.tutkowski@solidfire.com
>> > o: 303.746.7302
>> > Advancing the way the world uses the
>> > cloud<http://solidfire.com/solution/overview/?video=play>
>> > *(tm)*
>

RE: [DISCUSS] Pluggable VM snapshot related operations?

Posted by Edison Su <Ed...@citrix.com>.
I created a design document page at https://cwiki.apache.org/confluence/display/CLOUDSTACK/Pluggable+VM+snapshot+related+operations, feel free to add items on it.
And a new branch "pluggable_vm_snapshot" is created. 

> -----Original Message-----
> From: SuichII, Christopher [mailto:Chris.Suich@netapp.com]
> Sent: Monday, October 07, 2013 10:02 AM
> To: <de...@cloudstack.apache.org>
> Subject: Re: [DISCUSS] Pluggable VM snapshot related operations?
> 
> I'm a fan of option 2 - this gives us the most flexibility (as you stated). The
> option is given to completely override the way VM snapshots work AND
> storage providers are given to opportunity to work within the default VM
> snapshot workflow.
> 
> I believe this option should satisfy your concern, Mike. The snapshot and
> quiesce strategy would be in charge of communicating with the hypervisor.
> Storage providers should be able to leverage the default strategies and
> simply perform the storage operations.
> 
> I don't think it should be much of an issue that new method to the storage
> driver interface may not apply to everyone. In fact, that is already the case.
> Some methods such as un/maintain(), attachToXXX() and takeSnapshot() are
> already not implemented by every driver - they just return false when asked
> if they can handle the operation.
> 
> --
> Chris Suich
> chris.suich@netapp.com
> NetApp Software Engineer
> Data Center Platforms - Cloud Solutions
> Citrix, Cisco & Red Hat
> 
> On Oct 5, 2013, at 12:11 AM, Mike Tutkowski <mi...@solidfire.com>
> wrote:
> 
> > Well, my first thought on this is that the storage driver should not
> > be telling the hypervisor to do anything. It should be responsible for
> > creating/deleting volumes, snapshots, etc. on its storage system only.
> >
> >
> > On Fri, Oct 4, 2013 at 5:57 PM, Edison Su <Ed...@citrix.com> wrote:
> >
> >> In 4.2, we added VM snapshot for Vmware/Xenserver. The current
> >> workflow will be like the following:
> >> createVMSnapshot api -> VMSnapshotManagerImpl: creatVMSnapshot ->
> >> send CreateVMSnapshotCommand to hypervisor to create vm snapshot.
> >>
> >> If anybody wants to change the workflow, then need to either change
> >> VMSnapshotManagerImpl directly or subclass VMSnapshotManagerImpl.
> >> Both are not the ideal choice, as VMSnapshotManagerImpl should be
> >> able to handle different ways to take vm snapshot, instead of hard code.
> >>
> >> The requirements for the pluggable VM snapshot coming from:
> >> Storage vendor may have their optimization, such as NetApp.
> >> VM snapshot can be implemented in a totally different way(For
> >> example, I could just send a command to guest VM, to tell my
> >> application to flush disk and hold disk write, then come to hypervisor to
> take a volume snapshot).
> >>
> >> If we agree on enable pluggable VM snapshot, then we can move on
> >> discuss how to implement it.
> >>
> >> The possible options:
> >> 1. coarse grained interface. Add a VMSnapshotStrategy interface,
> >> which has the following interfaces:
> >>    VMSnapshot takeVMSnapshot(VMSnapshot vmSnapshot);
> >>    Boolean revertVMSnapshot(VMSnapshot vmSnapshot);
> >>    Boolean DeleteVMSnapshot(VMSnapshot vmSnapshot);
> >>
> >>   The work flow will be: createVMSnapshot api ->
> VMSnapshotManagerImpl:
> >> creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot
> >>   VMSnapshotManagerImpl will manage VM state, do the sanity check,
> >> then will handle over to VMSnapshotStrategy.
> >>   In VMSnapshotStrategy implementation, it may just send a
> >> Create/revert/delete VMSnapshotCommand to hypervisor host, or do
> >> anything special operations.
> >>
> >> 2. fine-grained interface. Not only add a VMSnapshotStrategy
> >> interface, but also add certain methods on the storage driver.
> >>    The VMSnapshotStrategy interface will be the same as option 1.
> >>    Will add the following methods on storage driver:
> >>   /* volumesBelongToVM  is the list of volumes of the VM that created
> >> on this storage, storage vendor can either take one snapshot for this
> >> volumes in one shot, or take snapshot for each volume separately
> >>       The pre-condition: vm is unquiesced.
> >>       It will return a Boolean to indicate, do need unquiesce vm or not.
> >>       In the default storage driver, it will return false.
> >>    */
> >>    boolean takeVMSnapshot(List<VolumeInfo> volumesBelongToVM,
> >> VMSnapshot vmSnapshot);
> >>    Boolean revertVMSnapshot(List<VolumeInfo> volumesBelongToVM,
> >> VMSnapshot vmSnapshot);
> >>   Boolean deleteVMSnapshot(List<VolumeInfo> volumesBelongToVM,
> >> VMSnapshot vmSNapshot);
> >>
> >> The work flow will be: createVMSnapshot api ->
> VMSnapshotManagerImpl:
> >> creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot -> storage
> >> driver:takeVMSnapshot In the implementation of VMSnapshotStrategy's
> >> takeVMSnapshot, the pseudo code looks like:
> >>       HypervisorHelper.quiesceVM(vm);
> >>       val volumes = vm.getVolumes();
> >>       val maps = new Map[driver, list[VolumeInfo]]();
> >>       Volumes.foreach(volume => maps.put(volume.getDriver, volume ::
> >> maps.get(volume.getdriver())))
> >>       val needUnquiesce = true;
> >>        maps.foreach((driver, volumes) => needUnquiesce  =
> >> needUnquiesce && driver.takeVMSnapshot(volumes))
> >>      if (needUnquiesce ) {
> >>       HypervisorHelper.unquiesce(vm);
> >>    }
> >>
> >> By default, the quiesceVM in HypervisorHelper will actually take vm
> >> snapshot through hypervisor.
> >> Does above logic makes senesce?
> >>
> >> The pros of option 1 is that: it's simple, no need to change storage
> >> driver interfaces. The cons is that each storage vendor need to
> >> implement a strategy, maybe they will do the same thing.
> >> The pros of option 2 is that, storage driver won't need to worry
> >> about how to quiesce/unquiesce vm. The cons is that, it will add
> >> these methods on each storage drivers, so it assumes that this work
> >> flow will work for everybody.
> >>
> >> So which option we should take? Or if you have other options, please
> >> let's know.
> >>
> >>
> >>
> >>
> >>
> >>
> >
> >
> > --
> > *Mike Tutkowski*
> > *Senior CloudStack Developer, SolidFire Inc.*
> > e: mike.tutkowski@solidfire.com
> > o: 303.746.7302
> > Advancing the way the world uses the
> > cloud<http://solidfire.com/solution/overview/?video=play>
> > *(tm)*


Re: [DISCUSS] Pluggable VM snapshot related operations?

Posted by "SuichII, Christopher" <Ch...@netapp.com>.
I'm a fan of option 2 - this gives us the most flexibility (as you stated). The option is given to completely override the way VM snapshots work AND storage providers are given to opportunity to work within the default VM snapshot workflow.

I believe this option should satisfy your concern, Mike. The snapshot and quiesce strategy would be in charge of communicating with the hypervisor. Storage providers should be able to leverage the default strategies and simply perform the storage operations.

I don't think it should be much of an issue that new method to the storage driver interface may not apply to everyone. In fact, that is already the case. Some methods such as un/maintain(), attachToXXX() and takeSnapshot() are already not implemented by every driver - they just return false when asked if they can handle the operation.

-- 
Chris Suich
chris.suich@netapp.com
NetApp Software Engineer
Data Center Platforms – Cloud Solutions
Citrix, Cisco & Red Hat

On Oct 5, 2013, at 12:11 AM, Mike Tutkowski <mi...@solidfire.com> wrote:

> Well, my first thought on this is that the storage driver should not be
> telling the hypervisor to do anything. It should be responsible for
> creating/deleting volumes, snapshots, etc. on its storage system only.
> 
> 
> On Fri, Oct 4, 2013 at 5:57 PM, Edison Su <Ed...@citrix.com> wrote:
> 
>> In 4.2, we added VM snapshot for Vmware/Xenserver. The current workflow
>> will be like the following:
>> createVMSnapshot api -> VMSnapshotManagerImpl: creatVMSnapshot -> send
>> CreateVMSnapshotCommand to hypervisor to create vm snapshot.
>> 
>> If anybody wants to change the workflow, then need to either change
>> VMSnapshotManagerImpl directly or subclass VMSnapshotManagerImpl. Both are
>> not the ideal choice, as VMSnapshotManagerImpl should be able to handle
>> different ways to take vm snapshot, instead of hard code.
>> 
>> The requirements for the pluggable VM snapshot coming from:
>> Storage vendor may have their optimization, such as NetApp.
>> VM snapshot can be implemented in a totally different way(For example, I
>> could just send a command to guest VM, to tell my application to flush disk
>> and hold disk write, then come to hypervisor to take a volume snapshot).
>> 
>> If we agree on enable pluggable VM snapshot, then we can move on discuss
>> how to implement it.
>> 
>> The possible options:
>> 1. coarse grained interface. Add a VMSnapshotStrategy interface, which has
>> the following interfaces:
>>    VMSnapshot takeVMSnapshot(VMSnapshot vmSnapshot);
>>    Boolean revertVMSnapshot(VMSnapshot vmSnapshot);
>>    Boolean DeleteVMSnapshot(VMSnapshot vmSnapshot);
>> 
>>   The work flow will be: createVMSnapshot api -> VMSnapshotManagerImpl:
>> creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot
>>   VMSnapshotManagerImpl will manage VM state, do the sanity check, then
>> will handle over to VMSnapshotStrategy.
>>   In VMSnapshotStrategy implementation, it may just send a
>> Create/revert/delete VMSnapshotCommand to hypervisor host, or do anything
>> special operations.
>> 
>> 2. fine-grained interface. Not only add a VMSnapshotStrategy interface,
>> but also add certain methods on the storage driver.
>>    The VMSnapshotStrategy interface will be the same as option 1.
>>    Will add the following methods on storage driver:
>>   /* volumesBelongToVM  is the list of volumes of the VM that created on
>> this storage, storage vendor can either take one snapshot for this volumes
>> in one shot, or take snapshot for each volume separately
>>       The pre-condition: vm is unquiesced.
>>       It will return a Boolean to indicate, do need unquiesce vm or not.
>>       In the default storage driver, it will return false.
>>    */
>>    boolean takeVMSnapshot(List<VolumeInfo> volumesBelongToVM, VMSnapshot
>> vmSnapshot);
>>    Boolean revertVMSnapshot(List<VolumeInfo> volumesBelongToVM,
>> VMSnapshot vmSnapshot);
>>   Boolean deleteVMSnapshot(List<VolumeInfo> volumesBelongToVM, VMSnapshot
>> vmSNapshot);
>> 
>> The work flow will be: createVMSnapshot api -> VMSnapshotManagerImpl:
>> creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot -> storage
>> driver:takeVMSnapshot
>> In the implementation of VMSnapshotStrategy's takeVMSnapshot, the pseudo
>> code looks like:
>>       HypervisorHelper.quiesceVM(vm);
>>       val volumes = vm.getVolumes();
>>       val maps = new Map[driver, list[VolumeInfo]]();
>>       Volumes.foreach(volume => maps.put(volume.getDriver, volume ::
>> maps.get(volume.getdriver())))
>>       val needUnquiesce = true;
>>        maps.foreach((driver, volumes) => needUnquiesce  = needUnquiesce
>> && driver.takeVMSnapshot(volumes))
>>      if (needUnquiesce ) {
>>       HypervisorHelper.unquiesce(vm);
>>    }
>> 
>> By default, the quiesceVM in HypervisorHelper will actually take vm
>> snapshot through hypervisor.
>> Does above logic makes senesce?
>> 
>> The pros of option 1 is that: it's simple, no need to change storage
>> driver interfaces. The cons is that each storage vendor need to implement a
>> strategy, maybe they will do the same thing.
>> The pros of option 2 is that, storage driver won't need to worry about how
>> to quiesce/unquiesce vm. The cons is that, it will add these methods on
>> each storage drivers, so it assumes that this work flow will work for
>> everybody.
>> 
>> So which option we should take? Or if you have other options, please let's
>> know.
>> 
>> 
>> 
>> 
>> 
>> 
> 
> 
> -- 
> *Mike Tutkowski*
> *Senior CloudStack Developer, SolidFire Inc.*
> e: mike.tutkowski@solidfire.com
> o: 303.746.7302
> Advancing the way the world uses the
> cloud<http://solidfire.com/solution/overview/?video=play>
> *™*


Re: [DISCUSS] Pluggable VM snapshot related operations?

Posted by Mike Tutkowski <mi...@solidfire.com>.
Well, my first thought on this is that the storage driver should not be
telling the hypervisor to do anything. It should be responsible for
creating/deleting volumes, snapshots, etc. on its storage system only.


On Fri, Oct 4, 2013 at 5:57 PM, Edison Su <Ed...@citrix.com> wrote:

> In 4.2, we added VM snapshot for Vmware/Xenserver. The current workflow
> will be like the following:
> createVMSnapshot api -> VMSnapshotManagerImpl: creatVMSnapshot -> send
> CreateVMSnapshotCommand to hypervisor to create vm snapshot.
>
> If anybody wants to change the workflow, then need to either change
> VMSnapshotManagerImpl directly or subclass VMSnapshotManagerImpl. Both are
> not the ideal choice, as VMSnapshotManagerImpl should be able to handle
> different ways to take vm snapshot, instead of hard code.
>
> The requirements for the pluggable VM snapshot coming from:
> Storage vendor may have their optimization, such as NetApp.
> VM snapshot can be implemented in a totally different way(For example, I
> could just send a command to guest VM, to tell my application to flush disk
> and hold disk write, then come to hypervisor to take a volume snapshot).
>
> If we agree on enable pluggable VM snapshot, then we can move on discuss
> how to implement it.
>
> The possible options:
> 1. coarse grained interface. Add a VMSnapshotStrategy interface, which has
> the following interfaces:
>     VMSnapshot takeVMSnapshot(VMSnapshot vmSnapshot);
>     Boolean revertVMSnapshot(VMSnapshot vmSnapshot);
>     Boolean DeleteVMSnapshot(VMSnapshot vmSnapshot);
>
>    The work flow will be: createVMSnapshot api -> VMSnapshotManagerImpl:
> creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot
>    VMSnapshotManagerImpl will manage VM state, do the sanity check, then
> will handle over to VMSnapshotStrategy.
>    In VMSnapshotStrategy implementation, it may just send a
> Create/revert/delete VMSnapshotCommand to hypervisor host, or do anything
> special operations.
>
> 2. fine-grained interface. Not only add a VMSnapshotStrategy interface,
> but also add certain methods on the storage driver.
>     The VMSnapshotStrategy interface will be the same as option 1.
>     Will add the following methods on storage driver:
>    /* volumesBelongToVM  is the list of volumes of the VM that created on
> this storage, storage vendor can either take one snapshot for this volumes
> in one shot, or take snapshot for each volume separately
>        The pre-condition: vm is unquiesced.
>        It will return a Boolean to indicate, do need unquiesce vm or not.
>        In the default storage driver, it will return false.
>     */
>     boolean takeVMSnapshot(List<VolumeInfo> volumesBelongToVM, VMSnapshot
> vmSnapshot);
>     Boolean revertVMSnapshot(List<VolumeInfo> volumesBelongToVM,
> VMSnapshot vmSnapshot);
>    Boolean deleteVMSnapshot(List<VolumeInfo> volumesBelongToVM, VMSnapshot
> vmSNapshot);
>
> The work flow will be: createVMSnapshot api -> VMSnapshotManagerImpl:
> creatVMSnapshot -> VMSnapshotStrategy: takeVMSnapshot -> storage
> driver:takeVMSnapshot
>  In the implementation of VMSnapshotStrategy's takeVMSnapshot, the pseudo
> code looks like:
>        HypervisorHelper.quiesceVM(vm);
>        val volumes = vm.getVolumes();
>        val maps = new Map[driver, list[VolumeInfo]]();
>        Volumes.foreach(volume => maps.put(volume.getDriver, volume ::
> maps.get(volume.getdriver())))
>        val needUnquiesce = true;
>         maps.foreach((driver, volumes) => needUnquiesce  = needUnquiesce
>  && driver.takeVMSnapshot(volumes))
>       if (needUnquiesce ) {
>        HypervisorHelper.unquiesce(vm);
>     }
>
> By default, the quiesceVM in HypervisorHelper will actually take vm
> snapshot through hypervisor.
> Does above logic makes senesce?
>
> The pros of option 1 is that: it's simple, no need to change storage
> driver interfaces. The cons is that each storage vendor need to implement a
> strategy, maybe they will do the same thing.
> The pros of option 2 is that, storage driver won't need to worry about how
> to quiesce/unquiesce vm. The cons is that, it will add these methods on
> each storage drivers, so it assumes that this work flow will work for
> everybody.
>
> So which option we should take? Or if you have other options, please let's
> know.
>
>
>
>
>
>


-- 
*Mike Tutkowski*
*Senior CloudStack Developer, SolidFire Inc.*
e: mike.tutkowski@solidfire.com
o: 303.746.7302
Advancing the way the world uses the
cloud<http://solidfire.com/solution/overview/?video=play>
*™*