Posted to dev@cloudstack.apache.org by "SuichII, Christopher" <Ch...@netapp.com> on 2013/09/18 14:22:28 UTC

[PROPOSAL] Storage Subsystem API Interface Additions

I would like to raise for discussion the idea of adding a couple methods to the Storage Subsystem API interface. Currently, takeSnapshot() and revertSnapshot() only support single VM volumes. We have a use case for snapshotting multiple VM volumes at the same time. For us, it is more efficient to snapshot them all at once rather than snapshot VM Volumes individually and this seems like a more elegant solution than queueing the requests within our plugin.

Based on my investigation, this should require:
-Two additional APIs to be invoked from the UI
-Two additional methods added to the Storage Subsystem API interface
-Changes between the API level and the invocation of the Storage Subsystem API implementations (I know this is broad and vague), mainly around the SnapshotManager/Impl

There are a couple topics we would like discussion on:
-Would this be beneficial/detrimental/neutral to other storage providers?
-How should we handle the addition of new methods to the Storage Subsystem API interface? Default them to throw an UnsupportedOperationException? Default to calling the single VM volume version multiple times?
-Does anyone see any issues with allowing multiple snapshots to be taken at the same time or letting storage providers have a list of all the requested volumes to backup?
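The "default to calling the single VM volume version multiple times" option could look roughly like the sketch below. The names here (StorageDriver, takeSnapshot(), takeSnapshots()) are illustrative, not the actual CloudStack interface; also note a default method needs Java 8, so on Java 7 an abstract adapter base class would play the same role:

```java
// Illustrative sketch only -- not the real Storage Subsystem API interface.
import java.util.ArrayList;
import java.util.List;

interface StorageDriver {
    // Existing single-volume operation.
    String takeSnapshot(String volumeId);

    // New batch operation. The default implementation loops over the
    // single-volume version, so drivers that don't batch need no changes;
    // a driver that can batch would override this to do it all at once.
    default List<String> takeSnapshots(List<String> volumeIds) {
        List<String> snapshotIds = new ArrayList<>();
        for (String volumeId : volumeIds) {
            snapshotIds.add(takeSnapshot(volumeId));
        }
        return snapshotIds;
    }
}

// A driver that only implements the single-volume call still accepts batches.
class SimpleDriver implements StorageDriver {
    @Override
    public String takeSnapshot(String volumeId) {
        return "snap-of-" + volumeId;
    }
}
```

This avoids the UnsupportedOperationException route entirely: old drivers keep working, and only drivers with a real batching optimization opt in.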

Please let me know if I've missed any major topics for discussion or if anything needs clarification.

Thanks,
Chris
-- 
Chris Suich
chris.suich@netapp.com
NetApp Software Engineer
Data Center Platforms – Cloud Solutions
Citrix, Cisco & Red Hat


Re: [PROPOSAL] Storage Subsystem API Interface Additions

Posted by John Burwell <jb...@basho.com>.
Mike,

I apologize for being MIA on my due-out regarding storage drivers and design.  I have gotten pulled onto another $dayjob project that is taking all of my time.

+1 to the notion that resource tagging of any type is a mess.  Ideally, all device drivers should be completely isolated from each other.  The tagging model we employ today requires that drivers not only be aware of each other, but also of all the possible combinations in which they could conceivably be employed.  As a simple example, see the logical mess we encountered dealing with provisioned IOPS for storage devices and hypervisors in 4.1.0.  In my opinion, the goal should be for the user to describe the service levels they need from the infrastructure, and for the orchestration layer to allocate the resources that meet that SLA.  We are a long way from such a model, but hopefully we can start evolving in that direction for the next release.

In terms of extra specs, we have discussed the notion of generic properties which function in a manner similar to what you describe in OpenStack.  Each driver could optionally expose a set of property metadata descriptors (e.g. name, display name, type, required flag, description) which would drive UI rendering and validation to avoid the garbage-in/garbage-out problem.  The values of these properties would then be passed into each driver operation.  The benefit of this approach is twofold.  First, it provides vendors with a simple but powerful extension mechanism.  Second, it helps to prevent the pollution of the orchestration layer's domain model with vendor-specific concepts.
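A rough sketch of such a property metadata descriptor (the class and field names below are hypothetical, not an existing CloudStack type):

```java
// Hypothetical sketch of a driver-exposed property descriptor that the UI
// could use to render an input field and validate the value before it ever
// reaches the driver.
final class PropertyDescriptor {
    final String name;         // machine name the driver receives, e.g. "min_iops"
    final String displayName;  // label the UI renders
    final Class<?> type;       // expected value type; drives input validation
    final boolean required;
    final String description;  // help text for the dialog

    PropertyDescriptor(String name, String displayName, Class<?> type,
                       boolean required, String description) {
        this.name = name;
        this.displayName = displayName;
        this.type = type;
        this.required = required;
        this.description = description;
    }

    // The garbage-in check: reject missing required values and wrong types.
    boolean accepts(Object value) {
        if (value == null) {
            return !required;
        }
        return type.isInstance(value);
    }
}
```

A driver would return a list of these from some capability call; the orchestration layer never needs to know what "min_iops" means, only that it is a required integer.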

Thanks,
-John

On Sep 19, 2013, at 12:32 AM, Mike Tutkowski <mi...@solidfire.com> wrote:

> We might want to bring John Burwell into this discussion as he has documented ideas on how storage drivers might be able to advertise to the outside CloudStack world their capabilities.
> 
> For example, my SolidFire driver could advertise that the volumes it creates support min, max, and burst IOPS. This info could be leveraged by input dialogs in the GUI to display certain applicable fields.
> 
> Right now my feeling is that storage tagging is a bit of a mess.
> 
> When you encounter dialogs like Add Disk Offering, the user can enter one or more storage tags.
> 
> That's great, but it doesn't let the dialog know (easily) what features primary storage that is tagged with those storage tags support.
> 
> That being the case, the dialog ends up displaying all sorts of options that users can fill in that may or may not be supported by the storage that CloudStack ends up placing the newly created volume on. It's really up to the admin who is creating the Disk Offering (for example) to make sure that any options he fills in are supported by the storage that is tagged.
> 
> OpenStack has a completely different paradigm here.
> 
> It has the concept of a Volume Type. A Volume Type is backed by one and only one storage driver. You can fill in what they call Extra Specs to pass in driver-specific info (key/value pairs, ex: min_iops=300).
> 
> If we had something like this, these dialogs would know what data is applicable to collect.
> 
> 
> On Wed, Sep 18, 2013 at 6:29 PM, SuichII, Christopher <Ch...@netapp.com> wrote:
> That's a good point and I'm not sure. Maybe we can have the drivers indicate whether they support batched/multiple volume snapshotting. If not, then we can spin up a thread per volume to snapshot. I completely agree that simply calling them in sequence could end up badly.
> 
> You're right though, the examples I provided are only partial examples as they really do just edit something in the db.
> 
> Edison, Alex, or anyone else knowledgeable of the storage system - do you have any input? Was my clarification helpful?
> 
> For what it's worth, after some further investigation, it sounds like we won't actually be performing NetApp volume level snapshots to backup vm volumes - either when requested or as a batch. We believe we have come up with a solution that lets us create individual backups while avoiding our 255 snapshot limit. However, it would still be beneficial to let users request backups of multiple volumes at once for the other reasons I explained earlier.
> 
> -Chris
> --
> Chris Suich
> chris.suich@netapp.com
> NetApp Software Engineer
> Data Center Platforms – Cloud Solutions
> Citrix, Cisco & Red Hat
> 
> On Sep 18, 2013, at 6:56 PM, Darren Shepherd <da...@gmail.com> wrote:
> 
> > Nah, I guess it's not so bad to have a volumeIds param.  Just as long as it's
> > really clear that means nothing about consistency.
> >
> > I would be a little concerned about how this will be implemented by other
> > storage providers though.  Currently if you do 5 snapshot API calls that
> > will launch 5 threads and they will happen in parallel.  If they get
> > batched and sent in one thread, how is the framework/driver going to handle
> > the snapshots for drivers that don't support batching?  Sequential would
> > be bad as sometimes it takes awhile to snapshot.
> >
> > The APIs you mentioned that take lists really just manipulate data in the
> > DB, so they can easily batch and transactionally do a bunch at once.
> >
> > Maybe someone who's more familiar with the storage implementation can
> > comment?
> >
> > Darren
> >
> >
> > On Wed, Sep 18, 2013 at 12:46 PM, SuichII, Christopher <
> > Chris.Suich@netapp.com> wrote:
> >
> >> That certainly would work, but I don't see how it is a better design. Can
> >> you elaborate on how sending multiple volumeIds is hackish? Look at the
> >> existing API framework. We have several APIs that accept lists as
> >> parameters. Normally, they're used for things like querying or deleting.
> >> Take a look at some of these commands:
> >> -ArchiveEventsCmd
> >> -DeleteEventsCmd
> >> -DeleteSnapshotPoliciesCmd
> >>
> >> This kind of API is simply a shorthand for invoking another API many times.
> >>
> >> I think it is only a NetApp optimization in the sense that we're the only
> >> ones who need it right now. What we're asking for has nothing specific to
> >> do with NetApp. We would just like the shorthand ability to do things all at
> >> once rather than one at a time. I think other vendors could utilize this
> >> just as easily.
> >>
> >> -Chris
> >> --
> >> Chris Suich
> >> chris.suich@netapp.com
> >> NetApp Software Engineer
> >> Data Center Platforms – Cloud Solutions
> >> Citrix, Cisco & Red Hat
> >>
> >> On Sep 18, 2013, at 2:32 PM, Darren Shepherd <da...@gmail.com>
> >> wrote:
> >>
> >>> Given this explanation, would the following not work?
> >>>
> >>> 1) Enhance the UI to allow multi-select.  There is no API change; the UI
> >>> will just call snapshot a bunch of times
> >>> 2) Enhance storage framework or driver to detect that 20 requests just
> >> came
> >>> in within a window of X seconds and send them to the driver all at once.
> >>>
> >>> I know you said queuing on the backend is hackish, but having the user
> >> send
> >>> multiple volumeIds in the API is just as hackish to me.  We can only
> >>> guarantee to the user that the multiple snapshots taken are as consistent
> >>> as if they called snapshot API individually.  The user won't really know
> >>> exactly what NetApp volume they exist on and really neither will the
> >>> storage framework, as export != volume.  Only the driver knows if
> >> batching
> >>> is really possible.  So I'm not exactly saying queue; it's short batches.
> >>>
> >>> In short, I'm seeing this as a bit more of a NetApp optimization than a
> >>> general thing.  I'm all for using storage device level snapshotting but
> >> it
> >>> seems like it's going to be implementation-specific.  It's interesting: if
> >>> you look at DigitalOcean, they have snapshots and backups as two
> >>> different concepts.  You can see that they ran into this specific issue.  Full
> >>> storage volume snapshots are really difficult to expose to the user.  So
> >>> DigitalOcean does "backups" which are live but on a schedule and seem to
> >>> be a full volume backup.  And then there are snapshots which are on
> >> demand,
> >>> but require you to stop your VM (so they can essentially copy the qcow or
> >>> lv somewhere).
> >>>
> >>> Darren
> >>>
> >>>
> >>>
> >>>
> >>> On Wed, Sep 18, 2013 at 11:12 AM, SuichII, Christopher <
> >>> Chris.Suich@netapp.com> wrote:
> >>>
> >>>> First, let me apologize for the confusing terms, because some words here
> >>>> are overloaded:
> >>>> A volume…
> >>>> In CloudStack terms is a disk attached to a VM.
> >>>> In NetApp terms is an NFS volume, analogous to CloudStack primary
> >> storage,
> >>>> where all the CloudStack volumes are stored.
> >>>>
> >>>> A snapshot…
> >>>> In CloudStack terms is a backup of a VM.
> >>>> In NetApp terms is a copy of all the contents of a NetApp volume, taken
> >> at
> >>>> a point in time to create an analogous CloudStack snapshot for (up to)
> >>>> every CloudStack volume on that primary storage.
> >>>>
> >>>> There are several reasons that an API for snapshotting multiple volumes
> >> is
> >>>> more attractive to us than calling a single volume API over and over. A
> >> lot
> >>>> of it has to do with how we actually create the snapshots. Unlike a
> >>>> hypervisor snapshot, when we create a vm snapshot, the entire primary
> >>>> storage is backed up (but only the requested volume has an entry added
> >> to
> >>>> the db). To add on to this, our hardware has a hard limit of 255 storage
> >>>> volume level snapshots. So, if there were 255 vms on a single primary
> >>>> storage and each one of them performed a backup, no more backups could
> >> be
> >>>> taken before we start removing the oldest backup (without some trickery
> >>>> that we are currently working on). Some might say a solution to this
> >> would
> >>>> be queueing the requests and waiting till they're all finished, but that
> >>>> seems much more error prone and like hackish design compared to simply
> >>>> allowing multiple VM volumes to be specified.
> >>>>
> >>>> This is both a request for optimizing the backend and optimizing the
> >>>> experience for users. What happens when a user says they want to backup
> >> 30
> >>>> vm volumes at the same time? Is it not a cleaner experience to simply
> >>>> select all the volumes they want to back up, then click backup once?
> >> This
> >>>> way, the storage provider is given all the volumes at once and if they
> >> have
> >>>> some way of optimizing the request based on their hardware or software,
> >>>> they can take advantage of that. It can even be designed in such a way
> >> that
> >>>> if storage providers don't want to be given all the volumes at once,
> >> they
> >>>> can be called with each one individually, as to remain backwards
> >> compatible.
> >>>>
> >>>> Now, I'm also not saying that these two solutions can't co-exist. Even
> >> if
> >>>> we have the ability to backup multiple volumes at once, nothing is
> >> stopping
> >>>> users from backing them up one by one, so queueing is still something we
> >>>> may have to implement. However, I think extending the subsystem API to
> >>>> grant storage providers the ability to leverage any optimization they
> >> can
> >>>> without having to queue is a cleaner solution. If the concern is how
> >> users
> >>>> interpret what is going on in the backend, I think we can find some way
> >> to
> >>>> make that clear to them.
> >>>>
> >>>> -Chris
> >>>> --
> >>>> Chris Suich
> >>>> chris.suich@netapp.com
> >>>> NetApp Software Engineer
> >>>> Data Center Platforms – Cloud Solutions
> >>>> Citrix, Cisco & Red Hat
> >>>>
> >>>> On Sep 18, 2013, at 12:26 PM, Alex Huang <Al...@citrix.com> wrote:
> >>>>
> >>>>> That's my read on the proposal also but, Chris, please clarify.  I
> >> don't
> >>>> think the end user will see the change.  It's an optimization for
> >>>> interfacing with the storage backend.
> >>>>>
> >>>>> --Alex
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Marcus Sorensen [mailto:shadowsor@gmail.com]
> >>>>>> Sent: Wednesday, September 18, 2013 9:22 AM
> >>>>>> To: dev@cloudstack.apache.org
> >>>>>> Subject: Re: [PROPOSAL] Storage Subsystem API Interface Additions
> >>>>>>
> >>>>>> Perhaps he needs to elaborate on the use case and what he means by
> >> more
> >>>>>> efficient.  He may be referring to multiple volumes in the sense of
> >>>>>> snapshotting the ROOT disks for 10 different VMs.
> >>>>>>
> >>>>>> On Wed, Sep 18, 2013 at 10:10 AM, Darren Shepherd
> >>>>>> <da...@gmail.com> wrote:
> >>>>>>> Here's my general concern about multiple volume snapshots at once.
> >>>>>>> Giving such a feature leads the user to believe that snapshotting
> >>>>>>> multiple volumes at once will give them consistency across the
> >> volumes
> >>>> in
> >>>>>> the snapshot.
> >>>>>>> This is not true, and difficult to do with many hypervisors, and
> >>>>>>> typically requires an agent in the VM.  A single snapshot, as exists
> >>>>>>> today, is really crash consistent, meaning that there may exist
> >>>>>>> unsync'd data.  To do a true multi volume snapshot requires a
> >> "quiesce"
> >>>>>> functionality in the VM.
> >>>>>>> So you do pause I/O queues, fsync, fsync, snapshot, snapshot, unpause
> >>>> I/O.
> >>>>>>>
> >>>>>>> I might be fine with the option of allowing multiple volumeIds to
> >>>>>>> be specified in the snapshot API, but it needs to be clear that those
> >>>>>>> snapshots may be taken sequentially and they are all independently
> >>>>>>> crash consistent.  But, if you make that clear, then why even have
> >> the
> >>>> API.
> >>>>>>> Essentially it is the same as doing multiple snapshot API commands.
> >>>>>>>
> >>>>>>> So really I would lean towards having the multiple snapshotting
> >>>>>>> supported in the driver or storage subsystem, but not exposed to the
> >>>>>>> user.  You can easily accomplish it by having a timed window on
> >>>>>>> snapshotting.  So every 10 seconds you do snapshots; if 5 requests
> >>>>>>> have queued in the last 10 seconds, you do them all at once.  This
> >>>> could be
> >>>>>> implemented as a framework thing.
> >>>>>>> If your provider implements "SnapshotBatching" interface and that has
> >>>>>>> a getBatchWindowTime(), then the framework can detect that it should
> >>>>>>> try to queue up some snapshot requests and send them to the driver in
> >>>>>>> a batch.  Or that could be implemented in the driver itself.  I would
> >>>>>>> lean toward doing it in the driver and if that goes well, we look at
> >>>>>>> pulling the functionality into core ACS.
> >>>>>>>
> >>>>>>> Darren
> >>>>>>>
> >>>>>>>
> 
> 
> 
> 
> -- 
> Mike Tutkowski
> Senior CloudStack Developer, SolidFire Inc.
> e: mike.tutkowski@solidfire.com
> o: 303.746.7302
> Advancing the way the world uses the cloud™


Re: [PROPOSAL] Storage Subsystem API Interface Additions

Posted by Mike Tutkowski <mi...@solidfire.com>.
We might want to bring John Burwell into this discussion as he has
documented ideas on how storage drivers might be able to advertise to the
outside CloudStack world their capabilities.

For example, my SolidFire driver could advertise that the volumes it
creates support min, max, and burst IOPS. This info could be leveraged by
input dialogs in the GUI to display certain applicable fields.

Right now my feeling is that storage tagging is a bit of a mess.

When you encounter dialogs like Add Disk Offering, the user can enter one
or more storage tags.

That's great, but it doesn't let the dialog know (easily) what features
primary storage that is tagged with those storage tags support.

That being the case, the dialog ends up displaying all sorts of options
that users can fill in that may or may not be supported by the storage that
CloudStack ends up placing the newly created volume on. It's really up to
the admin who is creating the Disk Offering (for example) to make sure
that any options he fills in are supported by the storage that is
tagged.

OpenStack has a completely different paradigm here.

It has the concept of a Volume Type. A Volume Type is backed by one and
only one storage driver. You can fill in what they call Extra Specs to pass
in driver-specific info (key/value pairs, ex: min_iops=300).

If we had something like this, these dialogs would know what data is
applicable to collect.
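A minimal sketch of that Volume Type / Extra Specs shape (the class and the example keys below are hypothetical, not actual OpenStack or CloudStack code):

```java
// Hypothetical sketch: a volume type backed by exactly one driver, carrying
// free-form driver-specific key/value pairs ("extra specs").
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

final class VolumeType {
    private final String name;
    private final String backingDriver;  // exactly one driver per volume type
    private final Map<String, String> extraSpecs = new LinkedHashMap<>();

    VolumeType(String name, String backingDriver) {
        this.name = name;
        this.backingDriver = backingDriver;
    }

    // Driver-specific key/value pairs, e.g. min_iops=300.
    void setExtraSpec(String key, String value) {
        extraSpecs.put(key, value);
    }

    // What the orchestration layer would hand to the backing driver.
    Map<String, String> extraSpecs() {
        return Collections.unmodifiableMap(extraSpecs);
    }

    String backingDriver() {
        return backingDriver;
    }
}
```

Because each volume type maps to a single driver, a dialog could ask that one driver which keys it understands, instead of guessing from storage tags.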




-- 
*Mike Tutkowski*
*Senior CloudStack Developer, SolidFire Inc.*
e: mike.tutkowski@solidfire.com
o: 303.746.7302
Advancing the way the world uses the
cloud<http://solidfire.com/solution/overview/?video=play>
*™*

Re: [PROPOSAL] Storage Subsystem API Interface Additions

Posted by "SuichII, Christopher" <Ch...@netapp.com>.
That's a good point, and I'm not sure. Maybe we can have the drivers indicate whether they support batched/multi-volume snapshotting. If not, we can spin up a thread per volume to snapshot. I completely agree that simply calling them in sequence could end up badly.

You're right, though - the examples I provided are only partial examples, as they really just edit something in the DB.

Edison, Alex, or anyone else knowledgeable about the storage system - do you have any input? Was my clarification helpful?

For what it's worth, after some further investigation, it sounds like we won't actually be performing NetApp volume-level snapshots to back up VM volumes - either on request or as a batch. We believe we have come up with a solution that lets us create individual backups while avoiding our 255-snapshot limit. However, it would still be beneficial to let users request backups of multiple volumes at once, for the other reasons I explained earlier.
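To make the driver-capability idea above concrete, here is a rough sketch of the dispatch logic. The interface and method names (SnapshotDriver, BatchCapableDriver, takeSnapshot, takeSnapshots) are invented for illustration, not actual CloudStack APIs: a driver that declares batch support gets the whole list at once, and any other driver gets one call per volume, run in parallel threads rather than sequentially.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch only: these interfaces are illustrative, not CloudStack's real ones.
interface SnapshotDriver {
    String takeSnapshot(long volumeId);
}

// A driver opts in to receiving the whole list of volumes at once.
interface BatchCapableDriver extends SnapshotDriver {
    List<String> takeSnapshots(List<Long> volumeIds);
}

class SnapshotDispatcher {
    // Batch-capable drivers get one call with every volume; all other
    // drivers get one takeSnapshot() per volume, run in parallel threads
    // (never sequentially, since a single snapshot can take a while).
    static List<String> dispatch(SnapshotDriver driver, List<Long> volumeIds) {
        if (driver instanceof BatchCapableDriver) {
            return ((BatchCapableDriver) driver).takeSnapshots(volumeIds);
        }
        ExecutorService pool =
                Executors.newFixedThreadPool(Math.max(1, volumeIds.size()));
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (long id : volumeIds) {
                futures.add(pool.submit(() -> driver.takeSnapshot(id)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());  // wait for every snapshot to finish
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

This keeps the per-volume path backwards compatible while letting a provider advertise the optimization.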

-Chris
-- 
Chris Suich
chris.suich@netapp.com
NetApp Software Engineer
Data Center Platforms – Cloud Solutions
Citrix, Cisco & Red Hat

On Sep 18, 2013, at 6:56 PM, Darren Shepherd <da...@gmail.com> wrote:

> Nah, I guess its not so bad to have a volumeIds param.  Just as long as its
> really clear that means nothing about consistency.
> 
> I would be a little concerned about how this will be implemented by other
> storage providers though.  Currently if you do 5 snapshot API calls that
> will launch 5 threads and they will happen in parallel.  If they get
> batched and sent in one thread, how is the framework/driver going to handle
> the snapshots for drivers that don't supporting batching?  Sequential would
> be bad as sometimes it takes awhile to snapshot.
> 
> The APIs you mentioned that takes lists really just manipulate data in the
> DB, so they can easily batch and transactionally do a bunch at once.
> 
> Maybe someone who's more familiar with the storage implementation can
> comment?
> 
> Darren
> 
> 
> On Wed, Sep 18, 2013 at 12:46 PM, SuichII, Christopher <
> Chris.Suich@netapp.com> wrote:
> 
>> That certainly would work, but I don't see how it is a better design. Can
>> you elaborate on how sending multiple volumeIds is hackish? Look at the
>> existing API framework. We have several APIs that accept lists as
>> parameters. Normally, they're used for things like querying or deleting.
>> Take a look at some of these commands:
>> -ArchiveEventsCmd
>> -DeleteEventsCmd
>> -DeleteSnapshotPoliciesCmd
>> 
>> This kind of API is simply a shorthand for invoking another API many times.
>> 
>> I think it is only an NetApp optimization in the sense that we're the only
>> ones who need it right now. What we're asking for has nothing specific to
>> do to NetApp. We would just like the shorthand ability to do things all at
>> once rather than one at a time. I think other vendors could utilize this
>> just as easily.
>> 
>> -Chris
>> --
>> Chris Suich
>> chris.suich@netapp.com
>> NetApp Software Engineer
>> Data Center Platforms – Cloud Solutions
>> Citrix, Cisco & Red Hat
>> 
>> On Sep 18, 2013, at 2:32 PM, Darren Shepherd <da...@gmail.com>
>> wrote:
>> 
>>> Given this explanation.  Would the following not work?
>>> 
>>> 1) Enhance UI to allow multi select.  There is no API change, UI will
>> just
>>> call snapshot a bunch of time
>>> 2) Enhance storage framework or driver to detect that 20 requests just
>> came
>>> in within a window of X seconds and send them to the driver all at once.
>>> 
>>> I know you said queuing on the backend is hackish, but having the user
>> send
>>> multiple volumeIds in the API is just as hackish to me.  We can only
>>> guarentee to the user that the multiple snapshots taken are as consistent
>>> as if they called snapshot API individually.  The user won't really know
>>> exactly what NetApp volume they exist on and really neither will the
>>> storage framework, as export != volume.  Only the driver knows if
>> batching
>>> is really possible.  So I'm not exactly saying queue, its short batches.
>>> 
>>> In short, I'm seeing this as a bit more of a NetApp optimization than a
>>> general thing.  I'm all for using storage device level snapshotting but
>> it
>>> seems like its going to be implementation specific.  Its interesting, if
>>> you look at digital ocean they have snapshots and backups as two
>> different
>>> concepts.  You can see that they ran into this specific issue.  Full
>>> storage volume snapshots are really difficult to expose to the user.  So
>>> digital ocean does "backups" which are live but on a schedule and seem to
>>> be a full volume backup.  And then there are snapshots which are on
>> demand,
>>> but require you to stop your VM (so they can essentially copy the qcow or
>>> lv somewhere).
>>> 
>>> Darren
>>> 
>>> 
>>> 
>>> 
>>> On Wed, Sep 18, 2013 at 11:12 AM, SuichII, Christopher <
>>> Chris.Suich@netapp.com> wrote:
>>> 
>>>> First, let me apologize for the confusing terms, because some words here
>>>> are overloaded:
>>>> A volume…
>>>> In CloudStack terms is a disk attached to a VM.
>>>> In NetApp terms is an NFS volume, analogous to CloudStack primary
>> storage,
>>>> where all the CloudStack volumes are stored.
>>>> 
>>>> A snapshot…
>>>> In CloudStack terms is a backup of a VM.
>>>> In NetApp terms is a copy of all the contents of a NetApp volume, taken
>> at
>>>> a point in time to create an analogous CloudStack snapshot for (up to)
>>>> every CloudStack volume on that primary storage.
>>>> 
>>>> There are several reasons that an API for snapshotting multiple volumes
>> is
>>>> more attractive to us than calling a single volume API over and over. A
>> lot
>>>> of it has to do with how we actually create the snapshots. Unlike a
>>>> hypervisor snapshot, when we create a vm snapshot, the entire primary
>>>> storage is backed up (but only the requested volume has an entry added
>> to
>>>> the db). To add on to this, our hardware has a hard limit of 255 storage
>>>> volume level snapshots. So, if there were 255 vms on a single primary
>>>> storage and each one of them performed a backup, no more backups could
>> be
>>>> taken before we start removing the oldest backup (without some trickery
>>>> that we are currently working on). Some might say a solution to this
>> would
>>>> be queueing the requests and waiting till they're all finished, but that
>>>> seems much more error prone and like hackish design compared to simply
>>>> allowing multiple VM volumes to be specified.
>>>> 
>>>> This is both a request for optimizing the backend and optimizing the
>>>> experience for users. What happens when a user says they want to backup
>> 30
>>>> vm volumes at the same time? Is it not a cleaner experience to simply
>>>> select all the volumes they want to back up, then click backup once?
>> This
>>>> way, the storage provider is given all the volumes at once and if they
>> have
>>>> some way of optimizing the request based on their hardware or software,
>>>> they can take advantage of that. It can even be designed in such a way
>> that
>>>> if storage providers don't want to be given all the volumes at once,
>> they
>>>> can be called with each one individually, as to remain backwards
>> compatible.
>>>> 
>>>> Now, I'm also not saying that these two solutions can't co-exist. Even
>> if
>>>> we have the ability to backup multiple volumes at once, nothing is
>> stopping
>>>> users from backing them up one by one, so queueing is still something we
>>>> may have to implement. However, I think extending the subsystem API to
>>>> grant storage providers the ability to leverage any optimization they
>> can
>>>> without having to queue is a cleaner solution. If the concern is how
>> users
>>>> interpret what is going on in the backend, I think we can find some way
>> to
>>>> make that clear to them.
>>>> 
>>>> -Chris
>>>> --
>>>> Chris Suich
>>>> chris.suich@netapp.com
>>>> NetApp Software Engineer
>>>> Data Center Platforms – Cloud Solutions
>>>> Citrix, Cisco & Red Hat
>>>> 
>>>> On Sep 18, 2013, at 12:26 PM, Alex Huang <Al...@citrix.com> wrote:
>>>> 
>>>>> That's my read on the proposal also but, Chris, please clarify.  I
>> don't
>>>> think the end user will see the change.  It's an optimization for
>>>> interfacing with the storage backend.
>>>>> 
>>>>> --Alex
>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: Marcus Sorensen [mailto:shadowsor@gmail.com]
>>>>>> Sent: Wednesday, September 18, 2013 9:22 AM
>>>>>> To: dev@cloudstack.apache.org
>>>>>> Subject: Re: [PROPOSAL] Storage Subsystem API Interface Additions
>>>>>> 
>>>>>> Perhaps he needs to elaborate on the use case and what he means by
>> more
>>>>>> efficient.  He may be referring to multiple volumes in the sense of
>>>>>> snapshotting the ROOT disks for 10 different VMs.
>>>>>> 
>>>>>> On Wed, Sep 18, 2013 at 10:10 AM, Darren Shepherd
>>>>>> <da...@gmail.com> wrote:
>>>>>>> Here's my general concern about multiple volume snapshots at once.
>>>>>>> Giving such a feature leads the user to believe that snapshotting
>>>>>>> multiple volumes at once will give them consistency across the
>> volumes
>>>> in
>>>>>> the snapshot.
>>>>>>> This is not true, and difficult to do with many hypervisors, and
>>>>>>> typically requires an agent in the VM.  A single snapshot, as exists
>>>>>>> today, is really crash consistent, meaning that there is may exist
>>>>>>> unsync'd data.  To do a true multi volume snapshot requires a
>> "quiesce"
>>>>>> functionality in the VM.
>>>>>>> So you do pause I/O queues, fsync, fsync, snapshot, snapshot, unpause
>>>> I/O.
>>>>>>> 
>>>>>>> I'm might be fine with the option of allowing multiple volumeId's to
>>>>>>> be specified in the snapshot API, but it needs to be clear that those
>>>>>>> snapshots may be taken sequentially and they are all independently
>>>>>>> crash consistent.  But, if you make that clear, then why even have
>> the
>>>> API.
>>>>>>> Essentially it is the same as doing multiple snapshot API commands.
>>>>>>> 
>>>>>>> So really I would lean towards having the multiple snapshotting
>>>>>>> supported in the driver or storage subsystem, but not exposed to the
>>>>>>> user.  You can easy accomplish it by having a timed window on
>>>>>>> snapshotting.  So every 10 seconds you do snapshots, if 5 requests
>>>>>>> have queued in the last 10 seconds, you do them all at once.  This
>>>> could be
>>>>>> implemented as a framework thing.
>>>>>>> If your provider implements "SnapshotBatching" interface and that has
>>>>>>> a getBatchWindowTime(), then the framework can detect that it should
>>>>>>> try to queue up some snapshot requests and send them to the driver in
>>>>>>> a batch.  Or that could be implemented in the driver itself.  I would
>>>>>>> lean toward doing it in the driver and if that goes well, we look at
>>>>>>> pulling the functionality into core ACS.
>>>>>>> 
>>>>>>> Darren
>>>>>>> 
>>>>>>> 
>>>>>>> On Wed, Sep 18, 2013 at 5:22 AM, SuichII, Christopher <
>>>>>>> Chris.Suich@netapp.com> wrote:
>>>>>>> 
>>>>>>>> I would like to raise for discussion the idea of adding a couple
>>>>>>>> methods to the Storage Subsystem API interface. Currently,
>>>>>>>> takeSnapshot() and
>>>>>>>> revertSnapshot() only support single VM volumes. We have a use case
>>>>>>>> for snapshotting multiple VM volumes at the same time. For us, it is
>>>>>>>> more efficient to snapshot them all at once rather than snapshot VM
>>>>>>>> Volumes individually and this seems like a more elegant solution
>> than
>>>>>>>> queueing the requests within our plugin.
>>>>>>>> 
>>>>>>>> Base on my investigation, this should require:
>>>>>>>> -Two additional API to be invoked from the UI -Two additional
>> methods
>>>>>>>> added to the Storage Subsystem API interface -Changes in between the
>>>>>>>> API level and invoking the Storage Subsystem API implementations (I
>>>>>>>> know this is broad and vague), mainly around the SnapshotManger/Impl
>>>>>>>> 
>>>>>>>> There are a couple topics we would like discussion on:
>>>>>>>> -Would this be beneficial/detrimental/neutral to other storage
>>>> providers?
>>>>>>>> -How should we handle the addition of new methods to the Storage
>>>>>>>> Subsystem API interface? Default them to throw an
>>>>>> UnsupportedOperationException?
>>>>>>>> Default to calling the single VM volume version multiple times?
>>>>>>>> -Does anyone see any issues with allowing multiple snapshots to be
>>>>>>>> taken at the same time or letting storage providers have a list of
>>>>>>>> all the requested volumes to backup?
>>>>>>>> 
>>>>>>>> Please let me know if I've missed any major topics for discussion or
>>>>>>>> if anything needs clarification.
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Chris
>>>>>>>> --
>>>>>>>> Chris Suich
>>>>>>>> chris.suich@netapp.com
>>>>>>>> NetApp Software Engineer
>>>>>>>> Data Center Platforms - Cloud Solutions Citrix, Cisco & Red Hat
>>>>>>>> 
>>>>>>>> 
>>>> 
>>>> 
>> 
>> 


Re: [PROPOSAL] Storage Subsystem API Interface Additions

Posted by Darren Shepherd <da...@gmail.com>.
Nah, I guess it's not so bad to have a volumeIds param.  Just as long as it's
really clear that it implies nothing about consistency.

I would be a little concerned about how this will be implemented by other
storage providers, though.  Currently, if you do 5 snapshot API calls, that
will launch 5 threads and they will happen in parallel.  If they get
batched and sent in one thread, how is the framework/driver going to handle
the snapshots for drivers that don't support batching?  Sequential would
be bad, as it sometimes takes a while to snapshot.
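For concreteness, the timed batching window suggested earlier in the thread could be sketched roughly like this. Every name here is invented, and a real implementation would flush on a timer rather than waiting for the next request to arrive:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of a batching window: snapshot requests arriving
// within the window are accumulated, then flushed to the driver together.
class SnapshotBatcher {
    private final long windowMillis;
    private final List<Long> pending = new ArrayList<>();
    private long windowStart = -1;

    SnapshotBatcher(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    // Returns the batch to hand to the driver once the window has elapsed,
    // or null if the request was only queued. The timestamp is passed in
    // explicitly so the logic is easy to test.
    synchronized List<Long> request(long volumeId, long nowMillis) {
        if (windowStart < 0) {
            windowStart = nowMillis;  // first request opens the window
        }
        pending.add(volumeId);
        if (nowMillis - windowStart >= windowMillis) {
            List<Long> batch = new ArrayList<>(pending);
            pending.clear();
            windowStart = -1;
            return batch;
        }
        return null;
    }
}
```

The open question above still applies: if the driver can't take the whole batch at once, the framework would need to fan the flushed batch back out into parallel single-volume calls.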

The APIs you mentioned that take lists really just manipulate data in the
DB, so they can easily batch and transactionally do a bunch at once.

Maybe someone who's more familiar with the storage implementation can
comment?

Darren


On Wed, Sep 18, 2013 at 12:46 PM, SuichII, Christopher <
Chris.Suich@netapp.com> wrote:

> That certainly would work, but I don't see how it is a better design. Can
> you elaborate on how sending multiple volumeIds is hackish? Look at the
> existing API framework. We have several APIs that accept lists as
> parameters. Normally, they're used for things like querying or deleting.
> Take a look at some of these commands:
> -ArchiveEventsCmd
> -DeleteEventsCmd
> -DeleteSnapshotPoliciesCmd
>
> This kind of API is simply a shorthand for invoking another API many times.
>
> I think it is only an NetApp optimization in the sense that we're the only
> ones who need it right now. What we're asking for has nothing specific to
> do to NetApp. We would just like the shorthand ability to do things all at
> once rather than one at a time. I think other vendors could utilize this
> just as easily.
>
> -Chris
> --
> Chris Suich
> chris.suich@netapp.com
> NetApp Software Engineer
> Data Center Platforms – Cloud Solutions
> Citrix, Cisco & Red Hat
>
> On Sep 18, 2013, at 2:32 PM, Darren Shepherd <da...@gmail.com>
> wrote:
>
> > Given this explanation.  Would the following not work?
> >
> > 1) Enhance UI to allow multi select.  There is no API change, UI will
> just
> > call snapshot a bunch of time
> > 2) Enhance storage framework or driver to detect that 20 requests just
> came
> > in within a window of X seconds and send them to the driver all at once.
> >
> > I know you said queuing on the backend is hackish, but having the user
> send
> > multiple volumeIds in the API is just as hackish to me.  We can only
> > guarentee to the user that the multiple snapshots taken are as consistent
> > as if they called snapshot API individually.  The user won't really know
> > exactly what NetApp volume they exist on and really neither will the
> > storage framework, as export != volume.  Only the driver knows if
> batching
> > is really possible.  So I'm not exactly saying queue, its short batches.
> >
> > In short, I'm seeing this as a bit more of a NetApp optimization than a
> > general thing.  I'm all for using storage device level snapshotting but
> it
> > seems like its going to be implementation specific.  Its interesting, if
> > you look at digital ocean they have snapshots and backups as two
> different
> > concepts.  You can see that they ran into this specific issue.  Full
> > storage volume snapshots are really difficult to expose to the user.  So
> > digital ocean does "backups" which are live but on a schedule and seem to
> > be a full volume backup.  And then there are snapshots which are on
> demand,
> > but require you to stop your VM (so they can essentially copy the qcow or
> > lv somewhere).
> >
> > Darren
> >
> >
> >
> >
> > On Wed, Sep 18, 2013 at 11:12 AM, SuichII, Christopher <
> > Chris.Suich@netapp.com> wrote:
> >
> >> First, let me apologize for the confusing terms, because some words here
> >> are overloaded:
> >> A volume…
> >> In CloudStack terms is a disk attached to a VM.
> >> In NetApp terms is an NFS volume, analogous to CloudStack primary
> storage,
> >> where all the CloudStack volumes are stored.
> >>
> >> A snapshot…
> >> In CloudStack terms is a backup of a VM.
> >> In NetApp terms is a copy of all the contents of a NetApp volume, taken
> at
> >> a point in time to create an analogous CloudStack snapshot for (up to)
> >> every CloudStack volume on that primary storage.
> >>
> >> There are several reasons that an API for snapshotting multiple volumes
> is
> >> more attractive to us than calling a single volume API over and over. A
> lot
> >> of it has to do with how we actually create the snapshots. Unlike a
> >> hypervisor snapshot, when we create a vm snapshot, the entire primary
> >> storage is backed up (but only the requested volume has an entry added
> to
> >> the db). To add on to this, our hardware has a hard limit of 255 storage
> >> volume level snapshots. So, if there were 255 vms on a single primary
> >> storage and each one of them performed a backup, no more backups could
> be
> >> taken before we start removing the oldest backup (without some trickery
> >> that we are currently working on). Some might say a solution to this
> would
> >> be queueing the requests and waiting till they're all finished, but that
> >> seems much more error prone and like hackish design compared to simply
> >> allowing multiple VM volumes to be specified.
> >>
> >> This is both a request for optimizing the backend and optimizing the
> >> experience for users. What happens when a user says they want to backup
> 30
> >> vm volumes at the same time? Is it not a cleaner experience to simply
> >> select all the volumes they want to back up, then click backup once?
> This
> >> way, the storage provider is given all the volumes at once and if they
> have
> >> some way of optimizing the request based on their hardware or software,
> >> they can take advantage of that. It can even be designed in such a way
> that
> >> if storage providers don't want to be given all the volumes at once,
> they
> >> can be called with each one individually, as to remain backwards
> compatible.
> >>
> >> Now, I'm also not saying that these two solutions can't co-exist. Even
> if
> >> we have the ability to backup multiple volumes at once, nothing is
> stopping
> >> users from backing them up one by one, so queueing is still something we
> >> may have to implement. However, I think extending the subsystem API to
> >> grant storage providers the ability to leverage any optimization they
> can
> >> without having to queue is a cleaner solution. If the concern is how
> users
> >> interpret what is going on in the backend, I think we can find some way
> to
> >> make that clear to them.
> >>
> >> -Chris
> >> --
> >> Chris Suich
> >> chris.suich@netapp.com
> >> NetApp Software Engineer
> >> Data Center Platforms – Cloud Solutions
> >> Citrix, Cisco & Red Hat
> >>
> >> On Sep 18, 2013, at 12:26 PM, Alex Huang <Al...@citrix.com> wrote:
> >>
> >>> That's my read on the proposal also but, Chris, please clarify.  I
> don't
> >> think the end user will see the change.  It's an optimization for
> >> interfacing with the storage backend.
> >>>
> >>> --Alex
> >>>
> >>>> -----Original Message-----
> >>>> From: Marcus Sorensen [mailto:shadowsor@gmail.com]
> >>>> Sent: Wednesday, September 18, 2013 9:22 AM
> >>>> To: dev@cloudstack.apache.org
> >>>> Subject: Re: [PROPOSAL] Storage Subsystem API Interface Additions
> >>>>
> >>>> Perhaps he needs to elaborate on the use case and what he means by
> more
> >>>> efficient.  He may be referring to multiple volumes in the sense of
> >>>> snapshotting the ROOT disks for 10 different VMs.
> >>>>
> >>>> On Wed, Sep 18, 2013 at 10:10 AM, Darren Shepherd
> >>>> <da...@gmail.com> wrote:
> >>>>> Here's my general concern about multiple volume snapshots at once.
> >>>>> Giving such a feature leads the user to believe that snapshotting
> >>>>> multiple volumes at once will give them consistency across the
> volumes
> >> in
> >>>> the snapshot.
> >>>>> This is not true, and difficult to do with many hypervisors, and
> >>>>> typically requires an agent in the VM.  A single snapshot, as exists
> >>>>> today, is really crash consistent, meaning that there is may exist
> >>>>> unsync'd data.  To do a true multi volume snapshot requires a
> "quiesce"
> >>>> functionality in the VM.
> >>>>> So you do pause I/O queues, fsync, fsync, snapshot, snapshot, unpause
> >> I/O.
> >>>>>
> >>>>> I'm might be fine with the option of allowing multiple volumeId's to
> >>>>> be specified in the snapshot API, but it needs to be clear that those
> >>>>> snapshots may be taken sequentially and they are all independently
> >>>>> crash consistent.  But, if you make that clear, then why even have
> the
> >> API.
> >>>>> Essentially it is the same as doing multiple snapshot API commands.
> >>>>>
> >>>>> So really I would lean towards having the multiple snapshotting
> >>>>> supported in the driver or storage subsystem, but not exposed to the
> >>>>> user.  You can easy accomplish it by having a timed window on
> >>>>> snapshotting.  So every 10 seconds you do snapshots, if 5 requests
> >>>>> have queued in the last 10 seconds, you do them all at once.  This
> >> could be
> >>>> implemented as a framework thing.
> >>>>> If your provider implements "SnapshotBatching" interface and that has
> >>>>> a getBatchWindowTime(), then the framework can detect that it should
> >>>>> try to queue up some snapshot requests and send them to the driver in
> >>>>> a batch.  Or that could be implemented in the driver itself.  I would
> >>>>> lean toward doing it in the driver and if that goes well, we look at
> >>>>> pulling the functionality into core ACS.
> >>>>>
> >>>>> Darren
> >>>>>
> >>>>>
> >>>>> On Wed, Sep 18, 2013 at 5:22 AM, SuichII, Christopher <
> >>>>> Chris.Suich@netapp.com> wrote:
> >>>>>
> >>>>>> I would like to raise for discussion the idea of adding a couple
> >>>>>> methods to the Storage Subsystem API interface. Currently,
> >>>>>> takeSnapshot() and
> >>>>>> revertSnapshot() only support single VM volumes. We have a use case
> >>>>>> for snapshotting multiple VM volumes at the same time. For us, it is
> >>>>>> more efficient to snapshot them all at once rather than snapshot VM
> >>>>>> Volumes individually and this seems like a more elegant solution
> than
> >>>>>> queueing the requests within our plugin.
> >>>>>>
> >>>>>> Base on my investigation, this should require:
> >>>>>> -Two additional API to be invoked from the UI -Two additional
> methods
> >>>>>> added to the Storage Subsystem API interface -Changes in between the
> >>>>>> API level and invoking the Storage Subsystem API implementations (I
> >>>>>> know this is broad and vague), mainly around the SnapshotManger/Impl
> >>>>>>
> >>>>>> There are a couple topics we would like discussion on:
> >>>>>> -Would this be beneficial/detrimental/neutral to other storage
> >> providers?
> >>>>>> -How should we handle the addition of new methods to the Storage
> >>>>>> Subsystem API interface? Default them to throw an
> >>>> UnsupportedOperationException?
> >>>>>> Default to calling the single VM volume version multiple times?
> >>>>>> -Does anyone see any issues with allowing multiple snapshots to be
> >>>>>> taken at the same time or letting storage providers have a list of
> >>>>>> all the requested volumes to backup?
> >>>>>>
> >>>>>> Please let me know if I've missed any major topics for discussion or
> >>>>>> if anything needs clarification.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Chris
> >>>>>> --
> >>>>>> Chris Suich
> >>>>>> chris.suich@netapp.com
> >>>>>> NetApp Software Engineer
> >>>>>> Data Center Platforms - Cloud Solutions Citrix, Cisco & Red Hat
> >>>>>>
> >>>>>>
> >>
> >>
>
>

Re: [PROPOSAL] Storage Subsystem API Interface Additions

Posted by "SuichII, Christopher" <Ch...@netapp.com>.
That certainly would work, but I don't see how it is a better design. Can you elaborate on how sending multiple volumeIds is hackish? Look at the existing API framework. We have several APIs that accept lists as parameters. Normally, they're used for things like querying or deleting. Take a look at some of these commands:
-ArchiveEventsCmd
-DeleteEventsCmd
-DeleteSnapshotPoliciesCmd

This kind of API is simply a shorthand for invoking another API many times.
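As a sketch of those shorthand semantics, a list-taking snapshot command might look roughly like the following. The class, field, and method names are hypothetical (a real command would go through CloudStack's API command framework); the point is only that the list form is equivalent to invoking the single-volume API once per id:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongFunction;

// Hypothetical sketch of a list-parameter snapshot command, modeled loosely
// on existing list-taking commands such as DeleteEventsCmd. Names are made up.
class CreateSnapshotsCmd {
    private final List<Long> volumeIds;  // e.g. parsed from "volumeids=1,2,3"

    CreateSnapshotsCmd(List<Long> volumeIds) {
        this.volumeIds = volumeIds;
    }

    // Shorthand semantics: equivalent to invoking the single-volume snapshot
    // API once per id; no cross-volume consistency is implied.
    List<String> execute(LongFunction<String> takeSnapshot) {
        List<String> results = new ArrayList<>();
        for (long id : volumeIds) {
            results.add(takeSnapshot.apply(id));
        }
        return results;
    }
}
```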

I think it is only a NetApp optimization in the sense that we're the only ones who need it right now. What we're asking for has nothing specific to do with NetApp. We would just like the shorthand ability to do things all at once rather than one at a time. I think other vendors could utilize this just as easily.

-Chris
-- 
Chris Suich
chris.suich@netapp.com
NetApp Software Engineer
Data Center Platforms – Cloud Solutions
Citrix, Cisco & Red Hat

On Sep 18, 2013, at 2:32 PM, Darren Shepherd <da...@gmail.com> wrote:

> Given this explanation.  Would the following not work?
> 
> 1) Enhance UI to allow multi select.  There is no API change, UI will just
> call snapshot a bunch of time
> 2) Enhance storage framework or driver to detect that 20 requests just came
> in within a window of X seconds and send them to the driver all at once.
> 
> I know you said queuing on the backend is hackish, but having the user send
> multiple volumeIds in the API is just as hackish to me.  We can only
> guarentee to the user that the multiple snapshots taken are as consistent
> as if they called snapshot API individually.  The user won't really know
> exactly what NetApp volume they exist on and really neither will the
> storage framework, as export != volume.  Only the driver knows if batching
> is really possible.  So I'm not exactly saying queue, its short batches.
> 
> In short, I'm seeing this as a bit more of a NetApp optimization than a
> general thing.  I'm all for using storage device level snapshotting but it
> seems like its going to be implementation specific.  Its interesting, if
> you look at digital ocean they have snapshots and backups as two different
> concepts.  You can see that they ran into this specific issue.  Full
> storage volume snapshots are really difficult to expose to the user.  So
> digital ocean does "backups" which are live but on a schedule and seem to
> be a full volume backup.  And then there are snapshots which are on demand,
> but require you to stop your VM (so they can essentially copy the qcow or
> lv somewhere).
> 
> Darren
> 
> 
> 
> 
> On Wed, Sep 18, 2013 at 11:12 AM, SuichII, Christopher <
> Chris.Suich@netapp.com> wrote:
> 
>> First, let me apologize for the confusing terms, because some words here
>> are overloaded:
>> A volume…
>> In CloudStack terms is a disk attached to a VM.
>> In NetApp terms is an NFS volume, analogous to CloudStack primary storage,
>> where all the CloudStack volumes are stored.
>> 
>> A snapshot…
>> In CloudStack terms is a backup of a VM.
>> In NetApp terms is a copy of all the contents of a NetApp volume, taken at
>> a point in time to create an analogous CloudStack snapshot for (up to)
>> every CloudStack volume on that primary storage.
>> 
>> There are several reasons that an API for snapshotting multiple volumes is
>> more attractive to us than calling a single volume API over and over. A lot
>> of it has to do with how we actually create the snapshots. Unlike a
>> hypervisor snapshot, when we create a vm snapshot, the entire primary
>> storage is backed up (but only the requested volume has an entry added to
>> the db). To add on to this, our hardware has a hard limit of 255 storage
>> volume level snapshots. So, if there were 255 vms on a single primary
>> storage and each one of them performed a backup, no more backups could be
>> taken before we start removing the oldest backup (without some trickery
>> that we are currently working on). Some might say a solution to this would
>> be queueing the requests and waiting till they're all finished, but that
>> seems much more error prone and like hackish design compared to simply
>> allowing multiple VM volumes to be specified.
>> 
>> This is both a request for optimizing the backend and optimizing the
>> experience for users. What happens when a user says they want to backup 30
>> vm volumes at the same time? Is it not a cleaner experience to simply
>> select all the volumes they want to back up, then click backup once? This
>> way, the storage provider is given all the volumes at once and if they have
>> some way of optimizing the request based on their hardware or software,
>> they can take advantage of that. It can even be designed in such a way that
>> if storage providers don't want to be given all the volumes at once, they
>> can be called with each one individually, as to remain backwards compatible.
>> 
>> Now, I'm also not saying that these two solutions can't co-exist. Even if
>> we have the ability to backup multiple volumes at once, nothing is stopping
>> users from backing them up one by one, so queueing is still something we
>> may have to implement. However, I think extending the subsystem API to
>> grant storage providers the ability to leverage any optimization they can
>> without having to queue is a cleaner solution. If the concern is how users
>> interpret what is going on in the backend, I think we can find some way to
>> make that clear to them.
>> 
>> -Chris
>> --
>> Chris Suich
>> chris.suich@netapp.com
>> NetApp Software Engineer
>> Data Center Platforms – Cloud Solutions
>> Citrix, Cisco & Red Hat
>> 
>> On Sep 18, 2013, at 12:26 PM, Alex Huang <Al...@citrix.com> wrote:
>> 
>>> That's my read on the proposal also but, Chris, please clarify.  I don't
>> think the end user will see the change.  It's an optimization for
>> interfacing with the storage backend.
>>> 
>>> --Alex
>>> 
>>>> -----Original Message-----
>>>> From: Marcus Sorensen [mailto:shadowsor@gmail.com]
>>>> Sent: Wednesday, September 18, 2013 9:22 AM
>>>> To: dev@cloudstack.apache.org
>>>> Subject: Re: [PROPOSAL] Storage Subsystem API Interface Additions
>>>> 
>>>> Perhaps he needs to elaborate on the use case and what he means by more
>>>> efficient.  He may be referring to multiple volumes in the sense of
>>>> snapshotting the ROOT disks for 10 different VMs.
>>>> 
>>>> On Wed, Sep 18, 2013 at 10:10 AM, Darren Shepherd
>>>> <da...@gmail.com> wrote:
>>>>> Here's my general concern about multiple volume snapshots at once.
>>>>> Giving such a feature leads the user to believe that snapshotting
>>>>> multiple volumes at once will give them consistency across the volumes
>> in
>>>> the snapshot.
>>>>> This is not true, and difficult to do with many hypervisors, and
>>>>> typically requires an agent in the VM.  A single snapshot, as exists
>>>>> today, is really crash consistent, meaning that there is may exist
>>>>> unsync'd data.  To do a true multi volume snapshot requires a "quiesce"
>>>> functionality in the VM.
>>>>> So you do pause I/O queues, fsync, fsync, snapshot, snapshot, unpause
>> I/O.
>>>>> 
>>>>> I'm might be fine with the option of allowing multiple volumeId's to
>>>>> be specified in the snapshot API, but it needs to be clear that those
>>>>> snapshots may be taken sequentially and they are all independently
>>>>> crash consistent.  But, if you make that clear, then why even have the
>> API.
>>>>> Essentially it is the same as doing multiple snapshot API commands.
>>>>> 
>>>>> So really I would lean towards having the multiple snapshotting
>>>>> supported in the driver or storage subsystem, but not exposed to the
>>>>> user.  You can easy accomplish it by having a timed window on
>>>>> snapshotting.  So every 10 seconds you do snapshots, if 5 requests
>>>>> have queued in the last 10 seconds, you do them all at once.  This
>> could be
>>>> implemented as a framework thing.
>>>>> If your provider implements "SnapshotBatching" interface and that has
>>>>> a getBatchWindowTime(), then the framework can detect that it should
>>>>> try to queue up some snapshot requests and send them to the driver in
>>>>> a batch.  Or that could be implemented in the driver itself.  I would
>>>>> lean toward doing it in the driver and if that goes well, we look at
>>>>> pulling the functionality into core ACS.
>>>>> 
>>>>> Darren
>>>>> 
>>>>> 
>>>>> On Wed, Sep 18, 2013 at 5:22 AM, SuichII, Christopher <
>>>>> Chris.Suich@netapp.com> wrote:
>>>>> 
>>>>>> I would like to raise for discussion the idea of adding a couple
>>>>>> methods to the Storage Subsystem API interface. Currently,
>>>>>> takeSnapshot() and
>>>>>> revertSnapshot() only support single VM volumes. We have a use case
>>>>>> for snapshotting multiple VM volumes at the same time. For us, it is
>>>>>> more efficient to snapshot them all at once rather than snapshot VM
>>>>>> Volumes individually and this seems like a more elegant solution than
>>>>>> queueing the requests within our plugin.
>>>>>> 
>>>>>> Base on my investigation, this should require:
>>>>>> -Two additional API to be invoked from the UI -Two additional methods
>>>>>> added to the Storage Subsystem API interface -Changes in between the
>>>>>> API level and invoking the Storage Subsystem API implementations (I
>>>>>> know this is broad and vague), mainly around the SnapshotManger/Impl
>>>>>> 
>>>>>> There are a couple topics we would like discussion on:
>>>>>> -Would this be beneficial/detrimental/neutral to other storage
>> providers?
>>>>>> -How should we handle the addition of new methods to the Storage
>>>>>> Subsystem API interface? Default them to throw an
>>>> UnsupportedOperationException?
>>>>>> Default to calling the single VM volume version multiple times?
>>>>>> -Does anyone see any issues with allowing multiple snapshots to be
>>>>>> taken at the same time or letting storage providers have a list of
>>>>>> all the requested volumes to backup?
>>>>>> 
>>>>>> Please let me know if I've missed any major topics for discussion or
>>>>>> if anything needs clarification.
>>>>>> 
>>>>>> Thanks,
>>>>>> Chris
>>>>>> --
>>>>>> Chris Suich
>>>>>> chris.suich@netapp.com
>>>>>> NetApp Software Engineer
>>>>>> Data Center Platforms - Cloud Solutions Citrix, Cisco & Red Hat
>>>>>> 
>>>>>> 
>> 
>> 


Re: [PROPOSAL] Storage Subsystem API Interface Additions

Posted by Darren Shepherd <da...@gmail.com>.
Given this explanation, would the following not work?

1) Enhance UI to allow multi-select.  There is no API change, the UI will just
call snapshot a bunch of times
2) Enhance storage framework or driver to detect that 20 requests just came
in within a window of X seconds and send them to the driver all at once.

I know you said queuing on the backend is hackish, but having the user send
multiple volumeIds in the API is just as hackish to me.  We can only
guarantee to the user that the multiple snapshots taken are as consistent
as if they called the snapshot API individually.  The user won't really know
exactly what NetApp volume they exist on, and really neither will the
storage framework, as export != volume.  Only the driver knows if batching
is really possible.  So I'm not exactly saying queue, it's short batches.
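
A rough sketch of the batch-window idea: the framework (or driver) queues incoming snapshot requests and flushes them to the backend as one batch once the window elapses. All names here (SnapshotBatching, getBatchWindowTime, takeSnapshots) are illustrative, not actual CloudStack interfaces, and a real implementation would use a timer and async callbacks rather than caller-supplied timestamps:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical batching contract a driver could declare.
interface SnapshotBatching {
    long getBatchWindowTime();                   // window length in milliseconds
    void takeSnapshots(List<String> volumeIds);  // one backend call per batch
}

// Framework-side collector: queues requests, flushes when the window elapses.
class BatchingCollector {
    private final SnapshotBatching driver;
    private final List<String> pending = new ArrayList<>();
    private long windowStart = -1;

    BatchingCollector(SnapshotBatching driver) {
        this.driver = driver;
    }

    // Called for each incoming snapshot request; 'now' is the request time in ms.
    synchronized void request(String volumeId, long now) {
        if (pending.isEmpty()) {
            windowStart = now;  // window opens with the first queued request
        }
        pending.add(volumeId);
        // Flush once the window has elapsed since the first queued request.
        if (now - windowStart >= driver.getBatchWindowTime()) {
            flush();
        }
    }

    // Sends everything queued so far to the driver in one batch.
    synchronized void flush() {
        if (!pending.isEmpty()) {
            driver.takeSnapshots(new ArrayList<>(pending));
            pending.clear();
        }
    }
}
```

With a 10-second window, five requests arriving within that window would reach the driver as a single takeSnapshots() call, which is the "short batches" behavior described above.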

In short, I'm seeing this as a bit more of a NetApp optimization than a
general thing.  I'm all for using storage device level snapshotting, but it
seems like it's going to be implementation specific.  It's interesting: if
you look at DigitalOcean, they have snapshots and backups as two different
concepts.  You can see that they ran into this specific issue.  Full
storage volume snapshots are really difficult to expose to the user.  So
DigitalOcean does "backups", which are live but on a schedule and seem to
be a full volume backup.  And then there are snapshots, which are on demand
but require you to stop your VM (so they can essentially copy the qcow or
lv somewhere).

Darren

On Wed, Sep 18, 2013 at 11:12 AM, SuichII, Christopher <
Chris.Suich@netapp.com> wrote:

> First, let me apologize for the confusing terms, because some words here
> are overloaded:
> A volume…
> In CloudStack terms is a disk attached to a VM.
> In NetApp terms is an NFS volume, analogous to CloudStack primary storage,
> where all the CloudStack volumes are stored.
>
> A snapshot…
> In CloudStack terms is a backup of a VM.
> In NetApp terms is a copy of all the contents of a NetApp volume, taken at
> a point in time to create an analogous CloudStack snapshot for (up to)
> every CloudStack volume on that primary storage.
>
> There are several reasons that an API for snapshotting multiple volumes is
> more attractive to us than calling a single volume API over and over. A lot
> of it has to do with how we actually create the snapshots. Unlike a
> hypervisor snapshot, when we create a vm snapshot, the entire primary
> storage is backed up (but only the requested volume has an entry added to
> the db). To add on to this, our hardware has a hard limit of 255 storage
> volume level snapshots. So, if there were 255 vms on a single primary
> storage and each one of them performed a backup, no more backups could be
> taken before we start removing the oldest backup (without some trickery
> that we are currently working on). Some might say a solution to this would
> be queueing the requests and waiting till they're all finished, but that
> seems much more error prone and like hackish design compared to simply
> allowing multiple VM volumes to be specified.
>
> This is both a request for optimizing the backend and optimizing the
> experience for users. What happens when a user says they want to backup 30
> vm volumes at the same time? Is it not a cleaner experience to simply
> select all the volumes they want to back up, then click backup once? This
> way, the storage provider is given all the volumes at once and if they have
> some way of optimizing the request based on their hardware or software,
> they can take advantage of that. It can even be designed in such a way that
> if storage providers don't want to be given all the volumes at once, they
> can be called with each one individually, as to remain backwards compatible.
>
> Now, I'm also not saying that these two solutions can't co-exist. Even if
> we have the ability to backup multiple volumes at once, nothing is stopping
> users from backing them up one by one, so queueing is still something we
> may have to implement. However, I think extending the subsystem API to
> grant storage providers the ability to leverage any optimization they can
> without having to queue is a cleaner solution. If the concern is how users
> interpret what is going on in the backend, I think we can find some way to
> make that clear to them.
>
> -Chris
> --
> Chris Suich
> chris.suich@netapp.com
> NetApp Software Engineer
> Data Center Platforms – Cloud Solutions
> Citrix, Cisco & Red Hat
>
> On Sep 18, 2013, at 12:26 PM, Alex Huang <Al...@citrix.com> wrote:
>
> > That's my read on the proposal also but, Chris, please clarify.  I don't
> think the end user will see the change.  It's an optimization for
> interfacing with the storage backend.
> >
> > --Alex
> >
> >> -----Original Message-----
> >> From: Marcus Sorensen [mailto:shadowsor@gmail.com]
> >> Sent: Wednesday, September 18, 2013 9:22 AM
> >> To: dev@cloudstack.apache.org
> >> Subject: Re: [PROPOSAL] Storage Subsystem API Interface Additions
> >>
> >> Perhaps he needs to elaborate on the use case and what he means by more
> >> efficient.  He may be referring to multiple volumes in the sense of
> >> snapshotting the ROOT disks for 10 different VMs.
> >>
> >> On Wed, Sep 18, 2013 at 10:10 AM, Darren Shepherd
> >> <da...@gmail.com> wrote:
> >>> Here's my general concern about multiple volume snapshots at once.
> >>> Giving such a feature leads the user to believe that snapshotting
> >>> multiple volumes at once will give them consistency across the volumes
> in
> >> the snapshot.
> >>> This is not true, and difficult to do with many hypervisors, and
> >>> typically requires an agent in the VM.  A single snapshot, as exists
> >>> today, is really crash consistent, meaning that there is may exist
> >>> unsync'd data.  To do a true multi volume snapshot requires a "quiesce"
> >> functionality in the VM.
> >>> So you do pause I/O queues, fsync, fsync, snapshot, snapshot, unpause
> I/O.
> >>>
> >>> I'm might be fine with the option of allowing multiple volumeId's to
> >>> be specified in the snapshot API, but it needs to be clear that those
> >>> snapshots may be taken sequentially and they are all independently
> >>> crash consistent.  But, if you make that clear, then why even have the
> API.
> >>> Essentially it is the same as doing multiple snapshot API commands.
> >>>
> >>> So really I would lean towards having the multiple snapshotting
> >>> supported in the driver or storage subsystem, but not exposed to the
> >>> user.  You can easy accomplish it by having a timed window on
> >>> snapshotting.  So every 10 seconds you do snapshots, if 5 requests
> >>> have queued in the last 10 seconds, you do them all at once.  This
> could be
> >> implemented as a framework thing.
> >>> If your provider implements "SnapshotBatching" interface and that has
> >>> a getBatchWindowTime(), then the framework can detect that it should
> >>> try to queue up some snapshot requests and send them to the driver in
> >>> a batch.  Or that could be implemented in the driver itself.  I would
> >>> lean toward doing it in the driver and if that goes well, we look at
> >>> pulling the functionality into core ACS.
> >>>
> >>> Darren
> >>>
> >>>
> >>> On Wed, Sep 18, 2013 at 5:22 AM, SuichII, Christopher <
> >>> Chris.Suich@netapp.com> wrote:
> >>>
> >>>> I would like to raise for discussion the idea of adding a couple
> >>>> methods to the Storage Subsystem API interface. Currently,
> >>>> takeSnapshot() and
> >>>> revertSnapshot() only support single VM volumes. We have a use case
> >>>> for snapshotting multiple VM volumes at the same time. For us, it is
> >>>> more efficient to snapshot them all at once rather than snapshot VM
> >>>> Volumes individually and this seems like a more elegant solution than
> >>>> queueing the requests within our plugin.
> >>>>
> >>>> Base on my investigation, this should require:
> >>>> -Two additional API to be invoked from the UI -Two additional methods
> >>>> added to the Storage Subsystem API interface -Changes in between the
> >>>> API level and invoking the Storage Subsystem API implementations (I
> >>>> know this is broad and vague), mainly around the SnapshotManger/Impl
> >>>>
> >>>> There are a couple topics we would like discussion on:
> >>>> -Would this be beneficial/detrimental/neutral to other storage
> providers?
> >>>> -How should we handle the addition of new methods to the Storage
> >>>> Subsystem API interface? Default them to throw an
> >> UnsupportedOperationException?
> >>>> Default to calling the single VM volume version multiple times?
> >>>> -Does anyone see any issues with allowing multiple snapshots to be
> >>>> taken at the same time or letting storage providers have a list of
> >>>> all the requested volumes to backup?
> >>>>
> >>>> Please let me know if I've missed any major topics for discussion or
> >>>> if anything needs clarification.
> >>>>
> >>>> Thanks,
> >>>> Chris
> >>>> --
> >>>> Chris Suich
> >>>> chris.suich@netapp.com
> >>>> NetApp Software Engineer
> >>>> Data Center Platforms - Cloud Solutions Citrix, Cisco & Red Hat
> >>>>
> >>>>
>
>

Re: [PROPOSAL] Storage Subsystem API Interface Additions

Posted by "SuichII, Christopher" <Ch...@netapp.com>.
First, let me apologize for the confusing terms, because some words here are overloaded:
A volume…
In CloudStack terms is a disk attached to a VM.
In NetApp terms is an NFS volume, analogous to CloudStack primary storage, where all the CloudStack volumes are stored.

A snapshot…
In CloudStack terms is a backup of a VM.
In NetApp terms is a copy of all the contents of a NetApp volume, taken at a point in time to create an analogous CloudStack snapshot for (up to) every CloudStack volume on that primary storage.

There are several reasons that an API for snapshotting multiple volumes is more attractive to us than calling a single volume API over and over. A lot of it has to do with how we actually create the snapshots. Unlike a hypervisor snapshot, when we create a vm snapshot, the entire primary storage is backed up (but only the requested volume has an entry added to the db). To add on to this, our hardware has a hard limit of 255 storage volume level snapshots. So, if there were 255 vms on a single primary storage and each one of them performed a backup, no more backups could be taken before we start removing the oldest backup (without some trickery that we are currently working on). Some might say a solution to this would be queueing the requests and waiting till they're all finished, but that seems much more error-prone and hackish compared to simply allowing multiple VM volumes to be specified.

This is both a request for optimizing the backend and optimizing the experience for users. What happens when a user says they want to back up 30 vm volumes at the same time? Is it not a cleaner experience to simply select all the volumes they want to back up, then click backup once? This way, the storage provider is given all the volumes at once, and if they have some way of optimizing the request based on their hardware or software, they can take advantage of that. It can even be designed in such a way that if storage providers don't want to be given all the volumes at once, they can be called with each one individually, so as to remain backwards compatible.

Now, I'm also not saying that these two solutions can't co-exist. Even if we have the ability to backup multiple volumes at once, nothing is stopping users from backing them up one by one, so queueing is still something we may have to implement. However, I think extending the subsystem API to grant storage providers the ability to leverage any optimization they can without having to queue is a cleaner solution. If the concern is how users interpret what is going on in the backend, I think we can find some way to make that clear to them.
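
The backwards-compatible shape described above might look something like this sketch. The names are hypothetical, not the actual Storage Subsystem API (whose methods take command/info objects, not raw IDs), and it uses Java 8 default methods for brevity; 2013-era CloudStack would need an abstract adapter class instead:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a multi-volume snapshot method whose default falls back
// to the existing single-volume call, so drivers that never override it
// behave exactly as before, while drivers that can batch (e.g. one hardware
// snapshot covering many volumes) override takeSnapshots().
interface SnapshotDriver {
    // Existing single-volume call; returns the created snapshot's id.
    String takeSnapshot(String volumeId);

    // New multi-volume call with a backward-compatible default:
    // simply invoke the single-volume version once per volume.
    default List<String> takeSnapshots(List<String> volumeIds) {
        List<String> snapshotIds = new ArrayList<>();
        for (String volumeId : volumeIds) {
            snapshotIds.add(takeSnapshot(volumeId));
        }
        return snapshotIds;
    }
}
```

A batching-capable driver would override takeSnapshots() to issue one backend operation for the whole list; everyone else inherits the loop and sees no behavior change.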

-Chris
-- 
Chris Suich
chris.suich@netapp.com
NetApp Software Engineer
Data Center Platforms – Cloud Solutions
Citrix, Cisco & Red Hat

On Sep 18, 2013, at 12:26 PM, Alex Huang <Al...@citrix.com> wrote:

> That's my read on the proposal also but, Chris, please clarify.  I don't think the end user will see the change.  It's an optimization for interfacing with the storage backend.
> 
> --Alex
> 
>> -----Original Message-----
>> From: Marcus Sorensen [mailto:shadowsor@gmail.com]
>> Sent: Wednesday, September 18, 2013 9:22 AM
>> To: dev@cloudstack.apache.org
>> Subject: Re: [PROPOSAL] Storage Subsystem API Interface Additions
>> 
>> Perhaps he needs to elaborate on the use case and what he means by more
>> efficient.  He may be referring to multiple volumes in the sense of
>> snapshotting the ROOT disks for 10 different VMs.
>> 
>> On Wed, Sep 18, 2013 at 10:10 AM, Darren Shepherd
>> <da...@gmail.com> wrote:
>>> Here's my general concern about multiple volume snapshots at once.
>>> Giving such a feature leads the user to believe that snapshotting
>>> multiple volumes at once will give them consistency across the volumes in
>> the snapshot.
>>> This is not true, and difficult to do with many hypervisors, and
>>> typically requires an agent in the VM.  A single snapshot, as exists
>>> today, is really crash consistent, meaning that there is may exist
>>> unsync'd data.  To do a true multi volume snapshot requires a "quiesce"
>> functionality in the VM.
>>> So you do pause I/O queues, fsync, fsync, snapshot, snapshot, unpause I/O.
>>> 
>>> I'm might be fine with the option of allowing multiple volumeId's to
>>> be specified in the snapshot API, but it needs to be clear that those
>>> snapshots may be taken sequentially and they are all independently
>>> crash consistent.  But, if you make that clear, then why even have the API.
>>> Essentially it is the same as doing multiple snapshot API commands.
>>> 
>>> So really I would lean towards having the multiple snapshotting
>>> supported in the driver or storage subsystem, but not exposed to the
>>> user.  You can easy accomplish it by having a timed window on
>>> snapshotting.  So every 10 seconds you do snapshots, if 5 requests
>>> have queued in the last 10 seconds, you do them all at once.  This could be
>> implemented as a framework thing.
>>> If your provider implements "SnapshotBatching" interface and that has
>>> a getBatchWindowTime(), then the framework can detect that it should
>>> try to queue up some snapshot requests and send them to the driver in
>>> a batch.  Or that could be implemented in the driver itself.  I would
>>> lean toward doing it in the driver and if that goes well, we look at
>>> pulling the functionality into core ACS.
>>> 
>>> Darren
>>> 
>>> 
>>> On Wed, Sep 18, 2013 at 5:22 AM, SuichII, Christopher <
>>> Chris.Suich@netapp.com> wrote:
>>> 
>>>> I would like to raise for discussion the idea of adding a couple
>>>> methods to the Storage Subsystem API interface. Currently,
>>>> takeSnapshot() and
>>>> revertSnapshot() only support single VM volumes. We have a use case
>>>> for snapshotting multiple VM volumes at the same time. For us, it is
>>>> more efficient to snapshot them all at once rather than snapshot VM
>>>> Volumes individually and this seems like a more elegant solution than
>>>> queueing the requests within our plugin.
>>>> 
>>>> Base on my investigation, this should require:
>>>> -Two additional API to be invoked from the UI -Two additional methods
>>>> added to the Storage Subsystem API interface -Changes in between the
>>>> API level and invoking the Storage Subsystem API implementations (I
>>>> know this is broad and vague), mainly around the SnapshotManger/Impl
>>>> 
>>>> There are a couple topics we would like discussion on:
>>>> -Would this be beneficial/detrimental/neutral to other storage providers?
>>>> -How should we handle the addition of new methods to the Storage
>>>> Subsystem API interface? Default them to throw an
>> UnsupportedOperationException?
>>>> Default to calling the single VM volume version multiple times?
>>>> -Does anyone see any issues with allowing multiple snapshots to be
>>>> taken at the same time or letting storage providers have a list of
>>>> all the requested volumes to backup?
>>>> 
>>>> Please let me know if I've missed any major topics for discussion or
>>>> if anything needs clarification.
>>>> 
>>>> Thanks,
>>>> Chris
>>>> --
>>>> Chris Suich
>>>> chris.suich@netapp.com
>>>> NetApp Software Engineer
>>>> Data Center Platforms - Cloud Solutions Citrix, Cisco & Red Hat
>>>> 
>>>> 


RE: [PROPOSAL] Storage Subsystem API Interface Additions

Posted by Alex Huang <Al...@citrix.com>.
That's my read on the proposal also but, Chris, please clarify.  I don't think the end user will see the change.  It's an optimization for interfacing with the storage backend.

--Alex

> -----Original Message-----
> From: Marcus Sorensen [mailto:shadowsor@gmail.com]
> Sent: Wednesday, September 18, 2013 9:22 AM
> To: dev@cloudstack.apache.org
> Subject: Re: [PROPOSAL] Storage Subsystem API Interface Additions
> 
> Perhaps he needs to elaborate on the use case and what he means by more
> efficient.  He may be referring to multiple volumes in the sense of
> snapshotting the ROOT disks for 10 different VMs.
> 
> On Wed, Sep 18, 2013 at 10:10 AM, Darren Shepherd
> <da...@gmail.com> wrote:
> > Here's my general concern about multiple volume snapshots at once.
> > Giving such a feature leads the user to believe that snapshotting
> > multiple volumes at once will give them consistency across the volumes in
> the snapshot.
> > This is not true, and difficult to do with many hypervisors, and
> > typically requires an agent in the VM.  A single snapshot, as exists
> > today, is really crash consistent, meaning that there is may exist
> > unsync'd data.  To do a true multi volume snapshot requires a "quiesce"
> functionality in the VM.
> > So you do pause I/O queues, fsync, fsync, snapshot, snapshot, unpause I/O.
> >
> > I'm might be fine with the option of allowing multiple volumeId's to
> > be specified in the snapshot API, but it needs to be clear that those
> > snapshots may be taken sequentially and they are all independently
> > crash consistent.  But, if you make that clear, then why even have the API.
> > Essentially it is the same as doing multiple snapshot API commands.
> >
> > So really I would lean towards having the multiple snapshotting
> > supported in the driver or storage subsystem, but not exposed to the
> > user.  You can easy accomplish it by having a timed window on
> > snapshotting.  So every 10 seconds you do snapshots, if 5 requests
> > have queued in the last 10 seconds, you do them all at once.  This could be
> implemented as a framework thing.
> > If your provider implements "SnapshotBatching" interface and that has
> > a getBatchWindowTime(), then the framework can detect that it should
> > try to queue up some snapshot requests and send them to the driver in
> > a batch.  Or that could be implemented in the driver itself.  I would
> > lean toward doing it in the driver and if that goes well, we look at
> > pulling the functionality into core ACS.
> >
> > Darren
> >
> >
> > On Wed, Sep 18, 2013 at 5:22 AM, SuichII, Christopher <
> > Chris.Suich@netapp.com> wrote:
> >
> >> I would like to raise for discussion the idea of adding a couple
> >> methods to the Storage Subsystem API interface. Currently,
> >> takeSnapshot() and
> >> revertSnapshot() only support single VM volumes. We have a use case
> >> for snapshotting multiple VM volumes at the same time. For us, it is
> >> more efficient to snapshot them all at once rather than snapshot VM
> >> Volumes individually and this seems like a more elegant solution than
> >> queueing the requests within our plugin.
> >>
> >> Base on my investigation, this should require:
> >> -Two additional API to be invoked from the UI -Two additional methods
> >> added to the Storage Subsystem API interface -Changes in between the
> >> API level and invoking the Storage Subsystem API implementations (I
> >> know this is broad and vague), mainly around the SnapshotManger/Impl
> >>
> >> There are a couple topics we would like discussion on:
> >> -Would this be beneficial/detrimental/neutral to other storage providers?
> >> -How should we handle the addition of new methods to the Storage
> >> Subsystem API interface? Default them to throw an
> UnsupportedOperationException?
> >> Default to calling the single VM volume version multiple times?
> >> -Does anyone see any issues with allowing multiple snapshots to be
> >> taken at the same time or letting storage providers have a list of
> >> all the requested volumes to backup?
> >>
> >> Please let me know if I've missed any major topics for discussion or
> >> if anything needs clarification.
> >>
> >> Thanks,
> >> Chris
> >> --
> >> Chris Suich
> >> chris.suich@netapp.com
> >> NetApp Software Engineer
> >> Data Center Platforms - Cloud Solutions Citrix, Cisco & Red Hat
> >>
> >>

Re: [PROPOSAL] Storage Subsystem API Interface Additions

Posted by Marcus Sorensen <sh...@gmail.com>.
Perhaps he needs to elaborate on the use case and what he means by
more efficient.  He may be referring to multiple volumes in the sense
of snapshotting the ROOT disks for 10 different VMs.

On Wed, Sep 18, 2013 at 10:10 AM, Darren Shepherd
<da...@gmail.com> wrote:
> Here's my general concern about multiple volume snapshots at once.  Giving
> such a feature leads the user to believe that snapshotting multiple volumes
> at once will give them consistency across the volumes in the snapshot.
> This is not true, and difficult to do with many hypervisors, and typically
> requires an agent in the VM.  A single snapshot, as exists today, is really
> crash consistent, meaning that there is may exist unsync'd data.  To do a
> true multi volume snapshot requires a "quiesce" functionality in the VM.
> So you do pause I/O queues, fsync, fsync, snapshot, snapshot, unpause I/O.
>
> I'm might be fine with the option of allowing multiple volumeId's to be
> specified in the snapshot API, but it needs to be clear that those
> snapshots may be taken sequentially and they are all independently crash
> consistent.  But, if you make that clear, then why even have the API.
> Essentially it is the same as doing multiple snapshot API commands.
>
> So really I would lean towards having the multiple snapshotting supported
> in the driver or storage subsystem, but not exposed to the user.  You can
> easy accomplish it by having a timed window on snapshotting.  So every 10
> seconds you do snapshots, if 5 requests have queued in the last 10 seconds,
> you do them all at once.  This could be implemented as a framework thing.
> If your provider implements "SnapshotBatching" interface and that has a
> getBatchWindowTime(), then the framework can detect that it should try to
> queue up some snapshot requests and send them to the driver in a batch.  Or
> that could be implemented in the driver itself.  I would lean toward doing
> it in the driver and if that goes well, we look at pulling the
> functionality into core ACS.
>
> Darren
>
>
> On Wed, Sep 18, 2013 at 5:22 AM, SuichII, Christopher <
> Chris.Suich@netapp.com> wrote:
>
>> I would like to raise for discussion the idea of adding a couple methods
>> to the Storage Subsystem API interface. Currently, takeSnapshot() and
>> revertSnapshot() only support single VM volumes. We have a use case for
>> snapshotting multiple VM volumes at the same time. For us, it is more
>> efficient to snapshot them all at once rather than snapshot VM Volumes
>> individually and this seems like a more elegant solution than queueing the
>> requests within our plugin.
>>
>> Base on my investigation, this should require:
>> -Two additional API to be invoked from the UI
>> -Two additional methods added to the Storage Subsystem API interface
>> -Changes in between the API level and invoking the Storage Subsystem API
>> implementations (I know this is broad and vague), mainly around the
>> SnapshotManger/Impl
>>
>> There are a couple topics we would like discussion on:
>> -Would this be beneficial/detrimental/neutral to other storage providers?
>> -How should we handle the addition of new methods to the Storage Subsystem
>> API interface? Default them to throw an UnsupportedOperationException?
>> Default to calling the single VM volume version multiple times?
>> -Does anyone see any issues with allowing multiple snapshots to be taken
>> at the same time or letting storage providers have a list of all the
>> requested volumes to backup?
>>
>> Please let me know if I've missed any major topics for discussion or if
>> anything needs clarification.
>>
>> Thanks,
>> Chris
>> --
>> Chris Suich
>> chris.suich@netapp.com
>> NetApp Software Engineer
>> Data Center Platforms – Cloud Solutions
>> Citrix, Cisco & Red Hat
>>
>>

Re: [PROPOSAL] Storage Subsystem API Interface Additions

Posted by Mike Tutkowski <mi...@solidfire.com>.
I'm not sure on a release for when I'll implement snapshot functionality.
Maybe 4.4.


On Thu, Sep 19, 2013 at 6:16 AM, SuichII, Christopher <
Chris.Suich@netapp.com> wrote:

> John - any chance we can get your input on the original topic. Mikes
> comment was a kind of unrelated (but a completely valid topic that I'd like
> to be a part of discussing anyway!).
>
> Mike - are you planning on implementing the snapshotting methods of the
> storage subsystem API anytime in the near future?
>
> Kelvin or Darren, do you have thoughts on how thoughts on how this will
> work regarding the user experience? As a user, I would be quite annoyed if
> I had to back up all of my vms one at a a time. I guess I'm still not sure
> I understand the opposition to allowing users to perform an action on more
> than one object at a time. I know that is kind of a foreign concept in
> CloudStack right now, as the UI doesn't support it, but that is something I
> am currently working on any (extending the UI to support this). If it is
> strictly a matter of making sure the user's expectations regarding
> consistency are in line with what happens, then I think that is a separate
> discussion.
>
> Maybe others would find it useful to have an API like this exposed if they
> are interacting with CloudStack directly through the APIs rather than the
> UI. Imagine being a developer consuming our APIs. It certainly seems
> cleaner (at least to me), to let the user call one API with a list of vms
> to backup them all up instead of them requiring to loop around as many vms
> as they want and having to call the same API that many times.
>
> To be perfectly clear, Kelven, the benefit of propagating the ability to
> back up multiple VM volumes at once to the UI is about ease of use for
> users and ease of development for us - us being CS developers, CS plugin
> developers and CS API consumers.
>
> --
> Chris Suich
> chris.suich@netapp.com
> NetApp Software Engineer
> Data Center Platforms – Cloud Solutions
> Citrix, Cisco & Red Hat
>
> On Sep 19, 2013, at 1:00 AM, Kelven Yang <ke...@citrix.com> wrote:
>
> >
> >
> > On 9/18/13 9:10 AM, "Darren Shepherd" <da...@gmail.com>
> wrote:
> >
> >> Here's my general concern about multiple volume snapshots at once.
>  Giving
> >> such a feature leads the user to believe that snapshotting multiple
> >> volumes
> >> at once will give them consistency across the volumes in the snapshot.
> >> This is not true, and difficult to do with many hypervisors, and
> typically
> >> requires an agent in the VM.  A single snapshot, as exists today, is
> >> really
> >> crash consistent, meaning that there may exist unsync'd data.  To do
> a
> >> true multi volume snapshot requires a "quiesce" functionality in the VM.
> >> So you do pause I/O queues, fsync, fsync, snapshot, snapshot, unpause
> I/O.
> >>
> >> I might be fine with the option of allowing multiple volumeId's to be
> >> specified in the snapshot API, but it needs to be clear that those
> >> snapshots may be taken sequentially and they are all independently crash
> >> consistent.  But, if you make that clear, then why even have the API.
> >> Essentially it is the same as doing multiple snapshot API commands.
> >>
> >> So really I would lean towards having the multiple snapshotting
> supported
> >> in the driver or storage subsystem, but not exposed to the user.  You
> can
> >> easily accomplish it by having a timed window on snapshotting.  So every
> 10
> >> seconds you do snapshots, if 5 requests have queued in the last 10
> >> seconds,
> >> you do them all at once.  This could be implemented as a framework
> thing.
> >> If your provider implements "SnapshotBatching" interface and that has a
> >> getBatchWindowTime(), then the framework can detect that it should try
> to
> >> queue up some snapshot requests and send them to the driver in a batch.
> >> Or
> >> that could be implemented in the driver itself.
> >
> > It makes more sense to me that "SnapshotBatching" is made available at
> > storage framework layer for similar drivers that have such batch
> > capability to share. There also exists another potential intelligent
> > processing -  when storage subsystem layer processes independent
> > volume-snapshot requests falling into the window, it can aggregate the
> > requests targeting for the same VM instance into groups, this can allow
> > hypervisor level drivers to take advantage of hypervisor provided VM
> > snapshot wisely.
> >
> > So +1 for storage layer - driver interface enhancements like this, but I
> > don't see much immediate benefit to propagate it into end-user API layer.
> >
> > -Kelven
> >
> >
> >> I would lean toward doing
> >> it in the driver and if that goes well, we look at pulling the
> >> functionality into core ACS.
> >>
> >> Darren
> >>
> >>
> >> On Wed, Sep 18, 2013 at 5:22 AM, SuichII, Christopher <
> >> Chris.Suich@netapp.com> wrote:
> >>
> >>> I would like to raise for discussion the idea of adding a couple
> methods
> >>> to the Storage Subsystem API interface. Currently, takeSnapshot() and
> >>> revertSnapshot() only support single VM volumes. We have a use case for
> >>> snapshotting multiple VM volumes at the same time. For us, it is more
> >>> efficient to snapshot them all at once rather than snapshot VM Volumes
> >>> individually and this seems like a more elegant solution than queueing
> >>> the
> >>> requests within our plugin.
> >>>
> >>> Based on my investigation, this should require:
> >>> -Two additional APIs to be invoked from the UI
> >>> -Two additional methods added to the Storage Subsystem API interface
> >>> -Changes in between the API level and invoking the Storage Subsystem
> API
> >>> implementations (I know this is broad and vague), mainly around the
> >>> SnapshotManager/Impl
> >>>
> >>> There are a couple topics we would like discussion on:
> >>> -Would this be beneficial/detrimental/neutral to other storage
> >>> providers?
> >>> -How should we handle the addition of new methods to the Storage
> >>> Subsystem
> >>> API interface? Default them to throw an UnsupportedOperationException?
> >>> Default to calling the single VM volume version multiple times?
> >>> -Does anyone see any issues with allowing multiple snapshots to be
> taken
> >>> at the same time or letting storage providers have a list of all the
> >>> requested volumes to backup?
> >>>
> >>> Please let me know if I've missed any major topics for discussion or if
> >>> anything needs clarification.
> >>>
> >>> Thanks,
> >>> Chris
> >>> --
> >>> Chris Suich
> >>> chris.suich@netapp.com
> >>> NetApp Software Engineer
> >>> Data Center Platforms – Cloud Solutions
> >>> Citrix, Cisco & Red Hat
> >>>
> >>>
> >
>
>


-- 
*Mike Tutkowski*
*Senior CloudStack Developer, SolidFire Inc.*
e: mike.tutkowski@solidfire.com
o: 303.746.7302
Advancing the way the world uses the
cloud<http://solidfire.com/solution/overview/?video=play>
*™*

Re: [PROPOSAL] Storage Subsystem API Interface Additions

Posted by Kelven Yang <ke...@citrix.com>.

On 9/19/13 5:16 AM, "SuichII, Christopher" <Ch...@netapp.com> wrote:

>John - any chance we can get your input on the original topic? Mike's
>comment was kind of unrelated (but a completely valid topic that I'd
>like to be a part of discussing anyway!).
>
>Mike - are you planning on implementing the snapshotting methods of the
>storage subsystem API anytime in the near future?
>
>Kelven or Darren, do you have thoughts on how this will work regarding
>the user experience? As a user, I would be quite annoyed if I had to
>back up all of my VMs one at a time. I guess I'm still not sure I
>understand the opposition to allowing users to perform an action on more
>than one object at a time. I know that is kind of a foreign concept in
>CloudStack right now, as the UI doesn't support it, but that is
>something I am currently working on anyway (extending the UI to support
>this). If it is strictly a matter of making sure the user's expectations
>regarding consistency are in line with what happens, then I think that is
>a separate discussion.
>
>Maybe others would find it useful to have an API like this exposed if
>they are interacting with CloudStack directly through the APIs rather
>than the UI. Imagine being a developer consuming our APIs. It certainly
>seems cleaner (at least to me) to let the user call one API with a list
>of VMs to back them all up, instead of requiring them to loop over as
>many VMs as they want and call the same API that many times.
>
>To be perfectly clear, Kelven, the benefit of propagating the ability to
>back up multiple VM volumes at once to the UI is about ease of use for
>users and ease of development for us - us being CS developers, CS plugin
>developers and CS API consumers.

As you still have to compose the list of volumes you want to submit to the
API, the real difference is whether you submit in a loop or in one call;
from a user experience perspective, you can always provide the same
experience (it is a matter of UI implementation). On the flip side, since
multiple volume snapshots may have independent life cycles, if we want to
track these spawned tasks and perform graceful error handling, it will
complicate the API design. Unless the round-trip cost of making multiple
calls becomes significant, the benefit of doing so may not be as valuable
as it sounds.

-Kelven 


>
>-- 
>Chris Suich
>chris.suich@netapp.com
>NetApp Software Engineer
>Data Center Platforms – Cloud Solutions
>Citrix, Cisco & Red Hat
>
>On Sep 19, 2013, at 1:00 AM, Kelven Yang <ke...@citrix.com> wrote:
>
>> 
>> 
>> On 9/18/13 9:10 AM, "Darren Shepherd" <da...@gmail.com>
>>wrote:
>> 
>>> Here's my general concern about multiple volume snapshots at once.
>>>Giving
>>> such a feature leads the user to believe that snapshotting multiple
>>> volumes
>>> at once will give them consistency across the volumes in the snapshot.
>>> This is not true, and difficult to do with many hypervisors, and
>>>typically
>>> requires an agent in the VM.  A single snapshot, as exists today, is
>>> really
>>> crash consistent, meaning that there may exist unsync'd data.  To
>>>do a
>>> true multi volume snapshot requires a "quiesce" functionality in the
>>>VM.
>>> So you do pause I/O queues, fsync, fsync, snapshot, snapshot, unpause
>>>I/O.
>>> 
>>> I might be fine with the option of allowing multiple volumeId's to be
>>> specified in the snapshot API, but it needs to be clear that those
>>> snapshots may be taken sequentially and they are all independently
>>>crash
>>> consistent.  But, if you make that clear, then why even have the API.
>>> Essentially it is the same as doing multiple snapshot API commands.
>>> 
>>> So really I would lean towards having the multiple snapshotting
>>>supported
>>> in the driver or storage subsystem, but not exposed to the user.  You
>>>can
>>> easily accomplish it by having a timed window on snapshotting.  So every
>>>10
>>> seconds you do snapshots, if 5 requests have queued in the last 10
>>> seconds,
>>> you do them all at once.  This could be implemented as a framework
>>>thing.
>>> If your provider implements "SnapshotBatching" interface and that has a
>>> getBatchWindowTime(), then the framework can detect that it should try
>>>to
>>> queue up some snapshot requests and send them to the driver in a batch.
>>> Or
>>> that could be implemented in the driver itself.
>> 
>> It makes more sense to me that "SnapshotBatching" is made available at
>> storage framework layer for similar drivers that have such batch
>> capability to share. There also exists another potential intelligent
>> processing -  when storage subsystem layer processes independent
>> volume-snapshot requests falling into the window, it can aggregate the
>> requests targeting for the same VM instance into groups, this can allow
>> hypervisor level drivers to take advantage of hypervisor provided VM
>> snapshot wisely.
>> 
>> So +1 for storage layer - driver interface enhancements like this, but I
>> don't see much immediate benefit to propagate it into end-user API
>>layer.
>> 
>> -Kelven
>> 
>> 
>>> I would lean toward doing
>>> it in the driver and if that goes well, we look at pulling the
>>> functionality into core ACS.
>>> 
>>> Darren
>>> 
>>> 
>>> On Wed, Sep 18, 2013 at 5:22 AM, SuichII, Christopher <
>>> Chris.Suich@netapp.com> wrote:
>>> 
>>>> I would like to raise for discussion the idea of adding a couple
>>>>methods
>>>> to the Storage Subsystem API interface. Currently, takeSnapshot() and
>>>> revertSnapshot() only support single VM volumes. We have a use case
>>>>for
>>>> snapshotting multiple VM volumes at the same time. For us, it is more
>>>> efficient to snapshot them all at once rather than snapshot VM Volumes
>>>> individually and this seems like a more elegant solution than queueing
>>>> the
>>>> requests within our plugin.
>>>> 
>>>> Based on my investigation, this should require:
>>>> -Two additional APIs to be invoked from the UI
>>>> -Two additional methods added to the Storage Subsystem API interface
>>>> -Changes in between the API level and invoking the Storage Subsystem
>>>>API
>>>> implementations (I know this is broad and vague), mainly around the
>>>> SnapshotManager/Impl
>>>> 
>>>> There are a couple topics we would like discussion on:
>>>> -Would this be beneficial/detrimental/neutral to other storage
>>>> providers?
>>>> -How should we handle the addition of new methods to the Storage
>>>> Subsystem
>>>> API interface? Default them to throw an UnsupportedOperationException?
>>>> Default to calling the single VM volume version multiple times?
>>>> -Does anyone see any issues with allowing multiple snapshots to be
>>>>taken
>>>> at the same time or letting storage providers have a list of all the
>>>> requested volumes to backup?
>>>> 
>>>> Please let me know if I've missed any major topics for discussion or
>>>>if
>>>> anything needs clarification.
>>>> 
>>>> Thanks,
>>>> Chris
>>>> --
>>>> Chris Suich
>>>> chris.suich@netapp.com
>>>> NetApp Software Engineer
>>>> Data Center Platforms – Cloud Solutions
>>>> Citrix, Cisco & Red Hat
>>>> 
>>>> 
>> 
>


Re: [PROPOSAL] Storage Subsystem API Interface Additions

Posted by "SuichII, Christopher" <Ch...@netapp.com>.
John - any chance we can get your input on the original topic? Mike's comment was kind of unrelated (but a completely valid topic that I'd like to be a part of discussing anyway!).

Mike - are you planning on implementing the snapshotting methods of the storage subsystem API anytime in the near future?

Kelven or Darren, do you have thoughts on how this will work regarding the user experience? As a user, I would be quite annoyed if I had to back up all of my VMs one at a time. I guess I'm still not sure I understand the opposition to allowing users to perform an action on more than one object at a time. I know that is kind of a foreign concept in CloudStack right now, as the UI doesn't support it, but that is something I am currently working on anyway (extending the UI to support this). If it is strictly a matter of making sure the user's expectations regarding consistency are in line with what happens, then I think that is a separate discussion.

Maybe others would find it useful to have an API like this exposed if they are interacting with CloudStack directly through the APIs rather than the UI. Imagine being a developer consuming our APIs. It certainly seems cleaner (at least to me) to let the user call one API with a list of VMs to back them all up, instead of requiring them to loop over as many VMs as they want and call the same API that many times.

To be perfectly clear, Kelven, the benefit of propagating the ability to back up multiple VM volumes at once to the UI is about ease of use for users and ease of development for us - us being CS developers, CS plugin developers and CS API consumers.

-- 
Chris Suich
chris.suich@netapp.com
NetApp Software Engineer
Data Center Platforms – Cloud Solutions
Citrix, Cisco & Red Hat

On Sep 19, 2013, at 1:00 AM, Kelven Yang <ke...@citrix.com> wrote:

> 
> 
> On 9/18/13 9:10 AM, "Darren Shepherd" <da...@gmail.com> wrote:
> 
>> Here's my general concern about multiple volume snapshots at once.  Giving
>> such a feature leads the user to believe that snapshotting multiple
>> volumes
>> at once will give them consistency across the volumes in the snapshot.
>> This is not true, and difficult to do with many hypervisors, and typically
>> requires an agent in the VM.  A single snapshot, as exists today, is
>> really
>> crash consistent, meaning that there may exist unsync'd data.  To do a
>> true multi volume snapshot requires a "quiesce" functionality in the VM.
>> So you do pause I/O queues, fsync, fsync, snapshot, snapshot, unpause I/O.
>> 
>> I might be fine with the option of allowing multiple volumeId's to be
>> specified in the snapshot API, but it needs to be clear that those
>> snapshots may be taken sequentially and they are all independently crash
>> consistent.  But, if you make that clear, then why even have the API.
>> Essentially it is the same as doing multiple snapshot API commands.
>> 
>> So really I would lean towards having the multiple snapshotting supported
>> in the driver or storage subsystem, but not exposed to the user.  You can
>> easily accomplish it by having a timed window on snapshotting.  So every 10
>> seconds you do snapshots, if 5 requests have queued in the last 10
>> seconds,
>> you do them all at once.  This could be implemented as a framework thing.
>> If your provider implements "SnapshotBatching" interface and that has a
>> getBatchWindowTime(), then the framework can detect that it should try to
>> queue up some snapshot requests and send them to the driver in a batch.
>> Or
>> that could be implemented in the driver itself.
> 
> It makes more sense to me that "SnapshotBatching" is made available at
> storage framework layer for similar drivers that have such batch
> capability to share. There also exists another potential intelligent
> processing -  when storage subsystem layer processes independent
> volume-snapshot requests falling into the window, it can aggregate the
> requests targeting for the same VM instance into groups, this can allow
> hypervisor level drivers to take advantage of hypervisor provided VM
> snapshot wisely. 
> 
> So +1 for storage layer - driver interface enhancements like this, but I
> don't see much immediate benefit to propagate it into end-user API layer.
> 
> -Kelven
> 
> 
>> I would lean toward doing
>> it in the driver and if that goes well, we look at pulling the
>> functionality into core ACS.
>> 
>> Darren
>> 
>> 
>> On Wed, Sep 18, 2013 at 5:22 AM, SuichII, Christopher <
>> Chris.Suich@netapp.com> wrote:
>> 
>>> I would like to raise for discussion the idea of adding a couple methods
>>> to the Storage Subsystem API interface. Currently, takeSnapshot() and
>>> revertSnapshot() only support single VM volumes. We have a use case for
>>> snapshotting multiple VM volumes at the same time. For us, it is more
>>> efficient to snapshot them all at once rather than snapshot VM Volumes
>>> individually and this seems like a more elegant solution than queueing
>>> the
>>> requests within our plugin.
>>> 
>>> Based on my investigation, this should require:
>>> -Two additional APIs to be invoked from the UI
>>> -Two additional methods added to the Storage Subsystem API interface
>>> -Changes in between the API level and invoking the Storage Subsystem API
>>> implementations (I know this is broad and vague), mainly around the
>>> SnapshotManager/Impl
>>> 
>>> There are a couple topics we would like discussion on:
>>> -Would this be beneficial/detrimental/neutral to other storage
>>> providers?
>>> -How should we handle the addition of new methods to the Storage
>>> Subsystem
>>> API interface? Default them to throw an UnsupportedOperationException?
>>> Default to calling the single VM volume version multiple times?
>>> -Does anyone see any issues with allowing multiple snapshots to be taken
>>> at the same time or letting storage providers have a list of all the
>>> requested volumes to backup?
>>> 
>>> Please let me know if I've missed any major topics for discussion or if
>>> anything needs clarification.
>>> 
>>> Thanks,
>>> Chris
>>> --
>>> Chris Suich
>>> chris.suich@netapp.com
>>> NetApp Software Engineer
>>> Data Center Platforms – Cloud Solutions
>>> Citrix, Cisco & Red Hat
>>> 
>>> 
> 
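One of the open questions in the quoted proposal is how to add new methods to the Storage Subsystem API interface without breaking existing providers. A rough sketch of the "default to calling the single-volume version multiple times" option, assuming Java 8 default methods are available (the interface and method names below are illustrative, not the actual CloudStack interface):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: a multi-volume method with a default
// implementation that falls back to the existing single-volume call,
// so drivers that don't support batching keep working unchanged.
interface SnapshotDriver {
    String takeSnapshot(String volumeId);

    // Default: loop over the single-volume version. Batch-capable
    // drivers override this with a true multi-volume implementation.
    default List<String> takeSnapshots(List<String> volumeIds) {
        List<String> results = new ArrayList<>();
        for (String id : volumeIds) {
            results.add(takeSnapshot(id));
        }
        return results;
    }
}
```

A driver with native batch support would simply override takeSnapshots(); everyone else inherits the sequential loop, which also avoids the UnsupportedOperationException alternative.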


Re: [PROPOSAL] Storage Subsystem API Interface Additions

Posted by Kelven Yang <ke...@citrix.com>.

On 9/18/13 9:10 AM, "Darren Shepherd" <da...@gmail.com> wrote:

>Here's my general concern about multiple volume snapshots at once.  Giving
>such a feature leads the user to believe that snapshotting multiple
>volumes
>at once will give them consistency across the volumes in the snapshot.
>This is not true, and difficult to do with many hypervisors, and typically
>requires an agent in the VM.  A single snapshot, as exists today, is
>really
>crash consistent, meaning that there may exist unsync'd data.  To do a
>true multi volume snapshot requires a "quiesce" functionality in the VM.
>So you do pause I/O queues, fsync, fsync, snapshot, snapshot, unpause I/O.
>
>I might be fine with the option of allowing multiple volumeId's to be
>specified in the snapshot API, but it needs to be clear that those
>snapshots may be taken sequentially and they are all independently crash
>consistent.  But, if you make that clear, then why even have the API.
>Essentially it is the same as doing multiple snapshot API commands.
>
>So really I would lean towards having the multiple snapshotting supported
>in the driver or storage subsystem, but not exposed to the user.  You can
>easily accomplish it by having a timed window on snapshotting.  So every 10
>seconds you do snapshots, if 5 requests have queued in the last 10
>seconds,
>you do them all at once.  This could be implemented as a framework thing.
>If your provider implements "SnapshotBatching" interface and that has a
>getBatchWindowTime(), then the framework can detect that it should try to
>queue up some snapshot requests and send them to the driver in a batch.
>Or
>that could be implemented in the driver itself.

It makes more sense to me that "SnapshotBatching" is made available at the
storage framework layer, so that similar drivers with such batch
capability can share it. There is also another potential piece of
intelligent processing - when the storage subsystem layer processes
independent volume-snapshot requests falling into the window, it can
aggregate requests targeting the same VM instance into groups; this allows
hypervisor-level drivers to make good use of hypervisor-provided VM
snapshots.

So +1 for storage layer - driver interface enhancements like this, but I
don't see much immediate benefit in propagating this into the end-user API
layer.

-Kelven


>I would lean toward doing
>it in the driver and if that goes well, we look at pulling the
>functionality into core ACS.
>
>Darren
>
>
>On Wed, Sep 18, 2013 at 5:22 AM, SuichII, Christopher <
>Chris.Suich@netapp.com> wrote:
>
>> I would like to raise for discussion the idea of adding a couple methods
>> to the Storage Subsystem API interface. Currently, takeSnapshot() and
>> revertSnapshot() only support single VM volumes. We have a use case for
>> snapshotting multiple VM volumes at the same time. For us, it is more
>> efficient to snapshot them all at once rather than snapshot VM Volumes
>> individually and this seems like a more elegant solution than queueing
>>the
>> requests within our plugin.
>>
>> Based on my investigation, this should require:
>> -Two additional APIs to be invoked from the UI
>> -Two additional methods added to the Storage Subsystem API interface
>> -Changes in between the API level and invoking the Storage Subsystem API
>> implementations (I know this is broad and vague), mainly around the
>> SnapshotManager/Impl
>>
>> There are a couple topics we would like discussion on:
>> -Would this be beneficial/detrimental/neutral to other storage
>>providers?
>> -How should we handle the addition of new methods to the Storage
>>Subsystem
>> API interface? Default them to throw an UnsupportedOperationException?
>> Default to calling the single VM volume version multiple times?
>> -Does anyone see any issues with allowing multiple snapshots to be taken
>> at the same time or letting storage providers have a list of all the
>> requested volumes to backup?
>>
>> Please let me know if I've missed any major topics for discussion or if
>> anything needs clarification.
>>
>> Thanks,
>> Chris
>> --
>> Chris Suich
>> chris.suich@netapp.com
>> NetApp Software Engineer
>> Data Center Platforms – Cloud Solutions
>> Citrix, Cisco & Red Hat
>>
>>
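Kelven's aggregation idea, grouping the volume-snapshot requests that fall into a batch window by their owning VM instance, could look roughly like this (illustrative Java assuming Java 8; VolumeRequest and RequestGrouper are hypothetical names, not CloudStack types):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: group pending volume-snapshot requests by the VM
// that owns each volume, so a driver could take one hypervisor-level VM
// snapshot per group instead of one snapshot per volume.
class VolumeRequest {
    final String volumeId;
    final String vmInstanceId;

    VolumeRequest(String volumeId, String vmInstanceId) {
        this.volumeId = volumeId;
        this.vmInstanceId = vmInstanceId;
    }
}

class RequestGrouper {
    // Preserves arrival order of both VMs and volumes within each VM.
    static Map<String, List<String>> groupByVm(List<VolumeRequest> requests) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (VolumeRequest r : requests) {
            groups.computeIfAbsent(r.vmInstanceId, k -> new ArrayList<>())
                  .add(r.volumeId);
        }
        return groups;
    }
}
```

The framework would run this over whatever queued up during the window, then hand each per-VM group to the hypervisor-level driver in one call.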


Re: [PROPOSAL] Storage Subsystem API Interface Additions

Posted by Darren Shepherd <da...@gmail.com>.
Here's my general concern about multiple volume snapshots at once.  Giving
such a feature leads the user to believe that snapshotting multiple volumes
at once will give them consistency across the volumes in the snapshot.
This is not true, and difficult to do with many hypervisors, and typically
requires an agent in the VM.  A single snapshot, as exists today, is really
crash consistent, meaning that there may exist unsync'd data.  To do a
true multi volume snapshot requires a "quiesce" functionality in the VM.
So you do pause I/O queues, fsync, fsync, snapshot, snapshot, unpause I/O.

I might be fine with the option of allowing multiple volumeIds to be
specified in the snapshot API, but it needs to be clear that those
snapshots may be taken sequentially and that they are all independently crash
consistent.  But if you make that clear, then why even have the API?
Essentially it is the same as doing multiple snapshot API commands.

So really I would lean towards having the multiple snapshotting supported
in the driver or storage subsystem, but not exposed to the user.  You can
easily accomplish it by having a timed window on snapshotting.  So every 10
seconds you do snapshots; if 5 requests have queued in the last 10 seconds,
you do them all at once.  This could be implemented as a framework thing.
If your provider implements "SnapshotBatching" interface and that has a
getBatchWindowTime(), then the framework can detect that it should try to
queue up some snapshot requests and send them to the driver in a batch.  Or
that could be implemented in the driver itself.  I would lean toward doing
it in the driver and if that goes well, we look at pulling the
functionality into core ACS.

Darren


On Wed, Sep 18, 2013 at 5:22 AM, SuichII, Christopher <
Chris.Suich@netapp.com> wrote:

> I would like to raise for discussion the idea of adding a couple methods
> to the Storage Subsystem API interface. Currently, takeSnapshot() and
> revertSnapshot() only support single VM volumes. We have a use case for
> snapshotting multiple VM volumes at the same time. For us, it is more
> efficient to snapshot them all at once rather than snapshot VM Volumes
> individually and this seems like a more elegant solution than queueing the
> requests within our plugin.
>
> Based on my investigation, this should require:
> -Two additional APIs to be invoked from the UI
> -Two additional methods added to the Storage Subsystem API interface
> -Changes in between the API level and invoking the Storage Subsystem API
> implementations (I know this is broad and vague), mainly around the
> SnapshotManager/Impl
>
> There are a couple topics we would like discussion on:
> -Would this be beneficial/detrimental/neutral to other storage providers?
> -How should we handle the addition of new methods to the Storage Subsystem
> API interface? Default them to throw an UnsupportedOperationException?
> Default to calling the single VM volume version multiple times?
> -Does anyone see any issues with allowing multiple snapshots to be taken
> at the same time or letting storage providers have a list of all the
> requested volumes to backup?
>
> Please let me know if I've missed any major topics for discussion or if
> anything needs clarification.
>
> Thanks,
> Chris
> --
> Chris Suich
> chris.suich@netapp.com
> NetApp Software Engineer
> Data Center Platforms – Cloud Solutions
> Citrix, Cisco & Red Hat
>
>
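Darren's timed-window idea, queue snapshot requests as they arrive and hand them to the driver in one batch once the window elapses, might be sketched like this (illustrative Java; SnapshotBatcher and its methods are hypothetical, not an actual CloudStack or driver API; time is passed in explicitly to keep the sketch deterministic):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a timed batching window: requests queue up, and
// once the window elapses they are returned as one batch for the driver.
class SnapshotBatcher {
    private final long windowMillis;
    private final List<String> pendingVolumeIds = new ArrayList<>();
    private long windowStart = -1; // -1 means no window is open

    SnapshotBatcher(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    // Queues a request; returns the accumulated batch once the window
    // has closed, otherwise null. The caller would then invoke the
    // driver's (hypothetical) multi-volume snapshot entry point.
    synchronized List<String> submit(String volumeId, long nowMillis) {
        if (windowStart < 0) {
            windowStart = nowMillis;
        }
        pendingVolumeIds.add(volumeId);
        if (nowMillis - windowStart >= windowMillis) {
            List<String> batch = new ArrayList<>(pendingVolumeIds);
            pendingVolumeIds.clear();
            windowStart = -1;
            return batch;
        }
        return null;
    }
}
```

Whether this lives in the framework (keyed off something like Darren's suggested getBatchWindowTime()) or inside an individual driver is exactly the open question in the thread.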