Posted to dev@cloudstack.apache.org by "SuichII, Christopher" <Ch...@netapp.com> on 2013/10/25 20:37:02 UTC
[DISCUSS] Scalable Backup and Recovery

I’d like to revisit this topic with a focus on a UI-only solution. I believe a reasonable approach is to let users select multiple volumes or VMs (using the new multi-select UI widget) and perform snapshot operations on them. The UI would then issue one snapshot API request per selected volume or VM, so it could display the status and result of each request individually, rather than lumping all the requests together and reporting only a single success or failure for the entire selection. Because this reuses the existing per-volume and per-VM snapshot requests, we would only need to make UI changes.
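A minimal sketch of the per-item fan-out, written in Python purely for illustration (the real change would live in the UI code). It assumes a hypothetical submit_snapshot callable standing in for one snapshot API request, for example CloudStack's createSnapshot for a volume or createVMSnapshot for a VM. Each selected item gets its own result, so one failure is reported on its own instead of turning the whole selection into a single failure. The thread pool and its size are illustrative choices, not part of the proposal.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class SnapshotResult:
    """Outcome of the snapshot request for one selected item."""
    item_id: str
    ok: bool
    detail: Optional[str] = None  # job/snapshot id on success, error text on failure


def snapshot_each(
    item_ids: List[str],
    submit_snapshot: Callable[[str], str],
    max_workers: int = 4,
) -> Dict[str, SnapshotResult]:
    """Issue one request per selected item and keep every result separately.

    submit_snapshot is a hypothetical stand-in for a single API call: it
    returns an id on success and raises an exception on failure.
    """

    def run_one(item_id: str) -> SnapshotResult:
        try:
            return SnapshotResult(item_id, True, submit_snapshot(item_id))
        except Exception as exc:  # the failure is recorded for this item only
            return SnapshotResult(item_id, False, str(exc))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_one, item_ids))
    return {r.item_id: r for r in results}


if __name__ == "__main__":
    fake_api = {"vol-1": "job-1", "vol-3": "job-3"}  # vol-2 "fails" with a KeyError
    for item, result in snapshot_each(["vol-1", "vol-2", "vol-3"], fake_api.__getitem__).items():
        print(item, "ok" if result.ok else "failed", result.detail)
```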

To minimize effort, we could tie directly into the existing filtering features of the volumes and VMs pages. This would allow users to select volumes and VMs for backup by the following criteria:
-Name
-Zone
-Domain (if admin)
-Account (if admin)
-Tag key
-Tag value
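The criteria above amount to a conjunction of optional filters. A minimal client-side sketch, assuming a hypothetical simplified Volume record and assumed matching rules (case-insensitive substring for the name, exact match for everything else); the actual pages would presumably send these filters to the existing list APIs rather than filter locally.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class Volume:
    """Hypothetical, simplified row from the volumes list."""
    id: str
    name: str
    zone: str
    domain: str
    account: str
    tags: Dict[str, str] = field(default_factory=dict)


@dataclass
class Criteria:
    """One optional field per filter in the list above; None means 'any'."""
    name: Optional[str] = None      # assumed: case-insensitive substring match
    zone: Optional[str] = None
    domain: Optional[str] = None    # admin only in the UI
    account: Optional[str] = None   # admin only in the UI
    tag_key: Optional[str] = None
    tag_value: Optional[str] = None


def matches(v: Volume, c: Criteria) -> bool:
    """True if the volume satisfies every filter that is set."""
    if c.name is not None and c.name.lower() not in v.name.lower():
        return False
    for wanted, actual in ((c.zone, v.zone), (c.domain, v.domain), (c.account, v.account)):
        if wanted is not None and wanted != actual:
            return False
    if c.tag_key is not None and c.tag_key not in v.tags:
        return False
    if c.tag_value is not None:
        if c.tag_key is not None:
            # Key and value given: the value must belong to that key.
            if v.tags[c.tag_key] != c.tag_value:
                return False
        elif c.tag_value not in v.tags.values():
            # Value only: any tag with that value qualifies.
            return False
    return True


def select_for_backup(volumes: Iterable[Volume], c: Criteria) -> List[str]:
    """IDs of the volumes a single 'snapshot selected' action would cover."""
    return [v.id for v in volumes if matches(v, c)]
```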

I understand that one lingering concern was that users might believe the snapshots are all consistent with one another, i.e. taken at the same point in time, when in fact each request would be issued and completed independently. I’d like to hear whether this is still a concern with this approach and, if so, whether there are any ideas for how to communicate to users how the operation is actually performed.

-Chris
--
Chris Suich
chris.suich@netapp.com
NetApp Software Engineer
Data Center Platforms – Cloud Solutions
Citrix, Cisco & Red Hat

On Sep 30, 2013, at 8:37 AM, SuichII, Christopher <Ch...@netapp.com> wrote:

See responses inline.

On Sep 29, 2013, at 10:22 AM, kelcey@backbonetechnology.com wrote:

Even as a simple start, it would be nice to have the option to back up/snapshot a VM (meaning all of its volumes as a collection) and not just individual volumes. If I have 200 VMs with 3 volumes each, that's 600 snapshots and 400 disk-attach actions. Now scale that out to your average production environment running thousands of VMs.

This becomes a logistical recovery nightmare.

We should have some kind of routine that allows snapshotting multiple volumes as a collection, so that those same volumes can be restored as a collection.
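Kelcey's numbers can be reproduced with a little arithmetic. The 400 attach actions are consistent with the assumption (ours, not stated in the message) that every volume except the root disk has to be re-attached after recovery, i.e. 200 * (3 - 1). The per-collection count is hypothetical too: it assumes one grouped operation per VM would also take care of re-attaching that VM's volumes.

```python
from typing import Tuple


def per_volume_operations(vm_count: int, volumes_per_vm: int) -> Tuple[int, int]:
    """(snapshots, attach actions) when every volume is handled on its own.

    Assumption: after recovery each volume except the root disk must be
    re-attached to its VM, which is what makes 200 VMs x 3 volumes come out
    to 400 attach actions.
    """
    snapshots = vm_count * volumes_per_vm
    attaches = vm_count * (volumes_per_vm - 1)
    return snapshots, attaches


def per_collection_operations(vm_count: int) -> int:
    """Grouped snapshots when each VM's volumes form one collection.

    Hypothetical: assumes restoring the collection re-attaches its volumes.
    """
    return vm_count


if __name__ == "__main__":
    for vms in (200, 2000):
        print(vms, "VMs:", per_volume_operations(vms, 3), "vs", per_collection_operations(vms))
```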

This is a really cool use case too; I can definitely get behind it.


Just a thought.

Thanks,

-Kelcey


----- Reply message -----
From: "Darren Shepherd" <da...@gmail.com>
To: "dev@cloudstack.apache.org" <de...@cloudstack.apache.org>
Subject: Scalable Backup and Recovery
Date: Sat, Sep 28, 2013 9:22 PM

Based on your use cases, it sounds like what you're asking for is the
ability to define selection criteria for scheduled snapshots. As long
as a VM/volume matches those criteria, it will be backed up at some
given time. I think that would be useful because a user could say
something like "all volumes in network 'production' should be backed
up every night."
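A minimal sketch of criteria-based scheduling, using Darren's 'production' example. The record types, the nightly hour field and the function names are all hypothetical. The point it illustrates is that the criteria are evaluated when the schedule fires, so a volume created after the policy was defined is picked up without any extra step.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class VolumeInfo:
    """Hypothetical inventory row: a volume and the networks of its VM."""
    id: str
    networks: FrozenSet[str]


@dataclass(frozen=True)
class NightlyPolicy:
    """Hypothetical criteria-based schedule: 'volumes in network X, every night'."""
    network: str
    hour: int  # hour of day (0-23) at which the nightly run fires

    def selects(self, volume: VolumeInfo) -> bool:
        return self.network in volume.networks


def volumes_due(policy: NightlyPolicy, inventory: Iterable[VolumeInfo], now: datetime) -> List[str]:
    """Resolve the criteria against the current inventory when the run fires.

    Matching happens at run time rather than when the policy is created,
    so newly created volumes that meet the criteria are included automatically.
    """
    if now.hour != policy.hour:
        return []
    return [v.id for v in inventory if policy.selects(v)]
```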

Selection criteria are an interesting option. They are not necessarily what I was proposing, but they could be a cool solution, depending on how flexible the criteria are.


So if such a thing were implemented, and the storage provider
implemented some capability to handle multiple snapshots at once, then
the provider could be passed all of the volumes at once and do
something intelligent with them.

This is along the lines of what I was talking about in my other email: sometimes it is just easier to do things together, not one by one.
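One way to express "pass all volumes at once if the provider can handle it" is an optional capability on the provider interface with a one-by-one fallback. This is a Python sketch of the shape only; SnapshotProvider, supports_batch and snapshot_batch are hypothetical names, not CloudStack's actual (Java) storage subsystem interface. The two example providers at the end just count backend operations to show the difference.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class SnapshotProvider(ABC):
    """Hypothetical storage-provider interface; not CloudStack's real API."""

    @abstractmethod
    def snapshot(self, volume_id: str) -> str:
        """Snapshot a single volume and return the new snapshot id."""

    def supports_batch(self) -> bool:
        """Providers override this if they can snapshot many volumes at once."""
        return False

    def snapshot_batch(self, volume_ids: List[str]) -> Dict[str, str]:
        raise NotImplementedError("provider did not advertise batch support")


def take_snapshots(provider: SnapshotProvider, volume_ids: List[str]) -> Dict[str, str]:
    """Pass the whole set to a batch-capable provider, else go one by one."""
    if provider.supports_batch():
        return provider.snapshot_batch(volume_ids)
    return {v: provider.snapshot(v) for v in volume_ids}


class OneByOneProvider(SnapshotProvider):
    """Example provider without batch support; counts backend operations."""

    def __init__(self) -> None:
        self.calls = 0

    def snapshot(self, volume_id: str) -> str:
        self.calls += 1
        return "snap-" + volume_id


class BatchProvider(OneByOneProvider):
    """Example provider that snapshots a whole set in one backend operation."""

    def supports_batch(self) -> bool:
        return True

    def snapshot_batch(self, volume_ids: List[str]) -> Dict[str, str]:
        self.calls += 1
        return {v: "snap-" + v for v in volume_ids}
```

Keeping the batch method optional means providers that cannot do anything smarter still work unchanged.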


Now, reading between the lines, it seems like you're looking for some
use case to exploit some functionality in NetApp. I'll tell you what
I'd like to see implemented. Having run NetApp in the past with CS, we
ran into this conundrum. We would take snapshots on the filer, but
in reality they were pretty useless. If you ever need to roll back
to the snapshot, you're screwed. Things have changed since the
snapshot: VMs were created or deleted, VM snapshots have occurred, etc.
So if you roll back, the metadata in CS is completely out of sync.
What I think would be a great feature is the ability to do StoragePool
snapshots. Doing the snapshot is simple; the really complex thing
is how to implement the "StoragePool revert to snapshot"
functionality. If somebody could do that, that would be awesome.
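To make the out-of-sync problem concrete, a hypothetical pre-revert check could compare the volume IDs CloudStack knew about when the pool snapshot was taken with the ones it knows about now. The function and its inputs are assumptions for illustration; it covers only volume existence, not the VM snapshots or other metadata changes Darren mentions.

```python
from typing import Dict, Set


def revert_drift(at_snapshot: Set[str], current: Set[str]) -> Dict[str, Set[str]]:
    """Hypothetical pre-check before reverting a whole storage pool.

    at_snapshot: volume ids the management server knew about when the pool
    snapshot was taken; current: volume ids it knows about now.
    """
    return {
        # Created after the snapshot: the revert would wipe their data on
        # storage while the database still lists them.
        "lost_on_revert": current - at_snapshot,
        # Deleted after the snapshot: the revert would bring their data back
        # on storage with no live record in the database.
        "orphaned_on_revert": at_snapshot - current,
    }
```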

Without getting into much detail, we're actually not trying to exploit anything particularly NetApp-specific. In fact, we're not really leveraging NetApp volume snapshots for backup and recovery (B&R), for the reason you listed: if you try to recover the whole volume, you potentially blow away any data that was created between when the snapshot was taken and when you click the restore button.

Again, that seems like a vendor-specific capability: something we should add to the storage subsystem interface and let storage providers implement if they can.


Darren

On Sat, Sep 28, 2013 at 5:41 PM, SuichII, Christopher
<Ch...@netapp.com> wrote:
Well, yes, in part. By scalable, I mean that if CloudStack is expected to manage such a large number of VMs, it should be able to back up and recover those VMs with minimal effort. Doing things one at a time does not necessarily scale well when you're talking about cloud infrastructure.

Also certain hypervisors have various quirks which stand in the way of an
efficient solution.

I absolutely agree. This is where the storage subsystem API comes in. With some storage providers, creating backups can be much faster, easier and more efficient than doing it through the hypervisor. As the storage subsystem API gains more traction and true backup and recovery becomes available, I think we'll begin to see people asking why things must be done one at a time. The use cases I listed below would help us get ahead of the curve and have ready the features I predict people will be asking for (and it sounds like Kelcey is asking for them now!).


On Sep 27, 2013, at 6:38 PM, Chiradeep Vittal <Ch...@citrix.com> wrote:

Ah, I see. You mean a "scalable user experience".

The actual scalability of the snapshot process itself is limited by
available disk and network bandwidth.
Also certain hypervisors have various quirks which stand in the way of an
efficient solution.

On 9/27/13 10:27 AM, "SuichII, Christopher" <Ch...@netapp.com> wrote:

I'd like to start a discussion around the direction of scalable backup
and recovery in CloudStack. Currently, the only ways to back up and
recover VMs are to set up a snapshot schedule for individual VM disks,
to manually snapshot individual VM disks, or to manually snapshot VMs.
Unfortunately, I don't believe this is a very scalable solution. What if
a user wants all of their VM disks to be backed up on the same schedule?
What if a domain administrator wants all of the VMs in their domain to be
backed up on the same schedule, or to manually back up every VM in their
domain?

Here are some use cases I see for helping to scale things up:
-Scheduled and manual backup of anywhere from one to all of a user's VMs
and VM disks
-Scheduled and manual backup of anywhere from one to all of a domain's
VMs and VM disks (by a domain admin)
-Scheduled and manual backup of anywhere from one to all VMs and VM disks
on primary storage (by a cloud admin); a valid use case for this one is
harder to find
-Backup schedules attached to service offerings
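The use cases above differ mainly in scope: a user, a domain admin and a cloud admin each target a progressively larger set of VMs, and the last item attaches a schedule to a service offering. A minimal sketch of that scoping, with hypothetical Role, Vm and Caller types; it treats a domain admin's scope as an exact domain match and ignores any domain hierarchy.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Role(Enum):
    USER = "user"
    DOMAIN_ADMIN = "domain admin"
    ADMIN = "admin"


@dataclass(frozen=True)
class Vm:
    """Hypothetical, simplified VM record."""
    id: str
    account: str
    domain: str
    offering: str  # name of the service offering the VM was deployed with


@dataclass(frozen=True)
class Caller:
    role: Role
    account: str
    domain: str


def backup_scope(caller: Caller, vms: List[Vm]) -> List[str]:
    """VM ids the caller may back up in one request, per the use cases above."""
    if caller.role is Role.ADMIN:
        return [vm.id for vm in vms]
    if caller.role is Role.DOMAIN_ADMIN:
        return [vm.id for vm in vms if vm.domain == caller.domain]
    return [vm.id for vm in vms if vm.account == caller.account]


def schedule_for(vm: Vm, offering_schedules: Dict[str, str]) -> Optional[str]:
    """Schedule inherited from the VM's service offering, if one is attached."""
    return offering_schedules.get(vm.offering)
```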

I know I previously started a discussion about backing up multiple VM
disks at once, but I think these use cases, broken down by user type
(user, domain admin and admin), should help clear things up and show the
value of being able to back up multiple objects at once.

Thanks!
Chris