Posted to user@cassandra.apache.org by Raul D'Opazo <Ra...@software.dell.com> on 2015/10/20 13:22:53 UTC

Hyper-V snapshot and Cassandra

Hi,
I am really new to Cassandra and I have some questions about backing up a Cassandra instance that holds terabytes of data. So please forgive me if I ask a noob question.
I only have one node, on one server (Windows 2012), and Cassandra will grow to approx. 4TB. It is a Hyper-V virtual machine with enough resources.
I have taken Cassandra snapshots and that works, because each snapshot does not double the size on disk, but I need another solution in case of disk problems.
Copying these snapshots to another backup system is impractical: at approx. 500MB/s it would take days.
I am wondering whether Hyper-V virtual machine snapshots can be used to recover Cassandra in a consistent way. Is it possible?
This would save me from copying snapshots to another network location or backup system.
Thanks,
Raul


RE: Hyper-V snapshot and Cassandra

Posted by Raul D'Opazo <Ra...@software.dell.com>.
Hi, so what is the usual way to take care of backups? Is it:

-        Take Cassandra snapshots

-        Copy these snapshots to a backup system

?
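
The two steps in the question can be sketched roughly as follows. This is only an illustration, not an official backup tool: it assumes the snapshot was already taken (for example with `nodetool snapshot -t <tag>`) and that snapshot files sit under `<data_dir>/<keyspace>/<table>/snapshots/<tag>/`. The function name `copy_snapshot` and the rule "same name and same size means already backed up" are assumptions made for this sketch.

```python
import shutil
from pathlib import Path


def copy_snapshot(data_dir: Path, tag: str, backup_root: Path) -> list[Path]:
    """Copy the files of snapshot `tag` that the backup does not hold yet.

    Assumes the layout <data_dir>/<keyspace>/<table>/snapshots/<tag>/.
    Returns the destination paths of the files that were actually copied.
    """
    copied = []
    for snap_dir in sorted(data_dir.glob(f"*/*/snapshots/{tag}")):
        table_dir = snap_dir.parent.parent  # <data_dir>/<keyspace>/<table>
        target_dir = backup_root / table_dir.relative_to(data_dir)
        target_dir.mkdir(parents=True, exist_ok=True)
        for src in sorted(snap_dir.iterdir()):
            if not src.is_file():
                continue
            dst = target_dir / src.name
            # Data files are immutable once written, so this sketch treats
            # "same name and same size" as "already backed up" (assumption).
            if dst.exists() and dst.stat().st_size == src.stat().st_size:
                continue
            shutil.copy2(src, dst)
            copied.append(dst)
    return copied
```

Because files that are already in the backup are skipped, a second run after a newer snapshot with the same tag only transfers the data files written since the previous copy, which may matter more than raw throughput at this data size. The paths passed in (for example a data directory and a network share) are up to the operator.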


From: Robert Coli [mailto:rcoli@eventbrite.com]
Sent: Tuesday, October 20, 2015 5:07 PM
To: user@cassandra.apache.org
Subject: Re: Hyper-V snapshot and Cassandra

On Tue, Oct 20, 2015 at 4:22 AM, Raul D'Opazo <Ra...@software.dell.com> wrote:
> I only have one node, in one server (Windows 2012), and Cassandra will grow up to 4TB approx. It is a Hyper-V virtual machine, with enough resources.

This is an extremely unusual and probably degenerate use of Cassandra.

> I have done snapshots and it is ok, because we don’t double the size in each snapshot, but I need to have other solution in case of disks problems.

I have no idea how snapshots work on Windows; if it is like Linux, each snapshot is a set of hard links to the actual data files.

> I am thinking if Hyper-V virtual machine snapshots can be used to recover Cassandra in a consistent way. Is it possible?

Sure? If you quiesce writes to the system, or if you don't care about the delta in the commit log between the Cassandra snapshot and the Hyper-V snapshot, your snapshot will contain all the immutable data files you need to restore.

Finally, I reiterate my confusion as to why you wish to do this unusual thing.

=Rob


Re: Hyper-V snapshot and Cassandra

Posted by Robert Coli <rc...@eventbrite.com>.
On Tue, Oct 20, 2015 at 4:22 AM, Raul D'Opazo <Raul.DOpazo@software.dell.com> wrote:

> I only have one node, in one server (Windows 2012), and Cassandra will
> grow up to 4TB approx. It is a Hyper-V virtual machine, with enough
> resources.
>
>
This is an extremely unusual and probably degenerate use of Cassandra.


> I have done snapshots and it is ok, because we don’t double the size in
> each snapshot, but I need to have other solution in case of disks problems.
>

I have no idea how snapshots work on Windows; if it is like Linux, each
snapshot is a set of hard links to the actual data files.
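
To illustrate what "hard links to the actual data files" implies, here is a small sketch of hard-link behavior in general. It does not touch Cassandra at all; the file names and the function `hard_link_demo` are made up for the example.

```python
import os
import tempfile


def hard_link_demo() -> dict:
    """Create a file, hard-link it, delete the original name, report results."""
    with tempfile.TemporaryDirectory() as tmp:
        data_file = os.path.join(tmp, "data.db")           # stands in for a data file
        snap_file = os.path.join(tmp, "snapshot-data.db")  # stands in for the snapshot entry
        with open(data_file, "wb") as f:
            f.write(b"x" * 4096)

        # A hard link is a second name for the same on-disk content,
        # so creating it does not copy the 4096 bytes.
        os.link(data_file, snap_file)
        same_file = os.path.samefile(data_file, snap_file)
        link_count = os.stat(data_file).st_nlink

        # Removing the original name leaves the content reachable
        # through the snapshot name.
        os.remove(data_file)
        with open(snap_file, "rb") as f:
            bytes_after_delete = len(f.read())

    return {
        "same_file": same_file,
        "link_count": link_count,
        "bytes_after_delete": bytes_after_delete,
    }
```

This is why a snapshot does not double disk usage, and also why it does not help with disk problems: both names point at the same blocks on the same disk.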


> I am thinking if Hyper-V virtual machine snapshots can be used to recover
> Cassandra in a consistent way. Is it possible?
>
>
Sure? If you quiesce writes to the system, or if you don't care about the
delta in the commit log between the Cassandra snapshot and the Hyper-V
snapshot, your snapshot will contain all the immutable data files you need
to restore.

Finally, I reiterate my confusion as to why you wish to do this unusual thing.

=Rob

Re: Hyper-V snapshot and Cassandra

Posted by Jeff Jirsa <je...@crowdstrike.com>.
As long as your Hyper-V/VSS snapshots include both the data directory and the commit log directory, they’re exactly as good as tolerating a single power outage: you should be able to load the SSTables, replay the commit log, and be fine.
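
One way to sanity-check that condition is sketched below. It assumes you already know which drive letters live on virtual disks captured by the VM snapshot, and that you pass in the directories configured as `data_file_directories` and `commitlog_directory` in cassandra.yaml. The function name and the example paths are hypothetical.

```python
from pathlib import PureWindowsPath


def dirs_missing_from_snapshot(cassandra_dirs, snapshotted_drives):
    """Return the Cassandra directories whose drive is not in the snapshot.

    cassandra_dirs: Windows paths of the data and commit log directories.
    snapshotted_drives: drive letters captured by the snapshot, e.g. ["D:"].
    """
    # Normalize "d:\" and "D:" to the same form before comparing.
    covered = {d.rstrip("\\/").upper() for d in snapshotted_drives}
    return [
        p for p in cassandra_dirs
        if PureWindowsPath(p).drive.upper() not in covered
    ]
```

For example, with the data directory on `D:` and the commit log on `E:`, a snapshot that only captures `D:` would be reported as missing the commit log directory, which is exactly the case where commit log replay would not be possible after a restore.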

Assuming you’re moving the Hyper-V/VSS snapshot to another host (using DPM or similar), it’s probably going to work the way you expect.

You’ll note, however, that Cassandra was designed to do the opposite of what you’re doing: rather than having one monolithic database that’s scaled up, the canonical use case for Cassandra is to have a number of smaller databases, so you still get the same capacity and throughput, but you also get high availability and fault tolerance. It may be worth noting (as Mr. Coli suggested) that you’re using Cassandra in an atypical fashion, and if you add more, smaller nodes, you’ll gain performance, HA, and capacity, and moving snapshots will be faster because there’s less data per system.
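
The "less data per system" point is simple arithmetic, sketched below under stated assumptions: data is spread evenly across nodes, every node copies its own snapshot in parallel at the same rate, and decimal units are used (1 TB = 10^12 bytes, 1 MB = 10^6 bytes). The function name and the `replication_factor` parameter are illustrative, not taken from the thread.

```python
def copy_hours(total_tb: float, nodes: int, mb_per_s: float,
               replication_factor: int = 1) -> float:
    """Estimate hours for each node to copy its own share of the data."""
    if nodes < 1 or not 1 <= replication_factor <= nodes:
        raise ValueError("need nodes >= 1 and 1 <= replication_factor <= nodes")
    if mb_per_s <= 0:
        raise ValueError("mb_per_s must be positive")
    # Each node stores total * RF / nodes when data is evenly distributed.
    per_node_bytes = total_tb * 1e12 * replication_factor / nodes
    seconds = per_node_bytes / (mb_per_s * 1e6)
    return seconds / 3600
```

Comparing `copy_hours(4, 1, rate)` with `copy_hours(4, 4, rate)` for whatever sustained rate the backup path really achieves shows the per-node copy time dropping in proportion to the node count; with a replication factor above 1 the gain is smaller, since each node holds more than its even share.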

