Posted to user@cassandra.apache.org by Ngoc Minh VO <ng...@bnpparibas.com> on 2014/11/18 15:50:59 UTC

Cassandra backup via snapshots in production

Hello all,

We are looking for a solution to back up the data in our C* cluster (v2.0.x, 16 nodes, 4 x 500GB SSD, RF = 6 over 2 datacenters).
The main purpose is to protect us from human errors (e.g. unexpected manipulations: delete, drop tables, …).

We are thinking of:

- Backup: add a 2TB HDD to each node for daily/weekly C* snapshots.

- Restore: load the most recent snapshots, or the latest “non-corrupted” ones, and replay the missing data imports from another data source.
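
A minimal sketch of the backup step, assuming Python on each node. `nodetool snapshot -t <tag>` only hard-links the current SSTables inside the data directory, so getting them onto the extra HDD is a separate copy. The tag name, the directory arguments and the function names below are hypothetical, and the `<keyspace>/<table>/snapshots/<tag>` layout is the usual one but should be checked against the actual installation.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def take_snapshot(tag):
    """Ask the local node to snapshot all keyspaces under the given tag."""
    # Hard-links the current SSTables into
    # <data_dir>/<keyspace>/<table>/snapshots/<tag>/ on the same filesystem.
    subprocess.run(["nodetool", "snapshot", "-t", tag], check=True)


def find_snapshot_dirs(data_dir, tag):
    """Return every <keyspace>/<table>/snapshots/<tag> directory, sorted."""
    pattern = f"*/*/snapshots/{tag}"
    return sorted(p for p in Path(data_dir).glob(pattern) if p.is_dir())


def copy_snapshot(data_dir, backup_root, tag):
    """Copy one snapshot to the backup disk, keeping the keyspace/table layout."""
    data_dir = Path(data_dir)
    copied = []
    for snap_dir in find_snapshot_dirs(data_dir, tag):
        table_dir = snap_dir.parent.parent          # <data_dir>/<keyspace>/<table>
        relative = table_dir.relative_to(data_dir)  # <keyspace>/<table>
        target = Path(backup_root) / tag / relative
        shutil.copytree(snap_dir, target)           # creates parent directories
        copied.append(target)
    return copied


if __name__ == "__main__":
    # Dry run on a throwaway directory tree; no Cassandra node is needed.
    with tempfile.TemporaryDirectory() as tmp:
        data, backup = Path(tmp) / "data", Path(tmp) / "backup"
        snap = data / "ks1" / "events" / "snapshots" / "daily-2014-11-18"
        snap.mkdir(parents=True)
        (snap / "ks1-events-jb-1-Data.db").write_text("fake sstable")
        print(copy_snapshot(data, backup, "daily-2014-11-18"))
```

On a real node one would call `take_snapshot(tag)` first, then `copy_snapshot` once per data directory (with a separate backup subdirectory per disk if several SSDs hold data), and finally clear the on-node snapshot so it does not pin disk space on the SSDs.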

We would like to know whether anybody is using Cassandra’s backup feature in production and could share their experience with us.

Your help would be greatly appreciated.
Best regards,
Minh


This message and any attachments (the "message") is
intended solely for the intended addressees and is confidential. 
If you receive this message in error,or are not the intended recipient(s), 
please delete it and any copies from your systems and immediately notify
the sender. Any unauthorized view, use that does not comply with its purpose, 
dissemination or disclosure, either whole or partial, is prohibited. Since the internet 
cannot guarantee the integrity of this message which may not be reliable, BNP PARIBAS 
(and its subsidiaries) shall not be liable for the message if modified, changed or falsified. 
Do not print this message unless it is necessary,consider the environment.


Re: Cassandra backup via snapshots in production

Posted by Robert Coli <rc...@eventbrite.com>.
On Tue, Nov 18, 2014 at 6:50 AM, Ngoc Minh VO <ng...@bnpparibas.com>
wrote:

>   We are looking for a solution to backup data in our C* cluster (v2.0.x,
> 16 nodes, 4 x 500GB SSD, RF = 6 over 2 datacenters).
>
> The main purpose is to protect us from human errors (eg. unexpected
> manipulations: delete, drop tables, …).
>

https://github.com/JeremyGrosser/tablesnap

=Rob

RE: Cassandra backup via snapshots in production

Posted by Ngoc Minh VO <ng...@bnpparibas.com>.
Thanks a lot for your answers!

What we plan to do is:

- auto_snapshot: true

- if a human error happened on D-5:

  o we will bring the cluster offline
  o purge all data
  o import the snapshots taken prior to D-5 (and delete the snapshots taken after D-5)
  o upload all missing data between D-5 and D
  o bring the cluster online

Do you think it would work?
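
The selection step of that plan can be sketched as follows, assuming Python and snapshots identified by the date they were taken (the function name and the dates are hypothetical). A snapshot taken on the same day as the error is treated as suspect, since it may already contain the mistake.

```python
from datetime import date


def plan_restore(snapshot_dates, error_date):
    """Split snapshots into the one to restore and the ones to discard.

    Returns (restore_from, discard): the newest snapshot taken strictly
    before error_date, and every snapshot taken on or after it.
    """
    earlier = sorted(d for d in snapshot_dates if d < error_date)
    if not earlier:
        raise ValueError("no snapshot predates the error")
    discard = sorted(d for d in snapshot_dates if d >= error_date)
    return earlier[-1], discard


if __name__ == "__main__":
    snapshots = [date(2014, 11, d) for d in (10, 17, 24)]
    restore_from, discard = plan_restore(snapshots, error_date=date(2014, 11, 20))
    print("restore from:", restore_from)
    print("discard:", discard)
    print("replay imports from", restore_from, "onwards")
```

Note that in this sketch the replay window starts at the restored snapshot rather than at D-5, because anything written after that snapshot is absent from it. With daily snapshots the difference is at most a day; with weekly ones it can be most of a week.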

From: Jens Rantil [mailto:jens.rantil@tink.se]
Sent: Tuesday, 25 November 2014 10:03
To: user@cassandra.apache.org
Subject: Re: Cassandra backup via snapshots in production

> Truncate does trigger snapshot creation though

Doesn’t it? With “auto_snapshot: true” it should.

———
Jens Rantil
Backend engineer
Tink AB

Email: jens.rantil@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se


On Tue, Nov 25, 2014 at 9:21 AM, DuyHai Doan <do...@gmail.com> wrote:

True

A DELETE in CQL just creates a tombstone, so from the storage engine's point of view it is just adding some physical columns

Truncate does trigger snapshot creation though
On 21 Nov 2014 19:29, "Robert Coli" <rc...@eventbrite.com> wrote:
On Fri, Nov 21, 2014 at 8:40 AM, Jens Rantil <je...@tink.se> wrote:
> The main purpose is to protect us from human errors (eg. unexpected manipulations: delete, drop tables, …).

If that is the main purpose, having “auto_snapshot: true” in cassandra.yaml will be enough to protect you.

OP includes "delete" in their list of "unexpected manipulations", and auto_snapshot: true will not protect you in any way from DELETE.

=Rob
http://twitter.com/rcolidba




Re: Cassandra backup via snapshots in production

Posted by Jens Rantil <je...@tink.se>.
> Truncate does trigger snapshot creation though

Doesn’t it? With “auto_snapshot: true” it should.

———
Jens Rantil
Backend engineer
Tink AB

Email: jens.rantil@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se


On Tue, Nov 25, 2014 at 9:21 AM, DuyHai Doan <do...@gmail.com> wrote:

> True
> A DELETE in CQL just creates a tombstone, so from the storage engine's point
> of view it is just adding some physical columns
> Truncate does trigger snapshot creation though
> On 21 Nov 2014 19:29, "Robert Coli" <rc...@eventbrite.com> wrote:
>> On Fri, Nov 21, 2014 at 8:40 AM, Jens Rantil <je...@tink.se> wrote:
>>
>>> > The main purpose is to protect us from human errors (eg. unexpected
>>> manipulations: delete, drop tables, …).
>>>
>>> If that is the main purpose, having “auto_snapshot: true” in
>>> cassandra.yaml will be enough to protect you.
>>>
>>
>> OP includes "delete" in their list of "unexpected manipulations", and
>> auto_snapshot: true will not protect you in any way from DELETE.
>>
>> =Rob
>> http://twitter.com/rcolidba
>>

Re: Cassandra backup via snapshots in production

Posted by DuyHai Doan <do...@gmail.com>.
True

A DELETE in CQL just creates a tombstone, so from the storage engine's point
of view it is just adding some physical columns

Truncate does trigger snapshot creation though
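
To illustrate the difference, here is a toy model in Python. It is not Cassandra's storage engine, and every name in it is hypothetical; it only shows that a delete is recorded as one more write (a tombstone), while a truncate is the kind of operation that auto_snapshot hooks into.

```python
TOMBSTONE = object()  # marker standing in for a deletion record


class ToyTable:
    """Append-only store: every write, including a delete, adds a cell."""

    def __init__(self, auto_snapshot=True):
        self.auto_snapshot = auto_snapshot
        self.cells = []      # (key, timestamp, value) tuples, never edited
        self.snapshots = []  # copies of the cells taken before a truncate

    def insert(self, key, value, timestamp):
        self.cells.append((key, timestamp, value))

    def delete(self, key, timestamp):
        # A delete is just another write whose value is the tombstone marker,
        # so nothing distinguishes it from normal traffic and no snapshot
        # is taken.
        self.cells.append((key, timestamp, TOMBSTONE))

    def truncate(self):
        # Truncate removes data wholesale, so this is where the toy model
        # takes its automatic snapshot.
        if self.auto_snapshot:
            self.snapshots.append(list(self.cells))
        self.cells.clear()

    def read(self, key):
        """Return the newest value for key, or None if absent or deleted."""
        versions = [(ts, v) for k, ts, v in self.cells if k == key]
        if not versions:
            return None
        _, newest = max(versions, key=lambda pair: pair[0])
        return None if newest is TOMBSTONE else newest


if __name__ == "__main__":
    table = ToyTable()
    table.insert("user:1", "alice", timestamp=1)
    table.delete("user:1", timestamp=2)
    print(table.read("user:1"), len(table.cells), len(table.snapshots))
    table.truncate()
    print(len(table.cells), len(table.snapshots))
```

In the toy model the delete leaves two cells behind and no snapshot, which is why only a snapshot taken on a schedule, before the mistaken DELETE, can bring the row back.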
On 21 Nov 2014 19:29, "Robert Coli" <rc...@eventbrite.com> wrote:

> On Fri, Nov 21, 2014 at 8:40 AM, Jens Rantil <je...@tink.se> wrote:
>
>> > The main purpose is to protect us from human errors (eg. unexpected
>> manipulations: delete, drop tables, …).
>>
>> If that is the main purpose, having “auto_snapshot: true” in
>> cassandra.yaml will be enough to protect you.
>>
>
> OP includes "delete" in their list of "unexpected manipulations", and
> auto_snapshot: true will not protect you in any way from DELETE.
>
> =Rob
> http://twitter.com/rcolidba
>

Re: Cassandra backup via snapshots in production

Posted by Robert Coli <rc...@eventbrite.com>.
On Fri, Nov 21, 2014 at 8:40 AM, Jens Rantil <je...@tink.se> wrote:

> > The main purpose is to protect us from human errors (eg. unexpected
> manipulations: delete, drop tables, …).
>
> If that is the main purpose, having “auto_snapshot: true” in
> cassandra.yaml will be enough to protect you.
>

OP includes "delete" in their list of "unexpected manipulations", and
auto_snapshot: true will not protect you in any way from DELETE.

=Rob
http://twitter.com/rcolidba

Re: Cassandra backup via snapshots in production

Posted by Jens Rantil <je...@tink.se>.
> The main purpose is to protect us from human errors (e.g. unexpected manipulations: delete, drop tables, …).

If that is the main purpose, having “auto_snapshot: true” in cassandra.yaml will be enough to protect you.

Regarding backup, I have a small script that creates a named snapshot and then, for each sstable in it, encrypts the file, uploads it to S3 and deletes the snapshotted copy. It took me an hour to write and roll out to all our nodes. The whole process is currently logged, but eventually I will also send an e-mail if a backup fails.
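
The per-sstable loop of such a script could look roughly like this in Python. This is a sketch of the idea and not the actual script: the `encrypt` and `upload` callables are hypothetical placeholders for a real cipher and a real S3 client, and reading each file fully into memory is a simplification that would not suit large sstables.

```python
import tempfile
from pathlib import Path


def backup_snapshot(snapshot_dir, encrypt, upload):
    """Encrypt, upload and then delete every file in one snapshot directory.

    encrypt(data: bytes) -> bytes and upload(name: str, data: bytes) -> None
    are supplied by the caller; an exception from either one stops the run
    and leaves the remaining files in place.
    """
    done = []
    for sstable in sorted(Path(snapshot_dir).iterdir()):
        if not sstable.is_file():
            continue
        upload(sstable.name, encrypt(sstable.read_bytes()))
        # Remove the snapshot copy only after the upload returned without
        # error. Snapshot files are hard links, so the live sstable stays.
        sstable.unlink()
        done.append(sstable.name)
    return done


if __name__ == "__main__":
    # Demo with stand-ins: byte reversal is NOT encryption, and the dict
    # plays the role of the remote bucket.
    with tempfile.TemporaryDirectory() as tmp:
        snap = Path(tmp)
        (snap / "ks1-events-jb-1-Data.db").write_bytes(b"fake sstable")
        bucket = {}
        print(backup_snapshot(snap, lambda data: data[::-1], bucket.__setitem__))
        print(sorted(bucket))
```

Passing the cipher and the uploader in as arguments keeps the loop testable without credentials, and makes it easy to add logging or an e-mail alert around the call.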


———
Jens Rantil
Backend engineer
Tink AB

Email: jens.rantil@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se


On Tue, Nov 18, 2014 at 3:52 PM, Ngoc Minh VO <ng...@bnpparibas.com>
wrote:

> Hello all,
> We are looking for a solution to back up the data in our C* cluster (v2.0.x, 16 nodes, 4 x 500GB SSD, RF = 6 over 2 datacenters).
> The main purpose is to protect us from human errors (e.g. unexpected manipulations: delete, drop tables, …).
> We are thinking of:
> - Backup: add a 2TB HDD to each node for daily/weekly C* snapshots.
> - Restore: load the most recent snapshots, or the latest “non-corrupted” ones, and replay the missing data imports from another data source.
> We would like to know whether anybody is using Cassandra’s backup feature in production and could share their experience with us.
> Your help would be greatly appreciated.
> Best regards,
> Minh