Posted to user@cassandra.apache.org by Bhuvan Rawal <bh...@gmail.com> on 2016/06/16 22:51:20 UTC

Re: Backup strategy

Hi Vasu,

Planet Cassandra has a documentation page with basic information about
migrating to Cassandra from MySQL, including what to expect and what not
to. It can be found here
<http://planetcassandra.org/mysql-to-cassandra-migration/>.

I had a look at this slide deck
<http://www.slideshare.net/planetcassandra/migration-best-practices-from-rdbms-to-cassandra-without-a-hitch>
a while back. It provides a pretty reliable 4 Phase Sync strategy, starting
from Slide 31. The Q&A session of the talk is also informative -
http://www.doanduyhai.com/blog/?p=1757.

Best Regards,
Bhuvan

On Fri, Jun 17, 2016 at 4:03 AM, <va...@gmail.com> wrote:

> Hi ,
>
> I'm from relational world recently started working on Cassandra. I'm just
> wondering what is backup best practices for DB around 100 Tb with multi DC
> setup.
>
>
> Thanks,
> Vasu

Re: Backup strategy

Posted by Dennis Lovely <dl...@aegisco.com>.
A snapshot flushes your memtables to disk, after which you can stream your
sstables out.  Incremental backups would be the differences that have
occurred since your last snapshot, as far as I'm aware.  Since it's
impractical to constantly stream out full snapshots (depending on the
density of your data on disk), incremental backups are a faster way to keep
a remote location synced with your sstable changes, which makes it much
more likely that you can successfully restore to a point in time.
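
To illustrate the idea, here is a minimal sketch (not a Cassandra tool) of
how the files for a restore could be chosen: take the newest snapshot at or
before the target time, then add the incremental backups made after that
snapshot and up to the target. The function name and the use of plain epoch
timestamps are assumptions made for illustration. The restore can only be as
fine-grained as the times at which sstables were flushed and backed up.

```python
from bisect import bisect_right
from typing import List, Tuple


def plan_restore(snapshot_times: List[float],
                 incremental_times: List[float],
                 target: float) -> Tuple[float, List[float]]:
    """Return (base snapshot time, incremental backup times to apply).

    All values are epoch timestamps; this helper is hypothetical and only
    selects which backups are needed, it does not copy or load any files.
    """
    snaps = sorted(snapshot_times)
    # Index of the first snapshot strictly after the target time.
    idx = bisect_right(snaps, target)
    if idx == 0:
        raise ValueError("no snapshot at or before the requested time")
    base = snaps[idx - 1]
    # Incrementals taken after the base snapshot, up to the target time.
    incrementals = sorted(t for t in incremental_times if base < t <= target)
    return base, incrementals
```

For example, with snapshots at 100, 200 and 300 and incrementals at 150, 210,
250 and 310, a restore to time 260 would start from the snapshot at 200 and
apply the incrementals from 210 and 250.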

On Thu, Jun 16, 2016 at 4:35 PM, Rakesh Kumar <ra...@gmail.com>
wrote:

> On Thu, Jun 16, 2016 at 7:30 PM, Bhuvan Rawal <bh...@gmail.com> wrote:
> > 2. Snapshotting : Hardlinks of sstables will get created. This is a very
> > fast process and latest data is captured into sstables after flushing
> > memtables, snapshots will be created in snapshots directory. But snapshot
> > does not provide you the feature to go back to a certain point in time
> > but incremental backups give you that feature.
>
> Does that mean that the only point-in-time recovery possible is using
> incremental backup. In other words C* does not have a concept of
> rolling forward commit logs to a point in time (like RDBMS do). Pls
> clarify.  thanks
>

Re: Backup strategy

Posted by Rakesh Kumar <ra...@gmail.com>.
On Thu, Jun 16, 2016 at 7:30 PM, Bhuvan Rawal <bh...@gmail.com> wrote:
> 2. Snapshotting : Hardlinks of sstables will get created. This is a very
> fast process and latest data is captured into sstables after flushing
> memtables, snapshots will be created in snapshots directory. But snapshot
> does not provide you the feature to go back to a certain point in time but
> incremental backups give you that feature.

Does that mean that the only point-in-time recovery possible is using
incremental backups? In other words, does C* not have a concept of
rolling forward commit logs to a point in time (like RDBMSs do)? Please
clarify.  Thanks.

Re: Backup strategy

Posted by Dennis Lovely <dl...@aegisco.com>.
Periodic snapshots + incremental backups are, I think, pretty good in terms
of restoring to a point in time.  But you must manage cleaning up your
snapshots + incremental backups on your own.  I believe that tablesnap (
https://github.com/JeremyGrosser/tablesnap) is a pretty decent approach for
keeping your sstables, per node, synced to a location off of your host (on
S3, in fact).  Not sure how portable it is to other object storage
services, however.  S3 with a Lifecycle policy that transitions objects to
Glacier would likely be the most cost-effective option for long-term
retention.
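
As a rough sketch of what such a Lifecycle policy could look like, the helper
below builds one rule in the dictionary shape that the S3 lifecycle
configuration API expects. The rule ID, prefix and day counts are made-up
values for illustration; pick them to match your own retention requirements.

```python
from typing import Any, Dict


def glacier_lifecycle_rule(rule_id: str,
                           prefix: str,
                           transition_days: int,
                           expire_days: int) -> Dict[str, Any]:
    """Build one S3 lifecycle rule: move objects under `prefix` to Glacier
    after `transition_days`, then delete them after `expire_days`."""
    if expire_days <= transition_days:
        raise ValueError("expire_days must be greater than transition_days")
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": transition_days, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": expire_days},
    }


# Hypothetical values: archive after 30 days, delete after one year.
rule = glacier_lifecycle_rule("archive-sstables", "backups/", 30, 365)

# With boto3 installed and credentials configured, the rule could be applied
# roughly like this (the bucket name is an assumption):
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-cassandra-backups",
#       LifecycleConfiguration={"Rules": [rule]},
#   )
```

The expiration part also takes care of some of the cleanup on the remote
side; cleanup of snapshots and incremental backups on the nodes themselves
still has to be handled separately.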

On Thu, Jun 16, 2016 at 4:30 PM, Bhuvan Rawal <bh...@gmail.com> wrote:

> Also if we talk about backup strategy for Cassandra Data then essentially
> there are couple of strategies that are adopted:
>
> 1. Incremental Backups. The old sstables will remain inside a backup
> directory and can be shipped to a storage location like AWS Glacier, etc.
> 2. Snapshotting : Hardlinks of sstables will get created. This is a very
> fast process and latest data is captured into sstables after flushing
> memtables, snapshots will be created in snapshots directory. But snapshot
> does not provide you the feature to go back to a certain point in time but
> incremental backups give you that feature.
>
> Depending on the use case, you can use 1 or 2 or both.
>
> On Fri, Jun 17, 2016 at 4:46 AM, Bhuvan Rawal <bh...@gmail.com> wrote:
>
>> What kind of data are we talking here?
>> Is it time series data with infrequent updates and only inserts or
>> frequently updated data. How frequently is old data read. I ask this
>> because your Node size planning and Compaction Strategy will essentially
>> depend on these.
>>
>> I have known people go upto 3-5 TB per node if data is not updated
>> frequently.
>>
>> Regards,
>> Bhuvan
>>
>> On Fri, Jun 17, 2016 at 4:31 AM, <va...@gmail.com> wrote:
>>
>>> Bhuvan,
>>>
>>> Thanks for the info but actually I'm not looking for migration strategy.
>>> just want to backup strategy and retention policy best practices
>>>
>>> Thanks,
>>> Vasu
>>>
>>> On Jun 16, 2016, at 6:51 PM, Bhuvan Rawal <bh...@gmail.com> wrote:
>>>
>>> Hi Vasu,
>>>
>>> Planet Cassandra has a documentation page for basic info about migrating
>>> to cassandra from MySQL. What to expect and what not to. It can be found
>>> here <http://planetcassandra.org/mysql-to-cassandra-migration/>.
>>>
>>> I had a look at this slide
>>> <http://www.slideshare.net/planetcassandra/migration-best-practices-from-rdbms-to-cassandra-without-a-hitch> a
>>> while back. It provides a pretty reliable 4 Phase Sync strategy, starting
>>> from Slide 31. Also the QA session of the talk is informative too -
>>> http://www.doanduyhai.com/blog/?p=1757.
>>>
>>> Best Regards,
>>> Bhuvan
>>>
>>> On Fri, Jun 17, 2016 at 4:03 AM, <va...@gmail.com> wrote:
>>>
>>>> Hi ,
>>>>
>>>> I'm from relational world recently started working on Cassandra. I'm
>>>> just wondering what is backup best practices for DB around 100 Tb with
>>>> multi DC setup.
>>>>
>>>>
>>>> Thanks,
>>>> Vasu
>>>
>>>
>>>
>>
>

Re: Backup strategy

Posted by Bhuvan Rawal <bh...@gmail.com>.
Also, if we talk about backup strategy for Cassandra data, there are
essentially a couple of strategies that are adopted:

1. Incremental backups: Each flushed sstable is hard-linked into a backup
directory, where it remains until you remove it, and can be shipped to a
storage location like AWS Glacier, etc.
2. Snapshotting: Hardlinks of the sstables are created in the snapshots
directory. This is a very fast process, and the latest data is captured
into sstables because memtables are flushed first. But a snapshot does not
let you go back to a certain point in time, whereas incremental backups
give you that feature.

Depending on the use case, you can use 1 or 2 or both.
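
For reference, a snapshot is taken with "nodetool snapshot -t <tag>
<keyspace>", and incremental backups are switched on with
"incremental_backups: true" in cassandra.yaml. Both end up as files under each
table's data directory. The sketch below collects those files so they can be
shipped elsewhere. The helper name is hypothetical, and the data directory
you pass in depends on your installation.

```python
from pathlib import Path
from typing import Dict, List


def list_backup_files(data_dir: str, snapshot_tag: str) -> Dict[str, List[Path]]:
    """Collect snapshot and incremental backup files under a data directory.

    Expected on-disk layout:
      <data_dir>/<keyspace>/<table>-<id>/snapshots/<tag>/<files>
      <data_dir>/<keyspace>/<table>-<id>/backups/<files>
    """
    root = Path(data_dir)
    snapshot = sorted(
        p for p in root.glob(f"*/*/snapshots/{snapshot_tag}/*") if p.is_file()
    )
    incremental = sorted(
        p for p in root.glob("*/*/backups/*") if p.is_file()
    )
    return {"snapshot": snapshot, "incremental": incremental}
```

Calling list_backup_files("/var/lib/cassandra/data", "daily") (path and tag
are assumptions) would return the files belonging to the "daily" snapshot and
everything currently sitting in the backups directories.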

On Fri, Jun 17, 2016 at 4:46 AM, Bhuvan Rawal <bh...@gmail.com> wrote:

> What kind of data are we talking here?
> Is it time series data with infrequent updates and only inserts or
> frequently updated data. How frequently is old data read. I ask this
> because your Node size planning and Compaction Strategy will essentially
> depend on these.
>
> I have known people go upto 3-5 TB per node if data is not updated
> frequently.
>
> Regards,
> Bhuvan
>
> On Fri, Jun 17, 2016 at 4:31 AM, <va...@gmail.com> wrote:
>
>> Bhuvan,
>>
>> Thanks for the info but actually I'm not looking for migration strategy.
>> just want to backup strategy and retention policy best practices
>>
>> Thanks,
>> Vasu
>>
>> On Jun 16, 2016, at 6:51 PM, Bhuvan Rawal <bh...@gmail.com> wrote:
>>
>> Hi Vasu,
>>
>> Planet Cassandra has a documentation page for basic info about migrating
>> to cassandra from MySQL. What to expect and what not to. It can be found
>> here <http://planetcassandra.org/mysql-to-cassandra-migration/>.
>>
>> I had a look at this slide
>> <http://www.slideshare.net/planetcassandra/migration-best-practices-from-rdbms-to-cassandra-without-a-hitch> a
>> while back. It provides a pretty reliable 4 Phase Sync strategy, starting
>> from Slide 31. Also the QA session of the talk is informative too -
>> http://www.doanduyhai.com/blog/?p=1757.
>>
>> Best Regards,
>> Bhuvan
>>
>> On Fri, Jun 17, 2016 at 4:03 AM, <va...@gmail.com> wrote:
>>
>>> Hi ,
>>>
>>> I'm from relational world recently started working on Cassandra. I'm
>>> just wondering what is backup best practices for DB around 100 Tb with
>>> multi DC setup.
>>>
>>>
>>> Thanks,
>>> Vasu
>>
>>
>>
>

Re: Backup strategy

Posted by Bhuvan Rawal <bh...@gmail.com>.
What kind of data are we talking about here?
Is it time series data with only inserts and infrequent updates, or
frequently updated data? How frequently is old data read? I ask this
because your node size planning and compaction strategy will essentially
depend on these.

I have known people to go up to 3-5 TB per node if the data is not updated
frequently.
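
As a back-of-the-envelope illustration of what that density means for a
cluster of the size mentioned in this thread, the sketch below estimates the
node count. It assumes the 100 TB figure is the total on-disk size; if it is
the size before replication, pass the replication factor as well. The
function is hypothetical and ignores headroom for compaction and snapshots.

```python
import math


def nodes_needed(data_tb: float, per_node_tb: float,
                 replication_factor: int = 1) -> int:
    """Estimate how many nodes are needed to hold `data_tb` terabytes at a
    given per-node density, rounding up to whole nodes."""
    total_on_disk_tb = data_tb * replication_factor
    return math.ceil(total_on_disk_tb / per_node_tb)


low_density = nodes_needed(100, 3)   # 3 TB per node
high_density = nodes_needed(100, 5)  # 5 TB per node
```

With 100 TB on disk this comes to 34 nodes at 3 TB per node and 20 nodes at
5 TB per node, which is also the number of nodes any backup process has to
cover.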

Regards,
Bhuvan

On Fri, Jun 17, 2016 at 4:31 AM, <va...@gmail.com> wrote:

> Bhuvan,
>
> Thanks for the info but actually I'm not looking for migration strategy.
> just want to backup strategy and retention policy best practices
>
> Thanks,
> Vasu
>
> On Jun 16, 2016, at 6:51 PM, Bhuvan Rawal <bh...@gmail.com> wrote:
>
> Hi Vasu,
>
> Planet Cassandra has a documentation page for basic info about migrating
> to cassandra from MySQL. What to expect and what not to. It can be found
> here <http://planetcassandra.org/mysql-to-cassandra-migration/>.
>
> I had a look at this slide
> <http://www.slideshare.net/planetcassandra/migration-best-practices-from-rdbms-to-cassandra-without-a-hitch> a
> while back. It provides a pretty reliable 4 Phase Sync strategy, starting
> from Slide 31. Also the QA session of the talk is informative too -
> http://www.doanduyhai.com/blog/?p=1757.
>
> Best Regards,
> Bhuvan
>
> On Fri, Jun 17, 2016 at 4:03 AM, <va...@gmail.com> wrote:
>
>> Hi ,
>>
>> I'm from relational world recently started working on Cassandra. I'm just
>> wondering what is backup best practices for DB around 100 Tb with multi DC
>> setup.
>>
>>
>> Thanks,
>> Vasu
>
>
>

Re: Backup strategy

Posted by va...@gmail.com.
Bhuvan,

Thanks for the info, but actually I'm not looking for a migration strategy. I just want backup strategy and retention policy best practices.

Thanks,
Vasu

> On Jun 16, 2016, at 6:51 PM, Bhuvan Rawal <bh...@gmail.com> wrote:
> 
> Hi Vasu,
> 
> Planet Cassandra has a documentation page for basic info about migrating to cassandra from MySQL. What to expect and what not to. It can be found here.
> 
> I had a look at this slide a while back. It provides a pretty reliable 4 Phase Sync strategy, starting from Slide 31. Also the QA session of the talk is informative too - http://www.doanduyhai.com/blog/?p=1757. 
> 
> Best Regards,
> Bhuvan
> 
>> On Fri, Jun 17, 2016 at 4:03 AM, <va...@gmail.com> wrote:
>> Hi ,
>> 
>> I'm from relational world recently started working on Cassandra. I'm just wondering what is backup best practices for DB around 100 Tb with multi DC setup.
>> 
>> 
>> Thanks,
>> Vasu
>