Posted to user@cassandra.apache.org by Satoshi Hikida <sa...@gmail.com> on 2016/07/14 06:54:33 UTC

What is the merit of incremental backup

Hi,

I would like to understand the actual advantages of using incremental
backups.

I've read through the DataStax documentation, which lists the following
benefits of incremental backups:

- They allow storing backups offsite without transferring entire snapshots
- Combined with snapshots, they can provide a more recent RPO
(Recovery Point Objective)

Is my understanding correct? I would appreciate it if someone could give me
some advice or correct me.

References:
- DataStax, "Enabling incremental backups",
http://docs.datastax.com/en/cassandra/2.2/cassandra/operations/opsBackupIncremental.html

Regards,
Satoshi

Re: What is the merit of incremental backup

Posted by Satoshi Hikida <sa...@gmail.com>.
Hi Rajath,

Thank you for your reply.

But I'm not sure why SSTables that have not been compacted make repair take
longer. Could you explain the reason in more detail?

My guess is that the SSTables stored in the backups directory are never
repaired, so the repair process has to exchange Merkle trees and update the
actual data to restore consistency. That takes longer than simply
transferring the data that the node being repaired should hold from the
existing nodes.

I also think this is one of the drawbacks of using incremental backups.
What do you think?

Regards,
Satoshi


On Sat, Jul 16, 2016 at 3:03 AM, Rajath Subramanyam <ra...@gmail.com>
wrote:

> Hi Satoshi,
>
> Incremental Backup if set to True, copies SSTables to the backup folder as
> soon as a SSTable is flushed to disk. Hence these backed up SSTables miss
> out on the opportunity to go through compaction. Does that explain the
> longer time ?
>
> - Rajath
>
> ------------------------
> Rajath Subramanyam
>
>
> On Fri, Jul 15, 2016 at 12:20 AM, Satoshi Hikida <sa...@gmail.com>
> wrote:
>
>> Hi Prasenjit
>>
>> Thank you for your reply.
>>
>> However, I doubt that incremental backup can reduce RTO. I think the
>> demerit of incremental backup is to take longer repair time rather than
>> without incremental backup.
>>
>> Because I've compared the repair time of two cases like below.
>>
>> (a) snapshot(10GB, full repaired) + incremental backup(1GB)
>> (b) snapshot(10GB, full repaired)
>>
>> Each case consists of 3 node cluster, replication factor is 3 and total
>> data size is 12GB/node. And we assume one node got failure then we restore
>> the node. The result showed that case (b) is faster than case (a). The
>> repair process of the token ranges included in incremental backup was very
>> slow. However, the just transferring replicated data from existing nodes to
>> repairing node is faster than repair.
>>
>> So far, I think Pros and Cons of incremental back is as following:
>>
>> - Pros (There are already agreed by you)
>> - It allows storing backups offsite without transferring entire snapshots
>> - With incremental backups and snapshots, it can provide more recent RPO
>> (Recovery Point Objective)
>> - Cons
>> - It takes much longer repair time rather than without incremental backup
>> (longer RTO)
>>
>>
>> Is it correct understand? I would appreciate you can give me any advice
>> or ideas if I was misunderstanding.
>>
>>
>> Regards,
>> Satoshi
>>
>>
>> On Fri, Jul 15, 2016 at 1:46 AM, Prasenjit Sarkar <
>> prasenjit.sarkar@datos.io> wrote:
>>
>>> Hi Satoshi
>>>
>>> You are correct that incremental backups offer you the opportunity to
>>> reduce the amount of data you need to transfer offsite. On the recovery
>>> path, you need to piece together the full backup and subsequent incremental
>>> backups.
>>>
>>> However, where incremental backups help is with respect to the RTO due
>>> to the data reduction effect you mentioned. The RPO can be reduced only if
>>> you take more frequent incremental backups than full backups.
>>>
>>> Hope this helps,
>>> Prasenjit
>>>
>>> On Wed, Jul 13, 2016 at 11:54 PM, Satoshi Hikida <sa...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I want to know the actual advantage of using incremental backup.
>>>>
>>>> I've read through the DataStax document and it says the merit of using
>>>> incremental backup is as follows:
>>>>
>>>> - It allows storing backups offsite without transferring entire
>>>> snapshots
>>>> - With incremental backups and snapshots, it can provide more recent
>>>> RPO (Recovery Point Objective)
>>>>
>>>> Is my understanding correct? I would appreciate if someone gives me
>>>> some advice or correct me.
>>>>
>>>> References:
>>>> - DataStax, "Enabling incremental backups",
>>>> http://docs.datastax.com/en/cassandra/2.2/cassandra/operations/opsBackupIncremental.html
>>>>
>>>> Regards,
>>>> Satoshi
>>>>
>>>
>>>
>>
>

Re: What is the merit of incremental backup

Posted by Rajath Subramanyam <ra...@gmail.com>.
Hi Satoshi,

When incremental backup is set to true (incremental_backups in
cassandra.yaml), Cassandra copies each SSTable to the backups folder, as a
hard link, as soon as the SSTable is flushed to disk. Hence these backed-up
SSTables miss the opportunity to go through compaction. Does that explain
the longer time?
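
To make the mechanism concrete, here is a minimal Python sketch that lists
the files currently sitting in the backups folders, for example before
shipping them offsite. The function name is made up for illustration, and
the layout <data_dir>/<keyspace>/<table>/backups/ is an assumption that
should be checked against your own installation.

```python
from pathlib import Path


def list_incremental_backups(data_dir):
    """Return incremental backup file names grouped by 'keyspace/table_dir'.

    Assumed layout (verify on your cluster):
    <data_dir>/<keyspace>/<table_dir>/backups/<sstable component files>
    """
    grouped = {}
    for path in sorted(Path(data_dir).glob("*/*/backups/*")):
        if path.is_file():
            table_dir = path.parent.parent
            key = f"{table_dir.parent.name}/{table_dir.name}"
            grouped.setdefault(key, []).append(path.name)
    return grouped
```

Calling list_incremental_backups() with the directory configured under
data_file_directories in cassandra.yaml returns one entry per table that
has flushed SSTables since the backups folder was last cleared. These files
are the uncompacted SSTables discussed above.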

- Rajath

------------------------
Rajath Subramanyam



Re: What is the merit of incremental backup

Posted by Satoshi Hikida <sa...@gmail.com>.
Hi Prasenjit

Thank you for your reply.

However, I doubt that incremental backups can reduce RTO. I think a
drawback of incremental backups is that repair takes longer than it does
without them.

I say this because I compared the repair time in the following two cases:

(a) snapshot (10GB, fully repaired) + incremental backup (1GB)
(b) snapshot (10GB, fully repaired)

Each case used a 3-node cluster with a replication factor of 3 and a total
data size of 12GB per node. We assumed that one node failed and then
restored that node. The result showed that case (b) was faster than case
(a). Repairing the token ranges covered by the incremental backup was very
slow, whereas simply transferring the replicated data from the existing
nodes to the node being repaired was faster than repair.

So far, I think the pros and cons of incremental backups are as follows:

- Pros (you have already agreed with these)
  - They allow storing backups offsite without transferring entire
    snapshots
  - Combined with snapshots, they can provide a more recent RPO
    (Recovery Point Objective)
- Cons
  - Repair takes much longer than it does without incremental backups
    (longer RTO)


Is my understanding correct? I would appreciate any advice or ideas if I
have misunderstood something.


Regards,
Satoshi



Re: What is the merit of incremental backup

Posted by Prasenjit Sarkar <pr...@datos.io>.
Hi Satoshi

You are correct that incremental backups offer you the opportunity to
reduce the amount of data you need to transfer offsite. On the recovery
path, you need to piece together the full backup and subsequent incremental
backups.
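
As a rough illustration of "piecing together" a restore, the Python sketch
below picks the newest full snapshot at or before the desired restore point
and then every incremental backup taken after that snapshot, in
chronological order. The function name, the (timestamp, name) tuples, and
the backup names are all hypothetical; this is not a Cassandra API.

```python
from datetime import datetime


def restore_chain(snapshots, incrementals, target):
    """Pick the backups needed to restore up to `target`.

    snapshots and incrementals are lists of (timestamp, name) tuples.
    Returns (snapshot_name, [incremental names in chronological order]).
    """
    usable = [s for s in snapshots if s[0] <= target]
    if not usable:
        raise ValueError("no full snapshot at or before the target time")
    # Newest snapshot that is not later than the restore point.
    base_time, base_name = max(usable)
    # Only incrementals taken after that snapshot and up to the target.
    later = sorted(i for i in incrementals if base_time < i[0] <= target)
    return base_name, [name for _, name in later]
```

Any incremental taken before the chosen snapshot is already contained in
it, which is why the sketch skips those entries.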

However, where incremental backups help is the RTO, because of the data
reduction effect you mentioned. The RPO can be reduced only if you take
incremental backups more frequently than full backups.
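
The arithmetic behind both points can be sketched in Python as follows.
This is a simplified model under stated assumptions: a fixed backup
schedule, one backup shipped offsite per day, and a constant amount of new
data per day. The function names and parameters are invented for
illustration.

```python
def worst_case_rpo_hours(full_interval_h, incremental_interval_h=None):
    """Worst-case data-loss window: time since the most recent backup of any kind."""
    if incremental_interval_h is None:
        return full_interval_h
    # Incrementals only help if they are taken more often than full backups.
    return min(full_interval_h, incremental_interval_h)


def offsite_transfer_gb(days, snapshot_gb, daily_change_gb, use_incrementals):
    """Total data shipped offsite over `days`, one backup shipped per day."""
    if use_incrementals:
        # One full snapshot up front, then only each day's new SSTables.
        return snapshot_gb + daily_change_gb * (days - 1)
    # A full snapshot every day.
    return snapshot_gb * days
```

For example, with a hypothetical 10GB snapshot and 1GB of new data per day,
a week of daily full snapshots ships 70GB offsite, while one snapshot plus
six daily incrementals ships 16GB. With daily full backups the worst-case
RPO is 24 hours; adding hourly incrementals brings it down to 1 hour, while
incrementals taken less often than the full backups do not improve it.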

Hope this helps,
Prasenjit
