Posted to user@couchdb.apache.org by Simon Schwichtenberg <SS...@dspace.de> on 2021/05/07 09:27:15 UTC

Data backup in CouchDB cluster

Hi,

I wonder how you'd do backups of your data in a CouchDB cluster. The documentation does not mention backups of clusters explicitly (https://docs.couchdb.org/en/latest/maintenance/backups.html#database-backups).

When you have a cluster of three nodes and the nodes are set to n=3 and q=2 (see https://docs.couchdb.org/en/latest/cluster/sharding.html), I'd expect that every single node in the cluster has all the data and you can copy the .couch files from any of these three nodes. When you have 6 nodes with n=3 and q=2 this approach does not work anymore because every node has just a single shard. Please correct me if I am wrong.

What is the best practice for backing up a cluster?

This message is a follow-up from here: https://github.com/cloudant/couchbackup/issues/349

Thanks,
Simon
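
The shard arithmetic in the question can be checked with a small model. The sketch below assumes a simple round-robin placement of the n*q shard copies across the nodes. That placement rule is an illustrative assumption, not CouchDB's actual placement logic, and both helper names are hypothetical.

```python
from collections import defaultdict


def place_shards(num_nodes: int, n: int, q: int) -> dict:
    """Map each node index to the set of shard ranges it holds.

    Assumes round-robin placement (an illustration only, not CouchDB's
    real algorithm) and n <= num_nodes, so that the copies of one range
    land on distinct nodes.
    """
    if n > num_nodes:
        raise ValueError("n must not exceed the number of nodes")
    holdings = defaultdict(set)
    for shard_range in range(q):
        for copy in range(n):
            node = (shard_range * n + copy) % num_nodes
            holdings[node].add(shard_range)
    return dict(holdings)


def nodes_with_all_data(num_nodes: int, n: int, q: int) -> list:
    """Return the nodes that hold a copy of every shard range."""
    holdings = place_shards(num_nodes, n, q)
    return sorted(node for node, ranges in holdings.items() if len(ranges) == q)


three = nodes_with_all_data(num_nodes=3, n=3, q=2)
six = nodes_with_all_data(num_nodes=6, n=3, q=2)
print(three, six)  # prints: [0, 1, 2] []
```

Under this model, every node in the three-node cluster holds both ranges, while each node in the six-node cluster holds a single range, which matches the expectation stated in the question.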

Re: Data backup in CouchDB cluster

Posted by Jan Lehnardt <ja...@apache.org>.
Hi Simon,

There are multiple aspects to backup: data safety and
time to recovery (TTR).

If your only goal is to have a separate copy of your
data, backing up only one node of your database does
the trick.

However, if you want a short TTR, so that your cluster is
complete again as soon as possible, you want a backup of
each of your nodes, so you can replay your backup at
any time. Each node stores the data slightly differently
on a physical level, which means restoring data from
node A’s backup to node B is not trivial. Logically, though,
this means you have three backups when you might only
want one.

If TTR is not a concern, you can just back up a single
node to make sure you can restore that if needed, and rely
on CouchDB's internal backfill if a node is replaced after a
failure without data being restored from a backup. This will
take longer than restoring a node from its own backup.
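
Whether a single node's files amount to a complete copy depends on which shard ranges that node holds. A minimal sketch of that check, assuming the shard map has the shape returned by CouchDB's `GET /{db}/_shards` endpoint; the helper name and the node names in the sample are made up for illustration:

```python
def nodes_holding_full_copy(shards_response: dict) -> list:
    """Return the nodes that appear in every shard range of a shard map."""
    ranges = shards_response["shards"]
    if not ranges:
        return []
    common = set.intersection(*(set(nodes) for nodes in ranges.values()))
    return sorted(common)


# Sample data shaped like a GET /{db}/_shards response (node names are invented).
sample = {
    "shards": {
        "00000000-7fffffff": ["couchdb@node1", "couchdb@node2", "couchdb@node3"],
        "80000000-ffffffff": ["couchdb@node1", "couchdb@node2", "couchdb@node3"],
    }
}
full = nodes_holding_full_copy(sample)
print(full)
```

If the returned list is empty, no single node holds every range, and a one-node file backup would not cover the whole database.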

Best
Jan
— 
Professional Support for Apache CouchDB:
https://neighbourhood.ie/couchdb-support/

24/7 Observation for your CouchDB Instances:
https://opservatory.app



Re: Data backup in CouchDB cluster

Posted by Andrea Brancatelli <ab...@schema31.it.INVALID>.
I don't think so, but I've never tried it.

---

Andrea Brancatelli

On 2021-05-07 12:38, Willem van der Westhuizen wrote:

> This looks very interesting. Is it possible to do incremental backups with this approach, starting only from a given sequence number in the db?
> 
> Willem

Re: Data backup in CouchDB cluster

Posted by Willem van der Westhuizen <wi...@kwantu.net>.
This looks very interesting. Is it possible to do incremental backups
with this approach, starting only from a given sequence number in the db?

Willem
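
Whether couchdb-dump itself supports this is not established in this thread. Independently of that tool, CouchDB's HTTP API exposes a `_changes` feed that accepts a `since` parameter, which is the usual building block for incremental copies. A minimal sketch, with hypothetical helper names and a made-up sample response:

```python
from urllib.parse import quote, urlencode


def changes_url(base_url: str, db: str, since: str = "0") -> str:
    """Build a _changes URL that returns documents changed after `since`."""
    query = urlencode({"since": since, "include_docs": "true"})
    return f"{base_url.rstrip('/')}/{quote(db, safe='')}/_changes?{query}"


def split_changes(response: dict) -> tuple:
    """Split a _changes response into (live docs, deleted ids, last_seq)."""
    docs, deleted = [], []
    for row in response["results"]:
        if row.get("deleted"):
            deleted.append(row["id"])
        else:
            docs.append(row["doc"])
    return docs, deleted, response["last_seq"]


# Invented sample shaped like a _changes?include_docs=true response.
sample = {
    "results": [
        {"seq": "1-aaa", "id": "a", "changes": [{"rev": "1-x"}],
         "doc": {"_id": "a", "_rev": "1-x", "name": "first"}},
        {"seq": "2-bbb", "id": "b", "changes": [{"rev": "2-y"}], "deleted": True,
         "doc": {"_id": "b", "_rev": "2-y", "_deleted": True}},
    ],
    "last_seq": "2-bbb",
}
docs, deleted, last_seq = split_changes(sample)
print(changes_url("http://localhost:5984", "mydb", last_seq))
```

The idea is to store `last_seq` after each run and pass it as `since` on the next one, so that only documents changed in between are transferred.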


Re: Data backup in CouchDB cluster

Posted by Andrea Brancatelli <ab...@schema31.it.INVALID>.
We've been using this for a few years:

https://github.com/danielebailo/couchdb-dump

It has almost always worked flawlessly. It's fast and, to me, it's better
than backing up the .couch files for various reasons:

  * you can restore data on a cluster with a different N/Q layout
  * you can restore data on a different machine with a different
    cluster name / different IP / different whatever; .couch files
    include references to vm.args parameters
  * when you restore the DB you get a clean db without the tombstones
  * you can back up the db without local access to the machine,
    going through the standard HTTP port

Hope it helps you.
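
The general idea of a backup over HTTP can be sketched as follows. This is not the couchdb-dump implementation, only an illustration under the assumption that documents are read from `_all_docs?include_docs=true` and written back with `_bulk_docs`. The helper name and sample data are hypothetical, and attachments are ignored here.

```python
import json


def all_docs_to_bulk_payload(all_docs_response: dict) -> dict:
    """Convert an _all_docs?include_docs=true response into a _bulk_docs body.

    _rev is dropped so the documents can be inserted into an empty target
    database as new documents. The input is not modified.
    """
    docs = []
    for row in all_docs_response["rows"]:
        doc = dict(row["doc"])  # shallow copy, leaves the dump untouched
        doc.pop("_rev", None)
        docs.append(doc)
    return {"docs": docs}


# Invented sample shaped like GET /{db}/_all_docs?include_docs=true
dump = {
    "total_rows": 2,
    "offset": 0,
    "rows": [
        {"id": "a", "key": "a", "value": {"rev": "1-abc"},
         "doc": {"_id": "a", "_rev": "1-abc", "name": "first"}},
        {"id": "b", "key": "b", "value": {"rev": "3-def"},
         "doc": {"_id": "b", "_rev": "3-def", "name": "second"}},
    ],
}
payload = all_docs_to_bulk_payload(dump)
print(json.dumps(payload, indent=2))
```

Because `_all_docs` does not list deleted documents, a copy made this way carries no tombstones, and since it works at the document level it does not depend on the shard layout of the source cluster.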

---

Andrea Brancatelli
