Posted to user@couchdb.apache.org by David Gubler <da...@vshn.ch> on 2016/10/10 14:57:51 UTC

Backup best practices

Hi list,

I'm currently implementing CouchDB backup for one of our customers. The
backup should be generic, i.e. not tied to the needs of that specific
customer.

However, I can't find any consensus on how to properly implement backup
for CouchDB.

System: Ubuntu 16.04 with CouchDB 1.6.0, backup software is Burp.

According to
http://mail-archives.apache.org/mod_mbox/incubator-couchdb-user/200808.mbox/%3C32800028-9286-47C8-82A5-1ECC25667FDA@apache.org%3E,
I can just copy the database file from the running server. That should
work with Burp.

However, other sources say that the backup will be much smaller if I
dump each database. There are a bunch of tools to do this; Ubuntu has
the python-couchdb package (0.10-1.1) with the couchdb-dump and
couchdb-load tools. As far as I understand it, such a dump will not
include some (meta)information about the documents, like old revisions.
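For what it's worth, a dump of this kind essentially boils down to reading `_all_docs?include_docs=true` (a real CouchDB API endpoint) and keeping only the winning revision of each document. A minimal sketch of that extraction step (the function name and sample data are made up for illustration):

```python
import json

def extract_current_docs(all_docs_response):
    """Keep only the current (winning) revision of each non-deleted
    document from a CouchDB _all_docs?include_docs=true response.
    Old revisions and deletion tombstone bodies are not in the response,
    which is why such a dump is smaller than the .couch file itself."""
    docs = []
    for row in all_docs_response.get("rows", []):
        doc = row.get("doc")
        if doc is not None and not doc.get("_deleted", False):
            docs.append(doc)
    return docs

# Example _all_docs response, trimmed to the relevant fields:
response = {
    "total_rows": 2,
    "rows": [
        {"id": "a", "doc": {"_id": "a", "_rev": "3-abc", "value": 1}},
        {"id": "b", "doc": {"_id": "b", "_rev": "1-def", "value": 2}},
    ],
}
print(json.dumps(extract_current_docs(response)))
```

The flip side, as noted above, is that anything outside the winning revisions (revision history, purged data) is simply not in the dump.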

Thus my main questions are:
- Can the python-couchdb tools (couchdb-dump, couchdb-load) be relied
upon as backup tools?
- Are these tools fast enough for larger (several GB) data sets?
- Are there realistic use cases in which these dumps are insufficient
because they miss some (meta)data that was present in the original
database?
- Any experiences with backup software simply copying the database files?

Thanks!

Best regards,

David

-- 

David Gubler
System Engineer

VSHN AG | Neugasse 10 | CH-8005 Zürich
T: +41 44 545 53 00 | M: +41 76 461 23 11 | http://vshn.ch


Re: Backup best practices

Posted by Kiril Stankov <ki...@open-net.biz>.
Hi,

What I do is run replication to a backup CouchDB instance.
I also have some scripts which simply stop the backup CouchDB instance,
copy the files to another volume, and then restart CouchDB.
This has worked like a charm for a year now and has been tested for
restore, though not on a different OS.

------------------------------------------------------------------------
*With best regards,*
Kiril Stankov,
CEO





Re: Backup best practices

Posted by Jan Lehnardt <ja...@apache.org>.
One addition, if you have views, copy the corresponding .view files first, then the .couch file.

If you do it the other way around, the restore will have to rebuild the view files from scratch.
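In script form, that ordering might look like this (a sketch; the function name is made up, and the only interesting part is the sort):

```python
def copy_order(filenames):
    """Order CouchDB data files for backup: .view index files first,
    then .couch database files. Copying in this order means the .couch
    file is never older than its views, so a restore can reuse the view
    indexes instead of rebuilding them from scratch."""
    views = [f for f in filenames if f.endswith(".view")]
    couches = [f for f in filenames if f.endswith(".couch")]
    return views + couches

print(copy_order(["mydb.couch", ".mydb_design/abc.view"]))
# → ['.mydb_design/abc.view', 'mydb.couch']
```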

Best
Jan
--



-- 
Professional Support for Apache CouchDB:
https://neighbourhood.ie/couchdb-support/


Re: Backup best practices

Posted by "jtuchel@objektfabrik.de" <jt...@objektfabrik.de>.
David,

we use a simple file copy procedure to back up our CouchDB from our
production servers.
Restoring by simply copying the .couch file to another server or back
to the origin machine works without any trouble ... unless we take the
.couch file to a Windows machine (our production environment is Ubuntu
Linux). The Windows boxes cannot find some of the documents in such
restored backups.

I got the hint on this list to always stop CouchDB in the Windows
services panel before copying the file onto the machine. Since this
wasn't too long ago, I cannot yet tell you whether it reliably solves
that problem. It seems CouchDB prefers replication as the backup
strategy, but that doesn't fit all situations. If you simply want to
store a file on some backup medium, replication can be complicated
(firewalls, executables on the backup target environment, etc.).

So I am eager to read all the answers to your question as well, because 
we need a reliable way to store plain files (not a second couch machine) 
- especially for long term archives (10+ years).

Joachim




-- 
-----------------------------------------------------------------------
Objektfabrik Joachim Tuchel          mailto:jtuchel@objektfabrik.de
Fliederweg 1                         http://www.objektfabrik.de
D-71640 Ludwigsburg                  http://joachimtuchel.wordpress.com
Telefon: +49 7141 56 10 86 0         Fax: +49 7141 56 10 86 1