Posted to user@couchdb.apache.org by Niket Patel <ne...@me.com> on 2008/08/22 01:17:30 UTC

Backup of couchdb

Hello,

We have started using CouchDB in a production system hosted on EC2,
so I'm a bit worried about backups.

I figure it can be as simple as backing up the database file plus
replication. Still, I'd like to know: is anyone using CouchDB in
production who has strong suggestions on this topic from experience?
I found hardly anything on the wiki or the web about CouchDB backup.

Thanks
Niket


Re: Backup of couchdb

Posted by Richard Heycock <rg...@roughage.com.au>.
Excerpts from Niket Patel's message of Fri Aug 22 09:17:30 +1000 2008:
> Hello,
> 
> We started using CouchDB in production system hosted on EC2
> So, I'm bit worried about backup,
> 
> I figured it can be as simple as backing up database file and  
> replication, still I like to know,
> Is there anyone using couchdb in production and have strong  
> suggestions on topic from their experiences.
> I hardly found anything on wiki or web for CouchDB backup.

I'm currently using couchdb on ec2. I simply dump it to s3 and then
restore it when I start another instance. It's not the cleanest way
of doing it but it works well for what I'm doing.

BUT ...

Amazon have just released "Elastic Block Store", which means you can
attach a persistent block storage device to your instance. So you can
put your couchdb storage on this partition and it will be persisted
from instance to instance. This makes me *very* happy!

  http://aws.typepad.com/aws/2008/08/amazon-elastic.html

rgh

> Thanks
> Niket
-- 
+61 (0) 410 646 369
[e]:  rgh@neoss.com.au
[im]: rgh@jabber.org

You're worried criminals will continue to penetrate into cyberspace, and
I'm worried complexity, poor design and mismanagement will be there to meet
them - Marcus Ranum

Re: Backup of couchdb

Posted by Jan Lehnardt <ja...@apache.org>.
On Aug 22, 2008, at 09:47, Jan Lehnardt wrote:

>
> On Aug 22, 2008, at 08:12, Niket Patel wrote:
>
>>
>> On Aug 22, 2008, at 11:09 AM, Jason Huggins wrote:
>>
>>> Just to be clear... is it okay to make a "hot backup" of that file
>>> while the server is still running? (I would think so, give it's
>>> append-only storage design.)
>>
>> More information on this will be helpful from couchdb developer.
>> If backup doesn't have some recently added records between backup  
>> start and finish..
>> thats fine
>>
>> But if this can corrupt db file, we have to think other backup  
>> options
>> replication should not considered as backup.
>
>
> After each write, a database file is guaranteed to be consistent on
> disk (unless you are on an OS that doesn't handle fsync() properly).
>
> If you disable writes to a database during the time of a backup, you
> can make a "hot" copy. You might not consider that "hot" anymore
> though. Reads can still go to the file.
>
> Or have two nodes, one of which is your live-node and the other one
> is your backup-node. Have the backup-node replicate from the live
> node up to a certain point. Then shut it down (or leave it idle) and
> make a filesystem copy of the replicated database.

In addition to that, note that two nodes can run on the same machine,
so there's no need for multiple EC2 instances or more hardware.

Cheers
Jan
--

Re: Backup of couchdb

Posted by Jason Huggins <ja...@jrandolph.com>.
On Fri, Aug 22, 2008 at 9:17 AM, Damien Katz <da...@apache.org> wrote:
> Actually, you can copy a live database file from the OS at anytime without
> problem. Doesn't matter if its being updated, or even if its being
> compacted, the CouchDB never-overwrite storage format ensures it should just
> work without issue.

Thank you for the unambiguous statement, Damien. On Noah's suggestion,
I'll close this thread with a wiki page. Lastly, the tester in me
would love to see some tests *proving* that  hot copies are safe...
Until then -- considering the source -- I'll take your word for it.
:-)

- Jason

Re: Backup of couchdb

Posted by Damien Katz <da...@apache.org>.
On Aug 22, 2008, at 3:47 AM, Jan Lehnardt wrote:

>
> On Aug 22, 2008, at 08:12, Niket Patel wrote:
>
>>
>> On Aug 22, 2008, at 11:09 AM, Jason Huggins wrote:
>>
>>> Just to be clear... is it okay to make a "hot backup" of that file
>>> while the server is still running? (I would think so, give it's
>>> append-only storage design.)
>>
>> More information on this will be helpful from couchdb developer.
>> If backup doesn't have some recently added records between backup  
>> start and finish..
>> thats fine
>>
>> But if this can corrupt db file, we have to think other backup  
>> options
>> replication should not considered as backup.
>
>
> After each write, a database file is guaranteed to be consistent on
> disk (unless you are on an OS that doesn't handle fsync() properly).
>
> If you disable writes to a database during the time of a backup, you
> can make a "hot" copy. You might not consider that "hot" anymore
> though. Reads can still go to the file.

Actually, you can copy a live database file from the OS at any time
without problem. It doesn't matter if it's being updated, or even if it's
being compacted; CouchDB's never-overwrite storage format ensures it
should just work without issue.
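
A minimal sketch of such an OS-level copy, in Python. The function name
hot_copy and the ".partial" suffix are illustrative assumptions, not part
of CouchDB. The copy is written under a temporary name and renamed once it
is complete, so a half-written file in the backup directory is never
mistaken for a finished backup.

```python
import os
import shutil


def hot_copy(db_path, backup_dir):
    """Copy a database file into backup_dir while the server keeps running.

    hot_copy is a hypothetical helper: it does nothing CouchDB-specific,
    it only performs an ordinary file copy followed by a rename.
    """
    os.makedirs(backup_dir, exist_ok=True)
    final_path = os.path.join(backup_dir, os.path.basename(db_path))
    partial_path = final_path + ".partial"
    # Plain OS-level copy of the (possibly still growing) database file.
    shutil.copy2(db_path, partial_path)
    # Rename within the same directory so the finished backup appears at once.
    os.replace(partial_path, final_path)
    return final_path
```

It could be called as hot_copy("/var/lib/couchdb/my-database.couch",
"/mnt/backup"), where both paths are placeholders for wherever your
installation keeps its files.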

-Damien


>
>
> Or have two nodes, one of which is your live-node and the other one
> is your backup-node. Have the backup-node replicate from the live
> node up to a certain point. Then shut it down (or leave it idle) and
> make a filesystem copy of the replicated database.
>
> Use filesystem snapshots. In theory you should be able to grab
> a complete and live snapshot from under CouchDB's feet, I have
> never done this though and you should verify that it works :)
>
> Cheers
> Jan
> --


Re: Backup of couchdb

Posted by Jan Lehnardt <ja...@apache.org>.
On Aug 22, 2008, at 08:12, Niket Patel wrote:

>
> On Aug 22, 2008, at 11:09 AM, Jason Huggins wrote:
>
>> Just to be clear... is it okay to make a "hot backup" of that file
>> while the server is still running? (I would think so, give it's
>> append-only storage design.)
>
> More information on this will be helpful from couchdb developer.
> If backup doesn't have some recently added records between backup  
> start and finish..
> thats fine
>
> But if this can corrupt db file, we have to think other backup options
> replication should not considered as backup.


After each write, a database file is guaranteed to be consistent on
disk (unless you are on an OS that doesn't handle fsync() properly).

If you disable writes to a database during the time of a backup, you
can make a "hot" copy. You might not consider that "hot" anymore
though. Reads can still go to the file.

Or have two nodes, one of which is your live-node and the other one
is your backup-node. Have the backup-node replicate from the live
node up to a certain point. Then shut it down (or leave it idle) and
make a filesystem copy of the replicated database.

Or use filesystem snapshots. In theory you should be able to grab
a complete, live snapshot from under CouchDB's feet. I have never
done this, though, so you should verify that it works :)
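
For the two-node option above, a rough sketch of asking the backup node to
pull from the live node through CouchDB's HTTP replication endpoint
(POST /_replicate). The function names, the database name "mydb" and port
5985 for the second node are assumptions for illustration; 5984 is the
usual CouchDB port. The target database is assumed to exist already.

```python
import json
import urllib.request


def build_replication_request(backup_server, source_url, target_url):
    """Build a POST to the backup node's /_replicate endpoint.

    Hypothetical helper: source_url and target_url are full database URLs.
    """
    body = json.dumps({"source": source_url, "target": target_url})
    return urllib.request.Request(
        backup_server.rstrip("/") + "/_replicate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def replicate(backup_server, source_url, target_url):
    """Send the replication request and return the decoded JSON reply."""
    request = build_replication_request(backup_server, source_url, target_url)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, replicate("http://localhost:5985", "http://localhost:5984/mydb",
"http://localhost:5985/mydb") would ask a second node on the same machine to
copy mydb from the live node. Once it returns, the backup node can be left
idle while its database file is copied.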

Cheers
Jan
--

Re: Backup of couchdb

Posted by Noah Slater <ns...@apache.org>.
On Fri, Aug 22, 2008 at 03:45:19PM -0500, Jason Huggins wrote:
> On Fri, Aug 22, 2008 at 6:43 AM, Noah Slater <ns...@apache.org> wrote:
> > I would like to request that when we get closure to this thread someone takes
> > the time to write it up on the wiki.
>
> Done!

Thanks!

-- 
Noah Slater, http://people.apache.org/~nslater/

Re: Backup of couchdb

Posted by Niket Patel <ne...@me.com>.
On Aug 23, 2008, at 2:15 AM, Jason Huggins wrote:

> http://wiki.apache.org/couchdb/FilesystemBackups
>
> It is sparse, but it at least plainly states the larger point that
> nothing more than filesystem copies are required. If I'm wrong in any
> way, please let me know (and fix the wiki page, too)!

Jason, the wiki information is enough for now, as it includes Damien's
statement.

I will also try to do some tests. Our db files are currently small, so
I'm not sure I can create good tests.
Please put your findings and your test procedure on the wiki.

Re: Backup of couchdb

Posted by Jason Huggins <ja...@jrandolph.com>.
On Fri, Aug 22, 2008 at 6:43 AM, Noah Slater <ns...@apache.org> wrote:
> I would like to request that when we get closure to this thread someone takes
> the time to write it up on the wiki.

Done!

http://wiki.apache.org/couchdb/FilesystemBackups

It is sparse, but it at least plainly states the larger point that
nothing more than filesystem copies are required. If I'm wrong in any
way, please let me know (and fix the wiki page, too)!

Cheers,
Jason

Re: Backup of couchdb

Posted by Noah Slater <ns...@apache.org>.
On Fri, Aug 22, 2008 at 11:42:11AM +0530, Niket Patel wrote:
> More information on this will be helpful from couchdb developer.

I would love to chip in but I don't know enough about this area.

I would like to request that when we get closure to this thread someone takes
the time to write it up on the wiki.

Best,

-- 
Noah Slater, http://people.apache.org/~nslater/

Re: Backup of couchdb

Posted by Niket Patel <ne...@me.com>.
On Aug 22, 2008, at 11:09 AM, Jason Huggins wrote:

> Just to be clear... is it okay to make a "hot backup" of that file
> while the server is still running? (I would think so, give it's
> append-only storage design.)

More information on this from a CouchDB developer would be helpful.
If the backup is missing some records added between the start and the
finish of the backup, that's fine.

But if this can corrupt the db file, we have to think about other backup
options. Replication should not be considered a backup.

Re: Backup of couchdb

Posted by Jason Huggins <ja...@jrandolph.com>.
On Thu, Aug 21, 2008 at 6:35 PM, Chris Anderson <jc...@grabb.it> wrote:
> On the flip-side, if you are restoring from backup, or otherwise want
> to clone an entire couchdb database, the fastest method is through
> duplicating the my-database.couch file.

Just to be clear... is it okay to make a "hot backup" of that file
while the server is still running? (I would think so, given its
append-only storage design.)

Other databases (Oracle, Postgres) make a big distinction between hot
backup (while the server is running) and cold backups (when it's not).
Usually, the safest backup method with them is a cold backup -- but that
is impractical, given the downtime. So, usually, those other dbs ship
tools to "dump" the database to another file as their hot backup
solution. If "hot" copying of couchdb db files is safe and a "for
free" feature of the design... that's really, really cool. :-)

- Jason

Re: Backup of couchdb

Posted by Niket Patel <ne...@me.com>.
Perfect. I'm planning for both. Thanks

On Aug 22, 2008, at 5:05 AM, Chris Anderson wrote:

> In your case I'd probably set up replication on a cron job to put the
> data on another ec2 instance. And then periodically backup the .couch
> files to s3.


Re: Backup of couchdb

Posted by Chris Anderson <jc...@grabb.it>.
Niket,

If you maintain a backup using CouchDB replication, you'll have the
advantage of incremental updates, so while the first backup might take
longer than it would doing a purely file-based approach, you'll save
time overall.

On the flip-side, if you are restoring from backup, or otherwise want
to clone an entire couchdb database, the fastest method is through
duplicating the my-database.couch file.

In your case I'd probably set up replication on a cron job to put the
data on another ec2 instance. And then periodically back up the .couch
files to s3.



On Thu, Aug 21, 2008 at 4:17 PM, Niket Patel <ne...@me.com> wrote:
> Hello,
>
> We started using CouchDB in production system hosted on EC2
> So, I'm bit worried about backup,
>
> I figured it can be as simple as backing up database file and replication,
> still I like to know,
> Is there anyone using couchdb in production and have strong suggestions on
> topic from their experiences.
> I hardly found anything on wiki or web for CouchDB backup.
>
> Thanks
> Niket
>
>



-- 
Chris Anderson
http://jchris.mfdz.com