Posted to user@couchdb.apache.org by Sho Fukamachi <sh...@gmail.com> on 2008/07/15 17:22:37 UTC

replication usage? creating dupes?

O Wise ones,

While attempting to use Futon's built-in replicator function to sync a  
local DB with a (brand new) remote one, the replication kept timing  
out. I restarted it several times, and after it finally completed I  
was delighted to find it actually had created more records on the  
remote than exist locally. Hooray, free records!

Unfortunately they seem to be dupes. It was only about 3000 records;  
1000 or so of them are dupes. This leaves me with a couple of questions:

- is there a "safe" way to do replication that doesn't create dupes?
- is Couch really sensitive to network fluctuations? I admit, I'm on  
the other side of the planet from the test server, but there's no  
packet loss or anything I can detect
- what is the current best practise to keep two databases in sync? Ie,  
2-way (multi master) replication. No dupes. Assume imperfect network  
(ie, over public internet). This is kind of one of the reasons I am  
using Couch for this project so .. I would like to do it right!

I also wonder if anyone has started work on a 3rd party  
synchronisation tool yet? I'm thinking something that just  
periodically queries both DBs, makes a list of unsynced _ids and/or  
_revs and then PUTs one to the other as necessary. Maybe something  
nice in Ruby? Not that I'm knocking Futon of course, it's just that  
in-browser JS seems a little .. fragile, especially after today's  
experience.
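The diff-and-PUT idea above can be sketched quickly. This is purely illustrative (in practice the id-to-rev maps would come from each database, e.g. via _all_docs), and the document ids and revisions below are made up:

```python
def unsynced_ids(local_revs, remote_revs):
    # Given {doc_id: rev} maps from two databases, return the ids that are
    # missing or at a different revision on the other side, in each direction.
    to_remote = [i for i, r in local_revs.items() if remote_revs.get(i) != r]
    to_local = [i for i, r in remote_revs.items() if local_revs.get(i) != r]
    return to_remote, to_local

local = {"a": "1-aaa", "b": "1-bbb"}
remote = {"b": "1-bbb", "c": "1-ccc"}
print(unsynced_ids(local, remote))  # (['a'], ['c'])
```

(As Damien notes further down, the built-in replicator already does this incrementally, so a tool like this shouldn't actually be needed.)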

Thanks in advance for any suggestions/wisdom.

Sho


Re: replication usage? creating dupes?

Posted by Sho Fukamachi <sh...@gmail.com>.
Thanks for the reply, Damien,


On 16/07/2008, at 1:44 AM, Damien Katz wrote:

> Replication causing duplicate records? We've never seen that, and it  
> shouldn't be possible. Maybe you replicated from 2 different source  
> databases to the same target?

Hm. Of course it is possible that this is, at heart, a PEBKAC issue.  
I will try to replicate my experience; perhaps I was too hasty in  
jumping on the ML ...

> If you can zip up the two databases and mail them to me, or post  
> them some where publicly accessible, I can take a look and see if I  
> can figure out what happened.

I unfortunately was testing with copies of production user data, and if  
I did that my boss would literally eviscerate me; regardless, I  
unthinkingly deleted it upon seeing the error without considering its  
significance. However, I will be testing it thoroughly with  
non-production data and will send that if it happens again. I've tried  
the replication another two times without problems, so I really don't  
know what's up.

Anyway, thanks for the clarifications - I had thought it shouldn't be  
possible for that to happen, and everything you described was just as  
I had thought/assumed! Perhaps I just got "lucky" : )


Sho

Re: replication usage? creating dupes?

Posted by Damien Katz <da...@apache.org>.
On Jul 15, 2008, at 11:22 AM, Sho Fukamachi wrote:

> O Wise ones,
>
> While attempting to use Futon's built-in replicator function to sync  
> a local DB with a (brand new) remote one, the replication kept  
> timing out. I restarted it several times, and after it finally  
> completed I was delighted to find it actually had created more  
> records on the remote than exist locally. Hooray, free records!
>

Replication causing duplicate records? We've never seen that, and it  
shouldn't be possible. Maybe you replicated from 2 different source  
databases to the same target?

> Unfortunately they seem to be dupes. It was only about 3000 records;  
> 1000 or so of them are dupes. This leaves me with a couple of  
> questions:
>
> - is there a "safe" way to do replication that doesn't create dupes?

Using the HTTP replicator is the correct way: you issue an HTTP  
replication request and it performs the replication. Futon uses that.

>
> - is Couch really sensitive to network fluctuations? I admit, I'm on  
> the other side of the planet from the test server, but there's no  
> packet loss or anything I can detect

No, replication can fail at any point in the process and it will  
recover without problem at the next replication.

>
> - what is the current best practise to keep two databases in sync?  
> Ie, 2-way (multi master) replication. No dupes. Assume imperfect  
> network (ie, over public internet). This is kind of one of the  
> reasons I am using Couch for this project so .. I would like to do  
> it right!
>

Replicate the databases on a schedule. CouchDB will figure out what  
has changed incrementally.
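A scheduled two-way sync amounts to a pair of /_replicate requests, one in each direction, re-run on a timer (cron or similar); because replication is incremental, re-running them only transfers what has changed. A sketch, with placeholder URLs:

```python
import json

def two_way_sync_bodies(db_a, db_b):
    # The two /_replicate bodies for a bidirectional sync between db_a
    # and db_b. POSTing these on a schedule keeps both sides converged.
    return [
        json.dumps({"source": db_a, "target": db_b}),
        json.dumps({"source": db_b, "target": db_a}),
    ]

for body in two_way_sync_bodies("http://a.example.com:5984/db",
                                "http://b.example.com:5984/db"):
    print(body)
```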

> I also wonder if anyone has started work on a 3rd party  
> synchronisation tool yet? I'm thinking something that just  
> periodically queries both DBs, makes a list of unsynced _ids and/or  
> _revs and then PUTs one to the other as necessary. Maybe something  
> nice in Ruby? Not that I'm knocking Futon of course, it's just that  
> in-browser JS seems a little .. fragile, especially after today's  
> experience.

The replicator is actually written in Erlang. There is currently an  
issue with HTTP timeouts during long replications, and we need to make  
the replication async with the browser request to fix it. However,  
there should never be new records (with new IDs) created during  
replication; it doesn't work that way.

>
>
> Thanks in advance for any suggestions/wisdom.
>
> Sho
>

If you can zip up the two databases and mail them to me, or post them  
somewhere publicly accessible, I can take a look and see if I can  
figure out what happened.