Posted to user@couchdb.apache.org by Jeldrik <je...@kopfsalat.org> on 2014/11/19 16:38:53 UTC

Keep a Replication when moving the CouchDB

Hi there,

I already asked this question on #couchdb, but I'm not really satisfied
with the answers I got, because some questions were left unanswered on
IRC. I thought it would be a good idea to put the question to a wider
group. I will paste both my original question and the answers I got in
#couchdb.

Many thanks for your help,
Jeldrik

==

This was the question (I just added some information):

We are moving a CouchDB to new hardware, but we have a pull replication
(couch_backup.example.com) which we want to keep. Our planned steps are
as follows:
1. rsync db files from couch_live.example.com to couch_new.example.com
2. compact dbs on couch_new (this is necessary because compression was
turned off on couch_live and we want it turned on now)
# Meanwhile couch_live is still live: clients push data to it and the
couch_backup replication pulls from it
3. start pull replication on couch_new with source couch_live and target
couch_new for all dbs
4. once all dbs are nearly in sync, take a short downtime until the data
is fully in sync, then switch over to couch_new
5. shut down couch_live and the replication to couch_backup
6. new data is coming in to couch_new
7. start pull replication on couch_backup with source couch_new
 
Now the question is how to keep the couch_backup replication. If I
understand correctly, the replication depends on two values. The first
is the URI of the source. So could a switch from
couch_live.example.com/db1 to couch_new.example.com/db1 break the
replication? The second is the seq no. (or more precisely, the seq
nos.). At the moment we turn off couch_live, all three of couch_live,
couch_backup and couch_new will have the same data, so from the point of
view of the data we have consistency. But the seq nos. may differ. Of
course couch_new will immediately receive new data. So how can I
convince couch_backup to start replicating from that one point of data
consistency?

==

And these were responses and my following questions on IRC to it:

15:09 <mar-ia> jeldrik: couch_backup will continue from the last data it
has. You should not need to worry about it. If I have understood
everything correctly :)
15:37 <jeldrik> mar-ia: thx. but how sure are you about that? the
problem is that couch_backup is on a remote site. and it happened to
them when we had a similar system move.
15:44 <mar-ia> jeldrik: Every node knows the last change it has. So when
it starts a replication it asks for all the changes made after that
point. It does not get the complete history, only the latest version (as
always).
15:49 <jeldrik> but if i got it right it does that with the checkpoints
aka seq no., doesn't it? and we had situations where the seq. no of a
replication differed from the source. so couldn't it happen that the new
system has a lower seq no. but new data and because of that after the
change the backup couch asks like for "everything after 'higher seq no'"
and then gets nothing
15:50 <jeldrik> what would break the consistency of the backup

Re: Keep a Replication when moving the CouchDB

Posted by Jeldrik <je...@kopfsalat.org>.
On 11/19/2014 10:00 PM, Matthieu Rakotojaona wrote:
> Hi Jeldrik,
>
> Sequence numbers are per-db helpers for replication and incremental
> views; they are opaque data for users. The best way to see them is like
> ETags: they mean something only to the database that holds them, so you
> can query that database with "since=", but external components (that
> includes you as a user) don't have to worry about their meaning.
> Actually, if I refer to this thread [0], BigCouch uses strings and there
> are discussions about switching to strings. Don't take my word for it
> though, I'm not in the inner circles :)
>
> This is why there is no transferring of sequence numbers. Even if you
> remained on the same couchdb instance but had two databases with the
> same data, the sequence numbers could differ. Incidentally,
> replications, which rely on sequence numbers, aren't supposed to
> be transferred from database to database. You have to set up a new
> replication every time you change a database name/server.
>
> But it's okay, here is what would happen in the worst case for you:
>
> - you set up a new replication from couch_new to couch_backup as soon as
>   couch_new is running, even though it's not receiving user data
>   directly
>
> - replicator runs, checks replication history between couch_new and
>   couch_backup, sees there is none, starts from scratch
>
> - replicator gets all changes in couch_new from the beginning, sees if
>   they exist in couch_backup. It checks _every_ doc so it might take
>   some time
>
> - since the data already exists on the destination, replicator won't
>   transfer any data
>
> - at the end, replicator will save a checkpoint for this replication
>   stating "there's been a replication between those 2 databases, up to
>   source id xxx and target id yyy" (note: this checkpoint is saved in
>   two parts, one on each end. But as a user you don't care). Now the
>   next time replicator runs, it will not start from scratch.
>
> In your situation, I'd set up a replication couch_new => couch_backup
> as soon as couch_new is up. You'd have a 3-way replication:
>
> - couch_live => couch_new
> - couch_live => couch_backup
> - couch_new => couch_backup
>
> which is totally fine and how CouchDB's replication protocol was
> intended to work. This way, the moment you turn couch_live down, the
> backup replication will already be up and running and all the data is
> where it should be. Don't forget to remove the couch_live =>
> couch_backup replication.
>
> I hope this answers your questions!
>
> [0] http://thread.gmane.org/gmane.comp.db.couchdb.devel/11724
>
Hi Matthieu,

thanks a lot. That makes things clearer to me!

Best wishes,
Jeldrik

Re: Keep a Replication when moving the CouchDB

Posted by Matthieu Rakotojaona <ma...@gmail.com>.

Hi Jeldrik,

Sequence numbers are per-db helpers for replication and incremental
views; they are opaque data for users. The best way to see them is like
ETags: they mean something only to the database that holds them, so you
can query that database with "since=", but external components (that
includes you as a user) don't have to worry about their meaning.
Actually, if I refer to this thread [0], BigCouch uses strings and there
are discussions about switching to strings. Don't take my word for it
though, I'm not in the inner circles :)

This is why there is no transferring of sequence numbers. Even if you
remained on the same couchdb instance but had two databases with the
same data, the sequence numbers could differ. Incidentally,
replications, which rely on sequence numbers, aren't supposed to
be transferred from database to database. You have to set up a new
replication every time you change a database name/server.
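
To make the point about differing sequence numbers concrete, here is a
toy model in Python. It is not CouchDB code: ToyDB, put and pull are
made-up names, and the model only assumes that every write bumps the
database's own counter and that replication copies just the latest
revision of each document.

```python
class ToyDB:
    """Toy stand-in for a database: every write bumps a local counter."""

    def __init__(self):
        self.docs = {}        # doc id -> latest revision (hypothetical format)
        self.update_seq = 0   # local sequence number, meaningful only here

    def put(self, doc_id, rev):
        self.docs[doc_id] = rev
        self.update_seq += 1


def pull(source, target):
    """Copy the latest revision of every doc that the target lacks."""
    for doc_id, rev in source.docs.items():
        if target.docs.get(doc_id) != rev:
            target.put(doc_id, rev)


live = ToyDB()
live.put("doc1", "1-a")
live.put("doc1", "2-b")   # an update: same doc, new revision
live.put("doc2", "1-c")

new = ToyDB()
pull(live, new)

print(live.docs == new.docs)             # same data on both sides
print(live.update_seq, new.update_seq)   # but different sequence numbers
```

In this sketch both databases end up with identical documents, yet
their counters differ, so a checkpoint recorded against one of them says
nothing about the other.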

But it's okay, here is what would happen in the worst case for you:

- you set up a new replication from couch_new to couch_backup as soon as
  couch_new is running, even though it's not receiving user data
  directly

- replicator runs, checks replication history between couch_new and
  couch_backup, sees there is none, starts from scratch

- replicator gets all changes in couch_new from the beginning, sees if
  they exist in couch_backup. It checks _every_ doc so it might take
  some time

- since the data already exists on the destination, replicator won't
  transfer any data

- at the end, replicator will save a checkpoint for this replication
  stating "there's been a replication between those 2 databases, up to
  source id xxx and target id yyy" (note: this checkpoint is saved in
  two parts, one on each end. But as a user you don't care). Now the
  next time replicator runs, it will not start from scratch.
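
The steps above can be sketched as a small simulation. This is a
simplified model, not the real replicator: the function name, the
"new=>backup" id and the single checkpoints dict are assumptions made
for illustration (the real checkpoint is stored on both ends, as noted
above).

```python
def replicate(source_changes, target_revs, checkpoints, rep_id):
    """Toy replication run.

    source_changes: list of (seq, doc_id, rev) in source order
    target_revs:    dict doc_id -> rev already stored on the target
    checkpoints:    dict rep_id -> last source seq already processed
    Returns (docs_checked, docs_transferred).
    """
    since = checkpoints.get(rep_id, 0)   # no history means start from scratch
    checked = transferred = 0
    last_seq = since
    for seq, doc_id, rev in source_changes:
        if seq <= since:
            continue                     # already covered by the checkpoint
        checked += 1
        if target_revs.get(doc_id) != rev:
            target_revs[doc_id] = rev    # only missing revisions are copied
            transferred += 1
        last_seq = seq
    checkpoints[rep_id] = last_seq       # remember where this run stopped
    return checked, transferred


changes_on_new = [(1, "doc1", "2-b"), (2, "doc2", "1-c")]
backup = {"doc1": "2-b", "doc2": "1-c"}  # backup already has the same data
checkpoints = {}

first = replicate(changes_on_new, backup, checkpoints, "new=>backup")
second = replicate(changes_on_new, backup, checkpoints, "new=>backup")
print(first, second)
```

The first run checks every document but transfers nothing; the second
run starts from the saved checkpoint and has nothing left to check.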

In your situation, I'd set up a replication couch_new => couch_backup
as soon as couch_new is up. You'd have a 3-way replication:

- couch_live => couch_new
- couch_live => couch_backup
- couch_new => couch_backup

which is totally fine and how CouchDB's replication protocol was
intended to work. This way, the moment you turn couch_live down, the
backup replication will already be up and running and all the data is
where it should be. Don't forget to remove the couch_live =>
couch_backup replication.
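
As a rough sketch, the three replications could be described with
documents like the ones built below. The host names and db1 come from
the question; plain HTTP, the default port 5984 and the _id naming
scheme are assumptions. Documents of this shape can be stored in the
_replicator database of the node that should run the replication (on
couch_backup for a pull), one per database.

```python
import json


def replication_doc(source_host, target_host, db):
    """Build a document describing one continuous replication."""
    return {
        "_id": f"{source_host}_to_{target_host}_{db}",  # hypothetical naming scheme
        "source": f"http://{source_host}.example.com:5984/{db}",
        "target": f"http://{target_host}.example.com:5984/{db}",
        "continuous": True,
    }


during_migration = [
    replication_doc("couch_live", "couch_new", "db1"),
    replication_doc("couch_live", "couch_backup", "db1"),
    replication_doc("couch_new", "couch_backup", "db1"),
]

# After couch_live is shut down, only the last replication should remain.
after_migration = [d for d in during_migration if "couch_live" not in d["source"]]

print(json.dumps(after_migration, indent=2))
```

The script only builds and prints the documents; it does not contact
any server.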

I hope this answers your questions!

[0] http://thread.gmane.org/gmane.comp.db.couchdb.devel/11724

-- 
Matthieu Rakotojaona