Posted to user@couchdb.apache.org by Chris Stockton <ch...@gmail.com> on 2011/05/25 21:23:29 UTC

Thoughts on server wide replication

I was thinking that if there were server-wide replication we could
support many more users. Currently we are at a few thousand, and we are
starting to feel the expense of all of the TCP connections and
replication tasks; the calls to status to monitor that they are
running are also getting very expensive and noticeable.

It seems to me that an API for server-wide replication would greatly
benefit our use patterns, and I'm sure those of anyone else who scales
through many databases (one database is one customer).

Here are a few ideas for such a feature. I'm throwing them out here
just to see if they spark interest.

We will call this API _replicate_server for example purposes; the name
is open to discussion.

To begin server-wide replication:
  curl -vX POST http://localhost:5984/_replicate_server \
    -d '{"source":"example-database","target":"http://example.org/example-database"}'
    -> {"ok": true, <... other details>}

To begin server-wide replication with a filtering function. Here,
maybe the function could return false to not replicate a database,
true to replicate it, or an array of filters to replicate it using
those filtering functions? This could be simple or very robust:
  function(dbName, req) {
    // replicate only databases whose name starts with the prefix
    return dbName.indexOf("my_interesting_dbs_prefix") === 0;
  }

  curl -vX POST http://localhost:5984/_replicate_server \
    -d '{"source":"example-database","target":"http://example.org/example-database",
         "filter": "filters/server_filter"}'
    -> {"ok": true, <... other details>}
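As a rough sketch of how the three return values could be interpreted, assuming false skips a database, true replicates it unfiltered, and an array names document filters to apply. The (dbName, req) signature comes from the proposal; planReplication, the "_archive" suffix, and "filters/no_drafts" are hypothetical names used only for illustration, not an existing CouchDB API.

```javascript
// Hypothetical server-wide filter following the proposal's (dbName, req) shape.
function serverFilter(dbName, req) {
  // false: skip databases outside the prefix entirely.
  if (dbName.indexOf("my_interesting_dbs_prefix") !== 0) {
    return false;
  }
  // Array (assumed meaning): replicate, but through these document filters.
  if (/_archive$/.test(dbName)) {
    return ["filters/no_drafts"];
  }
  // true: replicate the whole database.
  return true;
}

// Hypothetical helper: turn the filter's answers into a replication plan.
function planReplication(dbNames, filter) {
  const plan = [];
  for (const dbName of dbNames) {
    const result = filter(dbName, {});
    if (result === false) {
      continue;
    }
    plan.push({
      db: dbName,
      docFilters: Array.isArray(result) ? result : [],
    });
  }
  return plan;
}

const plan = planReplication(
  [
    "my_interesting_dbs_prefix_a",
    "other_db",
    "my_interesting_dbs_prefix_b_archive",
  ],
  serverFilter
);
console.log(JSON.stringify(plan, null, 2));
```

With this reading, a single filter decides both which databases take part and which document-level filters each one uses.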

To begin server-wide replication for an array of dbs:
  curl -vX POST http://localhost:5984/_replicate_server \
    -d '{"source":"example-database","target":"http://example.org/example-database",
         "database_names": ["db_1", "db_2", ..., "db_3050"]}'
    -> {"ok": true, <... other details>}

Other params for the request:
  "persistent": true|false - should this replication job persist
through a couchdb restart? Maybe this adds an entry to the config file
or something.
  "continuous": true|false - run continuously, or do a one-time pass of
all dbs. Defaulting to true makes sense but is inconsistent with
_replicate; maybe just don't support one-time passes? My specific use
cases don't require them, but I don't want to speak only for myself.

Just some thoughts from my last 1-2 years or so of experience with
couchdb and my use patterns. If we could trim down and improve
replication usability a bit, I think couchdb could greatly benefit as a
project. Right now, having to tell replication to start, having to make
sure it runs on restart (I know changes of some sort are coming or
already implemented for this), and monitoring your databases to make
sure they are up to date is just a bit too much for the app tier to do,
and I think it scares DBAs away from embracing the technology.
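To make the app-tier burden concrete, here is a minimal sketch of the bookkeeping described above, assuming one continuous POST /_replicate request per customer database. The helper names are hypothetical, and how the list of running replications is obtained (for example, from a status call) is left out.

```javascript
// Build one /_replicate request body per database. With a few thousand
// customers this means a few thousand separate replication requests.
function buildReplicationBodies(dbNames, targetBaseUrl) {
  const base = targetBaseUrl.replace(/\/+$/, "");
  return dbNames.map((dbName) => ({
    source: dbName,
    target: base + "/" + encodeURIComponent(dbName),
    continuous: true,
  }));
}

// Hypothetical monitoring step: given the source names that currently have
// a running replication, list the databases that need to be restarted.
function findStoppedReplications(dbNames, runningSources) {
  const running = new Set(runningSources);
  return dbNames.filter((dbName) => !running.has(dbName));
}

const dbNames = ["db_1", "db_2", "db_3"];
const bodies = buildReplicationBodies(dbNames, "http://example.org/");
const stopped = findStoppedReplications(dbNames, ["db_1", "db_3"]);
console.log(JSON.stringify(bodies, null, 2));
console.log("needs restart:", stopped);
```

A server-wide call would replace both the per-database requests and the restart check with a single job.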

Overall I love couchdb; I find it to be a great product, and it has
fit our needs very well.

-Chris

Fwd: Thoughts on server wide replication

Posted by Chris Stockton <ch...@gmail.com>.

Forwarding from user list upon suggestion.

---------- Forwarded message ----------
From: Chris Stockton <ch...@gmail.com>
Date: Wed, May 25, 2011 at 12:23 PM
Subject: Thoughts on server wide replication
To: user@couchdb.apache.org

Re: Thoughts on server wide replication

Posted by Randall Leeds <ra...@gmail.com>.

On Wed, Jun 15, 2011 at 15:25, Chris Stockton <ch...@gmail.com> wrote:
> Hello,
>
> On Wed, May 25, 2011 at 12:23 PM, Chris Stockton
> <ch...@gmail.com> wrote:
>> I was thinking if there was a server wide replication we could support
>> many more users. Currently we are at a few thousand and we are
>> starting to feel just the expense of all of the TCP connections and
>> replication tasks, the calls to status to monitor that they are
>> running etc are getting very expensive and noticeable.
>
> No one has any comments or suggestions on replication scaling for many
> databases? Would be much appreciated.
>
> -Chris
>

Filipe should chime in. I believe the new replicator in 1.1+ would
share the same set of connections between any two hosts, so you should
find that the number of connections used when you're replicating many,
many dbs between the same set of hosts is more reasonable going
forward.

Re: Thoughts on server wide replication

Posted by Chris Stockton <ch...@gmail.com>.

Hello,

On Wed, May 25, 2011 at 12:23 PM, Chris Stockton
<ch...@gmail.com> wrote:
> I was thinking if there was a server wide replication we could support
> many more users. Currently we are at a few thousand and we are
> starting to feel just the expense of all of the TCP connections and
> replication tasks, the calls to status to monitor that they are
> running etc are getting very expensive and noticeable.

No one has any comments or suggestions on replication scaling for many
databases? Would be much appreciated.

-Chris