Posted to dev@couchdb.apache.org by Nathan Vander Wilt <na...@calftrail.com> on 2013/02/05 19:38:45 UTC

Re: Round-robin replication [was Half-baked idea: incremental virtual databases]

+1 on round-robin continuous replication. Ideally it'd allow replications to specify a relative priority, e.g. replication of log alerts or chat messages might want lower latency (higher priority) than a ddoc deployment or user backup. For now I'm just going to implement my own duct-tape version of this, using cron jobs to trigger non-continuous replications.
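
Something along these lines, run from cron every few minutes (the database names and the filter name are made up):

    # one-shot (non-continuous) filtered replication, triggered by cron
    curl -s -X POST http://localhost:5984/_replicate \
         -H 'Content-Type: application/json' \
         -d '{"source":"master","target":"userdb_42","filter":"app/by_user"}'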

FWIW, with my client's permission I'm sharing the script I've been using to load-test continuous filtered replication to/from a central master:
https://gist.github.com/natevw/4711127

The test script sets up N+1 databases, writing documents to one as the master while replicating to the other N, as well as "short-polling" _changes to roughly simulate general load on top of the long-polling the application does. On OS X I can only get to around 250 users due to Erlang running into FD_SETSIZE-related file descriptor limits, but it remains stable if I stay under that limit. However, it takes the user database replications a *long* time to all get caught up (some don't even start until a few minutes after the changes stop).
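
(The "short-poll" is just a plain GET against _changes, with no feed=longpoll/continuous, e.g. with a made-up db name:)

    curl 'http://localhost:5984/userdb_42/_changes?since=0'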

hth,
-natevw


On Feb 4, 2013, at 2:50 PM, Robert Newson wrote:

> I had a mind to teach the _replicator db this trick. Since we have a
> record of everything we need to resume a replication, there's no reason
> for a one-to-one correspondence between a _replicator doc and a
> replicator process. We can simply run N of them for a bit (say, a
> batch of 1000 updates) and then switch to others. The internal
> db_updated mechanism is a good way to notice when we might have
> updates worth sending but it's only half the story. A round-robin over
> all _replicator docs (other than one-shot ones, of course) seems a
> really neat trick to me.
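> 
> Roughly this shape, as a toy sketch (rr_sketch, run_batch/2, and the
> queue of docs are made up, not the actual replicator internals):
> 
>     %% rotate through replication docs, giving each a bounded batch of work
>     -module(rr_sketch).
>     -export([loop/1]).
> 
>     loop(Q0) ->
>         case queue:out(Q0) of
>             {empty, _} ->
>                 done;
>             {{value, RepDoc}, Q1} ->
>                 ok = run_batch(RepDoc, 1000),   % say, a batch of 1000 updates
>                 loop(queue:in(RepDoc, Q1))      % back to the end of the line
>         end.
> 
>     run_batch(_RepDoc, _Limit) ->
>         ok.                                     % stand-in for the real work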
> 
> B.
> 
> On 4 February 2013 22:39, Jan Lehnardt <ja...@apache.org> wrote:
>> 
>> On Feb 4, 2013, at 23:14 , Nathan Vander Wilt <na...@gmail.com> wrote:
>> 
>>> On Jan 29, 2013, at 5:53 PM, Nathan Vander Wilt wrote:
>>>> So I've heard from both hosting providers that it is fine, but I also managed to take both of their shared services "down" with only ~100 users (200 continuous filtered replications). I'm only now at the point where I have tooling to build arbitrarily large tests on my local machine to see the stats for myself, but as I understand it, the issue is that every replication needs at least one couchjs process to do its filtering.
>>>> 
>>>> So rather than inactive users mostly just taking up disk space, they're instead each costing a full-fledged process's worth of memory and system resources, all the time. As I understand it, this isn't much better on BigCouch either, since the data is scattered ± evenly across the machines: while the *computation* is spread out, each node in the cluster still needs k*numberOfUsers couchjs processes running. So it's "scalable" in the sense that traditional databases are scalable: vertically, by buying machines with more and more memory.
>>>> 
>>>> Again, I am still working on getting a better feel for the costs involved, but the basic theory with a master-to-N hub is not a great start: every change needs to be processed by all N replications. So if a user writes a document that ends up in the master database, every other user's filter function needs to process that change coming out of master. Even when the N users are generating 0 (instead of M) changes, it's not doing M*N work, but there are still always 2*N open connections and supporting processes, a nasty baseline for large values of N.
>>> 
>>> Looks like I was wrong about needing enough RAM for one couchjs process per replication.
>>> 
>>> CouchDB maintains a pool of (no more than query_server_config/os_process_limit) couchjs processes, and work is divvied out amongst them as necessary. I found a little meta-discussion of this system at https://issues.apache.org/jira/browse/COUCHDB-1375 and the code that uses it at https://github.com/apache/couchdb/blob/master/src/couchdb/couch_query_servers.erl#L299
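>>> 
>>> (The knob lives in the server config, e.g. in local.ini; the value here
>>> is just an example:)
>>> 
>>>     [query_server_config]
>>>     os_process_limit = 25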
>>> 
>>> On my laptop, I was able to spin up 250 users without issue. Beyond that, I start running into ± hardcoded system resource limits that Erlang has under Mac OS X, but from what I've seen, the only theoretical scalability issue with going beyond that on Linux/Windows would be response times, as the worker processes become more and more saturated.
>>> 
>>> It still seems wise to implement tiered replications once thousands of *active* user databases need to communicate, but that tradeoff seems reasonable to me.
>> 
>> CouchDB’s design is obviously lacking here.
>> 
>> For immediate relief, I’ll propose the usual jackhammer of unpopular responses: write your filters in Erlang. (sorry :)
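>> 
>> Something like this in a design doc with "language": "erlang", assuming
>> the native query server is enabled (untested sketch, and the "type"
>> field is made up):
>> 
>>     %% pass only chat docs through the replication, no couchjs involved
>>     fun({Doc}, {_Req}) ->
>>         case couch_util:get_value(<<"type">>, Doc) of
>>             <<"chat">> -> true;
>>             _ -> false
>>         end
>>     end.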
>> 
>> For the future: we already see progress in improving the view server situation. Once we get to a more desirable setup (yaynode/v8), we can improve the view server communication; there is no reason you’d need a single JS OS process per active replication, and we should absolutely fix that.
>> 
>> --
>> 
>> Another angle is the replicator. I know Jason Smith has a prototype of this in Node, and it works. Instead of maintaining N active replications, we just keep a maximum number of active connections and cycle out ones that are currently inactive. The DbUpdateNotification mechanism should make this relatively straightforward. There is added overhead for setting up and tearing down replications, but we can make better use of resources and not clog things up with inactive replications. Especially in a db-per-user scenario, most replications don’t see much of an update most of the time; they should stay inactive until data is written to one of the source databases. The mechanics in CouchDB are all there for this, we just need to write it.
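>> 
>> (For illustration, the external flavor of that hook is already in the
>> ini; a registered process gets a JSON line on stdin for each db update.
>> The script path here is made up:)
>> 
>>     [update_notification]
>>     cycle_repl = /usr/local/bin/cycle-replications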
>> 
>> --
>> 
>> Nate, thanks for sharing your findings and for bearing with us, despite your very understandable frustrations. It is people like you who allow us to make CouchDB better!
>> 
>> Best
>> Jan
>> --