Posted to user@couchdb.apache.org by Jan Lehnardt <ja...@apache.org> on 2014/01/02 21:17:27 UTC

Scaling db-per-user (Was: Re: Disabling doc include)

On 02 Jan 2014, at 21:08 , Jens Alfke <je...@couchbase.com> wrote:

> 
> On Jan 2, 2014, at 11:56 AM, Jan Lehnardt <ja...@apache.org> wrote:
> 
>> Out of curiosity, what scaling limit have you found? Is this documented somewhere?
> 
> By “we found” I should have said “we extrapolated”. We have customers that will need hundreds of thousands of user accounts, and attaching that many replications to a central master database wouldn’t be practical. Especially since they’d be filtered replications — every time a document was added to the master database, CouchDB would have to run n filter functions to decide whether to push it to n different user databases. Another scaling factor is that, if documents are accessible to many users, the storage space needed for those documents will be greatly multiplied since many replicas of them will exist.
> 
> If someone only needs a few hundred user or user-subset databases, though, this should be a feasible approach.
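To put rough numbers on the two costs Jens describes (filter calls per write, and storage multiplied by replicas), here is a minimal Python sketch. The function names and all figures in it are illustrative assumptions, not measurements from this thread, and it ignores CouchDB's own per-database overhead.

```python
def filter_invocations(new_docs, user_dbs):
    """Filter calls needed when every new master document is checked
    once per filtered replication (one replication per user database)."""
    return new_docs * user_dbs


def total_storage_bytes(doc_sizes, readers_per_doc):
    """Bytes stored across the master plus all user databases.

    Each document is kept once in the master and once in every user
    database whose owner may read it.
    """
    return sum(size * (1 + readers)
               for size, readers in zip(doc_sizes, readers_per_doc))


# Hypothetical workload: 1,000 new documents, 200,000 user databases.
calls = filter_invocations(1_000, 200_000)

# Hypothetical data set: a 4 KiB document shared with 500 users and a
# 1 KiB private document visible to a single user.
stored = total_storage_bytes([4096, 1024], [500, 1])
```

With these made-up numbers, a thousand writes to the master already require 200 million filter calls, which is why the approach looks fine for a few hundred databases and impractical for hundreds of thousands.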

Thanks Jens. Sounds sensible to me.

We added /_db_updates in 1.4.0 that allows building the above with the difference that a replication only runs for active users, thus delaying most of the work until it is needed *and* avoiding having to run hundreds of thousands of replications at the same time. Managing all that, however, is still an exercise left to the user (I know of two implementations of this in Node.js), and we should see if we can smart up the replicator accordingly.
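As a rough illustration of the idea, the sketch below turns one line of a continuous /_db_updates feed into the body of a one-shot POST /_replicate request. It assumes events shaped like the example in the API docs ({"db_name": ..., "ok": true, "type": ...}); the database names "central" and "userdb-..." are hypothetical, and the HTTP calls themselves are left out.

```python
import json

CENTRAL_DB = "central"      # hypothetical name of the master database
USER_DB_PREFIX = "userdb-"  # hypothetical naming scheme for user databases


def replication_for_event(line):
    """Return a POST /_replicate body for one /_db_updates feed line,
    or None if the line does not call for a replication."""
    line = line.strip()
    if not line:
        # Blank line, e.g. a heartbeat newline on a continuous feed.
        return None
    event = json.loads(line)
    db_name = event.get("db_name", "")
    # Only react to writes in user databases; creations, deletions and
    # changes to other databases are ignored in this sketch.
    if event.get("type") != "updated" or not db_name.startswith(USER_DB_PREFIX):
        return None
    # No "continuous" flag, so the replication runs once and stops.
    return {"source": db_name, "target": CENTRAL_DB}
```

A real manager would also need to de-duplicate bursts of events for the same database and cap the number of concurrent replications; this only shows the event-to-request mapping.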

Storage aside, do you think that would work for your scenario?

Re: Scaling db-per-user (Was: Re: Disabling doc include)

Posted by Jens Alfke <je...@couchbase.com>.
On Jan 2, 2014, at 12:17 PM, Jan Lehnardt <ja...@apache.org> wrote:

> We added /_db_updates in 1.4.0 that allows building the above with the difference that a replication only runs for active users, thus delaying most of the work until it is needed *and* avoiding having to run hundreds of thousands of replications at the same time.

I’ve looked up the API docs for _db_updates*, but I don’t see how it’s related to this topic. It appears to be a server-level changes feed that tells you when databases are created/deleted/changed.

I can see that you might use this to watch all the user databases and, when one changes, start a one-shot push replication to the central db. But the flip side is that when the central db changes you have to push the change to every single user database, and this doesn’t help with that.

—Jens

* http://docs.couchdb.org/en/latest/api/server/common.html#db-updates
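
The "flip side" raised in the last message, pushing a change in the central database out to the user databases, can be sketched as follows. This is only an illustration under assumptions: the "userdb-" prefix and the filter name "app/for_user" are hypothetical, and the function merely builds POST /_replicate bodies rather than sending them. Restricting the fan-out to an `active` set corresponds to the suggestion of replicating only for active users.

```python
USER_DB_PREFIX = "userdb-"  # hypothetical naming scheme for user databases


def fan_out_requests(central_db, users, active=None,
                     filter_name="app/for_user"):
    """Build one POST /_replicate body per user database that should
    receive changes from the central database.

    If `active` is given, only users in that set get a replication;
    everyone else would catch up later, when they next become active.
    """
    targets = users if active is None else [u for u in users if u in active]
    return [
        {
            "source": central_db,
            "target": USER_DB_PREFIX + user,
            "filter": filter_name,           # hypothetical filter function
            "query_params": {"user": user},  # made available to the filter
        }
        for user in targets
    ]
```

Without the `active` restriction the list has one entry per user, so every central change still costs n filtered replications; with it, the cost scales with the number of currently active users instead.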