Posted to user@couchdb.apache.org by ro...@gmail.com on 2010/11/09 10:17:24 UTC

Design considerations

I've been following recent threads with interest as I'm pretty new to couch
but loving its simplicity and power right now. One project I'm considering
it for would use couch as a central document store for circa 10,000 users,
but each user may see a different "slice" of the full dataset (let's assume
the total number of documents is ~10m). These slices would overlap - user 1
might have anything from 0% to 99% of the documents in common with user 2.

I'm assuming from the couch security model that my approach should be to
have a different database for each user? But I have a few questions:

1) Is this approach valid?
2) Are there any downsides to the database per user approach (apart from the
obvious [and potentially vast] disk space duplication)?
3) Does the COPY command work across databases - i.e. can I use it to give a
user access to material from a central repository?
4) Is there a better way?

FYI It's a system I've previously built with SQL and the permissions side of
things (who has access to what) was always a right pain in the neck - so
much so that we were considering a database per user even then.

Thanks in advance
Roger

Re: Design considerations

Posted by Ian Hobson <ia...@ntlworld.com>.

On 09/11/2010 19:40, roger.moffatt@gmail.com wrote:
> Thanks Zach, that's most helpful.
>
> Regarding the number of databases that can be open at once, how
> quickly are the connections closed down? If a user loads a web page
> (via a server side application layer) which accesses their database
> for example, is their connection closed pretty much immediately, or
> does it linger around for a while? If so ... how long is "a while"?
> Does this make sense? I'll give some thought to the sharding we could
> use to keep within sensible limits in the meantime.
>
> Shame about COPY not working between databases, but I guess not
> surprising. Replication filters sound interesting ... and looking at
> the docs I see I can do named document replication, which is
> effectively a COPY and hence perfect I think?
>
>> Finally, you'll really need to consider if/how/when folks will update
>> this data. If two users both want to update the same document on their
>> slices, and replicate their modifications back to the central
>> database, you're going to have to build conflict resolution into your
>> system.
> Experience shows that data tends to be edited shortly after document
> creation and rarely thereafter. Equally it's almost always edited by
> the person that created it - so the two users simultaneously editing a
> document scenario will be extraordinarily rare, but we will indeed
> need to handle conflict resolution if it happens. I've seen that in
> action with some of my tests and think that should all be fine.
>
Hi Roger,

An idea occurs to me. If you have one central database containing all
the documents, then you would not have the vast requirements of disk
space.

Each user has his own small database of index records. When he wants a
document, the software would use the index record to get the _id in the
main database and load that.
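
The index idea above can be sketched roughly as follows. This is a minimal in-memory model, not CouchDB client code: the two dictionaries stand in for the central database and the per-user index databases, and the names `central_db`, `user_index_db` and `central_id` are hypothetical. The one CouchDB-specific detail is the `{"keys": [...]}` body, which is the shape accepted by `POST /{db}/_all_docs?include_docs=true` for fetching several documents by _id in one request.

```python
# Hypothetical stand-in for the single central database (keyed by _id).
central_db = {
    "doc-001": {"_id": "doc-001", "title": "Quarterly report"},
    "doc-002": {"_id": "doc-002", "title": "Meeting notes"},
    "doc-003": {"_id": "doc-003", "title": "Design spec"},
}

# Hypothetical stand-in for the per-user index databases: each user has a
# few small records that only point at _ids in the central database.
user_index_db = {
    "alice": [{"central_id": "doc-001"}, {"central_id": "doc-003"}],
    "bob": [{"central_id": "doc-002"}, {"central_id": "doc-003"}],
}


def visible_ids(user):
    """Return the central _ids listed in this user's index records."""
    return [rec["central_id"] for rec in user_index_db.get(user, [])]


def bulk_fetch_body(user):
    """Build the JSON body for a bulk fetch by key against the central db."""
    return {"keys": visible_ids(user)}


def load_documents(user):
    """Resolve the user's index records to full central documents."""
    return [central_db[i] for i in visible_ids(user) if i in central_db]
```

Overlapping slices cost only one small index record per user here, since `doc-003` is stored once. The catch is that this restricts access only if users cannot reach the central database directly, so the lookup would have to run in a server-side layer.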

You may have to work round cross-site scripting issues, either by using
a proxy or by writing code that runs on the client. Not sure.

Please treat these ideas as a starting point - I'm no CouchDB expert.
Perhaps someone who knows can comment?

Regards

Ian

Re: Design considerations

Posted by ro...@gmail.com.

Thanks Zach, that's most helpful.

Regarding the number of databases that can be open at once, how
quickly are the connections closed down? If a user loads a web page
(via a server-side application layer) which accesses their database
for example, is their connection closed pretty much immediately, or
does it linger around for a while? If so ... how long is "a while"?
Does this make sense? I'll give some thought to the sharding we could
use to keep within sensible limits in the meantime.

Shame about COPY not working between databases, but I guess not
surprising. Replication filters sound interesting ... and looking at
the docs I see I can do named document replication, which is
effectively a COPY and hence perfect I think?
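
As a rough illustration of named document replication, the sketch below only builds the JSON body that would be sent to `POST /_replicate`; it does not contact a server. The `source`, `target` and `doc_ids` fields are part of CouchDB's replication API, while the database names and document ids are made up for the example.

```python
import json


def named_doc_replication(source, target, doc_ids):
    """Build a /_replicate body that copies only the listed documents."""
    doc_ids = list(doc_ids)
    if not doc_ids:
        raise ValueError("doc_ids must not be empty")
    return {"source": source, "target": target, "doc_ids": doc_ids}


# Hypothetical database names and ids, for illustration only.
body = named_doc_replication("central", "user_roger", ["doc-001", "doc-003"])
print(json.dumps(body, indent=2))
```

Unlike a one-off copy, running the same replication again later would also carry over newer revisions of those documents.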

> Finally, you'll really need to consider if/how/when folks will update
> this data. If two users both want to update the same document on their
> slices, and replicate their modifications back to the central
> database, you're going to have to build conflict resolution into your
> system.

Experience shows that data tends to be edited shortly after document
creation and rarely thereafter. Equally, it's almost always edited by
the person that created it - so two users simultaneously editing the
same document will be extraordinarily rare, but we will indeed need to
handle conflict resolution if it happens. I've seen that in action with
some of my tests and think that should all be fine.

Sorry, this was a bit of a waffle post ... I'm just thinking out loud
while I ponder the system architecture.

Roger

Re: Design considerations

Posted by Zachary Zolton <za...@gmail.com>.

Roger,

CouchDB is known to handle many thousands of databases, so I'd say that
having a database per user is a valid approach. CouchDB does, however,
have a configurable limit on the number of databases that can be open
at once, which you may want to read about here:

http://is.gd/gROMq
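
For reference, a hedged sketch of how that limit might be changed over HTTP. The thread does not name the setting; this assumes it is `max_dbs_open` in the `[couchdb]` section of the configuration, which is what CouchDB releases call it as far as I know. The function only builds the request pieces and does not send anything. The `/_config/...` path is the CouchDB 1.x form, and the `/_node/{node}/_config/...` form applies to 2.x and later.

```python
import json


def max_dbs_open_request(limit, node=None):
    """Return (method, path, body) for changing max_dbs_open over HTTP.

    node=None gives the CouchDB 1.x path; pass a node name to get the
    per-node config path used by CouchDB 2.x and later.
    """
    if limit < 1:
        raise ValueError("limit must be positive")
    prefix = "/_config" if node is None else f"/_node/{node}/_config"
    # Config values are sent as JSON strings, not as JSON numbers.
    return "PUT", f"{prefix}/couchdb/max_dbs_open", json.dumps(str(limit))
```

The same value can be set in `local.ini` as `max_dbs_open = 500` under `[couchdb]`. The figure 500 is only an example, not a recommendation for 10,000 user databases.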

The COPY verb cannot copy a document between databases, so you'll
likely want to use CouchDB's replication to move documents between
databases. The CouchDB guide provides a good background:

http://guide.couchdb.org/draft/replication.html

Moreover, replication filters might be a convenient way to determine
who gets which documents, although I'd really have to hear more about
your scenario to give more explicit advice. Read about replication
filters here:

http://is.gd/gRP8Q
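
To make the filter idea concrete, here is a minimal sketch under one big assumption: that each central document carries a `readers` array naming the users allowed to see it. That field, the design document name `_design/access`, the filter name `by_reader` and the database names are all hypothetical. The `filters` member of a design document and the `filter` / `query_params` fields of the `/_replicate` body are real CouchDB features.

```python
# JavaScript filter function, stored as a string in a design document on
# the source (central) database. It passes a document only if the user
# named in the replication's query_params appears in doc.readers.
FILTER_JS = """function(doc, req) {
  return !!doc.readers && doc.readers.indexOf(req.query.user) !== -1;
}"""

design_doc = {"_id": "_design/access", "filters": {"by_reader": FILTER_JS}}


def filtered_replication(source, target, user):
    """Build a /_replicate body that applies the by_reader filter."""
    return {
        "source": source,
        "target": target,
        "filter": "access/by_reader",
        "query_params": {"user": user},
    }


def would_replicate(doc, user):
    """Python mirror of FILTER_JS, for reasoning about slices offline."""
    return user in (doc.get("readers") or [])
```

One design consequence worth noting: a filter only decides what gets copied. If a user is later removed from `readers`, the copy already in that user's database is not deleted by this replication.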

Finally, you'll really need to consider if/how/when folks will update
this data. If two users both want to update the same document on their
slices, and replicate their modifications back to the central
database, you're going to have to build conflict resolution into your
system.
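
One possible shape for that conflict resolution step, as a sketch only. It assumes the application has already fetched every conflicting revision of a document (the current winner plus each revision listed in `_conflicts` when the document is read with `?conflicts=true`), and that each revision carries an application-maintained ISO 8601 `updated_at` string. That field and the "latest edit wins" rule are assumptions for the example, not something CouchDB provides. The output is a body for `POST /{db}/_bulk_docs` that deletes the losing revisions.

```python
def resolve_conflict(revisions):
    """Pick a winner among conflicting revisions of one document.

    revisions: list of full document bodies sharing the same _id, one per
    conflicting _rev. Returns (winner, bulk_docs_body), where the body
    marks every losing revision as deleted.
    """
    if not revisions:
        raise ValueError("need at least one revision")
    # Assumed rule: latest updated_at wins; _rev breaks ties so the
    # choice is deterministic on every node that runs this code.
    winner = max(revisions, key=lambda d: (d.get("updated_at", ""), d["_rev"]))
    deletions = [
        {"_id": d["_id"], "_rev": d["_rev"], "_deleted": True}
        for d in revisions
        if d["_rev"] != winner["_rev"]
    ]
    return winner, {"docs": deletions}
```

If edits really are as rare as expected, a simpler policy (for example, flagging the document for manual review) may be good enough. The point is that the application, not CouchDB, has to decide.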

Cheers,

Zach
