Posted to user@couchdb.apache.org by Robert Campbell <rr...@gmail.com> on 2009/11/05 11:02:01 UTC

How do you handle multiple document groups?

First, let me say that I posted this question to StackOverflow here:
http://stackoverflow.com/questions/1674662/nested-databases-in-couchdb

Here is what I mean by multiple document groups:
Assume you are building a blog engine which services hundreds of
domains. Within each domain, you would have similar groups of
documents: users, posts, comments, etc. How would you structure this
in CouchDB?

One way would be to create a User database which contains users from
all domains. The user documents would have a "domain" field which
denotes which domain this user is valid on. Likewise, you would have a
single Post database, with each post document having a domain field
and so on. I don't like this solution because 1) you will have lots of
data duplication, where every single document has to denote the domain
it's connected to. 2) It seems like it could be a security problem.
One vulnerability in your view functions could accidentally return one
domain's user set to another, etc. If we change the blog engine into
an enterprise document management/workflow engine, you could have
serious problems exposing one document to a competitor.

Another way you could do it is by bringing the domain group up into
the database level. This means you'd have "MyApp.com-Users",
"MyApp.com-Posts", "MyApp.com-Comments", "AnotherApp.net-Users", etc.
This helps reduce the data duplication and maybe the security issue a
bit, because now your documents don't need to specify a "domain" field
everywhere. Your app would follow the simple naming convention to
select the proper database. The disadvantage of this is that it feels
like a hack. What I really want is a MyApp database which contains
User, Post, and Comment sub-databases (for example) which then contain
the documents for that top-level group (domain) and lower-level group
(posts).

How do you guys address this problem?

Re: How do you handle multiple document groups?

Posted by Brian Candler <B....@pobox.com>.
On Thu, Nov 05, 2009 at 06:33:20PM +0100, Jan Lehnardt wrote:
> PS: No need to CC me on the couch lists :)

I have been using 'g' (group reply) in mutt, but you have prompted me to
investigate 'L' (list reply) and what's needed to configure it in .muttrc.

Hopefully it should be OK now :-)

Re: How do you handle multiple document groups?

Posted by Jan Lehnardt <ja...@apache.org>.
On 5 Nov 2009, at 16:45, Brian Candler wrote:

> On Thu, Nov 05, 2009 at 12:03:55PM +0100, Jan Lehnardt wrote:
>>> For something like "myapp.com" domain and "users" database. I'll just
>>> have to replace all periods with underscores or something.
>>
>> I wouldn't create databases per type.
>>
>> say user a has the domains foo.com and bar.com
>>
>> for that user the databases /a/foo_com/ and /a/bar_com
>> are created. In these databases, all documents for the
>> respective domain live. If you need additional info for
>> the user that owns the domain that is not specific to
>> the domain, I'd go with putting all user-specific info
>> in each of the user's databases. This is duplication,
>> but it doesn't really hurt, except maybe for users
>> with hundreds and thousands of domains.
>
> I'd say it depends what you mean by a "user"
>
> If a user logs in as user@foo.com to get to the foo.com blog, and
> user@bar.com to get to the bar.com blog, then each database probably should
> have its own users table.
>
> If you can login as user@example.com and you are 'attached' to both the
> foo.com and bar.com domains as a user, then I'd say you'd need a single
> users database, separate from the domain databases. The records in the user
> database would list what domains the user is allowed to see. You can use a
> map view to turn the data backwards, e.g. so that you can see which users
> are allowed to access foo.com


Yeah, I had both methods in mind while writing. Thanks for clearing it up :)

Cheers
Jan
--
PS: No need to CC me on the couch lists :)



Re: How do you handle multiple document groups?

Posted by Brian Candler <B....@pobox.com>.
On Thu, Nov 05, 2009 at 12:03:55PM +0100, Jan Lehnardt wrote:
>> For something like "myapp.com" domain and "users" database. I'll just
>> have to replace all periods with underscores or something.
>
> I wouldn't create databases per type.
>
> say user a has the domains foo.com and bar.com
>
> for that user the databases /a/foo_com/ and /a/bar_com
> are created. In these databases, all documents for the
> respective domain live. If you need additional info for
> the user that owns the domain that is not specific to
> the domain, I'd go with putting all user-specific info
> in each of the user's databases. This is duplication,
> but it doesn't really hurt, except maybe for users
> with hundreds and thousands of domains.

I'd say it depends what you mean by a "user"

If a user logs in as user@foo.com to get to the foo.com blog, and
user@bar.com to get to the bar.com blog, then each database probably should
have its own users table.

If you can log in as user@example.com and you are 'attached' to both the
foo.com and bar.com domains as a user, then I'd say you'd need a single
users database, separate from the domain databases. The records in the user
database would list what domains the user is allowed to see. You can use a
map view to turn the data backwards, e.g. so that you can see which users
are allowed to access foo.com
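
As a rough sketch of the "turn the data backwards" idea: CouchDB map functions are normally written in JavaScript, so the Python below only mimics the logic. The document shape (a "domains" list on each user record) and all names are assumptions for illustration, not something specified in this thread.

```python
from collections import defaultdict

# Hypothetical user documents, as they might sit in a shared users database.
USERS = [
    {"_id": "alice@example.com", "domains": ["foo.com", "bar.com"]},
    {"_id": "bob@example.com", "domains": ["foo.com"]},
]


def map_user_to_domains(doc):
    """Mimic a map function: emit one (key, value) row per allowed domain."""
    for domain in doc.get("domains", []):
        yield domain, doc["_id"]


def build_view(docs):
    """Collect the emitted rows into a domain -> sorted list of users index."""
    index = defaultdict(list)
    for doc in docs:
        for key, value in map_user_to_domains(doc):
            index[key].append(value)
    return {key: sorted(values) for key, values in index.items()}


users_by_domain = build_view(USERS)
print(users_by_domain["foo.com"])
```

Querying the inverted index by key "foo.com" then answers "which users may access foo.com" without scanning every user record.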

Re: How do you handle multiple document groups?

Posted by Robert Campbell <rr...@gmail.com>.
Jan, Brian, those are all excellent points. Per your recommendations,
I've decided to keep a single database per domain, with no "type"
databases or grouping by naming convention. You've both enumerated
numerous important benefits, the greatest of which (for me) is:

> But it does make your application harder to write - you can't have views which incorporate multiple types -

Not being able to write a view which incorporates documents of other
types would be a deal breaker.

Jan's comment that type databases feel too RDBMSy is correct - I come
from almost a decade of RDBMS-only modeling - so I'll definitely need
to "relax" a bit as suggested.

Thanks again everyone!




Re: How do you handle multiple document groups?

Posted by Jan Lehnardt <ja...@apache.org>.
On 6 Nov 2009, at 09:49, Brian Candler wrote:

> On Thu, Nov 05, 2009 at 11:12:08AM -0800, Adam Wolff wrote:
>> Can I ask what the advantage of this is? Is this for replication? I like
>> having typed databases; it seems like that will be an easy way to solve
>> scaling problems.
>
> A database per type is only going to be a limited number: e.g. if you have
> records of type X, type Y and type Z that's three databases. You could split
> these across three servers but would likely end up with very unbalanced
> load.
>
> If you have a database per domain, and you have 10,000 domains, then having
> a database per domain is going to give you a lot more options for scaling -
> e.g. split the domains across N databases, 1/Nth per database.
>
> Combining the two, so you have 30,000 databases, doesn't really give you any
> more options for scaling. But it does make your application harder to write
> - you can't have views which incorporate multiple types - and it will
> probably make it perform less well, because couchdb will need to keep three
> times as many filehandles open (or rather, for a given load it will discard
> cached filehandles more early, so will have to open databases more often)
>
> You can identify types within the doc _id if you want, e.g.
>   user_xxxxxxx
>   post_yyyyyyy
> It's easy enough to split the _id within a view to identify the type. Or you
> can just use an attribute within the doc.
>
> Also: another reason for a separate database per domain is for security
> purposes, as it makes it easy to virtualise your app and prevents data
> leakage from one account to another. You can make the database name an
> attribute of the user's session. Having separate databases per type within a
> domain just makes access control more complex - you have to grant the user
> access to three databases - without really giving any security benefit.


Good points! In addition to the self-contained and security points, a db
per user lets the user replicate all his/her data for offline use, which
you may or may not like to support. But if not, you should think about
it :)

Cheers
Jan
--





Re: How do you handle multiple document groups?

Posted by Brian Candler <B....@pobox.com>.
On Thu, Nov 05, 2009 at 11:12:08AM -0800, Adam Wolff wrote:
> Can I ask what the advantage of this is? Is this for replication? I like
> having typed databases; it seems like that will be an easy way to solve
> scaling problems.

A database per type is only going to be a limited number: e.g. if you have
records of type X, type Y and type Z that's three databases. You could split
these across three servers but would likely end up with very unbalanced
load.

If you have a database per domain, and you have 10,000 domains, then having
a database per domain is going to give you a lot more options for scaling -
e.g. split the domains across N databases, 1/Nth per database.
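
One way to picture spreading per-domain databases over N machines is a stable hash of the domain name. This is only a sketch under assumptions: the host names are made up, and plain modulo hashing is my choice for brevity, not something proposed in this thread.

```python
import hashlib


def server_for(domain, servers):
    """Pick a server for a domain using a stable hash of its lowercased name."""
    digest = hashlib.sha256(domain.lower().encode("utf-8")).digest()
    # Use the first 8 bytes as an integer and reduce it to a server index.
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


# Hypothetical hosts; each would hold roughly 1/Nth of the domain databases.
SERVERS = ["couch1.internal", "couch2.internal", "couch3.internal"]

for domain in ["foo.com", "bar.com", "myapp.com"]:
    print(domain, "->", server_for(domain, SERVERS))
```

Note that with modulo hashing, changing the number of servers moves most domains to a different machine, so a lookup table or consistent hashing may be preferable if the server list is expected to change.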

Combining the two, so you have 30,000 databases, doesn't really give you any
more options for scaling. But it does make your application harder to write
- you can't have views which incorporate multiple types - and it will
probably make it perform less well, because couchdb will need to keep three
times as many filehandles open (or rather, for a given load it will discard
cached filehandles earlier, so will have to open databases more often).

You can identify types within the doc _id if you want, e.g.
   user_xxxxxxx
   post_yyyyyyy
It's easy enough to split the _id within a view to identify the type. Or you
can just use an attribute within the doc.
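
A small sketch of both options, again in Python only to show the logic (a real CouchDB view would normally be a JavaScript map function). The "type" attribute name and the sample documents are assumptions.

```python
def doc_type(doc):
    """Return the type from a 'type' attribute, else from the _id prefix."""
    if "type" in doc:
        return doc["type"]
    prefix, sep, _rest = doc["_id"].partition("_")
    # Without an underscore there is no prefix to read a type from.
    return prefix if sep else None


def map_by_type(doc):
    """Mimic a map function that emits [type, _id] as the key."""
    kind = doc_type(doc)
    if kind is not None:
        yield [kind, doc["_id"]], None


# Hypothetical documents of mixed types living in one per-domain database.
DOCS = [
    {"_id": "user_1a2b", "name": "alice"},
    {"_id": "post_9z8y", "title": "Hello"},
    {"_id": "c-77", "type": "comment"},
]

rows = sorted(
    (row for doc in DOCS for row in map_by_type(doc)),
    key=lambda row: row[0],
)
for key, _value in rows:
    print(key)
```

Because the type is the first element of the key, rows of the same type sort next to each other, which is what makes a single mixed-type database easy to query by type.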

Also: another reason for a separate database per domain is for security
purposes, as it makes it easy to virtualise your app and prevents data
leakage from one account to another. You can make the database name an
attribute of the user's session. Having separate databases per type within a
domain just makes access control more complex - you have to grant the user
access to three databases - without really giving any security benefit.
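
To illustrate the "database name as an attribute of the user's session" point, here is a minimal sketch. The Session class, the doc_url helper and the base URL are all hypothetical; the idea is simply that the database part of every request comes from the session and never from client input.

```python
from dataclasses import dataclass
from urllib.parse import quote

COUCH_URL = "http://localhost:5984"  # assumed server address


@dataclass(frozen=True)
class Session:
    username: str
    db_name: str  # fixed at login; never taken from request parameters


def doc_url(session, doc_id):
    """Build a document URL that is always scoped to the session's database."""
    db = quote(session.db_name, safe="")
    # Percent-encoding the id keeps a "/" in it from changing the URL path.
    return f"{COUCH_URL}/{db}/{quote(doc_id, safe='')}"


session = Session(username="alice", db_name="foo_com")
print(doc_url(session, "post_9z8y"))
```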

Just my 2c.

Brian.

Re: How do you handle multiple document groups?

Posted by Jan Lehnardt <ja...@apache.org>.
On 5 Nov 2009, at 20:12, Adam Wolff wrote:

> Can I ask what the advantage of this is? Is this for replication? I  
> like
> having typed databases; it seems like that will be an easy way to  
> solve
> scaling problems.

I'm not sure how "typed databases" have any impact on scaling. I like
self-contained databases because they are easier to reason about and
easier to handle practically (delete a user / delete the db, done).
Not that it is generally a bad thing, but "typed databases" smell like
RDBMS tables to me. There might be nothing wrong with them in certain
applications; in others, they might lead to wrong design choices.

Cheers
Jan
--




Re: How do you handle multiple document groups?

Posted by Robert Campbell <rr...@gmail.com>.
> Can I ask what the advantage of this is? Is this for replication? I like
> having typed databases; it seems like that will be an easy way to solve
> scaling problems.

For me it's just about grouping. According to CouchDB: The Definitive
Guide, a CouchDB database is "...a bucket that holds 'related data'."

So in my example Blog application, I see two groups of related data:
1) all the data for a particular domain (posts, comments, etc)
2) all the data which is semantically similar (all posts grouped
together, all comments grouped together, all users grouped together,
etc)

My problem is that CouchDB really only allows 1 level of grouping.
This means I either have to simulate multiple groups by using some
database naming convention or I have to just pick one group (as Jan
suggested, domain) and leave documents of all different semantic types
just piled in together (post docs, comment docs, user docs all over
the place). Of course this isn't as bad as it sounds, because I can
make a View to sort them all out.

As to Adam's question, I haven't really thought about
replication/scaling yet; I'm just a couple chapters into the book I
quoted and I'm still playing with the 0.8.0 version sitting on my
Ubuntu (why isn't there a 0.9.0 deb?).

I'm still not sure which of the two options I'll select:
myapp_com/posts, myapp_com/comments databases or just myapp_com
database with all types lying within + Views to sort them out. Either
way it should be fun :-)




Re: How do you handle multiple document groups?

Posted by Adam Wolff <aw...@gmail.com>.
Can I ask what the advantage of this is? Is this for replication? I like
having typed databases; it seems like that will be an easy way to solve
scaling problems.

A

On Thu, Nov 5, 2009 at 3:03 AM, Jan Lehnardt <ja...@apache.org> wrote:

>
> On 5 Nov 2009, at 11:47, Robert Campbell wrote:
>
>  Okay, I _do_ like that CouchDB lets me use "/" in a database name, so
>> I can hopefully do "http://xxx/myapp_com/users" which feels better
>> thanks to the / delimiter.
>>
>> For something like "myapp.com" domain and "users" database. I'll just
>> have to replace all periods with underscores or something.
>>
>
> I wouldn't create databases per type.
>
> say user a has the domains foo.com and bar.com
>
> for that user the databases /a/foo_com/ and /a/bar_com
> are created. In these databases, all documents for the
> respective domain live. If you need additional info for
> the user that owns the domain that is not specific to
> the domain, I'd go with putting all user-specific info
> in each of the user's databases. This is duplication,
> but it doesn't really hurt, except maybe for users
> with hundreds and thousands of domains. In which
> case he/she probably pays you enough money to
> solve it :)
>
> Cheers
> Jan
> --

Re: How do you handle multiple document groups?

Posted by Jeremy Wall <jw...@google.com>.
I said much the same thing as Jan in my answer to the StackOverflow
Question.

On Thu, Nov 5, 2009 at 5:03 AM, Jan Lehnardt <ja...@apache.org> wrote:

>
> On 5 Nov 2009, at 11:47, Robert Campbell wrote:
>
>  Okay, I _do_ like that CouchDB lets me use "/" in a database name, so
>> I can hopefully do "http://xxx/myapp_com/users" which feels better
>> thanks to the / delimiter.
>>
>> For something like "myapp.com" domain and "users" database. I'll just
>> have to replace all periods with underscores or something.
>>
>
> I wouldn't create databases per type.
>
> say user a has the domains foo.com and bar.com
>
> for that user the databases /a/foo_com/ and /a/bar_com
> are created. In these databases, all documents for the
> respective domain live. If you need additional info for
> the user that owns the domain that is not specific to
> the domain, I'd go with putting all user-specific info
> in each of the user's databases. This is duplication,
> but it doesn't really hurt, except maybe for users
> with hundreds and thousands of domains. In which
> case he/she probably pays you enough money to
> solve it :)
>
> Cheers
> Jan
> --

Re: How do you handle multiple document groups?

Posted by Jan Lehnardt <ja...@apache.org>.
On 5 Nov 2009, at 11:47, Robert Campbell wrote:

> Okay, I _do_ like that CouchDB lets me use "/" in a database name, so
> I can hopefully do "http://xxx/myapp_com/users" which feels better
> thanks to the / delimiter.
>
> For something like "myapp.com" domain and "users" database. I'll just
> have to replace all periods with underscores or something.

I wouldn't create databases per type.

say user a has the domains foo.com and bar.com

for that user the databases /a/foo_com/ and /a/bar_com
are created. In these databases, all documents for the
respective domain live. If you need additional info for
the user that owns the domain that is not specific to
the domain, I'd go with putting all user-specific info
in each of the user's databases. This is duplication,
but it doesn't really hurt, except maybe for users
with hundreds and thousands of domains. In which
case he/she probably pays you enough money to
solve it :)
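
A sketch of how such names could be derived, assuming CouchDB's usual naming rules as I understand them (a name starts with a lowercase letter and may contain only lowercase letters, digits and _ $ ( ) + - /). The helper names are hypothetical. Note that a "/" inside a database name has to be sent as %2F in the request URL.

```python
import re
from urllib.parse import quote

# Assumed rule: lowercase letter first, then a-z, 0-9 and _ $ ( ) + - /
VALID_DB_NAME = re.compile(r"^[a-z][a-z0-9_$()+/-]*$")


def db_name(user, domain):
    """Derive a per-user, per-domain database name such as 'a/foo_com'."""
    name = f"{user}/{domain}".lower().replace(".", "_")
    if not VALID_DB_NAME.match(name):
        raise ValueError(f"cannot derive a valid database name from {name!r}")
    return name


def db_url_path(name):
    """Return the URL path for a database, encoding '/' as %2F."""
    return "/" + quote(name, safe="")


name = db_name("a", "foo.com")
print(name, db_url_path(name))
```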

Cheers
Jan
--


>
>
> On Thu, Nov 5, 2009 at 11:19 AM, Jan Lehnardt <ja...@apache.org> wrote:
>>
>> On 5 Nov 2009, at 11:02, Robert Campbell wrote:
>>
>>> First, let me say that I posted this question to StackOverflow here:
>>> http://stackoverflow.com/questions/1674662/nested-databases-in-couchdb
>>>
>>> Here is what I mean by multiple document groups:
>>> Assume you are building a blog engine which services hundreds of
>>> domains. Within each domain, you would have similar groups of
>>> documents: users, posts, comments, etc. How would you structure this
>>> in CouchDB?
>>>
>>> One way would be to create a User database which contains users from
>>> all domains. The user documents would have a "domain" field which
>>> denotes which domain this user is valid on. Likewise, you would have a
>>> single Post database, with each post document having a domain field
>>> and so on. I don't like this solution because 1) you will have lots of
>>> data duplication, where every single document has to denote the domain
>>> it's connected to. 2) It seems like it could be a security problem.
>>> One vulnerability in your view functions could accidentally return one
>>> domain's user set to another, etc. If we change the blog engine into
>>> an enterprise document management/workflow engine, you could have
>>> serious problems exposing one document to a competitor.
>>>
>>> Another way you could do it is by bringing the domain group up into
>>> the database level. This means you'd have "MyApp.com-Users",
>>> "MyApp.com-Posts", "MyApp.com-Comments", "AnotherApp.net-Users", etc.
>>> This helps reduce the data duplication and maybe the security issue a
>>> bit, because now your documents don't need to specify a "domain" field
>>> everywhere. Your app would follow the simple naming convention to
>>> select the proper database. The disadvantage of this is that it feels
>>> like a hack. What I really want is a MyApp database which contains
>>> User, Post, and Comment sub-databases (for example) which then contain
>>> the documents for that top-level group (domain) and lower-level group
>>> (posts).
>>>
>>> How do you guys address this problem?
>>
>> Give each user/domain combo a separate database. Lots of databases are no
>> problem.
>>
>> Cheers
>> Jan
>> --
>>
>>
>


Re: How do you handle multiple document groups?

Posted by Robert Campbell <rr...@gmail.com>.
Okay, I _do_ like that CouchDB lets me use "/" in a database name, so
I can hopefully do "http://xxx/myapp_com/users" which feels better
thanks to the / delimiter.

For something like the "myapp.com" domain and a "users" database, I'll just
have to replace all periods with underscores or something.


On Thu, Nov 5, 2009 at 11:19 AM, Jan Lehnardt <ja...@apache.org> wrote:
>
> On 5 Nov 2009, at 11:02, Robert Campbell wrote:
>
>> First, let me say that I posted this question to StackOverflow here:
>> http://stackoverflow.com/questions/1674662/nested-databases-in-couchdb
>>
>> Here is what I mean by multiple document groups:
>> Assume you are building a blog engine which services hundreds of
>> domains. Within each domain, you would have similar groups of
>> documents: users, posts, comments, etc. How would you structure this
>> in CouchDB?
>>
>> One way would be to create a User database which contains users from
>> all domains. The user documents would have a "domain" field which
>> denotes which domain this user is valid on. Likewise, you would have a
>> single Post database, with each post document having a domain field
>> and so on. I don't like this solution because 1) you will have lots of
>> data duplication, where every single document has to denote the domain
>> it's connected to. 2) It seems like it could be a security problem.
>> One vulnerability in your view functions could accidentally return one
>> domain's user set to another, etc. If we change the blog engine into
>> an enterprise document management/workflow engine, you could have
>> serious problems exposing one document to a competitor.
>>
>> Another way you could do it is by bringing the domain group up into
>> the database level. This means you'd have "MyApp.com-Users",
>> "MyApp.com-Posts", "MyApp.com-Comments", "AnotherApp.net-Users", etc.
>> This helps reduce the data duplication and maybe the security issue a
>> bit, because now your documents don't need to specify a "domain" field
>> everywhere. Your app would follow the simple naming convention to
>> select the proper database. The disadvantage of this is that it feels
>> like a hack. What I really want is a MyApp database which contains
>> User, Post, and Comment sub-databases (for example) which then contain
>> the documents for that top-level group (domain) and lower-level group
>> (posts).
>>
>> How do you guys address this problem?
>
> Give each user/domain combo a separate database. Lots of databases are no
> problem.
>
> Cheers
> Jan
> --
>
>

Re: How do you handle multiple document groups?

Posted by Jan Lehnardt <ja...@apache.org>.
On 5 Nov 2009, at 11:02, Robert Campbell wrote:

> First, let me say that I posted this question to StackOverflow here:
> http://stackoverflow.com/questions/1674662/nested-databases-in-couchdb
>
> Here is what I mean by multiple document groups:
> Assume you are building a blog engine which services hundreds of
> domains. Within each domain, you would have similar groups of
> documents: users, posts, comments, etc. How would you structure this
> in CouchDB?
>
> One way would be to create a User database which contains users from
> all domains. The user documents would have a "domain" field which
> denotes which domain this user is valid on. Likewise, you would have a
> single Post database, with each post document having a domain field
> and so on. I don't like this solution because 1) you will have lots of
> data duplication, where every single document has to denote the domain
> it's connected to. 2) It seems like it could be a security problem.
> One vulnerability in your view functions could accidentally return one
> domain's user set to another, etc. If we change the blog engine into
> an enterprise document management/workflow engine, you could have
> serious problems exposing one document to a competitor.
>
> Another way you could do it is by bringing the domain group up into
> the database level. This means you'd have "MyApp.com-Users",
> "MyApp.com-Posts", "MyApp.com-Comments", "AnotherApp.net-Users", etc.
> This helps reduce the data duplication and maybe the security issue a
> bit, because now your documents don't need to specify a "domain" field
> everywhere. Your app would follow the simple naming convention to
> select the proper database. The disadvantage of this is that it feels
> like a hack. What I really want is a MyApp database which contains
> User, Post, and Comment sub-databases (for example) which then contain
> the documents for that top-level group (domain) and lower-level group
> (posts).
>
> How do you guys address this problem?

Give each user/domain combo a separate database. Lots of databases
are no problem.

Cheers
Jan
--


Re: How do you handle multiple document groups?

Posted by Adam Wolff <aw...@gmail.com>.
The multiple databases thing isn't a bad way to go. You can wire this
directly into your code that talks to CouchDB via HTTP, since URLs are
predictable, so the app calls something like

couch.put("/foo/xyzzy", {...});

and the couch object in your app knows (via configuration or whatever) that
this should map to
this.put("/foo-1/xyzzy", {...});
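
The mapping described above could be sketched roughly like this in
Python. The Couch class, its db_map configuration, and the injectable
send function are all hypothetical names for illustration; a real app
would pass in a function that performs the actual HTTP request.

```python
class Couch:
    """Thin wrapper that maps logical database names to physical ones."""

    def __init__(self, db_map, send=None):
        # db_map: e.g. {"foo": "foo-1"}, loaded from configuration.
        self.db_map = db_map
        # send: callable doing the real HTTP call; by default it just
        # returns what would have been sent, which is handy in tests.
        self.send = send or (lambda method, path, body: (method, path, body))

    def resolve(self, path):
        """Rewrite '/<logical-db>/<rest>' to '/<physical-db>/<rest>'."""
        db, _, rest = path.lstrip("/").partition("/")
        physical = self.db_map.get(db, db)  # unmapped names pass through
        return "/" + physical + ("/" + rest if rest else "")

    def put(self, path, doc):
        return self.send("PUT", self.resolve(path), doc)


if __name__ == "__main__":
    couch = Couch({"foo": "foo-1"})
    print(couch.put("/foo/xyzzy", {"title": "hello"}))
```

Swapping db_map (say, pointing "foo" at a test database) changes where
every request goes without touching the calling code.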

We're doing something like this so that we can easily run integration tests
against test data.
A

On Thu, Nov 5, 2009 at 2:02 AM, Robert Campbell <rr...@gmail.com> wrote:

> First, let me say that I posted this question to StackOverflow here:
> http://stackoverflow.com/questions/1674662/nested-databases-in-couchdb
>
> Here is what I mean by multiple document groups:
> Assume you are building a blog engine which services hundreds of
> domains. Within each domain, you would have similar groups of
> documents: users, posts, comments, etc. How would you structure this
> in CouchDB?
>
> One way would be to create a User database which contains users from
> all domains. The user documents would have a "domain" field which
> denotes which domain this user is valid on. Likewise, you would have a
> single Post database, with each post document having a domain field
> and so on. I don't like this solution because 1) you will have lots of
> data duplication, where every single document has to denote the domain
> it's connected to. 2) It seems like it could be a security problem.
> One vulnerability in your view functions could accidentally return one
> domain's user set to another, etc. If we change the blog engine into
> an enterprise document management/workflow engine, you could have
> serious problems exposing one document to a competitor.
>
> Another way you could do it is by bringing the domain group up into
> the database level. This means you'd have "MyApp.com-Users",
> "MyApp.com-Posts", "MyApp.com-Comments", "AnotherApp.net-Users", etc.
> This helps reduce the data duplication and maybe the security issue a
> bit, because now your documents don't need to specify a "domain" field
> everywhere. Your app would follow the simple naming convention to
> select the proper database. The disadvantage of this is that it feels
> like a hack. What I really want is a MyApp database which contains
> User, Post, and Comment sub-databases (for example) which then contain
> the documents for that top-level group (domain) and lower-level group
> (posts).
>
> How do you guys address this problem?
>