Posted to user@couchdb.apache.org by Ben Hall <be...@googlemail.com> on 2010/03/28 14:58:33 UTC

Deciding on the structure of documents

Hello,

I'm currently investigating the world of 'NoSQL' and CouchDB. I was
wondering if anyone could point me in the direction of some good
resources on deciding how to structure documents. For example, I'm
not sure when to store everything as a single document and when to
use two separate documents (e.g. posts and comments).

Also, in terms of databases: how do you decide between having one
database and having multiple smaller databases, each with a subset of
the data?

Can relational rules about design still be applied?

In terms of what I'm trying to achieve: I have a core product catalogue
containing ~100,000 products. On top of this, each different 'shop'
displays a subset of those products.

My initial idea was to have a separate database for each 'shop'
containing the products it sells, but this results in a lot of
duplication. How do you handle updates, etc.?

Once you have the products, how do you handle categories of products etc...

Any pointers on this would be great!

Thank you

Ben

Re: Deciding on the structure of documents

Posted by Robert Sanford <wo...@gmail.com>.
Unfortunately that can cause further issues, which is annoying. But that's no different from any distributed application working on an "eventually consistent" basis. Even a master-slave or multi-master RDBMS setup is going to run into these issues.

Right now I'm thinking that I need a centralized repository that only accepts writes for the purpose of laying claim to documents, and I'll likely have that regardless of which persistence layer I eventually use.
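The centralized claim repository described above can be sketched as a single authority that performs an atomic check-and-set for each claim, so the Dallas and Phoenix offices can never both succeed. This is a minimal in-process sketch under that assumption: the `ClaimRegistry` class and its methods are hypothetical, and in a real deployment the registry would sit behind one service that every office calls before recording a claim in its local replica.

```python
import threading
from typing import Dict, Optional


class ClaimRegistry:
    """Hypothetical central authority for document ownership (not a CouchDB API).

    Every office asks this one registry before recording a claim locally, so
    the first-come-first-served rule is enforced in exactly one place.
    """

    def __init__(self) -> None:
        self._owners: Dict[str, str] = {}
        self._lock = threading.Lock()

    def claim(self, doc_id: str, caretaker: str) -> bool:
        # Atomic check-and-set: succeed only if nobody owns the document yet.
        with self._lock:
            if doc_id in self._owners:
                return False
            self._owners[doc_id] = caretaker
            return True

    def release(self, doc_id: str, caretaker: str) -> bool:
        # Only the current owner may give the document up.
        with self._lock:
            if self._owners.get(doc_id) != caretaker:
                return False
            del self._owners[doc_id]
            return True

    def transfer(self, doc_id: str, current: str, new_owner: str) -> bool:
        # Explicit hand-over from the current owner to a named successor.
        with self._lock:
            if self._owners.get(doc_id) != current:
                return False
            self._owners[doc_id] = new_owner
            return True

    def owner(self, doc_id: str) -> Optional[str]:
        with self._lock:
            return self._owners.get(doc_id)


registry = ClaimRegistry()
print(registry.claim("doc-42", "Caretaker D"))  # True: Dallas asks first
print(registry.claim("doc-42", "Caretaker P"))  # False: Phoenix is refused
print(registry.owner("doc-42"))                 # Caretaker D
```

The trade-off of this design is that claiming needs a round trip to the central registry; if it is unreachable, no office can claim, which is the price of avoiding replication conflicts for this one operation.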

rjsjr

On Mar 28, 2010, at 9:16 PM, Patrick Barnes wrote:

> I would suggest that on replication if the documents conflict and have different owners, do a kind of 'asynchronous revision error' - have the system choose the earlier 'owner claim' by date, and send an email stating 'sorry X, Y has already taken ownership'.
> 
> I'm not sure if that would cause further issues regarding whether to invalidate content changes made by X, or how best to resolve that sort of thing.


Re: Deciding on the structure of documents

Posted by Patrick Barnes <mr...@gmail.com>.
I would suggest that on replication, if the documents conflict and have
different owners, you raise a kind of 'asynchronous revision error': have
the system choose the earlier 'owner claim' by date, and send an email
stating 'Sorry X, Y has already taken ownership'.

I'm not sure if that would cause further issues regarding whether to 
invalidate content changes made by X, or how best to resolve that sort 
of thing.
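The suggestion above can be sketched as a deterministic resolver that runs over the conflicting revisions of one document: keep the earliest claim, treat the rest as losers, and produce the 'sorry' messages. This assumes each revision records who claimed it (`owner`) and when (`claimed_at`); those field names and the `ClaimRevision` type are hypothetical. The result is only as fair as the offices' clocks, so a fixed tie-break (here, the revision id) is used so that every node reaches the same answer. In CouchDB the conflicting revisions of a document can be listed by fetching it with `?conflicts=true`; after choosing a winner you would delete the losing revisions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass(frozen=True)
class ClaimRevision:
    """One conflicting revision of a document (illustrative fields, not CouchDB's)."""
    rev: str
    owner: str
    claimed_at: datetime


def resolve_claim_conflict(
    revisions: List[ClaimRevision],
) -> Tuple[ClaimRevision, List[ClaimRevision]]:
    """Return (winner, losers): the earliest claim wins.

    Sorting on (claimed_at, rev) gives a fixed tie-break, so every node that
    runs this on the same revisions picks the same winner.
    """
    if not revisions:
        raise ValueError("expected at least one revision")
    ordered = sorted(revisions, key=lambda r: (r.claimed_at, r.rev))
    return ordered[0], ordered[1:]


def apology_messages(winner: ClaimRevision, losers: List[ClaimRevision]) -> List[str]:
    # One message per caretaker who lost the race (never the winner themselves).
    beaten = sorted({r.owner for r in losers if r.owner != winner.owner})
    return [f"Sorry {name}, {winner.owner} has already taken ownership." for name in beaten]


# Made-up revision ids and times for the Dallas/Phoenix example.
dallas = ClaimRevision("2-a1", "D", datetime(2010, 3, 28, 21, 0))
phoenix = ClaimRevision("2-b7", "P", datetime(2010, 3, 28, 21, 2))
winner, losers = resolve_claim_conflict([phoenix, dallas])
print(winner.owner)                      # D
print(apology_messages(winner, losers))  # ['Sorry P, D has already taken ownership.']
```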


On 29/03/2010 12:58 PM, Robert Sanford wrote:
> This brings up a question on an app I'm in the process of designing and I'm
> wondering how it works in a multi-node situation.
>
> In my app a document (almost literally a document actually) can have a
> single caretaker. In some situations the owner can decide they no longer
> wish to act as caretaker and offer up the document to others. That situation
> is easy in that the role of caretaker is explicitly transferred. Or, they
> can simply release "ownership" of the document. In that instance others that
> are assigned the role of Caretaker in the system can then claim the document
> on a first-come-first-served basis.
>
> In a multi-node system w/ replication what happens if Caretaker D in the
> Dallas office claims the document and prior to any replication occurring
> Caretaker P in the Phoenix office also claims the document?
>
> How can that sort of situation be handled? In a single-node system it's (not
> quite) trivial to handle but in a multi-node system w/ a time delay between
> replications that are not taking business logic into account that is
> trickier and I don't have my head around it yet.
>
> rjsjr
>

Re: Deciding on the structure of documents

Posted by Robert Sanford <wo...@gmail.com>.
On Sun, Mar 28, 2010 at 7:20 PM, Patrick Barnes <mr...@gmail.com> wrote:

> For the single-document situation: What if multiple people reply to the
> same post at the same time? The first comment to reach couchdb would succeed
> in updating the document, and the second comment would result in an error,
> because the doc revision is now out of date. A single-document situation
> would require the application logic to handle a revision error, and maybe
> automatically resubmit. This solution does not scale well - the more traffic
> and the more comments a site receives, the more likely these revision errors
> will occur.
>

This brings up a question on an app I'm in the process of designing and I'm
wondering how it works in a multi-node situation.

In my app a document (almost literally a document, actually) can have a
single caretaker. In some situations the owner can decide they no longer
wish to act as caretaker and offer the document up to others. That case
is easy, because the caretaker role is explicitly transferred. Or they
can simply release "ownership" of the document. In that case, others who
have been assigned the Caretaker role in the system can then claim the
document on a first-come-first-served basis.

In a multi-node system with replication, what happens if Caretaker D in
the Dallas office claims the document and, before any replication occurs,
Caretaker P in the Phoenix office also claims it?

How can that sort of situation be handled? In a single-node system it's
(not quite) trivial to handle, but in a multi-node system with a delay
between replications, where replication takes no account of business
logic, it is trickier and I don't have my head around it yet.
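The Dallas/Phoenix scenario can be reproduced with a toy model to see why replication alone cannot prevent a double claim: each office checks only its own copy, both writes succeed locally, and after replication the document has two leaf revisions, i.e. a conflict. This is a simplified sketch, not CouchDB's implementation: the `Node` class, the `(generation, node)` revision ids and the `replicate` function are hypothetical stand-ins. CouchDB likewise keeps both revisions and picks one winner deterministically, so every replica shows the same winner, but that choice has nothing to do with who claimed first.

```python
from typing import Dict, List, Tuple

# (generation, writer) stands in for CouchDB's "N-hash" revision ids.
Rev = Tuple[int, str]


class Node:
    """One office's replica of a single caretaker document (toy model)."""

    def __init__(self, name: str, base: Dict[Rev, dict]) -> None:
        self.name = name
        # rev -> {"parent": Rev or None, "owner": str or None}
        self.revs: Dict[Rev, dict] = dict(base)

    def leaves(self) -> List[Rev]:
        # A leaf is a revision that no other revision builds on.
        parents = {r["parent"] for r in self.revs.values()}
        return sorted(rev for rev in self.revs if rev not in parents)

    def winner(self) -> Rev:
        # Deterministic pick among leaves, so every node shows the same one.
        return self.leaves()[-1]

    def claim(self, caretaker: str) -> bool:
        parent = self.winner()
        if self.revs[parent]["owner"] is not None:
            return False  # already owned, as far as this node knows
        child = (parent[0] + 1, self.name)
        self.revs[child] = {"parent": parent, "owner": caretaker}
        return True


def replicate(source: Node, target: Node) -> None:
    # Replication copies revisions; it applies no business rules.
    target.revs.update(source.revs)


base = {(1, "origin"): {"parent": None, "owner": None}}  # a released document
dallas, phoenix = Node("dallas", base), Node("phoenix", base)
print(dallas.claim("Caretaker D"), phoenix.claim("Caretaker P"))  # True True
replicate(dallas, phoenix)
replicate(phoenix, dallas)
print(dallas.leaves())  # [(2, 'dallas'), (2, 'phoenix')]: a conflict
```

Both claims were accepted, and the revision shown as the winner is chosen by the revision ids rather than by the first-come-first-served rule, which is why an application-level resolution step is needed.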

rjsjr

Re: Deciding on the structure of documents

Posted by Patrick Barnes <mr...@gmail.com>.
Hi Ben, I don't have all the answers for you, but a few tips...

On posts and comments: it is important to think in terms of concurrency.
A person will write and submit a post once, but people will reply to it in
comments many times.

For the single-document situation: what if multiple people reply to the
same post at the same time? The first comment to reach CouchDB would
succeed in updating the document, and the second would fail with a
revision error (an HTTP 409 Conflict), because the revision it was based
on is now out of date. A single-document design would require the
application logic to handle that error and perhaps resubmit automatically.
This does not scale well: the more traffic and comments a site receives,
the more often these revision errors will occur.
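The single-document approach relies on optimistic concurrency: every update must carry the `_rev` it was based on, and a stale `_rev` is rejected. The sketch below imitates that check in memory so the retry loop can be seen without a server; `ToyDocStore`, `Conflict` and `update_with_retry` are hypothetical names, not a CouchDB client library. Against a real server the rejected write would come back as an HTTP 409 Conflict.

```python
import copy
import uuid
from typing import Callable, Dict


class Conflict(Exception):
    """Stands in for CouchDB's HTTP 409 Conflict response."""


class ToyDocStore:
    """In-memory imitation of CouchDB's per-document _rev check (not a real client)."""

    def __init__(self) -> None:
        self._docs: Dict[str, dict] = {}

    def get(self, doc_id: str) -> dict:
        # Deep copy, like fetching a fresh JSON document over HTTP.
        return copy.deepcopy(self._docs[doc_id])

    def put(self, doc: dict) -> str:
        current = self._docs.get(doc["_id"])
        # An update must name the revision it was based on; anything else is rejected.
        if current is not None and current["_rev"] != doc.get("_rev"):
            raise Conflict(doc["_id"])
        generation = int(current["_rev"].split("-")[0]) + 1 if current else 1
        saved = copy.deepcopy(doc)
        saved["_rev"] = f"{generation}-{uuid.uuid4().hex[:8]}"
        self._docs[doc["_id"]] = saved
        return saved["_rev"]


def update_with_retry(
    store: ToyDocStore, doc_id: str, change: Callable[[dict], None], attempts: int = 5
) -> str:
    """Re-read the latest revision, re-apply the change, and resubmit on conflict."""
    for _ in range(attempts):
        doc = store.get(doc_id)
        change(doc)
        try:
            return store.put(doc)
        except Conflict:
            continue  # someone else saved first; try again on top of their revision
    raise Conflict(f"{doc_id}: gave up after {attempts} attempts")


store = ToyDocStore()
store.put({"_id": "post-1", "title": "Hello", "comments": []})
alice, bob = store.get("post-1"), store.get("post-1")  # both read revision 1
alice["comments"].append("Nice post")
store.put(alice)  # accepted, document is now at revision 2
bob["comments"].append("Me too")
try:
    store.put(bob)  # rejected: bob's _rev still points at revision 1
except Conflict:
    update_with_retry(store, "post-1", lambda d: d["comments"].append("Me too"))
print(store.get("post-1")["comments"])  # ['Nice post', 'Me too']
```

Every retry costs another round trip, and under heavy commenting the retries themselves start to collide, which is the scaling problem described above.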

For the situation where each post and each comment gets its own document:
the application should check that people don't comment on non-existent
posts, but unless space is at a premium this is perhaps not mandatory. (A
cron job could periodically delete any orphaned comments, and comments
without a related post would never be shown by the site anyway.)
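With one document per post and one per comment, each comment only needs to name its post, so concurrent comments never touch the same document. A common CouchDB pattern for reading them back together is a view whose key sorts a post immediately before its comments: in CouchDB, a JavaScript map function emitting keys such as `[post_id, 0]` for the post and `[post_id, 1, created_at]` for each comment, queried with `startkey`/`endkey`. The Python below only imitates that sorting in memory, together with the orphan check a periodic cleanup job might run; the field names `type`, `post_id` and `created_at` and the helper functions are assumptions.

```python
from typing import List, Tuple

Row = Tuple[tuple, dict]


def map_post_with_comments(doc: dict) -> List[Row]:
    """Imitates a map function: a post sorts first, then its comments by time."""
    if doc.get("type") == "post":
        return [((doc["_id"], 0, ""), doc)]
    if doc.get("type") == "comment":
        return [((doc["post_id"], 1, doc["created_at"]), doc)]
    return []


def post_with_comments(docs: List[dict], post_id: str) -> List[dict]:
    # Build and sort the "view", then keep the rows for one post (a key-range query).
    rows = [row for d in docs for row in map_post_with_comments(d)]
    rows.sort(key=lambda r: r[0])
    return [doc for key, doc in rows if key[0] == post_id]


def orphaned_comment_ids(docs: List[dict]) -> List[str]:
    # What a periodic cleanup job could delete: comments whose post no longer exists.
    post_ids = {d["_id"] for d in docs if d.get("type") == "post"}
    return [
        d["_id"]
        for d in docs
        if d.get("type") == "comment" and d["post_id"] not in post_ids
    ]


docs = [
    {"_id": "c2", "type": "comment", "post_id": "post-1", "created_at": "2010-03-28T20:05"},
    {"_id": "post-1", "type": "post", "title": "Hello"},
    {"_id": "c1", "type": "comment", "post_id": "post-1", "created_at": "2010-03-28T20:01"},
    {"_id": "c3", "type": "comment", "post_id": "post-gone", "created_at": "2010-03-28T20:09"},
]
print([d["_id"] for d in post_with_comments(docs, "post-1")])  # ['post-1', 'c1', 'c2']
print(orphaned_comment_ids(docs))  # ['c3']
```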

On databases: the deciding factor between a single database and multiple
databases is often backup/replication. If you need to be able to back up
or replicate each shop's data separately from the other shops, then
you'll probably be better off with multiple databases.

If, as you say, you have a common 'core' of products and each shop
selects a subset of it, then perhaps for maintainability it would be best
to have only a single database (so that you don't have to modify the same
product record over and over again, once for each database).
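One way to keep a single database while still giving each shop its own selection is to store every product exactly once, record on it which shops carry it and which categories it belongs to, and answer "products in category X for shop Y" from a view keyed on `[shop, category]`. This is only one possible layout, sketched in memory: the `shops` and `categories` fields, the sample products and the helper names are hypothetical. Because every shop reads the same product document, a price change is made once rather than once per shop database.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Single "database": every product stored once, keyed by _id (sample data).
products: Dict[str, dict] = {
    "p1": {"_id": "p1", "name": "Kettle", "price": 20,
           "categories": ["kitchen"], "shops": ["shop-a", "shop-b"]},
    "p2": {"_id": "p2", "name": "Toaster", "price": 25,
           "categories": ["kitchen"], "shops": ["shop-a"]},
    "p3": {"_id": "p3", "name": "Lamp", "price": 15,
           "categories": ["lighting"], "shops": ["shop-b"]},
}


def shop_category_view(docs: Dict[str, dict]) -> Dict[Tuple[str, str], List[str]]:
    """Imitates a view that emits one row per (shop, category) for each product."""
    index: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for doc in docs.values():
        for shop in doc.get("shops", []):
            for category in doc.get("categories", []):
                index[(shop, category)].append(doc["_id"])
    return dict(index)


def products_for(shop: str, category: str) -> List[dict]:
    # Look up ids in the "view", then read the single shared product documents.
    ids = shop_category_view(products).get((shop, category), [])
    return [products[i] for i in sorted(ids)]


products["p1"]["price"] = 18  # one update, seen by every shop that sells p1
print([p["name"] for p in products_for("shop-a", "kitchen")])   # ['Kettle', 'Toaster']
print([p["price"] for p in products_for("shop-b", "kitchen")])  # [18]
```

If shops change their selections often, a separate small document per shop/product pairing would avoid repeatedly rewriting the product document, for the same concurrency reasons given above for comments.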

Hope that helps,
-Patrick


On 28/03/2010 11:58 PM, Ben Hall wrote:
> Hello,
>
> I'm currently investigating the world of 'NoSQL' and CouchDB. I was
> wondering if anyone could point me in the direction of some good
> resources on how to decide on how to structure documents.  For
> example, I'm not sure on when you decide to store everything as a
> single document, or two separate documents (example - posts and
> comments).
>
> Also in terms of databases - how do you decide if you have one
> database, or multiple smaller databases with a subset of data ?
>
> Can relational rules about design still be applied?
>
> In terms of what I'm trying to achieve. I have a product category
> containing core ~100000 products. On top of this, each different
> 'shop' displays a subset of products.
>
> My initial idea was to have a separate database for each 'shop'
> containing the products they sell - but this results in a lot of
> duplication - how do you handle updates etc?
>
> Once you have the products, how do you handle categories of products etc...
>
> Any pointers on this would be great!
>
> Thank you
>
> Ben
>