Posted to commits@couchdb.apache.org by Apache Wiki <wi...@apache.org> on 2010/07/12 18:21:50 UTC

[Couchdb Wiki] Update of "EntityRelationship" by DavidSharnoff

The "EntityRelationship" page has been changed by DavidSharnoff.
The comment on this change is: $scott->{groups} must be an array reference so it needs to be expanded for grep to work.
http://wiki.apache.org/couchdb/EntityRelationship?action=diff&rev1=21&rev2=22

--------------------------------------------------

  As a quick summary, this document explains how to do things that you would normally use SQL JOIN for.
  
  == Why would I need entity relationships? ==
- Imagine you are building a snazzy new web application that includes an address book where users can store their contacts. For each contact the user stores, you want to capture the contacts name, birthday (which they mustn't forget!) their address, telephone number and company they work for.
+ Imagine you are building a snazzy new web application that includes an address book where users can store their contacts. For each contact the user stores, you want to capture the contact's name, birthday (which they mustn't forget!), address, telephone number and the company they work for. When the user wants to add an address, they enter the information into a form and the form saves the information in a model that looks something like this:
- When the user wants to add an address, they enter the information in to a form and the form saves the information in a model that looks something like this:
  
  {{{#!highlight javascript
  {
@@ -28, +27 @@

  (Note that ''type'' means nothing to CouchDB; we're just using it here for our own convenience. CouchDB itself only cares about the special underscore fields such as ''_id''.)
  
  That's great: your users immediately begin to use their address book, and soon the datastore starts to fill up. Not long after deploying your new application, you hear from someone who is unhappy that there is only one phone number. What if they want to store someone's work telephone number in addition to their home number? No problem, you think; you can just add a work phone number to your structure. You change your data structure to look more like this:
+ 
  {{{
    "phone_number":"home phone in string form",
    "work_phone_number":"work phone in string form",
  }}}
- Update the form with the new field and you are back in business. Soon after redeploying your application, you get a number of new complaints. When they see the new phone number field, people start asking for even more fields. Some people want a fax number field, others want a mobile field. Some people even want more than one mobile field (boy modern life sure is hectic)! You could add another field for fax, and another for mobile, maybe two. What about if people have three mobile phones? What if they have ten? What if someone invents a phone for a place you've never thought of?
+ Update the form with the new field and you are back in business. Soon after redeploying your application, you get a number of new complaints. When they see the new phone number field, people start asking for even more fields. Some people want a fax number field, others want a mobile field. Some people even want more than one mobile field (boy, modern life sure is hectic)! You could add another field for fax, and another for mobile, maybe two. What if people have three mobile phones? What if they have ten? What if someone invents a phone for a place you've never thought of? Your model needs to use relationships.
- Your model needs to use relationships.
  
  == One to Many ==
  The answer is to allow users to assign as many phone numbers to each of their contacts as they like.
  
  In CouchDB, there are 2 ways to achieve this.
+ 
   1. Use separate documents
-  2. Use an embedded array
+  1. Use an embedded array
  
  === One to Many: Separate documents ===
- 
  When using separate documents, you could have documents like this for the phone numbers:
+ 
  {{{#!highlight javascript
  {
    "_id":"the phone number",
@@ -58, +58 @@

  The key to making all this work is the property that holds the contact id (''contact_id'' in the Perl example below). By storing the contact's id in it, you can refer to the owning contact unambiguously, since ''_id'' values are unique within a CouchDB database.
  
  Creating the relationship between a contact and one of its phone numbers is easy to do. Let's say you have a contact named "Scott" who has a home phone and a mobile phone. You populate his contact info like this (using Perl and Net::CouchDB):
+ 
  {{{#!highlight perl
  $db->insert({type => 'contact', _id => 'Scott', name => 'My Friend Scott'});
  $db->insert({type => 'phone', _id => '(650) 555 - 2200', contact_id => 'Scott', phone_type => 'home'});
  $db->insert({type => 'phone', _id => '(650) 555 - 2201', contact_id => 'Scott', phone_type => 'mobile'});
  }}}
- 
  To get a contact and its phone numbers from CouchDB in one query, you can use a little trick: create a view that sorts each contact's phone numbers directly after the contact itself. This is the view:
  
  {{{#!highlight javascript
@@ -77, +77 @@

    }
  }
  }}}
- 
- If you then query this view with the ''startkey'' parameter set to "[''''''"Scott"]" and endkey "[''''''"Scott",{}]", you'll get the contact details in the first row and the phone numbers in the following rows (sorted by phone_type as well). You can easily extend this system to have other types of one-to-many attributes in the same view by giving them a different number in the view above.
+ If you then query this view with ''startkey'' set to ["Scott"] and ''endkey'' set to ["Scott",{}], you'll get the contact details in the first row and the phone numbers in the following rows (sorted by phone_type as well). This works because CouchDB sorts an object such as {} after every number, string and array, so the range covers every key that begins with "Scott". You can easily extend this system to have other types of one-to-many attributes in the same view by giving them a different number in the view above.
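To see why the ["Scott"] .. ["Scott",{}] range picks up exactly Scott's rows, here is a simplified model of CouchDB's key ordering (null, false, true, numbers, strings, arrays, objects; arrays compared element by element). It is only a sketch: it assumes plain code-unit string comparison, whereas CouchDB uses ICU collation, and it treats all objects as equal, whereas CouchDB compares their contents.

```javascript
// Simplified model of CouchDB view-key ordering, for illustration only.
// Simplifications (assumptions): strings compare by UTF-16 code unit, whereas
// CouchDB uses ICU collation; and any two objects compare as equal, whereas
// CouchDB compares their contents.
function typeRank(v) {
  if (v === null) return 0;
  if (v === false) return 1;
  if (v === true) return 2;
  if (typeof v === "number") return 3;
  if (typeof v === "string") return 4;
  if (Array.isArray(v)) return 5;
  return 6; // objects sort after everything else
}

// Returns a negative number, zero or a positive number, like a sort comparator.
function collate(a, b) {
  const ra = typeRank(a);
  const rb = typeRank(b);
  if (ra !== rb) return ra - rb;
  if (ra === 3) return a - b;
  if (ra === 4) return a < b ? -1 : a > b ? 1 : 0;
  if (ra === 5) {
    // Compare element by element; if one array is a prefix of the other,
    // the shorter one sorts first.
    const n = Math.min(a.length, b.length);
    for (let i = 0; i < n; i++) {
      const c = collate(a[i], b[i]);
      if (c !== 0) return c;
    }
    return a.length - b.length;
  }
  return 0; // null, false, true, or two objects
}

// A row is returned when startkey <= key <= endkey.
function inRange(key, startkey, endkey) {
  return collate(startkey, key) <= 0 && collate(key, endkey) <= 0;
}

const keys = [
  ["Sam", 0],
  ["Scott", 0],
  ["Scott", 1, "home"],
  ["Scott", 1, "mobile"],
  ["Scotty", 0],
];
console.log(keys.filter((k) => inRange(k, ["Scott"], ["Scott", {}])));
```

["Sam",0] falls before the start key and ["Scotty",0] after the end key, because "Scotty" sorts after "Scott" before the second element is even looked at.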
  
  This is a little like a JOIN in SQL, except that SQL would combine the data fields into a single row, whereas here they arrive on consecutive rows. This approach allows a variable number of data fields per contact, which is more flexible than SQL's fixed columns.
  
  NOTE: This needs a code example showing how to use the output of the view. Feel free to add one.
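One way to use the output of the view is to fold the rows back into a single contact object. This is a sketch under assumptions, since the view's map function is not shown here: contact rows are assumed to have keys like [contactId, 0] with the contact details as their value, and phone rows keys like [contactId, 1, phoneType] with the phone number as the row id (matching the Perl example, where each phone document's ''_id'' is the number). The function name and sample rows are hypothetical.

```javascript
// Sketch: turn the rows of the contact/phone view into one contact object.
// Assumed row shapes (the view's map function is not shown here):
//   contact row: { id: contactId, key: [contactId, 0], value: <contact details> }
//   phone row:   { id: phoneNumber, key: [contactId, 1, phoneType], value: ... }
function contactFromRows(rows) {
  let contact = null;
  for (const row of rows) {
    const contactId = row.key[0];
    const kind = row.key[1];
    if (kind === 0) {
      // The contact row comes first because 0 sorts before 1.
      contact = { id: row.id, details: row.value, phones: [] };
    } else if (kind === 1 && contact !== null && contact.id === contactId) {
      // In the Perl example, a phone document's _id is the number itself.
      contact.phones.push({ type: row.key[2], number: row.id });
    }
  }
  return contact;
}

// Hypothetical rows for the ["Scott"] .. ["Scott",{}] query.
const scottRows = [
  { id: "Scott", key: ["Scott", 0], value: { name: "My Friend Scott" } },
  { id: "(650) 555 - 2200", key: ["Scott", 1, "home"], value: null },
  { id: "(650) 555 - 2201", key: ["Scott", 1, "mobile"], value: null },
];
console.log(contactFromRows(scottRows));
```

Because the rows arrive already sorted, a single pass is enough; no lookups or second request are needed.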
  
- Because CouchDB always sorts on keys, you can use this view to only get Scotts home phone numbers by querying with ''startkey'' set to "[''''''"Scott",1,"home"]" and ''endkey'' set to "[''''''"Scott",1,"home",{}]"
+ Because CouchDB always sorts on keys, you can use this view to get only Scott's home phone numbers by querying with ''startkey'' set to ["Scott",1,"home"] and ''endkey'' set to ["Scott",1,"home",{}].
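Since view keys are JSON values, ''startkey'' and ''endkey'' have to be JSON-encoded and then URL-encoded when you build the query by hand. The sketch below shows that step; the database, design document and view names ("addressbook", "contacts", "by_contact") are hypothetical placeholders.

```javascript
// Sketch: build a view URL for a key range. Keys are JSON values, so they are
// JSON-encoded first and then URL-encoded by URLSearchParams.
// The database, design document and view names are hypothetical.
function viewRangeUrl(db, ddoc, view, startkey, endkey) {
  const params = new URLSearchParams({
    startkey: JSON.stringify(startkey),
    endkey: JSON.stringify(endkey),
  });
  return `/${db}/_design/${ddoc}/_view/${view}?${params.toString()}`;
}

// Scott's home phone numbers only.
const homeUrl = viewRangeUrl("addressbook", "contacts", "by_contact",
  ["Scott", 1, "home"], ["Scott", 1, "home", {}]);
console.log(homeUrl);
```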
  
  When Scott loses his phone, it's easy enough to delete that record. Just delete the phone document, and it will no longer show up in the view:
+ 
  {{{#!highlight perl
  $db->doc('(650) 555 - 2200')->delete;
  }}}
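Over CouchDB's HTTP API, a delete is a DELETE request on the document URL that must carry the document's current revision as ''rev''; a document ID like this one also has to be URL-encoded because it contains spaces. A minimal sketch of that request, where the database name and the revision string are hypothetical (a real ''_rev'' comes from fetching the document first):

```javascript
// Sketch of the HTTP request behind deleting a document. CouchDB needs the
// document's current revision (its "_rev") to delete it, and the ID must be
// URL-encoded because it contains spaces. The database name and revision
// string are hypothetical; a real _rev comes from fetching the document.
function deleteRequest(db, docId, rev) {
  return {
    method: "DELETE",
    url: `/${encodeURIComponent(db)}/${encodeURIComponent(docId)}?rev=${encodeURIComponent(rev)}`,
  };
}

const req = deleteRequest("addressbook", "(650) 555 - 2200", "1-abc");
console.log(req.method, req.url);
```

If the revision is stale because someone else changed the document meanwhile, CouchDB refuses the delete with a conflict rather than silently removing a newer version.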
- 
  === One to Many: Embedded Documents ===
- 
  The embedded array is only an option as long as you don't have "too many" items to store: each document is always handled as a whole, so bigger documents mean slower processing and slower network transfers whenever you change the list. Phone numbers should be OK unless you plan to store the whole company phonebook in there.
  
  This is the easiest way to handle one-to-many as everything you need is in one place. Here's how the document for Scott would look:
+ 
  {{{#!highlight javascript
  {
    "_id":"Scott",
@@ -104, +103 @@

    "phones":[{"number":"(650) 555 - 2200","type":"home"},{"number":"(650) 555 - 2201","type":"mobile"}]
  }
  }}}
- 
  or, even more succinctly:
  
  {{{#!highlight javascript
@@ -115, +113 @@

    "phones":[{"home":"(650) 555 - 2200"},{"mobile":"(650) 555 - 2201"}]
  }
  }}}
- 
  Note that only the fields we actually know are stored. Also note that the phone numbers are not simply an array of strings; they are an array of associative hashes. We could easily extend this to email addresses, IM names, etc., even though IM names would need an extra attribute for the service type. In essence, you're embedding child documents in the master document. That is the power of schema-less databases.
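Embedded phones can still be indexed individually by a view. As a hypothetical use case, the map function below emits one row per embedded phone, keyed by the number, so you could look up which contact owns a number. It assumes the first document shape above ("phones" as an array of {number, type} objects). Inside CouchDB, emit() is supplied by the view server; the stand-in here only records rows so the sketch runs on its own.

```javascript
// Sketch: a CouchDB map function that indexes the embedded phone numbers,
// assuming the first document shape above ("phones" is an array of
// {number, type} objects). Inside CouchDB, emit() is provided by the view
// server; the stand-in below just records rows so the function can run here.
const emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

function map(doc) {
  if (doc.phones) {
    for (var i = 0; i < doc.phones.length; i++) {
      // One row per embedded phone, keyed by the number.
      emit(doc.phones[i].number, { contact: doc._id, type: doc.phones[i].type });
    }
  }
}

map({
  _id: "Scott",
  phones: [
    { number: "(650) 555 - 2200", type: "home" },
    { number: "(650) 555 - 2201", type: "mobile" },
  ],
});
console.log(emitted);
```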
  
  == Many to Many ==
  One thing you would like to do is let people organize their contacts into groups. They might make groups like "Friends", "Co-workers" and "Family". Users could then perform actions on a whole group at once, such as sending an invitation to all their friends for a hack-a-thon. Let's define a simple Group model like this:
+ 
  {{{#!highlight javascript
  {
    "_id":"unique group id",
@@ -128, +126 @@

    "description":"description"
  }
  }}}
- 
  You could make this a one-to-many relation with Contact. However, that would let each contact belong to only one group at a time, yet someone might well include some of their co-workers as friends. You need a way to represent many-to-many relationships.
  
  === Many to Many: List of Keys ===
  One very simple way is to create a list of keys on one side of the relationship, like we did in the "Embedded One to Many" section.
  
  Our friend and colleague Scott would then get a new field in his contact document which holds group ''_id'' values:
+ 
  {{{#!highlight javascript
    "groups":["Friends","Colleagues"]
  }}}
- 
  Adding a contact to a group, or removing it from one, means editing this list of keys. Suppose we don't like Scott any more:
+ 
  {{{#!highlight perl
     my $scott = $db->doc('Scott');
-    $scott->{groups} = grep { $_ ne 'Friends' } $scott->{groups};
+    $scott->{groups} = [ grep { $_ ne 'Friends' } @{$scott->{groups}} ];
     $scott->update;
  }}}
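For comparison, here is the same read-modify-write on a plain JavaScript object. It is a sketch, not a client library call: filter() already returns a new array, which is what the corrected Perl line builds explicitly with [ grep { ... } @{...} ]. The function name and sample document are hypothetical.

```javascript
// Sketch: the same read-modify-write as the Perl snippet above, on a plain
// JavaScript object. filter() already returns a new array, which is what
// the fixed Perl code builds with [ grep { ... } @{...} ].
function removeFromGroup(contact, groupId) {
  return {
    ...contact,
    groups: (contact.groups || []).filter((g) => g !== groupId),
  };
}

const scott = { _id: "Scott", type: "contact", groups: ["Friends", "Colleagues"] };
const updated = removeFromGroup(scott, "Friends");
console.log(updated.groups);
```

Saving the updated document still requires its current ''_rev''; if someone else changed Scott's document in the meantime, CouchDB rejects the write with a conflict and you have to re-read and retry.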
- 
  To get all the members of a group, you'd create a view like this:
+ 
  {{{#!highlight javascript
  "map":function(doc) {
     if (doc.type == 'contact') {
@@ -158, +156 @@

     }
  }
  }}}
- 
  If you then query this view with search parameters
+ 
   * ''descending=true''
   * ''key="Friends"''
+ 
  then you'll get the names of all members of the Friends group, with the group information as the first row (hashes sort after strings).
  
  Here's a space optimization hint: if you write the view as
+ 
  {{{#!highlight javascript
  "map":function(doc) {
     if (doc.type == 'contact') {
@@ -177, +177 @@

  }
  }}}
  and query this view with search parameters
+ 
   * ''key="Friends"''
   * ''include_docs=true''
+ 
  then you'll get all the documents pertinent to the group, but in no particular order. Your index will be smaller, though.
  
  For the most efficient changes to the relationship list, place the list on the side of the relationship that you expect to have fewer values. In the example above, the Contact side was chosen because a single person is not likely to belong to very many groups, whereas in a large contacts database a group might contain hundreds of members.
@@ -190, +192 @@

  
  A final option is to use a separate index, such as couchdb-lucene, to help with such queries.
  
- 
  === Many to Many: Relationship documents ===
- 
  Another way of implementing many-to-many is by creating a separate document for each relationship.
  
  You would use this method if you modify the key list frequently (i.e. if you get more conflicts than is acceptable), or if the key list is so large that transferring the document is unacceptably slow. Relationship documents enable frequent changes with less chance of conflict; however, you can access neither the contact nor group information in one request. You must re-request those specific documents by ID, keeping in mind that they may change or be deleted in the interim.
  
  A document recording that Scott is a member of Friends would look like this:
+ 
  {{{#!highlight javascript
  {
    "_id":"some unique id",
@@ -206, +207 @@

    "group_id":"Friends"
  }
  }}}
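A relationship document can be built by a small helper. In this sketch, "type" and "group_id" follow the document above, while "contact_id" and the "rel:<contact>:<group>" ''_id'' convention are assumptions. Deriving the ''_id'' from both ends means that recording the same membership twice targets the same document, so CouchDB reports a conflict instead of storing a duplicate.

```javascript
// Sketch: build a relationship document. The field names follow the
// document above where visible; "contact_id" and the "rel:<contact>:<group>"
// _id convention are assumptions. A deterministic _id means that recording
// the same membership twice targets the same document, so CouchDB reports a
// conflict instead of storing a duplicate relationship.
function relationshipDoc(contactId, groupId) {
  return {
    _id: "rel:" + contactId + ":" + groupId,
    type: "relationship",
    contact_id: contactId,
    group_id: groupId,
  };
}

console.log(relationshipDoc("Scott", "Friends"));
```

This convention only stays unambiguous if contact and group ids never contain the ":" separator; otherwise, pick a separator that cannot occur in them.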
- 
- 
  If you then want to know who is in a group, you'll need to use this view (fetch descending to get the group info first):
+ 
  {{{#!highlight javascript
  "map":function(doc) {
     if (doc.type == 'relationship') {
@@ -218, +218 @@

     }
  }
  }}}
- 
  To find the groups a contact belongs to, you can use:
+ 
  {{{#!highlight javascript
  "map":function(doc) {
     if (doc.type == 'relationship') {
@@ -229, +229 @@

     }
  }
  }}}
- Note that this view uses key arrays to enforce sorting, just to show you the possible variations. The disadvantage is that you can't use ''key="Scott"'' to search for Scotts groups, you need to use ''startkey=[''''''"Scott"]&endkey=["Scott",{}]''.
+ Note that this view uses key arrays to enforce sorting, just to show you the possible variations. The disadvantage is that you can't use ''key="Scott"'' to search for Scott's groups; you need to use ''startkey=["Scott"]&endkey=["Scott",{}]''.
  
  Unlike the previous method, you can't use ''include_docs=true'' here to get all the information about the contacts in a group or the groups a contact belongs to. The reason is that the documents that generated the view rows are not the contact or group documents; they are the relationship documents. If you want that information, you'll have to fetch it separately (you can POST a list of document ids to ''/db/_all_docs?include_docs=true'' to grab a bunch of documents in one go).
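The bulk fetch mentioned above is a single POST to ''/db/_all_docs'' whose JSON body has the form {"keys": [...]}; with ''include_docs=true'', each returned row carries the full document. A minimal sketch that builds that request from a list of ids, where the database name and the row shape (a contact id in each row's value) are assumptions about what your relationship view emits:

```javascript
// Sketch: fetch many documents in one request. CouchDB's POST /db/_all_docs
// accepts a JSON body of the form {"keys": [...]}; include_docs=true makes
// each row carry the full document.
// The database name and the row shape (contact id in row.value) are assumptions.
function allDocsRequest(db, ids) {
  const unique = Array.from(new Set(ids)); // drop repeats, keep first-seen order
  return {
    method: "POST",
    url: "/" + encodeURIComponent(db) + "/_all_docs?include_docs=true",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ keys: unique }),
  };
}

// Hypothetical rows from a group-membership view.
const memberRows = [
  { key: "Friends", value: "Scott" },
  { key: "Friends", value: "Alice" },
];
const fetchReq = allDocsRequest("addressbook", memberRows.map((row) => row.value));
console.log(fetchReq.body);
```

As noted above, the fetched documents may have changed or been deleted since the view was read, so check each returned row before using it.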