Posted to user@couchdb.apache.org by Eric B <eb...@gmail.com> on 2014/09/11 16:19:21 UTC

How to implement multiple-tables/collections using CouchDB?

I've been reading up on CouchDB and am very confused as to its application
in a real-world web application.  I can see its benefit for a subset of
data, but as the primary DB for a web application, I can't see how to do
things.

Coming from my RDBMS background, I would build a basic shopping cart web
app with a bunch of tables:
 - users
 - permissions
 - items
 - inventory
 - clients
 - orders
 - etc...

I realize that NoSQL allows me to denormalize a lot of that data (which is
fine) in order to speed up processing/etc, and the concept of "tables" no
longer applies.    I'm fine with that.  But where I get lost is how to
organize similar datasets together in CouchDB.  In MongoDB, they have
Collections - akin to tables - to help separate data.

I don't see anything equivalent in CouchDB, which would seem to mean throwing
all my data into a single "bag"/collection and relying entirely on map
functions/views to organize it.  For instance, to retrieve all users
from the system, I would need a map function that does something like:
 emit all docs that have a username field.

But then what happens if at a later point in time, I create another
document that has a username field (which isn't a user)?  It will break my
code.
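
To make the username problem concrete, here is a minimal JavaScript sketch of the "emit all docs that have a username field" map function. CouchDB supplies emit() to map functions itself; the emit() defined here is a hypothetical stand-in that only collects rows so the example runs under Node.js, and the documents are made up.

```javascript
// Hypothetical stand-in for the emit() that CouchDB's view server provides:
// it records each emitted row together with the id of the doc being mapped.
const rows = [];
let currentId = null;
function emit(key, value) {
  rows.push({ id: currentId, key: key, value: value });
}

// "Emit all docs that have a username field", written as a map function.
function usersByUsername(doc) {
  if (doc.username) {
    emit(doc.username, null);
  }
}

// Made-up documents: only u1 is a user; audit1 merely happens to have a username.
const docs = [
  { _id: "u1", username: "eric", email: "eric@example.com" },
  { _id: "audit1", username: "eric", action: "login" },
  { _id: "item1", name: "Widget", price: 9.99 },
];

for (const doc of docs) {
  currentId = doc._id;
  usersByUsername(doc);
}

// Both u1 and audit1 end up in the "users" view.
console.log(rows.map((r) => r.id));
```

The view has no way to tell a user from any other document that shares the field, which is exactly the breakage described above.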

So my second option is to assign a "document-type" key to every
document and then filter on that, with the "document-type" key acting as
an organizational/collection name.  It's definitely better, but still
seems a little odd.

The whole process seems very disorganized.

I understand the concept that NoSQL is exactly that - a key/value store,
where structure is omitted. But I would have expected at the very least
some organization ability - no?

Am I missing something basic/obvious in CouchDB?  Or is the concept to use
separate DBs every time you want to organize similar data together?  That
seems a little odd too.

Thanks,

Eric

Re: How to implement multiple-tables/collections using CouchDB?

Posted by Andy Dorman <ad...@ironicdesign.com>.
On 09/11/2014 02:17 PM, Matthieu Rakotojaona wrote:
> In CouchDB, the organization and planning you have to do is not in
> how you want your data stored, it's in how you're going to
> retrieve/query it. If you often need to display the last 5 ordered items
> of a user (as in, "here are your last 5 purchases"), you're going to
> write a view that allows you to do that easily. If you need to list all
> the orders placed since the beginning of the month, it will be the
> same. This is where the shift lies, coming from an RDBMS.

I realize it is about Cloudant, but it would work just as well for CouchDB...

I have been working with RDBMSs since the late '70s, and making the shift
to a document DB like CouchDB has been a bit of a challenge... but at
least one light came on for me when I saw this webinar last week.

https://cloudant.com/handling-relational-data-with-cloudant-webinar-playback/

-- 
Andy Dorman
Ironic Design, Inc.
AnteSpam.com, ComeHome.net

Re: How to implement multiple-tables/collections using CouchDB?

Posted by Alexander Shorin <kx...@gmail.com>.
On Fri, Sep 12, 2014 at 7:38 AM, Eric Benzacar <er...@benzacar.ca> wrote:
> Maybe I misread something in the CouchDB docs, but I thought I could only
> sort based on the key - not the value - of a view.  If I am
> emitting user_id : order_date, how would I be able to sort the last 5 items?

That would be a different view, one that emits an order_date - user_id pair.
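
A rough sketch of one reading of this suggestion: each order emits the array key [order_date, user_id], so rows sort by date first, and a startkey at the first of the month selects every order since then (Matthieu's second example). The field names (type, user_id, order_date) and the harness functions (emit, compareKeys, ordersSince) are hypothetical; compareKeys is a simplified element-by-element comparison meant only for keys shaped like [ISO 8601 string, number], not a full reimplementation of CouchDB's collation. On a real view the range would be requested with something like ?startkey=["2014-09-01T00:00:00Z"], and descending=true&limit=5 on the same view would return the five most recent orders across all users.

```javascript
// Hypothetical stand-in for CouchDB's emit(): records rows with the current doc id.
const rows = [];
let currentId = null;
function emit(key, value) {
  rows.push({ id: currentId, key: key, value: value });
}

// Map function: one row per order, keyed by [order_date, user_id].
function ordersByDate(doc) {
  if (doc.type === "order") {
    emit([doc.order_date, doc.user_id], null);
  }
}

// Simplified element-by-element key comparison for [string, number] arrays.
// As in CouchDB, an array that is a prefix of another sorts before it.
function compareKeys(a, b) {
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return a.length - b.length;
}

// Approximates ?startkey=<startkey>: every row at or after startkey, in key order.
function ordersSince(startkey) {
  return rows
    .filter((r) => compareKeys(r.key, startkey) >= 0)
    .sort((a, b) => compareKeys(a.key, b.key))
    .map((r) => r.id);
}

// Made-up documents; the user doc is ignored by the map function.
const docs = [
  { _id: "o1", type: "order", user_id: 4, order_date: "2014-08-30T10:00:00Z" },
  { _id: "o2", type: "order", user_id: 7, order_date: "2014-09-02T09:30:00Z" },
  { _id: "o3", type: "order", user_id: 4, order_date: "2014-09-01T00:00:00Z" },
  { _id: "u4", type: "user", username: "eric" },
];
for (const doc of docs) {
  currentId = doc._id;
  ordersByDate(doc);
}

console.log(ordersSince(["2014-09-01T00:00:00Z"])); // [ 'o3', 'o2' ]
```

Because ISO 8601 timestamps in a fixed format sort chronologically as strings, no date parsing is needed to get the rows in time order.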

--
,,,^..^,,,

Re: How to implement multiple-tables/collections using CouchDB?

Posted by Eric Benzacar <er...@benzacar.ca>.
On Thu, Sep 11, 2014 at 3:17 PM, Matthieu Rakotojaona <matthieu.rakotojaona@gmail.com> wrote:

> Excerpts from Eric B's message of 2014-09-11 16:19:21 +0200:
> For the second example, you should emit the order date in a
> sortable format (use ISO 8601). For the first example, on each order, emit
> something like "user_id:order_date" with a sortable order_date. When
> user 4 connects and fetches their last 5 ordered items, CouchDB will
> just iterate over the 5 relevant entries that you have indexed.
>

Maybe I misread something in the CouchDB docs, but I thought I could only
sort based on the key - not the value - of a view.  If I am
emitting user_id : order_date, how would I be able to sort the last 5 items?

I've been trying to read the Definitive Guide to CouchDB (
http://guide.couchdb.org/editions/1) but it seems as though it is quite
old.  It refers to CouchDB v0.10 and I see that CouchDB is already at
v1.6.1.  Is there a better reference guide to learning CouchDB?


Thanks,

Eric

Re: How to implement multiple-tables/collections using CouchDB?

Posted by Matthieu Rakotojaona <ma...@gmail.com>.
Excerpts from Eric B's message of 2014-09-11 16:19:21 +0200:

In CouchDB, the organization and planning you have to do is not in
how you want your data stored, it's in how you're going to
retrieve/query it. If you often need to display the last 5 ordered items
of a user (as in, "here are your last 5 purchases"), you're going to
write a view that allows you to do that easily. If you need to list all
the orders placed since the beginning of the month, it will be the
same. This is where the shift lies, coming from an RDBMS.

To be more precise, the map process happens somewhat like that (in pseudocode):

    for doc in new_docs():
      emit some_property from doc

The key here is that CouchDB is geared towards querying sorted data, so
you have to emit a property that sorts the way you need.

For the second example, you should emit the order date in a
sortable format (use ISO 8601). For the first example, on each order, emit
something like "user_id:order_date" with a sortable order_date. When
user 4 connects and fetches their last 5 ordered items, CouchDB will
just iterate over the 5 relevant entries that you have indexed.
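
A minimal sketch of the first example. It swaps the concatenated "user_id:order_date" string for an array key [user_id, order_date], a common CouchDB idiom; CouchDB compares array keys element by element, so one user's rows sit next to each other in date order. The last five orders are then a descending read of that user's slice, which on a real view would look roughly like ?startkey=[4,{}]&endkey=[4]&descending=true&limit=5 ({} sorts after strings in CouchDB's collation, and descending=true reverses the roles of startkey and endkey). The emit() harness, the field names and the lastOrders() helper are hypothetical and only approximate that query.

```javascript
// Hypothetical stand-in for CouchDB's emit(): records rows with the current doc id.
const rows = [];
let currentId = null;
function emit(key, value) {
  rows.push({ id: currentId, key: key, value: value });
}

// Map function: one row per order, keyed by [user_id, order_date], value = item.
function ordersByUserAndDate(doc) {
  if (doc.type === "order") {
    emit([doc.user_id, doc.order_date], doc.item);
  }
}

// Approximates ?startkey=[userId,{}]&endkey=[userId]&descending=true&limit=n:
// that user's rows, newest first (ISO 8601 strings sort chronologically), first n.
function lastOrders(userId, n) {
  return rows
    .filter((r) => r.key[0] === userId)
    .sort((a, b) => (a.key[1] < b.key[1] ? 1 : a.key[1] > b.key[1] ? -1 : 0))
    .slice(0, n)
    .map((r) => r.value);
}

// Made-up data: user 4 has seven orders on consecutive days, user 7 has one.
const docs = [];
for (let day = 1; day <= 7; day++) {
  docs.push({
    _id: "o" + day,
    type: "order",
    user_id: 4,
    order_date: "2014-09-0" + day + "T12:00:00Z",
    item: "item-" + day,
  });
}
docs.push({ _id: "o8", type: "order", user_id: 7, order_date: "2014-09-05T08:00:00Z", item: "other" });

for (const doc of docs) {
  currentId = doc._id;
  ordersByUserAndDate(doc);
}

console.log(lastOrders(4, 5)); // [ 'item-7', 'item-6', 'item-5', 'item-4', 'item-3' ]
```

This also answers the "sort by value" worry: the date lives inside the key, so no sorting on values is needed.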

"Indexed" here is the important word: "new_docs" evaluates to all docs
when the view is queried for the first time, but after that it only
evaluates to "docs that have changed/been added since the last time the
view was queried". Results are stored incrementally for efficient querying.
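
To illustrate the incremental part, here is a toy model in JavaScript, not CouchDB's actual implementation: every write bumps a sequence number, and the view remembers the last sequence it has indexed, so a query only runs the map function over documents changed since then. ToyDatabase, ToyView and mapCalls are invented names; unlike a real CouchDB map function, the map function here receives emit as a parameter to keep the model self-contained, and deletions are not modelled.

```javascript
// Toy document store: each write gets the next sequence number.
class ToyDatabase {
  constructor() {
    this.seq = 0;
    this.docs = new Map(); // doc id -> { doc, seq of its latest write }
  }
  put(doc) {
    this.seq += 1;
    this.docs.set(doc._id, { doc: doc, seq: this.seq });
  }
  // Documents written after sequence number `since` ("new_docs" in the pseudocode).
  changesSince(since) {
    return [...this.docs.values()].filter((e) => e.seq > since).map((e) => e.doc);
  }
}

// Toy view index, brought up to date lazily when queried.
class ToyView {
  constructor(db, mapFn) {
    this.db = db;
    this.mapFn = mapFn;
    this.lastSeq = 0; // last sequence number already indexed
    this.rowsById = new Map(); // doc id -> rows emitted for its latest version
    this.mapCalls = 0; // how many times the map function has run
  }
  query() {
    for (const doc of this.db.changesSince(this.lastSeq)) {
      const emitted = [];
      this.mapFn(doc, (key, value) => emitted.push({ id: doc._id, key: key, value: value }));
      this.rowsById.set(doc._id, emitted); // replaces stale rows for this doc
      this.mapCalls += 1;
    }
    this.lastSeq = this.db.seq;
    // Return all rows sorted by key (plain JS string comparison).
    return [...this.rowsById.values()]
      .flat()
      .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  }
}

const db = new ToyDatabase();
const byType = new ToyView(db, (doc, emit) => {
  if (doc.type) emit(doc.type, null);
});

db.put({ _id: "u1", type: "user" });
db.put({ _id: "o1", type: "order" });
byType.query(); // first query maps both existing docs
db.put({ _id: "o2", type: "order" });
const rows = byType.query(); // second query maps only o2
console.log(byType.mapCalls, rows.length); // 3 3
```

The cost of the first query grows with the size of the database, but later queries only pay for what changed in between.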

-- 
Matthieu Rakotojaona

Re: How to implement multiple-tables/collections using CouchDB?

Posted by Landry Soules <la...@gmail.com>.
You're right: to get something similar to Mongo's collections or
RDBMS tables, you will have to add a "documentType" property to all your
documents.


Re: How to implement multiple-tables/collections using CouchDB?

Posted by Jens Alfke <je...@couchbase.com>.
> On Sep 11, 2014, at 7:56 AM, Eric Benzacar <er...@benzacar.ca> wrote:
> 
> for large datasets (ex: millions of docs), how efficient
> is running everything through a JS compiled view?  Each time you open a
> view, it needs to filter millions of docs.

No, the map function only runs once on any particular document revision. The view's index is persistent, and a doc only gets re-indexed if it's been changed. So while the first query might be slow due to having to index everything that's already in the database, subsequent queries are fast.

(And the 'first-query-slow' problem affects other kinds of databases too, even relational ones.)

—Jens

Re: How to implement multiple-tables/collections using CouchDB?

Posted by Alexander Shorin <kx...@gmail.com>.
On Thu, Sep 11, 2014 at 6:56 PM, Eric Benzacar <er...@benzacar.ca> wrote:
> Is that not horribly inefficient?  For a handful of documents I can see how
> it would work, but for large datasets (ex: millions of docs), how efficient
> is running everything through a JS compiled view?  Each time you open a
> view, it needs to filter millions of docs.

No, it wouldn't be. CouchDB views are similar to incrementally updated
materialized views in an RDBMS: they index each document only once, when
it gets changed.

--
,,,^..^,,,

Re: How to implement multiple-tables/collections using CouchDB?

Posted by Eric Benzacar <er...@benzacar.ca>.
On Thu, Sep 11, 2014 at 10:35 AM, Alexander Shorin <kx...@gmail.com> wrote:

> Having a document type field is a good and common practice. In fact,
> MongoDB does the same, but at a higher level, calling it a
> "collection". You can implement the same thing by using a document type
> field and a view that emits documents by type. Moreover, you can create
> "collections" (views) across various documents based on whatever
> conditions you like.
>
>
Is that not horribly inefficient?  For a handful of documents I can see how
it would work, but for large datasets (ex: millions of docs), how efficient
is running everything through a JS compiled view?  Each time you open a
view, it needs to filter millions of docs.

For example, I have a client using an RDBMS with about 7 GB of data.  Some
tables contain >1M records.  With proper indexing, it is quick and
responsive.  How can CouchDB be fast if it has to parse 7 GB of data to
populate a view?

Thanks,

Eric

Re: How to implement multiple-tables/collections using CouchDB?

Posted by Jens Alfke <je...@couchbase.com>.
> On Sep 11, 2014, at 7:35 AM, Alexander Shorin <kx...@gmail.com> wrote:
> 
> Having document type field is a good and common practice.

Nearly ubiquitous, in my experience — almost every database I've seen (unless it explicitly stores only one type of document) has such a property.

In fact I would say that it's a best practice to name this property "type" since that's the convention (I honestly can't recall seeing any db where it had a different name, though I'm sure they exist.)

—Jens

Re: How to implement multiple-tables/collections using CouchDB?

Posted by Alexander Shorin <kx...@gmail.com>.
On Thu, Sep 11, 2014 at 6:19 PM, Eric B <eb...@gmail.com> wrote:
> I realize that NoSQL allows me to denormalize a lot of that data (which is
> fine) in order to speed up processing/etc, and the concept of "tables" no
> longer applies.    I'm fine with that.  But where I get lost is how to
> organize similar datasets together in CouchDB.  In MongoDB, they have
> Collections - akin to tables - to help separate data.
>
>
> I don't see anything equivalent in CouchDB, which would seem to mean throwing
> all my data into a single "bag"/collection and relying entirely on map
> functions/views to organize it.  For instance, to retrieve all users
> from the system, I would need a map function that does something like:
>  emit all docs that have a username field.
>
> But then what happens if at a later point in time, I create another
> document that has a username field (which isn't a user)?  It will break my
> code.
>
> So my second option is to assign a "document-type" key to every
> document and then filter on that, with the "document-type" key acting as
> an organizational/collection name.  It's definitely better, but still
> seems a little odd.
>
> The whole process seems very disorganized.

Having a document type field is a good and common practice. In fact,
MongoDB does the same, but at a higher level, calling it a
"collection". You can implement the same thing by using a document type
field and a view that emits documents by type. Moreover, you can create
"collections" (views) across various documents based on whatever
conditions you like.
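
A minimal sketch of such a "collection" view, assuming the property is called type (the name Jens recommends). It reuses Eric's troublesome case: an audit document that also has a username field no longer shows up among the users, because the view filters on type rather than on the presence of a field. emit() and collection() are hypothetical harness functions; on a real view, collection("user") corresponds to querying with ?key="user".

```javascript
// Hypothetical stand-in for CouchDB's emit(): records rows with the current doc id.
const rows = [];
let currentId = null;
function emit(key, value) {
  rows.push({ id: currentId, key: key, value: value });
}

// "Collection" view: every document with a type is indexed under that type.
function byType(doc) {
  if (doc.type) {
    emit(doc.type, null);
  }
}

// Approximates querying the view with ?key="<type>".
function collection(type) {
  return rows.filter((r) => r.key === type).map((r) => r.id);
}

// Made-up documents; audit1 has a username but is not a user.
const docs = [
  { _id: "u1", type: "user", username: "eric" },
  { _id: "audit1", type: "audit", username: "eric", action: "login" },
  { _id: "item1", type: "item", name: "Widget" },
];
for (const doc of docs) {
  currentId = doc._id;
  byType(doc);
}

console.log(collection("user")); // [ 'u1' ]
```

One view keyed on type serves every "collection" at once; narrower views (for example, only users, keyed by username) can filter on doc.type in the same way.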

> I understand the concept that NoSQL is exactly that - a key/value store,
> where structure is omitted. But I would have expected at the very least
> some organization ability - no?

CouchDB doesn't dictate how you should organize your data: just shape
it in the way you can use it effectively. If you want to ensure
that documents of certain types (or belonging to a specific collection, in
Mongo terms) have specific fields, just create a
validate_doc_update function that will check the document structure on
write and reject it if it has invalid data.
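
A minimal sketch of such a validate_doc_update function. CouchDB calls it with (newDoc, oldDoc, userCtx, secObj) on every write and rejects the write when the function throws an object with a forbidden member. The per-type required fields below are hypothetical, and the tryWrite() helper only simulates a write outside CouchDB; in a real design document, the function source would go in the validate_doc_update field. The function itself sticks to ES5 syntax, which should also suit the older JavaScript engines bundled with CouchDB 1.x.

```javascript
// Rejects documents that lack a type, or lack the fields required for their type.
function validate_doc_update(newDoc, oldDoc, userCtx, secObj) {
  // Let deletions through without checking fields.
  if (newDoc._deleted) {
    return;
  }
  // Hypothetical schema: required fields per document type.
  var required = {
    user: ["username", "email"],
    order: ["user_id", "order_date"]
  };
  if (!newDoc.type) {
    throw { forbidden: "Every document needs a type field." };
  }
  var fields = required.hasOwnProperty(newDoc.type) ? required[newDoc.type] : [];
  for (var i = 0; i < fields.length; i++) {
    if (!(fields[i] in newDoc)) {
      throw { forbidden: newDoc.type + " document is missing required field: " + fields[i] };
    }
  }
}

// Simulates a write: returns the outcome instead of storing anything.
function tryWrite(doc) {
  try {
    validate_doc_update(doc, null, { name: "eric", roles: [] }, {});
    return "accepted";
  } catch (err) {
    return "rejected: " + err.forbidden;
  }
}

console.log(tryWrite({ _id: "u1", type: "user", username: "eric", email: "eric@example.com" }));
console.log(tryWrite({ _id: "u2", type: "user", username: "bob" }));
console.log(tryWrite({ _id: "x1", name: "no type" }));
```

Types missing from the schema object are accepted as long as they have a type field, so new kinds of documents can be added before their rules are written.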

> Am I missing something basic/obvious in CouchDB?  Or is the concept to use
> separate DBs every time you want to organize similar data together?  That
> seems a little odd too.

With separate databases you won't be able to run views and other
functions across that data, since databases are isolated from each other.

--
,,,^..^,,,