You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@couchdb.apache.org by Lance Pollard <la...@gmail.com> on 2008/10/12 21:28:43 UTC

When to use CouchDB, when not to...

Hi,

I was wondering if there are any general or specific rules/principles that
guide your decisions to use Document Oriented Databases like CouchDB over
Relational Databases like MySQL.  There is no information on when to
specifically, down to the table-names or showing an example schema/site,
that I am aware of on the Internet...
I am using Merb and DataMapper and there is talk about using both CouchDB
and MySQL together in a single application, with MySQL being used for more
static entities like user accounts and contact info and CouchDB being used
for more dynamic entities like images or uploads or things that have
variable amounts and types of properties, but it's unclear.
What's the best way/pattern to separate out the roles each database plays in
an application? If, for example, you wanted to create a crazy website with
social networking, eCommerce, job posting, instant messaging, etc., there
would be an enormous amount of different entities/objects in the database
all relating to each other (assets = photos, songs, videos, blogposts…,
others = users, orders, profiles, products, tags…), and if you had
everything taggable, authenticated, and commentable, then that would add a
lot of interconnectivity between the tables/entities/objects/classes.  So I
was thinking how/when you should choose MySQL and how/when you should choose
CouchDB.  I haven't been able to find anything on the internet outlining
this clearly.  I'm sure it can be outlined well though.

Maybe it could be such that:

-MySQL handles entities that have 1) static properties (columns of tables
are pretty much constant, like a "user_account" or "contact_address" are…),
and 2) static (username) or dynamic (chunk of text) values.
-CouchDB handles things that have 1) dynamic properties (columns in
relational database become "keys" or whatever in Couch, and these keys
aren't fixed but can change from resource to resource), and 2) dynamic
values only. And 3) Couch would handle all uploaded/attachment data because
many documents, even if they are the same type (image, video, word…), might
have different properties depending on the situation.

I've just been trying to think how you could manage 10s or 100s of potential
"entities/tables" in a database and wanted to pass it your way to get your
input.

If CouchDB handled things that only had "dynamic" properties, then an
"image" entity for example could have [title, caption, size,
attachment_fu.stuff] in one case and [title, tags] in another, or whatever;
there is no need to have a single "Images" table with a specified set of
properties (rows).  Image properties or rows can instead be dynamic.  But
what entities do we choose to be dynamic?

So instead of having single inheritance in relational databases and huge
table with tons of null columns, you could have, basically, a
table-per-asset, or a document--that's what Couch is.  But what kind of
structure do you apply to Couch documents?  And what entity should be placed
in Couch vs. MySQL?  Maybe all file-uploads can go to Couch and all
structured-mostly-unchanging entities, like user accounts, can go to MySQL?

But then, is it easy to connect data-objects between two distinct databases?
Like if a user (from MySQL) has_many images (from CouchDB), would doing the
'finds' be expensive or anything?

Just some thoughts on how and when to use Couch vs. MySQL. Any answers to
this question would really help out a ton.

Best,
Lance

Re: When to use CouchDB, when not to...

Posted by Chris Anderson <jc...@apache.org>.

>> The sort of architecture I've been contemplating is one where certain
>> heavyweight entities in the application would be stored in CouchDB while
>> other lighter weight/more relational entities would remain in the RDBMS.

I like draw a distinction between documents that can be edited, and
documents that are write-once. The second type is great for
logging-style data collections (which lend themselves to MapReduce
reports), while the first fits most handily with data collections that
will be edited by the user.

Unless you have data that really hits the RDBMS sweet-spot (social
graph, maybe...) I'd make a go of designing a system that uses only
CouchDB for persistence. I'm leaning this way, because I think there's
a lot of space to be explored in terms of how to structure and
maintain document based applications.

I guess I think the overhead of keeping the two systems in sync might
outweigh the benefits of "in the box" joins and such that you can get
from a SQL overlay. If you do go with a mixed system, it would be
interesting to hear how you handle any impedance mismatches that come
up.

Chris



-- 
Chris Anderson
http://jchris.mfdz.com

Re: When to use CouchDB, when not to...

Posted by "ara.t.howard" <ar...@gmail.com>.

On Oct 13, 2008, at 5:50 PM, Mike Malone wrote:

>> i'd contend, however that you'd really have to structure you app in  
>> a very
>> bad way to have any the db schema be of consequence.  there are  
>> indexes and
>> there is caching at many levels.  hitting the app and db for ever  
>> request
>> doesn't scale period.  in fact, rdbms don't scale well anyhow - not  
>> if you
>> count 'easily' in your criteria that is.  anyone who has every  
>> tried tried
>> to setup high-availability db backends knows this.  again though,  
>> this is a
>> point where couchdb could really shine - they look to have  
>> addressed this
>> from the get go.
>>
>
> I'm not sure I follow... seems like you're contradicting yourself  
> here a
> bit. I agree with the second part - RDBMSs don't scale well. At a high
> level, for most web applications they're more or less a single  
> "thread,"
> creating a bottleneck in a system that can otherwise scale  
> horizontally. Of
> course you can shard, creating additional "threads," but that's not  
> a at all
> a straightforward process.
>
> With that in mind, I'd say that you have to structure your  
> application very
> carefully in order for the DB layer to scale. The DB schema is  
> critical
> here, you must denormalize and carefully shard your data in order to  
> scale.
> Normalizing your schema to the extent you described earlier, in  
> order to
> allow "dynamic properties," would destroy performance on a highly  
> trafficed
> web application. Sure, you can cache, but you still need to read  
> from the DB
> for new or invalidated items.
>
> The sort of architecture I've been contemplating is one where certain
> heavyweight entities in the application would be stored in CouchDB  
> while
> other lighter weight/more relational entities would remain in the  
> RDBMS.
> Suppose I were designing a messaging system, for instance. I may  
> store basic
> User information in an RDBMS, along with social graph information,  
> and an
> "inbox" table that stores the id of each Message sent to each User  
> (some of
> these tables would still need to be sharded at some point). The  
> Messages
> themselves I'd store in CouchDB. I may also put certain extended  
> Profile
> information for Users in CouchDB.
>
> Anyways, I'm fairly new to document-based databases and my intuition  
> may be
> wrong here (please correct me if it is) but it seems to me that  
> CouchDB can
> be used alongside a more traditional RDBMS in this way.
>
> Mike

sure - i agreed totally.  by 'does not scale' i meant once you want to  
move beyond one machine - then it's such a royal PITA.  of course the  
schema is important at a high level, but a relationship here or there  
shouldn't make or break performance.

i'm working on a similar mixture of storage at the moment - but the  
thought of replication and scaling makes me a bit queasy.  the beauty  
of couchdb is the concept that everything talks http which should  
provide mechanisms for really novel scaling strategies to emerge.

cheers.

a @ http://codeforpeople.com/
--
we can deny everything, except that we have the possibility of being  
better. simply reflect on that.
h.h. the 14th dalai lama

Re: When to use CouchDB, when not to...

Posted by Mike Malone <mj...@gmail.com>.

> i'd contend, however that you'd really have to structure you app in a very
> bad way to have any the db schema be of consequence.  there are indexes and
> there is caching at many levels.  hitting the app and db for ever request
> doesn't scale period.  in fact, rdbms don't scale well anyhow - not if you
> count 'easily' in your criteria that is.  anyone who has every tried tried
> to setup high-availability db backends knows this.  again though, this is a
> point where couchdb could really shine - they look to have addressed this
> from the get go.
>

I'm not sure I follow... seems like you're contradicting yourself here a
bit. I agree with the second part - RDBMSs don't scale well. At a high
level, for most web applications they're more or less a single "thread,"
creating a bottleneck in a system that can otherwise scale horizontally. Of
course you can shard, creating additional "threads," but that's not a at all
a straightforward process.

With that in mind, I'd say that you have to structure your application very
carefully in order for the DB layer to scale. The DB schema is critical
here, you must denormalize and carefully shard your data in order to scale.
Normalizing your schema to the extent you described earlier, in order to
allow "dynamic properties," would destroy performance on a highly trafficed
web application. Sure, you can cache, but you still need to read from the DB
for new or invalidated items.

The sort of architecture I've been contemplating is one where certain
heavyweight entities in the application would be stored in CouchDB while
other lighter weight/more relational entities would remain in the RDBMS.
Suppose I were designing a messaging system, for instance. I may store basic
User information in an RDBMS, along with social graph information, and an
"inbox" table that stores the id of each Message sent to each User (some of
these tables would still need to be sharded at some point). The Messages
themselves I'd store in CouchDB. I may also put certain extended Profile
information for Users in CouchDB.

Anyways, I'm fairly new to document-based databases and my intuition may be
wrong here (please correct me if it is) but it seems to me that CouchDB can
be used alongside a more traditional RDBMS in this way.

Mike

Re: When to use CouchDB, when not to...

Posted by "ara.t.howard" <ar...@gmail.com>.

On Oct 13, 2008, at 5:08 PM, Ayende Rahien wrote:

> Trying to do dynamic entities in the DB tends to be a major PITA in  
> any high
> perf scenario

correct - this is the advantage of using object graphs: you get it,  
it's properties, and any other deeply nested data structures on one hit.

i'd contend, however that you'd really have to structure you app in a  
very bad way to have any the db schema be of consequence.  there are  
indexes and there is caching at many levels.  hitting the app and db  
for ever request doesn't scale period.  in fact, rdbms don't scale  
well anyhow - not if you count 'easily' in your criteria that is.   
anyone who has every tried tried to setup high-availability db  
backends knows this.  again though, this is a point where couchdb  
could really shine - they look to have addressed this from the get go.

time will tell i suppose.

cheers.

a @ http://codeforpeople.com/
--
we can deny everything, except that we have the possibility of being  
better. simply reflect on that.
h.h. the 14th dalai lama

Re: When to use CouchDB, when not to...

Posted by Ayende Rahien <ay...@ayende.com>.

Trying to do dynamic entities in the DB tends to be a major PITA in any high
perf scenario

On Mon, Oct 13, 2008 at 11:09 PM, ara.t.howard <ar...@gmail.com>wrote:

>
> On Oct 12, 2008, at 1:28 PM, Lance Pollard wrote:
>
>  Maybe it could be such that:
>>
>> -MySQL handles entities that have 1) static properties (columns of tables
>> are pretty much constant, like a "user_account" or "contact_address"
>> are…),
>> and 2) static (username) or dynamic (chunk of text) values.
>> -CouchDB handles things that have 1) dynamic properties (columns in
>> relational database become "keys" or whatever in Couch, and these keys
>> aren't fixed but can change from resource to resource), and 2) dynamic
>> values only. And 3) Couch would handle all uploaded/attachment data
>> because
>> many documents, even if they are the same type (image, video, word…),
>> might
>> have different properties depending on the situation.
>>
>> I've just been trying to think how you could manage 10s or 100s of
>> potential
>> "entities/tables" in a database and wanted to pass it your way to get your
>> input.
>>
>> If CouchDB handled things that only had "dynamic" properties, then an
>> "image" entity for example could have [title, caption, size,
>> attachment_fu.stuff] in one case and [title, tags] in another, or
>> whatever;
>> there is no need to have a single "Images" table with a specified set of
>> properties (rows).  Image properties or rows can instead be dynamic.  But
>> what entities do we choose to be dynamic?
>>
>> So instead of having single inheritance in relational databases and huge
>> table with tons of null columns, you could have, basically, a
>> table-per-asset, or a document--that's what Couch is.  But what kind of
>> structure do you apply to Couch documents?  And what entity should be
>> placed
>> in Couch vs. MySQL?  Maybe all file-uploads can go to Couch and all
>> structured-mostly-unchanging entities, like user accounts, can go to
>> MySQL?
>>
>
>
>
> slightly OT, but it's not really true that you can't handle dynamic
> properties with a RDBMS.  you just have to normalize even your columns are
> child records.  i have a plugin for AR which does just that, read this to
> see what mean
>
>
> http://drawohara.com/post/53975615/rails-dynamic-properties-for-activerecord-objects
>
> the win for couchdb, IMHO, really comes from storing object *graphs* as one
> entity.  that and the fact that it enables, for really the first time,
> *true* MVC arch.  that is to say javascript (the view) can see the model.
>  most people ignore this link in the MVC triangle and it's why the pattern
> can feel so forced at times.  in MVC the views is *supposed* to be able to
> see the model.  with couch db it can, and easily at that.
>
> cheers.
>
> a @ http://codeforpeople.com/
> --
> we can deny everything, except that we have the possibility of being
> better. simply reflect on that.
> h.h. the 14th dalai lama
>
>
>
>

Re: When to use CouchDB, when not to...

Posted by Chris Anderson <jc...@apache.org>.

On Mon, Oct 13, 2008 at 1:09 PM, ara.t.howard <ar...@gmail.com> wrote:

> the win for couchdb, IMHO, really comes from storing object *graphs* as one
> entity.

hear hear!

-- 
Chris Anderson
http://jchris.mfdz.com

Re: When to use CouchDB, when not to...

Posted by "ara.t.howard" <ar...@gmail.com>.

On Oct 12, 2008, at 1:28 PM, Lance Pollard wrote:

> Maybe it could be such that:
>
> -MySQL handles entities that have 1) static properties (columns of  
> tables
> are pretty much constant, like a "user_account" or "contact_address"  
> are…),
> and 2) static (username) or dynamic (chunk of text) values.
> -CouchDB handles things that have 1) dynamic properties (columns in
> relational database become "keys" or whatever in Couch, and these keys
> aren't fixed but can change from resource to resource), and 2) dynamic
> values only. And 3) Couch would handle all uploaded/attachment data  
> because
> many documents, even if they are the same type (image, video,  
> word…), might
> have different properties depending on the situation.
>
> I've just been trying to think how you could manage 10s or 100s of  
> potential
> "entities/tables" in a database and wanted to pass it your way to  
> get your
> input.
>
> If CouchDB handled things that only had "dynamic" properties, then an
> "image" entity for example could have [title, caption, size,
> attachment_fu.stuff] in one case and [title, tags] in another, or  
> whatever;
> there is no need to have a single "Images" table with a specified  
> set of
> properties (rows).  Image properties or rows can instead be  
> dynamic.  But
> what entities do we choose to be dynamic?
>
> So instead of having single inheritance in relational databases and  
> huge
> table with tons of null columns, you could have, basically, a
> table-per-asset, or a document--that's what Couch is.  But what kind  
> of
> structure do you apply to Couch documents?  And what entity should  
> be placed
> in Couch vs. MySQL?  Maybe all file-uploads can go to Couch and all
> structured-mostly-unchanging entities, like user accounts, can go to  
> MySQL?



slightly OT, but it's not really true that you can't handle dynamic  
properties with a RDBMS.  you just have to normalize even your columns  
are child records.  i have a plugin for AR which does just that, read  
this to see what mean

   http://drawohara.com/post/53975615/rails-dynamic-properties-for-activerecord-objects

the win for couchdb, IMHO, really comes from storing object *graphs*  
as one entity.  that and the fact that it enables, for really the  
first time, *true* MVC arch.  that is to say javascript (the view) can  
see the model.  most people ignore this link in the MVC triangle and  
it's why the pattern can feel so forced at times.  in MVC the views is  
*supposed* to be able to see the model.  with couch db it can, and  
easily at that.

cheers.

a @ http://codeforpeople.com/
--
we can deny everything, except that we have the possibility of being  
better. simply reflect on that.
h.h. the 14th dalai lama

Re: When to use CouchDB, when not to...

Posted by Nick Johnson <ar...@notdot.net>.

On Tue, Oct 14, 2008 at 4:35 PM, Jan Lehnardt <ja...@apache.org> wrote:

>
> On Oct 12, 2008, at 21:28 , Lance Pollard wrote:
>
>> Just some thoughts on how and when to use Couch vs. MySQL. Any answers to
>> this question would really help out a ton.
>>
>
> My take on this is that the structure of the data is less important but the
> types of queries you want to run are. CouchDB can handled structured,
> unstructured, often- & lesser-frequent updated data just fine. An RDBMS
> will only go so far providing performance on unstructured data. There's
> a small sweet spot for CouchDB.
>
> The bigger thing though is that an RDBMS handles dynamic queries
> quite well and CouchDB doesn't. A dynamic query is a query that a
> developer can't predict when developing / deploying a system. CouchDB
> views let you mimic dynamic queries a great deal, but you need to be
> able to anticipate what sorts of queries are run and on which properties
> they should be run against. If you run into a situation where you need
> to create a new view on-the-fly for immediate consumption or even think
> about using temporary views, you need to make sure your data set is
> rather small; more likely though is that CouchDB is not the right choice.
> (Yes, if you query an RDBMS on an unindexed column, you are in trouble
> as well. Still.)


A caveat: RDBMSes handle 'dynamic' queries as well as one can expect, but in
my experience, at least, poor planning on the part of developers for what
queries will be executed is part of what leads to extremely poor RDBMS
performance in many situations.

Thus, not supporting it actually forces developers to think about the
queries they're executing and how they can be efficiently satisfied. On the
downside, the answer to that is frequently to implement your own limited
query planner for 'dynamic' situations.


>
> Luckily, views are flexible enough that I'd like to apply the 80/20 rule
> and
> say they solve most of our user's problems, but I haven't made a scientific
> analysis yet, so you have to take my word for it :)
>
> Additionally, the plugin API that Damien, Paul and Chris are working on
> lets you hook up other index/search solutions that can complement
> CouchDB where views are not enough. This will nicely solve the sync-
> issue that can (will) arise in a hybrid setup. (Not saying a MySQL +
> CouchDB approach is bad, its just hard :)
>
> Cheers
> Jan
> --
>
>
>
>

Re: When to use CouchDB, when not to...

Posted by Jan Lehnardt <ja...@apache.org>.

On Oct 12, 2008, at 21:28 , Lance Pollard wrote:
> Just some thoughts on how and when to use Couch vs. MySQL. Any  
> answers to
> this question would really help out a ton.

My take on this is that the structure of the data is less important  
but the
types of queries you want to run are. CouchDB can handled structured,
unstructured, often- & lesser-frequent updated data just fine. An RDBMS
will only go so far providing performance on unstructured data. There's
a small sweet spot for CouchDB.

The bigger thing though is that an RDBMS handles dynamic queries
quite well and CouchDB doesn't. A dynamic query is a query that a
developer can't predict when developing / deploying a system. CouchDB
views let you mimic dynamic queries a great deal, but you need to be
able to anticipate what sorts of queries are run and on which properties
they should be run against. If you run into a situation where you need
to create a new view on-the-fly for immediate consumption or even think
about using temporary views, you need to make sure your data set is
rather small; more likely though is that CouchDB is not the right  
choice.
(Yes, if you query an RDBMS on an unindexed column, you are in trouble
as well. Still.)

Luckily, views are flexible enough that I'd like to apply the 80/20  
rule and
say they solve most of our user's problems, but I haven't made a  
scientific
analysis yet, so you have to take my word for it :)

Additionally, the plugin API that Damien, Paul and Chris are working on
lets you hook up other index/search solutions that can complement
CouchDB where views are not enough. This will nicely solve the sync-
issue that can (will) arise in a hybrid setup. (Not saying a MySQL +
CouchDB approach is bad, its just hard :)

Cheers
Jan
--