Posted to user@couchdb.apache.org by Jurg van Vliet <ju...@gmail.com> on 2009/05/25 21:18:01 UTC

struggling with couchdb in production

guys and girls,

i am a 'real' user of couchdb, and i am having a lot of fun with it in
addition to creating real value! but it is far from easy, especially
in combination with a framework like rails that is built around
relational databases. and after 4 months of working intensively with
couchdb i am still a big fan.

but couchdb is not finished yet. and i don't mean not finished in the  
sense of the software program that you can run, or the community that  
is building this. what i mean is that there is no documented approach  
to model real world problems in a couchdb way. you can search but the  
most interesting examples are to clarify the idea, or to show that it  
is possible. but nothing that helps me think about when to use a  
document, when a database, when a view, etc. etc.

we have made a couple of wrong design decisions over the last few
months. you can call it ignorance, or hindsight, or something else. i
think it is just the lack of a good framework for thinking couchdb.

when you make your relational database model, your tables, your rows,  
your indexes, etc. there is a large body of documentation that helps  
you approach the problem. and even with years of practice, and people  
having the word database and administrator in their jobtitle,  
designing your database models is just difficult. (there are really  
not many people i want to have thinking about tables and rows and  
indexes.)

so now we have to make this paradigm shift. how are WE managing to  
struggle through this?

one of my personal insights is that couchdb is so different from a  
relational database that it is best approached as if it is the  
opposite. in a rdb you 'minimize' the entity of information, you  
normalize until it is small enough to still have meaning. once  
everything is deconstructed you add rules (validations) your data must  
adhere to. having done that you start to put it back together using  
joins.

in couchdb this pattern doesn't work very well, at least not for us.
we learned it is easier to put as much data together in one document
as possible. my rule of thumb for when to stop comes from distribution.
i often ask myself 'do i want to keep this together when i move it to
another database?' once you have your documents, views are a very
convenient way to take them apart.
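
as an illustration of 'one big document, taken apart by a view', here is a minimal sketch. the project document, its field names and the design document name are made up for the example; the map function itself is javascript (that is what couchdb runs), wrapped in a ruby hash so it can be sent as json.

```ruby
require 'json'

# Hypothetical "project" document: everything that should travel together
# when the data moves to another database lives in one document.
project = {
  "_id"   => "project-alpha",
  "type"  => "project",
  "title" => "Alpha",
  "tasks" => [
    { "title" => "write spec",  "assignee" => "anna", "done" => false },
    { "title" => "review spec", "assignee" => "ben",  "done" => true }
  ]
}

# Design document whose view takes the document apart again: the map
# function emits one row per embedded task, keyed by assignee.
design_doc = {
  "_id"   => "_design/projects",
  "views" => {
    "tasks_by_assignee" => {
      "map" => <<~JS
        function(doc) {
          if (doc.type === "project" && doc.tasks) {
            doc.tasks.forEach(function(task) {
              emit(task.assignee, { project: doc._id, title: task.title });
            });
          }
        }
      JS
    }
  }
}

puts JSON.pretty_generate(design_doc)
```

saving the design document in the database and querying the tasks_by_assignee view with a key such as "anna" would then return only the matching tasks, while the project itself stays one document.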

a database in couchdb is the place where work comes together, in our  
case this is the location where a group of people shares. combining  
information from different databases will be necessary. and i really  
have no clue yet how to approach this problem. so anyone?

today i found myself in a sort of discussion with jchris and jan (i am  
sorry for the other jchris' and jans, but everyone knows who i mean.)  
guys, what i mean to say is that i am happy with your work. but your  
work is very very important to me. i think my work along with all the  
work of your users is what is going to make this movement great. if  
you help us succeed, you will have what you want.

(the reason i sent it to both lists is that i think this 'couchdb way'  
of working is something that is not the problem of use OR development.  
it is necessary to make everyone work together and find out where  
couchdb's future lies.)

groet,
jurg.

Re: struggling with couchdb in production

Posted by Jan Lehnardt <ja...@apache.org>.

>>> Patches welcome, I and others in community will be glad to help you.
>>
>> I suspect that what I'm suggesting is too radical to stand much  
>> chance of
>> being merged :-( The path of least resistance, for now, is to avoid  
>> all
>> replication other than master->slave.
>>
>
> Doesn't sound too radical at all. I'd like to see how well it works  
> in practice.

+1. This looks like a sensible option.

Cheers
Jan
--

Re: struggling with couchdb in production

Posted by Chris Anderson <jc...@apache.org>.

On Thu, May 28, 2009 at 8:14 AM, Chris Anderson <jc...@apache.org> wrote:
> On Thu, May 28, 2009 at 7:33 AM, Damien Katz <da...@apache.org> wrote:
>>
>> On May 28, 2009, at 10:19 AM, Brian Candler wrote:
>>
>>>> You can use [open_]revs=all to open all the conflicts (deleted conflicts
>>>> too)
>>>
>>> Ah, open_revs=all is new to me - it works fine, although knowing about
>>> deleted revisions isn't of particular interest. What I want is all live
>>> (current) conflicting versions.
>>>
>>> It seems to me that this is something that Amazon Dynamo got right:
>>>
>>> * A GET gives you all "live" versions of a document, plus an opaque
>>>  context
>>>
>>> * A PUT of an updated document (which includes this context object)
>>>  replaces the corresponding set of old versions with this one
>>>
>>> * A PUT never fails, but may introduce conflicting versions
>>>
>>> This is both simple and powerful, and dealing with conflicts would then
>>> become pretty easy. As a side benefit: you would no longer need an
>>> API to fetch an item by _rev, which would make it less likely that people
>>> would confuse CouchDB with an RCS :-)
>>>
>>> There is only one reason I can see that CouchDB picks a "preferred"
>>> version
>>> from amongst the conflicts, and that is for the benefit of views. However,
>>> even that problem goes away if you just pass *all* versions of a document
>>> to
>>> the map function.
>>>
>>>  function(docs) { ... }
>>>
>>> The map function may then choose to:
>>> - emit keys corresponding to docs[0] only (= current behaviour)
>>> - emit keys corresponding to all docs
>>> - perform some application-specific view merging
>>>
>>> As long as the conflicting versions of the doc are returned in a
>>> deterministic order, then both clients and views *could* choose to work in
>>> the current way (by just picking the first version and ignoring the
>>> others),
>>> but they would be encouraged to highlight and/or resolve the conflicts at
>>> the earliest opportunity.
>>>
>>>> Also, bulk document retrieval via POST where the post body specifies
>>>> the docs and revisions is something we'd like to see added to the
>>>> front end too.
>>>
>>> I think this is adding more complexity to the API. When would you really
>>> want to get a specific rev or set of revs, rather than *all* live
>>> conflicting revs?
>>>
>>>> Patches welcome, I and others in community will be glad to help you.
>>>
>>> I suspect that what I'm suggesting is too radical to stand much chance of
>>> being merged :-( The path of least resistance, for now, is to avoid all
>>> replication other than master->slave.
>>>
>>
>> Doesn't sound too radical at all. I'd like to see how well it works in
>> practice.
>>
>
> I agree with Damien here. What you're suggesting doesn't sound like a
> departure from the way CouchDB operates, just a variation or
> refinement. We try to treat conflicts as a normal state. Currently we
> also make it easy to ignore conflicts if your application is naive.
> These changes would make it harder to be naive, which raises the bar
> for entry but also ensures applications are capable of handling multi
> master replication.
>

I just realized I should add that the "standard" way of dealing with
replication conflicts so far has been to create a view which lists all
conflict docs and setup a background process to query it and resolve
any conflicts it finds.
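
A minimal sketch of that approach, under stated assumptions: the design document name and the merge rule are invented for illustration, and the helper only builds the request body rather than talking to a server. It relies on `doc._conflicts` being visible inside a map function for conflicted documents.

```ruby
require 'json'

# View listing every document that currently has conflicts.
CONFLICTS_DESIGN_DOC = {
  "_id"   => "_design/conflicts",
  "views" => {
    "all" => {
      "map" => "function(doc) { if (doc._conflicts) { emit(doc._id, doc._conflicts); } }"
    }
  }
}

# Given all live versions of one document (winner first), build a
# _bulk_docs body that saves a merged document on top of the winning
# revision and deletes the losing revisions. The merge rule used here
# (losers only fill in keys the winner lacks) is a placeholder for
# whatever the application really needs.
def resolution_payload(versions)
  winner, *losers = versions
  merged = losers.reduce(winner) { |acc, loser| loser.merge(acc) }
  tombstones = losers.map do |loser|
    { "_id" => loser["_id"], "_rev" => loser["_rev"], "_deleted" => true }
  end
  { "docs" => [merged] + tombstones }
end
```

A background process would query the view, fetch the versions of each listed document, and POST `resolution_payload(versions).to_json` to the database's `_bulk_docs` resource.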

Chris

-- 
Chris Anderson
http://jchrisa.net
http://couch.io

Re: struggling with couchdb in production

Posted by Chris Anderson <jc...@apache.org>.

On Thu, May 28, 2009 at 7:33 AM, Damien Katz <da...@apache.org> wrote:
>
> On May 28, 2009, at 10:19 AM, Brian Candler wrote:
>
>>> You can use [open_]revs=all to open all the conflicts (deleted conflicts
>>> too)
>>
>> Ah, open_revs=all is new to me - it works fine, although knowing about
>> deleted revisions isn't of particular interest. What I want is all live
>> (current) conflicting versions.
>>
>> It seems to me that this is something that Amazon Dynamo got right:
>>
>> * A GET gives you all "live" versions of a document, plus an opaque
>>  context
>>
>> * A PUT of an updated document (which includes this context object)
>>  replaces the corresponding set of old versions with this one
>>
>> * A PUT never fails, but may introduce conflicting versions
>>
>> This is both simple and powerful, and dealing with conflicts would then
>> become pretty easy. As a side benefit: you would no longer need an
>> API to fetch an item by _rev, which would make it less likely that people
>> would confuse CouchDB with an RCS :-)
>>
>> There is only one reason I can see that CouchDB picks a "preferred"
>> version
>> from amongst the conflicts, and that is for the benefit of views. However,
>> even that problem goes away if you just pass *all* versions of a document
>> to
>> the map function.
>>
>>  function(docs) { ... }
>>
>> The map function may then choose to:
>> - emit keys corresponding to docs[0] only (= current behaviour)
>> - emit keys corresponding to all docs
>> - perform some application-specific view merging
>>
>> As long as the conflicting versions of the doc are returned in a
>> deterministic order, then both clients and views *could* choose to work in
>> the current way (by just picking the first version and ignoring the
>> others),
>> but they would be encouraged to highlight and/or resolve the conflicts at
>> the earliest opportunity.
>>
>>> Also, bulk document retrieval via POST where the post body specifies
>>> the docs and revisions is something we'd like to see added to the
>>> front end too.
>>
>> I think this is adding more complexity to the API. When would you really
>> want to get a specific rev or set of revs, rather than *all* live
>> conflicting revs?
>>
>>> Patches welcome, I and others in community will be glad to help you.
>>
>> I suspect that what I'm suggesting is too radical to stand much chance of
>> being merged :-( The path of least resistance, for now, is to avoid all
>> replication other than master->slave.
>>
>
> Doesn't sound too radical at all. I'd like to see how well it works in
> practice.
>

I agree with Damien here. What you're suggesting doesn't sound like a
departure from the way CouchDB operates, just a variation or
refinement. We try to treat conflicts as a normal state. Currently we
also make it easy to ignore conflicts if your application is naive.
These changes would make it harder to be naive, which raises the bar
for entry but also ensures applications are capable of handling multi
master replication.

Chris

-- 
Chris Anderson
http://jchrisa.net
http://couch.io

Re: struggling with couchdb in production

Posted by Damien Katz <da...@apache.org>.

On May 28, 2009, at 10:19 AM, Brian Candler wrote:

>> You can use [open_]revs=all to open all the conflicts (deleted  
>> conflicts
>> too)
>
> Ah, open_revs=all is new to me - it works fine, although knowing about
> deleted revisions isn't of particular interest. What I want is all  
> live
> (current) conflicting versions.
>
> It seems to me that this is something that Amazon Dynamo got right:
>
> * A GET gives you all "live" versions of a document, plus an opaque
>  context
>
> * A PUT of an updated document (which includes this context object)
>  replaces the corresponding set of old versions with this one
>
> * A PUT never fails, but may introduce conflicting versions
>
> This is both simple and powerful, and dealing with conflicts would  
> then
> become pretty easy. As a side benefit: you would no longer  
> need an
> API to fetch an item by _rev, which would make it less likely that  
> people
> would confuse CouchDB with an RCS :-)
>
> There is only one reason I can see that CouchDB picks a "preferred"  
> version
> from amongst the conflicts, and that is for the benefit of views.  
> However,
> even that problem goes away if you just pass *all* versions of a  
> document to
> the map function.
>
>  function(docs) { ... }
>
> The map function may then choose to:
> - emit keys corresponding to docs[0] only (= current behaviour)
> - emit keys corresponding to all docs
> - perform some application-specific view merging
>
> As long as the conflicting versions of the doc are returned in a
> deterministic order, then both clients and views *could* choose to  
> work in
> the current way (by just picking the first version and ignoring the  
> others),
> but they would be encouraged to highlight and/or resolve the  
> conflicts at
> the earliest opportunity.
>
>> Also, bulk document retrieval via POST where the post body specifies
>> the docs and revisions is something we'd like to see added to the
>> front end too.
>
> I think this is adding more complexity to the API. When would you  
> really
> want to get a specific rev or set of revs, rather than *all* live
> conflicting revs?
>
>> Patches welcome, I and others in community will be glad to help you.
>
> I suspect that what I'm suggesting is too radical to stand much  
> chance of
> being merged :-( The path of least resistance, for now, is to avoid  
> all
> replication other than master->slave.
>

Doesn't sound too radical at all. I'd like to see how well it works in  
practice.

-Damien

Re: struggling with couchdb in production

Posted by Brian Candler <B....@pobox.com>.

> You can use [open_]revs=all to open all the conflicts (deleted conflicts
> too)

Ah, open_revs=all is new to me - it works fine, although knowing about
deleted revisions isn't of particular interest. What I want is all live
(current) conflicting versions.

It seems to me that this is something that Amazon Dynamo got right:

* A GET gives you all "live" versions of a document, plus an opaque
  context

* A PUT of an updated document (which includes this context object)
  replaces the corresponding set of old versions with this one

* A PUT never fails, but may introduce conflicting versions

This is both simple and powerful, and dealing with conflicts would then
become pretty easy. As a side benefit: you would no longer need an
API to fetch an item by _rev, which would make it less likely that people
would confuse CouchDB with an RCS :-)
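
The three rules above can be sketched as a toy in-memory store. This only illustrates the semantics; it is not CouchDB's or Dynamo's API. The class name `SiblingStore`, the version tokens, and the use of the token list as the "opaque context" are all assumptions made for the example.

```ruby
# Toy store with Dynamo-style semantics (hypothetical, in-memory only).
class SiblingStore
  def initialize
    @versions = Hash.new { |hash, id| hash[id] = {} } # id => { token => value }
    @counter = 0
  end

  # A GET returns every live version plus a context naming those versions.
  def get(id)
    live = @versions[id]
    [live.values, live.keys]
  end

  # A PUT never fails. It replaces exactly the versions named in the
  # context; any version the writer had not seen stays as a sibling.
  def put(id, value, context = [])
    context.each { |token| @versions[id].delete(token) }
    @counter += 1
    @versions[id]["v#{@counter}"] = value
    nil
  end
end

store = SiblingStore.new
store.put("doc", { "n" => 1 })
_, ctx_a = store.get("doc")          # two clients read the same version
_, ctx_b = store.get("doc")
store.put("doc", { "n" => 2 }, ctx_a)
store.put("doc", { "n" => 3 }, ctx_b) # stale context: creates a sibling
values, ctx = store.get("doc")        # both conflicting versions come back
store.put("doc", { "n" => 5 }, ctx)   # writing with the full context resolves
```

The point of the design is that the client never has to ask whether conflicts exist: it always receives all live versions, and writing back with the context it was given is what resolves them.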

There is only one reason I can see that CouchDB picks a "preferred" version
from amongst the conflicts, and that is for the benefit of views. However,
even that problem goes away if you just pass *all* versions of a document to
the map function.

  function(docs) { ... }

The map function may then choose to:
- emit keys corresponding to docs[0] only (= current behaviour)
- emit keys corresponding to all docs
- perform some application-specific view merging

As long as the conflicting versions of the doc are returned in a
deterministic order, then both clients and views *could* choose to work in
the current way (by just picking the first version and ignoring the others),
but they would be encouraged to highlight and/or resolve the conflicts at
the earliest opportunity.
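
To make the three choices concrete, here is a small simulation. It is hypothetical throughout: CouchDB map functions receive a single document, so these helpers only mimic what a map over all live versions could emit. Each takes the versions of one document (preferred version first) and returns the [key, value] rows it would emit; the `title` field is an assumption.

```ruby
# Current behaviour: index the first (preferred) version only.
def map_first(docs)
  [[docs[0]["title"], docs[0]["_rev"]]]
end

# Index every conflicting version, so each one shows up in query results.
def map_all(docs)
  docs.map { |doc| [doc["title"], doc["_rev"]] }
end

# Application-specific merge: emit a single row and flag it when the
# versions disagree. Picking the alphabetically first title is only a
# deterministic placeholder for real merge logic.
def map_merged(docs)
  titles = docs.map { |doc| doc["title"] }.uniq
  [[titles.min, { "conflicted" => titles.length > 1 }]]
end

versions = [
  { "_id" => "memo", "_rev" => "2-a", "title" => "Draft" },
  { "_id" => "memo", "_rev" => "2-b", "title" => "Final" }
]
p map_first(versions)
p map_all(versions)
p map_merged(versions)
```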

> Also, bulk document retrieval via POST where the post body specifies  
> the docs and revisions is something we'd like to see added to the  
> front end too.

I think this is adding more complexity to the API. When would you really
want to get a specific rev or set of revs, rather than *all* live
conflicting revs?

> Patches welcome, I and others in community will be glad to help you.

I suspect that what I'm suggesting is too radical to stand much chance of
being merged :-( The path of least resistance, for now, is to avoid all
replication other than master->slave.

Regards,

Brian.

Re: struggling with couchdb in production

Posted by Damien Katz <da...@apache.org>.

On May 27, 2009, at 4:05 PM, Brian Candler wrote:

> On Wed, May 27, 2009 at 07:41:54PM +0200, Jan Lehnardt wrote:
>> `GET /db/doc?conflicts=true` gives you a new `_conflicts` member  
>> with an
>> array value of all conflicting revisions (that you then have to fetch
>> separately).
>
> I know that - but not only do you have to ask explicitly to be told  
> that
> there are conflicts, you have to fetch each one individually.
>
>> Do you mean that something like `include_docs` would be handy here?
>
> Yes. I think the default if you ask for a doc should be to get all  
> versions
> of it, not just one arbitrary version.
>
> Here is an example of what I mean, in code.
>
> ----- 8< -----
> require 'rubygems'
> require 'restclient'
> require 'json'
> require 'pp'
>
> DB="http://127.0.0.1:5984/test"
> RestClient.delete DB rescue nil
> RestClient.put DB, {}.to_json
>
> doc = {"_id"=>"test","hello"=>"world","_attachments"=>{
>  "foo"=>{"content_type"=>"text/plain","data"=>["This is a  
> test"].pack("m").chomp},
>  "bar"=>{"content_type"=>"text/plain","data"=>["This is  
> unchanged"].pack("m").chomp},
> }}
> RestClient.post("#{DB}/_bulk_docs",  
> {'docs'=>[doc],'all_or_nothing'=>true}.to_json)
> doc["_attachments"]["foo"]["data"] = ["This is  
> change"].pack("m").chomp
> RestClient.post("#{DB}/_bulk_docs",  
> {'docs'=>[doc],'all_or_nothing'=>true}.to_json)
>
> # Problem 1: how to retrieve all conflicting versions of a document  
> quickly?
> # Best I can do is this:
> docs = []
> res = RestClient.get("#{DB}/test?conflicts=true")
> doc = JSON.parse(res)
> more_revs = doc.delete('_conflicts')
> docs << doc
>
> more_revs.each do |rev|
>  docs << JSON.parse(RestClient.get("#{DB}/test?rev=#{rev}"))
> end
>
> pp docs
>
> # (Note: if you misspell 'conflicts' then it is silently ignored)
>
> # Problem 2: now you have the conflicting versions, how do you tell  
> which
> # of the attachments is different, without downloading them all?
> ----- 8< -----
>

You can add a URL option attachments=true to get the attachments
inline. We've been wanting to add document and attachment hashes for a
while now; that would be helpful. Also, eventually we'll have
attachment-level replication, so we'll also know in which revision an
attachment was edited.

You can use open_revs=all to open all the conflicts (deleted conflicts
too), or just the conflict revs you want:
http://127.0.0.1:5984/test_suite_db_b/foo?open_revs=all
http://127.0.0.1:5984/test_suite_db_b/foo?open_revs=["2-3945190883",  
"3-4948835190"]

The open_revs option might not be documented on the wiki; it would be
nice if someone fixed that.
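
A small sketch of using open_revs from Ruby, with assumptions labelled: the helper names are invented, the document id is assumed to be URL-safe, and the response is assumed to be the JSON form (an array of `{"ok": doc}` and `{"missing": rev}` entries). The second helper drops deleted revisions, which covers the case where only live conflicting versions are wanted.

```ruby
require 'json'
require 'uri'

# Build a document URL with the open_revs option. `revs` is either :all
# or an array of revision strings, which is sent as a JSON array.
def open_revs_url(db_url, doc_id, revs)
  value = revs == :all ? "all" : JSON.generate(revs)
  "#{db_url}/#{doc_id}?" + URI.encode_www_form("open_revs" => value)
end

# Keep only the live (not deleted, not missing) versions from the body
# of an open_revs response.
def live_versions(open_revs_body)
  JSON.parse(open_revs_body)
      .map { |entry| entry["ok"] }
      .compact
      .reject { |doc| doc["_deleted"] }
end

puts open_revs_url("http://127.0.0.1:5984/test_suite_db_b", "foo", :all)
puts open_revs_url("http://127.0.0.1:5984/test_suite_db_b", "foo",
                   ["2-3945190883", "3-4948835190"])
```

Encoding the revision list through `URI.encode_www_form` avoids sending raw brackets and quotes in the query string.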

Also, bulk document retrieval via POST where the post body specifies  
the docs and revisions is something we'd like to see added to the  
front end too. That could be used by the replicator as well.

Patches welcome, I and others in community will be glad to help you.

-Damien

Re: struggling with couchdb in production

Posted by Brian Candler <B....@pobox.com>.

On Wed, May 27, 2009 at 07:41:54PM +0200, Jan Lehnardt wrote:
> `GET /db/doc?conflicts=true` gives you a new `_conflicts` member with an
> array value of all conflicting revisions (that you then have to fetch  
> separately).

I know that - but not only do you have to ask explicitly to be told that
there are conflicts, you have to fetch each one individually.

> Do you mean that something like `include_docs` would be handy here?

Yes. I think the default if you ask for a doc should be to get all versions
of it, not just one arbitrary version.

Here is an example of what I mean, in code.

----- 8< -----
require 'rubygems'
require 'restclient'
require 'json'
require 'pp'

DB="http://127.0.0.1:5984/test"
RestClient.delete DB rescue nil
RestClient.put DB, {}.to_json

doc = {"_id"=>"test","hello"=>"world","_attachments"=>{
  "foo"=>{"content_type"=>"text/plain","data"=>["This is a test"].pack("m").chomp},
  "bar"=>{"content_type"=>"text/plain","data"=>["This is unchanged"].pack("m").chomp},
}}
RestClient.post("#{DB}/_bulk_docs", {'docs'=>[doc],'all_or_nothing'=>true}.to_json)
doc["_attachments"]["foo"]["data"] = ["This is change"].pack("m").chomp
RestClient.post("#{DB}/_bulk_docs", {'docs'=>[doc],'all_or_nothing'=>true}.to_json)

# Problem 1: how to retrieve all conflicting versions of a document quickly?
# Best I can do is this:
docs = []
res = RestClient.get("#{DB}/test?conflicts=true")
doc = JSON.parse(res)
more_revs = doc.delete('_conflicts')
docs << doc

more_revs.each do |rev|
  docs << JSON.parse(RestClient.get("#{DB}/test?rev=#{rev}"))
end

pp docs

# (Note: if you misspell 'conflicts' then it is silently ignored)

# Problem 2: now you have the conflicting versions, how do you tell which
# of the attachments is different, without downloading them all?
----- 8< -----

Regards,

Brian.

Re: struggling with couchdb in production

Posted by Jan Lehnardt <ja...@apache.org>.

On 26 May 2009, at 10:05, Brian Candler wrote:

> On Tue, May 26, 2009 at 09:52:08AM +0200, Jurg van Vliet wrote:
>> i think replication is not the solution for the specific problem i  
>> tried
>> to sketch. i am talking about simple aggregate information (10 most
>> recent documents per user, for example) over potentially thousands of
>> different databases. if i have to replicate all my databases into  
>> one big
>> database i would start with a big one and replicate out to handle  
>> load.
>> that feels like 'missing the point'. (though i am still struggling  
>> which
>> point exactly :) )
>
> Possibly, having thousands of different databases isn't the right  
> map to
> your problem domain, since you can't have a view spanning multiple
> databases.
>
> Multiple databases make sense where the data is entirely self- 
> contained
> (data belonging to one user), especially for virtual hosting where  
> it's a
> benefit that data cannot leak from one database to another.
>
> In an application I'm working on at the moment, I have one database  
> per user
> - but a separate global login database holding the usernames and  
> passwords
> and pointers to each user's database, so at login time I only need  
> to query
> one view.
>
>> yes and no, it all depends on how you regard your users. i think in  
>> an
>> environment where many people create something together the conflicts
>> have meaning. i choose to expose the conflict, meaningfully, and  
>> 'help'
>> the user resolve it herself.
>
> Yes of course; I don't mean that automated conflict resolution is  
> required.
> What I mean is - CouchDB *hides* the conflicts, whereas you and I  
> want them
> *exposed*. It is not easy even to say "give me all conflicting  
> versions of
> this document".

`GET /db/doc?conflicts=true` gives you a new `_conflicts` member with an
array value of all conflicting revisions (that you then have to fetch  
separately).
Do you mean that something like `include_docs` would be handy here?

Cheers
Jan
--

Re: struggling with couchdb in production

Posted by Jurg van Vliet <ju...@gmail.com>.

On May 26, 2009, at 10:05 AM, Brian Candler wrote:

> On Tue, May 26, 2009 at 09:52:08AM +0200, Jurg van Vliet wrote:
>> i think replication is not the solution for the specific problem i  
>> tried
>> to sketch. i am talking about simple aggregate information (10 most
>> recent documents per user, for example) over potentially thousands of
>> different databases. if i have to replicate all my databases into  
>> one big
>> database i would start with a big one and replicate out to handle  
>> load.
>> that feels like 'missing the point'. (though i am still struggling  
>> which
>> point exactly :) )
>
> Possibly, having thousands of different databases isn't the right  
> map to
> your problem domain, since you can't have a view spanning multiple
> databases.
>
> Multiple databases make sense where the data is entirely self- 
> contained
> (data belonging to one user), especially for virtual hosting where  
> it's a
> benefit that data cannot leak from one database to another.

i tend to think that databases in couchdb should be self-contained
enough to have meaning in the application. if i take my project on my
travels i don't need all the information of all users in the system,
just the names of my team members are enough.

i would like to use a database as a vehicle for replication, or for
migration. (i hope what i am saying is somewhat understandable.)
>
>
> In an application I'm working on at the moment, I have one database  
> per user
> - but a separate global login database holding the usernames and  
> passwords
> and pointers to each user's database, so at login time I only need  
> to query
> one view.
>
>> yes and no, it all depends on how you regard your users. i think in  
>> an
>> environment where many people create something together the conflicts
>> have meaning. i choose to expose the conflict, meaningfully, and  
>> 'help'
>> the user resolve it herself.
>
> Yes of course; I don't mean that automated conflict resolution is  
> required.
> What I mean is - CouchDB *hides* the conflicts, whereas you and I  
> want them
> *exposed*. It is not easy even to say "give me all conflicting  
> versions of
> this document".

true. in fact i can imagine that is very useful information. (i don't  
have that problem yet but i am sure i will :P )

>
>
> Regards,
>
> Brian.

Re: struggling with couchdb in production

Posted by Brian Candler <B....@pobox.com>.

On Tue, May 26, 2009 at 09:52:08AM +0200, Jurg van Vliet wrote:
> i think replication is not the solution for the specific problem i tried 
> to sketch. i am talking about simple aggregate information (10 most 
> recent documents per user, for example) over potentially thousands of 
> different databases. if i have to replicate all my databases into one big 
> database i would start with a big one and replicate out to handle load. 
> that feels like 'missing the point'. (though i am still struggling which 
> point exactly :) )

Possibly, having thousands of different databases isn't the right map to
your problem domain, since you can't have a view spanning multiple
databases.

Multiple databases make sense where the data is entirely self-contained
(data belonging to one user), especially for virtual hosting where it's a
benefit that data cannot leak from one database to another.

In an application I'm working on at the moment, I have one database per user
- but a separate global login database holding the usernames and passwords
and pointers to each user's database, so at login time I only need to query
one view.
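
A rough sketch of what such a login document could look like. The field names, the id scheme and the database naming are assumptions for illustration, not a description of the actual application; the salted SHA-256 is only a stand-in, and a real application would want a dedicated password-hashing scheme.

```ruby
require 'digest'

# Hypothetical document in the global login database: one per user,
# pointing at that user's own database.
def login_doc(username, password, salt)
  {
    "_id"           => "login:#{username}",
    "username"      => username,
    "salt"          => salt,
    "password_hash" => Digest::SHA256.hexdigest(salt + password),
    "database"      => "user_#{username}"
  }
end

# Check a password against a login document and return the URL of the
# user's database, or nil when the password does not match.
def database_for(doc, password, server = "http://127.0.0.1:5984")
  expected = Digest::SHA256.hexdigest(doc["salt"] + password)
  expected == doc["password_hash"] ? "#{server}/#{doc["database"]}" : nil
end

doc = login_doc("anna", "secret", "s4lt")
puts database_for(doc, "secret")
```

At login time the application fetches this one document (for example through a view keyed by username) and from then on talks only to the database it points at.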

> yes and no, it all depends on how you regard your users. i think in an  
> environment where many people create something together the conflicts  
> have meaning. i choose to expose the conflict, meaningfully, and 'help' 
> the user resolve it herself.

Yes of course; I don't mean that automated conflict resolution is required.
What I mean is - CouchDB *hides* the conflicts, whereas you and I want them
*exposed*. It is not easy even to say "give me all conflicting versions of
this document".

Regards,

Brian.

Re: struggling with couchdb in production

Posted by Jurg van Vliet <ju...@gmail.com>.

On May 26, 2009, at 9:39 AM, Brian Candler wrote:

> On Mon, May 25, 2009 at 12:36:11PM -0700, Chris Anderson wrote:
>>> a database in couchdb is the place where work comes together, in  
>>> our case
>>> this is the location where a group of people shares. combining  
>>> information
>>> from different databases will be necessary. and i really have no  
>>> clue yet
>>> how to approach this problem. so anyone?
>>
>> The easiest thing is to merge the databases with replication.
>
> In some ways this is the easiest - and in some ways it is the hardest.
>
> It will be easy if your various sources are creating distinct  
> documents. It
> will be hard if the various sources are editing the same documents -  
> because
> you will start having to deal with replication conflicts.

i think replication is not the solution for the specific problem i
tried to sketch. i am talking about simple aggregate information (10
most recent documents per user, for example) over potentially
thousands of different databases. if i have to replicate all my
databases into one big database i would rather start with a big one and
replicate out to handle load. that feels like 'missing the point'.
(though i am still struggling to say which point exactly :) )
>
>
> Whilst CouchDB's model for this is logically self-consistent, I  
> personally
> still believe that it's difficult to use for real-world  
> applications. For
> example, if you GET a document, you will get one arbitrary version.  
> You will
> get no indication that conflicting versions of this document may or  
> may not
> exist. If you want to ensure that your user always sees the latest,  
> resolved
> version of the document, then you need to explicitly ask for all the
> conflicting revisions, and then you need to fetch them individually  
> (AFAICS,
> even a regular "bulk fetch" using _all_docs and keys can't do this),  
> and
> then merge them in an application-specific way, and then put back  
> the merged
> version and delete all the conflicting revs.
yes and no, it all depends on how you regard your users. i think in an  
environment where many people create something together the conflicts  
have meaning. i choose to expose the conflict, meaningfully, and  
'help' the user resolve it herself.

but this is different from what we have been taught. we are all a bit  
afraid of difficult and critical users, because we might have to  
explain why things break :P i do feel it fits very well with couchdb  
though.

in an environment where the conflicts are not the result of user actions
there is no problem either, because the conflicts are logical. (not
necessarily the same as meaningful, though :P )

groet,
jurg.
>
>
> Regards,
>
> Brian.

Re: struggling with couchdb in production

Posted by Brian Candler <B....@pobox.com>.

On Mon, May 25, 2009 at 12:36:11PM -0700, Chris Anderson wrote:
> > a database in couchdb is the place where work comes together, in our case
> > this is the location where a group of people shares. combining information
> > from different databases will be necessary. and i really have no clue yet
> > how to approach this problem. so anyone?
> 
> The easiest thing is to merge the databases with replication.

In some ways this is the easiest - and in some ways it is the hardest.

It will be easy if your various sources are creating distinct documents. It
will be hard if the various sources are editing the same documents - because
you will start having to deal with replication conflicts.

Whilst CouchDB's model for this is logically self-consistent, I personally
still believe that it's difficult to use for real-world applications. For
example, if you GET a document, you will get one arbitrary version. You will
get no indication that conflicting versions of this document may or may not
exist. If you want to ensure that your user always sees the latest, resolved
version of the document, then you need to explicitly ask for all the
conflicting revisions, and then you need to fetch them individually (AFAICS,
even a regular "bulk fetch" using _all_docs and keys can't do this), and
then merge them in an application-specific way, and then put back the merged
version and delete all the conflicting revs.
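
A rough sketch of the last two steps (merge, then put back the merged version and delete the conflicting revs), in Python. It assumes the conflicting revisions have already been fetched, for example by asking for the document with ?conflicts=true and then fetching each listed revision with ?rev=. The names merge_fields and resolution_payload are hypothetical, and the "later document wins per field" merge is only a placeholder for whatever application-specific rule fits. The result is a body that could be POSTed to _bulk_docs.

```python
from typing import Callable, Dict, List

Doc = Dict[str, object]


def merge_fields(docs: List[Doc]) -> Doc:
    """Hypothetical merge rule: later docs in the list overwrite earlier fields."""
    merged: Doc = {}
    for doc in docs:
        for key, value in doc.items():
            if not key.startswith("_"):  # skip _id, _rev and other metadata
                merged[key] = value
    return merged


def resolution_payload(
    winner: Doc,
    losers: List[Doc],
    merge: Callable[[List[Doc]], Doc] = merge_fields,
) -> Dict[str, List[Doc]]:
    """Build a _bulk_docs body: one merged update plus one deletion per losing rev."""
    merged = merge(losers + [winner])  # winner goes last, so its fields win ties
    merged["_id"] = winner["_id"]
    merged["_rev"] = winner["_rev"]  # update on top of the revision a plain GET returned
    deletions = [
        {"_id": loser["_id"], "_rev": loser["_rev"], "_deleted": True}
        for loser in losers
    ]
    return {"docs": [merged] + deletions}


# Made-up example data: two revisions of the same document in conflict.
winner = {"_id": "page1", "_rev": "2-aaa", "title": "Home", "body": "hello"}
losers = [{"_id": "page1", "_rev": "2-bbb", "title": "Home", "tags": ["draft"]}]
payload = resolution_payload(winner, losers)
```

The merged document keeps the winner's body and picks up the tags that only existed in the losing revision; the losing revision itself is marked deleted so the conflict goes away.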

Regards,

Brian.

Re: struggling with couchdb in production

Posted by Jurg van Vliet <ju...@gmail.com>.

On May 26, 2009, at 12:02 AM, Nitin Borwankar wrote:

> Hi guys,
>
> Coming from a long bout of "relational database illness" (18+  
> years)  from which I rapidly recovered after the doctor ordered  
> CouchDB, here's how I think about it. Just some very loose informal  
> rules of thumb.
>
> A couch db data model is a denormalized data model - so don't start  
> with an ER diagram and map to tables, add indexes, pr.key->f.key etc.
> Normalization is an unnatural act in couchdb and documents.

>
>
> It may be better to start with an object diagram and UML if you want  
> to go that route.
> The big question is how far to go with the denormalization.
>
>
> If your model is an acyclic graph you can theoretically have just  
> one large document that is deeply nested.
> But you probably will go two or three levels deep max.
i agree with this wholeheartedly. but i would like to have some other  
technique that helps in modeling. from what you suggest, my  
experience and chris' remarks it appears as if we are all looking at  
some form of maximum normal form, instead of minimum normal form. or  
not? what is the maximum you can get away with? but what is the  
cost of maximization?

@chris, i don't think 'update congestion' is the MAIN problem, it  
certainly is one of the problems that may arise. but in the case of  
users involved i see conflicts as something they should handle,  
because it has meaning, and should be reacted to as such. i understand  
that in a livechat a user is not interested so much in being  
'interrupted' all the time as she wants to say something :P
>
>
> But if your model is a meshed network then you probably want to go  
> two levels - e.g. take a look at the Twitter JSON response format and  
> how it embeds user info inside a status message, and in contrast how  
> it embeds status message (last status) inside user object - in each  
> case the embedded object has just a few of the attributes of the  
> original object - just enough to provide meaningful info in context  
> of the containing object.
> Instead of foreign keys use URIs - you could use namespaced URIs:
> sometablename.id in the relational model becomes namespace:localid
> Of course you can just use couchdb GUIDs if you want.
yes, i also agree with this. but i don't have a clear and clean  
solution for dealing with the data replication at this level. i don't  
expect couchdb replication to give a hand, it would mean a sort of  
per-document dynamic replication strategy. (it would be nice though, only  
then i would like to have it hidden deep away in something like  
activerecord in the case of rails :))

in one solution we have implemented a two-way relationship a little  
bit like this. we use couchdb keys as a reference, and as long as we  
know which database the document is in they are unique. and we accept  
the cost of reading the database some more times, to get the necessary  
information. (i am not so afraid of reading, writing is different,  
though.)
>
>
> And finally in typical Rails-like webapps you have result sets for  
> navigation and browsing -
>
> here
> * "select col1, col2 where ..." corresponds to a map() function with  
> some logic and then emit(doc.attr1, doc.attr2)  - very loosely  
> speaking.
> * "select count(col3)" and similar aggregates are achieved by having  
> a reduce() in addition to the map()
yes, but these 2 patterns are too limited. you still want to combine  
different sorts of information in your database. the biggest problem  
in using reduce is that it can't 'undo' an emit, it can't disregard or  
disqualify previously emitted rows.

i have no idea if this is something that would be helpful in couchdb,  
but i have found myself wishing for something like this.
>
>
> Hope this helps,
yes, nitin, this certainly helps me. it helps knowing my thinking is  
at least in the same direction as others.

and, thank you for sharing :)
>
>
> Nitin Borwankar.
>
> (Perhaps this should be a blog post ?)
>
>
>
>
> Chris Anderson wrote:
>> On Mon, May 25, 2009 at 12:18 PM, Jurg van Vliet <ju...@gmail.com> wrote:
>>
>>> guys and girls,
>>>
>>> i am a 'real' user of couchdb, and i am having a lot of fun with it in
>>> addition to creating real value! but it is far from easy, especially in
>>> combination with a framework that is built around relational databases like
>>> rails. and still, after 4 months of intensively working with couchdb i am
>>> still a big fan.
>>>
>>> but couchdb is not finished yet. and i don't mean not finished in the sense
>>> of the software program that you can run, or the community that is building
>>> this. what i mean is that there is no documented approach to model real
>>> world problems in a couchdb way. you can search but the most interesting
>>> examples are to clarify the idea, or to show that it is possible. but
>>> nothing that helps me think about when to use a document, when a database,
>>> when a view, etc. etc.
>>>
>>> we have taken a couple of wrong design decisions the last couple of months.
>>> you can call it ignorance, or hindsight, or something else. i think it is
>>> just the lack of a good framework for thinking couchdb.
>>>
>>> when you make your relational database model, your tables, your rows, your
>>> indexes, etc. there is a large body of documentation that helps you approach
>>> the problem. and even with years of practice, and people having the word
>>> database and administrator in their jobtitle, designing your database models
>>> is just difficult. (there are really not many people i want to have thinking
>>> about tables and rows and indexes.)
>>>
>>> so now we have to make this paradigm shift. how are WE managing to struggle
>>> through this?
>>>
>>> one of my personal insights is that couchdb is so different from a
>>> relational database that it is best approached as if it is the opposite. in
>>> a rdb you 'minimize' the entity of information, you normalize until it is
>>> small enough to still have meaning. once everything is deconstructed you add
>>> rules (validations) your data must adhere to. having done that you start to
>>> put it back together using joins.
>>>
>>
>> yes, there's a lot of "unlearning" that needs to be done, and that  
>> takes time.
>>
>>
>>> in couchdb this pattern doesn't work very well, at least not for  
>>> us. we
>>> learned it is easier to put as much data together in one document as
>>> possible. my rule of thumb of when to stop is in distribution. i  
>>> often ask
>>> myself 'do i want to keep this together when i move it to another  
>>> database?'
>>> once you have your documents views are very convenient to take your
>>> documents apart.
>>>
>>
>> My rule of thumb is that you want documents to contain their own
>> context. An individual document should make sense even if you don't
>> have any others that it may refer to.
>>
>> The main pressure getting you to split data into multiple documents is
>> update contention. If a lot of people are editing a list
>> simultaneously, then you need to make each list item its own
>> document. If only one person ever edits the list, and the list is
>> relatively short, then putting it in one document may be easier.
>>
>>
>>> a database in couchdb is the place where work comes together, in our case
>>> this is the location where a group of people shares. combining information
>>> from different databases will be necessary. and i really have no clue yet
>>> how to approach this problem. so anyone?
>>>
>>
>> The easiest thing is to merge the databases with replication.
>>
>>
>>> today i found myself in a sort of discussion with jchris and jan (i am sorry
>>> for the other jchris' and jans, but everyone knows who i mean.) guys, what i
>>> mean to say is that i am happy with your work. but your work is very very
>>> important to me. i think my work along with all the work of your users is
>>> what is going to make this movement great. if you help us succeed, you will
>>> have what you want.
>>>
>>
>> If you're interested we'll be hosting a CouchDB tutorial in London
>> next month: http://erlang-factory.com/conference/London2009/university/CouchDB
>>
>> 'scuse the plug :)
>>
>>
>>> (the reason i sent it to both lists is that i think this 'couchdb way' of
>>> working is something that is not the problem of use OR development. it is
>>> necessary to make everyone work together and find out where couchdb's future
>>> lies.)
>>>
>>> groet,
>>> jurg.
>>>
>>>
>>
>>
>>
>>
>

Re: struggling with couchdb in production

Posted by Nitin Borwankar <ni...@borwankar.com>.

Hi guys,

Coming from a long bout of "relational database illness" (18+ years)  
from which I rapidly recovered after the doctor ordered CouchDB, here's 
how I think about it. Just some very loose informal rules of thumb.

A couch db data model is a denormalized data model - so don't start with 
an ER diagram and map to tables, add indexes, pr.key->f.key etc.
Normalization is an unnatural act in couchdb and documents.

It may be better to start with an object diagram and UML if you want to 
go that route.
The big question is how far to go with the denormalization.

If your model is an acyclic graph you can theoretically have just one 
large document that is deeply nested.
But you probably will go two or three levels deep max.

But if your model is a meshed network then you probably want to go two 
levels - e.g. take a look at the Twitter JSON response format and how it 
embeds user info inside a status message, and in contrast how it embeds 
status message (last status) inside user object - in each case the 
embedded object has just a few of the attributes of the original object 
- just enough to provide meaningful info in context of the containing 
object.

Instead of foreign keys use URIs - you could use namespaced URIs: 
sometablename.id in the relational model becomes namespace:localid.
Of course you can just use couchdb GUIDs if you want.
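
To make the two-level embedding and the namespace:localid idea a bit more concrete, here is a small Python sketch. The documents, field names and the summary() and split_id() helpers are all made up for illustration; nothing here is a CouchDB API, it only shows one possible shape for the data.

```python
def summary(doc, fields):
    """Copy the id plus just the listed fields, enough to give context."""
    out = {"_id": doc["_id"]}
    for field in fields:
        if field in doc:
            out[field] = doc[field]
    return out


def split_id(doc_id):
    """Split a namespaced id such as 'user:alice' into (namespace, localid)."""
    namespace, localid = doc_id.split(":", 1)
    return namespace, localid


# Hypothetical full documents, each with a namespaced id instead of a foreign key.
user = {"_id": "user:alice", "name": "Alice", "bio": "likes couches", "joined": 2009}
status = {"_id": "status:1001", "text": "hello couch", "created_at": "2009-05-25"}

# The status embeds a few user attributes; the user embeds its last status.
status_doc = dict(status, user=summary(user, ["name"]))
user_doc = dict(user, last_status=summary(status, ["text"]))
```

The embedded _id doubles as the reference: when the few embedded attributes are not enough, the full document can still be looked up by that id.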

And finally in typical Rails-like webapps you have result sets for 
navigation and browsing -

here
* "select col1, col2 where ..." corresponds to a map() function with 
some logic and then emit(doc.attr1, doc.attr2)  - very loosely speaking.
* "select count(col3)" and similar aggregates are achieved by having a 
reduce() in addition to the map()
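
As a very loose illustration of the two bullets above, here is a tiny in-memory emulation in Python. Real CouchDB views are normally written in JavaScript and stored in a design document; run_view(), posts_by_author() and count() are hypothetical names, and the emulation ignores details such as rereduce.

```python
from collections import defaultdict


def run_view(docs, map_fn, reduce_fn=None):
    """Emulate a view: call map_fn on every doc, sort rows by key, optionally reduce per key."""
    rows = []
    for doc in docs:
        map_fn(doc, lambda key, value: rows.append((key, value)))
    rows.sort(key=lambda row: row[0])
    if reduce_fn is None:
        return rows
    grouped = defaultdict(list)
    for key, value in rows:
        grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in grouped.items()}


def posts_by_author(doc, emit):
    # Roughly "select author, title from docs where type = 'post'"
    if doc.get("type") == "post":
        emit(doc["author"], doc["title"])


def count(values):
    # Roughly "select count(title) ... group by author"
    return len(values)


docs = [
    {"type": "post", "author": "jurg", "title": "struggling"},
    {"type": "post", "author": "nitin", "title": "rules of thumb"},
    {"type": "post", "author": "jurg", "title": "conflicts"},
    {"type": "user", "name": "brian"},
]

rows = run_view(docs, posts_by_author)
counts = run_view(docs, posts_by_author, count)
```

The "where" clause becomes the if inside the map function, and the selected columns become the emitted key and value. In CouchDB itself the reduce function also receives keys and a rereduce flag, and a count has to sum partial counts when rereducing.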

Hope this helps,

Nitin Borwankar.

(Perhaps this should be a blog post ?)




Chris Anderson wrote:
> On Mon, May 25, 2009 at 12:18 PM, Jurg van Vliet <ju...@gmail.com> wrote:
>   
>> guys and girls,
>>
>> i am a 'real' user of couchdb, and i am having a lot of fun with it in
>> addition to creating real value! but it is far from easy, especially in
>> combination with a framework that is built around relational databases like
>> rails. and still, after 4 months of intensively working with couchdb i am
>> still a big fan.
>>
>> but couchdb is not finished yet. and i don't mean not finished in the sense
>> of the software program that you can run, or the community that is building
>> this. what i mean is that there is no documented approach to model real
>> world problems in a couchdb way. you can search but the most interesting
>> examples are to clarify the idea, or to show that it is possible. but
>> nothing that helps me think about when to use a document, when a database,
>> when a view, etc. etc.
>>
>> we have taken a couple of wrong design decisions the last couple of months.
>> you can call it ignorance, or hindsight, or something else. i think it is
>> just the lack of a good framework for thinking couchdb.
>>
>> when you make your relational database model, your tables, your rows, your
>> indexes, etc. there is a large body of documentation that helps you approach
>> the problem. and even with years of practice, and people having the word
>> database and administrator in their jobtitle, designing your database models
>> is just difficult. (there are really not many people i want to have thinking
>> about tables and rows and indexes.)
>>
>> so now we have to make this paradigm shift. how are WE managing to struggle
>> through this?
>>
>> one of my personal insights is that couchdb is so different from a
>> relational database that it is best approached as if it is the opposite. in
>> a rdb you 'minimize' the entity of information, you normalize until it is
>> small enough to still have meaning. once everything is deconstructed you add
>> rules (validations) your data must adhere to. having done that you start to
>> put it back together using joins.
>>     
>
>  yes, there's a lot of "unlearning" that needs to be done, and that takes time.
>
>   
>> in couchdb this pattern doesn't work very well, at least not for us. we
>> learned it is easier to put as much data together in one document as
>> possible. my rule of thumb of when to stop is in distribution. i often ask
>> myself 'do i want to keep this together when i move it to another database?'
>> once you have your documents views are very convenient to take your
>> documents apart.
>>     
>
> My rule of thumb is that you want documents to contain their own
> context. An individual document should make sense even if you don't
> have any others that it may refer to.
>
> The main pressure getting you to split data into multiple documents is
> update contention. If a lot of people are editing a list
> simultaneously, then you need to make each list item its own
> document. If only one person ever edits the list, and the list is
> relatively short, then putting it in one document may be easier.
>
>   
>> a database in couchdb is the place where work comes together, in our case
>> this is the location where a group of people shares. combining information
>> from different databases will be necessary. and i really have no clue yet
>> how to approach this problem. so anyone?
>>     
>
> The easiest thing is to merge the databases with replication.
>
>   
>> today i found myself in a sort of discussion with jchris and jan (i am sorry
>> for the other jchris' and jans, but everyone knows who i mean.) guys, what i
>> mean to say is that i am happy with your work. but your work is very very
>> important to me. i think my work along with all the work of your users is
>> what is going to make this movement great. if you help us succeed, you will
>> have what you want.
>>     
>
> If you're interested we'll be hosting a CouchDB tutorial in London
> next month: http://erlang-factory.com/conference/London2009/university/CouchDB
>
> 'scuse the plug :)
>
>   
>> (the reason i sent it to both lists is that i think this 'couchdb way' of
>> working is something that is not the problem of use OR development. it is
>> necessary to make everyone work together and find out where couchdb's future
>> lies.)
>>
>> groet,
>> jurg.
>>
>>     
>
>
>
>

Re: struggling with couchdb in production

Posted by Chris Anderson <jc...@apache.org>.

On Mon, May 25, 2009 at 12:18 PM, Jurg van Vliet <ju...@gmail.com> wrote:
> guys and girls,
>
> i am a 'real' user of couchdb, and i am having a lot of fun with it in
> addition to creating real value! but it is far from easy, especially in
> combination with a framework that is built around relational databases like
> rails. and still, after 4 months of intensively working with couchdb i am
> still a big fan.
>
> but couchdb is not finished yet. and i don't mean not finished in the sense
> of the software program that you can run, or the community that is building
> this. what i mean is that there is no documented approach to model real
> world problems in a couchdb way. you can search but the most interesting
> examples are to clarify the idea, or to show that it is possible. but
> nothing that helps me think about when to use a document, when a database,
> when a view, etc. etc.
>
> we have taken a couple of wrong design decisions the last couple of months.
> you can call it ignorance, or hindsight, or something else. i think it is
> just the lack of a good framework for thinking couchdb.
>
> when you make your relational database model, your tables, your rows, your
> indexes, etc. there is a large body of documentation that helps you approach
> the problem. and even with years of practice, and people having the word
> database and administrator in their jobtitle, designing your database models
> is just difficult. (there are really not many people i want to have thinking
> about tables and rows and indexes.)
>
> so now we have to make this paradigm shift. how are WE managing to struggle
> through this?
>
> one of my personal insights is that couchdb is so different from a
> relational database that it is best approached as if it is the opposite. in
> a rdb you 'minimize' the entity of information, you normalize until it is
> small enough to still have meaning. once everything is deconstructed you add
> rules (validations) your data must adhere to. having done that you start to
> put it back together using joins.

 yes, there's a lot of "unlearning" that needs to be done, and that takes time.

>
> in couchdb this pattern doesn't work very well, at least not for us. we
> learned it is easier to put as much data together in one document as
> possible. my rule of thumb of when to stop is in distribution. i often ask
> myself 'do i want to keep this together when i move it to another database?'
> once you have your documents views are very convenient to take your
> documents apart.

My rule of thumb is that you want documents to contain their own
context. An individual document should make sense even if you don't
have any others that it may refer to.

The main pressure getting you to split data into multiple documents is
update contention. If a lot of people are editing a list
simultaneously, then you need to make each list item its own
document. If only one person ever edits the list, and the list is
relatively short, then putting it in one document may be easier.
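
A toy Python model of that contention, as a sketch only. TinyStore and ConflictError are hypothetical stand-ins, not CouchDB's API, and revisions are plain integers here rather than real revision strings. The only rule it copies is that an update must carry the current revision or it is rejected.

```python
class ConflictError(Exception):
    """Raised when an update does not carry the current revision."""


class TinyStore:
    """In-memory stand-in for a database with revision-checked writes."""

    def __init__(self):
        self.docs = {}

    def get(self, doc_id):
        return dict(self.docs[doc_id])  # shallow copy, like a fresh read

    def put(self, doc):
        current = self.docs.get(doc["_id"])
        if current is not None and doc.get("_rev") != current["_rev"]:
            raise ConflictError(doc["_id"])
        new_rev = 1 if current is None else current["_rev"] + 1
        self.docs[doc["_id"]] = dict(doc, _rev=new_rev)
        return new_rev


store = TinyStore()

# One shared list document: two editors read the same revision, then both write.
store.put({"_id": "list", "items": ["milk"]})
a = store.get("list")
b = store.get("list")
a["items"] = a["items"] + ["eggs"]
b["items"] = b["items"] + ["bread"]
store.put(a)
try:
    store.put(b)
    contended = False
except ConflictError:
    contended = True  # the second editor has to re-read and retry

# One document per list item: the same two edits no longer touch the same doc.
store.put({"_id": "item:eggs", "list": "list"})
store.put({"_id": "item:bread", "list": "list"})
```

With a single list document the second writer is rejected and has to retry; with one document per item both writes go through, at the price of needing a view to put the list back together.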

>
> a database in couchdb is the place where work comes together, in our case
> this is the location where a group of people shares. combining information
> from different databases will be necessary. and i really have no clue yet
> how to approach this problem. so anyone?

The easiest thing is to merge the databases with replication.

>
> today i found myself in a sort of discussion with jchris and jan (i am sorry
> for the other jchris' and jans, but everyone knows who i mean.) guys, what i
> mean to say is that i am happy with your work. but your work is very very
> important to me. i think my work along with all the work of your users is
> what is going to make this movement great. if you help us succeed, you will
> have what you want.

If you're interested we'll be hosting a CouchDB tutorial in London
next month: http://erlang-factory.com/conference/London2009/university/CouchDB

'scuse the plug :)

>
> (the reason i sent it to both lists is that i think this 'couchdb way' of
> working is something that is not the problem of use OR development. it is
> necessary to make everyone work together and find out where couchdb's future
> lies.)
>
> groet,
> jurg.
>



-- 
Chris Anderson
http://jchrisa.net
http://couch.io

Re: struggling with couchdb in production

Posted by Chris Anderson <jc...@apache.org>.

On Mon, May 25, 2009 at 12:18 PM, Jurg van Vliet <ju...@gmail.com> wrote:
> guys and girls,
>
> i am a 'real' user of couchdb, and i am having a lot of fun with it in
> addition to creating real value! but it is far from easy, especially in
> combination with a framework that is built around relational databases like
> rails. and still, after 4 months of intensively working with couchdb i am
> still a big fan.
>
> but couchdb is not finished yet. and i don't mean not finished in the sense
> of the software program that you can run, or the community that is building
> this. what i mean is that there is no documented approach to model real
> world problems in a couchdb way. you can search but the most interesting
> examples are to clarify the idea, or to show that it is possible. but
> nothing that helps me think about when to use a document, when a database,
> when a view, etc. etc.
>
> we have taken a couple of wrong design decisions the last couple of months.
> you can call it ignorance, or hindsight, or something else. i think it is
> just the lack of a good framework for thinking couchdb.
>
> when you make your relational database model, your tables, your rows, your
> indexes, etc. there is a large body of documentation that helps you approach
> the problem. and even with years of practice, and people having the word
> database and administrator in their jobtitle, designing your database models
> is just difficult. (there are really not many people i want to have thinking
> about tables and rows and indexes.)
>
> so now we have to make this paradigm shift. how are WE managing to struggle
> through this?
>
> one of my personal insights is that couchdb is so different from a
> relational database that it is best approached as if it is the opposite. in
> a rdb you 'minimize' the entity of information, you normalize until it is
> small enough to still have meaning. once everything is deconstructed you add
> rules (validations) your data must adhere to. having done that you start to
> put it back together using joins.

Yes, there's a lot of "unlearning" that needs to be done, and that takes time.

>
> in couchdb this pattern doesn't work very well, at least not for us. we
> learned it is easier to put as much data together in one document as
> possible. my rule of thumb of when to stop is in distribution. i often ask
> myself 'do i want to keep this together when i move it to another database?'
> once you have your documents views are very convenient to take your
> documents apart.

My rule of thumb is that you want documents to contain their own
context. An individual document should make sense even if you don't
have any others that it may refer to.
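
To make that rule of thumb concrete, here is a rough sketch of a document that carries its own context. The field names, ids, and the render() helper are all made up for illustration; nothing here is a CouchDB convention.

```python
# A comment document that copies the bits of context it needs
# (author name, post title) instead of only storing foreign keys.
# All ids and field names are hypothetical.
comment = {
    "_id": "comment-0001",
    "type": "comment",
    "body": "Nice write-up",
    "author": {"id": "user-alice", "name": "alice"},
    "post": {"id": "post-0042", "title": "Modeling notes"},
}


def render(doc):
    """Render the comment using only what is inside the document itself."""
    return '{} on "{}": {}'.format(
        doc["author"]["name"], doc["post"]["title"], doc["body"]
    )


print(render(comment))
```

The ids are still there if you want to follow the reference, but the document can be displayed, replicated, or moved to another database without fetching anything else.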

The main pressure getting you to split data into multiple documents is
update contention. If a lot of people are editing a list
simultaneously, then you need to make each list item its own
document. If only one person ever edits the list, and the list is
relatively short, then putting it in one document may be easier.
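
A small simulation of that contention, for what it's worth. TinyStore below is an in-memory stand-in I made up, not the CouchDB API; it only mimics the one relevant behavior, namely that a write carrying a stale _rev is rejected (CouchDB answers such a write with 409 Conflict). The integer revisions are a simplification.

```python
import copy


class Conflict(Exception):
    """Raised when a write carries a stale revision."""


class TinyStore:
    """In-memory stand-in for a database; hypothetical, not the CouchDB API."""

    def __init__(self):
        self.docs = {}

    def get(self, doc_id):
        # Hand out a deep copy so each "client" edits its own snapshot.
        return copy.deepcopy(self.docs[doc_id])

    def put(self, doc):
        current = self.docs.get(doc["_id"])
        # An existing document may only be replaced by a write that
        # names the revision it was based on.
        if current is not None and current["_rev"] != doc.get("_rev"):
            raise Conflict(doc["_id"])
        stored = copy.deepcopy(doc)
        stored["_rev"] = (current["_rev"] if current else 0) + 1
        self.docs[doc["_id"]] = stored
        return stored["_rev"]


def shared_list_in_one_document():
    """Two people edit the same list document at the same time."""
    store = TinyStore()
    store.put({"_id": "groceries", "items": ["milk"]})
    alice = store.get("groceries")
    bob = store.get("groceries")
    alice["items"].append("bread")
    bob["items"].append("eggs")
    store.put(alice)  # succeeds and bumps the revision
    try:
        store.put(bob)  # bob's snapshot is now stale
    except Conflict:
        return "conflict"
    return "ok"


def one_document_per_item():
    """Each person adds a separate item document; nothing collides."""
    store = TinyStore()
    store.put({"_id": "item-bread", "list": "groceries", "name": "bread"})
    store.put({"_id": "item-eggs", "list": "groceries", "name": "eggs"})
    return sorted(store.docs)


print(shared_list_in_one_document())
print(one_document_per_item())
```

With one document per item, the list itself would be put back together by a view keyed on the "list" field (again, a field name chosen just for this sketch).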

>
> a database in couchdb is the place where work comes together, in our case
> this is the location where a group of people shares. combining information
> from different databases will be necessary. and i really have no clue yet
> how to approach this problem. so anyone?

The easiest thing is to merge the databases with replication:
replicate each one into a single combined database and query that.
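
As a sketch of what that could look like: replication is triggered by POSTing a JSON body with "source" and "target" to the server's /_replicate resource. The server address and the database names (group_a, group_b, combined) below are assumptions, and the target database is assumed to exist already. The code only builds the requests; sending them is left as a comment so it runs without a server.

```python
import json
import urllib.request


def build_replication_request(source, target, server="http://localhost:5984"):
    """Build (but do not send) a POST to /_replicate for one source database."""
    body = json.dumps({"source": source, "target": target}).encode("utf-8")
    return urllib.request.Request(
        url=server + "/_replicate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical per-group databases merged into one combined database.
requests_to_send = [
    build_replication_request(name, "combined")
    for name in ("group_a", "group_b")
]

for req in requests_to_send:
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
    # Against a live server you would send it with:
    # urllib.request.urlopen(req)
```

Views defined in the combined database then see the documents from every group, which is one way to query across databases.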

>
> today i found myself in a sort of discussion with jchris and jan (i am sorry
> for the other jchris' and jans, but everyone knows who i mean.) guys, what i
> mean to say is that i am happy with your work. but your work is very very
> important to me. i think my work along with all the work of your users is
> what is going to make this movement great. if you help us succeed, you will
> have what you want.

If you're interested we'll be hosting a CouchDB tutorial in London
next month: http://erlang-factory.com/conference/London2009/university/CouchDB

'scuse the plug :)

>
> (the reason i sent it to both lists is that i think this 'couchdb way' of
> working is something that is not the problem of use OR development. it is
> necessary to make everyone work together and find out where couchdb's future
> lies.)
>
> groet,
> jurg.
>



-- 
Chris Anderson
http://jchrisa.net
http://couch.io